A Quick Introduction to HDFS

The Hadoop filesystem is called HDFS, and today I’m going to give a short, beginner-friendly introduction to how it works.

The Hadoop Distributed File System (HDFS) sits on top of a Hadoop cluster and facilitates the distributed storage and access of files. When a file is stored in HDFS, it is split into chunks called “blocks”. The block size is configurable (128 MB by default in recent Hadoop versions). The blocks are scattered across the nodes of the cluster. Each of these nodes runs a daemon called a datanode. One node, called the namenode, holds metadata about the blocks and their whereabouts.
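To make the block idea concrete, here is a minimal Python sketch of a toy cluster. It is not the real Hadoop API: the names `ToyCluster`, `split_into_blocks`, and `BLOCK_SIZE`, the tiny 4-byte block size, and the round-robin placement are all assumptions made for illustration. It only shows the division of labor: datanodes hold the bytes, and the namenode holds the map that says which blocks make up a file and where they live.

```python
# Toy model of HDFS block storage. Illustrative only; not the Hadoop API.
BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyCluster:
    def __init__(self, datanode_names):
        # Each datanode stores block_id -> bytes.
        self.datanodes = {name: {} for name in datanode_names}
        # The namenode stores only metadata:
        # filename -> ordered list of (block_id, datanode_name).
        self.namenode = {}

    def put(self, filename, data):
        names = list(self.datanodes)
        placements = []
        for index, block in enumerate(split_into_blocks(data)):
            block_id = f"{filename}#blk{index}"
            # Round-robin placement is an assumption of this sketch.
            node = names[index % len(names)]
            self.datanodes[node][block_id] = block
            placements.append((block_id, node))
        self.namenode[filename] = placements

    def cat(self, filename):
        # Reassembly is only possible because the namenode remembers
        # which blocks belong to the file and in what order.
        return b"".join(
            self.datanodes[node][block_id]
            for block_id, node in self.namenode[filename]
        )


cluster = ToyCluster(["dn1", "dn2", "dn3"])
cluster.put("notes.txt", b"hello hadoop")
print(cluster.namenode["notes.txt"])
print(cluster.cat("notes.txt"))
```

Notice that `cat` never searches the datanodes; it simply follows the namenode's metadata. Without that map, the datanodes would just hold anonymous chunks of bytes.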

To protect against network or disk failure, each block is replicated in three places across the cluster by default. This makes the data redundant: if one datanode goes down, there are other copies of its blocks elsewhere. When that happens, new copies of the affected blocks are created so that there are always three.
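The three-copy replication described above can be sketched in a few lines of Python. Again, this is a toy model under stated assumptions, not how Hadoop is implemented: `ToyReplicatedCluster` and its methods are hypothetical names, and picking nodes in sorted order is a simplification (real HDFS placement also takes racks into account).

```python
REPLICATION = 3  # target number of copies per block, as described in the text


class ToyReplicatedCluster:
    def __init__(self, datanode_names):
        self.live = set(datanode_names)
        # block_id -> set of datanodes currently holding a copy
        self.block_locations = {}

    def store_block(self, block_id):
        # Simplification: take the first REPLICATION live nodes in sorted order.
        self.block_locations[block_id] = set(sorted(self.live)[:REPLICATION])

    def fail_datanode(self, name):
        """Mark a datanode as dead and re-replicate the blocks it held."""
        self.live.discard(name)
        for holders in self.block_locations.values():
            holders.discard(name)
            # Live nodes that do not yet have a copy of this block.
            candidates = sorted(self.live - holders)
            # Copy the block to new nodes until we are back at the target
            # (or run out of nodes to copy to).
            while len(holders) < REPLICATION and candidates:
                holders.add(candidates.pop(0))


cluster = ToyReplicatedCluster(["dn1", "dn2", "dn3", "dn4"])
cluster.store_block("blk0")
print(sorted(cluster.block_locations["blk0"]))  # before the failure

cluster.fail_datanode("dn2")
print(sorted(cluster.block_locations["blk0"]))  # after re-replication
```

In this sketch the block starts on dn1, dn2, and dn3; once dn2 fails, a new copy lands on dn4 so the count returns to three.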

The namenode is even more important, because it has metadata about all the files. If a network issue makes the namenode unreachable, all of the data will be unavailable. Worse, if the disk on the namenode fails, the data may be lost forever, because the namenode has all the information about how the pieces of the files fit together. We’d still have all the blocks on the datanodes, but we’d have no idea which files they belong to.

To get around this issue, one solution is to also write the namenode’s metadata to a drive mounted over a network file system (NFS). A better alternative is to have an active namenode and a standby namenode. This way, there is a “backup” if something goes wrong.

Some commands:
  • To list files on HDFS:
    • $ hadoop fs -ls
  • To put files on HDFS:
    • $ hadoop fs -put filename
    • this takes a local file and puts it on HDFS
  • To display the end of a file:
    • $ hadoop fs -tail filename
  • Many familiar shell commands work if you put a dash in front of them
    • $ hadoop fs -cat
    • $ hadoop fs -mv
    • $ hadoop fs -mkdir
    • etc…

The ABCs of Computer Science

So over the next few months or so, I’d like to write a series of educational posts on “The ABCs of Computer Science”. In each post, I would talk about a topic that begins with the letter of the alphabet I am on. This will not only serve as an educational resource for people wanting to learn new things or get an overview of some of the big topics in computer science, but also as a learning experiment for me.

I plan to make an ABCs post at least every two weeks, and my regular posts about thesis updates and mathematical curiosities will continue. I have a tentative list of topics but I’d like your input. What topics would you like to see me write about? What’s something you are interested to learn? What can I use for elusive letters such as Q, X, and Z? Leave your ideas in the comments below and I will do my best to incorporate them.

Quantifying the Self

So I’ve been doing a bit of an experiment this year. Sure, everyone says they want to do this or do that, lose weight, eat better, exercise more, etc., but how do we hold ourselves to these goals? As you may be aware, the you of the future is always a bit more conscientious than the you of today: “I’ll eat ice cream today and go on the diet tomorrow”, “Just one more day of sleeping in and I’ll get up early tomorrow.” The list goes on. Now being the geek that I am, I began to wonder if there were a more scientific way to go about all this.

Something I’ve found that is pretty easy to do and has had a big impact is simply tracking the things you want to do and seeing how they stack up over time. I started when I found a website called Beeminder. You can start as many “goals” as you like, such as “go to the gym twice a week” or “floss every day”. You know, those things we want to do but have a hard time actually doing. You go in and plot a data point each day and you can see your progress towards the goal. It has a “yellow brick road” for you to follow, and if you do more than average one day you get “safe days” where you don’t have to work as hard. It’s really engaging to me to see my graphs grow.

Another main point of Beeminder is the concept of commitment contracts. As far as I know, this is optional, but I can see how it would definitely improve motivation. Have you ever given $20 to a friend and said “I’m going to try to do <insert thing here>. If I succeed give me my money back, but if I don’t you can keep it.” Basically what you can do with Beeminder is “bet” that you will achieve your goal. You go along and plot your points and if at any point you fall below the “yellow brick road” of success, then you have to pay up. Stay on the road? No payment. Another feature to help you stick to it is that for each time you get “off the road”, the penalty increases. The idea is that at some point you think, “wow, I really don’t want to lose x amount of money, I better go to the gym/eat healthier/read more.” Now that may seem like a pretty negative motivational technique, but think about it this way. No one is forcing you to do anything. These are things that you claim you want to do. The phenomenon that causes us to put off things we want to do is called akrasia. It happens when you go to the store intending to buy vegetables and then you see your favorite ice cream on sale. It happens to all of us, and one of the best ways to stop it is to consciously track what you do and hold yourself accountable for it.

As an example, here’s one of my Beeminder graphs for reading more often. I’ve been saying for years I’d like to read more, but I always seem to find other things to do instead. By tracking my reading time each day, I can see my progress over time, and it’s really helped me to stick with it. I started out with a goal of reading 15 minutes each day, but soon bumped it up to 20, and now I’m at 25. I’ve been reading nearly every day and it feels great. I have finished The Alchemist and I’m almost done with The Hobbit, which is more than I can usually say I’ve read 2 months into the year!

[Figure: Beeminder graph for my “read more” goal]

But I started to think, after using Beeminder for a while, what other things can I track? I started using a pedometer to see how many steps I walk at my university daily, and boy was I surprised! I usually walk 3 miles or more in a day just walking around to classes, to eat, to meetings, and to work. The steps add up quickly, and it’s really neat to see what my trends are for walking as well. I haven’t been tracking this one for as long, but here’s a graph I created using Google Spreadsheets: (guess which data points are the weekends…heh. Of course, the pedometer is on my phone so it only tracks wherever I carry it around, which is not usually within my apartment).

[Figure: Walking Graph]

If you’re more interested in tracking your mental fluctuations rather than your physical activities, I uncovered the site Quantified Mind. It has a series of experiments where you can track your reaction time, memory, focus, and other basic mental skills. You simply log in and play a few simple games and it gives you scores. There is a wide variety of different activities you can do and I find it pretty fun. I have just started playing around with this site, but I imagine if you kept with it and gathered enough data you could determine trends of when your brain is at its best and use that to your advantage. They also have experiments that ask questions like “Does coffee improve cognitive performance?” (tested by doing the games after drinking coffee one day, no coffee the next). Another experiment tests the age-old motto of “Never skip breakfast” and asks users to test themselves on days when they have eaten breakfast and days they have not eaten breakfast. They even have one to test the effect that sex has on mental functioning! Finally, if you’re so inclined you can make your own experiments to test out whatever you want.

But why stop there? Some ideas that I have for tracking myself in the future include plotting my going to sleep/wake up times and the time I spend working on my thesis (if you’re curious about that, see here.) I know for me, implementing self tracking into my life has really opened my eyes to a lot of things I do (and don’t do!) and if you’re tired of not meeting your goals or just a huge data nerd like I am, I highly recommend you give this a try. If you have any questions or ideas please leave them in the comments!


Day 12: Ethical Issues Concerning Artificially Intelligent Agents

If one thinks back as little as twenty years, or even ten years, and considers the technology, it is easy to see that there has been an exponential increase in all sorts of technological research. Computers are becoming increasingly prevalent in our everyday lives. Most people have laptops, tablets, smartphones, or some combination thereof. Even cars and some refrigerators have small computers in them. With all of this comes an increase in our knowledge of automation and artificial intelligence. We’ve already got machines to do our laundry and dishes for us, so that raises the question: what’s next?

Google has created a fleet of self-driving cars that are slowly hitting the streets. Will taxi drivers soon be a thing of the past as well? While all this innovation is great to see, one must step back and consider the effect that this has on the American workforce. In the self-driving car example, thousands of taxi drivers who depend on the job to make a living would be out of a job. But the same thing happened to many independent seamstresses and cobblers with the advent of factories. With even more jobs being automated by these increasingly “intelligent” machines, is human labor destined to be completely replaced by machines? Furthermore, if we can create intelligent robots to do our work for us, should we?

I choose to reject the sensationalist views of AI researchers and enthusiasts who claim that the rise of superintelligence will usher in an era of fear and tyranny when the machines seek to overtake the human race. On the contrary, I believe that autonomous robotic agents can be beneficial to us as a race. If the robots do end up taking over menial labor jobs as some claim, then this opens up new job fields to program and oversee these new intelligences. We will have moved to a more technologically advanced way of life.

Day 11: On Burnout

So I was thinking about things this weekend. I came to the conclusion that I really have way too much on my plate. I’m stressed out and busy all the time, but I don’t know what I can cut or do better to ease this. I have a history of overcommitting myself, and it’s coming back to bite me in the butt once again. Let’s see, to start with, I’m taking 17 credit hours of classes, along with about 15 hours per week of work, I volunteer at the high school robotics club on Wednesday afternoons, I often drive my roommate to and from work because she can’t get around easily, and somehow I’m supposed to find time to study and do homework in there as well. Let’s not start on graduate school applications (I don’t want to talk about it…). Oh yeah, and having some time to eat and sleep would be nice too.

I’ve tried managing my time better, I’ve tried doing my most important work during my best hours of the day (I have learned I can’t get *anything* done if I’m sleepy). I just can’t seem to find anything that works for me. I guess I should probably drop the robotics club but I need something that seems relevant for my grad school applications that I’m having a miserable time completing. Everything just seems really overwhelming right now. I’ll get back on track tomorrow, or the next day maybe. I’ll be fine. I just feel like I am constantly going and going and I can’t seem to find a break. Winter break can’t come quickly enough.

Day 3: Becoming Good at Something

A bit late, but a Day 3 post nonetheless. One of my friends from out of town came to visit me last minute for my birthday, so I’ll be keeping this short. I want to talk about something I’ve had on my mind recently. Now I know that not everyone can be good at everything. Most people don’t have the good fortune to even be really good at one thing. However, as long as I can remember I’ve had trouble asking for help. Yes, I know it’s one of my flaws. But I just get intimidated when someone is vastly better at something than I am and I don’t know how to ask for help without being judged.

Perhaps it’s my own insecurities. Even with my closest friends I feel weird about asking how to do things, because I keep thinking “oh they’re so much better at this than me, it will seem silly to ask such a question”, but at the same time, how does one improve if not by asking questions? I’ve always had this weird sort of issue, and I’ve been trying to do better about it, but is it possible that I’m holding myself back because I feel intimidated by asking people for help? I’m so afraid of being judged or looked down upon that I often don’t ask at all, and when someone does try to help me with things, I often wrongly feel like they are being condescending toward me. It’s really an unfortunate phenomenon, but I’m not sure what else I can do about it but keep on moving on.

Sorry for the short personal post tonight, was running out of time and didn’t have time to write a whole technical thing. Back to the SCIENCE tomorrow!

NaBloPoMo: An Experiment in Blogging (Day 1)

So as my more creative readers may be aware, today is November 1st. This is important because it heralds the start of National Novel Writing Month, known colloquially as NaNoWriMo. A frenetic literary adventure, NaNoWriMo challenges its participants to hammer out a 50,000 word novel in just 30 days. No one said it had to be good, just that it had to be 50,000 words. Could you write the word “the” over and over and still count yourself as a “winner”? Sure, but the point of NaNoWriMo is to give people that say “I have an idea for a story I’ll write one day” the kick in the pants they need to get started. Ending the month with a sloppy first draft is definitely better than what you started with: a dreamy “one day” aspiration and a half-formed idea.

I have participated in NaNoWriMo three or four times over the past few years. Unfortunately, I have never reached the goal of 50,000 words in a month. Last year’s attempt was on track to reaching my goal. I had a good idea, some free time, and was writing my heart and soul into this admittedly haphazard piece of work. However, Nature decided to intervene and I injured my hand right as I passed the 36,000 word mark. Am I disappointed? A little. But looking back and seeing how much I accomplished over that short period is still inspiring to me today.

Therefore, in the spirit of NaNoWriMo, I’ve decided to instate what I call NaBloPoMo, or “National Blog Posting Month”, in which I will write one blog post per day. The topics may vary across the fields of math and science, peppered in with a few anecdotes and opinions about technology and education. Completing a post each day will not only encourage me to write more, it will more than double the current number of posts I have on this blog and allow me to develop my ideas in a structured format. I’m in the process of applying to graduate school for computer science, and I need the practice writing about my interests and goals.

So here goes. This will be my first post of NaBloPoMo, November 1st. Follow along by signing up for email updates or accessing the RSS feed. Each update is also posted to my Twitter account, which you can find at @musegarden. If you have ideas or questions for me, I’d be happy to hear them. Send me a message on Twitter or leave an idea right in the comments here.