JSON can come in all shapes and sizes, and sometimes you want to see it in a structured format that’s easier on the eyes. This is called pretty printing. But how do you accomplish that, especially if you have a really large JSON file? While there are some converter tools online to show json files in a pretty format, they can get very slow and even freeze if your file is too large. Let’s do it locally.
Requirements: Have Python installed, and a terminal environment.
And that’s all there is to it! Python comes built in with a JSON encoding/decoding library, and you can use it to your advantage to get nice formatted output. Alternatively, if you are receiving JSON from an API or HTTP request, you can pipe your results from a curl call directly into this tool as well.
The Hadoop filesystem is called HDFS, and today I’m going to give a short introduction to how it works for a beginner.
The Hadoop File System (HDFS) sits on top of a Hadoop cluster and facilitates the distributed storage and access of files. When a file is stored in HDFS, it is split into chunks called “blocks”. They can be of different sizes. The blocks are scattered between the nodes. These nodes have a daemon running called a datanode. There is one node called the namenode that has metadata about the blocks and their whereabouts.
To protect against network or disk failure data is replicated in three places across the cluster. This makes the data redundant. Therefore if one datanode goes down, there are other copies of the data elsewhere. When this happens a new copy of the data is created, so that there are always three.
The namenode is even more important, because it has metadata about all the files. If there is a network issue, all of the data will be unavailable.However, if the disk on the namenode fails, the data may be lost forever, because the namenode has all the information about how the pieces of the files go together. We’d still have all the chunks on the data nodes, but we’d have no idea what file they go to.
To get around this issue, one solution is to also mount the drive on a network file system (NFS). Another way to approach this (which is a better alternative) is to have an active namenode and a standby namenode. This way, there is a “backup” if something goes wrong.
To list files on HDFS:
$ hadoop fs -ls
To put files on HDFS:
$ hadoop fs -put filename
this takes a local file and puts on HDFS
To display the end of a file :
$ hadoop fs -tail filename
Most bash commands will work if you put a dash in front of them
In the tradition of my Holiday MathPuzzles, I’m here with an appropriately themed puzzle for this time of year.
It’s that time of year all right. You’re out and about, trick or treating with your friends or family and when you come home, you decide to dump all the candy out on the floor to sort through it. But then, as siblings often do, you begin to bicker about who has “more” than the other. In fact, there are some candies that you really like, and some you don’t like. You’d rather have a bunch of chocolate than a bunch of peppermints, for instance. But wait a minute! Of course you can’t like the same thing! Your sibling actually likes peppermints!
Here’s a table of the different candies you have. Each candy has a “value” to it; that is, how much you “want” it. Try to split up the candies such that you and your sibling both have as equal value as possible at the end. And no fighting!
So I don’t know if anyone else is familiar with this game, but I just remember playing it with friends in middle school and it occurred to me the other day that it would be an interesting game to analyze combinatorially, and perhaps write a game playing algorithm for. This game can be found in more detail here: http://en.wikipedia.org/wiki/Chopsticks_(hand_game)
Rules of Play: Players each begin with two “piles” of points, and each pile has 1 point to begin with. We used fingers to represent this, one finger on each hand.
On each turn, a player can choose to do one of two things:
Send points from one of the player’s pile to one of the opponents piles. So if Player 1 wanted to send 1 point to Player 2’s left pile, then Player 2 would have 2 points in the left pile and Player 1 would still have 1 point in his left pile. Player 1 does not lose points, they are simply “cloned” over to the opponents pile.
If the player has an even number of points in one pile and zero points in the other pile, the player may elect to split his points evenly between the two piles. This consumes the player’s turn. Example: if Player 1 has (0 4) then he can use his turn to split his points, giving him (2 2).
If a player gets exactly 5 points in either pile, that pile loses all of its points and reverts to 0. However, if points applied goes over 5 (such as adding 2 points to a 4 point pile), then the remainder of points are added. (meaning that 4 + 2 = 1). The opponent simply gets points mod 5.
If a player gets to 0 points in both their piles, then they lose. The last person that has points remaining wins.
Okay, so let’s break this down. Here’s an example game for those of you that are more visually oriented (follow the turns by reading left to right, moves are marked with red arrows):
Okay, so let’s point out a few things about this game.
On turn 4, player 2 adds 3 points to player 1’s 2 points, making 5. The rules state that any pile with exactly 5 points reverts to zero.
On turn 7, player 2 adds 3 points to 3 points. as we all know, but , so player 1 now has one point in his pile.
On turn 13, player 2 decides to split his points, turning his one pile of 4 into two piles of 2. This consumes his turn.
On turn 15, player 2 adds 2 points to player 1’s 3, thus reverting his pile to zero. Player 1 now has no more points to play with, so player 2 wins.
We can think about this game as a combinatorial problem. What are the optimal positions to play? How would one program a computer to play this game? I plan to create an interactive web game where players can try this for themselves.
You have 10 bags with 10 gold coins each. Nine bags have genuine coins that weigh 10g each, but one bag has fake coins that are only 9g each. Using a digital scale and only one weighing, how do you figure out which bag has the fake coins?