4 Is The Magic Number – Riddle

So fishes56 alerted me to this riddle in a comment here, and if you didn’t see it I wanted to share it here in a separate post cause its a little fun thing to think about.

12 is 6, 6 is 3, 3 is 5, 5 is 4, 4 is 4.

4 is the magic number, why?

68 is 10, 10 is 3, 3 is 5, 5 is 4, 4 is 4.

4 is the magic number, why?

26 is 9, 9 is 4, 4 is 4. 4 is the magic number, why?

Give it a try. ūüôā

The Collatz Conjecture and Hailstone Sequences: Deceptively Simple, Devilishly Difficult

Here is a very simple math problem:

If a number is even, divide the number by two. If a number is odd, multiply the number by three, and then add one. Do this over and over with your results. Will you always get down to the value 1 no matter what number you choose?

Go ahead and test this out for yourself. Plug a few numbers in. Try it out. I’ll wait.

Back? Good. Here’s my example: I start with the number 12.

12 is even, so I divide it by 2.

12/2=6.

6/2=3.

3 is odd, so we multiply it by 3 then add 1.

(3*3)+1=10.

10/2=5.

(5*3)+1=16.

16/2=8.

8/2=4.

4/2=2.

2/2=1.

And we have arrived at one, just like we thought we would. But is this ALWAYS the case, for ANY number I could possibly dream of?

This may sound easy, but in fact it is an unsolved mathematics problem, with some of the best minds in the field doing research on it. It’s easy enough to check low values, but numbers are infinite, and can get very, very, very large. How do we KNOW that it works for every single number that exists?

That is what is so difficult about this problem. While every single number we have checked (which is a LOT) ends up at 1 sooner or later (some numbers can bounce around for a very long time, hence why they are called “hailstone sequences”), we still have no method to prove that it works for every number. Mathematicians have been looking for some method to predict this, but despite getting into some pretty heavy mathematics in an attempt to attack this problem, we still do not know for sure.

This problem is known as the Collatz Conjecture and is very interesting to mathematicians young and old because it is so easy to explain and play with, yet so tough to exhaustively prove. What do you think? Will this problem ever be solved? And what would be the implications if it was?

Here is some simple Python code that can display the cycles for any number you type in:

#!/usr/bin/python
num = int(input("Enter a number: "))
while num!=1:
if num%2 == 0:
num = num/2
else:
num = (num*3)+1
    print num

Periodic Last Digits of Fibonacci Numbers

Here’s something I was playing around with the other day.

Consider the Fibonacci sequence: 0, 1, 1, 2,3,5,8,13,21, \dots. F(n) = F(n-1)+F(n-2). I wanted to see if there was a pattern regarding the values of the ones digit in the fibonacci sequence. This is what I found:

The ones digits of the Fibonacci numbers are periodic, with period 60. This means that if you continue the sequence and look only at the ones digits of the numbers, they begin to repeat the same pattern after the 60th number. In addition, every 15th number in the sequence has a ones digit of zero. I wonder why this is. I can’t seem to find any other patterns at the moment. It probably has something to do with mod.

Here is the list of the 60 digit period (ones digits only):

1
1
2
3
5
8
3
1
4
5
9
4
3
7
0
7
7
4
1
5
6
1
7
8
5
3
8
1
9
0
9
9
8
7
5
2
7
9
6
5
1
6
7
3
0
3
3
6
9
5
4
9
3
2
5
7
2
9
1
0
---this is where the cycle starts over
1
1
2
3
5
8
...

Can you find any other interesting properties?

A Genetic Algorithm for Computing Ramsey Numbers: Update

Friends_strangers_graph

All the 78 possible friends-strangers graphs with 6 nodes. For each graph the red/blue nodes shows a sample triplet of mutual friends/strangers.

In my last post on this topic, I discussed how I was working on a genetic algorithm to search mathematical graphs for elusive properties called Ramsey Numbers. (For a refresher on genetic algorithms, ¬†visit here, and for a refresher on Ramsey Numbers, visit here). I’ve been doing some work on it since then (check out the code here), and I thought I would describe some improvements and further progress I’ve made in this area.

New features:

  • colorings dumped to a file at the end of each run
  • ability to load in data sets from file, further refining of the data than starting from scratch each time

The next problem I ran up against while working through this was that even if I am able to load in previously analyzed data, I still only have one fitness function that checks a static set of edges. As I see it, there are two ways to solve this:

  • Make the current fitness function dynamic; that is, it tests a different set of edges every time. However, this is counterproductive to the purpose of the program “eliminating” certain sets of edges in each “round”. However, this would be easier to maintain than the other option, which is
  • Make a “FitnessHandler” method that takes in a value for which method to run, and uses that to determine what set of edges to test. However, this would lead to a lot of extra code and overhead. I’m thinking having a static variable at the beginning of each run with what “fitness method” to start on, so that it doesn’t have to start on round one each time.

I haven’t fully decided which of these I will go with. I feel like the second one fills my purpose of methodically “weeding out” the improbable graphs, but its going to be a lot of extra work. Oh well, nothing worthwhile ever came easy…

Leave a note here or on my github if you have suggestions!

Introduction to P vs. NP

800px-Jigsaw_puzzle_01_by_Scouten

Its a Millennium Problem (reward $1,000,000) and the subject of a movie. But what is this mysterious problem, and what makes it so hard?

Creation vs. Verification

Pop quiz: Choose which one of these tasks takes longer.

  • A) someone hands you a jigsaw puzzle and asks you if it is finished or not
  • B) someone hands you a jigsaw puzzle and asks you to put it together

Answer: B.¬†While you can determine if a puzzle is finished or not in a split second (it is easy to tell if pieces are missing), it is not so easy to put all those pieces together yourself. We have problems like this in computer science and mathematics as well. Think about this: what if there was a method that could solve a jigsaw puzzle as fast as you could check if its right? Wouldn’t that be amazing? That’s what this problem seeks to find out: if such a method exists.

The Class P

There is a class of problems that can be solved in “polynomial time”, and we call this P. Polynomial time means that as the complexity of the problem increases, the time it takes to solve increases at a rate no greater than a polynomial would. Informally, we can say that for P problems, we can¬†solve the problem “quickly”. (Polynomial time).

Here’s an example: Say we have a list of numbers in front of us, and we want to pick out the number that is the greatest. At the very worst, we could start at the beginning of the list and compare every number until we got to the end of the list. Yes, the list may be very very long. But the time it takes to check every number increases in a linear fashion. More numbers does not make it exponentially more difficult. If we compare the growth rates of linear time O(n) (graph A) and polynomial time O(n^2) (graph B) we can clearly see that solving this problem is quicker than polynomial time, thus it is in P.

yeqx2

Graph A

Graph B

Graph B

The Class NP

cookies

“Mom, she got more than me!”

The class of NP problems, put simply, can be¬†verified¬†in polynomial time. NP stands for “nondeterministic polynomial time” and harkens back to a computational construct known as a Turing Machine. Consider this example: a woman is dividing up cookies of different sizes into two groups for her two young children. Naturally, they are very picky about things being fair, so she needs to make sure that each child has exactly the same amount of cookies (by weight). While we can show that this problem can get quite hard to sit down and solve, if someone showed us two piles of cookies we could quickly add up the weights of the two piles and tell if they were the same or not.¬†

If you have only 2 cookies to separate out, sure, its easy. Put one cookie in each pile and that’s the best you can do. But what if you had 5 cookies? 10? Keep in mind that each cookie weighs a slightly different amount and we want the two groups of cookies to be equal in weight as much as possible. Adding more cookies to work with increases our options for separating them exponentially. This is not good. If she has 5 cookies, the number of different ways to separate them is 2^5 = 32, but if she doubles the amount of cookies to 10, the possibilities skyrocket because 2^{10} = 1024.

Let’s consider, for fun, the number of possibilities if she had 100 cookies! 2^{100}=1,267,650,600,228,229,401,496,703,205,376. This…is pretty large. But wait! Computers can do that in a heartbeat, right? They’re way faster than we are! This is all fine and good, until you realize that the number of seconds in the age of the entire universe is (only) 450,000,000,000,000,000. So¬†even if our computer could go through thousands or millions of computations per second, it would still take longer than the entire age of the universe.¬†This is what makes these sorts of problems so hard.

P = NP?

So now that you know a little about what P and NP are, you may be wondering what all this business about “P=NP” is. What does it mean, after all? If someone were to prove that P equals NP, this would mean that every problem in the class NP (the hard ones) can be solved in a polynomial time, just like the P problems. This would have huge implications for all sorts of applications, not just in the mathematical world. For instance, a cornerstone of many security¬†systems rests on the difficulty of factoring very large numbers. If P=NP, it would mean that there exist very easy methods for factoring these numbers, and as such, financial systems everywhere would be vulnerable.

This can also be applied to many logistical and optimization problems (see my posts on intractable problems here and here). For many of the NP problems, finding an “easy” method to solve one can be applied to any of the problems in that class. We don’t need to show an easy method to solve every problem in the world. We only need one. That shouldn’t be too hard, right? Unfortunately, researchers have been working on this problem for decades without success. There have been many attempted “proofs” but they all ultimately have fallen short. The problem is so difficult that the Clay Mathematics Institute is offering $1,000,000 for a correct proof of this problem one way or another.

Although an official answer to this question has not been decided one way or another, many professionals believe that P \neq NP. This means that problems in NP will always be inherently “harder” than problems in P, and at the worst case require an exhaustive (brute force) search to find a solution. No matter how good our computers get, these problems will still be very difficult for us to solve, especially at large cases.

I hope you enjoyed this little introduction to P and NP and if you have any comments, questions or corrections, please leave them in the comments below!

Twitter FriendCloud

Today I’m going to talk about a personal project I’ve been working on recently. I was trying to come up with some way to make a cool project with natural language processing and I had also noticed that with the rise of social networks, there is a treasure trove of data out there waiting to be analyzed. I’m a fairly active user of Twitter, and its a fun way to get short snippets of info from people or topics you’re interested in. I know personally, I have found a lot of math, science, and tech people that have twitter accounts and post about their work or the latest news in the field. I find just on a short inspection that the people I follow tend to fall into certain “groups”:

  • math
  • computer science
  • general science
  • academia
  • authors
  • tech bloggers
  • feminism
  • various political and social activist figures (civil rights, digital privacy, etc)

This list gives a pretty good insight into the things I am most interested in and like hearing about on my twitter timeline. Now I thought to myself, what if I could automate this process and analyze any user’s timeline to find out what “groups” their friends fell into as well? I had my project.

Currently in the beginning stages, I decided to tentatively call my project “FriendCloud” and registered with the API to start messing around. I’m using Python-Twitter¬†to interact with the twitter API, and its helping me to get practice with Python at the same time. The first thing I wanted to do was be able to pull down a list of all the people that I follow. Since I follow a little over 1000 people, this proved to be a daunting task with the rate limiting that Twitter has built in to their API. At the moment, what I had to do was get my script to pull as many user objects as possible until the rate limit ran out, then put the program to sleep for 15 minutes until the rates refreshed and I could download more.

It took a little over an hour to get the list of all my friends and I am trying to look into a way to do this quicker in the future. After that, I can go through users and pull down a selection of their tweets. After this is done, I have a corpus of text that I can analyze. I have been using NLTK¬†(a Python NLP toolkit) to pick out some of the most common keywords and themes. There is a lot of extraneous data to deal with, but as I pare it down I’ve noticed some interesting trends just in my own tweets.

I hope in the future to be able to extend this to the people I follow on twitter and be able to place them into rough “groups” based on their most commonly tweeted keywords (similar to how a word cloud works). In this way, a user can get an at a glance look at what topics the person is most likely interested in and what sort of people they may be likely to follow in the future.

Intractable Problems — Part Two: Data Storage

This post continues my series on intractable problems. In this installment, I will talk about problems relating to Data Storage. As a refresher, remember that an intractable problem is one that is very computationally complex and very difficult to solve using a computer without some sort of novel thinking. I will discuss two famous problems related to Data Storage below, as well as provide a few examples and references.

Part Two — Data Storage

486px-Knapsack.svg

Knapsack¬†— given a set of items with weights w and values v, and a knapsack with capacity C, maximize the value of the items in the knapsack without going over capacity.

To start with, here is a small example that I refer to in this previous post. If you’re trying to place ornaments on a tree, you want to get the max amount of coverage possible. However, the tree can only hold so much weight before it falls over. What is the best way to pick ornaments and decorations such that you can cover as much of the tree as possible without it falling over?

I actually saw a really interesting video describing an example of this problem in this video (watch the first few minutes). The professor here sets up a situation of Indiana Jones trying to grab treasures from a temple before it collapses. He wants to get the most valuable treasures he can, but he can only carry so much.

Class: There are several different types of knapsack problems, but the most common one (the one discussed above) is one-dimensional knapsack. The decision problem (can we get to a value V without exceeding weight W?) is NP-Complete. However, the optimization problem (what is the most value we can get for the least weight?) is NP-Hard.

References: 

Wikipedia Page РGeneral discussion of the Knapsack problem, different types, complexity, and a high level view of several algorithms for solving

Coursera Course on Discrete Optimization – The source for the above video and a great discussion of not only Knapsack but quite a few of these problems

Knapsack Problem at Rosetta Code – a good example data set and a variety of implementations in different languages

============================

TetrisPieces2

Bin Packing¬†— given a set of items with weights w and a set of n bins with capacity c each, place the items into the bins such that the minimum amount of bins are used.

This problem is very similar to the knapsack problem found above, but this time we don’t care how much the items are worth. We just want to pack them in the smallest area possible. Solving this problem is invaluable for things like shipping and logistics. Obviously, companies want to be able to ship more with less space.

A more commonplace example could be thinking of this problem as Tetris in real life. Consider that you’re moving to a new place, ¬†and you have you and your friends car in which to move things. How can you place all the items in your cars such that you take up space the most efficiently?

Some progress has been made on reasonably large data sets by using what is called the “first fit decreasing algorithm”. This means that you pick up an item, and place it into the first bin that it will fit in. If it can’t fit in any of the current bins, make a new bin for it. Decreasing means that before you start placing items, you sort them all from biggest to smallest. You probably do this in your everyday life. If you want to pack a box, you start with the big items, right? No need to put lots of small items on the bottom. By getting the big items out of the way first, you can be more flexible with the remaining space because you will then have smaller items.

Class: This problem is NP-Hard.

References:

Wikipedia Page – a high level description of the problem

First Fit Decreasing Paper (pdf) Рthis is a technical paper describing computational bounds for using the first fit decreasing algorithm. Not for beginners.

3D Bin Packing Simulation Рlooks like a resource for companies to use to pack boxes and such

============================

I hope this provided a little taste of why these problems are so important. If you know any other good resources please let me know.