Tuesday, September 28, 2010

Data Storage in Permeable combination

Much of this is very similar to computing odds and statistics, but is more from the view point of information theory rather then probability. In game theory and odds we are concerned with the frequency of certain outcome, but here we want to know how much information is in that outcome and how much data will it take to store these outcome as well as the reverse of how much data can be stored in these outcomes.

How much data in a Pair of Dice?



Look at the structure of a six sided die, how much data is in each roll? The outcome of a roll is a single number between 1 and 6.



In binary:
   2 Bits is a number from 0 to 3.
   3 Bits is a number from 0 to 7.

Add one to make these 1-4 and 1-8. A 1-6 outcome is more then 2 bits of data but less then 3. Right now there is no good way to deal with this. Most people would just round this up to 3 bits.

Using the logarithm function we can determine how many bits are needed to store all the possible outcomes.

Bits = log(6) / log(2); // This is equivalent to log(Base 2)
Bits = 2.58496

2.58496 Bits is where we would expect 2 < x < 3, but this is not a whole number.

I found I could describe this as a Fractional Bits. A fractional bit, in reality there is no way to subdivide an on/off, 1 or 0 Bit. When really storing data we are forced to round up, but we could add up these fractional bits to store whole bits.

A pair of dice rolls 1 to 36 possible outcomes. (Die1 – 1 * 6) + Die2
If we take the number of bits and times 2 we get 5.17 Bits.

For 3 rolls we get a 1 to 216 outcome that is 7.754 Bits. This rounds up to 8 bits, we already saved 1 bit from just round up to 3 bits x 3 and storing them together as 9 bits.

With a type of compression called arithmetic coding this just what is done.

Another way of thinking about it you can treat each die rolls a producing 1 digit of a base 6 number and then convert the results to base 2.

By reversing this for storing data we would have to round down to 2 bits per roll. But with 3 rolls we gain a bit and have 7 bits instead of 6.

Logarithm and exponent math
Getting the log base 2 of a number is extracting the exponent for a number.
Where Log(n) / log(2) convert the log function to a base 2.

log(16) / log(2) = 4 bits
log(32) / log(2) = 5 bits.

For powers of 2 this is very straightforward.

Bits = log(6) / log(2); // This is equivalent to log(Base 2)
Bits = 2.58496

This equivalent to 2 to the power 2.58496 equals 6. But raising something to a non whole number power is difficult.

This can be done using the exponent function.

    exp( log(N) ) = N
    log( exp(N) ) = N

Logarithm and Exponent are reversible functions.

X^Y can be done using exp(Y * log(X) )
   ( In most computer languages ^ mean to the power of)

So 2^2.58496 can be done using exp(2.58496 * log(2) )

Most log and exponent functions are by default Natural meaning
    Log (2.71828182845904523536) = 1
and
    Exp(1) = 2.71828182845904523536

Which is why we need to covert these to base 10 or base 2 to be useful for this application. This is done by
Multiplying or dividing by log(2) is these equations.

Math of exponents is also different.

X * Y = exp( log(X) + log(Y) )
Addition of exponents is equivalent to multiplication of regular number.

X / Y = exp( log(X) - log(Y) )
Subtraction of exponents is equivalent to division of regular number.

X^Y = exp( log(X) * Y )
Multiplication can be used to raise something to a power.
   _____
Y /
\/  X = exp( log(X) / Y )
The Root of a number can be taken.

How much data can a deck of poker cards hold?


There are 54 cards in a deck.
The cards are A,2,3,4,5,6,7,8,9,10,J,Q,K 13 Ranks in all times 4 suites Jacks, Spades, Hearts, Diamonds.
13 * 4 = 52
There are also 2 Jokers giving 54 cards total.
The two jokers are sometimes indistinguishable from one another but for this example we will assume we can differentiate between them.

Our first card can be any 1 of 54.
Our second card can be any 1 of 53. And so on.

This gives us the equation:
54! = 54 x 53 x 52 x 51 x 50 x 49 x 48 x . . . x 2 x 1

The Factorial operation is notated as n! and gives the number of ways in which n objects can be permuted.

For example, 3! = 6. This come from by 3 x 2 x 1, since the six possible permutations of are:
{1,2,3}, {2,3,1}, {3,1,2}, {1,3,2}, {2,1,3}, {3,2,1}.

This is called factorial 54 and would be written as “54!” and give us the number of permeable combination that a deck of cards could be shuffled into.

I used the Unix bc program to compute this. “bc” according to the online manual is “An arbitrary precision calculator language”, it is very convenient when dealing with big number.

Here is the program to computer factorial 54!
c = 1
for( a = 54; a >= 1; a--){
c = c * a
}
c

The result is:
54! = 230843697339241380472092742683027581083278564571807941132288000000000000

72 Digits long and in Based 2, 238 Bits of data to store

log( 54! ) / log(2) = 237.06381108042942967244 Base 2
log( 54! ) / log(10) = 71.36331802162852843476 Base 10

With 237 bits of data we can store English text in 5 Bit EBCDIC characters.
Using this we could store an uncompressed 47-character message in a deck of cards.

We could hid a 224 Bit DES encryption key for decoding data stored elsewhere on a CD or floppy disk and carried along with our deck of cards.

If we increase the number of decks we can store more data. For instance we mix a deck of red backed and blue backed cards together for 108 cards. This gives more then the expected doubling of data, instead there is a 2.44 time increase.
54! * 2 = 474 Bits
(54 * 2)! = 578 Bits

decks Cards Bits Bits per card
1 54 237.06 4.390
2 108 578.42 5.355
3 162 960.33 5.928
4 216 1368.64 6.336

What is happening is as the number of cards increases the amount of data per card increases.

With 4 decks, 216 Cards we see 6.336 Bits per card; this is equivalent of a 1 to 80 possible selection per card. To phrase it another way a 216 Digit base 80 numbers. Although we must keep in mind we also need 216 uniquely identifiable cards for this.

4 decks give 1368 Bits is 171 Bytes or 273 characters stored with 5 bit each letter.

No comments: