On Packing Data

[updated 16 April 2010]

The number of bits needed to store an integer, a, a > 0, is:



Using this formula, 13 bits are needed for the value 7333. In binary, 7333 is 1110010100101, which is indeed 13 bits.

The number of bits needed to store two values,
a and b, independently would be:



Using the identity:



The number of bits needed to store the product of
a and b is:



Therefore, we can save one bit by multiplying
a and b when



As a trivial example, the values 5 (101) and 6 (110) each require three bits so storing them independently would take 6 bits. However, 6*5 is 30, which takes 5 bits.

Just for curiousity, this graph shows the fractional part of



using the equation



It's an interesting "sawtooth" pattern, with zero values at 2
n. The larger version of the graph also shows lines of constant slope between the endpoints of each tooth (in blue), and the point where the tooth equals 0.5 (√2*2n - in red).

Click on the graph for a larger version.



Let's consider a practical application. The start and end of Daylight Savings Time can be specified as "the nth day of the week in month at hour and minute". For example, in 2010, DST in the US begins on the second Sunday in March at 2AM. Let:

d = day of week, Sunday..Saturday = 0..6
r = day rank, first..fifth = 0..4
m = month, January..December = 0..11
h = hour, 0..59
μ = minute, 0..23

Naively, this can be stored in 3+3+4+6+5 = 21 bits.

When multiplying 6 * 5 to save one bit, one of the numbers is needed in order to determine the other. Since we don't a priori know what values were multiplied together, we can't unpack the result. So while the possibility of saving a bit by multiplication made for an interesting analysis, it doesn't help here. There's no point in storing the product of n numbers if you have to know n-1 numbers to recover the nth number.

If we're going to conserve space, it will have to be through application of the first equation. Given two integers,
a and b, a requires fewer bits than b when:



So we want to minimize the maximum value of the number being stored.

For the packing algorithm, construct a set, ω, of an upper limit for each item.

Then the largest number that can be encoded is:



To pack the values vi which correspond to the upper limit values ωi, evaluate:



To unpack
rn into values vi which correspond to the upper limit values ωi, evaluate:



Consider the decimal number 12,345. For base 10 encoding,
ωi would be (10, 10, 10, 10, 10), and the largest number that could be encoded would be 99,999. But suppose we knew that the largest possible value of the first digit was one. Then let ωi be (2, 10, 10, 10, 10). The largest value that could be encoded would be 19,999. Since 19,999 < 99,999, it requires fewer bits.

For the daylight savings time values, construct a set, ω, of the maximum value of each item + 1. For (d, r, m, h, μ) this would be (7, 5, 12, 60, 24).

Packing the start times would requlre 20 bits:



Packing both the start and end times would use 39 bits, for a savings of 3 bits.



Thanks to Lee for his comments on this article.
Comments

Shiny Things for Suckers

While preparing another blog entry a television commercial caught my attention. The spiel was for a gold-plated buffalo nickel for the special deal of only $19.95. Limit five per customer!

The pitch stated that the coin was plated with 31mg of 24 karat gold.

Let's do the math. 31mg
is 0.001093 ounces. Today's gold price is $1096.63/ounce. So the gold in each coin is worth $1.20.

$19.95 + 4.95 shipping and handling = $24.90.

$24.90 for $1.20 worth of gold? Only if you're stupid.

The website is
here. If you visit, turn your sound off beforehand since a commercial video starts playing immediately.

What a racket.
Comments

More on the Sum of Powers

This is a continuation of this entry which derives the general solution for the coefficients of:



Computed coefficients for 0 <= k <= 10 were provided here.

It was noted that:



These values were determined by hand. However, it’s easy for LISP code to create the symbolic expression for each coefficient. The prefix expressions used by LISP (e.g. “(+ 3 (* 4 5) 6)” ) were converted to infix form (e.g. “(3 + (4 * 5) + 6)” ) and given to Maxima for simplification. The results are here.

What’s interesting is that the coefficients:



are all zero.

Now I have to figure out why...

Comments

My Contribution to the Mathematical Arts

Many, many years ago, sometime in high school, I learned that the formula for computing



is



I'm pretty sure we were shown the geometrical derivation of this formula. In college, in late 1975 or early '76, in a Math Lab one problem asked to guess the formula for



In my handwritten lab notebook I used induction to show that the solution to (3) is:



Having solved this specific case, I wanted to see if there was a general solution for any positive integer n and any positive integer exponent. Based on equations (2) and (4), I conjectured that the general solution would be a polynomial of this form:



The
derivation used induction on the general formula and found that the coefficients to the solution are:




Computed coefficients for 0 <= k <= 10 are
here.

Perhaps of interest are these properties of the coefficients:










Comments