Reversible Cantor Pairing

The goal here is to find an invertible (reversible) function that maps two natural numbers, a and b onto a single natural number. We can name this function anything we like, but I'll use the letter C, for Georg Cantor, the eponymous discoverer of this pairing method. Interesting aside: Cantor, who was born in Russia but lived and worked in Germany through the late 19th and early 20th centuries, was also the inventor of Set Theory.

Andrew Paradis. Last edit: 31 December 2015

Derive the Pairing Function

Let C map two natural numbers onto a single natural number, and let a and b be in the natural numbers:
\[C:\mathbb{N} \times \mathbb{N} \to \mathbb{N}\]

\[{a, b} \in \mathbb{N}\]

If you think about a spreadsheet, then this notion of mapping is not all that complicated. Mapping two numbers into one is no different than using a "row" and "column" in a spreadsheet to represent the value in that cell, for example A7 = "Lasagna" is the mapping of the row A and column 7 to the string value "Lasagna" (and really, what better mapping to have).

Let's proceed along the lines of a spreadsheet. Obviously, we want whatever mapping we use to be unique. It won't do us any good if, once we map a and b into some value, we can't get a and b back out! If we fill a spreadsheet with all the natural numbers, we certainly are on the right track, since each combination of row and column will direct us to one, and only one, of the natural numbers. And, a clever trick is to be had... if we fill the spreadsheet diagonally, then we can grow it to any size we like!

For example, consider this matrix (aka, spreadsheet, table, array, etc.) of values, identified by row i and column j:

C
j = 0
1
2
3
4
i = 0
0
1
3
6
10
1
2
4
7
11
 
2
5
8
12
 
 
3
9
13
 
 
 
4
14
 
 
 
 

The matrix is populated with the natural numbers: start counting from the top and increment by one going down and to the left, until you hit the left edge, then go back to the top and continue in a similar pattern. We start our counting with zero.

Now, there is something very interesting going on here! Look at the i = 0 row... those values look familiar, don't they? They are, in fact, part of Pascal's Triangle! Pascal's Triangle is filled with patterns, and one of them is called the "triangular numbers," which are highlighted below.

 
 
 
 
 
1
 
 
 
 
 
 
 
 
 
1
 
1
 
 
 
 
 
 
 
1
 
2
 
1
 
 
 
 
 
1
 
3
 
3
 
1
 
 
 
1
 
4
 
6
 
4
 
1
 
1
 
5
 
10
 
10
 
5
 
1

So, why do we really care that these i = 0 values are the same as those in Pascal's Triangle? Because the values in Pascal's Triangle can be easily generated! That means, in our Cantor matrix, we get the first row for free, thanks to Pascal's Triangle.

The values in Pascal's Triangle are generated with the combinatorics definition \( \binom{n}{k}\) (aka, "n choose k"). Essentially, this means "how many ways are there of selecting n things, taking k at a time. Fortunately, we don't have to figure that out for each set of numbers, because "n choose k" is equivalent to:

\[\binom{n}{k}=\frac{n!}{k!(n-k)!}\]

Now, we can't just jump right in and apply this definition directly to i and j. We need to offset the combinatorics to account for the leading zero that starts our Cantor matrix, but only by one (as it is only shifted one place to the right). Doing that shift, and knowing that the Triangle numbers are in diagonal 2 of Pascal's Triangle (counting starts with 0), we get a generating expression for the top row:

\[C_{0,j}=T_j = \binom{j+1}{2}= \frac{(j+1)!}{2!(j+1-2)!} = \frac{(j+1)(j)(j-1)(j-2)...}{2(j-1)(j-2)...} = \frac{1}{2}(j+1)(j)\]

Looking back at our Cantor matrix, we can see that each subsequent row after the first has a pattern to how it increases. Each cell value is the sum of the value above it and the next value in diagonal 1 of Pascal's Triangle! But for each additional row, we need to start from one more offset of diagonal 1 in the calculation. Fortunately, it's not as messy as it sounds. We can just try a few variations of our Triangular number expression, adding in the row offset, and we arrive at our Cantor Pairing function.

\[C_{i,j}=\binom{i+j+1}{2}+i\]

Now we can stop talking about generic i and j indices, and get back to our two inputs, a and b. And, just as we did with \(T_j\), we will rewrite our combinatorics expression as a standard algebraic equation for ease of calculation.

\[C_{a,b}=\binom{a+b+1}{2}+a = \frac{1}{2}(a+b+1)(a+b) + a \tag{1} \]

We can simplify this somewhat, and set ourselves up for an easier time of inverting this function in the next section. Again relying on the definition of a Triangular number, we can see that for any natural number n:

\[T_n = \binom{n+1}{2} = \frac{1}{2}(n+1)(n)\]

We will use a simple substitution definition:

\[n = a + b \tag{2} \]

And now we arrive at our pairing function:

\[C_{a,b}= T_{a+b} + a \tag{3} \]

Invert the Pairing Function - There and Back Again

It won't do us much good if all we can do is calculate \(C_{a,b}\) and not retrieve a and b back out! So, we need to find the inverse, and then use the inverse of the function to obtain a and b from \(C_{a,b}\).

Let's return to our use of the substitution \(n = a + b\), and introduce \(z = C_{a,b}\) and \(k = a\). We could stick with all our original labels, but sometimes it's easier to make things look less "mathy" and simpler to solve. So, without any loss of generality, we now have:

\[z = T_n + k \tag{4} \]

The goal is to find some unique n and k that give us our known value of z. Once we do, we can solve for our original a and b very easily, since \(a=k\) and \(b = n - k\), and that will mean our inversion is complete.

Looking at the Cantor matrix once more, we can deduce that any given z is going to be contained within some "bounding set" of indices. Let's consider, then, finding a value of n so that:

\[T_n \leq z < T_{n+1}\]

\[\frac{1}{2}(n+1)(n) \leq z < \frac{1}{2}(n+2)(n+1) \tag{5}\]

Considering just the Left Hand Side (LHS) of equation (4), we see a happy little quadratic inequality to solve:

\[n^2 + n - 2z \leq 0\]

As usual with quadratics, this will have two solutions:

\[n \geq \frac{-1-\sqrt{8z+1}}{2} \tag{6a} \]

\[n \leq \frac{-1+\sqrt{8z+1}}{2} \tag{6b} \]

Notice the RHS of solution (6a)... it is always going to be less than zero! But, we already know that n must be greater or equal to zero, from substitution (2), because both a and b are, by definition, natural numbers.

So, we will keep (6b) as part of our overall solution for n, but let (6a) go the way of the Dodo.

Likewise, we can solve the RHS of equation (4), another quadratic inequality, again releasing the negative root, to find:

\[n > \frac{-3+\sqrt{8z+1}}{2} \tag{7} \]

Combining our solutions (6b) and (7), we can establish a range within which we expect to find n:

\[\frac{-3+\sqrt{8z+1}}{2} < n \leq \frac{-1+\sqrt{8z+1}}{2} \tag{8} \]

Now, whoa there! This bristly looking set of inequalities is not all that bad, but we can pretty it up a bit. Why bother prettying the math up? Because sometimes all the dangly bits just distract us from seeing a "big picture" within the equations, and it would be a shame to miss out on some solid conceptual understanding simply because we didn't bother to simplify our work!

To that end, let's introduce a substitution:

\[\alpha = \frac{-1+\sqrt{8z+1}}{2} \tag{9} \]

Using this definition for \(\alpha\), we get another piece of good news. Subtracting one from equation (9) shows us that:

\[\alpha - 1 = \frac{-3+\sqrt{8z+1}}{2} \]

Well, that's just the RHS of our stated range in equation (8)! That simplifies things nicely, and we see that our n of interest must satisfy:

\[\alpha - 1 < n \leq \alpha \]

THEREFORE the largest value of n occurs when \(n = \alpha\). Now, of course, \(\alpha\) is not necessarily a natural number, but n must be, so we simply round down (aka, use the floor function). Math is filled with interesting symbols, and the symbol for the floor operation on some variable x is \(\left \lfloor x \right \rfloor\). So, we can write for n:

\[n = \left \lfloor \alpha \right \rfloor \tag{10} \]

Right on! From (4), we can get a solution for k, since \(k=z - T_n = z - \frac{1}{2}(n+1)(n) \), where n is given to us through equations (9) and (10). And, using \(a = k \), substitution (2), and returning from z to \(C_{a,b}\) we finally find our inversion.

\[\begin{align} n=\left \lfloor {\frac{-1+\sqrt{8C_{a,b}+1}}{2}} \right \rfloor \\ a=C_{a,b}-\frac{1}{2}(n+1)(n)\\ b=n-a \end{align} \tag{11} \]

An Example

Let's take \(a=12, b=7\) and apply both the pairing function and its inverse.

We start by calculating the pairing, using equation (1):

\[C_{12,7} = \frac{1}{2}(12+7+1)(12+7)+12 = 202\]

Okay, that was quick! Now let's run the inverse and make sure we get back \(a=12, b=7\). Following equations (11) from the top down, we find:

\[n=\left \lfloor {\frac{-1+\sqrt{8(202)+1}}{2}} \right \rfloor = 19 \]

\[a=202-\frac{1}{2}(19+1)(19) = 12\]

\[b=19-12=7\]

Well, slick, it actually worked!