A Different Take on Infinity
In the previous post, we talked about the various ‘levels’ of infinity and how they come about through relatively simple ideas.
Now, I want to talk about a different kind of infinity, one we briefly touched upon in the last article.
In the last article, we talked about how we can have infinitely long sequences of rational numbers that have no rational ‘limit’. For example, there is an infinitely long sequence of fractions that get ever closer to a number that, when multiplied by itself, yields 2, but no fraction in this sequence, when multiplied by itself, will ever be 2. We invented the real numbers to give names to these various ‘unreachable’ numbers.
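For the curious, one concrete example of such a sequence (my own choice, using the Babylonian/Newton iteration for \(\sqrt{2}\), not something from the earlier article) is

\[x_1 = 1, \qquad x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right), \qquad \text{giving}\quad 1,\ \frac{3}{2},\ \frac{17}{12},\ \frac{577}{408},\ \ldots\]

Every term is a fraction, the squares of the later terms are \(2 + \frac{1}{4}\), \(2 + \frac{1}{144}\), \(2 + \frac{1}{166464}\), and so on, creeping ever closer to \(2\), yet no term squared is ever exactly \(2\).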
For a long time, mathematicians believed that all these numbers could be described by algebraic equations. That is, they thought that every real number was the solution to some quadratic, cubic, quartic, etc. equation. For example, \(\sqrt{2}\) is a solution to the equation \(x^2 - 2 = 0\). The equation \(x^3 - 8x^2 + x - 1 = 0\) has solutions that are irrational, but they can still be described as solutions to that polynomial equation.
Indeed, infinitely many reals are described this way. These are called the algebraic numbers.
But there are way more reals than that. Leibniz was the first to conjecture the existence of what we now call the transcendental numbers, when he showed that \(\sin{x}\) is not an algebraic function of \(x\). In fact, most interesting reals are not algebraic. For example, both \(\pi\) and \(e\) are transcendental. That is, there is no polynomial equation with rational coefficients that has \(\pi\) or \(e\) as a solution.
Interestingly enough, when it comes to countability, the algebraic numbers are countable: there are only countably many polynomial equations with rational coefficients, and each has only finitely many roots. That means the uncountability of the reals comes entirely from the transcendental numbers.
As an aside, here are some particularly weird numbers:
- Chaitin’s Constant (\(\Omega\)) – A real number giving the probability that a randomly generated Turing machine will halt. It is a perfectly well-defined real number, but, because the halting problem is undecidable, it is not computable.
- Champernowne Constant – This is the number \(0.123456789101112\ldots\) (i.e., concatenate the digits of all the natural numbers in order). This is not a rational number, but it is certainly real: you can construct a sequence of rationals that gets ever closer to it, namely \(0.1\), \(0.12\), \(0.123\), and so on (a short sketch of this construction follows the list). There is an analogous constant in every natural base.
- Euler–Mascheroni Constant – The limiting difference between the partial sums of the harmonic series \(1 + \frac{1}{2} + \frac{1}{3} + \cdots\) and the natural logarithm.
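Since the Champernowne construction is just ‘concatenate digits and divide by a power of ten’, here is a minimal Python sketch of the approximating rationals mentioned above (the function name and the use of `fractions.Fraction` are my own choices):

```python
# Sketch: the rational approximations 0.1, 0.12, 0.123, ... that converge to
# the base-10 Champernowne constant.
from fractions import Fraction

def champernowne_approximations(count):
    """Yield the first `count` approximations as exact fractions."""
    digits = ""
    for n in range(1, count + 1):
        digits += str(n)                              # concatenate the next natural number
        yield Fraction(int(digits), 10 ** len(digits))

for q in champernowne_approximations(5):
    print(q, "=", float(q))   # 1/10 = 0.1, 3/25 = 0.12, 123/1000 = 0.123, ...
```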
So, if we can’t write polynomials to describe these numbers and there are uncountably infinitely many of them, how can we describe them in a meaningful way?
One way to do this is to write a program that generates a sequence of rational numbers converging to the number in question. The easiest way to do that is to write a program that computes the digits of its decimal expansion. Note that not all real numbers can be described this way (we’ll get to this later). In fact, since there’s only a countable number of programs 1, there’s only a countable number of these reals.
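To make ‘a program that computes the digits’ concrete, here is a minimal sketch for \(\sqrt{2}\) (my own illustration, relying on Python’s integer square root):

```python
# Sketch: emit decimal digits of sqrt(2) using only integer arithmetic.
# sqrt(2) is computable: every digit of the expansion comes out after
# finitely many steps, even though the full expansion never ends.
from math import isqrt

def sqrt2_digits(count):
    digits = []
    n = 2                                # we take isqrt(2 * 100^k) for growing k
    for _ in range(count):
        digits.append(isqrt(n) % 10)     # the newly fixed decimal digit
        n *= 100                         # expose two more places under the root
    return digits

print(sqrt2_digits(10))   # [1, 4, 1, 4, 2, 1, 3, 5, 6, 2]
```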
With this in mind, we can define a function \(K(x)\) that maps each computable real to the length of the shortest program that generates its digits 2. That is,
\[K(x) = \min\{\, |p| : T(p) = x \,\},\]
where \(|p|\) is the length of the program and \(T(p)\) is the output of the program when run. In mathematics, this concept is called Kolmogorov complexity.
Either way, it is clear that every computable real has a shortest program, given some language.
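As a toy illustration of the idea (not a real computation of \(K\), which cannot in general be computed exactly), compare two descriptions of the same digit string: a literal spelling-out versus a short generating program, with Python source length standing in for program length:

```python
# Toy illustration of the idea behind K(x): the same string described two ways.
literal = "3" * 1000               # spell out a thousand 3s character by character
program = '"3" * 1000'             # a ten-character expression that generates them

assert eval(program) == literal    # both descriptions produce exactly the same string
print(len(literal), len(program))  # 1000 vs 10: the string is highly compressible
```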
Note that the computable reals include all the algebraic numbers as well as many transcendental numbers (\(\pi\) and \(e\) among them). Thus, they are ‘larger’ than the algebraic numbers. However, the computable reals are still countable. Thus, the uncountability of the reals comes from the non-computable reals.
A random aside
We now interrupt this fascinating treatise on the real numbers for a random discussion on randomness.
Let’s consider a sequence of numbers:
\[1,\ 192,\ 3281,\ 3,\ 112,\ \ldots\]
Is this sequence random?
Oof… a loaded question. In order to answer it, you would have to know what comes next. But knowing that \(8192\) comes next wouldn’t actually help you to determine if the sequence is random any more than knowing just the first five terms.
No, in order to say whether the sequence is random, we must instead think about it holistically. We say a sequence is random if we cannot predict the next number. When something is predictable, then it’s not random. To be random means to have an uninspectable interiority that cannot be predicted.
So now we go back to what it means to be predictable. If something is predictable, then I can describe to you how to predict it. In other words, I can … write a program!
So now, let’s go back to the sequence above. While I can’t say whether it’s random by looking at only five terms, I can say this: if you give me every term and I successfully write a program that predicts every one of them, then the sequence is not random. Of course, in real life, we can never get every term and check that the program predicts them all correctly. To deal with this ambiguity we invented science and decided that, if the program predicts enough terms correctly, then it probably predicts them all. This attitude is the main separation between scientists and mathematicians 3.
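Here is a toy sketch of what ‘write a program that predicts every term’ might look like; the terms and the conjectured rule are invented for illustration and have nothing to do with the sequence above:

```python
# Toy sketch: check a conjectured generating rule against every observed term.
# Passing a finite check is evidence of predictability, not a proof.
def rule(n):
    return 2 ** n            # conjecture: the n-th term (counting from 0) is 2^n

observed = [1, 2, 4, 8, 16, 32]

predictable = all(rule(n) == term for n, term in enumerate(observed))
print(predictable)           # True: so far, this sequence does not look random
```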
A step back to reality
Now we have two disparate concepts, both concerning computer programs. On one hand, we have random sequences of numbers, which are sequences that cannot be described by computer programs. On the other hand, we have the computable reals, which are numbers whose digits can be produced by a computer program to arbitrary precision.
What does this say about all the non-computable reals? Well, if we take each digit in the expansion of a non-computable real and put it in a sequence, then what do we know about that sequence?
We know that there is no program that can successfully calculate the next digit 100% of the time. If we had such a program, then the number would be computable.
Thus, the digits of the non-computable reals are random! There is no way to distinguish their expansions from random sequences. And yet, they certainly exist as the limit of rational numbers.
Turing Machines and infinity
When I said above that by ‘program’ I mean any description of a deterministic process, it’s important to say what that means. Deterministic does not mean finite. A Turing machine (one universal model of computation) is explicitly infinite: its tape is infinite. In reality, no real computer is infinite. Computers are limited by the number of bits in their RAM. This means any real computer can only simulate Turing machines with a bounded tape. It also means there are infinitely many computable reals that cannot be described by even the largest modern computer, because there are reals whose Kolmogorov complexity is larger than that computer’s memory.
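To put a rough number on that last claim (a back-of-the-envelope count of my own): a machine with \(n\) bits of memory can store only programs of length at most \(n\) bits, and there are only

\[2^1 + 2^2 + \cdots + 2^n = 2^{n+1} - 2\]

bit strings of length between \(1\) and \(n\). So such a machine can single out at most \(2^{n+1} - 2\) distinct computable reals, even though there are infinitely many of them in total.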
Let’s examine one consequence of this with regard to AI models. GPT-3 has 175 billion parameters. Its parameters are 32-bit floating point numbers, so let’s say that it is characterized by roughly 5.6 trillion bits. A bit can be thought of as a yes/no question. There are \(2^{5,600,000,000,000}\) possible settings for the parameters of GPT-3, and OpenAI, through the use of massive GPUs, managed to come across one of those combinations that seems to encode general intelligence.
But there’s a problem with this narrative. There’s a limit to how much input any such model can take into account at once. In GPT world, this limit is called the ‘context length’. Current SoTA LLMs have context lengths in the millions of tokens (i.e., they can analyze and respond to a conversation of roughly a million words). To go beyond this, you have to compress the history of the conversation to fit on the computer. Of course, you only have so many bits to do this with.
Suppose you have \(N\) bits of space to compress the history of the conversation, and you’ve said \(2^N\) + 1 million words. Now what? Well, to generate the next token, you take into account the previous 1 million words and the compressed history. You then produce a new token and a new set of history bits. However, by this point you have generated more than \(2^N\) history values, and there are only \(2^N\) possible settings of \(N\) bits. By the pigeonhole principle, some history value must be the same as an earlier one (strictly, the relevant state is the history plus the visible million-word window, but that window is finite too, so the same argument applies). What happens then? Since the LLM is deterministic (it is the result of a computer program), once it is back in a state it has already been in, it must produce the same results it produced before.
To emphasize: any deterministic computer program that runs with a finite amount of state must eventually revisit a state it has already been in, and from that state it will produce the same output it produced the first time.
In other words, eventually, even the best LLM running on physical hardware must repeat itself.
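A minimal sketch of this pigeonhole argument, with an arbitrary 8-bit state and update rule standing in for ‘\(N\) bits of history plus a deterministic program’:

```python
# Sketch: a deterministic update over a finite state space must revisit a
# state within |state space| + 1 steps, and from then on its behavior cycles.
def step(state):
    return (5 * state + 3) % 256      # arbitrary deterministic rule on 8-bit states

state, seen = 17, {}
for t in range(300):                  # 300 > 256, so a repeat is guaranteed
    if state in seen:
        print(f"state {state} at step {t} repeats step {seen[state]}; the outputs now cycle")
        break
    seen[state] = t
    state = step(state)
```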
Humans
Now the billion-dollar question: do humans suffer from this same limitation? There are two schools of thought.
The first is that our brains are physical and thus can encode only a certain finite number of possible states (physicists call this entropy).
The second is that we are something more, and thus not encodable in any finite number of states.
If the first thought is correct, then the human brain is simulatable on a computer of sufficient size. If the second, then the brain is harnessing some esoteric aspect of physics to achieve what it does. Computers cannot do this, and thus cannot simulate the brain (and thus consciousness).
Or, put another way, either the brain is computing something computable, or it is computing something uncomputable. If it’s uncomputable, then it’s random (in the sense described above), and this ‘something more’, which we call consciousness, is intrinsically tied to randomness (because if it weren’t, it would be predictable and inspectable, and uninspectable interiority is a characteristic of consciousness).
Keep in mind that when I say ‘programs’, I don’t just mean computer programs. I mean any deterministic process that arrives at a result.↩︎
The choice of language is irrelevant in the sense that, while the exact lengths change, the conclusions reached do not. Moreover, we need not even talk about Kolmogorov complexity using a programming language. Natural language is sufficient: what’s the shortest list of English instructions that uniquely determines a method to write down the digits of the number?↩︎
This is why I’m so adamant that science is a faith-based discipline. We conjecture that predicting ‘enough’ terms of some physical measurement is good enough to say we can predict every one. Then we are surprised when we encounter phenomena we cannot predict, as if that ought to be unexpected, when in reality it ought to be considered the norm. Frankly, it’s surprising we can predict anything at all.↩︎