LLMs are an Index Into the Library of Babel

A person holding a smartphone in the Library of Babel

“The Library of Babel” is my favorite short story by Jorge Luis Borges, and probably in the top five of my favorite short stories overall. In it, the author describes an extremely large, yet finite, library containing myriads of rooms filled with books. The books contain all possible permutations of characters, and as such every possible book is in the library. Furthermore, every unwritten book is in the library. The library contains the story of your life, correct to the tiniest of details, as well as countless such biographies that are riddled with errors.

I like this story because it seems to throw our way of thinking off balance. Our mental shortcuts lead us to the idea that the library contains all of the world’s information. After all, any kind of information that can be expressed in book format exists somewhere on a shelf. And since the library is finite, we don’t get suspicious of any mathematical shenanigans going on, like those weird infinite hotel paradoxes everybody just gave up trying to understand intuitively.

In reality, of course, the library contains exactly as much information as a blank sheet of paper. The chance that any book you pick up from a shelf will be even intelligible, let alone consistent in any sense, is so vanishingly small that it’s effectively zero. Mathematically, choosing a book at random is exactly equivalent to writing such a book by choosing each character at random. No new information is revealed in either process.
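To put rough numbers on this, here is a small Python sketch. The book dimensions (25 symbols, 410 pages of 40 lines with 80 characters each) come from Borges’ story rather than from anything above, and the function names are only illustrative. It estimates how many books there are and checks, on a toy library small enough to enumerate, that picking a book off the shelf and typing one at random give every book the same probability.

```python
import math
from fractions import Fraction
from itertools import product

# Assumed dimensions, taken from Borges' story (not from this post):
# 25 symbols, 410 pages, 40 lines per page, 80 characters per line.
ALPHABET_SIZE = 25
CHARS_PER_BOOK = 410 * 40 * 80  # 1,312,000 characters


def log10_book_count(alphabet_size=ALPHABET_SIZE, length=CHARS_PER_BOOK):
    """Base-10 logarithm of the number of distinct books."""
    return length * math.log10(alphabet_size)


def shelf_probabilities(alphabet, length):
    """Probability of each book when one is picked uniformly from the shelf."""
    books = ["".join(chars) for chars in product(alphabet, repeat=length)]
    return {book: Fraction(1, len(books)) for book in books}


def typing_probabilities(alphabet, length):
    """Probability of each book when every character is chosen uniformly."""
    per_char = Fraction(1, len(alphabet))
    return {
        "".join(chars): per_char ** length
        for chars in product(alphabet, repeat=length)
    }


if __name__ == "__main__":
    print(f"The library holds about 10^{log10_book_count():,.0f} books")
    # Toy library: 2 symbols, 3 characters per book, 8 books in total.
    toy_shelf = shelf_probabilities("ab", 3)
    toy_typed = typing_probabilities("ab", 3)
    print(toy_shelf == toy_typed)
```

Under these assumptions the number of books has roughly 1.8 million digits, and any particular 100-character sentence opens a randomly chosen book with probability 25^-100, which is about 10^-140.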

The Library of Babel leads to a tiny cognitive dissonance because it forces us to think about the lack of information in terms of an overabundance of choice, triggering a cognitive bias we think about so rarely that it doesn’t even have a name as far as I’m aware: our tendency to assume that order, rather than entropy, is the natural state of things. However small the distinction might seem, a blank page contains no information not because it lacks letters, but because it could potentially contain any sequence of letters with equal probability.
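For readers who like to see this as a number, here is a minimal sketch, assuming Shannon entropy is the right reading of “entropy” here. The 25-symbol alphabet is borrowed from Borges’ story, and `entropy_bits` is a hypothetical helper written for this example, not a library function.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# A character that could be any of 25 symbols with equal probability.
unknown_char = [1 / 25] * 25
# A character whose value is already known for certain.
known_char = [1.0]

if __name__ == "__main__":
    print(entropy_bits(unknown_char))
    print(entropy_bits(known_char))
```

The uniform case comes out at log2(25), about 4.64 bits of uncertainty per character, which is the maximum possible for 25 symbols. The certain case comes out at zero. A page that could say anything leaves you exactly as uncertain as you were before you looked at it.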

Imagine yourself in the Library of Babel, but now you have a map in your possession. How exactly such a map could work is hard to comprehend, so let’s imagine it as a smartphone app and hide its complexity in a black box of technology. The Library of Babel index app lets you describe whatever book you are searching for, and it tells you which book to pick. Of course, unlike the library itself, the app was created by mere humans, so it works by generalising from a small subset of books that have been discovered and categorised before. It often finds the right books, especially if the answer is in one of those previously known tomes. Since the app is guided by heuristic search, the further the question lies from the realm of pre-catalogued knowledge, the more likely the answer is to be wrong or useless.

Modern Large Language Models, much like the library map, guide us through deserts of entropy to an oasis of human knowledge. If the empty page represents perfect entropy and a lack of information, an LLM represents information condensed to its probabilistic essence, a way to compress and archive knowledge itself. Yet LLMs echo the library’s map in a crucial way: the path to truly novel ideas, to undiscovered knowledge, remains hidden. The enigmatic process of invention, of forging new knowledge by melding existing ideas, is an endeavor distinct from mere search and retrieval. Somewhere in the library there is a book describing a cure for cancer, but unless it’s on the periphery of human knowledge as it stands, it’s effectively impossible for the map to reach it.

LLMs stand as a testament to one of the century’s most significant technological advancements, revolutionizing our interaction with the known. However, in their current form, they lack the capacity for genuine invention, the creative spark that remains uniquely human. Thus, while they represent a significant leap forward, they are not harbingers of a technological singularity. Not yet, anyway.