When will we see a real Universal Library?

All of humanity’s knowledge at your fingertips. This promise sounds familiar – this is how a lot of people describe the internet. Indeed, we have unprecedented access to tons of information, and we need search engines to help us navigate this ocean of data. However, even if Google, Bing, DuckDuckGo and others are getting better at directly answering our questions, we can’t get access to all human knowledge on the internet yet. Wikipedia is “just” a summary of our knowledge; to access all of it, we need to be able to read every single book, article, journal, etc. ever published. Access to all of humanity’s knowledge means access to humanity’s entire scientific (and non-scientific) literature. We have access to movies and songs, legally or not, through the likes of Netflix, Spotify, or The Pirate Bay, but access to books is much more restricted, even if some websites such as the Russian Library Genesis or the late Library.nu are prototypes of what a modern Universal Library could look like. Library Genesis, the only one of these sites still up, isn’t without its flaws and is still little known. It doesn’t come close to deserving the title of Universal Library, but it’s a good first step.
What should a Universal Library be like? What’s holding us back? How can we build one?

“Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.” — Samuel Johnson
A Universal Library is a representation of science. Gathering all human knowledge in one place creates a monolithic artefact I call the Universal Library. It contains all of what Popper called the third world or world three: all of humankind’s literature.
As Popper said, “instead of growing better memories and brains, we grow paper, pens, pencils, typewriters, dictaphones, the printing press, and libraries.” Yet today brain-enhancing tools like libraries are scattered around the globe, and are (academic libraries especially) inaccessible to most of us. The Universal Library is the ultimate tool we can create in order to store and retrieve all of our knowledge easily.
This is perhaps the only artefact we should send to an extraterrestrial civilization, the sum of all human knowledge. Such an artefact doesn’t exist today.
It is also our ultimate legacy. We have to make sure all our knowledge can survive our own existence. Like the monoliths in 2001: A Space Odyssey, which could have been created by a long extinct civilization, our Universal Library will have to still exist and be readable even if we disappear, just in case other intelligent beings stumble upon it.
L’Encyclopédie, perhaps the single most important book of the Enlightenment, was written with this sense of “urgency”. It was designed as The Book, the only book to preserve in the case of a global cataclysm if we were to preserve (most of, since the Encyclopédie was obviously a summary) our knowledge.
In a sense, books replaced cathedrals as mankind’s most enduring creation. Victor Hugo even wrote that “architecture will never again be the social, the collective, the dominant art. The great epic, the great monument, the great master-piece of mankind will never again be built; it will be printed.” Many compare scientific research to cathedral-building, likening the scientists, each publishing papers on a particular topic, to the masons adding stones to a particular part of a cathedral. (For example, Popper wrote: “All work in science is work directed towards the growth of objective knowledge. We are workers who are adding to the growth of objective knowledge as masons work on a cathedral.”)
Just like cathedrals were meant to last for millennia and took decades to be built, science transcends our existence as individuals; in As We May Think, Vannevar Bush wrote that “science has provided the swiftest communication between individuals; it has provided a record of ideas and has enabled man to manipulate and to make extracts from that record so that knowledge evolves and endures throughout the life of a race rather than that of an individual.”
Access to this “record”, as Bush called it, should be a human right. Every individual on this planet, no matter how much he earns or where he lives, should be able to read everything anyone ever published.
Yet fragmented access and storage of mankind’s literature make using and preserving our legacy difficult. Hence the idea of a Universal Library.

Having everything that’s ever been written (and published) all stored in one place isn’t a new idea, but it was always extremely costly and difficult to set up in the past due to technological limitations, which is why all the attempts at setting up universal libraries have failed.
From clay tablets to the Web, each new technology in publishing allowed more knowledge to be stored and disseminated more cheaply and easily. The amount of information we could store grew exponentially, while the cost of accessing it fell just as fast. It is mind-boggling to imagine the difference between clay tablet encyclopedias, reserved for the Sumerian elite, and Wikipedia, accessible anywhere on Earth to anyone with an internet connection or a cellphone.
The most famous innovation/revolution in publishing is arguably the printing press, which industrialized knowledge dissemination and conservation: books were no longer scarce and expensive; the scientific community grew exponentially and thrived in all of the developed world thanks to cheap access to scientific works, and mass-education was made possible.
The cheap and easy copying of books made possible by the printing press enabled more texts to be preserved, since many different copies could be stored in different locations, and reprinted at will. With the printing press, the physical book didn’t matter anymore. Only the information it contained did (this is even more true with digital texts now). Hugo also referred to this revolution in preservation, writing “Under the form of printing, thought is more imperishable than ever; it is volatile, intangible, indestructible”.
The modern Universal Library will be the next revolution. The internet allows many copies of the Universal Library to be made by both institutions and individuals, so that preservation won’t be an issue. As Time journalist Michael Scherer put it: “The internet is Gutenberg on steroids, a printing press without ink, overhead or delivery costs”.

Yet the internet isn’t seen this way by publishers. They still behave as if books were a “scarce” commodity, while the internet allows unlimited distribution of books for free. If the publishers really embraced the internet, they would publish their books and journals for free, instead of charging exorbitant amounts of money for PDFs.
Such resistance to change isn’t new. In its infancy – in the 1500s and even in the 1600s – the printing press wasn’t recognized as the best tool to distribute knowledge more broadly and easily. In his Novum Organum (110), Bacon talked about those who didn’t acknowledge the superiority of the printing press over hand-copied books: “For however the discovery of gunpowder, silk, the compass, sugar, paper, or the like, may appear to depend on peculiar properties of things and nature, printing at least involves no contrivance which is not clear and almost obvious. But from want of observing that although the arrangement of the types of letters required more trouble than writing with the hand, yet these types once arranged serve for innumerable impressions, while manuscript only affords one copy; […] this most beautiful invention (which assists so materially the propagation of learning) remained unknown for so many ages.” So yes, the transition from copying by hand to the printing press wasn’t immediate. It took the actions of individuals (in this case, publishers) to bring this technology to the mainstream. With the internet, these individuals won’t be publishers. They will be programmers, “pirates”, hackers, amateurs, people who care about knowledge and its dissemination and want all human knowledge to be at the fingertips of everybody.

The most famous attempt at setting up a Universal Library was arguably the Library of Alexandria. Though it was probably the largest and richest library of its time, it never was a universal library; it was meant to become one, acquiring as many books as possible and making them available to scholars from different regions of the Mediterranean. Just imagine what it meant at the time to be able to browse through the stacks of the library, through thousands of manuscripts from around the (known) world, to be able to let your mind wander among the knowledge of entire civilizations! Such unprecedented access allowed the scholars of Alexandria to make remarkable discoveries that would have been impossible without the resources of the Library. Unfortunately, the Library was completely destroyed by fire.
In China, an early fifteenth-century encyclopedia called the Yongle dadian ran to more than 10,000 volumes. It was so expensive to print that very few copies were made. Less than 4 percent of it has survived.
Books are fragile. When they are rare, or unique, a single fire, storm or earthquake can wipe them off the face of the Earth. Even books for which hundreds of copies exist can disappear with time, wars, or natural disasters…
The internet offers us a new and unique opportunity to preserve all of them. An internet Universal Library could easily be copied onto hard drives placed in safes, and allowing users to keep their own copies of the books they consult would create even more backups.
Of course this method isn’t immune to everything. Hackers and malevolent governments and corporations would try to destroy the modern Universal Library. Time and decay could corrupt the hard drives the Library is stored on. However, the whole point of having an internet Universal Library is decentralization. In other words, there would be so many copies of it that it would be nearly impossible to destroy (and if the physical books are preserved as well, there would be a physical backup just in case…). With offline backups, users’ backups, and many copies stored on state-of-the-art servers in different locations, it would be very tough to destroy the modern Universal Library.
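To make the decay problem concrete, here is a minimal sketch, assuming every copy of the Library ships with a manifest of SHA-256 checksums. Comparing a mirror against that manifest shows which files have been corrupted or lost, and those files can then be re-fetched from any other copy that still verifies. The function names and the in-memory "mirrors" (plain dictionaries standing in for hard drives) are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to the checksum of its contents."""
    return {name: checksum(data) for name, data in files.items()}


def find_damaged(mirror: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """List files that are missing from the mirror or no longer match the manifest."""
    damaged = []
    for name, expected in manifest.items():
        data = mirror.get(name)
        if data is None or checksum(data) != expected:
            damaged.append(name)
    return sorted(damaged)


def repair(mirror: dict[str, bytes],
           other_mirrors: list[dict[str, bytes]],
           manifest: dict[str, str]) -> dict[str, bytes]:
    """Replace each damaged file with the first verified copy found elsewhere."""
    for name in find_damaged(mirror, manifest):
        for other in other_mirrors:
            data = other.get(name)
            if data is not None and checksum(data) == manifest[name]:
                mirror[name] = data
                break
    return mirror


# Hypothetical example: one healthy copy, one decayed copy.
healthy = {"book-1.txt": b"first book", "book-2.txt": b"second book"}
manifest = build_manifest(healthy)
decayed = {"book-1.txt": b"first b00k"}  # one file corrupted, one file missing

print(find_damaged(decayed, manifest))
repair(decayed, [healthy], manifest)
print(find_damaged(decayed, manifest))
```

The more independent mirrors there are, the more likely it is that at least one of them still holds a verified copy of any given file, which is the practical meaning of decentralization here.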

Another weakness of physical libraries that would be addressed by the Universal Library is that of access. In the past, only noblemen and the bourgeois had access to knowledge, because they could afford to learn to read and were admitted to universities, which had libraries. Though some libraries admitted more people than others, like the Ambrosiana in Milan in the early seventeenth century, which seemed to be open to basically anyone who asked, an individual’s location and wealth were still key to his access to libraries. If you were lucky enough to live in Milan you could visit the Ambrosiana, but if you were in rural Italy and not rich enough to afford the trip, you were basically out of luck.
This is still the main impediment to access to libraries. Few Americans have easy access to the Library of Congress. Most of the world’s population lives in places where libraries are small, poor, or more often nonexistent. Even if in theory most libraries across the globe don’t restrict access based on wealth or social status, they are still mostly out of reach and can’t offer many resources.
On the other hand, an internet Universal Library, while by definition offering as many resources as possible (everything that’s ever been published, no less), is accessible to anyone with an internet connection through a cybercafé, a personal computer or a cellphone. Of course billions of people lack access to the internet, but I’m hopeful the internet will reach them soon enough – at least before libraries reach them, for sure…

When you are lucky enough to have access to a library, it does a pretty terrible job of helping you navigate the maze of information we now face. Libraries of course only let you search their own catalog, since searching another library’s catalog wouldn’t be very useful if you can’t access the physical books… but what about all the other books out there, those which aren’t in the library’s catalog?
Even when there wasn’t as much literature as there is nowadays, people needed tools to guide them through the forest of publications. That was one of the purposes of encyclopedias. Now in the twenty-first century we have search engines run by powerful algorithms, but they’re not quite good enough yet.
Google is a great tool, but it doesn’t have access to everything: scholarly publications especially are locked inside publishers’ databases, behind paywalls. If you really want to get a good look at most of the literature, you have to switch between multiple tools: Google and the databases of Elsevier, Wiley, Springer, etc. It’s a very time-consuming process that the Universal Library should make fast and simple.

Before we look at what the modern Universal Library should look like, let me take a moment to address the issue of copyright and of the publishers retaining it. If the Universal Library is to give free access to all human literature, this is likely going to be quite an issue.
I am going to focus on the scientific literature here, though most of what makes the publishing environment outrageous (abusive corporations, lengthy copyright) applies to non-scientific publishing as well.
So, publishers. The “big three” (Wiley, Springer, Elsevier) and a few others retain a monopoly on scientific publications, and behave like a cartel, making deals to not compete with one another (just look at their prices, which are kept very high and are the same for all the different publishers). As they refuse to compete, they are very unlikely to change their business model. I’m surprised they haven’t been under investigation for antitrust… As they have the copyrights of most of the scientific publications in circulation, they can charge sky-high prices for simple PDFs, and they are quick to call “pirate” anyone who tries to make these papers more available. “Pirate” is a word linked to brutality, theft and blood. But in publishing, pirates are hardly what you would normally call criminals. There were pirates from the beginning of the publishing business. What were they doing, and what are they still doing? Refusing to respect the copyright monopoly in order to make publications more affordable. That is much less brutal than boarding a ship and massacring all its crew, right? Furthermore, pirates served an essential role by publishing censored works (like those of Newton!), which “respectable” publishing houses were too scared to print. In publishing, a pirate is a person or corporation which makes publications more affordable or just available in the first place. The scholar Adrian Johns, in his amazing book Piracy, takes his reader to the historical roots of “piracy”, explaining the pirate reprint industry like this: “A Dickens novel might appear first in a good-quality reprint by Carey or some other respectable firm; then a cheap piracy of that reprint; then in chapbooks, then in serialized forms; then in provincial newspapers; then in 25 cents “railroad” editions; and finally as chapters printed on railway timetables. As this happened, distinctions between propriety and transgression became increasingly blurred. 
Reprinters who ignored the courtesies issued popular works in enormous quantities and at very low prices. A five-volume Macaulay appeared in sixty thousand copies, at 15 cents per volume. Reprinters also issued science (Liebig’s Chemistry) in impressions well into the tens of thousands. And just as Careys and Harpers justified their own reproductions as moral enterprises, so these “pirates” (as Carey called them) openly defended theirs as exemplifying republican values. Here, after all, was an endeavor that distributed improving literature and authoritative ideas in unprecedented quantities and at extraordinarily low prices. It arguably did more to make America a truly lettered republic than any number of polite Philadelphia publications. It was in monarchial England, one pirate observed, that special societies had to be created to push useful knowledge out; here, entrepreneurs of knowledge responded to the pull of the masses.”
Did you know that publishers even tried to outlaw libraries? They argued that if people could read books for free, no one would buy any new books, and thus no one would write books anymore. This was in the late eighteenth to mid nineteenth century. Publishers lost their fight against libraries, and as we know, no one has written a book since 1850, right?
Today they fight against file-sharers who make their publications available for free online, like library.nu. Their argument is still completely ridiculous, just as it was in 1850, but with the corrupt governments we have now, they are winning the legal battle against file-sharing (though hopefully technology won’t let them win the war). Tom Reller from Elsevier, which relies on libraries subscribing to its publications for most of its revenues, said: “We can’t allow published journal articles to be freely accessible on a large scale – especially not through other for-profit companies, who want to benefit from our and other publishers’ efforts. What library will continue to subscribe if a growing proportion of articles is available for free elsewhere?”. So basically, people shouldn’t have access to knowledge, even though it’s technologically absolutely possible, because it could hurt the big publishing multinationals.
I can imagine Shell, Exxon or BP saying the same thing about, say, electric cars: “If people buy electric cars, who will buy our oil?” Technological progress is not here to please corporations but to advance the human race. If we can have clean, electric transportation, that’s progress, and if it kills the oil giants, so be it. If we can have free access to knowledge on the internet, that’s progress, and if it kills the publishing giants, so be it. As Peter Murray-Rust said, “Publishers should be the servants of knowledge – at present they are becoming the tyrants.”
So when we face a big, inefficient (I ordered a book on neurology from Springer in February; it’s out of print and there isn’t an ebook version, and I have yet to receive it. Luckily I am not a doctor needing some vital information for a patient…) and monopolistic system, our best course of action is to rebel. To follow Aaron Swartz’s Guerilla Open Access Manifesto, to liberate all this knowledge. To make all scientific literature available online for free. That’s basically the definition of the modern Universal Library.

Making all these publications available for free would violate copyright law, a law that was invented to create a monopoly for publishers. This law has no moral basis, just like a law prohibiting people from buying electric cars to preserve the oil industry.
Copyright law is supposed to be about property. However, property is based on scarcity, and there is no such thing as scarcity with electronic files. If you take my car, I don’t have a car anymore. If you make a copy of my ebook, I still have my ebook. Stephan Kinsella wrote a great book on this topic, Against Intellectual Property. In it he gives a very libertarian view of this issue, which is the only reasonable one: “Only tangible, scarce resources are the possible object of interpersonal conflict, so it is only for them that property rules are applicable. Thus, patents and copyrights are unjustifiable monopolies granted by government legislation.”
As there cannot be conflict over things of infinite abundance, property rights cannot apply to them. Thus copyright is untenable.

Now that the issue of copyright is settled, that we have looked a bit into the historical roots of the idea of a Universal Library and at the book technologies throughout history, let’s see what a modern Universal Library should look like.
Two quotes to begin with:
“There are worse crimes than burning books. One of them is not reading them.” — Joseph Brodsky
“Free Libraries for every soul.” — Melvil Dewey

-Access to every book, article; everything that’s ever been published: The modern Universal Library needs to be the repository of all humankind’s knowledge, thus of everything man has ever published. This will allow scattered publications (for example, in scientific publishing, articles on any given topic appear in a very large number of publications, making it difficult to track every one of them) to be gathered in one place.
Search will be made orders of magnitude simpler, since you’ll only have to search one database to embrace the whole of human knowledge.

-Open APIs: users ought to be able to build on top of the Universal Library’s code if they want to.

-All the publications in the Universal Library must be machine readable and indexable: users will need powerful algorithms to search through the immense collection of the Universal Library. Perhaps some users or companies will want to improve the search tool and build their own, or even create an AI that could look through the sum of human knowledge to find information. They should be able to do so thanks to open APIs and the entire content of every publication being machine readable. To quote Vannevar Bush on the need for good search tools: “The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record.
The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.”
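To illustrate what “machine readable and indexable” makes possible, here is a minimal sketch of an inverted index, the basic data structure behind full-text search. It is only an assumption about how a search tool built on the Library might start out: the tokenizer, the function names and the three one-line sample texts are invented for the example, and a real index over the whole literature would also need ranking, stemming, multilingual support and far more.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Invented sample texts, standing in for the full text of real publications.
documents = {
    "doc-a": "Experiments on heredity in pea plants",
    "doc-b": "A record of ideas that endures beyond the individual",
    "doc-c": "Heredity and the new field of genetics",
}
index = build_index(documents)

print(search(index, "heredity"))         # documents mentioning heredity
print(search(index, "heredity plants"))  # documents mentioning both words
```

Because the index is built from nothing but the text itself, anyone with access to the machine-readable publications could build a different or better one, which is the point of keeping the content and the APIs open.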

-No limitation on data mining, of course. I want ContentMine to be able to mine all the literature! Note that publishers are actively trying to restrict data mining. They can try, but the Universal Library won’t let them.

-Users should be able to download any publication they want.

-The Universal Library should be protected from Law Enforcement and rogue governments, corporations and individuals. Perhaps the Universal Library should be hosted on the Deep Web, and thus only be accessible via Tor? (If it is compatible with the openness of the project: can the Universal Library still be machine readable if it is hosted on the Deep Web?) Servers should also be backed up in different places, as described earlier, so that copies exist even if the website is taken down.
The World Brain described by H. G. Wells in the eponymous collection of essays was supposed to be backed up too: “In these days of destruction, violence and general insecurity, it is comforting to think that the brain of mankind, the race brain, can exist in numerous replicas throughout the world.”
Finally, the Universal Library should probably be open source so that people can improve the code.

-Users should be protected, too. Encryption must be used so that it’s difficult to trace what users do and/or download on the Universal Library. Using the Deep Web would be a plus here too, since it would be very hard for Law Enforcement to trace users’ IP addresses.
No matter what technologies are used, the Universal Library must be super secure for its users, so that knowledge is accessible even in countries where the internet is censored.
To quote Timothy C. May in the Crypto-Anarchist Manifesto: “And just as a seemingly minor invention like barbed wire made possible the fencing off of vast ranches and farms, thus altering forever the concepts of land property rights in the frontier West, so too will the seemingly minor discovery out of an arcane branch of mathematics come to be the wire clippers which dismantle the barbed wire around intellectual property.”

These are the basic specs of a modern Universal Library.

What would its impact be? For society at large, according to technologists like Kevin Kelly, it would be quite huge: “If you have access to anything that’s been written, not just theoretical access, but like, instant access, next to your brain, that changes your idea of who you are” Kelly said in the documentary Google and the World Brain, adding “It’s all human knowledge… woven into a single entity that’s accessible by anybody, anywhere in the world, anytime. And that “all knowledge” is transformative, it just really kicks up the civilization and our society into another level.”
For science, it would mean a tremendous acceleration of research, especially for scientists in the developing world and citizen scientists, who usually have very little access to scientific literature. When knowledge is more accessible, new discoveries are made faster. It’s a correlation the nineteenth-century scientist Hermann von Helmholtz noticed, remarking on the link between intellectual progress and the availability of “appliances” (catalogues, lexicons, etc.) that made knowledge “immediately accessible”. Though catalogues were quite primitive compared with the ultimate “appliance”, the Universal Library, the improved accessibility made possible by such tools created noticeable progress. So we shouldn’t underestimate the impact a Universal Library could have on scientific research.
Furthermore, little-known publications would be much more visible, since they would be in the same database as the big, famous ones. This is crucial so that scientists don’t waste their time doing things that have already been done, and can communicate their discoveries easily. An old example of a discovery being delayed because of a lack of visibility, but a telling one: Gregor Mendel’s work on heredity took more than thirty years to be noticed by other researchers, who built upon it to ultimately create the whole new field of genetics, just because it was published in the proceedings of a local natural history society, and thus wasn’t visible to a lot of scientists…

When will we see a Universal Library? Who will create it?
We can’t trust publishers, which have their shareholders’ interests in mind, not their customers’.
We can’t trust governments, which are corrupt. Banks, the military-industrial complex, and even publishers have bought them. In Western countries, anti-science is rampant inside governments, which are very unlikely to start a project like the Universal Library that would benefit all humankind. Even if they did, it would be plagued by bureaucracy (healthcare.gov anyone?). Governments would try to control who can access knowledge, just like they’ve always done. If this is subtle in the West today, keep in mind that black people in the US at the beginning of the twentieth century couldn’t access public libraries! In the developing world, sectarianism is ubiquitous and corruption is much worse than in the West. There isn’t one government on Earth capable of building the modern Universal Library.
Independent individuals, so-called “pirates”, will create the Universal Library.

The technology is here. We now need bold and talented people to step in and create the Universal Library. People who aren’t afraid of corrupt laws and bureaucrats, who are willing to take risks. I know there are some out there.


  1. Love the way copyright is dismissed in a few sentences, and is then assumed to have been got rid of. Not so fast, and not so easy. Firstly, even if it has been hijacked by some large media companies, copyright does serve some purposes for creators, and secondly those media companies’ vested interests will ensure that it won’t be abolished any time soon.

