When will we see a real Universal Library?

All of humanity’s knowledge at your fingertips. The promise sounds familiar: it is how a lot of people describe the internet. We do have unprecedented access to enormous amounts of information, and we need search engines to help us navigate this ocean of data. But even as Google, Bing, DuckDuckGo and others get better at answering our questions directly, we still can’t access all human knowledge on the internet. Wikipedia is “just” a summary of our knowledge; to access all of it, we need to be able to read every single book, article, and journal ever published. Access to all of humanity’s knowledge means access to humanity’s entire scientific (and non-scientific) literature. While we have access to movies and songs, legally or not, through the likes of Netflix, Spotify, or The Pirate Bay, access to books is much more restricted, even if websites such as the Russian Library Genesis or the late Library.nu are prototypes of what a modern Universal Library could look like. Library Genesis, the only one of these sites still up, isn’t without its flaws and remains relatively obscure (it doesn’t come close to the title of Universal Library), but it’s a good first step.
What should a Universal Library be like? What’s holding us back? How can we build one?

“Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.” — Samuel Johnson
A Universal Library is a representation of science. Gathering all human knowledge in one place creates a monolithic artefact I call the Universal Library. It contains all of what Popper called the third world or world three: all of humankind’s literature.
As Popper said, “instead of growing better memories and brains, we grow paper, pens, pencils, typewriters, dictaphones, the printing press, and libraries.” Yet today, brain-enhancing tools like libraries are scattered around the globe and are (academic libraries especially) inaccessible to most of us. The Universal Library is the ultimate tool we can create to store and retrieve all of our knowledge easily.
This is perhaps the one artefact we should send to an extraterrestrial civilization: the sum of all human knowledge. Such an artefact doesn’t exist today.
It is also our ultimate legacy. We have to make sure all our knowledge can survive our own existence. Like the monoliths in 2001: A Space Odyssey, which could have been created by a long-extinct civilization, our Universal Library will have to remain intact and readable even if we disappear, just in case other intelligent beings stumble upon it.
L’Encyclopédie, perhaps the single most important book of the Enlightenment, was written with this sense of “urgency”. It was designed as The Book: the one book to save in the event of a global cataclysm if we wanted to preserve our knowledge (most of it, at least, since the Encyclopédie was obviously a summary).
In a sense, books replaced cathedrals as mankind’s most enduring creation. Victor Hugo even wrote that “architecture will never again be the social, the collective, the dominant art. The great epic, the great monument, the great master-piece of mankind will never again be built; it will be printed.” Many compare scientific research to cathedral-building, likening the scientists, each publishing papers on a particular topic, to the masons adding stones to a particular part of a cathedral. (For example, Popper wrote: “All work in science is work directed towards the growth of objective knowledge. We are workers who are adding to the growth of objective knowledge as masons work on a cathedral.”)
Just like cathedrals were meant to last for millennia and took decades to be built, science transcends our existence as individuals; in As We May Think, Vannevar Bush wrote that “science has provided the swiftest communication between individuals; it has provided a record of ideas and has enabled man to manipulate and to make extracts from that record so that knowledge evolves and endures throughout the life of a race rather than that of an individual.”
Access to this “record”, as Bush called it, should be a human right. Every individual on this planet, no matter how much they earn or where they live, should be able to read everything anyone has ever published.
Yet fragmented access and storage of mankind’s literature make using and preserving our legacy difficult. Hence the idea of a Universal Library.

Having everything that’s ever been written (and published) stored in one place isn’t a new idea, but in the past it was always extremely costly and difficult to set up due to technological limitations, which is why every attempt at a universal library has failed.
From clay tablets to the Web, each new technology in publishing allowed more knowledge to be stored and disseminated more cheaply and easily. The amount of information we could store grew exponentially while the cost of accessing it collapsed. It is mind-boggling to imagine the difference between clay-tablet encyclopedias, reserved for the Sumerian elite, and Wikipedia, accessible anywhere on Earth to anyone with an internet connection or a cellphone.
The most famous innovation, or revolution, in publishing is arguably the printing press, which industrialized knowledge dissemination and preservation: books were no longer scarce and expensive, the scientific community grew exponentially and thrived across the developed world thanks to cheap access to scientific works, and mass education became possible.
The cheap and easy copying of books made possible by the printing press enabled more texts to be preserved, since many different copies could be stored in different locations and reprinted at will. With the printing press, the physical book didn’t matter anymore; only the information it contained did (this is even more true of digital texts now). Hugo also referred to this revolution in preservation, writing: “Under the form of printing, thought is more imperishable than ever; it is volatile, intangible, indestructible.”
The modern Universal Library will be the next revolution. The internet allows many copies of the Universal Library to be made by both institutions and individuals, so that preservation won’t be an issue. As Time journalist Michael Scherer put it: “The internet is Gutenberg on steroids, a printing press without ink, overhead or delivery costs”.

Yet publishers don’t see the internet this way. They still behave as if books were a “scarce” commodity, while the internet allows unlimited distribution of books for free. If publishers really embraced the internet, they would publish their books and journals for free, instead of charging exorbitant amounts of money for PDFs.
Such resistance to change isn’t new. In its infancy, in the 1500s and even the 1600s, the printing press wasn’t recognized as the best tool to distribute knowledge more broadly and easily. In his Novum Organum (110), Bacon described those who didn’t acknowledge the superiority of the printing press over hand-copied books: “For however the discovery of gunpowder, silk, the compass, sugar, paper, or the like, may appear to depend on peculiar properties of things and nature, printing at least involves no contrivance which is not clear and almost obvious. But from want of observing that although the arrangement of the types of letters required more trouble than writing with the hand, yet these types once arranged serve for innumerable impressions, while manuscript only affords one copy; […] this most beautiful invention (which assists so materially the propagation of learning) remained unknown for so many ages.” So yes, the transition from hand-copying to the printing press wasn’t immediate. It took the actions of individuals (in this case, publishers) to bring the technology to the mainstream. With the internet, these individuals won’t be publishers. They will be programmers, “pirates”, hackers, amateurs: people who care about knowledge and its dissemination and want all human knowledge to be at everybody’s fingertips.

The most famous attempt at a Universal Library was arguably the Library of Alexandria. Though it was probably the largest and richest library of its time, it never was a universal library; it was meant to become one, acquiring as many books as possible and making them available to scholars from different regions of the Mediterranean. Just imagine what it meant at the time to be able to browse the stacks of the library, through thousands of manuscripts from around the (known) world, to let your mind wander among the knowledge of entire civilizations! Such unprecedented access allowed the scholars of Alexandria to make remarkable discoveries that would have been impossible without the resources of the Library. Unfortunately, the Library was completely destroyed by fire.
In China, an early fifteenth-century encyclopedia called the Yongle dadian ran to more than 10,000 volumes. It was so expensive to print that very few copies were made. Less than 4 percent of it has survived.
Books are fragile. When they are rare, or unique, a single fire, storm, or earthquake can wipe them off the face of the Earth. Even books with hundreds of surviving copies can disappear with time, wars, or natural disasters.
The internet offers us a new and unique opportunity to preserve all of them. An internet Universal Library could easily be copied onto hard drives placed in safes, and letting users keep their own copies of the books they consult would create even more backups.
Of course, this method isn’t immune to everything. Hackers, malevolent governments, and corporations might try to destroy the modern Universal Library. Time and decay could corrupt the hard drives the Library is stored on. However, the whole point of an internet Universal Library is decentralization. In other words, there would be so many copies of it that it would be nearly impossible to destroy (and along with the preservation of the physical books, there would be a physical backup just in case). With offline backups, users’ backups, and many copies stored on state-of-the-art servers in different locations, it would be very tough to destroy the modern Universal Library.
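To make the replication idea concrete, here is a minimal sketch, assuming nothing about the Library’s real architecture, of how independent mirrors could verify their copies against a published checksum manifest (the manifest format and file names are hypothetical):

```python
# Minimal sketch: each mirror recomputes SHA-256 checksums of its local files
# and compares them against a manifest published by the Library.
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_mirror(mirror_dir: Path, manifest: dict) -> list:
    """Return the files whose local copy is missing or differs from the manifest."""
    bad = []
    for name, expected in manifest.items():
        local = mirror_dir / name
        if not local.exists() or checksum(local) != expected:
            bad.append(name)
    return bad

# Hypothetical usage: a manifest mapping file names to known-good digests.
manifest = {"mendel_1866.pdf": "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b"}
print(verify_mirror(Path("/mnt/library_mirror"), manifest))
```

Any copy that fails the check can simply be re-fetched from another mirror; no central authority is needed.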

Another weakness of physical libraries that the Universal Library would address is access. In the past, only noblemen and the bourgeois had access to knowledge, because they could afford to learn to read and could be admitted to universities, which had libraries. Though some libraries admitted more people than others, like the Ambrosiana in Milan in the early seventeenth century, which seems to have been open to basically anyone who asked, an individual’s location and wealth were still key to their access to libraries. If you had the chance to live in Milan, you could visit the Ambrosiana; but if you lived in rural Italy and weren’t rich enough to afford the trip, you were basically out of luck.
This is still the main impediment to library access. Few Americans have easy access to the Library of Congress. Most of the world’s population lives in places where libraries are small, poor, or, more often, nonexistent. Even if, in theory, most libraries across the globe don’t restrict access based on wealth or social status, they are still mostly inaccessible and can’t offer many resources.
An internet Universal Library, on the other hand, while by definition offering as many resources as possible (everything that’s ever been published, no less), is accessible to anyone with an internet connection, through a cybercafé, a personal computer, or a cellphone. Of course, billions of people still lack internet access, but I’m hopeful the internet will reach them soon enough; certainly before libraries do.

Even when you are lucky enough to have access to a library, it does a pretty terrible job of helping you navigate today’s maze of information. Libraries of course only let you search their own catalog, since searching another library’s catalog isn’t very useful if you can’t access the physical books. But what about all the other books out there, the ones that aren’t in the catalog?
Even when there was far less literature than today, people needed tools to guide them through the forest of publications. That was one of the purposes of encyclopedias. Now, in the twenty-first century, we have search engines run by powerful algorithms, but they’re not quite good enough yet.
Google is a great tool, but it doesn’t have access to everything; scholarly publications especially are locked inside publishers’ databases, behind paywalls. If you want a real look at most of the literature, you have to switch between multiple tools: Google, plus the databases of Elsevier, Wiley, Springer, and so on. It’s a very time-consuming process that the Universal Library should make fast and simple.

Before we look at what the modern Universal Library should look like, let me take a moment to address the issue of copyright and of the publishers who hold it. If the Universal Library is to give free access to all human literature, this is likely going to be quite an issue.
I am going to focus on the scientific literature here, though most of this outrageous publishing environment (abusive corporations, lengthy copyright) applies to non-scientific publishing as well.
So, publishers. The “big three” (Wiley, Springer, Elsevier) and a few others hold a monopoly on scientific publications and behave like a cartel, making deals not to compete with one another (just look at their prices, which are kept very high and are the same across publishers). As they refuse to compete, they are very unlikely to change their business model. I’m surprised they haven’t come under antitrust investigation. As they hold the copyrights to most of the scientific publications in circulation, they can charge sky-high prices for simple PDFs, and they are quick to call “pirate” anyone who tries to make these papers more available. “Pirate” is a word linked to brutality, theft, and blood. But in publishing, pirates are hardly what you would normally call criminals. There have been pirates since the beginning of the publishing business. What did they do, and what do they still do? Refuse to respect the copyright monopoly in order to make publications more affordable. That is much less brutal than boarding a ship and massacring its crew, right? Furthermore, pirates served an essential role by publishing censored works (like those of Newton!) which “respectable” publishing houses were too scared to print. In publishing, a pirate is a person or corporation that makes publications more affordable, or simply available in the first place. The scholar Adrian Johns, in his amazing book Piracy, takes his reader to the historical roots of “piracy”, describing the pirate reprint industry like this: “A Dickens novel might appear first in a good-quality reprint by Carey or some other respectable firm; then a cheap piracy of that reprint; then in chapbooks, then in serialized forms; then in provincial newspapers; then in 25 cents “railroad” editions; and finally as chapters printed on railway timetables. As this happened, distinctions between propriety and transgression became increasingly blurred. Reprinters who ignored the courtesies issued popular works in enormous quantities and at very low prices. A five-volume Macaulay appeared in sixty thousand copies, at 15 cents per volume. Reprinters also issued science (Liebig’s Chemistry) in impressions well into the tens of thousands. And just as Careys and Harpers justified their own reproductions as moral enterprises, so these “pirates” (as Carey called them) openly defended theirs as exemplifying republican values. Here, after all, was an endeavor that distributed improving literature and authoritative ideas in unprecedented quantities and at extraordinarily low prices. It arguably did more to make America a truly lettered republic than any number of polite Philadelphia publications. It was in monarchial England, one pirate observed, that special societies had to be created to push useful knowledge out; here, entrepreneurs of knowledge responded to the pull of the masses.”
Did you know that publishers even tried to outlaw libraries? They argued that if people could read books for free, no one would buy new books, and thus no one would write books anymore. This was in the late eighteenth to mid-nineteenth century. Publishers lost their fight against libraries, and as we know, no one has written a book since 1850, right?
Today they fight the file-sharers who make their publications available for free online, like Library.nu. Their argument is still as ridiculous as it was in 1850, but with the corrupt governments we have now, they are winning the legal battle against file-sharing (though hopefully technology won’t let them win the war). Tom Reller of Elsevier, which relies on library subscriptions for most of its revenue, said: “We can’t allow published journal articles to be freely accessible on a large scale – especially not through other for-profit companies, who want to benefit from our and other publishers’ efforts. What library will continue to subscribe if a growing proportion of articles is available for free elsewhere?” So basically, people shouldn’t have access to knowledge, even though it’s technologically absolutely possible, because it could hurt the big publishing multinationals.
I can imagine Shell, Exxon, or BP saying the same thing about, say, electric cars: “If people buy electric cars, who will buy our oil?” Technological progress is not here to please corporations but to advance the human race. If we can have clean, electric transportation, that’s progress, and if it kills the oil giants, so be it. If we can have free access to knowledge on the internet, that’s progress, and if it kills the publishing giants, so be it. As Peter Murray-Rust said, “Publishers should be the servants of knowledge – at present they are becoming the tyrants.”
So when we face a big, inefficient, monopolistic system (I ordered a neurology book from Springer in February; it’s out of print, there is no ebook version, and I have yet to receive it. Lucky for me I’m not a doctor needing vital information for a patient…), our best course of action is to rebel. To follow Aaron Swartz’s Guerilla Open Access Manifesto and liberate all this knowledge. To make all scientific literature available online for free. That’s basically the definition of the modern Universal Library.

Making all these publications available for free would violate copyright law, a law that was invented to create a monopoly for publishers. This law has no moral basis, just like a law prohibiting people from buying electric cars to preserve the oil industry.
Copyright law is supposed to be about property. But property is based on scarcity, and there is no such thing as scarcity with electronic files. If you take my car, I don’t have a car anymore. If you make a copy of my ebook, I still have my ebook. Stephan Kinsella wrote a great book on this topic, Against Intellectual Property. In it he gives a very libertarian view of the issue, which is the only reasonable one: “Only tangible, scarce resources are the possible object of interpersonal conflict, so it is only for them that property rules are applicable. Thus, patents and copyrights are unjustifiable monopolies granted by government legislation.”
As there cannot be conflict over things of infinite abundance, property rights cannot apply to them. Thus copyright is untenable.

Now that the issue of copyright is settled, and we have looked a bit at the historical roots of the idea of a Universal Library and at book technologies throughout history, let’s see what a modern Universal Library should look like.
Two quotes to begin with:
“There are worse crimes than burning books. One of them is not reading them.” — Joseph Brodsky
“Free Libraries for every soul.” — Melvil Dewey

-Access to every book and article; everything that’s ever been published: the modern Universal Library needs to be the repository of all humankind’s knowledge, and thus of everything man has ever published. This will allow scattered publications (in scientific publishing, for example, articles on any given topic appear across a very large number of journals, making it difficult to track them all) to be gathered in one place.
Search will become orders of magnitude simpler, since you’ll only have to search one database to cover the whole of human knowledge.

-Open APIs: users ought to be able to build on top of the Universal Library’s code if they want to (a sketch of what a client query might look like follows this list).

-All the publications in the Universal Library must be machine-readable and indexable: users will need powerful algorithms to search through the immense collection of the Universal Library. Perhaps some users or companies will want to improve the search tool, build their own, or even create an AI that could look through the sum of human knowledge to find information. They should be able to do so thanks to open APIs and to the entire content of every publication being machine-readable. To quote Vannevar Bush on the need for good search tools: “The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record.
The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.”

-No limitation on data mining, of course. I want ContentMine to be able to mine all the literature! Note that publishers are actively trying to restrict data mining. They can try, but the Universal Library won’t let them.

-Users should be able to download any publication they want.

-The Universal Library should be protected from law enforcement and from rogue governments, corporations, and individuals. Perhaps the Universal Library should be hosted on the Deep Web, and thus only be accessible via Tor? (If that is compatible with the openness of the project: can the Universal Library still be machine-readable if it is hosted on the Deep Web?) Servers should also be backed up in different places, as described earlier, so that copies exist even if the website is taken down.
The World Brain described by H. G. Wells in the book of the same name was supposed to be backed up too: “In these days of destruction, violence and general insecurity, it is comforting to think that the brain of mankind, the race brain, can exist in numerous replicas throughout the world.”
Finally, the Universal Library should probably be open source, so that people can improve the code.

-Users should be protected, too. Encryption must be used so that it’s difficult to trace what users do and download on the Universal Library. Using the Deep Web would be a plus here too, since it would make it very hard for law enforcement to trace users’ IP addresses.
No matter what technologies are used, the Universal Library must be super secure for its users, so that knowledge is accessible even in countries where the internet is censored.
To quote Timothy C. May in the Crypto-Anarchist Manifesto: “And just as a seemingly minor invention like barbed wire made possible the fencing off of vast ranches and farms, thus altering forever the concepts of land property rights in the frontier West, so too will the seemingly minor discovery out of an arcane branch of mathematics come to be the wire clippers which dismantle the barbed wire around intellectual property.”
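As a companion to the Open APIs item above, here is a sketch of what a client query against the Library might look like. The endpoint, parameters, and response fields are all invented for illustration; no such service exists yet.

```python
# Hypothetical client query against an imagined Universal Library API.
import requests

response = requests.get(
    "https://universal-library.example/api/v1/search",  # invented endpoint
    params={"q": "Mendel heredity", "full_text": "true"},
    timeout=30,
)
response.raise_for_status()

# Imagined response schema: a list of records with a title and a download URL.
for record in response.json().get("results", []):
    print(record["title"], "->", record["download_url"])
```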

These are the basic specs of a modern Universal Library.

What would its impact be? For society at large, according to technologists like Kevin Kelly, it would be huge: “If you have access to anything that’s been written, not just theoretical access, but like, instant access, next to your brain, that changes your idea of who you are,” Kelly said in the documentary Google and the World Brain, adding: “It’s all human knowledge… woven into a single entity that’s accessible by anybody, anywhere in the world, anytime. And that “all knowledge” is transformative, it just really kicks up the civilization and our society into another level.”
For science, it would mean a tremendous acceleration of research, especially for scientists in the developing world and citizen scientists, who usually have very little access to the scientific literature. Making knowledge more accessible makes new discoveries come faster. It’s a correlation the nineteenth-century scientist Hermann von Helmholtz noticed, remarking on the link between intellectual progress and the availability of “appliances” (catalogues, lexicons, and so on) that made knowledge “immediately accessible”. Catalogues were quite primitive appliances (the ultimate one being the Universal Library), yet the improved accessibility such tools made possible produced noticeable progress. So we shouldn’t underestimate the impact a Universal Library could have on scientific research.
Furthermore, little-known publications would be much more visible, since they would sit in the same database as the big, famous ones. This is crucial so that scientists don’t waste their time redoing things that have already been done, and can communicate their discoveries easily. An old but telling example of a discovery delayed by lack of visibility: Gregor Mendel’s work on heredity took more than thirty years to be noticed by the researchers who built upon it to create the whole new field of genetics, simply because it was published in the proceedings of a local natural history society and thus wasn’t visible to many scientists.

When will we see a Universal Library? Who will create it?
We can’t trust publishers, which have their shareholders’ interests in mind, not their customers’.
We can’t trust governments, which are corrupt. Banks, the military-industrial complex, and even publishers have bought them. In Western countries, anti-science is rampant inside governments, which are very unlikely to start a project like the Universal Library that would benefit all humankind. Even if they did, it would be plagued by bureaucracy (healthcare.gov, anyone?). Governments would try to control who can access knowledge, just as they’ve always done. If this is subtle in the West today, keep in mind that black people in the US at the beginning of the twentieth century couldn’t access public libraries! In the developing world, sectarianism is ubiquitous and corruption is much worse than in the West. There isn’t one government on Earth capable of building the modern Universal Library.
Independent individuals, so-called “pirates”, will create the Universal Library.

The technology is here. We now need bold and talented people to step in and create the Universal Library. People who aren’t afraid of corrupt laws and bureaucrats, who are willing to take risks. I know there are some out there.

The Virtual Scientist

This idea was inspired by writers such as H. G. Wells (with the World Brain) and J. L. Borges (with The Library of Babel, for example), who envisioned a future where every piece of published information is readily available at the tip of a finger. Such access really changes what we are as a species; suddenly many applications open up for us, one of them being the creation of a Virtual Scientist.
Now, what exactly is a Virtual Scientist? A Virtual Scientist would be a digital entity, an AI, that would behave like a researcher and help scientists do their job faster. It would be a personal assistant that could scan all the scientific literature in an instant to find answers to your questions.
This essay will focus on three things: defining what a Virtual Scientist would be, how it would “behave” and what it would be able to accomplish; looking at current approaches toward building a Virtual Scientist; and finally, explaining how I’d create a Virtual Scientist myself.

So, how would it work? A Virtual Scientist would be capable of creating hypotheses from a set of data and of answering any question in natural language about anything (anything fact-checkable, at least; don’t ask it the meaning of life). Thanks to its access to a Universal Library, it would be able to understand the written information contained there. And, benefiting from its silicon power, it would mine facts to construct an answer at lightning speed. You could expect it to answer questions such as “what genes are likely responsible for breast cancer?” by looking at the medical papers about breast cancer and genetics. You could expect it to help you build a bibliography, or to interpret sentences in written works to help you find the best references on the very precise subject you’re researching. You would ask a question, and get your answer. It would be an all-knowing AI, a kind of Digital Aristotle that could help people learn and do research.
The Virtual Scientist would allow research to be done faster. It would help science. With access to the Universal Library, you wouldn’t miss any paper relevant to your research.
You could also program the Virtual Scientist to create new hypotheses when you give it a gigantic data set. Computers are already capable of producing answers to mathematical problems that humans would be incapable of coming up with. The Virtual Scientist could use its existing knowledge to create new hypotheses and help you process big data.
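The reasoning and hypothesis-generation parts are open research problems, but the first step, finding the papers relevant to a question, can be sketched today. Here is a toy version using TF-IDF similarity; the three “papers” are invented stand-ins for the Universal Library, and a real system would need far more than keyword matching:

```python
# Toy retrieval step for a Virtual Scientist: rank a (tiny) corpus of papers
# against a natural-language question by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = [
    "BRCA1 and BRCA2 mutations and hereditary breast cancer risk",
    "Soil moisture dynamics in arid climates",
    "Gene expression profiling of breast tumours",
]
question = "What genes are likely responsible for breast cancer?"

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(papers)   # one row per paper
query_vector = vectorizer.transform([question])
scores = cosine_similarity(query_vector, doc_matrix).ravel()

# Print papers from most to least relevant to the question.
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.2f}  {papers[idx]}")
```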

Daniel Dewey summed this up quite nicely: “Think about how long it took humans to arrive at the idea of natural selection. The ancient Greeks had everything they needed to figure it out. They had heritability, limited resources, reproduction and death. But it took thousands of years for someone to put it together. If you had a machine that was designed specifically to make inferences about the world, instead of a machine like the human brain, you could make discoveries like that much faster.”

Today, two types of approaches are working toward this super-AI: the biological and the computerized.

The Computerized approach involves initiatives such as the Paul Allen Institute’s Project Aristo, IBM’s Watson, and Google’s “Star Trek Computer”.
Paul Allen plans to create a Digital Aristotle with his Project Aristo (aka Project Halo): “What if you could collect all the world’s information in a single computer mind, one capable of intelligent thought, and be able to communicate in simple human language?” asked an article in The Verge about the project earlier this year. For now, the project focuses on training an AI to pass a high-school biology test by giving it the course material and asking it questions about it. They estimate it will take five years to have an AI that can pass the grade in biology. That’s a pretty long time, but it would be a breakthrough: they would have coded a computer that can effectively learn from a textbook, that can “understand”, so to speak, a biology course. The future of such a program could well be a Virtual Scientist; it would “just” have to learn more courses, to “read” more books. And when Paul Allen talks about a Digital Aristotle, he means a digital polymath that could teach, research, and be all-knowing, like Aristotle was in his time. No human being can be all-knowing now, there is just too much knowledge for any one person to absorb, but in his day Aristotle probably mastered most of what was known to humankind. Paul Allen could be the one to create the first software that comes close to this.
IBM’s Watson, on the other hand, is a bit more primitive and much more business-oriented (IBM already uses Watson to crunch big data for financial and medical institutions). Still, Watson won Jeopardy against top players. That wasn’t true intelligence, but true intelligence is not what they were looking for. Watson was able, most of the time, to interpret the meaning of the game’s questions, and it looked for answers in its database of mostly web content (literally scanning millions of pages, ranging from Wikipedia to the Constitution, to find the most “probable answer”). Watson is all about probabilities. But as far as I know, it was not able to search through most of the literature: through, say, the millions of books Google has digitized with its Google Books project, or through academic papers from Elsevier, for example. Watson winning Jeopardy was an amazing achievement, and it proved that an AI can find answers to very complex questions if it is designed to do so. Watson still has some issues, but I think that with access to the Universal Library, Watson could achieve much more (and I truly hope IBM is working with the major digital libraries around the world to make that happen).
IBM is pushing people and developers to actively use Watson and to find new uses for it. Here is the opportunity to let it become even smarter by giving it access to a Universal Library and to train it to understand scientific literature!

Watson at Jeopardy

Google is on a quest to build a Star Trek Computer. It wants its search engine to answer any question humankind has found an answer for. With its enormous array of services, Google has what it takes to build the Star Trek Computer. Take Google Books: when you are looking for an answer, books are the best place to find it after the internet itself, and Google does business in both fields. The Star Trek Computer isn’t very far off: as Google fine-tunes its algorithms, and as its computers dig deeper and deeper into the Google Books texts, we can expect answers from Google Search to become more and more precise. Google now has what it calls a Knowledge Graph, which allows its search engine to “understand” concepts, like “a lion is an animal” and such. The Knowledge Graph allows Google Search to give more direct answers. For example, when you type “CEO of Facebook”, the first result is a white box with “Mark Zuckerberg” written in it, along with a picture. Google is also developing natural language processing for its search engine, so that it actually understands what you mean by “CEO”, “of”, and “Facebook”. Ray Kurzweil is the one developing this at Google.
Without the Knowledge Graph, Google would simply be able to give you the best link where you could find your answer, for example a newspaper article or a Wikipedia page, but you wouldn’t get your answer directly.
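The difference is easy to illustrate: a knowledge graph stores facts as structured triples that can be queried directly, instead of pages that can only be linked to. This toy dictionary is of course nothing like Google’s actual system; it only shows the shape of the idea:

```python
# Toy "knowledge graph": facts stored as (subject, relation) -> object triples.
triples = {
    ("Facebook", "CEO"): "Mark Zuckerberg",
    ("lion", "is a"): "animal",
}

def direct_answer(subject, relation):
    """Return a direct answer if the fact is in the graph, else None."""
    return triples.get((subject, relation))

print(direct_answer("Facebook", "CEO"))  # Mark Zuckerberg, not a list of links
```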
Also, with Google Scholar, Google could mine information from peer-reviewed papers, making Google Search a good candidate to transform itself into a Virtual Scientist. I’m sure we can expect to see their Star Trek Computer taking off in a few years!


The biological approach, on the other hand, tries to understand how the brain works and how to recreate it in code. It uses connectomes, wiring diagrams of the brain, to understand what intelligence is. Neuroscientists hope that connectomes will help us understand what makes us intelligent, and also at what level other species (chimps, whales, octopuses, and so on) are intelligent too.
The Human Brain Project in Switzerland is trying to create, in less than ten years, a simulation of the brain down to the molecular level, a goal it is very unlikely (to say the least) to achieve, even with funding from the European Union (a billion euros over ten years) and private partners, and with the help of some top-notch scientists. The main questions (if this controversial project were to succeed) are: how would a simulation of the human brain behave? Would it be conscious? Could it communicate? Could it learn new things? If yes, it could also be capable of reasoning like a scientist, benefiting from the amazing capability of the human brain to interpret things and from the capability of silicon to crunch enormous data sets. If a simulated brain is realizable (and I believe it is), and if it behaves like a real brain, the Human Brain Project could also lead to the creation of a Virtual Scientist, but it would raise big philosophical concerns about what a Virtual Scientist is. Because of course, if a simulated brain is capable of critical thinking, do we have the right to turn it off? Is it a human being? Or a non-human person, at least?

Other projects are simply mapping the brain, and not only the human brain.
The Human Connectome Project, backed by the Obama administration, aims to map the human brain down to individual synapses, to become the most precise atlas of the human brain on Earth. But is intelligence explainable only by the way our neurons are connected? I hope so. I think what we need to understand intelligence is a lot of connectomes from a lot of different subjects. As with genetics, we need a broad set of subjects to recognize patterns. Every connectome will be different, as every genome is different, but we will hopefully be able to distinguish common patterns in the way the human brain is wired, and that could explain why we are intelligent: because we all have certain circuits in common. We could even find out which circuits make some people smarter than others.
It would also be amazing to have connectomes for many other species on Earth (and not just for AI research, but because it would be a great way to understand the very complex machine that the brain is). Chimps, whales (I’m really looking forward to understanding cetaceans’ brains), but also animals like octopuses, which are extremely intelligent and whose brains differ so greatly from the kind we mammals have (Wired recently ran a great piece on the quest to understand the intelligence of cephalopods). Some projects, like connectomes.org and openconnectomeproject.org, already offer open data on partial connectomes of the mouse and of the human. You can even find the complete connectome of C. elegans, a very simple roundworm frequently used in biology (it is the only animal for which we have a full connectome, but it only has 302 neurons; the human brain is made of about 85 billion). Understanding the brain of every species on this planet would be a tremendous achievement for modern science. An international collaboration would be needed, an entire network of neuroscientists and technicians working toward a global database of connectomes for every species on Earth. Such a database would be awesome to play with.
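To give a feel for what “playing with” such a database could mean, here is a sketch that treats a connectome as a directed graph. The neuron names are real C. elegans labels, but the connections listed here are made up for illustration; the real wiring data is available from the open projects mentioned above.

```python
# A connectome as a directed graph: neurons are nodes, synapses are edges.
import networkx as nx

connectome = nx.DiGraph()
connectome.add_edges_from([
    ("AVAL", "AVAR"),  # illustrative connections, not the real wiring
    ("AVAR", "PVCL"),
    ("PVCL", "AVAL"),
])

# The kinds of questions a global connectome database would let us ask:
print("neurons:", connectome.number_of_nodes())
print("synapses:", connectome.number_of_edges())
print("most connected neuron:", max(connectome.degree, key=lambda nd: nd[1]))
```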

Whatever approach we use, trying to mimic the brain with code will be extremely tough. David Deutsch wrote a good article in Aeon Magazine about the challenges the AI field faces. He wrote: “What is needed is nothing less than a breakthrough in philosophy, a new epistemological theory that explains how brains create explanatory knowledge and hence defines, in principle, without ever running them as programs, which algorithms possess that functionality and which do not”. In other words, we don’t know how the brain “computes” new knowledge, and without knowing how it does that (metaphysically, not biologically), we can’t expect to simulate the brain in software.
The real problem with computers and AIs today is that they are not programmed to grasp the concept of absurdity. A typical AI doesn’t get that answering “green” to the question “How many varieties of kiwis exist?” is absurd, because “green” is a color and isn’t related to the topic of the question. It won’t seem absurd to the AI, because the AI doesn’t “understand” what it is talking about. Absurdity detection is our ability to eliminate false answers by analyzing what we know. If I read a book on kiwis (the fruit) and understand it, I can answer a question about the diversity of kiwis without saying absurd things, because I have genuinely understood the knowledge I gained from the book. The concept of absurdity is, so far, unique to humans, and it comes from actually understanding the notions involved.
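A crude way to see what such a filter would even look like: check that a candidate answer’s type matches the type of answer the question calls for. This sketch is deliberately simplistic; a real system would need semantic understanding, which is exactly the hard part.

```python
# Toy "absurdity filter": reject answers whose type contradicts the question.
def expected_type(question):
    q = question.lower()
    if q.startswith("how many"):
        return "number"
    if q.startswith(("who ", "whom ")):
        return "person"
    return "other"

def is_absurd(question, answer):
    """A 'how many' question answered with a non-number is absurd."""
    if expected_type(question) == "number":
        return not answer.strip().isdigit()
    return False  # without real understanding, we can't judge other cases

print(is_absurd("How many varieties of kiwis exist?", "green"))  # True
print(is_absurd("How many varieties of kiwis exist?", "60"))     # False
```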
If I had the chance to work on this AI, this Digital Aristotle, I would probably adopt the Google way, using machine learning to teach the computer the meaning of words and sentences. For me, the Virtual Scientist is impossible if the computer running the program can’t access a Universal Library. You can’t pretend to have an all-knowing AI if it can’t look through every book ever written by mankind to find information. That’s why Google Books would probably be the backbone of what I would build (provided Google lets individual developers use its APIs to dig through the Google Books database).
I would also train the AI starting with simple text studies, like we do in elementary school: very simple texts, with questions asking the AI to sort information from the text and deduce its general meaning. Then I would try more complicated texts, until it reaches a state where it can deduce the meaning of basically any kind of text. Of course, Google has a totally different approach, since they have much more data and business constraints to deal with, but their technique seems to work well so far, as Google Search keeps getting better. I think they will be the first to build something really significant.

We probably have all the computing power we need to make the Virtual Scientist a reality. I truly hope Daniel Dewey and I will be around to see it in action.

Do you have enough money to buy an iMac? If yes, you could build a universal library of science instead

The 27-inch new iMac is a killer, but for the same price…

Jacques Mattheij, the Dutch programmer who wrote the code for The Paper Bay as a tribute to the late Aaron Swartz, estimated in a blog post that one would need about 50-75 TB of storage to archive all the papers ever produced. That’s a very small amount of storage. Most laptops now come with 1 TB of storage and are about as thick as a small book. So for the “thickness of 75 books”, which constitutes a rather small bedroom library, you can get enough storage to host every scientific publication of all time. Let me repeat that: you can get enough storage to host every scientific publication of all time. I believe it’s fucking awesome.

The benefits of having a Universal Library of Science are obvious: faster scientific research, accessible knowledge, the centralization of all the resources needed on every subject, and so on.

…38 hard drives like this one will give you a powerful knowledge-transmitting machine, a bit like…

Just imagine what the geniuses of our time could do with such a powerful tool in the palm of their hands. The teenager who discovered a new, super-efficient test for pancreatic cancer did all his research online, reading only Open Access papers and probably missing many great but expensive articles, which slowed his work down (you can watch his interview). Imagine also how easy it would be to fact-check all the bullshit politicians and preachers can spout about global warming, history, and so on.

If we take the high estimate of 75 TB, such storage capacity is affordable for many people nowadays. Take the Seagate Expansion Desktop External Hard Drive with a capacity of 2 TB. This thing costs a hundred dollars on Amazon, and you would need about 38 hard drives like it to reach sufficient capacity: roughly $3,800, or the cost of a high-end 27-inch iMac. And quite a lot of people seem to be able to afford a computer in that price range, since Apple sells loads of them.
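The back-of-the-envelope arithmetic, spelled out (the prices are the rough figures quoted above):

```python
# Storage needed for "every paper ever published" vs. the price of an iMac.
import math

corpus_tb = 75         # high estimate for the whole scientific literature
drive_tb = 2           # one Seagate Expansion Desktop drive
drive_price_usd = 100  # approximate Amazon price at the time of writing

drives_needed = math.ceil(corpus_tb / drive_tb)  # -> 38
total_cost = drives_needed * drive_price_usd     # -> 3800
print(f"{drives_needed} drives, about ${total_cost:,}")
```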

Once you’ve got your 38 hard drives, you’ll be able to store every paper ever written, and thus have the capacity to host the complete Universal Library of Science. Isn’t that cool? Just imagine: a few years ago it would have cost millions of dollars, and only companies like Google could afford it! Even before that, when there were no computers and no internet, having a “universal library” was even more difficult and expensive, since books were published by little-known publishers, in small quantities, and often lost or destroyed before reaching a wide audience. Many attempts to build a Universal Library of any kind (the most famous being the Library of Alexandria) failed because of the high costs involved at the time, and also because of the silliness of religious people (one of the main theories explaining the destruction of the Library of Alexandria being a conflict between Christians and pagans in late antiquity).

…the monolith in 2001: A Space Odyssey, which in a way transmits the knowledge of an advanced civilization, helping the progress of mankind.
And the resemblance with the Seagate is really uncanny, isn’t it?

Now, with the internet and cheap, enormous storage, we have what it takes to build a Universal Library of Science. Darwin, Einstein, (insert the name of a genius here) would have loved this kind of service, and maybe Darwin would have spent less time gathering facts to build his theory of evolution if he had had access to powerful semantic search in texts, exactly the kind of thing digitized text allows (Darwin spent more than 25 years gathering facts from books, but also through direct contact with other naturalists, farmers, and so on). Plus, mining the data inside the Universal Library could be very interesting in terms of data visualization (see initiatives like Paperscape).

The next thing you have to do is a little bit trickier, and highly illegal [and to be clear, I’m not encouraging you to do it]. You have to find a way, write a script, hire a hacker, whatever, and download every paper and every book from Elsevier, Springer, Nature, and all the other big publishers (or you can legally buy all the publications, if you’re a billionaire). Find the list of all the publishers, then massively download everything they ever published. Aaron Swartz wrote a script to download millions of papers from JSTOR, so that is the kind of tool you need if you want to succeed. Download everything you can find. Oh, and try not to be detected. Remember how MIT, JSTOR, and the government acted like complete douchebags when Swartz did what he thought was civil disobedience? You don’t want that to happen to you. So find a way to do all this undercover. Perhaps you’ll want to teach yourself how to be completely anonymous and how to leave no trace behind you online (you really don’t want General Alexander or one of his fascist henchmen knocking at your door).

That should make you puke
It could also make you feel less ashamed

And if you need additional motivation to start downloading, read the Guerilla Open Access Manifesto, a great piece Swartz wrote in 2008, a few years before putting it into practice at MIT. Or, if you feel ashamed about “stealing” all these papers, just look at Elsevier’s profit margin, and at the recent controversy over publishers like it, their business model, and their attitude toward openness.

To sum up all this: Buy hard drives. Download. Enjoy the Universal stuff.

There. You have the Universal Library of Science. For the price of an iMac. But remember, it’s an illegal library. Who cares? A Universal Library of Science benefits all mankind. Well, only you for now. But put the content of these hard drives online, and let everyone enjoy the science.

How can a library be illegal anyway?

Image credits: Apple Inc., Seagate Inc., MGM Pictures/Warner Bros. (2001: A Space Odyssey by Stanley Kubrick), http://blogs.law.harvard.edu/pamphlet/files/2013/01/fig-margins.png, blu-raydefinition.com

Let’s build a monolith

We are one step away from the Star Trek Computer, the 2001 monolith, whatever we call it.

When Ahmed Saloum Boularaf opened his library to a reporter from The Christian Science Monitor, he wanted to share his fear with the entire world. Holding a rare, nearly 800-year-old manuscript, he explained that if nothing is done to copy and preserve his collection, more than 1,700 other manuscripts could be lost forever. Worse, Mr. Boularaf lives in Timbuktu, Mali, where Islamists linked to Al Qaeda burnt libraries before leaving the city, pushed north by French soldiers last February. Fortunately, the manuscripts of Timbuktu are alive and well, thanks to the bravery and courage of the librarians there (see The New Republic’s article on this).

Copyright T160K

The war in Mali highlighted a not-so-rare phenomenon. Every day on Earth, we probably lose forever books, scrolls, and manuscripts that were never duplicated, never digitized, and that we won’t be able to recover. And it doesn’t happen only in Africa: Stanford’s library has already been damaged by floods twice, the recent JFK Library fire at Harvard could have been disastrous, and Canada’s Harper government is destroying libraries around the country. Twenty years ago, we could have said: “What can we do about that? It is too expensive to digitize books.” But this just isn’t true anymore. We now have the technology, thanks to companies like Google, and it is intolerable to lose whole sections of our knowledge when we have such cheap, reliable, and efficient processes available.

We don’t want the disaster of the Library of Alexandria to happen again

Sergey Brin described his vision of a “library to last forever” in an op-ed contribution to The New York Times in 2009, while Google was fighting the Authors Guild in court over alleged abusive practices in its Google Books project.

We can’t let one company alone control our knowledge, but there are laws protecting us from that. And the lawsuits against Google Books were all about publishers’ profits, not at all about this political question. We should let Google digitize every book on this planet. Humanity would benefit tremendously from access to such a universal library. Google should be allowed to digitize books, manuscripts (like the Dead Sea Scrolls), and every piece of written knowledge we have, like Mr. Boularaf’s library.

Of course, Google will have to offer every book that is in the public domain through its service for free, and it will have to find some compromises over orphan works. And for works under copyright, Google Books is a great bookstore already.

Since knowledge is power, what if Google decided to hold the whole world hostage to its service? I don’t think this will happen; there are laws, and vigilant individuals and organizations protecting us from that, fighting for the right to access works in the public domain without any fee. Plus, the books and documents Google has digitized are still available in their material form. If anything, we are today being held hostage by libraries and publishers, who are slowing the pace of digitization and of “remote access to culture”. By keeping most of the world’s knowledge away from the general public, libraries and publishers are already holding us hostage. And they see Google as a threat to their monopoly over knowledge.

To compete with Google Books, the US launched the Digital Public Library of America (DPLA), and the European Union launched Europeana, both digital libraries. The problem is, they are digitizing books really, really slowly. And when you are looking for a book, you must check at least two different websites, instead of just typing your query into Google…

A remarkable BBC documentary called Google and the World Brain recently discussed all these concerns. Yet I think the benefits of a Google Universal Library are greater than the risks of monopoly.

In order to make a universal library possible, we have to let Google make its search engine even more intelligent by extracting knowledge from the digitized books. Every Google user (that is, more than a seventh of the world’s population every month; who but Google can claim that?) would be able to access hundreds of years of knowledge via a simple Google search.

Jonathon Keats suggested in Wired this year that Google should receive the Nobel Prize in Literature for the Google Books project. I believe he’s absolutely right. His words speak for themselves:
“Copyright law was created to balance the interests of authors and the public. Giving Google the Nobel Prize would make a powerful statement in favor of fair use.
According to Nobel’s will, the accolade is to be awarded to “the person who shall have produced in the field of literature the most outstanding work in an ideal direction.” Google Books provides greater literary benefit to more people than any single title or oeuvre. Whatever your taste in reading, you’re a beneficiary, as Google’s digitization protects books from hurricanes and fires and censorship by repressive regimes. There could be as many as 20 million volumes in the public domain, and already more than 2 million of them—from Utopia to Rights of Man—can be downloaded free (with help from proxy servers, if necessary) in Iran, Syria, and China. (In that respect, Google stands in marked contrast to last year’s Nobel laureate, the Chinese novelist Mo Yan, a Communist Party favorite who recently compared state censorship to airport security checks. That doesn’t exactly move literature in an ideal direction.)”

In November 2013, Google won its lawsuit against the Authors Guild. Judge Chin wrote: “Google Books provides significant public benefits. It advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders.” And he’s right. Google Books makes books more accessible. Their content is searchable, meaning that you can now search not only for book titles but for sentences contained inside the actual text with Google Search! With this issue settled, Google has much more latitude to build a Universal Library. And a Universal Library would be really useful for Google in building a super-smart, Star Trek-like AI, a kind of Digital Aristotle.

The competitors trying to build the Universal Library


A Slate article recently confirmed that Google’s ultimate goal is to create the Star Trek Computer. Neither Europeana nor the DPLA will ever be able to create such a service.

Now, in order to build a Star Trek Computer fueled by a Universal Library, Google will need to digitize a very special kind of literature: scientific literature. This particular field will present some big challenges for Google.

Entering the academic publishing world is strange. You discover a weird field, where two leading publishers control nearly the whole process of peer-reviewed publishing. These two giants are the Dutch Elsevier and the German Springer. Last year, the mathematician Timothy Gowers launched a boycott of Elsevier’s journals, followed by many researchers. He argued that Elsevier was making enormous profits on the backs of unpaid authors and reviewers, and that its practice of “bundling” journals together was a terrible threat to the shrinking budgets of libraries. But since Elsevier owns most of the top peer-reviewed journals, where could researchers publish their work?

It turns out that since 2008, a movement called Open Access has become more and more important. The concept is simple: the author pays a fee to the publisher, and the paper is then freely distributed online. Open Access is the best thing that has happened to the scientific publishing industry since Gutenberg. You can read the Guerilla Open Access Manifesto written by the late Aaron Swartz, one of the best statements about freeing knowledge from ivory towers that I can think of.

Now, since scientific knowledge is contained in all these journals, some of them Open Access but the vast majority behind standard “premium” access, Google will have to integrate all of it in order to build the Star Trek Computer.

Elsevier, Springer, Nature: all of them have online archives. But again, we need a single place, so that it is easier to find, say, all the papers published on Curiosity’s findings on Mars. Google can be this single place. It can distribute Open Access works for free and non-Open Access papers for the price the publisher sets, while making all these papers more visible and, more importantly, searchable.

Once all these papers are searchable, we can easily imagine a researcher simply typing a Google query and seeing the articles they seek ranked in a relevant order, thanks to Google’s algorithms. As holders of hundreds of years of accumulated scientific work, publishers have a responsibility to share it so that we can build a universal library without violating their rights, even if Open Access is the way to go.

Open Access papers will be downloaded and cited much more than others, proving to publishers that this publishing model can boost the impact factor of their journals.

This is a call to publishers and to Google. We have the infrastructure, we have the technology; what we lack is a common, unifying goal of making the wealth of our knowledge available to everybody in this world. We lack the will to create a service that would change the way we conduct science. Just imagine our most brilliant minds being able to look at everything that’s been written on everything. Whether these brilliant minds are inside the Harvard Library, in rural China, or in Mali wouldn’t matter: they would just need an internet connection. We can hope, for example, that HIV could be cured faster thanks to this shared wealth of knowledge.

It is very idealistic, of course, since many people still can’t get access to the internet, and a billion people can’t get enough food every day. But this is where we need to take science, where we need to take the humanities. Where we need to take our knowledge.

Now that we have seen that the biggest ivory towers keeping knowledge away from the general public are, ironically, libraries and publishers, and that a company such as Google could help them achieve their true goal of making knowledge more accessible, we have to realize that the biggest ivory tower of all is perhaps Earth itself.

Our fragile little planet ties the future of our Knowledge to its existence

What would happen to all of our knowledge if Earth were destroyed tomorrow by an asteroid? The last asteroid that came close to Earth, DA14, was just 17,200 miles away from a global disaster. And a meteor recently exploded over Russia, damaging many buildings and injuring more than a thousand people. What would remain of humanity if we were unable to detect an asteroid? Maybe a few buildings, a few people? In fact, most relics of our civilization would be in space: a few probes on the Moon (including the Apollo modules), on Mars, and our ambassadors outside the Solar System, Voyager 1 and 2 (Matthew Battles wrote a story on these two probes in Aeon Magazine).

What would remain of our knowledge? Perhaps the Golden Record on the Voyager probes, carrying a few hours of sounds and a few pictures of the Earth. That is, nothing compared to the vast amount of knowledge we accumulated over centuries.

Voyager 1, one of the two ambassadors of our Knowledge to the stars

Our most resistant ivory tower, the one keeping us from preserving our knowledge for the millions of years to come, is actually Spaceship Earth. If we lose it, and this could happen at nearly any time, the tireless work of millions of scientists to understand our world and give new knowledge to future generations, helping them make better decisions, would have been totally useless. Even if humanity disappears suddenly, it is worth preserving our knowledge, in case another civilization encounters the remains of our existence. This was Carl Sagan’s goal with the Voyager Golden Record, but he didn’t have the technology we have nowadays.

Escaping this last ivory tower means becoming a “multi-planet” species, which, thanks to entrepreneurs like Elon Musk and his company SpaceX, could happen quite soon. We need literally all our knowledge stored off-world (on Mars, let’s say); that is, a copy of the Star Trek Computer and of the Google Universal Library on another planet than Earth.

Having our knowledge on Earth AND on Mars would significantly increase our chances of being able to carry on scientific research and transmit our knowledge even if a global cataclysm were to hit the Earth.

The best way to preserve our Knowledge: creating copies of it, storing them in different places. Just like the hard drive of your computer

The monoliths in 2001: A Space Odyssey represent the kind of thing we should build. Even if the civilization that built them has disappeared, their knowledge is preserved forever (monoliths can last for millions of years). Building a monolith, in my opinion, is the act of storing all our knowledge in a sustainable manner in a single place. Today, that place would be something like a self-sustaining data center. We still lack some technology, because data centers can’t yet run without humans, but we might be able to build automated data centers in the future, so that one day we can create our own monoliths, making our knowledge virtually immortal.

To succeed, we first need to build the Star Trek Computer, the monolith being the version of the Star Trek Computer built to last forever. We need to get every book digitized and stored in a secure place. Google has already shown its ability to do that.

Once we’ve done that on Earth, maybe we’ll be able to spread to Mars. If in ten years the first Martian colony is established, and if in ten years we complete the digital inventory of our knowledge and build the Star Trek Computer, then we’ll be ready. We’ll be ready to jump into the 21st century, ready to make our knowledge sustainable. You know, just in case.

The monolith is the ultimate artifact of our Knowledge. The complete inventory of what we know as a species
Copyright MGM Pictures/ Warner Bros. (2001 A Space Odyssey by Stanley Kubrick)