ByteNoise

Hypertext

Considering that most nonfiction text contains references to related documents, hypertext is a logical progression for these references. It is arguably the job of computers to take care of repetitive or tedious work, such as actually retrieving the document being referenced.

Encyclopedias benefit greatly from hypertext, as it allows the reader to explore different avenues of thought and provides elaboration on exactly what she is interested in. This non-linear retrieval of information makes learning much easier, as the reader is free to explore related topics at her own pace.

Given how our brains group things together by seemingly obscure associations (something impossible to achieve by sorting information in any standard kind of index), it seems logical that we would one day use technology to help us link documents together by such erratic connections. The journey that has so far led to our current system of global hypertext has been longer and more gradual than you might think, however.

1934: Paul Otlet's database of linked documents

In 1934, Paul Otlet had a vision: a machine that would let people search, read and write documents stored in a mechanical database. They would be able to access this database remotely, via a telephone line, and even connect documents together. He called such connections links. He called the project in its entirety a web of human knowledge.

Perhaps the most useful practical achievement of Paul Otlet was his improvement of the existing classification systems, such as the Dewey Decimal System. His own system, Universal Decimal Classification, was the first full implementation of a faceted classification system.

Sadly, his operation was shut down, and much of what remained of his work was destroyed by Nazi troops.

1945: Vannevar Bush's mechanical home encyclopedia

In the July 1945 issue of The Atlantic Monthly, Vannevar Bush wrote an article, "As We May Think", speculating about possible future technologies. While it dealt with a broad range of ideas, most of them were in some way related to the storage and retrieval of information. In hindsight, arguably the most interesting of these ideas was the concept of information being linked together by association.

"The human mind... operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. Selection by association, rather than by indexing, may yet be mechanized."

With this, the idea of links that work the same way the human brain groups ideas together had been formally proposed as a useful way of connecting nonfiction, especially encyclopedias.

Something that Vannevar Bush didn't predict was that computers would be able to store text digitally, translating characters into numbers with encodings such as ASCII or Unicode. This enables them to store, manipulate, update and retrieve text far more efficiently than analogue technology such as microfilm could. He did, however, discuss speech recognition and speech synthesis as ways of storing information at a level closer to plain text. Eventually, technology caught up with these ideas, allowing them to be not only realized but also improved upon.
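As a small illustration of what "translating characters into numbers" means in practice, the following Python sketch shows a word as a list of code points and as raw bytes. The sample word is arbitrary, chosen only for this example.

```python
# Each character is stored as a number (its code point).
text = "link"
code_points = [ord(ch) for ch in text]
print(code_points)  # [108, 105, 110, 107]

# The same text as raw bytes. For these plain Latin letters,
# the ASCII and UTF-8 encodings produce identical bytes.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
print(ascii_bytes == utf8_bytes)  # True

# Turning the numbers back into characters recovers the original text.
restored = "".join(chr(n) for n in code_points)
print(restored)  # link
```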

1960: Ted Nelson's ambitious Xanadu® Docuverse

With the advent of digital computers, which stored written words in such a way that they could be easily manipulated, real systems started to appear in place of dreams filled with mechanical levers and microfilm. One of the most ambitious of these projects, if not the most ambitious, is Xanadu®, currently forty-five years in the making.

The first thing Xanadu® supports is parallel documents: one document based on, or otherwise related to, another one. Its ability to display two related documents next to each other, with similarities and differences highlighted, is useful for keeping track of revisions in different drafts, or for comparing both sides of a debate. You can also pull data from one document into another, then build upon it, letting the computer automate tedious tasks such as working out royalty payments for the various authors cited.

Xanadu® allows transclusion, which is the existence of the same information in more than one place. Two completely different documents can share a few paragraphs, for instance, and when those paragraphs are updated on one of the documents, the new version is automatically seen on the other. Ted Nelson sees transclusion as "what quotation, copying and cross-referencing merely attempt."
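The idea of transclusion can be sketched in a few lines of Python. This is only a toy model under simple assumptions, not how Xanadu® itself is implemented: the Passage and Document classes and the sample text are hypothetical names invented for this illustration. The shared passage is stored once, and each document holds a reference to it rather than a copy.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    """A piece of text stored once and shared by reference (hypothetical)."""
    text: str


@dataclass
class Document:
    """An ordered list of parts: plain strings or shared Passage objects."""
    title: str
    parts: list = field(default_factory=list)

    def render(self) -> str:
        # Shared passages are looked up at read time, so an update to a
        # passage is seen in every document that includes it.
        return "\n".join(
            p.text if isinstance(p, Passage) else p for p in self.parts
        )


shared = Passage("Hypertext links documents by association.")
essay = Document("Essay", ["An introduction.", shared])
review = Document("Review", [shared, "A response."])

# Updating the passage once changes what both documents display.
shared.text = "Hypertext links documents by association, not by index."
print(essay.render())
print(review.render())
```

Copying the paragraph into each document would instead leave two versions that can drift apart, which is the weakness of quotation and copying that Ted Nelson's remark points at.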

Links are also available in Xanadu®, although in 1965 Ted Nelson coined a new word for them: hyperlinks. They are bidirectional, and cannot be broken. The way they work is that a block of text in one document is linked to a block of text in another document. The link has an identity as far as both text blocks are concerned (such as "my comment on someone's idea" at one side, and "someone has commented on my idea" at the other). No matter how much either end is updated, the link remains between any individual characters that were present in the original versions of the texts, and so it cannot become obsolete.
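One way to picture a link that survives editing is to give every character a permanent identity and attach the link to those identities rather than to positions in the text. The Python sketch below does this under simplified assumptions. It is not Xanadu®'s actual addressing scheme, and the Text and Link classes and their methods are hypothetical names for this illustration.

```python
import itertools
from dataclasses import dataclass

_next_id = itertools.count()  # source of permanent character identities


class Text:
    """Text in which every character keeps the identity it was created with."""

    def __init__(self, content: str):
        self.chars = [(next(_next_id), ch) for ch in content]

    def insert(self, pos: int, content: str) -> None:
        # New characters get new identities; existing ones are untouched.
        self.chars[pos:pos] = [(next(_next_id), ch) for ch in content]

    def ids(self, start: int, end: int) -> list:
        return [cid for cid, _ in self.chars[start:end]]

    def show(self, ids) -> str:
        wanted = set(ids)
        return "".join(ch for cid, ch in self.chars if cid in wanted)

    def __str__(self) -> str:
        return "".join(ch for _, ch in self.chars)


@dataclass
class Link:
    """A two-ended link; each end has its own label and set of characters."""
    source: Text
    source_ids: list
    source_label: str
    target: Text
    target_ids: list
    target_label: str

    def seen_from(self, text: Text):
        # The link can be followed from either document.
        if text is self.source:
            return self.source_label, self.target.show(self.target_ids)
        return self.target_label, self.source.show(self.source_ids)


idea = Text("Links should never break.")
comment = Text("I agree with this.")
link = Link(comment, comment.ids(0, 7), "my comment on someone's idea",
            idea, idea.ids(0, 5), "someone has commented on my idea")

# Editing the target shifts positions, but the linked characters keep their ids.
idea.insert(0, "In my view: ")
print(str(idea))
print(link.seen_from(comment))
print(link.seen_from(idea))
```

A link that stored character positions instead would point at "In my" after the edit, which is the kind of breakage the identity-based approach avoids.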

Perhaps the most ambitious part of the Xanadu® system is transcopyright: every author has the right to demand a very small amount of money every time someone reads a piece of her work, whether the reader is trying to access it directly or it is transcluded in someone else's document.

Despite all of these innovative ideas, Xanadu® has so far failed to become popular. It is proprietary and centralized, and few people have taken the time to try it out. In the end, the hypertext protocol that changed the world was the one that was, in many ways, the least ambitious.

1989: Tim Berners-Lee's open, decentralized web

Tim Berners-Lee started off developing a hypertext system called Enquire, which was much like the other systems before it: centralized. It had one place where everything was stored. It also used bidirectional hyperlinks, just like Xanadu®. The new feature it added was external links that could connect different files together. These only went in one direction, however, to avoid cluttering a page with thousands of links, not to mention all the problems associated with storing redundant data — stating the same thing in more than one place.

Like the other hypertext systems before it, Enquire never took off. It did, however, give Tim Berners-Lee a starting point for a more adventurous idea.

The system had to have one fundamental property: it had to be completely decentralized. That would be the only way a new person somewhere could start to use it without asking for access from anyone else.

One of the main advantages of the web is that it is built on top of an existing technology: the Internet. Although the Internet had been growing since its inception in 1969, there was very little information permanently stored on it at the time, and certainly no standardized way of accessing that information. The bulk of the data passing through it was in the form of e-mails, newsgroup posts and other transient messages. It offered the ideal place for a new protocol to reside, however, as anyone could put a computer on the Internet and get it to start talking to any other computer on the network. It fitted in with the decentralized philosophy perfectly. Anyone was free to join, without having to ask anyone else for permission.

In much the same way as Paul Otlet invented Universal Decimal Classification, Tim Berners-Lee invented Uniform Resource Locators, or URLs for short. These allow anything on the Internet, from a newsgroup message to a file on an FTP server, to be linked to from a hypertext document. In practice, this means that people can treat almost anything already on the Internet as if it is on the web, making it easy to include the wealth of information already available without having to move it or rewrite it.
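To show how one notation can address different kinds of resource, the following Python sketch splits a few URLs into their parts using the standard library's urllib.parse module. The addresses are made-up placeholders on example.org, not real documents.

```python
from urllib.parse import urlsplit

# Placeholder addresses: a web page, a file on an FTP server, a newsgroup.
examples = [
    "http://www.example.org/hypertext/history.html",
    "ftp://ftp.example.org/pub/readme.txt",
    "news:comp.infosystems.www.misc",
]

parsed = [urlsplit(url) for url in examples]

for parts in parsed:
    # The scheme names the protocol; the rest says where to look.
    print(parts.scheme, parts.netloc or "(no host)", parts.path)
```

The scheme at the front is what lets a single link format reach material that was never written with the web in mind.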

In many ways, the web isn't as advanced as other hypertext systems, such as Xanadu®. It doesn't support transclusion or bidirectional links and, as anyone who has used it can attest, it is full of broken links. For all these drawbacks, however, it remains the most popular hypertext system so far because of two advantages: it uses an open format that anyone can use freely, and it is decentralized, so everybody is free to add to it in any way they like. These factors ensure that, while it isn't the most elegant solution, it is the most accessible and widely adopted, and no one person owns it.

In the end, it seems that most people agree that these freedoms outweigh the web's minor technical shortcomings.
