Sunday, November 29, 2009

the dark side of the internet

Guardian | "The darkweb"; "the deep web"; beneath "the surface web" – the metaphors alone make the internet feel suddenly more unfathomable and mysterious. Other terms circulate among those in the know: "darknet", "invisible web", "dark address space", "murky address space", "dirty address space". Not all these phrases mean the same thing. While a "darknet" is an online network such as Freenet that is concealed from non-users, with all the potential for transgressive behaviour that implies, much of "the deep web", spooky as it sounds, consists of unremarkable consumer and research data that is beyond the reach of search engines. "Dark address space" often refers to internet addresses that, for purely technical reasons, have simply stopped working.

And yet, in a sense, they are all part of the same picture: beyond the confines of most people's online lives, there is a vast other internet out there, used by millions but largely ignored by the media and properly understood by only a few computer scientists. How was it created? What exactly happens in it? And does it represent the future of life online or the past?

Michael K Bergman, an American academic and entrepreneur, is one of the foremost authorities on this other internet. In the late 90s he undertook research to try to gauge its scale. "I remember saying to my staff, 'It's probably two or three times bigger than the regular web,"' he remembers. "But the vastness of the deep web . . . completely took my breath away. We kept turning over rocks and discovering things."

In 2001 he published a paper on the deep web that is still regularly cited today. "The deep web is currently 400 to 550 times larger than the commonly defined world wide web," he wrote. "The deep web is the fastest growing category of new information on the internet … The value of deep web content is immeasurable … internet searches are searching only 0.03% … of the [total web] pages available."

In the eight years since, use of the internet has been utterly transformed in many ways, but improvements in search technology by Google, Kosmix and others have only begun to plumb the deep web. "A hidden web [search] engine that's going to have everything – that's not quite practical," says Professor Juliana Freire of the University of Utah, who is leading a deep web search project called Deep Peep. "It's not actually feasible to index the whole deep web. There's just too much data."