Sunday, June 14, 2009

the data deluge

3Quarksdaily | Anyone reading this article cannot fail but be aware of the changing interface between eye and text that has taken place over the past two decades or so. New Media – everything from the internet database to the Blackberry – has fundamentally changed the way we connect with each other, but it has also altered the way we connect with information itself. The linear, diachronic substance of the page and the book have given way to a dynamic textuality blurring the divide between authorship and readership, expert testament and the simple accumulation of experience.

The main difference between traditional text-based systems and newer, data-driven ones is quite simple: it is the interface. Eyes and fingers manipulate the book, turning over pages in a linear sequence in order to access the information stored in its printed figures. For New Media, for the digital archive and the computer storage network, the same information is stored sequentially in databases which are themselves hidden to the eye. To access them one must commit a search or otherwise run an algorithm that mediates the stored data for us. The most important distinction should be made at the level of the interface, because, although the database as a form has changed little over the past 50 years of computing, the Human Control Interfaces (HCI) we access and manipulate that data through are always passing from one iteration to another. Stone circles interfacing the seasons stayed the same, perhaps being used in similar rituals over the course of a thousand years of human cultural accumulation. Books, interfacing text, language and thought, stay the same in themselves from one print edition to the next, but as a format, books have changed very little in the few hundred years since the printing press. The computer HCI is most different from the book in that change is integral to it structure. To touch a database through a computer terminal, through a Blackberry or iPhone, is to play with data at incredible speed:
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition...

Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics...

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Wired Magazine, The End of Theory, June 2008
And as the amount of data has expanded exponentially, so have the interfaces we use to access that data and the models we build to understand that data. On the day that Senator John McCain announced his Vice Presidential Candidate the best place to go for an accurate profile of Sarah Palin was not the traditional media: it was Wikipedia. In an age of instant, global news, no newspaper could keep up with the knowledge of the cloud. The Wikipedia interface allowed knowledge about Sarah Palin from all levels of society to be filtered quickly and efficiently in real-time. Wikipedia acted as if it was encyclopaedia, as newspaper as discussion group and expert all at the same time and it did so completely democratically and at the absence of a traditional management pyramid. The interface itself became the thinking mechanism of the day, as if the notes every reader scribbled in the margins had been instantly cross-checked and added to the content.

In only a handful of years the human has gone from merely dipping into the database to becoming an active component in a human-cloud of data. The interface has begun to reflect back upon us, turning each of us into a node in a vast database bigger than any previous material object. Gone are the days when clusters of galaxies had to a catalogued by an expert and entered into a linear taxonomy. Now, the same job is done by the crowd and the interface, allowing a million galaxies to be catalogued by amateurs in the same time it would have taken a team of experts to classify a tiny percentage of the same amount.