Thursday, June 17, 2010

open biology's quest to explode data

The Scientist | A “science commons” at the data-intensive layer will encourage scholarly collaboration and communication—and spur drug discovery.

Robert Metcalfe, co-inventor of Ethernet and the founder of 3Com, observed that the value of a telecommunications network is proportional to the square of the number of connected users of the system. This is known as Metcalfe’s Law, and it goes a long way toward explaining why we can create and realize so much value from the Web. As more users get online, the network gets more valuable, spurring more users to get online, and so on.

Getting Metcalfe’s Law to operate for data is a long-held goal of science. Indeed, the Web was created to share data—physics data—by making it easier to link, find, download, and browse information on disparate computers. But we don’t have the functionality of the consumer Web for biological data. And we’re not getting network effects for data in the spaces that would most dramatically affect our lives—in the study of human disease.

There are a lot of reasons for this. Studying human disease is complex and almost incomprehensibly expensive. And recent studies from inside the pharmaceutical industry itself draw on 60 years of data to show us that drug discovery is essentially a random process. It’s hard to force network effects onto this world.

That’s because it’s difficult to start an “open” biology process from scratch. The cost of entry is still in the tens of millions of dollars to develop a meaningful corpus of data sets one can legally share and analytic tools one can legally place under open source licenses. Even then you’d have to find incentives to get scientists to share their new data, their models of disease, their software tools—when they’re not rewarded for doing so. It is a tall hill to climb.