We all remember the late Jim Gray, the great computer scientist and Turing Award winner. During the last several years of his research work at Microsoft, he focused on data-intensive computing and called it the Fourth Paradigm of scientific discovery. In a special book dedicated to Jim's memory, Bill Gates commented, “The impact of Jim Gray’s thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science.”
So what is the Fourth Paradigm? Here is how Gray described the progression of science:
1. A thousand years ago – Experimental Science
– Description of natural phenomena
2. Last few hundred years – Theoretical Science
– Newton’s Laws, Maxwell’s Equations…
3. Last few decades – Computational Science
– Simulation of complex phenomena
4. Today – Data-Intensive Science (unify theory, experiment, & simulation)
Scientists are overwhelmed with data sets from many different sources such as data captured by instruments, data generated by simulations, and data generated by sensor networks.
Jim Gray named it “eScience” – where IT (Information Technology) meets Science. It is the set of tools and technologies that support data federation and collaboration for analysis, data mining, data visualization and exploration, and for scholarly communication and dissemination. He laid out the principles, fondly called Gray’s laws of data engineering:
- Scientific computing increasingly revolves around data
- Analysis needs scale-out solutions
- Take the analysis to the data!
- Start with “20 queries”
- Go from “working to working”
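The principle “take the analysis to the data” can be sketched in a few lines. This is my own toy illustration, not Gray's code: the function name `run_at_data_node` and the sample readings are invented for the example. The point is that shipping a small query to where the data lives returns one number, while the naive approach moves every record over the network first.

```python
# Toy sketch of "take the analysis to the data" (illustrative only).

def run_at_data_node(records, analysis):
    """Pretend this executes on the remote node that holds the data,
    so only the analysis function and the result cross the network."""
    return analysis(records)

# Imagine these readings live on a remote sensor-data node.
sensor_readings = [12.1, 15.3, 9.8, 22.4, 18.0]

# Naive approach: ship all N readings to the client, then compute.
shipped = list(sensor_readings)               # simulated network transfer of N values
mean_shipped = sum(shipped) / len(shipped)

# Gray's approach: ship one small function, get back one number.
mean_at_data = run_at_data_node(sensor_readings, lambda r: sum(r) / len(r))

assert mean_shipped == mean_at_data           # same answer, far less data moved
```

At petabyte scale the difference between moving the data and moving the query is the difference between an impossible job and a feasible one, which is why this law sits at the heart of both eScience and today's Big Data platforms.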
Interestingly, all of these apply equally to the commercial world of Big Data; the scientific world has simply been grappling with these problems longer. Given the proliferation of devices and the petabytes of incoming data, tools for analytics are a top priority. No wonder Big Data is 2012’s biggest buzzword.
We miss you, Jim, and your pioneering thoughts on DISC (Data-Intensive Scalable Computing)!