It’s great to see lots of research going on in the database software space. The new era of huge volumes of structured and unstructured data flying through the web brings new sets of problems of scalability, performance, and security – besides search and query capabilities.
Google folks wrote a paper a few years back on MapReduce, a programming model for processing large data sets across large clusters of machines. Java frameworks for data-intensive distributed applications, such as Hadoop, have adopted MapReduce as a programmatic way to work with large clusters. Greenplum, a Valley start-up, has blended SQL (on its PostgreSQL base) with MapReduce in its shared-nothing, massively parallel architecture for terabyte-scale databases. Netezza, an East Coast vendor in the warehouse appliance space, also uses an MPP architecture as an alternative to the expensive Teradata solution.
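For readers who have not seen MapReduce in action, here is a minimal single-process sketch of the idea in Python, using the classic word-count example. It is only an illustration of the map, shuffle, and reduce steps, not how Google, Hadoop, or Greenplum actually implement them; the function names (map_phase, shuffle, reduce_phase, word_count) are my own, and a real system would spread each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or split) would be mapped on a
    # different node; here everything runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "data as a service"]
    print(word_count(docs))
```

The appeal of the model is that the map and reduce functions know nothing about each other or about where the data lives, which is what lets a framework run them in parallel on a shared-nothing cluster.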
Here is an interesting website called The Database Column, hosted by Vertica Systems. Seven well-known experts are blogging there. I see that Jerry Held (a former colleague at Oracle) and Don Haderle (a former colleague at IBM) are among the seven. There is Michael Stonebraker, the well-known researcher and professor (founder of Ingres, and now Vertica), and also David DeWitt, another well-known researcher from the University of Wisconsin. There is plenty of discussion on columnar search, cloud computing, and BI.
There is a movement towards “more focused” solutions for database handling, compared to the 25-year-old solution of the relational DBMS. Google is pushing Bigtable and in-memory databases to minimize latency and improve scale. Cloud computing is now all about handling large volumes of data. It’s back to the old days of centralized computing, except that the scale is now much higher than before. We do see plenty of academic research happening, and that is a healthy sign. There are plenty of research opportunities in the challenging world of massive scale: multi-core processors, varieties of data types, and the need for extreme reliability and fault tolerance.
The new world of Data as a Service (DaaS) is coming.