According to a Wall Street Journal article today by Rachael King and Steven Rosenbush, the market for new databases serving Big Data reached $1.22B last year and is expected to more than double by 2014 (according to research firm Wikibon). That is quite impressive.
Since relational databases using SQL are inefficient at handling data from social chatter, smartphones, and clicks (because of its volume and variety), new databases have been popping up over the last 3-4 years. In the past two years, 119 database software companies have been funded by VCs for a total of $1.17B (according to VentureSource, a Dow Jones company). This is remarkable because, not too long ago, the space was declared taken by 3 incumbents: IBM, Oracle, and Microsoft. The scene has changed dramatically since then.
Thanks must go to Google for pioneering innovations such as Bigtable, GFS (Google File System), and MapReduce for massively parallel processing on clusters of commodity hardware. These ideas inspired open-source implementations at the Apache Software Foundation, and the result is Hadoop, HDFS, and several associated tools for the new ecosystem. Amazon, Yahoo, and Facebook have also contributed good work here.
The article mentions a customer, AutoZone, using one of the new DBMSs, called NuoDB, to better manage store inventory according to local shoppers. NuoDB, like many others, offers a cloud service with an annual subscription, cutting capex for customers.
Another customer, Trulia (online real estate), was using MySQL but has added Cassandra to better manage home foreclosure and apartment listings among its 100 million homes in the US.
Shutterstock, a photo agency, stores 24 million images, with 10,000 added each day. It uses HDFS (Hadoop) to analyze user behavior, such as how long users hover over an image before purchasing.
The article suggests that large financial clients will stick with existing vendors such as Oracle for various reasons, but the threat from these newcomers is real. This is much like the way cloud software is shaking up Microsoft’s desktop software model.
We are in the data-intensive computing era now, and the race for leadership and market share will be fierce.