Data Management, circa 2011

The world of Data Management has never been as vibrant as it is now. Only five years ago, if you had tried to start a new database product company, VCs would have thought you were crazy. Why start something in an established market with three leaders: Oracle, IBM (DB2), and Microsoft (SQL Server)? Then we began to see “specialized” appliance products such as Netezza (now IBM) and Greenplum (now EMC) crop up, focused on large-scale data analytics. Oracle (Exadata) and, more recently, HP (Vertica) soon followed this trend.

But what I am really talking about is a wave of new companies, backed by well-known VCs, addressing the Data Management problems of the Internet era. We can roughly divide the data world into two camps: operational data management and analytic data management.

Within the operational data camp, there are three categories:

  1. Traditional RDBMS (Oracle, DB2, SQL Server, Sybase, Ingres, MySQL, etc.) and NewSQL products that mostly address MySQL scalability and performance issues (e.g., Clustrix, Drizzle, VoltDB, NimbleDB, MySQL Cluster). I advise two companies in this category, ScaleDB and ScalArc.
  2. Traditional non-relational DBMS (Objectivity, Progress, Versant, etc.) and NoSQL, which has seen a great deal of new activity. NoSQL products are built around key-value stores, BigTable-style tables, documents, or graphs. Examples include Couchbase, MongoDB, Riak, Voldemort, BerkeleyDB, Hypertable, HBase, Cassandra, GraphDB, etc. They are designed to handle very large numbers of simply structured records and rely on parallel, distributed processing for performance. Google introduced the MapReduce programming model, which inspired the open-source Hadoop project with HDFS (the Hadoop Distributed File System) as its storage layer.
  3. Distributed Data Grid and Cache technologies. Here, Memcached emerged as an open-source caching layer widely used with MySQL and PHP applications. Other solutions include Terracotta, GigaSpaces, Oracle Coherence, etc. SAP is also pursuing an in-memory solution called HANA.
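The MapReduce model behind Hadoop can be illustrated with a minimal, single-process Python sketch of the classic word-count example. It only mimics the three logical phases (map, shuffle/group-by-key, reduce); the function names are hypothetical, and this is not Hadoop's actual API, which runs each phase in parallel across a cluster and reads and writes data in HDFS.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input split.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values emitted for a single key.
    return key, sum(values)


def word_count(docs: list[str]) -> dict[str, int]:
    # In a real cluster each document (split) would be mapped on a different
    # node; here the phases simply run one after another in one process.
    intermediate = chain.from_iterable(map_phase(d) for d in docs)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big tables", "data grids and data caches"]
    print(word_count(docs))
```

Because map calls are independent of one another and each reduce call sees only one key's values, both phases can be spread over many machines, which is where the scalability comes from.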

The Analytic Data Management space has two categories:

  1. Non-relational (e.g., Hadoop, MapR, Piccolo, Dryad).
  2. Relational products such as Infobright, Netezza, ParAccel, SAP Sybase IQ, Teradata, EMC Greenplum, HP Vertica, IBM InfoSphere, etc. The phrase “Big Data” applies here, to data volumes that often exceed a petabyte. Social networking sites such as Facebook and Twitter deal with data at this scale.

I have seen the acronym SPRAIN (Scalability, Performance, Relaxed Consistency, Agility, Intricacy, and Necessity) used to explain why the incumbents are inadequate for the new challenges of unstructured data and Big Data.

These are exciting times for Data Management research and development.


One response to “Data Management, circa 2011”

  1. Indeed, data management has been shaken up in recent years and will continue to morph until the transition to the distributed computing model (aka “the Cloud”) sets in.

    GigaSpaces (where I work) has been doing quite a bit more than data grid for a while now. For a Big Data application platform, you might want to read the blog of Nati Shalom (our CTO), as well as other posts on real-time and near-real-time analytics.
