Fast Data vs. Big Data

Back when we were doing DB2 at IBM, there was an important older product called IMS which brought in significant revenue. With another database product coming (based on relational technology), IBM did not want any cannibalization of the existing revenue stream. Hence we coined the phrase “dual database strategy” to justify the need for both DBMS products. In a similar vein, several vendors are concocting all kinds of terms and strategies to justify newer products under the banner of Big Data.

One such phrase is Fast Data. We all know the three Vs associated with the term Big Data: volume, velocity and variety. It is the middle V (velocity) that says data is not static but is changing fast, like stock market data, satellite feeds, or even sensor data coming from smart meters or an aircraft engine. The question has always been how to deal with this kind of changing data (as opposed to the static data typical of most enterprise systems of record).

Recently I listened to a talk by IBM and VoltDB in which VoltDB tried to justify the world of “Fast Data” as coexisting with “Big Data,” which they narrowed to the static data warehouse, or “data lake” as IBM calls it. Again, they have chosen to pigeonhole Big Data into the world of HDFS, Netezza, Impala, and batch MapReduce. This way, they justify the phrase Fast Data as representing operational data that is changing fast. They call VoltDB “the fast, operational database,” implying that every other database solution is slow. Yet incumbents like IBM, Oracle, and SAP have introduced in-memory options for speed, and even NoSQL databases can process very fast reads on distributed clusters.

The VoltDB folks also tried to show how the two worlds (Fast Data and their version of Big Data) will coexist. The Fast Data side will ingest and interact with streams of inbound data, do real-time data analysis, and export to the data warehouse. They bragged about a performance benchmark of 1 million transactions per second (tps) on a 3-node cluster, scaling to 2.4 million tps on a 12-node system running in the SoftLayer cloud (owned by IBM). They also said that this solution is much faster than Amazon’s AWS cloud. The comparison is not apples-to-apples, as the SoftLayer deployment is on bare metal whereas the AWS deployment runs on top of the AWS software stack.
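A quick back-of-the-envelope check puts those benchmark numbers in perspective. The sketch below takes the two quoted figures at face value and assumes nothing else about the test setup; “scaling efficiency” is just my shorthand for the throughput ratio divided by the node ratio, not a metric from the talk.

```python
# Figures as quoted in the talk (transactions per second); treated as exact here.
small_nodes, small_tps = 3, 1_000_000
large_nodes, large_tps = 12, 2_400_000


def per_node(tps: float, nodes: int) -> float:
    """Average throughput contributed by each node."""
    return tps / nodes


small_per_node = per_node(small_tps, small_nodes)
large_per_node = per_node(large_tps, large_nodes)

node_ratio = large_nodes / small_nodes            # how much bigger the cluster is
throughput_ratio = large_tps / small_tps          # how much more work it does
scaling_efficiency = throughput_ratio / node_ratio  # 1.0 would be perfectly linear

print(f"per-node tps, 3-node cluster:  {small_per_node:,.0f}")
print(f"per-node tps, 12-node cluster: {large_per_node:,.0f}")
print(f"scaling efficiency: {scaling_efficiency:.0%}")
```

In other words, four times the nodes delivered 2.4 times the throughput, so per-node throughput dropped from roughly 333,000 tps to 200,000 tps.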

I wish they would simply call this real-time data analytics, since it consists mostly of read-type transactions, and not confuse it with update-heavy workloads. We will wait and see how enterprises adopt this VoltDB-SoftLayer solution in addition to their existing OLTP solutions.


One response to “Fast Data vs. Big Data”

  1. Jnan,

    I’m from VoltDB, and I really like your commentary on Fast and Big Data. However, I don’t agree with characterizing fast data as “real-time data analytics” and would appreciate your perspective on my rationale. The workloads that we describe as “fast data” involve processing data (mainly writes, not reads, and often requiring knowledge of ‘state’) as it is flowing into an organization (‘data in motion,’ as some people refer to it), versus doing a read on data that’s been stored in HDFS or a data warehouse (‘data at rest’). We manage data in real real time and enable applications to transact on a per-event basis. It’s write-heavy, not read-heavy.

    The term “real-time analytics,” in my view, is widely used to refer to the speed of the response to a query/read against the data warehouse. It has evolved from the batch processing days, when a response might take days or hours; now it’s minutes or seconds. In this use case it’s not ‘real time,’ it’s just a lot faster. When we at VoltDB talk about real time, we actually mean data streaming in real time. In our view, fast data has fundamentally different requirements than Big Data, which is why we make the distinction. And yes, VoltDB was designed specifically for these unique requirements.

    Fast + Big Data have different requirements. We’re on the front end ingesting and interacting on data, doing real real-time analytics, making data-driven decisions on each event, enabling applications to take action, and then exporting the data to the data warehouse for historical analytics, reporting, analysis, etc.

    I’d be interested to know if you think this helps clarify things or not.


    Peter Vescuso
