Big Data is a top technology trend for 2012, according to Forrester Research. The Economist called Big Data a new, game-changing asset, and the Harvard Business Review called it a scientific revolution. Why a scientific revolution? Because it relies on data-intensive computing to unify, theorize, experiment, and simulate at scale.
Big Data is when the size of the data itself becomes part of the problem. But Big Data is not just “big”. It is usually described by the three Vs:
- Volume – terabytes of records, transactions, tables, files. A Boeing jet engine spews out 10TB of operational data for every 30 minutes it runs. Hence a 4-engine jumbo jet can create 640TB on one Atlantic crossing. Multiply that by the 25,000 flights flown each day and you get the picture.
- Velocity – batch, near-time, real-time, streams. Today’s online ad serving must respond with a decision within 40ms. Financial services need close to 1ms to calculate customer scoring probabilities. Streaming data, such as movies, needs to travel at high speed for proper rendering.
- Variety – structured, unstructured, semi-structured, and all of the above in a mix. WalMart processes 1M customer transactions per hour and feeds the information into a database estimated at 2.5PB (petabytes). There are old and new data sources such as RFID, sensors, mobile payments, in-vehicle tracking, etc.
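The jet-engine figures in the Volume bullet are easy to check on the back of an envelope. The sketch below is only that: the 8-hour crossing time is my assumption (the figures above do not state it, but it is the duration that makes 640TB come out), and treating all 25,000 daily flights as 4-engine Atlantic crossings is a deliberate overestimate to show the order of magnitude.

```python
# Back-of-the-envelope check of the jet-engine numbers in the Volume bullet.
TB_PER_ENGINE_PER_HALF_HOUR = 10  # from the text
ENGINES = 4                       # from the text
FLIGHTS_PER_DAY = 25_000          # from the text
CROSSING_HOURS = 8                # ASSUMPTION: not stated in the text

half_hours = CROSSING_HOURS * 2
tb_per_crossing = TB_PER_ENGINE_PER_HALF_HOUR * ENGINES * half_hours

# ASSUMPTION: every daily flight is treated as one such crossing,
# which overstates the real total but shows the scale.
tb_per_day = tb_per_crossing * FLIGHTS_PER_DAY
eb_per_day = tb_per_day / 1_000_000  # 1 EB = 1,000,000 TB (decimal units)

print(tb_per_crossing)  # 640
print(eb_per_day)       # 16.0
```

Even if the real number is a small fraction of that, it is far more than anyone stores or analyzes today.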
Because of these characteristics, traditional DBMS solutions are inadequate. Hence we have seen the growth of technologies such as Hadoop (based on the MapReduce programming model that originated at Google), which mostly process unstructured data in batch mode. New solutions are needed for real-time processing.
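For readers who have not seen the map-reduce idea, here is a minimal single-process sketch of it using the classic word-count example. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and exist only for illustration. A real cluster runs the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

counts = word_count(["big data is big", "data at scale"])
print(counts["big"], counts["data"])  # 2 2
```

Note that the whole input has to be read before any result appears, which is the batch-mode limitation mentioned above.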
See my blog from last year on this subject.