Today, most of the discussion on Big Data centers on “static data” in a data lake (the old data warehouse), accessed by BI tools, SQL-on-Hadoop engines (HAWQ, Impala), or MapReduce jobs for analysis. This is about examining historical data and finding trends. Some newer tools attempt predictive analysis based on past trends. This area deals mostly with the volume and variety aspects of Big Data, but not with velocity, that is, “data in motion”.
The term “Fast Data” applies to data in motion. This component is becoming more and more significant as data streams constantly in from edge devices such as sensors, smartphones, and other connected devices. As the number of these devices explodes (roughly 10 billion now, heading toward 50 billion in a few years, according to market analysts), there will be a data explosion that current Big Data products and tools are not built to address. What is needed is capture of this data at ingestion points, efficient storage and management, and real-time analytics for faster decisions. Streaming data has been around for a while, but here we are talking about two-way sensors where constant feedback and aggregation are needed. For example, smart meters in the utilities industry can provide readings from individual homes, but aggregating those readings at the transformer level is important for predicting seasonality and other trends. With fast data, things that were not possible before become achievable: instant decisions can be made on real-time data to drive sales, connect with customers, inform business processes, and create value.
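The smart-meter scenario above can be sketched as a simple streaming roll-up. This is a minimal illustration, not any particular product: the `Reading` record, the one-minute tumbling window, and the transformer IDs are all assumptions made for the example.

```python
# Sketch: roll up per-home smart-meter readings to the transformer level
# as they stream in. All field names and the 60-second tumbling window
# are illustrative assumptions.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Reading:
    meter_id: str
    transformer_id: str
    timestamp: int   # seconds since epoch
    kwh: float

WINDOW_SECONDS = 60  # tumbling one-minute windows

def aggregate_by_transformer(stream):
    """Sum consumption per (transformer, window) as readings arrive."""
    windows = defaultdict(float)  # (transformer_id, window_start) -> total kWh
    for r in stream:
        window_start = r.timestamp - (r.timestamp % WINDOW_SECONDS)
        windows[(r.transformer_id, window_start)] += r.kwh
    return dict(windows)

readings = [
    Reading("m1", "t1", 0, 1.2),
    Reading("m2", "t1", 30, 0.8),
    Reading("m3", "t2", 45, 2.0),
    Reading("m1", "t1", 70, 1.0),
]
totals = aggregate_by_transformer(readings)
# transformer t1 accumulates readings from two homes in the first window,
# giving the aggregate view the paragraph above describes
```

A production system would keep only the open windows in memory and emit each window's total as it closes, rather than returning a dict at the end.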
Fast data is the payoff of big data. While much can be accomplished by mining data for insights that enable a business to grow and change, looking into the past provides only hints about the future. Simply collecting vast amounts of data for exploration and analysis will not prepare a business to act in real time as data flows into the organization from millions of endpoints. The Internet of Things (IoT) makes this far more significant.
Enterprises have to figure out a combined architecture for Fast Data as well as Big Data. Streams of data from edge devices will eventually land in the data lake, but much real-time analysis has to happen before they get there. Technologies such as in-memory databases and complex event processing are needed to meet the performance demands. This space is still new, and much more work is needed in real-time analytics. Older OLTP systems will be inadequate to handle the demands of ingestion and analysis at an affordable cost.
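The combined architecture described above can be sketched as two paths over the same stream: an in-memory fast path that makes an immediate decision per event, and a slow path that batches events toward the lake. The alert threshold, batch size, and in-memory "lake" below are all assumptions made for illustration.

```python
# Sketch of a combined Fast Data / Big Data path: each event gets a
# real-time decision while still in memory, then is buffered and flushed
# in batches toward the data lake. Threshold, batch size, and the sink
# are illustrative assumptions, not a reference implementation.
from typing import Callable

class FastSlowPipeline:
    def __init__(self, alert: Callable[[dict], None], batch_size: int = 3):
        self.alert = alert
        self.batch_size = batch_size
        self.buffer = []
        self.lake = []  # stands in for the data lake / warehouse

    def ingest(self, event: dict) -> None:
        # Fast path: act on the event immediately, in memory.
        if event.get("value", 0) > 100:
            self.alert(event)
        # Slow path: accumulate and flush in batches to the lake.
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        self.lake.extend(self.buffer)
        self.buffer.clear()

alerts = []
p = FastSlowPipeline(alerts.append, batch_size=2)
p.ingest({"id": 1, "value": 150})  # fast path fires an immediate alert
p.ingest({"id": 2, "value": 20})   # batch full: both events flush to the lake
p.ingest({"id": 3, "value": 5})    # waits in the buffer for the next flush
```

Real systems put a message broker and a stream processor between the two paths, but the division of labor is the same: decide now, archive later.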
It is time to look at the world of data with a wider lens than just Hadoop-centric Big Data!