During the 1980s and 1990s, online transaction processing (OLTP) was critical to the core business functions of banks, airlines, and telcos. It was a big step up from the batch systems of the early days. We learnt the importance of sub-second response times and continuous availability, with a goal of "five nines" (99.999% uptime), which allows only about five minutes of outage per year. During my days at IBM, we came under fire from a bank in Japan that had suffered an hour-long outage, resulting in long queues in front of its ATMs (unlike here, the Japanese customers waited very patiently until the system came back after what felt like an eternity). The bank was using IBM’s IMS Fast Path software, and the blame was first put on that software, though the outage later turned out to have a different cause.
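The five-nines figure follows from simple arithmetic: a year has about 525,960 minutes (assuming 365.25 days to average in leap years), and 0.001% of that is roughly 5.26 minutes. A small Python sketch of the same calculation for a few common availability targets (the function name `max_downtime_minutes` is just for illustration):

```python
# Assumption: a 365.25-day year, which averages in leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def max_downtime_minutes(availability: float) -> float:
    """Return the downtime in minutes per year allowed by an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label:>11} ({target:.3%}): {max_downtime_minutes(target):8.2f} min/year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why five nines left so little room for an hour-long outage.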
Advance the clock to today. Everything is real-time, and one cannot talk about real-time without discussing the need for “fast data”: data that has to travel very fast to support real-time decision-making. Here are some reasons businesses need fast data:
- These days, businesses must be able to quickly sense and respond to events affecting their markets, customers, employees, facilities, or internal operations. Fast data enables decision makers and administrators to monitor, track, and address events as they occur.
- Leverage the Internet of Things (IoT): for example, an engine manufacturer can embed sensors in its products that send continuous feeds back to the manufacturer, helping it spot issues and better understand usage patterns.
- Improve operational efficiency: this is an important advantage of fast data. Events that could negatively affect processes, such as inventory shortages or production bottlenecks, can be detected and reported, and remedial action can be prescribed or even launched immediately. Real-time data can be measured against patterns identified as predictors of problems, and systems can respond with appropriate alerts or automated fixes.
- Assure greater business continuity: fast data helps bring systems, along with all data still in the pipeline, back up and running quickly, before a disruption becomes a catastrophic event for the business.
- Support artificial intelligence and machine learning: fast data is critical here. In fact, data is the fuel for machine learning applications such as recommendation engines, fraud detection systems, bidding systems, automated decision-making systems, chatbots, and many more.
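The operational-efficiency idea above, measuring real-time data against a known pattern and raising an alert when it deviates, can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's method: the `RollingAnomalyDetector` class, the 20-reading window, the z-score threshold of 3.0, and the simulated engine-temperature feed (echoing the engine-sensor example) are all assumptions chosen for clarity.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag readings that deviate sharply from a rolling window of recent readings.

    Hypothetical sketch: window size and z-score threshold are illustrative assumptions.
    """

    def __init__(self, window: int = 20, threshold: float = 3.0) -> None:
        if window < 2:
            raise ValueError("window must hold at least two readings")
        self.readings: deque = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Record a reading and return True if it is anomalous versus the current window."""
        is_anomaly = False
        if len(self.readings) == self.readings.maxlen:  # only judge once the window is full
            mu = mean(self.readings)
            sigma = pstdev(self.readings)
            if sigma == 0:
                # A perfectly flat baseline treats any change as anomalous.
                is_anomaly = value != mu
            else:
                is_anomaly = abs(value - mu) / sigma > self.threshold
        self.readings.append(value)
        return is_anomaly


# Simulated engine-temperature feed: small regular fluctuations, then a sudden spike.
feed = [90.0 + (i % 5) * 0.5 for i in range(40)] + [130.0]
detector = RollingAnomalyDetector(window=20, threshold=3.0)
for t, reading in enumerate(feed):
    if detector.observe(reading):
        print(f"ALERT at t={t}: reading {reading} deviates from the recent pattern")
```

Appending every reading, including the spike, to the window is a simplifying choice; a real system might exclude flagged readings so that a single outlier does not distort the baseline, and might trigger an automated fix instead of printing an alert.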
Now let us look at the constellation of technologies enabling fast data management and analytics. Fast data is data that moves almost instantaneously from source to processing to analysis to action, enabled by frameworks and pipelines such as Apache Spark, Apache Storm, Apache Kafka, Apache Kudu, Apache Cassandra, and in-memory data grids. Here is a brief outline of each.
Apache Spark is an open source toolset now supported by most major database vendors. It offers streaming and SQL libraries for real-time data processing. Spark Streaming processes data shortly after it is created, enabling critical uses such as real-time analytics and fraud detection. Its Structured Streaming API opens up this capability to enterprises of all sizes.
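Spark Streaming's original engine slices a live stream into small micro-batches, each processed like a tiny batch job. The sketch below mimics only that grouping idea in plain Python; it does not use Spark or its API. The `micro_batches` function, the one-second interval, the event shape, and the "three or more transactions per batch" fraud rule are hypothetical choices for illustration. For simplicity it groups a finite list of timestamped events into fixed time intervals and then applies a per-batch check.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable

# Hypothetical event shape: (timestamp in seconds, key such as a card ID).
Event = tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> dict[int, Counter]:
    """Group events into fixed time buckets and count events per key in each bucket.

    Plain-Python illustration of micro-batching; this is not Spark's API.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    batches: defaultdict[int, Counter] = defaultdict(Counter)
    for ts, key in events:
        # Events with timestamps in [k * interval, (k + 1) * interval) land in batch k.
        batch_id = int(ts // interval)
        batches[batch_id][key] += 1
    return dict(batches)


# Simulated card transactions; the ">= 3 per batch" rule is a made-up fraud heuristic.
events = [(0.2, "card-A"), (0.4, "card-B"), (0.9, "card-A"),
          (1.1, "card-A"), (1.3, "card-A"), (1.5, "card-A"), (1.8, "card-B")]

for batch_id, counts in sorted(micro_batches(events, interval=1.0).items()):
    suspicious = [key for key, n in counts.items() if n >= 3]
    print(f"batch {batch_id}: counts={dict(counts)} suspicious={suspicious}")
```

A real streaming engine would emit each batch as its interval closes rather than after reading the whole input, and would also have to deal with late-arriving events; both concerns are left out of this sketch.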
Apache Storm is an open source distributed real-time computation system designed for processing unbounded streams of data.
Apache Cassandra is an open source distributed NoSQL database that replicates data across nodes, providing low-latency writes and high availability.
Apache Kafka is an open source platform for real-time data streaming, used to build data pipelines and streaming applications. The Kafka Connect API helps connect it to other systems. Kafka originated at LinkedIn.
Apache Kudu is an open source columnar storage engine that supports real-time analytics on commodity hardware.
In addition to these powerful open source tools and frameworks, there are in-memory data grids, which keep data in main memory across a cluster of servers. They deliver the speed that today's needs demand, such as IoT management, deploying AI and machine learning, and responding to events in real time.
Yes, we have come a long way from those OLTP days! Fast data management and analytics is becoming key to how businesses survive and grow.