Lambda Architecture is a useful framework for thinking about the design of big data applications. Nathan Marz designed this generic architecture to address common requirements for big data systems, drawing on his experience working on distributed data processing systems at Twitter. Some of the key requirements it addresses include:
- Fault tolerance against hardware failures and human errors
- Support for a variety of use cases, including low-latency queries as well as updates
- Linear scale-out, meaning that adding machines should increase capacity roughly in proportion
- Extensibility, so that the system stays manageable and can accommodate new features easily
The following picture summarizes the framework.
As the picture shows, the Lambda Architecture has three major components:
- Batch layer: This layer provides the following functionality:
  - managing the master dataset, an immutable, append-only set of raw data
  - precomputing arbitrary query functions over the master dataset; the results are called batch views
- Serving layer: This layer indexes the batch views so that they can be queried ad hoc with low latency.
- Speed layer: This layer handles all requests that are subject to low-latency requirements. Using fast, incremental algorithms, it deals only with recent data, i.e., data that has not yet been absorbed into the batch views.
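To make the division of labor concrete, here is a minimal, single-process Python sketch. It assumes a hypothetical page-view event and a single query (view count per URL); the class and function names are illustrative and not taken from any framework. The batch layer recomputes its view from the entire master dataset, the serving layer holds that view for lookups, the speed layer keeps incremental counts for events the last batch run has not seen, and a query merges the two views.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PageView:
    """Hypothetical raw event; frozen so it cannot be mutated after creation."""
    user_id: str
    url: str


@dataclass
class MasterDataset:
    """Append-only store of raw events: no updates, no deletes."""
    events: List[PageView] = field(default_factory=list)

    def append(self, event: PageView) -> None:
        self.events.append(event)


def compute_batch_view(master: MasterDataset) -> Counter:
    """Batch layer: recompute the view from scratch over all raw data."""
    return Counter(e.url for e in master.events)


class ServingLayer:
    """Serving layer: holds the latest batch view for low-latency lookups."""

    def __init__(self) -> None:
        self.batch_view: Counter = Counter()

    def load(self, view: Counter) -> None:
        # Swap in the freshly computed view as a whole.
        self.batch_view = view


class SpeedLayer:
    """Speed layer: incrementally counts events not yet in the batch view."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def update(self, event: PageView) -> None:
        self.realtime_view[event.url] += 1

    def reset(self) -> None:
        # Simplification: assumes the last batch run absorbed every recent event.
        self.realtime_view.clear()


def query(url: str, serving: ServingLayer, speed: SpeedLayer) -> int:
    """Merge the complete-but-stale batch view with the recent realtime view."""
    return serving.batch_view[url] + speed.realtime_view[url]


if __name__ == "__main__":
    master, serving, speed = MasterDataset(), ServingLayer(), SpeedLayer()
    for e in [PageView("u1", "/a"), PageView("u2", "/a"), PageView("u1", "/b")]:
        master.append(e)
    serving.load(compute_batch_view(master))  # batch run over everything so far
    speed.reset()

    late = PageView("u3", "/a")  # arrives after the batch run
    master.append(late)          # always lands in the master dataset...
    speed.update(late)           # ...and is counted by the speed layer meanwhile
    print(query("/a", serving, speed))  # 3
```

Resetting the speed layer right after loading a new batch view is a simplification: a real deployment has to track which events each batch run covered, because new events keep arriving while the batch job is running.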
Criticism of the Lambda Architecture has focused on its inherent complexity and on how it limits the choice of tools. The batch and streaming sides each require a separate code base that must be maintained and kept in sync so that processed data produces the same result on both paths. Yet attempting to abstract the two code bases into a single framework puts many of the specialized tools in the batch and real-time ecosystems out of reach.
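The synchronization burden shows up even in a toy case. In this hypothetical Python sketch, one rule (count page views per URL, ignoring user ids that start with "bot") is implemented twice, once as a batch recomputation and once as a streaming update, together with a check that replays the same events through both paths and compares the results. The rule, event shape, and names are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # hypothetical (user_id, url) event


def batch_counts(events: Iterable[Event]) -> Counter:
    """Batch path: recompute from the full dataset (e.g. a periodic job)."""
    return Counter(url for user, url in events if not user.startswith("bot"))


class StreamingCounts:
    """Streaming path: the SAME rule, re-implemented as an incremental update."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, user: str, url: str) -> None:
        if not user.startswith("bot"):  # must mirror the batch filter exactly
            self.counts[url] += 1


def paths_agree(events: List[Event]) -> bool:
    """Replay the same events through both paths and compare their outputs."""
    stream = StreamingCounts()
    for user, url in events:
        stream.on_event(user, url)
    return batch_counts(events) == stream.counts


if __name__ == "__main__":
    sample = [("u1", "/a"), ("bot7", "/a"), ("u2", "/b")]
    print(paths_agree(sample))  # True
```

Every change to the rule has to be made in both places, and a replay check like `paths_agree` only catches drift for the events it is actually given.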
The panelists dwelt on details without addressing the real challenge of combining two very different approaches, which risks compromising the benefits of streaming with the added latency of the batch world. Still, the idea of unifying the two disparate worlds in a common framework has merit. Real deployments will be the proof point.
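One way to read "unification," sketched here under assumptions rather than as a description of any particular framework: write the business logic once as a pure `step(state, event)` function, then run it as a fold over the full history for batch processing and apply it event by event for streaming. The event shape, the filtering rule, and all names below are hypothetical.

```python
from functools import reduce
from typing import Dict, Iterable, Optional, Tuple

Event = Tuple[str, str]  # hypothetical (user_id, url) event
State = Dict[str, int]   # page views per URL


def step(state: State, event: Event) -> State:
    """All business logic lives here, written exactly once."""
    user, url = event
    if user.startswith("bot"):  # hypothetical filtering rule
        return state
    new_state = dict(state)  # copy so the function stays pure
    new_state[url] = new_state.get(url, 0) + 1
    return new_state


def run_batch(history: Iterable[Event]) -> State:
    """Bounded input: fold step over the entire history."""
    return reduce(step, history, {})


class StreamRunner:
    """Unbounded input: apply the same step to each event as it arrives."""

    def __init__(self, initial: Optional[State] = None) -> None:
        # Optionally resume from a snapshot produced by run_batch.
        self.state: State = dict(initial or {})

    def on_event(self, event: Event) -> None:
        self.state = step(self.state, event)


if __name__ == "__main__":
    events = [("u1", "/a"), ("bot7", "/a"), ("u2", "/b"), ("u3", "/a")]
    runner = StreamRunner()
    for e in events:
        runner.on_event(e)
    print(run_batch(events))                  # {'/a': 2, '/b': 1}
    print(run_batch(events) == runner.state)  # True
```

Copying the state dictionary in `step` keeps the function pure for clarity; a production system would more likely update state in place or keep it in a dedicated state store. Because `StreamRunner` can be seeded with a batch result, a stream can resume from a recomputed snapshot without a second implementation of the logic.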