Netflix Technology

I attended a meetup at Netflix last evening titled “Polyglot Persistence at Netflix”. The Cloud Development Engineering (CDE) team presented various aspects of building and maintaining a highly distributed system to meet its ever-growing customer needs. There are almost 160 million users and with its growing popularity of streaming movies and TV shows (many are produced now by Netflix), the demand on its systems is growing rapidly. Polyglot implies the coexistence of many databases and associated software systems.

Netflix cloud platform forms a layer of services, tools, frameworks and technologies that run on top of AWS EC2 in order to implement an efficient and nimble (fast reacting), highly available, globally distributed, scalable and performant solution. They switched over to AWS Cloud over a seven-year period starting back in 2009. It uses Amazon’s RDS and DynamoDB besides S3 for lower cost storage. The front-end is Node.js while the backend uses Java, Python, and Javascript. The team also described how they are using SSD (solid state device) besides memory cache. The main thrust of the evening talk was their use of Cassandra as the distributed database solution.

Apache Cassandra was originally developed at Facebook as a free, open-source, highly scalable, high performance distributed database to handle large amounts of data across many servers with no single point of failure. This global network of storage servers caches content locally to where it will be viewed. This local caching reduces bandwidth costs, reduces latency, and makes it easier to scale the service over a wide area, in this case globally. Here are the key reasons Netflix is a major user of Cassandra (besides others like eBay, Apple, Comcast, Instagram and Reddit):

  • Very large production deployment – 2500 nodes, 420 TB, over 1 Trillion user requests per day. Cassandra is a NOSQL, distributed, document-oriented database that scales horizontally and dynamically as more servers are added without need to re-shard or reboot.
  • Strong write performance with no network performance bottleneck.
  • It’s data model is highly flexible. A sparse 2-dimensional “super column family” architecture allows for rich data model representation (and better performance) beyond just key-value lookup.
  • It’s geographic capabilities – single global cluster can simultaneously replicate data asynchronously as well as service applications across multiple locations. The team last evening showed how users can seamlessly switch over to another data center if failure occurs. Cassandra has been a good choice for cross-data center and cross-regional deployment as customisable replication helps determine which cluster nodes to designate as replicas.

Like Youtube, Netflix has been growing its global reach and customers in providing streaming contents. The key success factor is the database technology to enable such high scale and performance. Other databases like RDS, DynamoDB and MySQL are providing varieties of function such as analytics and metadata store. One impressive part of last evening’s presentation was how they repair any damage to data on the fly by embedding it into the database itself.


Crypto Hype vs. Blockchain

There is a lot of crypto hype these days from crypto currencies like Bitcoin to fundraising efforts like ICO (Initial Coin Offering) similar to an IPO. All this noise has obscured the real benefits of the underlying technology – Blockchain. The Internet brought us the “exchange of information” over last 3 decades. Blockchain will give us the new era of “exchange of values” or “exchange of assets” without an intermediary via highly secure transactions in a peer to peer network. New ways of transferring real estate titles, managing cargo on shipping vehicles, guaranteeing the safety of food we eat and much more mundane activities will be enabled by Blockchain. An article in today’s WSJ by Christopher Mims covers this in more detail.

Briefly Blockchain is essentially a secure database (or ledger) spread across multiple computers. Everybody has the same record of all transactions, so tampering with one instance of it will be meaningless. “Crypto” describes the cryptography that underlies it, which allows agents to securely interact (e.g. transfer assets) while also guaranteeing that once a transaction has been made, the Blockchain keeps an immutable record of it. This technology is well suited to transactions that require trust and a permanent record for traceability. It also requires the cooperation of many different parties. Here are some examples of actual deployment of Blockchain so far:

  • At Walmart 1.1 million items are on Blockchain helping the company to trace the item’s journey from manufacturer to store shelf. Global shipping company Maersk is tracking shipping containers making it faster and easier to transfer them and get them thru customs. Other companies using Blockchain technology for tracking are Kroger, Nestle, Tyson Foods and Unilever. In all these cases, IBM is providing the Blockchain technology.
  • CartaSense, an Israeli company uses Blockchain database for its customers to track every stage of the journey of a package, pallet or shipping container.
  • Everledge, a company started in 2014 uses a Blockchain-based registry of every certified diamond in the world (already 2.2. million in its registry). By recording 40 different measures of each stone, it is able to trace the journey of a stone from its source to the final sale to a customer.
  • Dubai has declared its goal to make itself a Blockchain powered government in the world by 2020. They want to streamline real estate transactions for faster and easier transfer of property titles. Other assets like birth/death certificates, passports, visa, etc. can also be managed at low cost with better efficiency.

It is a bit early to claim that Blockchain will revolutionize every industry including government, but it has that potential. It poses a tremendous challenge for the hackers to break into. It can impact on how we vote to whom we connect to what we buy.

The New AI Economy

The convergence of technology leaps, social transformation, and genuine economic needs is catapulting AI (Artificial Intelligence) from its academic roots & decades of inertia to the forefront of business and industry. There has been a growing noise since last couple of years on how AI and its key subsets like Machine Learning and Deep Learning will affect all walks of life. Another phrase “Pervasive AI” is becoming part of our tech lexicon after the popularity of Amazon Echo and Google Home devices.

So what are the key factors pushing this renaissance of AI? We can quickly list them here:

  • Rise of Data Science from the basement to the boardroom of companies. Everyone saw the 3V’s of Big Data (volume, velocity, and variety). Data is called by many names – oxygen, the new oil, new gold, or the new currency.
  • Open source software such as Hadoop sparked this revolution in analytics using lots of unstructured data. The shift from retroactive to more predictive and prescriptive analytics is growing, for actionable business insights. Real-time BI is also taking a front seat.
  • Arrival of practical frameworks for handling big data revived AI (Machine Learning and Deep Learning) which fed happily on big data.
  • Existing CPU’s were not powerful for the fast processing needs of AI. Hence GPU (Graphical Processing Units) offered faster and more powerful chips. NVIDIA provided a positive force in this area. It’s ability to provide a full range of components (systems, servers, devices, software, and architecture) is making NVIDIA an essential player in the emerging AI economy. IBM’s neuromorphic computing project provides notable success in the area of perception, speech and image recognition.

Leading software vendors such as Google have numerous projects on AI ranging from speech and image recognition, language translation, and varieties of pattern matching. Facebook, Amazon, Uber, Netflix, and many others are racing to deploy AI into their products.

Paul Allen, co-founder of Microsoft is pumping $125M into his research lab Allen Institute of AI. The focus is to digitize common sense. Let me quote from today’s New York Times, “Today, machines can recognize nearby objects, identify spoken words, translate one language into another and mimic other human tasks with an accuracy that was not possible just a few years ago. These talents are readily apparent in the new wave of autonomous vehicles, warehouse robotics, smartphones and digital assistants. But these machines struggle with other basic tasks. Though Amazon’s Alexa does a good job of recognizing what you say, it cannot respond to anything more than basic commands and questions. When confronted with heavy traffic or unexpected situations, driverless cars just sit there”. Paul Allen added, “To make real progress in A.I., we have to overcome the big challenges in the area of common sense”.

Welcome to the new AI economy!


Vitalik Buterin & Ethereum

Many of you may not have heard of this 23 year old Russian-Canadian, Vitalik Buterin. He is one of those geniuses who started loving computing and Math from an early age. His parents immigrated to Canada from Russia when he was 3 years old. After attending a private high school in Toronto, he joined the University of Waterloo (my alma mater), but dropped out after getting the Peter Thiel fellowship of $100K to pursue his entrepreneurial work in cryptocurrency.

After trying to persuade the Bitcoin community for a scripting language which got no support, he decided to start a new platform to serve cryptocurrency plus any asset like a smart contract. His first seminal paper in 2013 laid the foundation and the same year he proposed the building of a new platform called Ethereum with a general scripting language. In early 2014, a Switzerland company called Ethereum Switzerland GMBH developed the first Ethereum software project. Finally in July-August of 2014, Ethereum launched a pre-sale of Ether tokens (its own cryptocurrency) to public and raised $14M. Ethereum belongs to the same family as the cryptocurrency Bitcoin, whose value has increased more than 1,000 percent in just the past year. Ethereum has its own currencies, most notably Ether, but the platform has a wider scope than just money.

You can think of my Ethereum address as having elements of a bank account, an email address and a Social Security number. For now, it exists only on my computer as an inert string of nonsense, but the second I try to perform any kind of transaction — say, contributing to a crowdfunding campaign or voting in an online referendum — that address is broadcast out to an improvised worldwide network of computers that tries to verify the transaction. The results of that verification are then broadcast to the wider network again, where more machines enter into a kind of competition to perform complex mathematical calculations, the winner of which gets to record that transaction in the single, canonical record of every transaction ever made in the history of Ethereum. Because those transactions are registered in a sequence of “blocks” of data, that record is called the blockchain. Many Bitcoin exchanges use the Ethereum platform.

A New York Times article in January said, “The true believers behind blockchain platforms like Ethereum argue that a network of distributed trust is one of those advances in software architecture that will prove, in the long run, to have historic significance. That promise has helped fuel the huge jump in cryptocurrency valuations. But in a way, the Bitcoin bubble may ultimately turn out to be a distraction from the true significance of the blockchain. The real promise of these new technologies, many of their evangelists believe, lies not in displacing our currencies but in replacing much of what we now think of as the internet, while at the same time returning the online world to a more decentralized and egalitarian system. If you believe the evangelists, the blockchain is the future. But it is also a way of getting back to the internet’s roots”.

Vitalik wrote the idea of Ethereum at age 19. He is the new-age Linus Torvalds who fathered Linux that became the de-facto operating system for the Internet developers.


IBM’s Neuromorphic Computing Project

The Neuromorphic Computing Project at IBM is a pioneer in next-generation chip technology. The project has received ~$70 million in research funding from DARPA (under SyNAPSE Program), US Department of Defense, US Department of Energy, and Commercial Customers. The ground-breaking project is multi-disciplinary, multi-institutional, and multi-national and has a world-wide scientific impact. The resulting architecture, technology, and ecosystem breaks path with the prevailing von Neumann architecture and constitutes a foundation for energy-efficient, scalable neuromorphic systems. The head of this project is Dr. Dharmendra Modha, IBM Fellow and chief scientist for IBM’s brain-inspired computing project.

So why is the Von Neumann architecture inadequate for brain-inspired computing? The Von Neumann model goes back to 1946 where it dealt with 3 things – the CPU, memory and a bus. You move data to and from memory. The bus connects the memory & CPU via computation. It becomes the bottleneck, and also sequentializes computation. So if you have to flip a single bit, you have to read that bit from memory and write it back.

The new architecture is radically different. The IBM project takes inspiration from the structure, dynamics, and behavior of the brain to see if they can optimize time, speed, and energy of computation. Co-locate memory and computation and slowly intertwine communication, just like how the brain does, then you can minimize the energy of moving bits from memory to computation. You can get event-driven computation rather than clock-driven computation, and you can compute only when information changes.

The Von Neumann paradigm is, by definition, a sequence of instructions interspersed with occasional if-then-else statements. Compare that to a neural network, where a neuron can reach out to up to 10,000 neighbors. The TrueNorth (IBM’s new chip) can reach out to up to 256, and the reason for that disparity is because it is silicon and not organic technology. But there’s a very high fan-out, and high fan-out is difficult to implement in a sequential architecture. An AI system IBM developed last year for Lawrence Livermore National Lab had 16 TrueNorth chips tiled in a 4-by-4 array. The chips are designed to be tiled, so scalability is built in as a design principle rather than as an afterthought.

In summary, the design points of the IBM project are as follows:

  • The Von Neumann architecture won’t be able to provide the massively parallel, fault-tolerant, power-efficient systems that will be needed to create to embed intelligence into silicon. Instead, IBM had to rethink processor design.
  • You can’t throw out the baby with the bathwater: even if you rethink underlying hardware design, you need to implement sufficiently abstracted software libraries to reduce the pain of the software developer so that he can program your chip.
  • You can achieve power efficiency by changing the way you build software and hardware to become active only when an event occurs; rather than tying computation to a series of sequential operations, you make it into a massively parallel job that runs only when the underlying system changes.

AI is getting notable success in the area of perception such as speech and image recognition. In the field of reinforcement learning and deep learning, the human brain becomes the primary inspiration. Hence the IBM Neuromorphic chip design becomes a significant foundational technology.


Chaos Engineering

This phrase is new and it originated at Netflix back in 2010. I was listening to Nora Jones, a Netflix engineer at the AWS re-Invent conference few weeks back, where she talked about this. The principle of Chaos goes like this, “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” Distributed systems have too many moving parts and failures can occur at various levels – hard disks can fail, the network can go down, a sudden surge in customer traffic can overload a functional component—the list goes on. All too often, these events trigger outages, poor performance, and other undesirable behaviors. Chaos Engineering is a method of experimentation on infrastructure that brings systemic weaknesses to light. This empirical process of verification leads to more resilient systems, and builds confidence in the operational behavior of those systems.

Netflix moved its operation to the cloud back in 2008. They started some form of resiliency testing since that time. They introduced Chaos Monkey that systematically turned off services in the production systems. Then came Chaos Kong for large scale failures like shutting off a whole data center. Another tool called FIT (Failure Injection Testing) was introduced to test all scenarios between the small (Chaos Monkey) and very large (Chaos Kong). All these experiments culminated into what is called Chaos Engineering, a discipline now used across many large companies such as Google, Amazon, Microsoft, etc.

Applying Chaos Engineering improves the resilience of a system. By designing and executing Chaos Engineering experiments, you will learn about weaknesses in your system that could potentially lead to outages that cause customer harm. You can then address those weaknesses proactively, going beyond the reactive processes that currently dominate most incident response models.

So what is the difference between Chaos Engineering (experimentation) and testing? In testing, an assertion is made: given specific conditions, a system will emit a specific output. Tests are typically binary, and determine whether a property is true or false. Strictly speaking, this does not generate new knowledge about the system, it just assigns valence to a known property of it. Experimentation generates new knowledge, and often suggests new avenues of exploration. Examples of input for chaos experiments could span from maxing out cpu cores on an Elasticserach cluster to partially deleting kafka topics over a variety of instances to recreate an issue that occured in production. Numerous experiments can be performed to understand system behavior ahead of time and take corrective actions.

At Google, Kripa Krishnan leads a team that constantly breaks the system. So a small team of testers from other big companies have started to work together to share best practices. These folks are currently working on ways to automate some of the tests. “Right now, scale is our problem. We are doing hundreds of tests, but I cannot scale my team to hundreds of people. So we are exploring automating some of this. How do you constantly cause damage so systems are constantly recovering?”

As distributed systems get more complex with thousands of microservices providing various functions, chaos engineering is emerging as a key practice to make these systems more resilient to failure.


Big Data & Analytics – what’s ahead?

Recently I read somewhere this statement – As we end 2017 and look ahead to 2018, topics that are top of mind for data professionals are the growing range of data management mandates, including the EU’s new General Data Protection Regulation that is directed at personal data and privacy, the growing role of artificial intelligence (AI) and machine learning in enterprise applications, the need for better security in light of the onslaught of hacking cases, and the ability to leverage the expanding Internet of Things.

Here are the key areas as we look ahead:

  • Business owners demand outcomes – not just a data lake to store all kinds of data in its native format and API’s.
  • Data Science must produce results – Play and Explore is not enough. Learn to ask the right questions. Visualization of analytics from search.
  • Everyone wants Real Time – Days and weeks too slow, need immediate actionable outcomes. Analytics & recommendations based on real time data.
  • Everyone wants AI (artificial intelligence) – Tell me what I don’t know.
  • Systems must be secure – no longer a mere platitude.
  • ML (machine learning) and IoT at massive scale – Thousands of ML models. Need model accuracy.
  • Blockchain – need to understand its full potential to business – since it’s not transformational, but a foundational technology shift.

In the area of big data, a combination of new and long-established technologies are being put to work. Hadoop and Spark are expanding their roles within organizations. NoSQL and NewSQL databases bring their own unique attributes to the enterprise, while in-memory capabilities (such as Redis) are increasingly being utilized to deliver insights to decision makers faster. And through it all, tried-and-true relational databases continue to support many of the most critical enterprise data environments.

Cloud becomes the de-facto deployment choice for both users and developers. Serverless technology with FaaS (Function as a Service) is getting rapid adoption amongst developers. According to IDC, enterprises are undergoing IT transformation as they rethink their business operations, including how they use information and what technology to deploy. In line with that transformation, nearly 80% of large organizations already have a hybrid cloud strategy in place. The modern application architecture, sometimes referred to as SMAC (social, mobile, analytics, cloud) is becoming standard everywhere.

The DBaaS (database as a service) is still not as widespread as other cloud services. Microsoft is arguably making the strongest explicit claim for a converged database system with its Azure Cosmo DB as DBaaS. Cosmo DB claims to support four data models – key-value, column-family, document, and graph. However, databases have been slower to migrate to the cloud than other elements of computing infrastructure mainly for security and performance reasons. But DBaaS adoption is poised to accelerate. Some of these cloud based DBaaS systems – Cosmo DB, Spanner from Google, and AWS DynamoDB – now offer significant advantages over their on-premise counterparts.

One thing for sure, big data and analytics will continue to be vibrant and exciting in 2018.