
Netflix Technology

I attended a meetup at Netflix last evening titled “Polyglot Persistence at Netflix”. The Cloud Development Engineering (CDE) team presented various aspects of building and maintaining a highly distributed system to meet ever-growing customer demand. Netflix has almost 160 million users, and with the growing popularity of its streamed movies and TV shows (many now produced by Netflix itself), the load on its systems is rising rapidly. “Polyglot” here refers to the coexistence of many databases and their associated software systems.

The Netflix cloud platform is a layer of services, tools, frameworks, and technologies running on top of AWS EC2, built to deliver an efficient, nimble (fast-reacting), highly available, globally distributed, scalable, and performant solution. Netflix migrated to the AWS cloud over a seven-year period starting in 2009. It uses Amazon’s RDS and DynamoDB, plus S3 for lower-cost storage. The front end is Node.js, while the back end uses Java, Python, and JavaScript. The team also described how they use SSDs (solid-state drives) alongside in-memory caches. The main thrust of the evening’s talk was their use of Cassandra as the distributed database solution.

Apache Cassandra was originally developed at Facebook as a free, open-source, highly scalable, high-performance distributed database designed to handle large amounts of data across many servers with no single point of failure. At Netflix, this global network of storage servers keeps data close to where it will be viewed. Such local placement reduces bandwidth costs, reduces latency, and makes it easier to scale the service over a wide area – in this case, globally. Here are the key reasons Netflix is a major user of Cassandra (alongside others like eBay, Apple, Comcast, Instagram, and Reddit):

  • Very large production deployment – 2,500 nodes, 420 TB, over one trillion requests per day. Cassandra is a NoSQL, distributed, wide-column database that scales horizontally and dynamically as more servers are added, with no need to re-shard or reboot.
  • Strong write performance with no network performance bottleneck.
  • Its data model is highly flexible. A sparse two-dimensional “super column family” architecture allows for rich data modeling (and better performance) beyond simple key-value lookup.
  • Its geographic capabilities – a single global cluster can simultaneously replicate data asynchronously and serve applications across multiple locations. The team showed how users can seamlessly fail over to another data center when one goes down. Cassandra has been a good choice for cross-data-center and cross-region deployment, since customizable replication determines which cluster nodes to designate as replicas (see the sketch after this list).
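
As a rough illustration of that customizable replication, here is a minimal sketch (not Netflix’s actual code) using the DataStax Python driver; the contact point, keyspace, data center names, and table are all hypothetical:

```python
# Minimal sketch of cross-data-center replication in Cassandra,
# using the DataStax Python driver (pip install cassandra-driver).
# Keyspace, data center, and table names are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])  # any reachable contact point
session = cluster.connect()

# NetworkTopologyStrategy lets you choose, per data center, how many
# replicas to keep -- the "customizable replication" mentioned above.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS viewing_history
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'us_east': 3,
        'eu_west': 3
    }
""")

# Writes to this keyspace are replicated asynchronously to both data
# centers, so either one can serve reads if the other fails.
session.execute("""
    CREATE TABLE IF NOT EXISTS viewing_history.events (
        user_id text, ts timestamp, title text,
        PRIMARY KEY (user_id, ts)
    )
""")
```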

Like YouTube, Netflix has been growing its global reach and customer base for streaming content. A key success factor is database technology that can deliver such scale and performance. Other databases – RDS, DynamoDB, and MySQL – provide functions such as analytics and metadata storage. One impressive part of the presentation was how they repair damaged data on the fly by embedding the repair mechanism in the database itself.


The New AI Economy

The convergence of technology leaps, social transformation, and genuine economic needs is catapulting AI (Artificial Intelligence) from its academic roots and decades of inertia to the forefront of business and industry. There has been a growing buzz over the last couple of years about how AI and its key subsets, machine learning and deep learning, will affect all walks of life. Another phrase, “pervasive AI”, has entered our tech lexicon following the popularity of Amazon Echo and Google Home devices.

So what are the key factors pushing this renaissance of AI? We can quickly list them here:

  • The rise of data science from the basement to the boardroom. Everyone has seen the three V’s of big data (volume, velocity, and variety). Data goes by many names – oxygen, the new oil, the new gold, the new currency.
  • Open-source software such as Hadoop sparked the revolution in analytics over large volumes of unstructured data. The shift from retrospective to predictive and prescriptive analytics keeps growing, driven by the demand for actionable business insights. Real-time BI is also taking a front seat.
  • The arrival of practical frameworks for handling big data revived AI (machine learning and deep learning), which fed happily on that data.
  • Existing CPUs were not powerful enough for the processing demands of AI, so GPUs (graphics processing units) stepped in with faster, massively parallel chips. NVIDIA has been a driving force here; its ability to provide a full range of components (systems, servers, devices, software, and architecture) is making it an essential player in the emerging AI economy. IBM’s neuromorphic computing project has shown notable success in perception, speech, and image recognition.

Leading software vendors such as Google have numerous AI projects, ranging from speech and image recognition to language translation and many varieties of pattern matching. Facebook, Amazon, Uber, Netflix, and many others are racing to deploy AI in their products.

Paul Allen, co-founder of Microsoft, is pumping $125M into his research lab, the Allen Institute for AI. The focus is to digitize common sense. Let me quote from today’s New York Times: “Today, machines can recognize nearby objects, identify spoken words, translate one language into another and mimic other human tasks with an accuracy that was not possible just a few years ago. These talents are readily apparent in the new wave of autonomous vehicles, warehouse robotics, smartphones and digital assistants. But these machines struggle with other basic tasks. Though Amazon’s Alexa does a good job of recognizing what you say, it cannot respond to anything more than basic commands and questions. When confronted with heavy traffic or unexpected situations, driverless cars just sit there”. Paul Allen added, “To make real progress in A.I., we have to overcome the big challenges in the area of common sense”.

Welcome to the new AI economy!

Vitalik Buterin & Ethereum

Many of you may not have heard of this 23-year-old Russian-Canadian, Vitalik Buterin. He is one of those geniuses who fell in love with computing and math at an early age. His parents emigrated from Russia to Canada when he was three years old. After attending a private high school in Toronto, he joined the University of Waterloo (my alma mater) but dropped out after winning a $100K Peter Thiel fellowship to pursue his entrepreneurial work in cryptocurrency.

After failing to persuade the Bitcoin community to adopt a scripting language, he decided to start a new platform that could handle not just cryptocurrency but arbitrary programmable assets such as smart contracts. His seminal 2013 paper laid the foundation, and the same year he proposed building a new platform called Ethereum with a general-purpose scripting language. In early 2014, a Swiss company, Ethereum Switzerland GmbH, developed the first Ethereum software project. Finally, in July–August 2014, Ethereum held a public pre-sale of Ether tokens (its own cryptocurrency) and raised $14M. Ethereum belongs to the same family as the cryptocurrency Bitcoin, whose value has increased more than 1,000 percent in just the past year. Ethereum has its own currencies, most notably Ether, but the platform has a wider scope than just money.

You can think of my Ethereum address as having elements of a bank account, an email address and a Social Security number. For now, it exists only on my computer as an inert string of nonsense, but the second I try to perform any kind of transaction — say, contributing to a crowdfunding campaign or voting in an online referendum — that address is broadcast out to an improvised worldwide network of computers that tries to verify the transaction. The results of that verification are then broadcast to the wider network again, where more machines enter into a kind of competition to perform complex mathematical calculations, the winner of which gets to record that transaction in the single, canonical record of every transaction ever made in the history of Ethereum. Because those transactions are registered in a sequence of “blocks” of data, that record is called the blockchain. Many exchanges that trade Bitcoin now also support Ether and the Ethereum platform.
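
Since the mechanics above can feel abstract, here is a toy hash-chain in Python – nothing like Ethereum’s real data structures, just an illustration of why a chain of blocks is tamper-evident: each block records the hash of its predecessor, so altering any past transaction invalidates everything after it.

```python
# Toy illustration of a blockchain's tamper-evidence (not Ethereum's
# actual implementation): each block stores the hash of the previous
# block, so rewriting history breaks every later link.
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 over the block's contents."""
    data = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def add_block(chain, transactions):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

chain = []
add_block(chain, ["alice pays bob 5"])
add_block(chain, ["bob pays carol 2"])

# Verify: every block must reference the hash of the one before it.
for i in range(1, len(chain)):
    assert chain[i]["prev_hash"] == block_hash(chain[i - 1])
print("chain is consistent")
```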

A New York Times article in January said, “The true believers behind blockchain platforms like Ethereum argue that a network of distributed trust is one of those advances in software architecture that will prove, in the long run, to have historic significance. That promise has helped fuel the huge jump in cryptocurrency valuations. But in a way, the Bitcoin bubble may ultimately turn out to be a distraction from the true significance of the blockchain. The real promise of these new technologies, many of their evangelists believe, lies not in displacing our currencies but in replacing much of what we now think of as the internet, while at the same time returning the online world to a more decentralized and egalitarian system. If you believe the evangelists, the blockchain is the future. But it is also a way of getting back to the internet’s roots”.

Vitalik wrote up the idea for Ethereum at age 19. He is a new-age Linus Torvalds: as Torvalds fathered Linux, which became the de facto operating system for Internet developers, so Buterin has fathered Ethereum.


API-driven Economy?

I just attended a couple of sessions at API World, going on at the San Jose Convention Center. I heard all kinds of new terms thrown around within a couple of hours – the new API-driven economy, iSaaS (integration software as a service), iPaaS (integration platform as a service), APIM (API management), BaaS (backend as a service), and so on. Then there was a confusing, overlapping mixture of ideas around microservices, containers, connectors, and APIs, all in the space of system integration. There were lots of young software developers at the conference, and booths from companies I had never heard of – Jitterbit (enterprise iPaaS), Back4App (backend development via Parse Server), PubNub (real-time APIs), Rigor (API monitoring and testing). I took a deep breath and thought of all the related ideas of the last three decades – APIs, subroutines, reusable web services, service-oriented architecture, integration via connectors, assembly of interchangeable parts from common libraries. Welcome back to the future!

I see the urgency of this now that we have so many products and platforms in every category. A speaker from Jitterbit showed how Cisco’s marketing software stack spans 39 different technologies – Salesforce, 6sense, Eloqua, App Annie, Live Agent, and more – covering functions like campaign management, CRM, email blasts, and mobile notifications. This is clearly not an ideal situation. Jitterbit wants to be the mediator, consolidating all of these via APIs organized around activities and workflow. No wonder this Alameda-based startup is doing very well; I was not surprised to learn that Salesforce and the private equity firm KKR are investors.

Gartner predicts the enterprise application integration market will reach $33.5B by 2020 (a 7.1% CAGR from $25.5B in 2016), while integration platform as a service (iPaaS) will reach $3.6B by 2021 (a 41.5% CAGR from $526M in 2016). The data integration market is expected to grow from $6.4B in 2017 to $12.2B in 2022 (a 13.7% CAGR). Gartner says, “IT leaders should combine on-premise integration platform, iPaaS, iSaaS and API Management capabilities into a single, yet modular enterprise capability.” Gartner calls this whole space Application Integration Platforms.
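
For readers who want to check figures like these, the compound annual growth rate arithmetic is simple; the snippet below reproduces two of the quoted growth rates:

```python
# Sanity check of the CAGR arithmetic behind forecasts like these:
# CAGR = (end / start) ** (1 / years) - 1
def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

# Enterprise application integration: $25.5B (2016) -> $33.5B (2020)
print(f"{cagr(25.5, 33.5, 4):.1%}")  # ~7.1%, matching the quoted rate

# Data integration: $6.4B (2017) -> $12.2B (2022)
print(f"{cagr(6.4, 12.2, 5):.1%}")   # ~13.8%, close to the quoted 13.7%
```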

I think it is time we consolidated all these terms and brought real clarity. The current marketing hype around the API-driven economy does not help much. What used to be a programmer’s term (API – application programming interface) is now marketed as a cure-all for world hunger.

The goal has not changed – we want integration of heterogeneous systems (both inside and outside the enterprise) to be highly efficient, transparent, and less labor-intensive.
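
To make that concrete, here is a minimal sketch of the kind of glue code an iPaaS connector is meant to replace – pulling records from one system’s REST API and pushing them into another. Both endpoints and field names are hypothetical:

```python
# Minimal sketch of point-to-point integration glue: sync leads from a
# hypothetical CRM API into a hypothetical email-campaign API.
import requests

CRM_URL = "https://crm.example.com/api/leads"        # hypothetical source
CAMPAIGN_URL = "https://mail.example.com/api/lists"  # hypothetical target

def sync_leads():
    leads = requests.get(CRM_URL, timeout=10).json()
    for lead in leads:
        # Map the source schema onto the target's expected fields.
        payload = {"email": lead["email"], "name": lead["full_name"]}
        resp = requests.post(CAMPAIGN_URL, json=payload, timeout=10)
        resp.raise_for_status()

if __name__ == "__main__":
    sync_leads()
```

Multiply this by 39 technologies in one marketing stack and the appeal of a mediating platform becomes obvious.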


Secret of Sundar Pichai’s success

I watched Sundar Pichai’s recent interaction with students at I.I.T. (Indian Institute of Technology) Kharagpur, India, where he graduated back in 1993. Besides our common country of birth, I had never heard of Sundar until his rapid rise at Google a few years back. I have never met him or heard him speak at conferences, so this was my first chance to listen to his remarks and his answers to the many questions from an audience of 3,500 students at his alma mater earlier this week.

Growing up not far from I.I.T. Kharagpur, I was very aware of the institution. It was the first I.I.T. in India, established in the 1950s. Others, at Kanpur, Delhi, Mumbai, and Chennai, came later; these were the original five Indian Institutes of Technology. Lately many new ones have been added.

Sundar did his undergraduate studies in metallurgy (the study of metals). So how did he switch from that to software? That was one of the questions from a student. He said that he loved the Fortran language in his student days, and that love of programming continued. His message was that everyone should pursue their own interests and passions. He mentioned that, unlike in India, students at US universities sometimes do not settle on their majors until well into their third or fourth year of studies. Sundar’s passion was to build products that would impact a very large number of users worldwide. During his interview at Google, he was asked what he thought of Gmail, which he had never seen or used; the fourth interviewer actually showed it to him, and he then gave the remaining three interviewers his opinion of what was wrong with Gmail and how to improve it. He emphasized time and again the need to step out of one’s comfort zone and gain well-rounded experience. Today’s students should not be afraid to take some risks and be willing to fail.

Besides technical leadership, Sundar possesses an amazing quality: egolessness, so rare in the Silicon Valley executive community. He said that he truly believes in empowering his team and letting them execute with full trust. This is easier said than done, based on my experience at IBM and Oracle; large organizations suffer from ego-driven leadership, causing a great amount of friction and anguish. Sundar’s rise at Google was due to his amazing ability to get teams to work very effectively. From Search he went on to manage Chrome, and then he was given Android. His ability to work through the complexities of products, fiefdoms, and internal rivalries was so evident that he was quickly elevated to the CEO position. Humility is his hallmark, combined with clarity of vision and efficient execution.

He made an interesting comment about vision at Google. Larry Page has said that moonshot projects are worthwhile because the bar is so high that there is no competition; even if you fail, you come out ahead in knowledge and experience.

It was fun listening to Sundar’s simple and honest answers & remarks.


The resurgence of AI/ML/DL

We have been seeing a sudden rise in the deployment of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). It looks like the long “AI winter” is finally over.

  • According to IDC, AI-related hardware, software and services business will jump from $8B this year to $47B by 2020.
  • I have also read comments like, “AI is like the Internet in the mid 1990s and it will be pervasive this time”.
  • According to Andrew Ng, chief scientist at Baidu, “AI is the new electricity. Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”
  • Peter Lee, co-head of Microsoft Research, said, “Sales teams are using neural nets to recommend which prospects to contact next or what kind of products to recommend.”
  • IBM Watson used AI, but not DL, when it debuted in 2011; now all 30 of its components are augmented by DL (with related investment growing from $500M to a projected $6B in 2020).
  • Google had two DL projects in 2012; today it has more than 1,000 (Search, Android, Gmail, Translation, Maps, YouTube, self-driving cars, and more).

It is interesting to note that AI was anticipated by Alan Turing in a paper he wrote back in 1950, suggesting the possibility of building machines with true intelligence. Then in 1956, John McCarthy organized a conference at Dartmouth and coined the phrase “artificial intelligence”. Much of the following three decades saw little practical progress, hence the phrase “AI winter”. In 1997, IBM’s Deep Blue won a chess match against Kasparov. Over the last few years, we have seen deployments such as Apple’s Siri, Microsoft’s Cortana, and IBM’s Watson (which beat the Jeopardy game show champions in 2011). In 2014, the DeepMind team used a deep learning algorithm to create a program that wins Atari games.

Over the last two years, use of this technology has accelerated greatly. The key established players pushing AI/ML/DL are Nvidia, Baidu, Google, IBM, Apple, Microsoft, Facebook, Twitter, Amazon, Yahoo, and the like. Many new players have appeared – DeepMind, Numenta, Nervana, MetaMind, AlchemyAPI, Sentient, OpenAI, SkyMind, Cortica, etc. – and these startups are all acquisition targets for the big ones. Sundar Pichai of Google says, “Machine learning is a core transformative way in which we are rethinking everything we are doing”. Google products deploying these technologies include visual translation, RankBrain, speech recognition, voicemail transcription, photo search, and spam filtering.

AI is the broadest term, applying to any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning. Machine learning is the subset of AI comprising statistical techniques that enable machines to improve at tasks with experience. Deep learning, in turn, is the subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multi-layered neural networks to vast amounts of data.
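
As a concrete, if toy-sized, instance of that definition of deep learning, the sketch below trains a tiny two-layer neural network (plain numpy) to learn XOR purely from exposure to data; production deep learning differs mainly in scale, not in kind:

```python
# A toy two-layer neural network learning XOR from data -- a minimal
# instance of "multi-layered networks improving with experience".
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)  # hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)  # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Forward pass through both layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backpropagate the squared error, learning rate 0.5.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(3).ravel())  # approaches [0, 1, 1, 0]
```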

I think the resurgence is the result of a confluence of several factors: advanced chip technology such as Nvidia’s Pascal GPU architecture and IBM’s TrueNorth (a brain-inspired computer chip), software architectures built on microservices and containers, ML libraries, and data analytics toolkits. Well-known academics are being heavily recruited by companies – Geoffrey Hinton of the University of Toronto (Google), Yann LeCun of New York University (Facebook), Andrew Ng of Stanford (Baidu), Yoshua Bengio of the University of Montreal, and others.

The outlook for AI/ML/DL is very bright, and we will see real benefits in every business sector.


Linux & Cloud Computing

While reading the latest issue of The Economist, I was reminded that August 25th marks an important anniversary for two key events: 25 years back, on August 25, 1991, Linus Torvalds announced a new operating system called Linux, and on the same day in 2006, Amazon, under the leadership of Andy Jassy, launched the beta version of Elastic Compute Cloud (EC2), the central piece of Amazon Web Services (AWS).

The two are deeply interlinked. Linux became the world’s most widely used piece of software of its type; its usage soared thanks to backers like HP, Oracle, and IBM, who embraced it to combat Windows. Without open-source programs like Linux, cloud computing would not have happened. Currently some 1,500 developers contribute to each new version of Linux, and AWS servers rely on it heavily. Being first to succeed on a large scale allowed both Linux and AWS to take advantage of the network effect, which makes popular products even more entrenched.
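
Part of what made EC2 revolutionary is how little it takes to rent a Linux server. Here is a minimal sketch using AWS’s boto3 SDK; the AMI ID below is a placeholder (real IDs vary by region), and credentials are assumed to be configured in the environment:

```python
# Minimal sketch of launching a Linux server on EC2 with boto3
# (pip install boto3). ImageId is a placeholder, not a real AMI.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: an Amazon Linux AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
print("launched:", instances[0].id)
```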

Here are some facts about AWS. Its launch back in 2006 was extremely timely, just one year before smartphones arrived; Apple launched the iPhone in 2007, ushering in the app economy. AWS became a haven for start-ups, which make up nearly two-thirds of its estimated one million customers. According to Gartner, the cloud computing market is $205B in 2016 – about 6% of the world’s $3.4 trillion IT budget – and will grow to $240B next year. No wonder Amazon is reaping the benefits: over the past 12 months, AWS revenue reached $11B with a margin of over 50%. Last quarter, AWS sales were three times those of its nearest competitor, Microsoft Azure, and AWS has ten times more computing capacity than the next 14 cloud providers combined. We also saw the fate of Rackspace last week (acquired by a private equity firm). Other cloud providers – Microsoft Azure, Google Cloud, and IBM (which acquired SoftLayer in 2013) – are struggling to keep up with AWS.

The latest battleground in cloud computing is data. AWS offers Aurora and Redshift in that space. It has also started a new service called Snowball, a suitcase-sized storage appliance for moving mountains of data into the AWS cloud (an interesting challenge to Box and Dropbox). IBM bought Truven Health Analytics, which keeps data on 215 million patients in the healthcare industry.

The Economist article said, “AWS could end up dominating the IT industry just as IBM’s System/360, a family of mainframe computers, did until the 1980s.” I hope that is not so; we need serious competition to AWS for customers’ benefit. Who wants a single-vendor “lock-in”? Microsoft’s Azure seems to be moving fast. Let us hope IBM, Google, and Oracle also move aggressively, offering equivalent or better alternatives to Amazon’s cloud services.