RocksDB from Facebook

I attended a HIVE-sponsored Meetup yesterday evening titled “Rocking the database world with RocksDB”. Since I had never heard of RocksDB, I was curious to learn how it is rocking the database world.

Facebook originally built this key-value storage layer for use with MySQL (as an alternative to InnoDB), since MySQL is used heavily at Facebook, though they say that was not the only motivation. Then, in 2013, they decided to open source RocksDB. In a November 2013 post, last evening’s speaker had written, “Storing and accessing hundreds of petabytes of data is a huge challenge, and we’re constantly improving and overhauling our tools to make this as fast and efficient as possible. Today, we are open-sourcing RocksDB, an embeddable, persistent key-value store for fast storage that we built and use here at Facebook.”
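To make the phrase “embeddable, persistent key-value store” a bit more concrete, here is a minimal Python sketch of what those two words mean. The store is a plain library object living inside the application process (no separate database server), and writes survive a restart because they are appended to a file on disk. This is a toy for illustration only, assuming a simple append-only log plus an in-memory dictionary index. It is not RocksDB’s API or design (RocksDB itself is written in C++ and is built around a log-structured merge tree), and the class name `TinyKV` is hypothetical.

```python
import json
import os
import tempfile


class TinyKV:
    """Hypothetical toy key-value store embedded in the calling process.

    Every write is appended to a log file so data survives a restart;
    an in-memory dict serves reads. This is NOT RocksDB's design or API.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}
        # Rebuild the in-memory index by replaying the log, if one exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    if record["op"] == "put":
                        self.index[record["key"]] = record["value"]
                    else:  # "delete"
                        self.index.pop(record["key"], None)

    def _append(self, record):
        # Append one JSON record per line and force it to disk.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})
        self.index[key] = value

    def get(self, key, default=None):
        return self.index.get(key, default)

    def delete(self, key):
        self._append({"op": "delete", "key": key})
        self.index.pop(key, None)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tinykv.log")
    db = TinyKV(path)
    db.put("user:1", "alice")
    db.put("user:2", "bob")
    db.delete("user:2")
    reopened = TinyKV(path)  # simulate a restart by replaying the log
    print(reopened.get("user:1"), reopened.get("user:2"))  # alice None
```

The point of the sketch is only the shape of the idea: the application calls `put`/`get` directly as library functions, and durability comes from the on-disk log rather than from a server process.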

RocksDB is also well suited to SSDs (flash storage) and claims fast performance. The team was excited when MongoDB opened up to other storage engines in the summer of 2014. For a period of time, MongoDB plus RocksDB was a fast combination. Then, in December 2014, MongoDB decided to acquire WiredTiger (a competitor) to improve the performance, scalability, and hardware efficiency of MongoDB. That left RocksDB out of any official engagement with MongoDB. But the team built something called MongoRocks that claims to be very fast, and it seems several MongoDB users prefer MongoRocks over the native combination of MongoDB with WiredTiger.

Several users of RocksDB talked about their experience, especially in the IoT world, where sensor data can be processed at the edge (ingestion, aggregation, and some transformation) before being sent to cloud servers. The only issue I saw is that there is no “real” owner of RocksDB as a deliverable solution. There is no equivalent of a Cloudera (for Hadoop) or Confluent (for Kafka) that can provide value-added offerings and support for the user base. So far, it’s all open-source download and do-it-yourself. So serious production-level deployment is still a risky affair. For now, it’s a developer’s play tool.

In Memoriam – Ed Lassettre

I was out of the country when my old colleague from IBM days, Ed Lassettre, passed away last November. I only learned of his death earlier this month from a mutual friend at IBM Almaden Research. Ed was one of the best computer software professionals I knew and respected.

He was at IBM’s Santa Teresa Lab (now called Silicon Valley Lab) when I started there in 1981 after my five-year stint at IBM Canada. That year he was promoted to Senior Technical Staff Member (STSM), the very first at the lab to receive that honor. He later became an IBM Fellow, IBM’s highest technical honor. His reputation as one of the key software engineers for IBM’s MVS operating system preceded him. Ed had spent a few years at IBM’s Poughkeepsie Lab in upstate New York. He did his undergraduate and postgraduate studies in mathematics at Ohio State University. He had deep insights into the intricacies of high-performance computing systems. When we were building DB2 at the IBM lab, Ed provided guidance on its interface with the operating system.

Subsequently, I went to IBM’s Austin Lab for two years in the mid-1980s to lay the foundation of the DB2 product for the PC, which at the time lacked the processing power and memory of the large mainframes. Hence our design had to accommodate those limitations. The IBM executives wanted someone to audit our design before giving the green light for development. I could not think of a better person than Ed Lassettre to do that. At my request, Ed spent some time reviewing it and gave a very positive report on our design. He had great credibility in the technical community. Many times I sought his views on technical matters, and he always provided timely advice. His wisdom was complemented by tremendous humility, a rare trait in our industry.

I left IBM in 1992 for Oracle and lost touch with Ed. Later I learned that he had retired from IBM and joined Microsoft Research. He was a good friend of the late Jim Gray, who was also at Microsoft Research at the time. Ed retired from Microsoft in 2013 at the age of 79! He was quite well known in the HPTC (High Performance Technical Computing) world.

RIP, Ed Lassettre, a great computer scientist and friend! You will be missed.

Strategic Technologies as per Gartner

I have known Gartner for decades, going back to my IBM and Oracle days. Even though I have watched them invent new terms for things we already know (a bit annoying, but I guess that’s their business), they do a decent job of capturing key strategic trends.

In a recent article, I saw ten strategic technology trends, grouped as follows. The first 3 address the merging of the physical and virtual worlds and the emergence of the digital mesh (their new phrase). The next 3 cover the algorithmic business, where much happens in the background without people being directly involved. The final 4 address the new architecture and platform trends needed to support the digital and algorithmic business.

The first 3 trends:

  • The Device Mesh – In the post-mobile world, the focus shifts to the mobile user, who is surrounded by a mesh of devices, each with an IP address and always communicating.
  • Ambient User Experience – A seamless flow of experience across a shifting set of devices. Think of moving from IoT devices to automobiles, smartphones, and so on.
  • 3D Printing Materials – This will require assembly-line and supply-chain processes to adapt in order to exploit 3D printing.

The next 3 trends:

  • Information of Everything – This information goes beyond text, audio, and video to include sensory and contextual data. How do you bring meaning to a chaotic deluge of information? Much work is needed here.
  • Advanced Machine Learning – Deep Neural Networks (DNNs) go beyond classic computing and information management to create systems that can learn to perceive the world on their own. DNNs (an advanced form of machine learning applicable to large, complex datasets) will make smart machines “intelligent”.
  • Autonomous Agents & Things – Like robots, autonomous vehicles, virtual personal assistants and smart advisors.

The final 4 trends:

  • Adaptive Security Architecture – How do we combat the hacker industry beyond perimeter defense and rule-based security?
  • Advanced Systems Architecture – In Gartner’s words, “Fueled by field-programmable gate arrays (FPGAs) as an underlying technology for neuromorphic architectures, there are significant gains such as being able to run at speeds of greater than a teraflop with high-energy efficiency”.
  • Mesh App and Service Architecture – Monolithic, linear application designs like the 3-tier architecture are giving way to a loosely coupled, integrative approach. Containers (e.g., Docker) are emerging as a critical technology for enabling agile development and microservice architectures. What is needed is back-end cloud scalability combined with a front-end device-mesh experience.
  • Internet of Things Platforms – Management, security, integration, and standards are all needed for IoT platforms to succeed.

These are all known areas, but I liked the way Gartner grouped them in a logical sequence.

 

Netflix – A global player

Ten years ago, when I drove on Winchester Avenue in Los Gatos, I saw this new company called Netflix that was renting movie DVDs. You did not have to go to a store; you simply signed up for home delivery of DVDs. Now Netflix is building a much bigger headquarters near its old site. It is a huge success story and a disruption to the home entertainment business. When it decided to switch to “streaming only” a few years back, there were a lot of doubters. But that’s all in the past. Today it is a highly successful global business.

Reed Hastings, co-founder and CEO, announced this morning at CES in Las Vegas that Netflix has launched its service in 130 new countries: “Today you are witnessing the birth of a new global Internet TV network. With this launch, consumers around the world – from Singapore to St. Petersburg, from San Francisco to Sao Paulo – will be able to enjoy TV shows and movies simultaneously – no more waiting.”

Netflix has been producing its own content over the last few years, and some of its shows have been huge successes. It releases all episodes of a season at once (e.g., House of Cards), and this has created the “binge-watching” phenomenon: audiences like to watch many episodes in one sitting instead of one a week. You watch what you want, when you want, wherever you are, on a mobile device or the TV.

Besides content, its use of big data and analytics is very effective in understanding audience preferences and making recommendations. The engineering team at Netflix continuously applies the latest technologies to improve scale and performance. Now, with a global reach, the challenges will grow.

Netflix is a real success story of a new-age disruptive force in the TV entertainment industry.

 

2015 – Year of Open Source explosion

Open source software (software freely shared with the world at large) is an old idea, dating back to the 1980s, when Richard Stallman started preaching the gospel of what he called free software. Then Linus Torvalds started working on Linux in the early 1990s. Today, Linux runs our lives. The Android operating system that runs on so many smartphones is based on Linux. When you open a phone app like Twitter or Facebook and pull down all those tweets and status updates, you’re tapping into massive data centers filled with hundreds of Linux machines. Linux is the foundation of the Internet.

Cade Metz recently wrote in an article, “And yet 2015 was the year open source software gained new significance, thanks to Apple and Google and Elon Musk. Now more than ever, even the most powerful tech companies and entrepreneurs are freely sharing the code underlying their latest technologies. They recognize this will accelerate not only the progress of technology as a whole, but their own progress as well. It’s altruism with self-interest. And it’s how the tech world now works.” As Brandon Keepers, GitHub’s head of open source, put it: “This is not just a turning point, but a tipping point.”

Apple, for the first time, decided to open source its Swift programming language (used to build apps for the iPad, iPhone, and Mac). That means Swift is no longer confined to Apple’s own platforms: the open-source release supports Linux, which opens the door to other systems such as Android and Windows. Previously, Apple’s language Objective-C was meant essentially for Apple devices. This move will enable developers to use Apple’s language across competing platforms.

Microsoft, another champion of proprietary software during the 1980s and 1990s, decided to open source its .NET software, so that developers can use .NET to build applications for Linux and Apple’s operating system too. Even IBM decided to open source SystemML, its machine learning system, contributing it to Apache for use with Apache Spark.

Over the past 15 years, Google has built a wide range of data center technologies that have helped make it the most powerful company on the ‘net. These technologies allow all of the company’s online services to instantly handle requests from billions of people, no matter where in the world they may be. Typically, Google kept these technologies to itself, forcing others to engineer inferior imitations. Hadoop’s MapReduce and HDFS are examples; they grew out of the papers Google published on its MapReduce algorithm and its file system. But last year Google decided to open source TensorFlow, the software engine that drives its artificial intelligence services, including its image and speech recognition and language translation tools. Google realized that it could tap into a much larger community of researchers to enhance TensorFlow much faster than it could internally.

Elon Musk went even further. In mid-December, he and Sam Altman, president of Y Combinator, unveiled OpenAI, a $1 billion nonprofit dedicated to the same breed of AI that Google is developing. They have promised to open source all their work.

Yes, 2015 was the year Open Source really reached new heights!

Big Data Predictions for 2016

As every year begins, several experts and analyst firms like to make predictions. Let us try to make some observations in an area much talked about lately – Big Data. So here goes:

  • The Big Data quandary will continue as companies try to understand its value to the business. Just dumping all kinds of data into a data lake (read Hadoop) is not going to solve anything. There has to be clarity on what business insights are needed. Therefore, much as the data warehousing era brought additional tools in the ETL space, there is a need for data curation and transformation for practical use, beyond the analytics piece.
  • Demand for BI and analytics will reach new heights. The next-generation BI and analytics platform should help businesses tap into the power of their data, whether in the cloud or on-premises. This ‘Networked BI’ capability creates an interwoven data fabric that delivers business-user self-service while eliminating analytical silos, resulting in faster and more trusted decision-making. Real-time or streaming analytics will become crucial, as decisions must be made as soon as events occur.
  • Spark will get even hotter. I described IBM’s big endorsement of Spark last year in a blog post. Spark gives us a comprehensive, unified framework to manage big data processing requirements across data sets that are diverse both in nature (text data, graph data, etc.) and in source (batch vs. real-time streaming data). Spark enables applications in Hadoop clusters to run up to 100 times faster in memory and 10 times faster even when running on disk. In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning, and graph data processing. This also suggests that in-memory processing will continue to thrive.
  • Big events will drive analytics demand exponentially. This year’s big events, like the US presidential election and the Olympics in Brazil, will see big data harnessed to provide data-driven insights like never before.
  • Protection of data itself will become paramount. It’s still too easy for hackers to circumvent perimeter defenses, steal valid user credentials, and get access to data records. In 2016, as companies protect themselves from the threat of data loss, new means of data-centric security will become mainstream to consistently control user access and credentials where it matters most.
  • Shortage of Data Scientists will drive companies to look for Big data cloud services. To circumvent the need to hire more data scientists and Hadoop admins, organizations will rely on fully managed cloud services with built-in operational support, freeing up existing data science teams to focus their time and effort on analysis instead of wrangling complex Hadoop clusters.
  • Finally, the shift to the cloud is becoming mainstream because of the clear ROI. At least the shift of dev-and-test workloads is happening quite fast. AWS seems to dominate production deployments, even though big data as a service is still in its infancy. Microsoft Azure and IBM’s cloud service, plus Oracle’s new cloud offerings, will make this space quite vibrant.
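The Map and Reduce operations mentioned in the Spark prediction can be illustrated with the classic word count. The sketch below uses only plain Python (`map`, `functools.reduce`, and `collections.Counter`), not Spark’s API, to show the two steps: map each record to partial counts, then reduce by merging those partial counts into a total. In Spark the same two steps would be distributed across a cluster and kept in memory; that part is not modeled here, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    # Map: turn one line of text into partial word counts.
    return Counter(line.lower().split())


def reduce_phase(left, right):
    # Reduce: merge two partial counts into one.
    return left + right


def word_count(lines):
    # Start from an empty Counter so an empty input still returns a Counter.
    return reduce(reduce_phase, map(map_phase, lines), Counter())


if __name__ == "__main__":
    records = [
        "big data needs business value",
        "spark makes big data processing fast",
    ]
    counts = word_count(records)
    print(counts["big"], counts["data"], counts["spark"])  # 2 2 1
```

Because merging counts is associative, the reduce step can combine partial results in any grouping, which is what lets a framework like Spark spread the work over many machines.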

Uber valued at $68 Billion!

Forbes recently reported: “As Uber plans a $2.1 billion funding round that would bring its total capital raised to almost $10 billion, the ride-hailing app is hoping to fetch a valuation as high as $68 billion. That’s a significant jump from the $52 billion earlier this year, and it also marks a significant milestone: for the first time, Uber is going to leapfrog iconic carmakers General Motors, Ford Motor, and Honda Motor in terms of valuation while almost catching up to other luxury carmakers like Volkswagen and BMW”.
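To put the quoted figures in perspective, here is a quick back-of-the-envelope check in Python, using only the two valuations in the Forbes quote ($52 billion earlier and up to $68 billion now). The helper name `percent_increase` is just for illustration.

```python
def percent_increase(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100


earlier_valuation = 52  # $ billions, from the Forbes quote
target_valuation = 68   # $ billions, the hoped-for valuation

jump = target_valuation - earlier_valuation
print(f"${jump} billion, about {percent_increase(earlier_valuation, target_valuation):.0f}%")
# prints: $16 billion, about 31%
```

In other words, the hoped-for valuation is roughly 31% higher than the one set earlier in the same year.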

Like many, I have been a regular Uber user here in the Bay Area for the last couple of years. During a recent overseas trip, I deliberately set out to get first-hand experience of its international presence. First, in Delhi, India, I used Uber, and it was quick and much cheaper than regular taxis. Of course, the Indian competitor Ola Cabs was the other choice. Then, in Singapore, I used Uber a couple of times and asked each driver about their experience. They all sounded positive. In Kuala Lumpur, Malaysia, I used Uber for the long drive from the airport to my hotel downtown (about 60 km), and the fare was around $24. I did not have to exchange dollars for local currency; the ride was billed directly to my credit card, just as it is here. On the way back to the airport, the Uber driver mentioned that the regular taxi drivers waiting at hotels do not like Uber drivers picking up passengers there, so he had to be careful not to be noticed by them. While I did not use Uber in Bali, Indonesia, I was told it was very much present there. So Uber has done a great job disrupting the taxi business everywhere, now operating in 50-plus countries and hundreds of cities. Consumers like me love the model; hence its progress is unstoppable despite protests in cities like Paris.

The skyrocketing valuation has triggered ongoing debates about whether there’s a bubble in the private fundraising market. Does Uber really deserve a higher valuation than the companies that manufacture and sell the bulk of the world’s cars? While skepticism remains, Uber’s latest rounds have attracted powerful backers, not only Internet giants like Baidu but also Wall Street movers such as Tiger Global, which has also invested in some of Uber’s competitors in Asia.

One may question the lofty valuation of Uber, which produces no cars, compared to giants like GM. “Google and Uber plan to revolutionize mobility based on a new transportation paradigm. How can GM, a company that is often described as bureaucratic, succeed in a future that is often described as disruptive?” analysts wrote in a recent Deutsche Bank report on the auto industry. Uber has earmarked at least $1 billion for its growth efforts in China and continues to spend heavily to establish itself against its Asian competitors.

Uber’s fearless founder and CEO Travis Kalanick (who dropped out of UCLA’s computer science program in 1998) has shown immense courage and creativity in introducing an alternative model of transportation. Investors are betting on the long-term growth potential rather than the short-term revenue and profit picture.