Category Archives: Open Source Software

API-driven Economy?

I just went to a couple of sessions at API World, going on at the San Jose Convention Center. I heard all kinds of new terms thrown around within a span of a couple of hours – the new API-driven economy, iSaaS (integration software as a service), iPaaS (integration platform as a service), APIM (API management), BaaS (backend as a service), etc. Then there was a confusing and overlapping mixture of ideas in microservices, containers, connectors, and APIs, all in the space of system integration. There were lots of young software developers at this conference and booths from companies I had never heard of – Jitterbit (enterprise iPaaS), Back4App (backend development via Parse Server), PubNub (real-time API), Rigor (API monitoring and testing). I took a deep breath and thought of all these ideas over the last 3 decades – APIs, subroutines, reusable web services, service-oriented architecture, integration via connectors, assembly of interchangeable parts from common libraries, etc. Welcome back to the future!

I see the urgency of this now that we have so many products and platforms in every category. A speaker from Jitterbit showed how Cisco’s marketing software stack has 39 different technologies – Salesforce, 6Sense, Eloqua, App Annie, Live Agent, etc. They perform functions like campaign management, CRM, email blasts, and mobile notifications. This is definitely not the ideal solution. Jitterbit wants to be the mediator, via APIs, that consolidates all these based on activities and workflow. No wonder this Alameda-based startup is doing very well. I was not surprised to learn that Salesforce and private equity firm KKR are investors in Jitterbit.

Gartner predicts the enterprise application integration market will be $33.5B by 2020 (CAGR of 7.1% from $25.5B in 2016), whereas integration platform as a service (iPaaS) will be $3.6B by 2021 (CAGR of 41.5% from $526M in 2016). The data integration market is going to reach $12.2B in 2022, up from $6.4B in 2017 (CAGR of 13.7%). Gartner says, “IT leaders should combine on-premise integration platform, iPaaS, iSaaS and API Management capabilities into a single, yet modular enterprise capability.” Gartner defines this whole space as Application Integration Platforms.
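The growth rates quoted above follow the usual compound annual growth rate formula, which is easy to check by hand. Here is a minimal sketch; the function names `cagr` and `project` are my own, and the inputs are simply the dollar figures quoted above. Published rates are rounded and may rest on slightly different base years, so a recomputed value need not match the quoted one exactly.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.07 means 7% per year)."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # (label, start in $B, end in $B, years between the two figures)
    forecasts = [
        ("application integration, 2016 to 2020", 25.5, 33.5, 4),
        ("data integration, 2017 to 2022", 6.4, 12.2, 5),
    ]
    for label, start, end, years in forecasts:
        print(f"{label}: {cagr(start, end, years):.1%} per year")
```

Running it the other way, `project(25.5, 0.071, 4)` compounds the 2016 figure forward at the quoted rate and lands close to the $33.5B forecast.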

I think it’s time we consolidate all these terms and bring real clarity. The current marketing hype around the API-driven economy does not help much. What used to be a programmer’s term (API, application programming interface) is now marketed as a broad term to solve the world hunger problem.

The goal has not changed – we want integration of heterogeneous systems (both inside and outside the enterprise) to be highly efficient, transparent, and less labor intensive.


Secret of Sundar Pichai’s success

I watched Sundar Pichai’s recent interaction with the students at I.I.T. (Indian Institute of Technology) Kharagpur, India, where he graduated back in 1993. Besides our common country of birth, I knew nothing about Sundar until his rapid rise at Google a few years back. I have never met him or listened to him at conferences. So this was the first time I had a chance to listen to his remarks and his answers to many questions from the audience of 3500 students at his alma mater earlier this week.

Growing up not far from I.I.T. Kharagpur, I was very aware of this institution. It was the first I.I.T. in India, established during the 1950s. Others, such as those at Kanpur, Delhi, Mumbai and Chennai, came later. These were the original 5 Indian Institutes of Technology. Lately many new ones have been added.

Sundar did his undergraduate studies in Metallurgy (the study of metals). Then how did he switch from that into software? That was one of the questions from a student. He said that he loved the Fortran language during his student days and that love for programming continued. His message was that everyone should pursue their own interest and passion. He mentioned that, unlike in India, students at US universities sometimes do not decide on their majors until well into their 3rd or 4th year of studies. Sundar’s passion was to build products that would impact a very large number of global users. During his interview at Google, he was asked what he thought of Gmail, which he had neither seen nor used. Then the fourth interviewer actually showed it to him. Subsequently, he gave his opinion to the remaining 3 interviewers on what he thought was wrong with Gmail and how to improve it. He emphasized time and again the need to step out of the comfort zone and get a well-rounded experience. Today’s students should not be afraid to take some risks and should be willing to fail.

Besides technical leadership, Sundar possesses an amazing quality: egolessness, so rare to find in the Silicon Valley executive community. He said that he truly believes in empowering his team and letting them execute with full trust. This is easier said than done, based on my experience at IBM and Oracle. Large organizations suffer from ego-driven leadership that causes a great amount of friction and anguish. Sundar’s rise at Google was due to his amazing ability to get teams to work very effectively. From Search, he went to manage Chrome, then he was given Android. His ability to work through the complexities of products, fiefdoms, and internal rivalries was so evident that he was elevated to the CEO position very quickly. Humility is his hallmark, combined with clarity of vision and efficient execution.

He made an interesting comment about the vision at Google. Larry Page said that the moonshot projects are worthwhile because the bar is so high (no competition). Even if you fail, you are still ahead with your knowledge and experience.

It was fun listening to Sundar’s simple and honest answers & remarks.

The resurgence of AI/ML/DL

We have been seeing a sudden rise in the deployment of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). It looks like the long “AI winter” is finally over.

  • According to IDC, AI-related hardware, software and services business will jump from $8B this year to $47B by 2020.
  • I have also read comments like, “AI is like the Internet in the mid 1990s and it will be pervasive this time”.
  • According to Andrew Ng, chief scientist at Baidu, “AI is the new electricity. Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”
  • Peter Lee, co-head at Microsoft Research, said, “Sales teams are using neural nets to recommend which prospects to contact next or what kind of products to recommend.”
  • IBM Watson used AI, but not DL, in 2011. Now all 30 of its components are augmented by DL (investment growing from $500M to $6B in 2020).
  • Google had 2 DL projects in 2012; now it has more than 1000 (Search, Android, Gmail, Translation, Maps, YouTube, self-driving cars, etc.).

It is interesting to note that the idea behind AI was raised by Alan Turing in a paper he wrote back in 1950, suggesting that it is possible to build machines with true intelligence. Then in 1956, John McCarthy organized a conference at Dartmouth and coined the phrase Artificial Intelligence. The next three decades did not see much activity, and hence the phrase “AI Winter” was coined. In 1997, IBM’s Deep Blue won its chess match against Kasparov. During the last few years, we saw deployments such as Apple’s Siri, Microsoft’s Cortana, and IBM’s Watson (beating Jeopardy game show champions in 2011). In 2014, the DeepMind team used a deep learning algorithm to create a program that wins Atari games.

During the last 2 years, use of this technology has accelerated greatly. The key players pushing AI/ML/DL are Nvidia, Baidu, Google, IBM, Apple, Microsoft, Facebook, Twitter, Amazon, Yahoo, etc. Many new players have appeared – DeepMind, Numenta, Nervana, MetaMind, AlchemyAPI, Sentient, OpenAI, SkyMind, Cortica, etc. These companies are all acquisition targets for the big ones. Sundar Pichai of Google says, “Machine learning is a core transformative way in which we are rethinking everything we are doing”. Google’s products deploying these technologies include Visual Translation, RankBrain, Speech Recognition, Voicemail Transcription, Photo Search, Spam Filter, etc.

AI is the broadest term, applying to any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning. The subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience is machine learning. A subset of machine learning called deep learning is composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multi-layered neural networks to vast amounts of data.
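To make the first two layers of that definition concrete, here is a toy sketch contrasting a hand-written if-then rule with a rule whose threshold is learned from labelled examples. Everything in it is made up for illustration: the "number of links in an email" feature, the fixed cutoff of 5, and the function names. It does not attempt deep learning, which would swap the single threshold for a multi-layered neural network trained on far more data.

```python
from typing import List, Tuple


def rule_based_is_spam(num_links: int) -> bool:
    """Rule-based AI: a human picked the cutoff and it never changes."""
    return num_links > 5


def learn_threshold(examples: List[Tuple[int, bool]]) -> int:
    """Machine learning in miniature: choose the cutoff that makes the
    fewest mistakes on labelled (num_links, is_spam) examples."""
    values = sorted({n for n, _ in examples})
    # Also try a cutoff below every value, meaning "flag everything".
    candidates = [values[0] - 1] + values
    best_threshold, best_errors = candidates[0], len(examples) + 1
    for threshold in candidates:
        errors = sum((n > threshold) != label for n, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


if __name__ == "__main__":
    labelled = [(0, False), (1, False), (2, False),
                (3, True), (4, True), (8, True)]
    threshold = learn_threshold(labelled)
    print("learned cutoff:", threshold)
    # The fixed rule misses the 3-link and 4-link spam; the learned one does not.
    print([rule_based_is_spam(n) for n, _ in labelled])
    print([n > threshold for n, _ in labelled])
```

Feeding `learn_threshold` more examples changes the cutoff without anyone editing the code, which is the "improve at tasks with experience" part of the definition.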

I think the resurgence is a result of the confluence of several factors: advanced chip technology such as the Nvidia Pascal GPU architecture or IBM TrueNorth (a brain-inspired computer chip), software architectures like microservices and containers, ML libraries, and data analytics toolkits. Well-known academics are being heavily recruited by companies – Geoffrey Hinton of the University of Toronto (Google), Yann LeCun of New York University (Facebook), Andrew Ng of Stanford (Baidu), Yoshua Bengio of the University of Montreal, etc.

The outlook of AI/ML/DL is very bright and we will see some real benefits in every business sector.

Linux & Cloud Computing

While reading the latest issue of the Economist, I was reminded that August 25th marks an important anniversary for two key events: 25 years back, on August 25, 1991, Linus Torvalds launched a new operating system called Linux, and on the same day in 2006, Amazon, under the leadership of Andy Jassy, launched the beta version of Elastic Compute Cloud (EC2), the central piece of Amazon Web Services (AWS).

The two are very interlinked. Linux became the world’s most used piece of software of its type. Of course Linux usage soared due to backers like HP, Oracle, and IBM to combat the Windows force. Without open-source programs like Linux, cloud computing would not have happened. Currently 1500 developers contribute to each new version of Linux. AWS servers deploy Linux heavily. Being first to succeed on a large scale allowed both Linux and AWS to take advantage of the network effect, which makes popular products even more entrenched.

Here are some facts about AWS. Its launch back in 2006 was extremely timely, just one year before smartphones came about. Apple launched its iPhone in 2007, which ushered in the app economy. AWS became the haven for start-ups, which make up nearly two-thirds of its customer base (estimated at 1 million). According to Gartner Group, the cloud computing market is at $205B in 2016, which is 6% of the world’s IT budget of $3.4 trillion. This number will grow to $240B next year. No wonder Amazon is reaping the benefits – over the past 12 months, AWS revenue reached $11B with a margin of over 50%. During the last quarter, AWS sales were 3 times those of its nearest competitor, Microsoft Azure. AWS has ten times more computing capacity than the next 14 cloud providers combined. We also saw the fate of Rackspace last week (acquired by a private equity firm). Other cloud computing providers like Microsoft Azure, Google Cloud, and IBM (which acquired SoftLayer in 2013) are struggling to keep up with AWS.

The latest battleground in cloud computing is data. AWS offers Aurora and Redshift in that space. It also started a new service called Snowball, a suitcase-sized box of digital memory that customers can use to move mountains of data into the AWS cloud (an interesting challenge to Box and Dropbox). IBM bought Truven Health Analytics, which keeps data on 215m patients in the healthcare industry.

The Economist article said, “AWS could end up dominating the IT industry just as IBM’s System/360, a family of mainframe computers did until the 1980s.” I hope that does not happen; we need serious competition to AWS for the customers’ benefit. Who wants a single-vendor “lock-in”? Microsoft’s Azure seems to be moving fast. Let us hope IBM, Google, and Oracle move very aggressively, offering equivalent or better alternatives to Amazon cloud services.

Hadoop, the next ten years

I attended a meetup yesterday evening at the San Jose Convention Center on the subject “Apache Hadoop, the next 10 years” by Doug Cutting, the creator of Hadoop while at Yahoo, who works at Cloudera now. That venue was chosen because of the ongoing Strata+Hadoop conference there.

It’s always fun listening to Doug recount how Hadoop got created in the first place. Based on early papers from Google on GFS (Google File System) and the MapReduce computing algorithm, work began inside a project called Nutch, which was subsequently spun out as Hadoop (named after Doug’s son’s toy elephant). This all made sense as horizontal scaling via commodity hardware was coming to dominate the computing landscape. All the modules in Hadoop were designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework. That was all back in 2006. As an open source project, Hadoop gained momentum with community support for the overall ecosystem. Over the next seven years, we saw many new additions and improvements such as YARN, HBase, Hive, Pig, ZooKeeper, etc. Hence, Doug wanted to emphasize that there is a difference between just Hadoop and the Hadoop ecosystem.
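For readers who have not seen the MapReduce idea in action, here is a single-process sketch of its three steps (map, shuffle, reduce) applied to the classic word count. It is only an illustration in plain Python with function names of my own choosing; it is not the Hadoop API, and it leaves out everything that makes Hadoop interesting at scale, such as distributing the work across machines and recovering from failed nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the elephant", "The toy elephant"]))
```

In a real cluster the map calls run on the machines that hold each block of data and the shuffle moves pairs over the network, but the shape of the computation is the same.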

The original Hadoop with its MapReduce computing had its limitations, and lately Spark is taking over the computing part. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It originated at UC Berkeley’s AMPLab and is gaining momentum fast with its added features for machine learning, streaming, graph and SQL interfaces. To a question from the audience, Doug replied that such enhancements are expected and more will come as the Apache Hadoop ecosystem grows. Cloudera has created Impala, a speedier engine with a SQL interface, to meet customer needs. Another example of a key addition to the ecosystem is Kafka, which originated at LinkedIn. The Apache Kafka project is a message broker service and aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. To another question on whether another general-purpose platform will replace Hadoop, Doug suggested that projects like Spark will appear to handle parts of the ecosystem better. There may be many purpose-built tools, like Kafka, to address specific needs. He eloquently praised the open source philosophy, where a community of developers enables faster progress compared to the speed at which older companies like Oracle enhance their DBMS software.
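To give a feel for what a message broker for real-time data feeds does, here is a toy in-memory model of one idea Kafka is built around: producers append records to a log, and each consumer keeps its own position in that log, so several consumers can read the same feed independently. The class and method names are hypothetical and are not Kafka’s client API; a real broker adds persistence, partitioning, and replication.

```python
class TopicLog:
    """A toy append-only log with one read offset per consumer."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> index of next record to read

    def publish(self, record):
        """Append a record and return its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer, if any."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


if __name__ == "__main__":
    clicks = TopicLog()
    for event in ["page_view", "add_to_cart", "checkout"]:
        clicks.publish(event)
    print(clicks.poll("analytics", max_records=2))  # first two events
    print(clicks.poll("billing"))                   # all three events
    print(clicks.poll("analytics"))                 # the remaining event
```

Because reading does not remove records, a slow consumer never blocks a fast one, which is part of why this design suits streaming workloads.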

From the original Hadoop meant for batch processing of large volumes of data in a distributed cluster, we are moving towards the real-time world of streaming analytics and instant insights. The popularity of Hadoop can be gauged by the growth in attendance of the San Jose Hadoop Summit…from 2700 attendees in 2013, it more than doubled last year.

Doug is a good speaker and his 40-minute talk was informative and entertaining.

RocksDB from Facebook

I attended a HIVE-sponsored Meetup yesterday evening titled, “Rocking the database world with RocksDB”. Since I had never heard of RocksDB, I was curious to learn how it is rocking the database world.

Facebook originally built this key-value storage layer for use with MySQL (instead of InnoDB), as MySQL is used heavily at Facebook. They claim that was not the only motivation. Then in 2013, they decided to open source RocksDB. Last evening’s speaker had said in an earlier post in November 2013, “Storing and accessing hundreds of petabytes of data is a huge challenge, and we’re constantly improving and overhauling our tools to make this as fast and efficient as possible. Today, we are open-sourcing RocksDB, an embeddable, persistent key-value store for fast storage that we built and use here at Facebook.”

RocksDB is also ideal for SSD (Flash storage) and claims fast performance. The team was excited when MongoDB opened up to other storage engines back in the summer of 2014. For a period of time, MongoDB plus RocksDB was a fast combination. Then MongoDB decided to acquire WiredTiger (a competitor) in December 2014 to contribute to the performance, scalability, and hardware efficiency of MongoDB. That left RocksDB out of the official engagement with MongoDB. But they built something called MongoRocks that claims to be very fast. It seems several MongoDB users prefer MongoRocks over the native combo of MongoDB with WiredTiger.

Several users of RocksDB talked about their experience, especially in the IoT world, where sensor data can be processed at the edge (ingestion, aggregation, and some transformation) before being sent to the cloud servers. The only issue I saw is that there is no “real” owner of RocksDB as a deliverable solution. There is no equivalent of a Cloudera (for Hadoop) or Confluent (for Kafka) that can provide value-added features and support for the user base. So far, it’s all open source download and do-it-yourself. So serious production-level deployment is still a risky affair. For now, it’s a developer’s plaything.
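The edge-processing pattern those users described can be sketched in a few lines: raw readings are grouped per sensor and per time window, and only the summary is kept for upload. This is my own illustration, not anything shown at the meetup. The tuple layout, the 60-second window, and the function name are assumptions, and an ordinary Python dict stands in for the embedded key-value store where a real deployment would use RocksDB.

```python
from collections import defaultdict
from statistics import mean


def aggregate_readings(readings, window_seconds=60):
    """Summarize (sensor_id, unix_timestamp, value) readings per window.

    Returns a dict keyed by (sensor_id, window_start) whose values hold
    the count, min, max, and mean of the readings in that window.
    """
    buckets = defaultdict(list)
    for sensor_id, timestamp, value in readings:
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[(sensor_id, window_start)].append(value)
    return {
        key: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for key, values in buckets.items()
    }


if __name__ == "__main__":
    raw = [
        ("t1", 0, 20.0), ("t1", 30, 22.0),  # same minute, same sensor
        ("t1", 61, 30.0),                   # next minute
        ("t2", 5, 1.0),                     # different sensor
    ]
    for key, summary in sorted(aggregate_readings(raw).items()):
        print(key, summary)
```

Four raw readings shrink to three summaries here; with sensors reporting many times a second, the reduction in what has to travel to the cloud is far larger.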

2015 – Year of Open Source explosion

Open source software – software freely shared with the world at large – is an old idea, dating back to the 1980s when Richard Stallman started preaching the gospel, calling it free software. Then Linus Torvalds started working on Linux in the early 1990s. Today, Linux runs our lives. The Android operating system that runs so many Google phones is based on Linux. When you open a phone app like Twitter or Facebook and pull down all those tweets and status updates, you’re tapping into massive computer data centers filled with hundreds of Linux machines. Linux is the foundation of the Internet.

Cade Metz recently wrote in an article, “And yet 2015 was the year open source software gained new significance, thanks to Apple and Google and Elon Musk. Now more than ever, even the most powerful tech companies and entrepreneurs are freely sharing the code underlying their latest technologies. They recognize this will accelerate not only the progress of technology as a whole, but their own progress as well. It’s altruism with self-interest. And it’s how the tech world now works.” He quotes Brandon Keepers, head of open source at GitHub: “This is not just a turning point, but a tipping point.”

Apple, for the first time, decided to open source its Swift programming language (used to build apps for your iPad, iPhone, and Mac). That means applications built on Swift can be deployed on machines running Linux, Android, and Windows. Previously, Apple’s language Objective-C was meant only for Apple devices. This new move by Apple will enable developers to use Apple’s development tools across competing platforms.

Microsoft, another champion of proprietary software during the 1980s and 1990s, decided to open source its .NET software. That way, .NET can be used by developers to build applications for Linux and Apple’s operating systems too. Even IBM decided to open source its own machine learning technology, IBM SystemML, for use with Apache Spark.

Over the past 15 years, Google has built a wide range of data center technologies that have helped make it the most powerful company on the ‘net. These technologies allow all of the company’s online services to instantly handle requests from billions of people, no matter where in the world they may be. Typically, Google kept these technologies to itself, forcing others to engineer inferior imitations. Hadoop’s MapReduce and HDFS are examples; they grew out of Google’s published work on its file system and algorithms. But last year Google decided to open source TensorFlow, the software engine that drives its artificial intelligence services, including its image and speech recognition and language translation tools. Google realized that it could tap into a much larger team of researchers to enhance TensorFlow much faster than it could internally.

Elon Musk went even further. In mid-December, he and Sam Altman, president of Y Combinator, unveiled OpenAI, a $1 billion nonprofit dedicated to the same breed of AI that Google is developing. They have promised to open source all their work.

Yes, 2015 was the year Open Source really reached new heights!