
The resurgence of AI/ML/DL

We have been seeing a sudden rise in the deployment of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). It looks like the long “AI winter” is finally over.

  • According to IDC, AI-related hardware, software and services business will jump from $8B this year to $47B by 2020.
  • I have also read comments like, “AI is like the Internet in the mid 1990s and it will be pervasive this time”.
  • According to Andrew Ng, chief scientist at Baidu, “AI is the new electricity. Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”
  • Peter Lee, co-head at Microsoft Research said,  “Sales teams are using neural nets to recommend which prospects to contact next or what kind of products to recommend.”
  • IBM Watson used AI, but not DL, in 2011. Now all 30 of its components are augmented by DL (with investment growing from $500M to $6B in 2020).
  • Google had two DL projects in 2012; now it has more than 1,000 (Search, Android, Gmail, Translation, Maps, YouTube, self-driving cars, ...).

It is interesting to note that AI was mentioned by Alan Turing in a paper he wrote back in 1950, suggesting the possibility of building machines with true intelligence. Then in 1956, John McCarthy organized a conference at Dartmouth and coined the phrase Artificial Intelligence. The next three decades saw little activity, hence the phrase “AI winter” was coined. Around 1997, IBM’s Deep Blue won a chess match against Kasparov. During the last few years, we saw deployments such as Apple’s Siri, Microsoft’s Cortana, and IBM’s Watson (beating Jeopardy game show champions in 2011). In 2014, the DeepMind team used a deep learning algorithm to create a program that could win Atari games.

During the last two years, use of this technology has accelerated greatly. The key players pushing AI/ML/DL are Nvidia, Baidu, Google, IBM, Apple, Microsoft, Facebook, Twitter, Amazon, Yahoo, etc. Many new players have appeared – DeepMind, Numenta, Nervana, MetaMind, AlchemyAPI, Sentient, OpenAI, SkyMind, Cortica, etc. – and these companies are all acquisition targets for the big ones. Sundar Pichai of Google says, “Machine learning is a core transformative way in which we are rethinking everything we are doing”. Google’s products deploying these technologies include Visual Translation, RankBrain, Speech Recognition, Voicemail Transcription, Photo Search, Spam Filter, etc.

AI is the broadest term, applying to any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning. The subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience is machine learning. A subset of machine learning called deep learning is composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multi-layered neural networks to vast amounts of data.
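
To make the deep learning definition concrete, here is a minimal sketch (illustrative only, not any vendor’s system): a tiny two-layer neural network that trains itself on data via gradient descent, the “multi-layered network exposed to data” idea in miniature.

```python
import numpy as np

# Illustrative sketch: a two-layer neural network learning XOR.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0.0, 1.0, (2, 8))  # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1))  # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)          # hidden layer activations
    return h, sigmoid(h @ W2 + b2)    # network output

_, out = forward(X)
loss_before = float(np.mean((out - y) ** 2))

lr = 1.0
for _ in range(5000):
    h, out = forward(X)
    # backpropagate the squared-error gradient through both layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

_, out = forward(X)
loss_after = float(np.mean((out - y) ** 2))
print(loss_before, "->", loss_after)  # training drives the error down
```

Real DL systems differ mainly in scale: many more layers, millions of parameters, and vast training sets, but the train-by-exposure loop is the same idea.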

I think the resurgence is a result of the confluence of several factors: advanced chip technology such as Nvidia’s Pascal GPU architecture or IBM’s TrueNorth (a brain-inspired computer chip), software architectures like microservice containers, ML libraries, and data analytics tool kits. Well-known academics are being heavily recruited by companies – Geoffrey Hinton of the University of Toronto (Google), Yann LeCun of New York University (Facebook), Andrew Ng of Stanford (Baidu), Yoshua Bengio of the University of Montreal, etc.

The outlook of AI/ML/DL is very bright and we will see some real benefits in every business sector.

Linux & Cloud Computing

While reading the latest issue of the Economist, I was reminded that August 25th marks an important anniversary for two key events: 25 years back, on August 25, 1991, Linus Torvalds launched a new operating system called Linux, and on the same day in 2006, Amazon, under the leadership of Andy Jassy, launched the beta version of Elastic Compute Cloud (EC2), the central piece of Amazon Web Services (AWS).

The two are very interlinked. Linux became the world’s most used piece of software of its kind. Of course, Linux usage soared thanks to backers like HP, Oracle, and IBM seeking to combat the Windows force. Without open-source programs like Linux, cloud computing would not have happened. Currently, some 1,500 developers contribute to each new version of Linux, and AWS servers deploy Linux heavily. Being first to succeed on a large scale allowed both Linux and AWS to take advantage of the network effect, which makes popular products even more entrenched.

Here are some facts about AWS. Its launch back in 2006 was extremely timely, just one year before smartphones came about. Apple launched its iPhone in 2007, which ushered in the app economy. AWS became the haven for start-ups, which make up nearly two-thirds of its customer base (estimated at 1 million). According to Gartner Group, the cloud computing market is at $205B in 2016, which is 6% of the world’s IT budget of $3.4 trillion. This number will grow to $240B next year. No wonder Amazon is reaping the benefits – over the past 12 months, AWS revenue reached $11B with a margin of over 50%. During the last quarter, AWS sales were 3 times those of the nearest competitor, Microsoft Azure. AWS has ten times more computing capacity than the next 14 cloud providers combined. We also saw the fate of Rackspace last week (acquired by a private equity firm). Other cloud computing providers like Microsoft Azure, Google Cloud, and IBM (which acquired SoftLayer in 2013) are struggling to keep up with AWS.

The latest battleground in cloud computing is data. AWS offers Aurora and Redshift in that space. It also started a new service called Snowball, a suitcase-sized box of digital memory that can move mountains of data into the AWS cloud (an interesting challenge to Box and Dropbox). IBM bought Truven Health Analytics, which keeps data on 215 million patients in the healthcare industry.

The Economist article said, “AWS could end up dominating the IT industry just as IBM’s System/360, a family of mainframe computers, did until the 1980s.” I hope it’s not so; we need serious competition to AWS for customers’ benefit. Who wants a single-vendor “lock-in”? Microsoft’s Azure seems to be moving fast. Let us hope IBM, Google, and Oracle move very aggressively, offering equivalent or better alternatives to Amazon cloud services.

IBM’s Software Business

IBM has come a long way from my time – 16 years spent during the 1970s, 1980s, and early 1990s. Hardware was king for most of my years there, and software was merely a means to the end of “hardware sales”. Even during the early years of the IBM PC, that mistake (of thinking it was a hardware game) helped create a new software giant called Microsoft. Hence the acronym IBM was jokingly expanded to I Blame Microsoft.

Advance two decades and we finally see a big shift of focus from hardware to software. IBM has sold off much of its non-mainframe hardware (x86 servers) and storage business. During the fourth quarter of 2015, IBM’s share of the server market was 14.1%, with an impressive yearly growth of 8.9%. Contrast this with the growth rates of HPE (-2.1%), Dell (5.3%), and Lenovo (3.7%).

IBM’s software is another story. While it contributed about 28% of total revenue in 2015 ($81.7B), its profit contribution was 60%. If its software were a separate business, it would rank as the fourth largest software company, as shown below:

  1. Microsoft – $93.6B revenue —> 30.1% profit
  2. Oracle – $38.2B revenue —> 36.8% profit
  3. SAP – $23.2B revenue —> 23.4% profit
  4. IBM – $22.9B revenue —> 34.6% profit

IBM’s software is second most profitable after Oracle’s. The $22.9B revenue can be split into three components:

  • Middleware at $19.5B (includes everything above the operating system, like DB2, CICS, Tivoli, Bluemix, etc.),
  • Operating System at $1.8B,
  • Miscellaneous at $1.6B.
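
As a quick sanity check (illustrative only, using the figures quoted above), the arithmetic hangs together:

```python
# IBM's software revenue as a share of 2015 total revenue ($B).
ibm_total_rev = 81.7
software_share = 0.28
print(round(ibm_total_rev * software_share, 1))  # 22.9, matching the $22.9B figure

# Revenue and profit margin per company, from the ranking above.
companies = {
    "Microsoft": (93.6, 0.301),
    "Oracle": (38.2, 0.368),
    "SAP": (23.2, 0.234),
    "IBM software": (22.9, 0.346),
}
for name, (rev, margin) in sorted(companies.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: ${rev * margin:.1f}B profit on ${rev}B revenue")

# The three-way split of IBM's software revenue.
middleware, os_, misc = 19.5, 1.8, 1.6
print(round(middleware + os_ + misc, 1))  # 22.9
```

In absolute dollars, Oracle’s software profit still leads IBM’s, consistent with the “second most profitable” observation below.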

IBM does not split out its cloud software explicitly. Therefore, it is hard to compare it to AWS, Azure, or GCE.

The only problem is that its software business is not growing. As a matter of fact, it showed a decline last year. Given the rise of cloud services, IBM has to step up its competitive offerings in that space. It did acquire SoftLayer a couple of years back at a hefty price, but its cloud infrastructure growth does not match that of AWS (expected to hit $10B this year).

IBM is a company in transition. Resources are being shifted toward high-growth areas like cloud computing and analytics, and legacy businesses with poor growth prospects are in decline. Still, IBM remains a major force in the software market.

Stack Fallacy? What is it?

Back in January, TechCrunch published an article on this subject called Stack Fallacy, written by Anshu Sharma of Storm Ventures. Then today I read a Business Insider article arguing that the reason Dropbox is failing is the Stack Fallacy. Sharma describes the Stack Fallacy as “the mistaken belief that it is trivial to build the layer above yours.”

Many companies trivialize the task of building layers above their core competency layer, and that leads to failure. Oracle is a good example: they thought it was no big deal to build applications (watching the success of SAP in the ERP layer, initially built on the Oracle database). I remember a meeting with Hasso Plattner, founder of SAP, back in the early 1990s when I was at Oracle. He said SAP was one of Oracle’s biggest customers at the time, and now Oracle was competing with them. For lack of any good answer, we said that we were friends in the morning and foes in the afternoon, and welcomed him to the world of “co-opetition”. Subsequently SAP started moving off the Oracle DB and was enticed by IBM to use DB2. Finally SAP built its own database (they bought Sybase and built the in-memory database Hana). Oracle’s applications were initially disasters, as they were hard to use and did not quite meet the needs of customers. In the end, Oracle had to win the space by acquiring PeopleSoft and Siebel.

Today’s Business Insider article says, “…a lot of companies often overvalue their level of knowledge in their core business stack, and underestimate what it takes to build the technology that sits one stack above them.  For example, IBM saw Microsoft take over the more profitable software space that sits on top of its PCs. Oracle likes to think of Salesforce as an app that just sits on top of its database, but hasn’t been able to overtake the cloud-software space they compete in. Google, despite all the search data it owns, hasn’t been successful in the social-network space, failing to move up the stack in the consumer-web world. Ironically, the opposite is true when you move down the stack. Google has built a solid cloud-computing business, which is a stack below its search technology, and Apple’s now building its own iPhone chips, one of the many lower stacks below its smartphone device”.

With reference to Dropbox, the article says that it underestimated what it takes to build apps a layer above (Mailbox, Carousel) and failed to understand its customers’ needs, while investing in unimportant areas like the migration away from AWS. Dropbox is at a phase where it needs to think more about users’ needs and competing with the likes of Google and Box, rather than spending on “optimizing for costs or minor technical advantages”.

I am not sure I agree with that assessment. Providing efficient and cost-effective cloud storage is Dropbox’s core competency, and they are staying pretty close to that. The move away from AWS is clearly aimed at cost savings, as AWS can be a huge burden on operational cost, plus it has its limitations on effective scaling. In some ways, Dropbox is expanding its lower layers for future hosting. Its focus on enterprise-scale cloud storage is the right approach, as opposed to Box or Google, where the focus is on consumers.

But the Stack Fallacy applies more to Apple doing its own iPhone chips, or Dell wrongly going after big data. At Oracle the dictum used to be, “everything is a database problem” – if you have a hammer, then everything looks like a nail.

In Memoriam – Ed Lassettre

I was out of the country when my old colleague from IBM days, Ed Lassettre, passed away last November. I only found out about his passing earlier this month from a mutual friend at IBM Almaden Research. Ed was one of the best computer software professionals I knew and respected.

He was at IBM’s Santa Teresa Lab (now called Silicon Valley Lab) when I started there back in 1981 after my five-year stint at IBM Canada. That year he was promoted to Senior Technical Staff Member (STSM), the very first at the lab to receive that honor. Subsequently he became an IBM Fellow, the highest technical honor. His reputation as one of the key software engineers for IBM’s MVS operating system preceded him. Ed had spent a few years at IBM’s Poughkeepsie Lab in upstate New York. He did his undergraduate and post-graduate studies in Math at Ohio State University. He had deep insights into the intricacies of high performance computing systems. When we were building DB2 at the IBM lab, Ed provided guidance on its interface with the operating system.

Subsequently I went to IBM’s Austin Lab for two years in the mid-1980s to lay the foundation of the DB2 product for the PC (which at the time lacked the processing power and memory of the large mainframes). Hence our design had to accommodate those limitations. The IBM executives wanted someone to audit our design before giving the green light for development. I could not think of a better person than Ed Lassettre to do that. At my request, Ed spent some time and gave a very positive report on our design. He had great credibility in the technical community. Many times I sought his views on technical matters, and he provided timely advice. His wisdom was complemented by a tremendous humility, a rare trait in our industry.

I left IBM back in 1992 for Oracle and lost touch with Ed. Later I found that he had retired from IBM and joined Microsoft Research. He was a good friend of the late Jim Gray, also at Microsoft Research at the time. Ed retired from Microsoft in 2013 at the age of 79! He was quite well known in the HPTC (High Performance Technical Computing) world.

RIP, Ed Lassettre, a great computer scientist and friend! You will be missed.

2015 – Year of Open Source explosion

Open source software – software freely shared with the world at large – is an old idea, dating back to the 1980s when Richard Stallman started preaching the gospel, calling it free software. Then Linus Torvalds started working on Linux in the early 1990s. Today, Linux runs our lives. The Android operating system that runs so many Google phones is based on Linux. When you open a phone app like Twitter or Facebook and pull down all those tweets and status updates, you’re tapping into massive computer data centers filled with hundreds of Linux machines. Linux is the foundation of the Internet.

Cade Metz recently wrote in an article, “And yet 2015 was the year open source software gained new significance, thanks to Apple and Google and Elon Musk. Now more than ever, even the most powerful tech companies and entrepreneurs are freely sharing the code underlying their latest technologies. They recognize this will accelerate not only the progress of technology as a whole, but their own progress as well. It’s altruism with self-interest. And it’s how the tech world now works.” Brandon Keepers of GitHub calls it not just a turning point but a tipping point.

Apple, for the first time, decided to open source its Swift programming language (used to build apps for your iPad, iPhone, and Mac). That means applications built in Swift can be deployed on machines running Linux, Android, and Windows. Previously, Apple’s language Objective-C was meant only for Apple devices. This new move by Apple will enable developers to use Apple’s development tools across competing platforms.

Microsoft, another champion of proprietary software during the 1980s and 1990s, decided to open source its .NET software. That way, .NET can be used by developers to build applications for Linux and Apple’s operating system too. Even IBM decided to open source its machine learning system SystemML for use with Apache Spark.

Over the past 15 years, Google has built a wide range of data center technologies that have helped make it the most powerful company on the ‘net. These technologies allow all of the company’s online services to instantly handle requests from billions of people, no matter where in the world they may be. Typically, Google kept these technologies to itself, forcing others to engineer inferior imitations – MapReduce and HDFS, for example, grew out of Google’s published papers on its file system and algorithms. But last year Google decided to open source TensorFlow, the software engine that drives its artificial intelligence services, including its image and speech recognition and language translation tools. Google realized that it could tap into a much larger team of researchers to enhance TensorFlow, much faster than it could internally.

Elon Musk went even further. In mid-December, he and Sam Altman, president of Y Combinator, unveiled OpenAI, a $1 billion nonprofit dedicated to the same breed of AI that Google is developing. They have promised to open source all their work.

Yes, 2015 was the year Open Source really reached new heights!

Fast Data vs. Big Data

Back when we were doing DB2 at IBM, there was an important older product called IMS which brought in significant revenue. With another database product coming (based on relational technology), IBM did not want any cannibalization of the existing revenue stream. Hence we coined the phrase “dual database strategy” to justify the need for both DBMS products. In a similar vein, several vendors are concocting all kinds of terms and strategies to justify newer products under the banner of Big Data.

One such phrase is Fast Data. We all know the 3V’s associated with the term Big Data – volume, velocity, and variety. It is the middle V (velocity) that says data is not static but changing fast, like stock market data, satellite feeds, or sensor data coming from smart meters or an aircraft engine. The question has always been how to deal with such fast-changing data (as opposed to the static data typical in most enterprise systems of record).
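
To make the velocity idea concrete, here is a minimal sketch (illustrative only, not VoltDB’s or anyone’s implementation; all names are hypothetical) of analyzing a fast-changing stream with a sliding window, computing a rolling average as each reading arrives:

```python
from collections import deque

class SlidingWindow:
    """Fixed-size window over a stream, maintaining a running sum."""

    def __init__(self, size):
        self.size = size
        self.buf = deque()
        self.total = 0.0

    def add(self, reading):
        # Ingest one new reading; evict the oldest once the window is full.
        self.buf.append(reading)
        self.total += reading
        if len(self.buf) > self.size:
            self.total -= self.buf.popleft()
        return self.total / len(self.buf)  # rolling average so far

# e.g., smart-meter readings arriving one at a time
stream = [10.0, 12.0, 11.0, 50.0, 13.0]
window = SlidingWindow(size=3)
averages = [window.add(r) for r in stream]
print(averages)
```

The same pattern, ingest, analyze in real time, then export aggregates downstream, is what stream-processing engines do at scale; a batch data warehouse, by contrast, only sees the data after it has landed.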

Recently I was listening to a talk by IBM and VoltDB, where VoltDB tried to justify a world of “Fast Data” coexisting with “Big Data”, the latter narrowed to the static data warehouse or “data lake”, as IBM calls it. Again, they have chosen to pigeonhole Big Data into the world of HDFS, Netezza, Impala, and batch MapReduce. This way, they justify the phrase Fast Data as representing operational data that is changing fast. They call VoltDB “the fast, operational database”, implying every other database solution is slow. Yet incumbents like IBM, Oracle, and SAP have introduced in-memory options for speed, and even NoSQL databases can process very fast reads on distributed clusters.

VoltDB folks also tried to show how the two worlds (Fast Data and their version of Big Data) will coexist. The Fast Data side will ingest and interact with streams of inbound data, do real-time data analysis, and export to the data warehouse. They bragged about a performance benchmark of 1 million transactions per second on a 3-node cluster, scaling to 2.4 million on a 12-node system running in the SoftLayer cloud (owned by IBM). They also said that this solution is much faster than Amazon’s AWS cloud. The comparison is not apples-to-apples, as the SoftLayer deployment is on bare metal, compared to the AWS stack of software.

I wish they would simply call this real-time data analytics, as it involves mostly read-type transactions, and not confuse it with update-heavy workloads. We will wait and see how enterprises adopt this VoltDB–SoftLayer solution alongside their existing OLTP solutions.