Category Archives: IBM

DB2 is 30 years old next month!

Darryl Taft’s article in eWeek reminded me that next month, on June 6th, IBM’s DB2 RDBMS product will celebrate its 30th anniversary. This has personal significance for me. I was part of the DB2 planning team then, and on June 6th, 1983, I was in Lyon, France, at the European user group meeting, ready to announce IBM’s new RDBMS on MVS called DB2. Interestingly, I had prepared two presentation decks: one for DB2, and the other for IBM’s database directions. The second one was in hand in case the announcement could not clear IBM’s approval process in time. Luckily, I was cleared to go with the announcement of the new production-ready RDBMS called DB2, running on the mainframe MVS platform. I still recall the excitement of doing that in front of 2,000 people in Lyon, the gastronomic capital of France. Later that evening, the attendees were taken by bus to a Beaujolais winery for dinner.

Why was this significant? IBM Research had worked on a prototype called System R, and that was commercialized on the VM platform under the name SQL/DS. Even though it supported the relational model and SQL, it lacked DBMS robustness such as scalability, performance, and reliability. In the meantime, Oracle got started in 1977, and its first product based on System R principles and SQL was introduced in 1979 on DEC/VAX. There was a gap of four years when IBM did not have a commercial RDBMS on its flagship MVS platform. IBM’s only DBMS on MVS was IMS, based on the hierarchical data model and the proprietary DL/1 language. One of the internal debates was about positioning the new RDBMS when IMS was such a significant revenue generator. I recall the “dual database strategy” presentation we used to give (which one to use when). One good thing about DB2 was that the bottom layer of the engine (buffering, locking, latching, backup-recovery, write-ahead log, etc.) drew many lessons from the user experience of IMS. Hence DB2 had stronger industrial-strength features than its research cousin SQL/DS, as well as Oracle.

The next year, in 1984, I went to IBM’s Austin Lab for two years to lay the foundation for DB2 on the IBM PC (OS/2). Subsequently, development shifted to IBM’s Toronto Lab. I personally headed a team doing the early work of porting DB2 to Unix in 1990-91.

All this was done before the Internet went mainstream, when memory and disks were expensive commodities. Now the scene has changed a great deal, and we see many new types of database engines coming to market to address the needs of extreme scale and huge volumes of data. IBM continues to be a leading player in the data management and analytics business.

It feels good to be part of that history. Happy birthday DB2.

NewSQL Meetup last week

I attended a meetup last week in Santa Clara on the topic The Realities of NewSQL. Three companies were represented in a panel discussion – Clustrix (Raj Bains), VoltDB (Scott Jarr), and TransLattice (Michael Lyle). Steve Baunach from Starview was the moderator.

This new category, NewSQL, represents companies using the relational data model and SQL while delivering better scalability, performance, and high availability. Following the rise of the NoSQL community, whose products bring schema-less, object-oriented data models with relaxed consistency and scale-out on commodity servers, the NewSQL group claims similar scale-out, but with relational DB and SQL support.

Three claims stood out in their discussion – preserving the SQL skill base and the relational model of data that has dominated the landscape for the last 20-plus years; high scale-out by adding commodity servers (a weakness especially with MySQL); and better availability.

VoltDB targets transaction processing (dominated by IBM and Oracle products), promising very high throughput (needed because of the proliferation of devices as new data sources) and better performance. Their claim is that they have eliminated many unnecessary overheads of traditional RDBMS products by using in-memory techniques extensively.

Clustrix claims it has eliminated sharding as offered by NoSQL products (an extra burden on users if they have to manage it). Their mantra for success is scale-out on clusters – being able to handle high loads by adding commodity servers. They specifically focus on the MySQL user base.

The TransLattice Elastic Database (TED) is a Relational Database Management System that provides ANSI-SQL support, the ACID transactions enterprise applications require, and the ability to scale-out across wide distances using ordinary Internet connections. It uses partitioning to split databases across nodes. This notion is not new and has been deployed by IBM and Oracle for many years.
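Partitioning rows across nodes is easy to picture with a small sketch. The panel did not say which scheme TED uses, so the Python below assumes simple hash partitioning on a primary key; the functions `node_for_key` and `partition_rows` are hypothetical and illustrate the general idea, not TransLattice’s implementation.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index in [0, num_nodes).

    Uses MD5 as a stable hash, because Python's built-in hash() of a str
    is randomized per process and would route keys differently after a restart.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition_rows(rows, key_column, num_nodes):
    """Group rows (dicts) into per-node lists by hashing their key column."""
    partitions = {node: [] for node in range(num_nodes)}
    for row in rows:
        node = node_for_key(str(row[key_column]), num_nodes)
        partitions[node].append(row)
    return partitions


if __name__ == "__main__":
    customers = [{"id": i, "name": f"customer-{i}"} for i in range(10)]
    for node, rows in partition_rows(customers, "id", num_nodes=3).items():
        print(f"node {node}: ids {[r['id'] for r in rows]}")
```

Hashing spreads keys evenly, but changing `num_nodes` remaps most keys; production systems typically use consistent hashing or many small virtual partitions to limit data movement when nodes are added.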

It was unclear why existing users of IBM or Oracle would adopt one of these products, as the incumbents are marching forward with scale-out models and improving TCO. The MySQL community has been using external products for scalability for a while, and that is understandable. But as part of Oracle Corporation, MySQL will likely see enhancements in its scalability offerings. Then there is SAP Hana, which claims big performance gains.

There are many companies under this umbrella – Clustrix, GenieDB, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, Translattice, NimbusDB, etc. With the marketing noise around Big Data and Cloud, new companies are getting funded by the dozen. It is going to be a tough space in which to differentiate and become a winner.

Big Data – Status

According to a Wall Street Journal article today by Rachael King and Steven Rosenbush, the market for new databases serving Big Data reached $1.22B last year and is expected to more than double by 2014 (according to research firm Wikibon). That is quite impressive.

Since relational databases using SQL are inefficient at handling data from social chatter, smartphones, and clicks (because of its volume and variety), new databases have been popping up over the last 3-4 years. In the past two years, 119 database software companies have been funded by VCs for $1.17B (according to VentureSource, a Dow Jones company). This is remarkable, as not too long ago the space was declared taken by 3 incumbents – IBM, Oracle, and Microsoft. However, the scene has changed dramatically now.

Thanks must go to Google for pioneering new innovations with Bigtable, GFS (Google File System), and the Map-Reduce model for massively parallel processing on commodity hardware clusters. These ideas were reimplemented as open source under the Apache Software Foundation, and the result is Hadoop, HDFS, and several associated tools for the new ecosystem. Amazon, Yahoo, and Facebook have also contributed good work here.
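The Map-Reduce approach mentioned above boils down to three steps: map each input record to key/value pairs, shuffle (group) the pairs by key, and reduce each group to a result. Here is a single-process Python sketch of the classic word-count example; a real Hadoop job runs the same phases spread across many machines, and none of these function names are Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all intermediate values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data big clusters", "commodity clusters"]))
# {'big': 2, 'data': 1, 'clusters': 2, 'commodity': 1}
```

Because each map call and each reduce call is independent, a framework can run them in parallel on different servers; only the shuffle step moves data between machines.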

The article mentions a client, AutoZone, using one of the new DBMSs called NuoDB to better manage store inventory according to local shoppers’ needs. NuoDB, like many others, offers a cloud service with an annual subscription, cutting capex for customers.

Another client, Trulia (online real estate), was using MySQL but has added Cassandra to better manage its listings of home foreclosures and apartments across the 100 million homes it tracks in the US.

Shutterstock, a photo agency, stores 24 million images, with 10,000 added each day. It uses HDFS (Hadoop) to analyze user behavior (for example, how long users hover over an image before purchasing).

The article suggests that large financial clients will stick with existing vendors such as Oracle for various reasons, but the threat from these newcomers is there. This is much like how cloud software is shaking up Microsoft’s desktop software model.

We are in the data-intensive computing era now and the race will be fierce for leadership and market share.

IBM’s focus on Big Data and Analytics

Yesterday at IBM’s investor day meeting in San Jose, CEO Ginni Rometty specifically talked about the company’s focus on the Big Data and Analytics business. This is what she said:

IBM expects to continue its big bets on technologies like Big Data and analytics. “Data will be the basis of competitive advantage for every company, for every industry in the coming decade.”

To that end, she said that IBM now expects business analytics to account for as much as $20 billion in annual revenue by fiscal 2015. The prior target was $16 billion. If Big Blue hits that goal, it would amount to a doubling of analytics revenue from 2010.

That is quite a commitment, the likes of which have not been seen from other key players such as Oracle, HP, SAP, or Microsoft. IBM has a full division devoted to Big Data, and its coverage of the subject is quite impressive.

From my 16 years at IBM during the development of the DB2 family of products, I know firsthand the talent and experience IBM has in the data business. When they set their mind on an area, good things happen. Hence this commitment by the CEO is serious, and competitors had better take notice!

Tricky business of tech acquisition

The big news this week is the backfiring of HP’s acquisition of British software company Autonomy last year. HP paid a whopping $11.1B to acquire Autonomy even though its CFO was against the deal. HP has now taken an $8.8B charge, citing accounting improprieties by Autonomy to inflate the company’s value. The obvious question is how HP’s due diligence failed to catch that before making such a big decision. This all happened under the previous CEO, Leo Apotheker, even though current CEO Meg Whitman voted for the deal as a board member.

We have seen a few other acquisitions go wrong during the last few years. HP leads the list with two others – Palm and EDS. Palm was acquired in 2010 for $1.2B to get HP into the hand-held smart device business, but that did not work. The EDS acquisition in 2008 at $13.9B was aimed at competing against IBM’s Global Services. Now HP has taken an $8B charge on that unit, and there are rumors of a potential sale.

Microsoft acquired aQuantive in 2007 for $6.3B, but took a write-down of $6.2B in 2012. Cisco acquired Pure Digital (makers of the Flip Video camcorder) for $590M for reasons that were never clear (perhaps to get into the consumer electronics business, far from its core networking gear). It closed that business last year.

eBay acquired Skype under Meg Whitman’s watch in 2005 for $3.3B. Somehow the goal of adding voice communication between customers to its core auction business did not pan out. Eventually eBay sold that unit to Silver Lake Partners for $1.9B. Last year Microsoft bought Skype for $8.5B!

IBM and Oracle, on the other hand, seem to have acquired several companies successfully, adding to their growth in business and scope. The trick lies in the strategy group examining carefully why such a move makes sense and how to blend the acquired product and technology into the existing fabric. HP now blames its chief strategy and technology officer, Shane Robison, who was instrumental in the Autonomy decision. But both Apotheker and Robison are gone from HP.

Shareholders of these public companies are an unhappy lot, as the write-downs affect the stock value, as seen this week in HP’s stock price.

My friend, the late Chris Loosley

It is with profound sadness that I learnt of the untimely demise of my friend Chris Loosley a few weeks back. He was 67 years old.

Chris and I worked together at IBM’s Santa Teresa Lab (now called Silicon Valley Lab) during the 1980s and early 1990s. I was part of the DB2 planning, technology, and strategy team, while Chris always worked in the performance group. He had moved to the US from England, where he started his IBM career during the 1970s. I had moved from Canada, where I also started my career, at IBM Canada in 1974.

Chris was not only excellent in his professional career; he was also a wonderful human being. Always kind and joyful, he exuded positive energy. We shared many moments discussing technical aspects of DB2, and he was a master of the technology behind scalability and high performance. We spoke at many events around the world during those years.

Chris left IBM and worked at Keynote Systems, again focusing on website performance measurement and tuning. Subsequently he came back to IBM through an acquisition just a couple of years back. He was slim, handsome, and a runner.

Late in 2010, he was diagnosed with colon cancer. After several months of various treatments, he succumbed to that dreadful disease and left his mortal body.

This brief life we all have is so mysterious! What stands out is the fragrance of love and kindness one leaves behind. Someone said: when we are born, we cry while others laugh, and when we leave, we smile while others cry. What matters is the legacy one leaves behind.

Chris, my salutations to you for being such a great friend, colleague, and wonderful human being.

R.I.P.

Closer look at one NoSQL database – MongoDB

Among the new crop of NoSQL database products, MongoDB ranks quite high, in my opinion. The company that produces MongoDB is 10gen, a young venture-backed start-up (since 2008). Its rapid growth over the last 4 years bears testimony to its technical strength.

MongoDB’s name comes from the middle five letters of the word “humongous”, suggesting big data. It is an open-source, document-oriented store that is schema-free and supports dynamic queries with full indexing. Its data format is BSON, a binary encoding of JSON (JavaScript Object Notation), the lightweight, text-based open standard designed for data interchange. Douglas Crockford, later of Yahoo, specified JSON in the early 2000s, and it was published as RFC 4627 in 2006.
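To make “binary encoding of JSON” concrete, here is a minimal Python sketch that encodes a tiny subset of BSON (32-bit integers, booleans, UTF-8 strings, and nested documents) following the layout in the public BSON specification: a little-endian int32 total length, a sequence of typed elements, and a trailing NUL byte. It is illustrative only; real applications should use a driver’s BSON library, and the function name `encode_document` is our own.

```python
import struct


def _cstring(s: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator
    return s.encode("utf-8") + b"\x00"


def encode_document(doc: dict) -> bytes:
    """Encode a dict of bool/int/str/dict values as a BSON document (sketch)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):  # test bool before int: bool subclasses int
            body += b"\x08" + _cstring(key) + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):  # int32 only; struct.error if out of range
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, dict):  # embedded document
            body += b"\x03" + _cstring(key) + encode_document(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # total length counts the 4-byte length prefix and the trailing NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


print(encode_document({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The length prefixes are what make BSON fast to traverse: a reader can skip over a field or an entire embedded document without parsing its contents, which plain JSON text does not allow.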

The other key tenet of MongoDB is its scalability architecture – it can scale out horizontally using automatic “sharding” (key-range partitioning). It also provides master-slave or peer-to-peer replication for high availability, recovery, and performance. One of its customers, Disney’s Interactive Media Group, for example, has 1,400 instances of Mongo. It uses sharding for write performance and replication for read performance.
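Key-range sharding means each shard owns a contiguous range of shard-key values, and a router sends each read or write to the shard whose range contains the key. The Python sketch below shows only that routing step, using binary search over sorted split points; `RangeRouter` and the shard names are hypothetical, and MongoDB’s actual chunk splitting, balancing, and config servers are not modeled.

```python
import bisect


class RangeRouter:
    """Hypothetical key-range router: shard i owns [split_points[i-1], split_points[i])."""

    def __init__(self, split_points, shard_names):
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = list(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # bisect_right counts the split points <= key, which is the owning shard's index
        return self.shard_names[bisect.bisect_right(self.split_points, key)]


router = RangeRouter(["g", "n", "t"], ["shard0", "shard1", "shard2", "shard3"])
for user in ["alice", "mickey", "nemo", "zorro"]:
    print(user, "->", router.shard_for(user))
# alice -> shard0, mickey -> shard1, nemo -> shard2, zorro -> shard3
```

Because neighboring keys land on the same shard, range queries on the shard key touch few shards, but monotonically increasing keys (timestamps, counters) can pile all new writes onto the last shard.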

MongoDB can be deployed in the cloud on Amazon’s AWS. 10gen’s revenue model is based on support services, training, and consulting. Partners include VMware, Amazon, Red Hat, etc. – all cloud platform providers offering MongoDB as an option to their clients. Although the database suits document storage best, it can handle other unstructured data like video and images. But the initial thrust seems to be customers looking for high scalability on commodity hardware and superior performance.

MongoDB claims over 400 customers, including many Internet companies like Foursquare, Craigslist, etc. Several textbooks have been published on MongoDB, and the development community is growing fast. It certainly bridges the gap between traditional RDBMSs (Oracle, MySQL, SQL Server, DB2) at one end and key-value stores (Riak, Cassandra, Voldemort, ...) at the other end.

Apple Market Value exceeds $500B

Wow, Apple! Its market value exceeded $500B, and now everyone is speculating whether it will reach one trillion, a level no company has ever reached. As I look at the valuations this morning, Apple is at $505B. Microsoft is at just over half of that, at $266B. Look at the other big ones in the technology sector – Oracle ($146B), Amazon ($82B), Cisco ($107B), IBM ($229B), Intel ($134B), Google ($201B), and HP ($50B). Other stalwarts for comparison are Wal-Mart ($202B), GE ($202B), and Exxon Mobil ($408B).

Someone commented that if Apple were part of the Dow Jones Industrial Average, the index would have exceeded 14,000 a few months back. Apple is an American pride institution that symbolizes great creativity, innovation, and visionary leadership in planning and execution. Once written off as a “has-been”, Apple came back with a vengeance, largely due to the dynamic leadership and imagination of the late Steve Jobs. He married liberal arts and computer science in a blend of consumer products that reached the pinnacle of success. It puts the leadership at Microsoft and even Google on the back benches. Let us not even talk about HP.

Steve put together a great leadership team, and Apple will continue to grow for the next couple of years. They will announce the new iPad 3 next week, on March 7th. I am sure newer versions of the iPhone and MacBooks are in the pipeline. Their foray into television should begin to shake up that sector, much like what they did to music, smartphones, and tablets. For now, Tim Cook and team seem to be charging forward with the same vigor as their departed leader.

At a personal level, I was a user of all Apple products except the laptop, until I bought my first MacBook Air around Thanksgiving. Now I am in love with my MacBook Air, with its SSD and lightning-fast boot-up. Every time I go back to my old IBM ThinkPad running Windows, I feel time-warped into some prehistoric technology.

Hats off to Apple with its great success!

Leadership at IBM – Sam Palmisano

Sam Palmisano took over IBM’s CEO role back in 2002 from Lou Gerstner, who was brought in to restore IBM from its historic decline during the early 1990s. I was an employee of IBM for 16 years, until 1992. I left early in 1992, and by the end of the year people were asking me how I knew such a decline was coming. My answer was that I had no idea, and I had really agonized over the wisdom of leaving IBM, a great company by any measure.

Palmisano moved into the second phase of “value creation”, changing IBM’s course in several ways. He said that the world is becoming instrumented, interconnected, and intelligent, and hence IBM’s new solutions have to address that. Here are some of the key decisions during his tenure:

  • Acquisition of PricewaterhouseCoopers’ consulting business in 2002 for $3.5B, injecting business solutions services for large clients
  • Sale of the PC business to Lenovo for $1.75B back in 2004-2005, to get out of the commodity consumer hardware business and refocus on large enterprises
  • Sale of the hard disk drive business to Hitachi, which was not yielding great profits
  • Increasing the R&D budget by 20% to a whopping $6B per year, to refocus on innovation. A new research lab came up in Brazil, the ninth such lab across the globe.
  • Introduction of several initiatives, like Smarter Planet in 2008; the Watson supercomputer last year (which answers questions posed in natural language with speed, accuracy, and confidence, and which triumphed at Jeopardy! against two human champions); the Corporate Service Corps to address hard issues with clients across the globe; etc.
  • Cutting $8B in costs to make IBM operationally trim

What were the results of such moves? During his tenure, IBM’s earnings quadrupled, and its market cap rose above Microsoft’s last September (for the first time since 1996) to $214B, just behind Apple’s. So shareholders are happy, and even Warren Buffett (who had long avoided technology stocks) bought a significant chunk of IBM stock late last year.

Like a true leader in IBM’s tradition, Palmisano celebrated the 100th birthday of a great company and reminded everyone of the basic principles of its founder, Thomas Watson Senior. In October 2011, he named Virginia Rometty as his successor, taking over as CEO this month, with him serving as chairman of the board.

IBM’s sustainability as a great company is exemplary. Many of today’s high-flying Silicon Valley technology companies can learn from IBM’s leadership and value system.

Data Management, circa 2011

The world of Data Management has never been as vibrant as it is now. Only five years back, if you were to start a new database product company, VCs would have thought you were crazy. Why start something in an established market with 3 leaders – Oracle, IBM (DB2), and Microsoft (SQL Server)? Then we started to notice “specialized” appliance products such as Netezza (now IBM) and Greenplum (now EMC) crop up to focus on large-scale data analytics. This trend was soon followed by Oracle (Exadata) and now HP (Vertica).

But what I am talking about is a list of new companies backed by well-known VCs addressing the Data Management problems of the Internet era. We can roughly divide the data world into two – operational data management and analytic data management.

Within the operational data camp, there are three categories:

  1. Traditional RDBMS (read Oracle, DB2, SQL Server, Sybase, Ingres, MySQL, etc.) and NewSQL products addressing mostly MySQL scalability and performance issues (e.g. Clustrix, Drizzle, VoltDB, NimbleDB, MySQL Cluster, ...). I advise two companies in this category, ScaleDB and ScalArc.
  2. Traditional non-relational DBMS (Objectivity, Progress, Versant, etc.) and NoSQL, which has seen a lot of new activity. The NoSQL data management products deal with key-value stores, big tables, document data, or graph data. Examples include Couchbase, MongoDB, Riak, Voldemort, BerkeleyDB, Hypertable, HBase, Cassandra, GraphDB, etc. They handle a very large number of simple structures and use parallel computing for performance. Google invented the Map-Reduce model, which was reimplemented in open source as Hadoop, with HDFS as its file system.
  3. Distributed Data Grid and Cache technologies. Here Memcached emerged as an open-source caching layer widely used with MySQL and PHP applications. Other solutions include Terracotta, GigaSpaces, Oracle Coherence, etc. SAP is also pushing an in-memory solution called Hana.
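Applications typically use Memcached in a “cache-aside” pattern: read from the cache first, and on a miss query the database (for example, MySQL) and store the result with an expiry time. The Python sketch below uses a small in-process `TTLCache` class as a hypothetical stand-in for a Memcached client, so it runs without a server; the `get_user` helper, the fake database function, and the `user:<id>` key format are also illustrative assumptions.

```python
import time


class TTLCache:
    """In-process stand-in for a Memcached-style key-value cache with expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._store.pop(key, None)


def get_user(user_id, cache, db_lookup, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = db_lookup(user_id)  # e.g. a SELECT against MySQL
        if user is not None:
            cache.set(key, user, ttl_seconds)
    return user


lookups = []


def fake_db_lookup(user_id):
    """Hypothetical database query; records each call so cache hits are visible."""
    lookups.append(user_id)
    return {"id": user_id, "name": "Ada"}


cache = TTLCache()
get_user(7, cache, fake_db_lookup)
get_user(7, cache, fake_db_lookup)
print(len(lookups))  # 1: the second read was served from the cache
```

On writes, the usual companion step is to update the database and then delete (invalidate) the cached key, so the next read repopulates it with fresh data.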

The Analytic Data Management space has two categories:

  1. Non-relational (like Hadoop, MapR, Piccolo, Dryad, ...)
  2. Relational products like Infobright, Netezza, ParAccel, SAP Sybase IQ, Teradata, EMC Greenplum, HP Vertica, IBM InfoSphere, etc. The term Big Data applies here, typically to data volumes exceeding a petabyte. Social networking sites like Facebook and Twitter are dealing with this.

I have seen the acronym SPRAIN (Scalability, Performance, Relaxed Consistency, Agility, Intricacy, and Necessity) used to explain why the incumbents are inadequate to address the new challenges of unstructured data as well as Big Data.

These are exciting times for Data Management research and development.