Category Archives: Oracle

DB2 is 30 years old next month!

Darryl Taft’s article in eWeek reminded me that next month, on June 6th, IBM’s DB2 RDBMS product will celebrate its 30th anniversary. This has a personal significance for me. I was part of the DB2 planning team then, and on June 6th, 1983, I was in Lyon, France at the European user group meeting, ready to announce IBM’s new RDBMS on MVS called DB2. Interestingly, I had prepared two presentation decks: one for DB2, and the other for IBM’s database directions. The second one was in hand in case the announcement could not clear the IBM approval process in time. Luckily I was cleared to go with the announcement of the new production-ready RDBMS product called DB2 to run on the mainframe MVS platform. I still recall the excitement of doing that in front of 2000 people in the gastronomic capital of France, Lyon. Later that evening, the attendees were taken by bus to a Beaujolais winery for dinner.

Why was this significant? IBM Research had worked on a prototype called System R, which was commercialized on the VM platform under the name SQL/DS. Even though it supported the relational model and SQL, it lacked DBMS robustness in areas such as scalability, performance, and reliability. In the meantime, Oracle got started in 1977, and its first product, based on System R principles and SQL, was introduced in 1979 on DEC/VAX. There was a gap of four years when IBM did not have a commercial RDBMS on its flagship platform, MVS. The only DBMS on MVS was IMS, based on the hierarchical data model and the proprietary DL/1 language. One of the internal debates was about the positioning of the new RDBMS when IMS was so significant a revenue generator. I recall the “dual database strategy” presentation we used to give (which one to use when). One good thing about DB2 was that the bottom layer of the engine (buffering, locking, latching, backup-recovery, write-ahead log, etc.) drew a lot of lessons from the user experience of IMS. Hence DB2 had more industrial-strength features than its research cousin SQL/DS, as well as Oracle.

The next year, in 1984, I went to IBM’s Austin Lab for two years to lay the foundation for DB2 for the IBM PC (OS/2). Subsequently the development was shifted to the IBM Toronto Lab. I personally headed a team doing the early work of porting DB2 to Unix in 1990-91.

All this was done before the Internet was invented, when memory and disks were expensive commodities. Now the scene has changed a great deal and we see many new types of database engines coming to market to address the needs of extreme scale and huge volumes of data. IBM continues to be a leading player in the data management and analytics business.

It feels good to be part of that history. Happy birthday DB2.

NewSQL Meetup last week

I attended a meetup last week in Santa Clara on the topic “The Realities of NewSQL”. Three companies were represented in a panel discussion – Clustrix (Raj Bains), VoltDB (Scott Jarr), and TransLattice (Michael Lyle). Steve Baunach from Starview was the moderator.

This new category called NewSQL represents companies using the relational data model and SQL while delivering better scalability, performance, and high availability. Following the rise of the NoSQL community of companies, which brought a schema-less, object-oriented data model with relaxed consistency and scale-out on commodity servers, the NewSQL group claims similar scale-out, but with relational DB and SQL support.

Three claims stood out in their discussion – preserving the SQL skill base and the relational model of data that has dominated the landscape for the last 20-plus years; high scale-out by adding commodity servers (a weakness especially with MySQL); and better availability.

VoltDB deals with transaction processing (dominated by IBM and Oracle products) with very high throughput (due to the proliferation of devices as new data sources) and better performance. Their claim is that they have eliminated many unnecessary overheads from traditional RDBMS products by using in-memory techniques extensively.

Clustrix claims it has eliminated the sharding offered by NoSQL products (an extra burden on users if they have to manage it). Their mantra for success is scale-out on clusters – being able to handle high loads by adding commodity-scale servers. They specifically focus on the MySQL user base.

The TransLattice Elastic Database (TED) is a Relational Database Management System that provides ANSI-SQL support, the ACID transactions enterprise applications require, and the ability to scale-out across wide distances using ordinary Internet connections. It uses partitioning to split databases across nodes. This notion is not new and has been deployed by IBM and Oracle for many years.

It was unclear why existing users of IBM or Oracle products would adopt one of these products, as the incumbents are marching forward to scale-out models and improving TCO. The MySQL community has been using external products for scalability for a while, and that is understandable. But being part of Oracle Corporation, MySQL will see enhancements in its scalability offerings. Then there is SAP HANA, which claims big performance gains.

There are many companies under this umbrella – Clustrix, GenieDB, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, TransLattice, NimbusDB, etc. With the marketing noise of Big Data and Cloud, new companies are getting funded by the dozens. It is going to be a tough space to differentiate and become a winner.

NewSQL – What is it?

There has been a lot of discussion on NoSQL databases over the past couple of years. These databases do not use the Structured Query Language (SQL), the standard data manipulation language for relational databases such as Oracle, DB2, MySQL, Sybase, and SQL Server. The data model is closer to object-oriented data and hence fits well for documents or geospatial data. Being schema-less, they accommodate flexible data structures well, unlike their relational brethren. Examples of NoSQL databases are MongoDB (most popular), CouchDB, and Cassandra. Programming is easier, but rigid consistency is not guaranteed. They also have scale-out models with replication and sharding (partitioning) for speed. These products support multiple programming languages.

A new category called NewSQL databases is aiming to provide the scale-out advantages of NoSQL databases, and often their commodity hardware friendliness as well. But NewSQL databases maintain the transactional data consistency guarantees of traditional relational databases, as well as their compatibility with SQL for queries and connectivity (using technologies like ODBC and JDBC). One such product, NuoDB, believes that transactional, analytical and “Web scale” elastic workloads can be handled by the same database; it’s just a matter of making that the design goal. This is hard to believe until proven!
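To make the “transactional consistency” guarantee concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for any SQL database (it is not a NewSQL product), and the accounts table and transfer function are made up for illustration. The point is atomicity: if any step of a transfer fails, none of it is applied.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically (hypothetical schema)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it is rolled back
print(ok, failed, balances(conn))
```

The second transfer debits the source account before it notices the overdraft, yet the rollback leaves both balances exactly as they were after the first transfer. NoSQL products with relaxed consistency typically leave this kind of bookkeeping to the application.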

Another NewSQL product, VoltDB, also claims to bring ACID-compliant transactions with analytics. VoltDB focuses on using in-memory technology to perform in situ analysis on financial, clickstream, gaming, and other high-velocity data as it streams in. In the company’s own words, VoltDB is meant to “narrow the ‘ingestion-to-decision’ gap.” There is a growing need for instant analysis of transactional data (real-time BI).

You squander the value of transactional data unless you analyze it as it is being recorded. SAP said much the same thing recently, as it announced the availability of its Business Suite on its HANA in-memory data platform, and fellow NewSQL player NuoDB uses in-memory and asynchronous technology to facilitate similar real-time analyses. Other NewSQL database products include ScaleDB and Clustrix, addressing the scalability needs of MySQL customers. Most of these products are also offering their services in the cloud.

It seems a grand unification process is on its way. Conventional relational databases and NoSQL databases seem to be at opposite ends of a spectrum. NewSQL databases acknowledge the merits in both models and seek to eliminate unreasonable compromise by marrying the approaches. NewSQL products may thus win out, but traditional relational database players may also incorporate NoSQL and NewSQL features to stay competitive. Perhaps that’s why Microsoft announced in November last year that the next major release of its SQL Server relational database will include an in-memory transactional database engine, codenamed “Hekaton.”

Big Data – Status

According to a Wall Street Journal article today by Rachael King and Steven Rosenbush, the market for new databases serving Big Data reached $1.22B last year and is expected to more than double by 2014 (according to research firm Wikibon). That is quite impressive.

Since relational databases using SQL are inefficient in handling data from social chatter, smartphones, and clicks (because of volume and variety), new databases have been popping up over the last 3-4 years. In the past two years, 119 database software companies have been funded by VCs for $1.17B (according to VentureSource, a Dow Jones company). This is remarkable, as not too long ago the space was declared taken by 3 incumbents – IBM, Oracle, and Microsoft. However, the scene has changed dramatically now.

Thanks must go to Google for pioneering the new innovations in Bigtable, GFS (Google File System), and MapReduce algorithms for massively parallel processing using commodity hardware clusters. The ideas behind these technologies were reimplemented as open source under the Apache Software Foundation, and the result is Hadoop, HDFS, and several associated tools for the new ecosystem. Amazon, Yahoo and Facebook have also contributed good work here.

The article mentions a client, AutoZone, using one of the new DBMSs, called NuoDB, to better manage store inventory according to local shoppers. NuoDB, like many others, offers a cloud service with an annual subscription, cutting capex for customers.

Another client, Trulia (online real estate), was using MySQL, but has added Cassandra to better manage the home foreclosure and apartment listings among its 100 million homes in the US.

Shutterstock, a photo agency, stores 24 million images with 10,000 added each day. It uses HDFS (Hadoop) to analyze user behavior (how long they hover over an image before purchasing).

The article suggests that large financial clients will stick to existing vendors such as Oracle for various reasons, but the threat of these newcomers is there. This is much like the way cloud software is shaking up Microsoft’s desktop software model.

We are in the data-intensive computing era now and the race will be fierce for leadership and market share.

Five Questions around Big Data

Data is the new currency of business and we are in the era of data-intensive computing. Much has been written on Big Data throughout 2012 and customers around the world are struggling to figure out its significance to their businesses. Someone said there are 3 I’s to Big Data:

  • Immediate (I must do something right away)
  • Intimidating (what will happen if I don’t take advantage of Big Data)
  • Ill-defined (the term is so broad that I’m not clear what it means).

In this blog post, I would like to pose five key questions that customers must find answers to with regard to Big Data. So here goes.

1. Do I understand my data and do I have a data strategy?

There are many varieties of data – customer transaction data, operational data, documents/emails and other unstructured data, clickstream data, sensor data, audio streams, video streams, etc. Do I have a clear understanding of the 3 V’s of Big Data – Volume, Velocity, and Variety? What is data “in motion” vs. data “at rest”? Data in motion demands split-second decisions; do I have the tools for that? Every data source must be understood, along with its attributes and growth projections.

Customers must have an overall data strategy based on their business importance. For example, business critical data must be highly reliable, secure and of high performance. A data policy must be in place to take care of volume, growth, retention, security and compliance needs.

2. What are my reporting needs to transform my business and give me insights for growth?

Businesses are transforming to stay ahead of the competition. While we asked, “what happened” in the past, now it is “why did it happen and what is going to happen?”. From data collection, we have to move to data analysis. Instead of analyzing existing business, we must create new business. Therefore, the retail industry wants to give “today’s recommendation” on the fly to clients; internal IT needs operational intelligence to make it more efficient; customer service must provide customer insight; and fraud management must look at social profiles to reduce fraud. The list goes on…

Do you have a clear understanding of your reporting needs, such as data visualization on mobile devices like the iPad with its touch interface? You will need a strategy for the analytic tools that key employees and executives use to make quick business-relevant decisions.

3. How do I drastically reduce my TCO of Data Warehousing and BI?

Many large enterprises are spending millions of dollars to move operational data to a data warehouse via ETL tools (Extraction, Transformation, Loading). This can be expensive and time consuming. Sears, for example, has a slogan “ETL must die”. By moving to Hadoop, they reduced the ETL time from 20 hours to 17 minutes. They claim serious cost reductions by moving from traditional ETL to direct loading of raw data to Hadoop servers. Today’s implementations must be studied for price-performance and newer technologies can bring down costs and improve processing time drastically. Would you like to develop reports in days rather than weeks?
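To put the Sears figures in perspective, the arithmetic can be sketched in a few lines of Python. The only inputs are the two durations quoted above; the variable and function names are mine.

```python
def speedup(before_minutes, after_minutes):
    """How many times faster the new run is compared with the old one."""
    return before_minutes / after_minutes

before = 20 * 60  # the old ETL window: 20 hours, expressed in minutes
after = 17        # the Hadoop-based run: 17 minutes

factor = speedup(before, after)
saved_pct = (1 - after / before) * 100

print(f"{factor:.1f}x faster, {saved_pct:.1f}% less elapsed time")
```

That works out to roughly a 70-fold reduction in elapsed time, which is why a nightly batch window can turn into something you run several times a day.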

4. How does Big Data co-exist with my current OLTP and DW data?

All enterprises have business-critical operational systems (OLTP). These use traditional DBMSs (such as Oracle, DB2, IMS, etc.). They have also created separate Data Warehousing systems with BI tools for analysis. Now the new world of Internet data, such as chatter from social networks and Web log data (digital exhaust), is adding to the complexity. What is your approach to data integration of the legacy vs. new data?

5. What is the right technology for my needs?

I keep hearing so many new terms and vendor names – Hadoop, Cloudera, Hortonworks, Datameer, NoSQL, MongoDB, MapReduce, Data Appliance, HBase, etc. It surely can be very confusing!

I need to know what the right technology is for my needs. If I have petabyte volumes of data coming from various sources, what technology can I implement to handle that efficiently? Then, how do I get relevant information from that pile to give me business insights? I also need to know what skills I need to do that, and the cost. I need an implementation roadmap for getting value from all the data that my business is generating.

Tricky business of tech acquisition

The big news this week is the backfiring of HP’s acquisition of British software company Autonomy last year. HP paid a whopping $11.1B to acquire Autonomy even though its CFO was against the deal. HP took a charge of $8.8B, citing accounting improprieties by Autonomy to inflate the company’s value. Of course the question is why HP’s scrutiny did not catch that before making the big decision to acquire. This all happened under the last CEO, Leo Apotheker, even though current CEO Meg Whitman voted for it as a board member.

We have seen a few other acquisitions go wrong during the last few years. HP leads the list with two others – Palm and EDS. Palm was acquired in 2010 for $1.2B to get HP into the hand-held smart device business. But that did not work. The EDS acquisition in 2008 at $13.9B was aimed at competing against IBM’s Global Services. Now HP has taken an $8B charge and there is a rumor of a potential sale of that unit.

Microsoft acquired aQuantive in 2007 for $6.3B, but took a write-down of $6.2B in 2012. Cisco acquired Pure Digital (makers of the Flip Video camcorder) for $590M for reasons that remain unclear (to get into the consumer electronics business, far away from its core networking gear). It closed that business last year.

eBay acquired Skype under Meg Whitman’s watch in 2005 for $3.3B. Somehow the goal of adding voice communication to its core auction business did not pan out. Finally eBay sold that unit to Silver Lake Partners for $1.9B. Last year Microsoft bought Skype for $8.5B!

IBM and Oracle, on the other hand, seem to have acquired several companies successfully, adding to their growth in business and scope. The trick lies in the strategy group looking carefully at why such a move makes sense and how to blend the acquired product and technology into the existing fabric. HP now blames its chief technology guy, Shane Robison, who was instrumental in the Autonomy decision. But both Apotheker and Robison are gone from HP.

Shareholders of these public companies are an unhappy lot as the write-downs affect the stock value, as seen this week in HP’s stock price.

Workday IPO this morning

The seven-year-old company Workday, founded by PeopleSoft founder David Duffield, went public this morning and immediately jumped 71%. The IPO price was set at $28 and it is trading around $48 after 3 hours. The market cap is reaching a whopping $6.5B. This also makes Duffield very wealthy with his 44% ownership ($2.5B). The co-CEO Aneel Bhusri (Greylock Partners) is also a billionaire with his 17% stake in the company. Both started Workday after the hostile takeover of PeopleSoft by Oracle back in 2005. The total investment in Workday was over $200M and its revenue has been growing steadily, probably reaching over $500M at the end of next fiscal year.

Workday provides cloud-based human resources, payroll and financial management tools. So what is new? I think they have learnt from their deep experience at PeopleSoft and built this company to provide a great user experience at much lower cost. Offering this as SaaS reduces capex and also greatly reduces consulting expenses (by as much as 80%). Oracle or SAP applications require a hefty “services” cost, as consultants are brought in to customize and install the software. It is said that for every product license dollar, customers need to spend $6 to $7 in consulting. Workday aims at replacing legacy in-house packaged applications such as SAP and Oracle. Recently companies like HP and Google have endorsed Workday for their internal use.

Workday started in the human resource area, but is expanding to financial management and eventually to ERP. The secret is in the architecture and design and ease-of-use. By making it multi-tenant and fast and using touch UI via tablets, they appear very modern and attractive. They have spent the time carefully building this product for the enterprise user, used to very archaic interfaces of charts and graphs and complex management.

Oracle and SAP are not sitting idle. In the human resource area, Oracle has bought Taleo (offered as a cloud service) and SAP bought SuccessFactors. The other set of competition comes from SaaS companies such as Salesforce.com, NetSuite, and several niche players. It will be interesting to see how Workday manages its growth over the next 2-3 years.

In-memory Database – Oracle’s Exadata X3 vs. SAP’s HANA

This week at the Oracle OpenWorld conference, Larry Ellison announced the new Exadata X3 database machine, which has 4TB of DRAM plus 22TB of Flash (SSD) memory. Therefore, he said, you could have 26TB of in-memory data for fast processing at very fast write speed (1M writes per second). Clearly this is aimed at SAP’s in-memory database product HANA, which has been shipping for the last 6 months. Larry, in his typical style, derided HANA as one with 0.5TB of DRAM and therefore not worth comparing to the X3.

Subsequent to this announcement, Vishal Sikka, SAP’s CTO and head of HANA development, wrote a blog post calling Oracle’s claim false and misleading. He says that the 22TB of SSD does not count as memory, and that HANA also has such SSD, for persistence. He says, “We are presently shipping, for the last several months, certified 16-node HANA hardware made by 4 vendors: IBM, HP, Fujitsu and Cisco. These systems are available for 16TB of DRAM, so they are already 4 times bigger than Oracle’s machine, and they have been in the market since spring of this year. The machines can take up to 32TB of DRAM, within their current configurations. In IBM’s case, with the Max5 configuration, they can go up to 40TB.”

During SAP’s annual conference Sapphire last May, they demonstrated the largest HANA system built so far – an IBM cluster running 100TB of DRAM and 4000 CPU cores. Already today this system can go up to 250TB of DRAM (and with HANA’s compression, can hold multiple petabytes of data entirely in-memory).

SAP is not into hardware, unlike Oracle, which (after its Sun acquisition) is quite motivated to make the hardware business succeed by creating its “engineered” systems. SAP gets its hardware clusters from four vendors, and IBM is the strongest partner (even though it competes in the database space). SAP claims that HANA is not merely an in-memory database system; it provides many additional functions such as real-time analytics.

Oracle will be a formidable competitor, as they have the longest experience in managing data. Now they are shifting to providing platform as a service with database, analytics, application development and social services. The game is not merely about speeds and feeds, but several other dimensions. SAP claims rapid adoption of HANA in the few months since its introduction. It is hard to compare, as there are no benchmark performance numbers.

The market will be the best judge of who is better. There are many camps now. The NoSQL camp is denouncing all the traditional database vendors as incapable of handling the large volume of unstructured data (Big Data). The initial target of both SAP and Oracle seems to be their existing customer bases, who will find a logical upgrade path to these in-memory database solutions for speed and scalability.

Closer look at one NoSQL database – MongoDB

Among the new crop of NoSQL database products, MongoDB ranks quite high, in my opinion. The company that produces MongoDB is 10gen, a venture-backed start-up since 2008. Its rapid growth over the last 4 years bears testimony to its technical strength.

MongoDB’s name comes from the middle five letters of the word “humongous”, meaning big data. It is an open-source, document-oriented store which is schema-free and supports dynamic queries with full indexing. Its data format is BSON – a binary encoding of JSON (JavaScript Object Notation), a lightweight text-based open standard designed for data interchange. Douglas Crockford of Yahoo specified JSON, which was published as RFC 4627 in 2006.
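To show what “schema-free documents” and “dynamic queries” mean in practice, here is a small sketch in Python. It uses plain JSON text from the standard library rather than BSON (the binary encoding needs a third-party package), and the documents and the find helper are hypothetical; this is not MongoDB’s API.

```python
import json

# Two hypothetical documents in the same collection; note the differing fields.
posts = [
    {"title": "Happy birthday DB2", "tags": ["db2", "ibm"], "year": 2013},
    {"title": "NewSQL meetup", "panel": {"moderator": "Steve Baunach", "size": 3}},
]

encoded = [json.dumps(p) for p in posts]    # text form used for interchange
decoded = [json.loads(s) for s in encoded]  # round-trips to the same structure

def find(docs, field, value):
    """Dynamic query: match on a field that a document may or may not have."""
    return [d for d in docs if d.get(field) == value]

print(find(decoded, "year", 2013))
```

No table definition was needed up front, and the second document simply does not match a query on a field it lacks. A relational table would have required a schema change or a column of NULLs for the same flexibility.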

The other key tenet of MongoDB is its scalability architecture – it can scale out horizontally using its automatic “sharding” (or key-range partitioning). It also provides master-slave or peer-to-peer replication for high availability, recovery, and performance. One of its customers, Disney’s Interactive Media Group, for example, has 1400 instances of Mongo. It uses sharding for write performance and replication for read performance.
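The idea of “sharding for writes, replication for reads” can be sketched as a toy in-memory store in Python. Everything here (the ShardedStore class, its split points, the round-robin read policy) is my own illustration under simplifying assumptions, not how MongoDB is implemented; real systems also handle failover and rebalancing, which this ignores.

```python
import bisect

class ShardedStore:
    """Toy key-range sharding: each shard owns a contiguous range of keys."""

    def __init__(self, split_points, replicas=2):
        # Split points ["h", "p"] give three shards:
        # keys < "h", "h" <= keys < "p", and keys >= "p".
        self.split_points = sorted(split_points)
        # Each shard is a list of replica dicts holding identical copies.
        self.shards = [
            [{} for _ in range(replicas)]
            for _ in range(len(self.split_points) + 1)
        ]
        self._next_read = 0

    def shard_for(self, key):
        """Index of the shard whose key range contains `key`."""
        return bisect.bisect_right(self.split_points, key)

    def put(self, key, value):
        # A write touches only one shard, so write load spreads across shards.
        for replica in self.shards[self.shard_for(key)]:
            replica[key] = value  # copied to every replica at once (a simplification)

    def get(self, key):
        # Reads rotate across replicas, so read load spreads across copies.
        replicas = self.shards[self.shard_for(key)]
        self._next_read = (self._next_read + 1) % len(replicas)
        return replicas[self._next_read].get(key)

store = ShardedStore(["h", "p"])
for name, value in [("apple", 1), ("mango", 2), ("zebra", 3)]:
    store.put(name, value)

print([store.shard_for(k) for k in ("apple", "mango", "zebra")])
```

Adding a split point adds a shard and so more write capacity, while raising the replica count adds read capacity and a spare copy for recovery.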

MongoDB can be deployed from the cloud via Amazon’s AWS. Their revenue model is based on support services, training, and consulting. Partners include VMware, Amazon, Red Hat, etc. – all cloud platform providers offering MongoDB as an option to their clients. Although the database suits document storage best, it can handle other unstructured data like video and images. But the initial thrust seems to be toward customers looking for high scalability using commodity hardware and superior performance.

MongoDB claims over 400 customers, including many internet companies like Foursquare, Craigslist, etc. Several textbooks have been published on MongoDB and the development community is growing fast. It certainly bridges the gap between traditional RDBMS (Oracle, MySQL, SQL Server, DB2) at one end and key-value stores (Riak, Cassandra, Voldemort, ...) at the other end.

Apple Market Value exceeds $500B

Wow Apple! The market value exceeded $500B and now everyone is speculating whether it will reach one trillion, a level no company has ever reached. As I look into the valuation this morning, Apple is at $505B. Microsoft is at almost half of that, at $266B. Look at the other big ones in the technology sector – Oracle ($146B), Amazon ($82B), Cisco ($107B), IBM ($229B), Intel ($134B), Google ($201B), and HP ($50B). Other stalwarts for comparison are Wal-Mart ($202B), GE ($202B), and Exxon Mobil ($408B).

Someone commented that if Apple were part of the Dow Jones, the index would have exceeded 14,000 a few months back. Apple is an institution of American pride that symbolizes great creativity, innovation, and visionary leadership in planning and execution. Once written off as a “has-been”, Apple came back with a vengeance, largely due to the dynamic leadership and imagination of the late Steve Jobs. He married liberal arts and computer science into a blend of consumer products that reached the pinnacle of success. It puts the leadership at Microsoft and even Google on the back benches. Let us not even talk about HP.

Steve put together a great leadership team and Apple will continue its growth for the next couple of years. They will announce the new iPad 3 next week on March 7th. I am sure newer versions of the iPhone and MacBooks are in the pipeline. Their foray into television should begin to shake up that sector, much like what they did to music, smartphones and tablets. For now, Tim Cook and team seem to be charging forward with the same vigor as their departed leader.

At a personal level, I was a user of all Apple products except the laptop, until I bought my first MacBook Air around Thanksgiving. Now I am in love with my MacBook Air with its SSD and lightning-fast boot-up. Every time I go back to my old IBM ThinkPad running Windows, I feel I am in a time warp of some prehistoric technology.

Hats off to Apple with its great success!