The rise of private equity in technology

Last week, the public company Informatica agreed to be acquired by two private equity investors – the Permira funds and the Canada Pension Plan Investment Board (CPPIB) – for $5.3B. This is the biggest leveraged buyout so far this year.

I am happy for my friend Sohaib Abbasi (we were colleagues at Oracle during the 1990s), who became CEO of Informatica after serving on its board for a couple of years. During Sohaib’s tenure, the company took on a bigger role in data archiving and information life cycle management, and it also made progress in offering cloud-based services.

Gaurav Dhillon (now founder and CEO of SnapLogic) founded Informatica back in 1992 and was its CEO for 12 years, growing it into a $300M company after a successful IPO. The company was created during the rise of data warehousing, when every project needed a component called ETL (Extract, Transform, Load) – the process of cleansing data from operational systems and getting it ready for analytics. I used to call this the “twenty-five years of sin” that needed correcting!

Informatica helps companies integrate and analyze data from various sources; it counts Western Union, Citrix Systems, American Airlines Group, and Bank of New York Mellon among its customers. It competes with TIBCO, which was taken private for $4.3 billion in December 2014 by the private equity firm Vista Equity Partners. Dhillon thinks his new company SnapLogic is better off for seeing two of its competitors (Informatica and TIBCO) shunted off to the land of private equity, which will squeeze these companies for profit. This is financial engineering at its best, and it tends to hurt customers and long-term employees while rewarding top management.

Many people believe that the private equity players will eventually sell Informatica to a big technology player, much as Crystal Decisions was acquired by Business Objects (now part of SAP) for $1.2B in 2003. The model seems to be: take a struggling public company private, work on improving its margins and value, then sell it back to a sugar daddy and make a hefty profit. We saw that happen with Skype as well (from eBay to private equity to Microsoft).

The timing might be really good for this, because the areas Informatica specializes in are key touch points within the enterprise: data quality, data security, and data integration in support of big data projects. That explains the high $5.3B valuation.

Other start-ups that have taken private equity money in recent times include Cloudera and MongoDB; private equity firms provide an alternative funding source to the traditional VCs.

Congratulations to Michael Stonebraker on winning the 2014 ACM Turing Award

This week, the 2014 ACM Turing Award was given to Michael Stonebraker, professor of computer science and engineering at MIT. Mike spent 29 years at the University of California, Berkeley, joining as an assistant professor after completing his Ph.D. at the University of Michigan in 1971; his undergraduate degree was from Princeton University. Since 2000, he has been at MIT. He is a remarkable researcher who has pioneered many frontiers in database management. Personally, I interacted with him several times during my days at IBM and Oracle, and we even spoke on the same panel at a couple of public forums during the 1990s.

The award citation reads, “Michael Stonebraker is being recognized for fundamental contributions to the concepts and practices underlying modern database systems. Stonebraker is the inventor of many concepts that were crucial to making databases a reality and that are used in almost all modern database systems. His work on INGRES introduced the notion of query modification, used for integrity constraints and views. His later work on Postgres introduced the object-relational model, effectively merging databases with abstract data types while keeping the database separate from the programming language.”

The ACM Turing Award is considered the “Nobel Prize of computer science” and is named after the British mathematician Alan Turing. The first award was given in 1966; in recent years it has come with a citation and a $250,000 cash prize. Since last year, Google has sponsored the award and lifted the prize to $1 million. Many stalwarts like Charles Bachman (1973, for inventing the concept of a shared database), Edgar Codd (1981, for pioneering the relational database model), and Jim Gray (1998, for seminal work on databases and transaction processing) have been honored with the Turing Award. Mike Stonebraker joins this illustrious group.

What makes Mike special is that his research has culminated in many product companies, as the following (partial) list shows:

  • Ingres – an early relational database based on Dr. Codd’s (IBM) relational data model.
  • Postgres – object-relational database, the basis for products like Aster Data (part of Teradata) and Greenplum (part of EMC).
  • Illustra – object-relational database sold to Informix (now part of IBM) during the 1990s
  • Vertica – columnar data store, sold to HP in 2011
  • StreamBase – stream-oriented data store
  • Goby – data integration platform
  • VoltDB – in-memory database with high-speed transaction processing
  • SciDB – scientific data management
  • Tamr – data unification and curation across a wide variety of sources

He has publicly derided the NoSQL movement, mainly for its relaxed integrity (ACID) approach, which he calls a fundamental flaw. He also said in a recent interview, “IBM’s DB2, Oracle, and Microsoft’s SQL Server are all obsolete, facing a couple of major challenges. One is that at the time, they were designed for business data processing. But now there is also scientific data, and social media, and web logs, and you name it! The number of people with database problems is now of a much broader scope. Second, we were writing Ingres and System R for machines with a small main memory, so they were disk-based – they were what we call ‘row stores.’ You stored data on disk record by record by record. All major database systems of the last 30 years look like that – Postgres, Ingres, DB2, Oracle DB, SQL Server – they’re all disk row stores.” He says in-memory processing has become quite economical and is the trend for the future. He is a bit self-serving here, as his company VoltDB is built on that principle.
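To make the row-store versus column-store distinction concrete, here is a minimal, illustrative Python sketch (not tied to any particular product, and the table and field names are my own) showing the same data laid out both ways and why a column layout favors analytic scans:

```python
# Illustrative only: the same tiny "sales" table in a row-oriented and a
# column-oriented layout, to show why analytic scans favor columns.

# Row store: each record is kept together (good for OLTP-style
# "fetch or update one record" access patterns).
rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 42.0},
]

# Column store: each attribute is kept together (good for analytic
# queries that touch a few columns across many records).
columns = {
    "order_id": [1, 2, 3],
    "customer": ["acme", "globex", "acme"],
    "amount": [120.0, 75.5, 42.0],
}

# Analytic query: total amount. The row layout walks every field of
# every record; the column layout reads just the "amount" column.
total_from_rows = sum(r["amount"] for r in rows)
total_from_columns = sum(columns["amount"])

assert total_from_rows == total_from_columns
print(total_from_rows)
```

In a real column store the per-column layout also enables heavy compression and vectorized scans, which is what makes systems like Vertica fast for analytics.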

Mike thinks Facebook has the biggest database challenge, with its “social graph” model growing in size at an alarming speed. The underlying data store is MySQL, which cannot handle such a load on its own, so Facebook has to come up with highly scalable, innovative solutions – mostly home-grown, since no commercial product can handle that kind of load.

Mike Stonebraker is a legend in database research and the Turing award is well-deserved for such a pioneer. Congratulations!

Big Data Visualization

Recently I listened to a discussion on Big Data visualization hosted by Bill McKnight of McKnight Consulting Group. The panelists agreed that Big Data is shifting from hype to an “imperative.” Start-up companies are running more Big Data projects, whereas true big data is still a small part of enterprise practice, though at many companies Big Data is moving from POC (Proof of Concept) to production. Interest in visualizing data from different sources is certainly increasing, and there is growth in data-driven decision-making, as evidenced by the increasing use of platforms like YARN, Hive, and Spark. The traditional RDBMS platform cannot scale to meet the needs of the rapidly growing volume and variety of Big Data.

So what is the difference between data exploration and data visualization? Data exploration is more analytical and is used to test hypotheses, whereas visualization is used to profile data and is more structured. The suggestion is to bring visualization to the beginning of the data cycle (not the end) to enable better data exploration. For example, in personalized cancer treatment, finding and examining white blood cell counts and cancer cells can be done up front using data visualization. In Internet e-commerce, billions of rows of data can be analyzed to understand consumer behavior; one customer uses Hadoop and Tableau’s visualization software to do this. Tableau enables visualization of all kinds of data sources across three scenarios – cold data from a data lake on Hadoop (where source data can be kept in its native format), warm data from a smaller extracted set, or hot data served in memory for faster processing.

Data format can be a challenge. How do you visualize NoSQL data? For example, JSON data (as supported by MongoDB) is nested and schema-less, which is hard for BI tools to consume. Understanding the data is crucial, and flattening of nested hierarchies will be needed; nested arrays can be broken out into child tables linked by foreign keys, as the sketch below shows. Graph data is another special case, where visualizing the right amount of graph data is critical for a good UX.
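As a minimal sketch of that flattening idea (the document shape and field names here are hypothetical, plain Python, no particular BI tool or database assumed), a nested JSON document can be split into a flat parent row plus child rows linked back by a foreign key:

```python
import json

# Hypothetical nested, MongoDB-style document (illustrative only).
doc = json.loads("""
{
  "order_id": 1001,
  "customer": {"name": "Acme", "country": "US"},
  "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1}
  ]
}
""")

# Flatten the nested object into a single parent row a BI tool can read.
order_row = {
    "order_id": doc["order_id"],
    "customer_name": doc["customer"]["name"],
    "customer_country": doc["customer"]["country"],
}

# Break the nested array out into child rows that reference the parent
# via a foreign key (order_id), i.e. a one-to-many relationship.
item_rows = [
    {"order_id": doc["order_id"], "sku": item["sku"], "qty": item["qty"]}
    for item in doc["items"]
]

print(order_row)
print(item_rows)
```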

Apache Drill is an open-source, low-latency SQL query engine for Hadoop and NoSQL. Modern big data applications such as social, mobile, web, and IoT deal with a larger number of users and a larger amount of data than traditional transactional applications. The datasets associated with these applications evolve rapidly, are often self-describing, and can include complex formats such as JSON and Parquet. Apache Drill is built from the ground up to provide low-latency queries natively on such rapidly evolving, multi-structured datasets at scale.
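As a rough sketch of what this can look like in practice, the snippet below sends a SQL query for a raw JSON file to a local Drill instance over HTTP from Python. The port, endpoint path, storage-plugin name, and file path are assumptions based on a default local installation and would need to match a real deployment:

```python
import requests  # third-party HTTP client library

# Assumption: a local Drill instance with its web/REST interface on the
# default port, and a hypothetical JSON file reachable via the dfs plugin.
DRILL_URL = "http://localhost:8047/query.json"

sql = """
SELECT t.user_id, COUNT(*) AS events
FROM dfs.`/data/clickstream/2015-04-01.json` t
GROUP BY t.user_id
LIMIT 10
"""

# Submit the query as a JSON payload and print whatever Drill returns;
# the exact response shape depends on the Drill version.
resp = requests.post(
    DRILL_URL,
    json={"queryType": "SQL", "query": sql},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```

The point is that no schema has to be declared up front: Drill infers the structure of the JSON file at query time.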

Apache Spark is another exciting new approach that speeds up queries by utilizing memory. It consists of Spark SQL (SQL-like queries), Spark Streaming, MLlib, and GraphX, and it offers APIs in Python, Scala, and Java. It lets Hadoop users have more fun with data analysis and visualization.
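As a rough illustration (the file path and field names are hypothetical, and the SparkSession API shown comes from later Spark releases – 2015-era code would use SQLContext), a small PySpark job can load JSON and run a Spark SQL aggregation whose compact result is then handed to a visualization tool:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; on a cluster the master URL and
# resources would come from the deployment configuration instead.
spark = SparkSession.builder.appName("viz-prep").getOrCreate()

# Hypothetical input: newline-delimited JSON events with user_id/amount.
events = spark.read.json("/data/events/*.json")

# Expose the DataFrame to Spark SQL and aggregate it in memory; the
# summarized result is small enough to feed a dashboard or chart.
events.createOrReplaceTempView("events")
summary = spark.sql("""
    SELECT user_id, COUNT(*) AS events, SUM(amount) AS total_amount
    FROM events
    GROUP BY user_id
    ORDER BY total_amount DESC
    LIMIT 100
""")

summary.show()
spark.stop()
```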

Big Data Visualization is emerging to be a critical component for extracting business value from data.

Lambda Architecture

I attended a Meetup yesterday in Mountain View, hosted by The Hive group, on the subject of Lambda Architecture. Since I had never heard the phrase before, my curiosity took me there. There was a panel discussion, with panelists from Hortonworks, Cloudera, MapR, Teradata, and others.

Lambda Architecture is a useful framework to think about designing big data applications. Nathan Marz designed this generic architecture to address common requirements for big data, based on his experience working on distributed data processing systems at Twitter. Some of the key requirements behind this architecture include:

  • Fault-tolerance against hardware failures and human errors
  • Support for a variety of use cases that include low latency querying as well as updates
  • Linear scale-out capabilities, meaning that throwing more machines at the problem should help with getting the job done
  • Extensibility so that the system is manageable and can accommodate newer features easily

The following picture summarizes the framework.

Overview of the Lambda Architecture

The Lambda Architecture, as seen in the picture, has three major components.

  1. Batch layer that provides the following functionality
    1. managing the master dataset, an immutable, append-only set of raw data
    2. pre-computing arbitrary query functions, called batch views.
  2. Serving layer – indexes the batch views so that they can be queried ad hoc with low latency.
  3. Speed layer – accommodates all requests that are subject to low-latency requirements. Using fast, incremental algorithms, the speed layer deals with recent data only (a toy sketch of how the layers fit together follows below).
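Here is a minimal, toy Python sketch of the idea (the function and variable names are my own, not from Marz’s book): the batch layer periodically recomputes views over the immutable master dataset, the speed layer maintains an incremental view over recent events only, and a query merges the two.

```python
from collections import Counter

# Immutable, append-only master dataset (input to the batch layer).
master_events = [("page_a", 1), ("page_b", 1), ("page_a", 1)]

# Events that arrived after the last batch run (input to the speed layer).
recent_events = [("page_a", 1), ("page_c", 1)]


def batch_view(events):
    """Batch layer: recompute the full view from scratch (high latency)."""
    view = Counter()
    for key, count in events:
        view[key] += count
    return view


def speed_view(events):
    """Speed layer: incrementally count only recent data (low latency)."""
    view = Counter()
    for key, count in events:
        view[key] += count
    return view


def query(key, batch, speed):
    """Serving layer: merge the precomputed batch view with the speed view."""
    return batch[key] + speed[key]


batch = batch_view(master_events)    # recomputed periodically
speed = speed_view(recent_events)    # updated continuously
print(query("page_a", batch, speed))  # -> 3
```

Even in this toy form, the duplication is visible: the batch and speed paths implement the same counting logic twice, which is exactly the maintenance burden the criticism below points at.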

Criticism of lambda architecture has focused on its inherent complexity and its limiting influence. The batch and streaming sides each require a different code base that must be maintained and kept in sync so that processed data produces the same result from both paths. Yet attempting to abstract the code bases into a single framework puts many of the specialized tools in the batch and real-time ecosystems out of reach.

The panelists rambled through details without addressing the real challenge of combining two very different approaches, which risks compromising the benefits of streaming with the added latency of the batch world. Still, there is merit to the idea of unifying the two disparate worlds into a common framework. Real deployments will be the proof point.

Amazing Apple with its record-breaking earnings!

Yesterday Apple disclosed its quarterly financial results, and they were simply astounding, sending Wall Street analysts scrambling over their way-off forecasts. In the last quarter of 2014, Apple made a stunning profit of $18B (38% growth from a year ago) on revenue of $74.6B – the largest quarterly profit ever recorded by any company! During the quarter, iPhone sales reached nearly 75 million units, with hefty growth in the China market, and iPhone revenue reached $51.18B (about 70% of total revenue) – more than Google’s and Microsoft’s revenue combined for that quarter. As Tim Cook said, they sold 34,000 iPhones every hour, 24 hours a day, every day of the quarter. Apple is now the most valuable company on the planet!

The sales were fueled largely by the larger-screen iPhone 6 and 6 Plus, introduced last fall. They carried higher prices, and demand exceeded supply, as usual. This growth again hurt tablet sales, which declined, clearly showing customers’ preference for large-screen smartphones with more storage. Interestingly, Mac sales also went up during the quarter.

Apple now sits on $178B in cash, and maintaining this growth will be a challenge. But the new Apple Watch is coming out in April, adding a new revenue stream; hopefully it will be another product success in a new category. What one must appreciate is the relentless excellence in quality and delivery, in addition to the unique design of Apple’s products. Unlike Samsung, Apple figured out how to crack the China market, which has a huge appetite for large-screen smartphones such as the iPhone 6 and 6 Plus.

Hats off to Apple!

Big Data coverage at CES 2015

I saw more discussion of big data at CES 2015 this week than in previous years. Everyone talked about data as the central core of everything. IoT (Internet of Things), NFC (Near Field Communication), and M2M (Machine to Machine) communication are enabling pieces for many industries – security monitoring, asset and inventory tracking, healthcare, process control, building environment monitoring, vehicle tracking and telemetry, in-store customer engagement, and digital signage. Big data is the big deal here.

The Big Data ecosystem includes cloud computing, M2M/IoT, “dumb terminal 2.0” (devices getting dumber – more cloud, better broadband, less about storage and more about broadband access and a high-quality display), and analytics. The big data opportunity is slated to be a $200B business in 2015. Every company must build the big data ecosystem into its future roadmap or get left behind. The key here is not the technology, but its business value.

The progression goes like this: Big Data -> Big Info -> Big Knowledge -> Big Insight. For example, Big Data says “60” (not much meaning); then Big Info says “Steve is 60,” adding context; then Big Knowledge says “Steve can’t hear very well,” followed by Big Insight like “maybe we give Steve a hearing aid,” an actionable item. So we go from Big Data to Big Insight, which becomes very useful. Several industry examples can be given:

  • Retail iBeacon technology – Apple’s technology allows smartphones to be tracked geographically. This provides vector information about shoppers and hence allows for a predictive service experience, in combination with smart mirrors.
  • Insurance companies – by collecting information on drivers’ behavior, premiums can be adjusted per individual.
  • Medical event tracking – big data has a crucial role here, providing relevant information per patient.
  • Asset tracking in oil fields – can help reduce costs and increase efficiency.
  • Smart cities – with systems like San Francisco’s SFpark, every sensor-based parking space can be used efficiently; you can use your smartphone to find available parking quickly.
  • and many more…

Big Data is at the heart of it all – efficiently ingesting, storing, processing, and managing unstructured data and providing meaningful analysis. To use an oil industry analogy, over the next 3-5 years we will see Big Data as the crude oil and analytics as the new refinery.

From CES 2015 – Disruptive Technologies over next five years

This year’s CES expects 160,000 attendees, and tonight’s keynote by Samsung CEO BK Yoon was on “unlocking the infinite possibilities of IoT.” The Internet of Things seems to be the overall theme this year.

Today I listened to an interesting panel on disruptive technologies over the next five years. Here is a brief summary.

  1. 3D printing: The US is expected to see 300,000 desktop 3D printers this year, but mainstream consumer adoption is doubtful. Someone jokingly said that you can print a statue of yourself and install it in your yard. Another term for 3D printing is additive manufacturing. Most likely it will be adopted by small businesses providing repair services (by building plastic parts for a washing machine, for example). Many such 3D printing devices are on display.
  2. Wearables: This is a diverse market of connecting the unconnected (a $2B market). Healthcare seems to lead the usage via health and fitness wearables, such as the Apple Watch. There are two values – the quantified self (with context) and notification bits (of relevance). This technology will be quite disruptive over the next five years. Apple explained what a wearable can be in its announcement last year; if it can galvanize the developer community, huge value will be realized. Just as many PC functions migrated to the smartphone, we will see a similar migration of smartphone functions to the wearable (e.g. notifications, alerts, short messages).
  3. Drones: This is similar to 3D printing, with questionable mass adoption; serious adoption may take place over the next ten years. Immediate applications may be video photography and surveillance. There are many regulatory and policy hurdles to clear before drones can be mainstream.
  4. Self-driving cars: Here engineering is way ahead of the policy curve. While full adoption may not happen in the near future, semi-autonomous systems can help – tasks such as self-parking and adaptive cruise control can be turned over to the car. The panel felt that the next five years will be the “preparation phase” and adoption will come in ten years.

Other topics covered included the huge growth in Internet users, from 2B now to over 5B, which will bring new cultural, political, and economic ramifications. Smartphones will continue to be disruptive, with newer and newer uses across the world impacting our daily lives. Robotics, especially home robots performing several tasks, will become relevant.

The big question was about the ownership of the data created by all these devices. This year’s CES has a bigger presence of automobile companies, and both BMW and Mercedes-Benz executives appeared in keynotes. The connected home and the connected car have a bigger presence as well.