What is the “3rd Platform” of IT? It comprises the cloud, mobile, social, and big data products. According to IDC, “3rd Platform technologies and solutions will drive 29 percent of 2014 IT spending and 89 percent of all IT spending growth”. Much of that growth will come from the “cannibalization” of traditional IT markets. Here are some interesting quotes and statements I read recently.
- Adding terabytes to a Hadoop cluster is much less costly than adding terabytes to an enterprise data warehouse (EDW).
- IDG Enterprise’s 2014 Big Data survey: more than half of the IT leaders polled believe they will have to re-architect the data center network to some extent to accommodate big data services.
- “Big data has the same sort of disruptive potential as the client-server revolution of 30 years ago, which changed the whole way that IT infrastructure evolved. For some people the disruption will be exciting and for others, it will be threatening.” – Marshall Presser, CTO at Pivotal
- The traditional IT infrastructure was designed to help the CFO close the company’s books faster than the manual accounting systems that preceded IT. A surprising number of those original systems are still kicking around, adding to the pile of “legacy spaghetti” that CIOs love to complain about.
- We are seeing a bumpy transition from the old kind of IT that faced mostly inward, to a new kind of IT that mostly faces outward.
- After years of resistance, IT is following the nearly universal business trend of replacing “product-centricity” with “customer-centricity”.
- One key challenge is rapidly scaling systems to meet unexpected levels of demand. “I call it the ‘curse of success’ because if the market suddenly loves your product, you have to scale up very quickly. Those kinds of scaling problems are difficult to solve, and there isn’t a universal toolkit for achieving scalability on the Internet of Things. When Henry Ford needed to scale up production, he could add another assembly line.” (Jordan Husney, Strategy Director, Undercurrent)
- “HDFS is a complete replacement for not just one, but four different layers of the traditional IT stack. The HDFS ecosystem does storage, processing, analytics, and BI/visualization, all without moving the data back and forth from one system to another. It is a complete cannibalization of the existing stack.” (Abhisek Mehta, Founder, CEO of Tresata). My view is that this only applies to the analytics side, not the transaction-processing aspect of business.
- API-ification of the Enterprise – “Not only do we have to change the infrastructure, we have to fundamentally change the way we build applications. Hundreds of millions of new applications will be built. Some of them will be very small, and very transient. Traditional IT organizations – along with their tooling, approaches, and processes – will have to change. For IT, it’s going to be a different world. We’re seeing the ‘API-ification’ of the enterprise.” (Rick Bullota, Cofounder and CTO, ThingWorx)
All these observations are interesting, but they must be taken with the proper scope in mind. There is a tendency to sensationalize and to generalize too quickly. The 3rd Platform is real, and Big Data is certainly changing the IT landscape. The only open question is the velocity of that change!
I was invited to speak at an Oracle NoSQL and Big Data Meetup last night. Here is the link for the event. I kicked it off with a broad picture of the Big Data landscape, trying to clear up some of the confusion around the variety of terms – NoSQL, NewSQL, data warehousing appliances, data exhaust, M2M, Hadoop, data visualization, machine learning, etc. Then Dave Rubin from Oracle presented their latest release, Oracle NoSQL 3.0, which was announced this week. Oracle acquired Sleepycat, the keeper of BerkeleyDB, the key-value store that originated at UC Berkeley during the 1990s. Mike Olson, who founded Cloudera, was one of the key developers. Then I presented MongoDB Momentum, showing seven examples of actual customer usage of MongoDB to solve business problems.
Oracle’s new release adds a tabular model on top of the key-value store, along with secondary indexing, a shard key, and several operational features. It is not surprising to see Oracle push a table view that aligns with its RDBMS. Time will tell how useful that is, since much of the NoSQL experience has been about moving away from the two-dimensional row-column view. But it may be a co-existence statement by Oracle to its customer base.
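To make the idea concrete, here is a minimal, purely illustrative sketch (not Oracle’s actual API) of how a tabular row can be layered on a key-value store: the table name plus primary key becomes the key, the remaining columns become the value, and a secondary index maps column values back to keys. All names here are hypothetical.

```python
# Illustrative sketch: a toy "table" layered on a plain key-value store.
# Not Oracle NoSQL's real API -- just the layering idea described above.
import json

class KVTable:
    def __init__(self, name, shard_key):
        self.name = name
        self.shard_key = shard_key   # column used to route rows (and form keys)
        self.store = {}              # stand-in for the distributed KV store
        self.secondary = {}          # secondary index: (column, value) -> keys

    def put(self, row):
        key = (self.name, row[self.shard_key])
        self.store[key] = json.dumps(row)          # value = remaining columns
        for col, val in row.items():
            self.secondary.setdefault((col, val), set()).add(key)

    def get(self, pk):
        """Primary-key lookup: one key-value read."""
        return json.loads(self.store[(self.name, pk)])

    def query(self, col, val):
        """Lookup via the secondary index."""
        return [json.loads(self.store[k])
                for k in self.secondary.get((col, val), ())]

users = KVTable("users", shard_key="id")
users.put({"id": 1, "city": "SF"})
users.put({"id": 2, "city": "NY"})
```

A `get` by primary key is a single key-value read, while the secondary index pays for faster column queries with extra writes – one reason layering tables on a KV store is an engineering trade-off rather than a free feature.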
My theme on MongoDB momentum was to highlight the need for Systems of Engagement (something Geoff Moore has been talking about lately) on top of Systems of Record. Geoff says these systems of record were built decades ago with an RDBMS at the center; they are like the interstate highway system in the US. What we need now is to build cool and groovy interactive applications using newer technologies like the web, the cloud, and mobile devices. These systems must be built very rapidly and must be highly elastic, accommodating new data formats while delivering scalability and performance. I mentioned several such examples at enterprises like MetLife, Telefonica, Cisco, and Intuit. These modern applications are built using MongoDB’s flexible data model and horizontal scale-out architecture to yield fast performance at scale. MongoDB is growing rapidly, with over 7 million downloads and more than a thousand customers in just 4-5 years of its life.
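The “flexible data model” point is easy to show. Below is a minimal pure-Python sketch of the document model – a stand-in list plays the role of a MongoDB collection, and a toy `find` mimics a query, so the example is self-contained (with the real driver these would be pymongo’s `insert_one` and `find` calls). The documents and field names are invented for illustration.

```python
# Pure-Python sketch of MongoDB's flexible document model.
# A list stands in for a collection so the example runs without a server.

collection = []   # stand-in for a MongoDB collection

# Documents in one collection need not share a schema: new fields can
# appear at any time, with no ALTER TABLE-style migration.
collection.append({"_id": 1, "name": "Alice", "policies": ["auto"]})
collection.append({"_id": 2, "name": "Bob", "policies": ["home", "life"],
                   "mobile_app": {"os": "iOS", "version": "2.1"}})

def find(coll, **criteria):
    """Rough analogue of collection.find({...}) with equality matching."""
    return [doc for doc in coll
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(collection, name="Bob")[0]["policies"])
```

Note that the second document carries a nested `mobile_app` sub-document the first one lacks – exactly the kind of schema evolution that a fixed row-column layout makes painful.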
There were 117 people attending the event and it was quite interactive with lots of questions at the end.
Most discussions of Big Data begin and end with Hadoop. It is the commercial incarnation of HPC (High Performance Computing), whose underlying technologies have been around for years: clustering, parallel processing, and the distributed file system. In today’s parlance, these translate to Hadoop clusters on commodity hardware, the MapReduce algorithm, and HDFS, in that order. There is no doubt that Hadoop has taken off in a big way, but it does not address one big emerging area: real-time query and analysis on data that is moving all the time. Data can be categorized into three buckets – transactional data, analytics on data at rest, and analytics on data in flight (streaming, real time). We are talking about the last one here.
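For readers new to the terms, the MapReduce pattern that Hadoop distributes across a cluster can be sketched in a few lines on a single machine: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is the classic word-count example, not Hadoop’s actual Java API.

```python
# Single-process sketch of the MapReduce pattern (word count).
# Hadoop runs the same three phases distributed across a cluster.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values (here, by summing)."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big clusters", "data in motion"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 2
```

The appeal of the model is that map and reduce are embarrassingly parallel, which is what lets Hadoop scale out on commodity hardware – and also why it is fundamentally batch-oriented, the limitation discussed next.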
It is not just about velocity, but also latency. When an event occurs, we need to act on it within seconds or minutes – we have to “react in the moment”. First, the enterprise data warehouse (EDW) needs to be loaded with real-time data, as opposed to offline batch loading; what we need is continuous loading and data ingestion. Second, we have to do query and analysis on this fresh data as it arrives, for split-second decisions. The EDWs were designed years ago for offline batch processing and are unsuited for this new role. Hence newer technologies for in-memory processing, querying, and ingestion have to be considered. As someone said: RAM is the new disk, disk is the new tape, and tape is the new microfiche (if that still exists). One TB of RAM costs around $4,000 today, and that will keep going down. Most EDWs are under 5 TB, so enterprises must evaluate the cost side of doing in-memory processing.
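The back-of-the-envelope arithmetic behind that claim is worth making explicit (the prices are the 2014 figures quoted above, not current ones):

```python
# Rough cost check for holding a typical EDW entirely in RAM,
# using the article's 2014 figures (illustrative, not a quote).
ram_cost_per_tb = 4_000   # USD per TB of RAM, per the text
edw_size_tb = 5           # "most EDWs are under 5 TB"

ram_budget = ram_cost_per_tb * edw_size_tb
print(ram_budget)   # 20000
```

At roughly $20,000 of RAM for a 5 TB warehouse, the hardware cost of in-memory processing is small next to the software and operational costs, which is why the economics started to favor it.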
Data in motion includes social network data feeds, clickstreams, trading data, sensor data, etc. Velocity is the new big thing, and actions on such data must be taken within seconds. There is economic value as well as safety value. For example, at Citibank, a 100-millisecond processing delay can cost $1 million in business. This need for speed also drastically reduces the analysis window for finding root causes. Scale-out solutions on commodity hardware offer a big economic advantage. Solutions such as MemSQL, SAP HANA, Argyle Systems, Apache Storm, and Apache Spark/Shark are bringing in-memory processing architectures to handle this area of data in motion.
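The core pattern behind these systems can be illustrated with a tiny sketch: events are evaluated inside a sliding time window as they arrive, instead of being batch-loaded into a warehouse first. The window size and threshold below are invented for illustration and are not taken from any of the products named above.

```python
# Minimal sketch of low-latency processing on data in motion:
# keep a sliding time window of events in memory and react per event.
from collections import deque

class SlidingWindowAlert:
    def __init__(self, window_seconds, threshold):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()        # (timestamp, value) pairs, oldest first

    def ingest(self, ts, value):
        """Ingest one event; return True if the windowed sum trips the alert."""
        self.events.append((ts, value))
        # Evict events that have fallen out of the time window.
        while self.events and self.events[0][0] <= ts - self.window:
            self.events.popleft()
        return sum(v for _, v in self.events) > self.threshold

monitor = SlidingWindowAlert(window_seconds=60, threshold=100)
monitor.ingest(0, 40)            # windowed sum 40  -> no alert
monitor.ingest(30, 50)           # windowed sum 90  -> no alert
print(monitor.ingest(45, 20))    # windowed sum 110 -> True
```

The decision is made at ingest time, in memory, with per-event latency – the opposite of the load-then-query cycle of a batch EDW. Real engines like Storm and Spark Streaming distribute this same idea across a cluster.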
Microsoft announced this morning the appointment of a new CEO – Satya Nadella. He has worked at the company for the last 22 years, and since last week rumors had circulated that he was the one picked by the board after five months of searching. As someone with the same country of origin, I take immense pleasure in seeing Satya climb the ladder to the top of the fourth-largest corporation in the world (by market value), which is also the number one software company. I don’t know him, but based on what I have read, he seems like a very talented and capable leader.
Mr. Nadella comes from Hyderabad, India, where he attended the Hyderabad Public School (HPS); his senior schoolmates there include Shantanu Narayen, CEO of Adobe, and Srikar Reddy, CEO of Sonata Software (where I am an advisor), among many other leaders. He then attended Manipal Engineering College for his bachelor’s degree in electrical engineering, followed by a master’s degree in computer science at the University of Wisconsin. He later earned an MBA from the University of Chicago, completing the degree while working at Microsoft by traveling back and forth between Redmond and Chicago. During his 22 years he rose rapidly, managing various teams at Microsoft, from Xbox to the Bing search engine, before finally leading the cloud and server division. He grew that division’s revenue from $17B to $21B in three years. Microsoft’s only growth business is back-end infrastructure software (SharePoint, Azure cloud, SQL Server, etc.). The consumer side (Windows 8, Office, etc.) is in decline as the computing industry shifts from a PC-centric model to a cloud model serving billions of smart client devices. Clearly Microsoft is behind competitors like Google, Apple, and Samsung in that game.
More importantly, changing course at Microsoft has proven difficult. Ray Ozzie tried a few years back, but failed; working through the contentious groups has been quite hard. The board pins its hopes on Satya Nadella to address this issue as he steers the company onto the new course of cloud computing, big data, and myriad smart devices. He clearly recognizes the deficiencies and wants to bring an entrepreneurial culture back to Microsoft. The company has the talent, and lots of money to spend.
We wish Mr. Nadella success in his new role.
NoSQL and Big Data Analytics: 10 Things Every CIO Should Know
February 3, 2014
CIOs of Fortune 1000 companies already understand the competitive edge IT systems can bring to their businesses, or they wouldn’t be in such a high corporate role. However, for many small and midsize companies, basic questions remain about how to prepare an older IT system for the 21st century and what big data technology should be considered to meet their needs. Should all enterprise data be stored, secured and maintained on the data center premises? Should analytics functions be farmed out to cloud services, or should they be kept in-house? What roles should new-gen IT such as NoSQL databases, Hadoop and MapReduce play here? Hadoop and MapReduce have become popular tools for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. NoSQL databases are designed for fast storage and retrieval of data without strictly using the tabular structure of SQL databases. In this eWEEK slide show, with insight from former Oracle and IBM executive and technology visionary/executive consultant Jnan Dash, we offer 10 basic facts every CIO should know about big data and emerging database technologies.
I joined 600 people last night at a session sponsored by Hive to listen to Doug Cutting, the creator of Hadoop. Currently he is the chief architect at Cloudera and a director at Apache Software Foundation. The hall at NetApp facility was overflowing with an eager audience. Doug spoke about the future of data management.
He narrated a brief history of Hadoop: how it was founded and how far it has come. As everyone knows, Hadoop’s pedigree comes from Google’s GFS (Google File System), which inspired HDFS, and from Google’s MapReduce programming model. Here are the key predictions he made:
- Hadoop has grown to become the de facto standard for Big Data. He had anticipated that IBM and Microsoft would come up with alternative designs to compete with Hadoop, but that never happened. Both companies, plus Oracle, HP, and other players, have endorsed Hadoop as the platform.
- Hadoop will become the center of data management in the future. It will not be just the original HDFS+MapReduce layers, but a whole new ecosystem called “the Enterprise Data Hub”. There will be an explosion of products surrounding Hadoop (all open systems); he cited Pig, Hive, and Sqoop as examples. Many SQL-on-HDFS implementations are currently emerging.
- Will there be OLTP (transactional systems) on Hadoop? He said yes. The current implementation of Impala (from Cloudera), which puts SQL on HDFS, is proving quite efficient for ETL workloads. Several customers have started migrating from the legacy world to Impala.
- Google’s new project, Spanner, is also leading the way to a future OLTP system distributed across the globe. This work will propel future additions to the Hadoop ecosystem.
- He explained the big advantage of open systems architecture and why it will become the norm over proprietary systems.
- The future Hadoop ecosystem (the Enterprise Data Hub) will be a threat to the current incumbents like Oracle, MySQL, SQL Server, DB2, and Vertica. Current challenges of weak security and lack of standardization will be addressed eventually.
Doug is an engaging speaker and clearly knows his subject well. I have my doubts about his predictions, as DBMSs take a long time to mature and to provide all the critical functions for mission-critical applications; we have learned that over the last four decades. Hadoop is still primarily a batch system doing offline analytics. Moving from there to real-time production workloads is quite a jump and will take many years to accomplish.
Then there is the new breed of highly efficient NoSQL databases, like MongoDB, being deployed to create “systems of engagement” at large enterprises. Nor are the incumbents sitting idle, with a total market size of $30 billion at stake. It is funny to remember that our tax records at the IRS are still managed by Model 204, a DBMS created in the 1960s. Switching databases is extremely cumbersome and not for the faint-hearted. Doug did say that future spending will steer more toward Hadoop.
Given the challenges of Big Data and the rapid adoption of Hadoop, we will watch this space as it unfolds over the next couple of years.