Category Archives: New Technology

Splice Machine – What is it?

If you have never heard of Splice Machine, don’t worry – you are in good company. I decided to listen to a webinar last week whose announcement promised the following: learn about the benefits of a modern IoT application platform that can capture, process, store, analyze, and act on the large streams of data generated by IoT devices. The demonstration would include:

  • High Performance Data Ingestion
  • Analytics and Transformation on Data-In-Motion
  • Relational DBMS, Supporting Hybrid OLTP and OLAP Processing
  • In-Memory and Non-Volatile, Row-based and Columnar Storage mechanisms
  • Machine Learning to support decision making and problem resolution

That was a tall order. Gartner has a new term for this, HTAP (Hybrid Transactional and Analytical Processing); Forrester uses “Translytical” to describe the same kind of platform, where you can do both OLTP and OLAP. I had written a blog post on translytical databases almost two years ago. I did attend the webinar, and it was quite impressive. The only confusion was the liberal use of “IoT” in the marketing slogan; by that they mean to emphasize streaming data (ingest, store, manage).

On Splice Machine’s website, you see four things: Hybrid RDBMS, ANSI SQL, ACID Transactions, and Real-Time Analytics. A white paper advertisement says, “Your IoT applications deserve a better data platform.” Looking at the advisory board, I recognized three names: Roger Bamford, ex-Oracle and an investor; Ken Rudin, ex-Oracle; and Marie-Anne Neimat, ex-TimesTen. The company is funded by Mohr Davidow Ventures and InterWest Partners, amongst others.

There is a need to bring together the worlds of OLTP (transaction workloads) and analytics (OLAP workloads) on a common platform. They have been separated for decades, and that is how the data warehouse, MDM, OLAP cubes, etc. got started. The movement of data between the OLTP and OLAP worlds has been handled by ETL vendors such as Informatica. With the popularity of Hadoop, the DW/analytics world is crowded with terms like Data Lake, ELT (first load, then transform), data curation, data unification, etc. A new architecture called Lambda (not to be confused with AWS Lambda for serverless computing) claims to unify the two worlds – OLTP on one side, real-time streaming and analytics on the other.

Into this world comes Splice Machine with its scale-out data platform. You can do standard ACID-compliant OLTP processing, ingest data via Spark Streaming and Kafka topics, query it via ANSI SQL, and run your analytical workload without ETL. They even claim support for procedural languages such as Oracle’s PL/SQL. With their support for machine learning, they demonstrated predictive analytics. The current focus is on verticals like healthcare, telco, retail, and finance (Wells Fargo).
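To make the “no ETL” claim concrete, here is a toy sketch of the hybrid pattern in Python, using the standard library’s sqlite3 purely as a stand-in SQL engine (Splice Machine itself is reached via ANSI SQL over JDBC; the table, columns, and data below are all invented):

```python
import sqlite3

# Stand-in for the hybrid RDBMS: one table serves both workloads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, ts INTEGER, temp REAL)")

# OLTP side: ingest a stream of device events as ordinary transactions
# (in the real platform these would arrive via Kafka / Spark Streaming).
events = [("dev-1", 1, 21.5), ("dev-1", 2, 22.0), ("dev-2", 1, 19.8)]
with conn:
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", events)

# OLAP side: an analytical aggregate over the very same table, no ETL hop.
rows = conn.execute(
    "SELECT device_id, AVG(temp) FROM readings "
    "GROUP BY device_id ORDER BY device_id"
).fetchall()
print(rows)  # [('dev-1', 21.75), ('dev-2', 19.8)]
```

The point of the architecture is that both halves hit one store; this sketch obviously says nothing about Splice Machine’s scale-out or columnar storage mechanics.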

In the cacophony of Big Data and IoT noise, it is hard to separate fact from fiction. But I do see a role for a “unified” approach like Splice Machine’s. Again, the proof is in the pudding – some real-life customer deployments with performance numbers will prove the hypothesis and their claim of 10x the speed at one-fourth the cost.

Data Unification at scale

The term Data Unification is new in the Big Data lexicon, pushed by a variety of companies such as Talend, 1010data, and Tamr. Data unification deals with the domain known as ETL (Extraction, Transformation, Loading), which got started during the 1990s when data warehousing was gaining relevance. ETL refers to the process of extracting data from inside or outside sources (multiple applications, typically developed and supported by different vendors or hosted on separate hardware), transforming it to fit operational needs (based on business rules), and loading it into a target database – more specifically, an operational data store, data mart, or data warehouse. These are read-only databases for analytics. Initially the analytics was mostly retrospective (e.g., how many shoppers aged 25-35 bought this item between May and July?). This was like driving a car while looking in the rear-view mirror. Then forward-looking analysis (called data mining) started to appear. Now business also demands “predictive analytics” and “streaming analytics”.

During my IBM and Oracle days, ETL was left for outside companies to address. It was unglamorous work, and the key vendors were not that interested in solving it. This gave rise to many new players such as Informatica, DataStage, and Talend, and it became quite a thriving business. We also see many open-source ETL companies.

The ETL methodology consisted of constructing a global schema in advance; writing, for each local data source, a program to understand the source and map it to the global schema; and then writing scripts to transform, clean (resolving homonym and synonym issues), and dedup (remove duplicates) the data. Programs were set up to run this ETL pipeline. The process has matured over 20 years and is still used today for data unification problems. The term MDM (Master Data Management) refers to a master representation of all enterprise objects, to which everybody agrees to conform.
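That methodology is easier to see in miniature. The following Python sketch maps two hypothetical source record shapes to one global schema, normalizes the values, and dedups; every field name and record here is invented:

```python
# Two sources with different shapes for the same kind of record.
source_a = [{"cust": "Alice Smith", "zipcode": "02139"}]
source_b = [{"customer_name": "alice smith", "zip": "02139"},
            {"customer_name": "Bob Jones", "zip": "94301"}]

# One hand-written mapping program per source -> the global schema.
def from_a(r):
    return {"name": r["cust"].strip().lower(), "zip": r["zipcode"]}

def from_b(r):
    return {"name": r["customer_name"].strip().lower(), "zip": r["zip"]}

unified = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]

# Dedup on the normalized key -- the synonym/duplicate cleanup step.
seen, deduped = set(), []
for rec in unified:
    key = (rec["name"], rec["zip"])
    if key not in seen:
        seen.add(key)
        deduped.append(rec)
print(deduped)  # the two distinct customers, duplicates removed
```

With a handful of sources this is manageable; the post’s argument is precisely that hand-writing `from_a`-style mappers stops scaling at hundreds of sources.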

In the world of Big Data, this approach is very inadequate. Why?

  • Data unification at scale is a very big deal. The schema-first approach works fine with retail data (sales transactions, not many data sources), but gets extremely hard when the sources run into the hundreds or even thousands. It gets worse still when you want to unify public data from the web with enterprise data.
  • The human labor needed to map each source to a master schema becomes costly and excessive. Here machine learning is required, with domain experts asked to augment it where needed.
  • Real-time unification and analysis of streaming data cannot be handled by these solutions.

Another solution, the “data lake,” where you store disparate data in its native format, addresses only the “ingest” problem. It changes the order of ETL to ELT (first load, then transform), but it does not address the scale issues. The new world needs bottom-up, schema-last data unification in real time or near real time.

The typical data unification cycle goes like this: start with a few sources; try enriching the data with, say, X; see if it works; if it fails, loop back and try again. Use enrichment to improve, do everything you can automatically using machine learning and statistics, and iterate furiously. Ask domain experts for help when needed. Otherwise, the current approach of ETL or ELT can get very expensive.
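A crude sketch of that loop, using the standard library’s difflib similarity ratio as a stand-in for a learned matcher; the thresholds and company names are invented, and low-confidence matches are escalated to a domain expert rather than merged automatically:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Cheap string similarity; a real system would use trained models.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

master = ["International Business Machines", "Splice Machine Inc"]
incoming = ["IBM Corp.", "Splice Machine, Inc."]

AUTO, REVIEW = 0.85, 0.50  # invented confidence thresholds
results = {}
for name in incoming:
    best = max(master, key=lambda m: similarity(name, m))
    score = similarity(name, best)
    if score >= AUTO:
        results[name] = "auto-merge"     # confident: unify automatically
    elif score >= REVIEW:
        results[name] = "expert review"  # ask the domain expert
    else:
        results[name] = "new entity"     # no plausible match found
print(results)
```

Note that “IBM Corp.” scores poorly against its true master record – exactly the synonym problem where a human (or a better model) must step in, and why the cycle has to iterate.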

The new Microsoft

Clearly Satya Nadella has made a huge difference at Microsoft since taking over as CEO in 2014. In 2016 the stock hit its highest level since 1999, so investors are happy. Here are the key changes he has made:

  • Skipped Windows 9 and went straight from Windows 8 to Windows 10, a great release. However, Windows revenue is declining along with PC sales.
  • Released Microsoft Office for the iPad, and released Outlook on iPhone and Android.
  • Embraced Linux by joining the Linux Foundation, previously anathema to Microsoft’s Windows-centric culture.
  • Spent $2.5B to buy Mojang, the studio behind the hit game Minecraft.
  • Introduced Microsoft’s first laptop, the Surface Book.
  • Revealed Microsoft HoloLens, the super-futuristic holographic goggles.
  • Created a new partner program to provide Microsoft products on non-Windows platforms, and hired ex-Qualcomm exec Peggy Johnson to head business development.
  • Enhanced company morale and employee excitement.
  • Made the biggest gamble of all: the purchase of LinkedIn last June for a whopping $26.2B.

It’s important to understand the significance of the LinkedIn purchase. Adam Rifkin (I worked with him twelve years ago at KnowNow; a smart guy) recently wrote an article on this topic. I like his comment that in a world of machine learning, uniquely valuable data is the new network effect. The right kind of data is now the force multiplier that can catapult organizations past any competitors who lack equivalent data; data is the new barrier to entry. Adam also observes that the most valuable data is perishable, not static. Software is eating the world, and AI is eating software – in effect, AI eats data and pops out software.

Now let’s map this onto Microsoft’s LinkedIn purchase, which is all about the network effects of LinkedIn’s data. What Google gets from searches, Facebook from likes, and Amazon from shopping carts, Microsoft will get from LinkedIn’s data for its CRM services. Adam notes that the global CRM market in 2015 was worth $26.3B – almost exactly what Microsoft paid – and it is the fastest-growing area of enterprise software. Hence Marc Benioff of Salesforce was not very happy with this acquisition.

The new Microsoft is ready to fight the enterprise software battle with incumbents like Salesforce, Oracle, SAP, and Workday.

The resurgence of AI/ML/DL

We have been seeing a sudden rise in the deployment of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). It looks like the long “AI winter” is finally over.

  • According to IDC, AI-related hardware, software and services business will jump from $8B this year to $47B by 2020.
  • I have also read comments like, “AI is like the Internet in the mid 1990s and it will be pervasive this time”.
  • According to Andrew Ng, chief scientist at Baidu, “AI is the new electricity. Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”
  • Peter Lee, co-head of Microsoft Research, said, “Sales teams are using neural nets to recommend which prospects to contact next or what kind of products to recommend.”
  • IBM Watson used AI, not DL, when it won Jeopardy! in 2011; now all 30 of its components are augmented by DL (with investment growing from $500M to $6B by 2020).
  • Google had 2 DL projects in 2012; now it has more than 1,000 (Search, Android, Gmail, Translation, Maps, YouTube, self-driving cars, ..).

It is interesting to note that AI was discussed by Alan Turing in a paper he wrote back in 1950, suggesting the possibility of building machines with true intelligence. Then in 1956, John McCarthy organized a conference at Dartmouth and coined the phrase Artificial Intelligence. Much of the next three decades saw little activity, hence the phrase “AI winter”. In 1997, IBM’s Deep Blue won a chess match against Kasparov. During the last few years, we have seen deployments such as Apple’s Siri, Microsoft’s Cortana, and IBM’s Watson (which beat the Jeopardy! champions in 2011). In 2014, the DeepMind team used a deep learning algorithm to create a program that wins Atari games.

During the last two years, use of this technology has accelerated greatly. The key players pushing AI/ML/DL are Nvidia, Baidu, Google, IBM, Apple, Microsoft, Facebook, Twitter, Amazon, Yahoo, etc. Many new players have appeared – DeepMind, Numenta, Nervana, MetaMind, AlchemyAPI, Sentient, OpenAI, Skymind, Cortica, etc. – and these companies are all acquisition targets for the big ones. Sundar Pichai of Google says, “Machine learning is a core, transformative way in which we are rethinking everything we are doing.” Google products deploying these technologies include visual translation, RankBrain, speech recognition, voicemail transcription, photo search, spam filtering, etc.

AI is the broadest term, applying to any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning. The subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience is machine learning. A subset of machine learning called deep learning is composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multi-layered neural networks to vast amounts of data.
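To make “multi-layered” concrete, here is a toy two-layer forward pass in plain Python. The weights are invented; a real deep network has millions of them and learns them from data via backpropagation:

```python
import math

def dense(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, then a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

x = [0.5, -1.0]  # a made-up two-feature input

# Layer 1: two hidden neurons, each with its own (invented) weights.
hidden = [dense(x, [0.8, 0.2], 0.1), dense(x, [-0.4, 0.9], 0.0)]

# Layer 2: one output neuron reading the hidden layer, not the raw input.
output = dense(hidden, [1.5, -1.1], -0.3)
print(round(output, 3))  # a score between 0 and 1
```

Stacking such layers, with learned rather than hand-picked weights, is what lets deep networks train themselves on tasks like speech and image recognition.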

I think the resurgence results from the confluence of several factors: advanced chip technology such as Nvidia’s Pascal GPU architecture and IBM’s TrueNorth (a brain-inspired computer chip), software architectures like microservice containers, ML libraries, and data analytics toolkits. Well-known academics are being heavily recruited by companies – Geoffrey Hinton of the University of Toronto (Google), Yann LeCun of New York University (Facebook), Andrew Ng of Stanford (Baidu), Yoshua Bengio of the University of Montreal, etc.

The outlook of AI/ML/DL is very bright and we will see some real benefits in every business sector.

Data-driven enterprise

I moderated a panel of three CIOs last Sunday at the Solix Empower conference on the subject of the data-driven enterprise. The three CIOs came from different industries: Marc Parmet of the TechPar Group spent many years at Avery Dennison after stints at Apple and IBM; Sachin Mathur leads IT innovation at Terex Corp., a large supplier of cranes and other heavy equipment; and PK Agarwal, currently a dean at Northeastern University, used to be the CIO for the Government of California. Here are some of the points covered:

  • I reminded the audience that we are at the fourth paradigm in science (per the late Jim Gray). A thousand years ago, science was experimental; a few hundred years ago it became theoretical (Newton’s laws, Maxwell’s equations); fifty years ago it became computational (simulation via computer). The fourth paradigm is data-driven science, where experiment, theory, and computation combine into one holistic discipline. In fact, science hit the “big data” problem long before the commercial world did.
  • Top-level management is starting to understand that data is the oxygen, but they have yet to make their organizations fully data-driven. Just having a data warehouse with analytics and reporting does not make an enterprise data-driven, but they do see the value of predictive analytics and deep learning for competitive advantage.
  • While business-critical applications continue to run on-premises, newer, less critical apps such as collaboration and email (e.g., Lotus Notes) are moving to the public cloud. One panelist said they are evaluating migrating their current Oracle ERP to a cloud version. Data security and reliability are critical needs. One panelist talked about not just private, public, or hybrid cloud, but a “scattered” cloud that will be highly distributed.
  • Of the 3 Vs of big data (volume, variety, and velocity), variety seems to be the higher need – images, pictures, and videos combined with sensors deployed in manufacturing and factory automation. For industries such as retail and telcos, volume dominates. The velocity part will become more and more critical, as streaming this data in real time will need fast ingestion and analysis-on-the-fly for timely decision making. This is the emerging world of IoT, where devices with an IP address will be everywhere – individuals, connected homes, autonomous cars, connected factories – producing huge volumes of data. Cluster computing with Hadoop/Spark will be the most economical technology to deal with this load. Much work lies ahead.
  • There will be a serious shortage of “big data” or “data science” skills, on the order of 4-5 million people in the next few years. Hence universities such as Northeastern are setting up new curricula in data science. Today’s data scientist must have knowledge of the business, algorithms, computer science, and statistical modeling, and must be a good storyteller. Unlike in the past, it’s not just answering questions, but figuring out what questions to ask. Such skills will be at a premium as enterprises become more data-driven.
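The velocity point, fast ingestion plus analysis-on-the-fly, can be sketched as a per-event sliding-window check; the window size, sensor readings, and anomaly rule below are all invented:

```python
from collections import deque

# Keep only the last few readings in memory and check each new event
# against the recent average as it arrives (no batch job, no ETL).
window = deque(maxlen=3)
anomalies = []
stream = [10.0, 12.0, 11.0, 30.0, 12.0]  # toy sensor feed with one spike

for reading in stream:
    if window and reading > 1.5 * (sum(window) / len(window)):
        anomalies.append(reading)  # flagged on the fly, before storage
    window.append(reading)

print(anomalies)  # [30.0]
```

The same shape scales out in Spark Streaming or Kafka consumers, where each partition maintains its own window state.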

We discussed many other points. It was a fun panel.

 

The top five most-valued companies are tech – almost

On this first day of August 2016, I saw that the top most-valued companies are tech companies – and a fifth tech company is almost there. Here is the list:

  1. Apple (AAPL): $566B
  2. Alphabet (GOOG): $562B
  3. Microsoft (MSFT): $433B
  4. Amazon (AMZN): $365B
  5. Exxon Mobil (XOM): $356B
  6. Facebook (FB): $353B

The big move is Amazon beating Exxon Mobil (number one for many years) for the fourth spot. The switch came after Amazon posted its fifth straight quarter of profits last week, while the oil giant’s profits tumbled 59 percent over the same rough period. If Exxon continues to drop, Facebook will pass it within days.

This is quite remarkable! Other than Microsoft and Apple, the other three companies are much younger, Facebook being the youngest. Their rapid rise is due to the growth of the Internet, with its associated areas of search, e-commerce, and social networking. Interestingly, Amazon survived the dot-com bust of 2000-2001, unlike Yahoo, AOL, etc. Contrast this with the $4.8B valuation of Yahoo’s core business, acquired by Verizon last week! Also, the fastest-growing and most profitable of Amazon’s three businesses (books, general e-commerce, and AWS) is the cloud infrastructure piece, AWS (Amazon Web Services), with a run rate of $10B this year – way ahead of Microsoft’s Azure or Google’s cloud offerings.

The importance of the cloud is obvious: Oracle just paid $9.3B last week to acquire NetSuite, a company that was funded by Larry Ellison. With a 40% ownership of NetSuite, he gets a hefty $3.5B from the deal. Paradoxically, it was Amazon that led the way to cloud computing – not IBM, not HP, not EMC/VMware, and not Microsoft or Google. No wonder Amazon is reaping the benefits!

Yahoo going to Verizon is so unexciting!

So finally it was Verizon paying $4.8B to acquire Yahoo’s core business. Business Insider said, “Yahoo, which was founded in 1994, was one of the world’s leading internet businesses but has gone through tough times in the past several years. Yahoo’s peak value was $125 billion in 2000, and even in 2008, Microsoft wanted to pay $45 billion for the company, so a $4.8 billion sale price pales in comparison.”

This deal is also more or less the logical extension of Verizon’s $4 billion deal last year to acquire AOL, which is still run by Tim Armstrong, with whom Yahoo CEO Marissa Mayer worked at Google back in the day. Yahoo and AOL, after all, are fairly similar old-school content-and-advertising internet businesses. Here is the reaction from a competitor, Sprint: CEO Marcelo Claure said Monday that Verizon’s purchase of Yahoo is just the latest in a long history of deals by telecom firms trying to get into the content business, none of which has panned out.

Although this deal sounds like a sad end for Yahoo, an icon of the early Internet, Marissa Mayer tried to paint it as a success. Why not? She will walk away with almost $50M if fired from her job. It is a big letdown for her, especially after the high expectations when she was hired in 2012. She was supposed to turn the company around with big revenue growth. None of that happened; rather, she spent a ton of money for very little return. Take the case of Tumblr, which was mostly a waste (after paying $1.1B). As they say, you ruin the company and then walk out with a huge amount of money. Sad but true.

As of today, the business that will stay behind post-acquisition by Verizon includes Yahoo’s cash, its shares in Alibaba and Yahoo Japan, Yahoo’s convertible notes, certain minority investments, and Yahoo’s non-core patents (called the Excalibur portfolio). These remaining businesses will be rebranded after the completion of the acquisition in early 2017.

It will be interesting to see how Verizon brings some synergy across its three similar but overlapping offerings – AOL, Yahoo, and its own go90.