
API-driven Economy?

I just attended a couple of sessions at API World, going on at the San Jose Convention Center. I heard all kinds of new terms thrown around within a span of a couple of hours – the new API-driven economy, iSaaS (integration software as a service), iPaaS (integration platform as a service), APIM (API management), BaaS (backend as a service), etc. Then there was a confusing and overlapping mixture of ideas around microservices, containers, connectors, and APIs, all in the space of system integration. There were lots of young software developers at this conference and booths from companies I had never heard of – Jitterbit (enterprise iPaaS), Back4App (backend development via Parse Server), PubNub (real-time APIs), Rigor (API monitoring and testing). I took a deep breath and thought of all the related ideas of the last three decades – APIs, subroutines, reusable web services, service-oriented architecture, integration via connectors, assembly of interchangeable parts from common libraries, etc. Welcome back to the future!

I see the urgency of this now that we have so many products and platforms in every category. A speaker from Jitterbit showed how Cisco's marketing software stack spans 39 different technologies – SalesForce, 6Sense, Eloqua, App Annie, Live Agent, etc. – covering functions like campaign management, CRM, email blasts, and mobile notifications. This is definitely not an ideal situation. Jitterbit wants to be the mediator, using APIs to consolidate all of these around activities and workflow. No wonder this Alameda-based startup is doing very well. I was not surprised to learn that SalesForce and the private equity firm KKR are investors in Jitterbit.

Gartner predicts the enterprise application integration market will reach $33.5B by 2020 (a CAGR of 7.1% from $25.5B in 2016), while integration platform as a service (iPaaS) will reach $3.6B by 2021 (a CAGR of 41.5% from $526M in 2016). The data integration market is expected to grow from $6.4B in 2017 to $12.2B in 2022 (a CAGR of 13.7%). Gartner says, "IT leaders should combine on-premise integration platform, iPaaS, iSaaS and API Management capabilities into a single, yet modular enterprise capability." Gartner calls this whole space Application Integration Platforms.
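
For readers who want to sanity-check such forecasts, the implied compound annual growth rate is easy to verify from the endpoints; here is a quick back-of-the-envelope check of the first figure (values taken from the paragraph above):

```python
# CAGR = (end_value / start_value) ** (1 / years) - 1
start, end, years = 25.5, 33.5, 4      # $B, 2016 -> 2020
cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")     # ~7.1%, matching the quoted Gartner figure
```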

I think it is time we consolidate all these terms and bring some real clarity. The current marketing hype around the API-driven economy does not help much. What used to be a programmer's term (API – application programming interface) is now marketed as a broad term that will solve world hunger.

The goal has not changed – we want integration of heterogeneous systems (both inside and outside the enterprise) to be highly efficient, transparent, and less labor-intensive.
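
To make that goal concrete, here is a minimal sketch of the kind of point-to-point API integration an iPaaS is meant to replace. The endpoints, field names, and credentials are hypothetical, purely for illustration; a real Jitterbit-style workflow would be configured declaratively rather than hand-coded:

```python
import requests  # third-party HTTP client

# Hypothetical endpoints standing in for two SaaS products in a marketing stack.
CRM_API = "https://crm.example.com/api/leads"
CAMPAIGN_API = "https://campaigns.example.com/api/contacts"

def sync_new_leads(crm_key: str, campaign_key: str) -> int:
    """Pull new leads from the CRM and push them into the campaign tool."""
    leads = requests.get(
        CRM_API,
        headers={"Authorization": f"Bearer {crm_key}"},
        params={"status": "new"},
    ).json()
    synced = 0
    for lead in leads:
        # Map the CRM's schema to the campaign tool's schema (the "transform" step).
        contact = {"email": lead["email"], "name": lead["full_name"], "source": "crm"}
        resp = requests.post(
            CAMPAIGN_API,
            json=contact,
            headers={"Authorization": f"Bearer {campaign_key}"},
        )
        synced += resp.ok
    return synced
```

Multiply this little script by the 39 technologies in that Cisco marketing stack and the appeal of a mediating platform with reusable connectors becomes obvious.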


Splice Machine – What is it?

If you have never heard of Splice Machine, don't worry – you are in the company of many. So I decided to listen to a webinar last week whose announcement promised the following: learn about the benefits of a modern IoT application platform that can capture, process, store, analyze and act on the large streams of data generated by IoT devices. The demonstration would include:

  • High Performance Data Ingestion
  • Analytics and Transformation on Data-In-Motion
  • Relational DBMS, Supporting Hybrid OLTP and OLAP Processing
  • In-Memory and Non-Volatile, Row-based and Columnar Storage mechanisms
  • Machine Learning to support decision making and problem resolution

That is a tall order. Gartner has a new term for this, HTAP – Hybrid Transactional and Analytical Processing. Forrester uses "translytical" to describe a platform where you can do both OLTP and OLAP. I wrote a blog post on translytical databases almost two years back. I did attend the webinar, and it was quite impressive. The only confusion was the liberal use of IoT in the marketing slogan; by that they really mean streaming data (ingest, store, manage).

On Splice Machine's website, you see four things: hybrid RDBMS, ANSI SQL, ACID transactions, and real-time analytics. A white paper advertisement says, "Your IoT applications deserve a better data platform." Looking at the advisory board members, I recognized three names – Roger Bamford (ex-Oracle and an investor), Ken Rudin (ex-Oracle), and Marie-Anne Neimat (ex-TimesTen). The company is funded by Mohr Davidow Ventures and InterWest Partners, among others.

There is a real need to bring together the worlds of OLTP (transactional workloads) and OLAP (analytical workloads) into a common platform. They have been separated for decades, which is how data warehouses, MDM, OLAP cubes, etc. got started. The movement of data between the OLTP and OLAP worlds has been handled by ETL vendors such as Informatica. With the popularity of Hadoop, the DW/analytics world is now crowded with terms like data lake, ELT (load first, then transform), data curation, data unification, etc. A newer architecture called Lambda (not to be confused with AWS Lambda for serverless computing) claims to unify batch processing with real-time streaming and analytics.

Into this world comes Splice Machine with its scale-out data platform. You can do your standard ACID-compliant OLTP processing, ingest data via Spark Streaming and Kafka topics, run queries via ANSI SQL, and handle analytical workloads without ETL. They even claim support for procedural languages like Oracle's PL/SQL. Using their machine learning support, they demonstrated predictive analytics. The current focus is on verticals like healthcare, telco, retail, and finance (Wells Fargo).
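
I have not used Splice Machine's own connectors, so the following is only a generic sketch of the ingestion pattern described in the webinar: Spark Structured Streaming reading a Kafka topic and appending rows to a SQL table over JDBC. The broker address, topic, schema, table name, and JDBC URL are all assumptions made up for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-to-sql-ingest").getOrCreate()

# Hypothetical schema for JSON sensor events arriving on the Kafka topic.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("reading", DoubleType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
          .option("subscribe", "sensor-events")                # hypothetical topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, batch_id):
    # Append each micro-batch to a relational table over JDBC; a hybrid (HTAP)
    # engine can then serve both transactional and analytical queries on it
    # without a separate ETL hop.
    (batch_df.write
     .format("jdbc")
     .option("url", "jdbc:splice://host:1527/splicedb")  # assumed connection string
     .option("dbtable", "SENSOR_READINGS")
     .mode("append")
     .save())

query = events.writeStream.foreachBatch(write_batch).start()
query.awaitTermination()
```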

In the cacophony of big data and IoT noise, it is hard to separate fact from fiction. But I do see a role for a "unified" approach like Splice Machine's. As always, the proof is in the pudding – real-life customer deployments with performance numbers are what will back up their claim of 10x faster performance at one-fourth the cost.

Data Unification at scale

The term data unification is new in the big data lexicon, pushed by a variety of companies such as Talend, 1010data, and Tamr. Data unification deals with the domain known as ETL (extraction, transformation, loading), which got started during the 1990s when data warehousing was gaining relevance. ETL refers to the process of extracting data from inside or outside sources (multiple applications typically developed and supported by different vendors or hosted on separate hardware), transforming it to fit operational needs (based on business rules), and loading it into end-target databases – more specifically, an operational data store, data mart, or data warehouse. These are read-only databases for analytics. Initially the analytics was mostly retrospective (e.g., how many shoppers between the ages of 25 and 35 bought this item between May and July?). This was like driving a car by looking in the rear-view mirror. Then forward-looking analysis (called data mining) started to appear. Now business also demands predictive analytics and streaming analytics.

During my IBM and Oracle days, ETL in its first phase was left for outside companies to address. It was unglamorous work, and the key vendors were not that interested in solving it. This gave rise to many new players such as Informatica, DataStage, and Talend, and it became quite a thriving business. We also see many open-source ETL companies now.

The ETL methodology consisted of constructing a global schema in advance; writing, for each local data source, a program to understand the source and map it to the global schema; and then writing a script to transform, clean (resolving homonym and synonym issues) and dedup (get rid of duplicates) the data. Programs were set up to build the ETL pipeline. This process has matured over 20 years and is what is used today for data unification problems. The term MDM (Master Data Management) refers to a master representation of all enterprise objects, to which everybody agrees to conform.
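
As an illustration, here is a minimal sketch of that classic schema-first pipeline. The global schema, source files, field mappings, and cleaning rules are all invented for the example:

```python
import csv

# The global schema agreed in advance (the MDM "master" representation).
GLOBAL_SCHEMA = ("customer_id", "full_name", "email")

# One hand-written mapping per local source -- the costly human-labor step.
SOURCE_MAPPINGS = {
    "crm_export.csv":   {"customer_id": "id",   "full_name": "name",      "email": "email_addr"},
    "billing_dump.csv": {"customer_id": "cust", "full_name": "cust_name", "email": "mail"},
}

def extract_and_map(path, mapping):
    """Extract rows from one source and map them onto the global schema."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {target: row[source].strip() for target, source in mapping.items()}

def clean_and_dedup(rows):
    """Normalize values (a stand-in for homonym/synonym handling) and drop duplicates."""
    seen, unique = set(), []
    for r in rows:
        r["email"] = r["email"].lower()
        key = (r["customer_id"], r["email"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def run_pipeline():
    rows = []
    for path, mapping in SOURCE_MAPPINGS.items():
        rows.extend(extract_and_map(path, mapping))
    return clean_and_dedup(rows)   # ready to load into the warehouse
```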

In the world of Big Data, this approach is very inadequate. Why?

  • Data unification at scale is a very big deal. The schema-first approach works fine with retail data (sales transactions, not many data sources), but gets extremely hard when the number of sources runs into the hundreds or even thousands. It gets worse still when you want to unify public data from the web with enterprise data.
  • The human labor needed to map each source to a master schema becomes costly and excessive. Here machine learning is required, with domain experts asked to augment it where needed.
  • Real-time unification of streaming data and its analysis cannot be handled by these solutions.

Another solution, the "data lake," where you store disparate data in its native format, addresses only the "ingest" problem. It changes the order of ETL to ELT (load first, then transform), but it does not address the scale issues. The new world needs bottom-up, schema-last data unification in real time or near real time.

The typical data unification cycle goes like this: start with a few sources, try enriching the data with, say, X, and see if it works; if it fails, loop back and try again. Use enrichment to improve, do as much as possible automatically using machine learning and statistics, and iterate furiously, asking domain experts for help only when needed. Otherwise the current ETL or ELT approach gets very expensive.
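
A schema-last cycle leans on statistics rather than hand-written rules. As a toy illustration of the matching step, here is a sketch that scores record pairs with a simple string-similarity measure and routes only the ambiguous middle band to a domain expert; the thresholds and fields are arbitrary, and a real system would use a learned model instead:

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: dict, b: dict) -> float:
    """Average string similarity over the fields two records have in common."""
    fields = set(a) & set(b)
    scores = [SequenceMatcher(None, str(a[f]).lower(), str(b[f]).lower()).ratio()
              for f in fields]
    return sum(scores) / len(scores) if scores else 0.0

def triage(records, auto_match=0.9, auto_reject=0.5):
    """Auto-decide the easy pairs; send only the uncertain ones to an expert."""
    matches, review = [], []
    for a, b in combinations(records, 2):
        s = similarity(a, b)
        if s >= auto_match:
            matches.append((a, b))        # unify automatically
        elif s >= auto_reject:
            review.append((a, b, s))      # ask a domain expert
    return matches, review

records = [
    {"name": "Acme Corp",        "city": "San Jose"},
    {"name": "ACME Corporation", "city": "San Jose"},
    {"name": "Jitterbit",        "city": "Alameda"},
]
auto, ask_expert = triage(records)
```

In a production system the scoring function would be a trained model and the expert's answers would feed back into training – the "iterate furiously" loop described above.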

The new Microsoft

Clearly Satya Nadella has made a huge difference at Microsoft since taking over in 2014. In 2016 the stock hit its highest level since 1999, so investors are happy. Here are the key changes he has made since taking the role of CEO:

  • Skipped Windows 9 and went straight from Windows 8 to Windows 10, a great release. However, Windows revenue is declining along with PC sales.
  • Released Microsoft Office for the iPad, and released Outlook on iPhone and Android.
  • Embraced Linux by joining the Linux Foundation, previously anathema to Microsoft's Windows-centric culture.
  • Spent $2.5B to buy Mojang, the studio behind the hit game Minecraft.
  • Introduced Microsoft's first laptop, the Surface Book.
  • Revealed Microsoft HoloLens, the super-futuristic holographic goggles.
  • Created a new partner program to provide Microsoft products on non-Windows platforms, and hired ex-Qualcomm executive Peggy Johnson to head business development.
  • Enhanced company morale and employee excitement.
  • Made the biggest gamble of all: the purchase of LinkedIn last June for a whopping $26.2B.

It is important to understand the significance of the LinkedIn purchase. Adam Rifkin (a smart guy I worked with twelve years back at KnowNow) recently wrote an article on this topic. I like his comment that in a world of machine learning, uniquely valuable data is the new network effect. The right kind of data is now the force multiplier that can catapult organizations past any competitor who lacks equivalent data, so data is the new barrier to entry. Adam also notes that the most valuable data is perishable, not static. Software is eating the world, and AI is eating software – meaning AI consumes data and pops out software.

Now let's map this onto Microsoft's LinkedIn purchase, which is all about the network effects of LinkedIn's data. What Google gets from search, Facebook from likes, and Amazon from shopping carts, Microsoft will get from LinkedIn's data for its CRM services. Adam points out that the global CRM market in 2015 was worth $26.3B – almost exactly what Microsoft paid – and it is the fastest-growing area of enterprise software. No wonder Marc Benioff of SalesForce was not very happy with this acquisition.

The new Microsoft is ready to fight the enterprise software battle with incumbents like SalesForce, Oracle, SAP and Workday.

The resurgence of AI/ML/DL

We have been seeing a sudden rise in the deployment of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). It looks like the long “AI winter” is finally over.

  • According to IDC, the AI-related hardware, software and services business will jump from $8B this year to $47B by 2020.
  • I have also read comments like, "AI is like the Internet in the mid-1990s, and it will be pervasive this time."
  • According to Andrew Ng, chief scientist at Baidu, "AI is the new electricity. Just as 100 years ago electricity transformed industry after industry, AI will now do the same."
  • Peter Lee, co-head of Microsoft Research, said, "Sales teams are using neural nets to recommend which prospects to contact next or what kind of products to recommend."
  • IBM Watson used AI in 2011, but not DL; now all 30 of its components are augmented by DL (with investment growing from $500M to a projected $6B in 2020).
  • Google had two DL projects in 2012; now it has more than 1,000 (Search, Android, Gmail, Translation, Maps, YouTube, self-driving cars, ...).

It is interesting to note that AI goes back to a paper Alan Turing wrote in 1950 suggesting the possibility of building machines with true intelligence. Then in 1956, John McCarthy organized a conference at Dartmouth and coined the phrase Artificial Intelligence. Much of the following three decades saw limited progress, hence the phrase "AI winter." In 1997, IBM's Deep Blue won its chess match against Kasparov. Over the last few years we have seen deployments such as Apple's Siri, Microsoft's Cortana, and IBM's Watson (beating the Jeopardy game show champions in 2011). In 2014, the DeepMind team used a deep learning algorithm to create a program that could win Atari games.

During the last two years, use of this technology has accelerated greatly. The key players pushing AI/ML/DL are Nvidia, Baidu, Google, IBM, Apple, Microsoft, Facebook, Twitter, Amazon, Yahoo, etc. Many new players have appeared – DeepMind, Numenta, Nervana, MetaMind, AlchemyAPI, Sentient, OpenAI, Skymind, Cortica, etc. – and these companies are all acquisition targets for the big ones. Sundar Pichai of Google says, "Machine learning is a core, transformative way in which we are rethinking everything we are doing." Google products deploying these technologies include visual translation, RankBrain, speech recognition, voicemail transcription, photo search, and spam filtering.

AI is the broadest term, applying to any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning. The subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience is machine learning. A subset of machine learning called deep learning is composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multi-layered neural networks to vast amounts of data.
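
To make the deep learning definition concrete, here is a minimal sketch of a two-layer neural network learning XOR from examples by gradient descent. It is a toy, nowhere near the multi-layered networks used for speech or image recognition, but it shows software "training itself" from data rather than following hand-written rules:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

# Two layers of weights: 2 inputs -> 8 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)        # hidden-layer activations
    out = sigmoid(h @ W2 + b2)      # network predictions
    # Backward pass: gradients of the squared error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))   # approaches [[0], [1], [1], [0]] after training
```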

I think the resurgence is the result of a confluence of several factors: advanced chip technology such as Nvidia's Pascal GPU architecture and IBM's TrueNorth (a brain-inspired computer chip), software architectures like microservices and containers, ML libraries, and data analytics toolkits. Well-known academics are being heavily recruited by companies – Geoffrey Hinton of the University of Toronto (Google), Yann LeCun of New York University (Facebook), Andrew Ng of Stanford (Baidu), Yoshua Bengio of the University of Montreal, etc.

The outlook for AI/ML/DL is very bright, and we will see real benefits in every business sector.

Data-driven enterprise

I moderated a panel of three CIOs last Sunday at the Solix Empower conference on the subject of the data-driven enterprise. The three CIOs came from different industries. Marc Parmet of the TechPar Group spent many years at Avery Dennison after stints at Apple and IBM. Sachin Mathur leads IT innovation at Terex Corp., a large company supplying cranes and other heavy equipment. PK Agarwal, currently a dean at Northeastern University, used to be the CIO for the Government of California. Here are some of the points covered:

  • I reminded the audience that we are at the fourth paradigm of science (as per the late Jim Gray). A thousand years ago science was experimental; a few hundred years ago it became theoretical (Newton's laws, Maxwell's equations); fifty years ago it became computational (simulation via computer). Now the fourth paradigm is data-driven science, where experiment, theory, and computation must be combined into one holistic discipline. In fact, science hit the "big data" problem long before the commercial world did.
  • Top-level management is starting to understand that data is oxygen, but they have yet to make their organizations fully data-driven. Just having a data warehouse with analytics and reporting does not make an enterprise data-driven; still, they do see the value of predictive analytics and deep learning for competitive advantage.
  • While business-critical applications continue to run on-premise, newer and less critical apps such as collaboration and email (e.g. Lotus Notes) are moving to the public cloud. One panelist said they are evaluating migrating their current Oracle ERP to a cloud version. Data security and reliability are critical needs. One panelist talked about not just private, public or hybrid cloud, but a "scattered" cloud that will be highly distributed.
  • Of the three Vs of big data (volume, variety, and velocity), variety seems to be the higher need – images, pictures, and videos combined with sensors deployed in manufacturing and factory automation. For industries such as retail and telcos, volume dominates. The velocity part will become more and more critical, as streaming this data in real time requires fast ingestion and analysis-on-the-fly for timely decision making (see the sketch after this list). This is the emerging world of IoT, where devices with an IP address will be everywhere – on individuals, in connected homes, autonomous cars, and connected factories – producing huge volumes of data. Cluster computing with Hadoop/Spark will be the most economical technology to deal with this load. Much work lies ahead.
  • There will be a serious shortage of "big data" and "data science" skills, on the order of 4-5 million people in the next few years. Hence universities such as Northeastern are setting up new data science curricula. Today's data scientist must have knowledge of the business, algorithms, computer science, and statistical modeling, plus he/she must be a good storyteller. Unlike in the past, it is not just about answering questions but about figuring out which questions to ask. Such skills will be at a premium as enterprises become more data-driven.
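
As a small illustration of the analysis-on-the-fly point above, here is a sketch of a sliding-window check over a stream of sensor readings, written in plain Python; the event shape, window size, and threshold are invented, and a real deployment would use Spark Streaming or a similar engine:

```python
from collections import deque
from statistics import mean

class SlidingWindowMonitor:
    """Keep the last `window` readings per device and flag big jumps immediately."""

    def __init__(self, window: int = 50, threshold: float = 5.0):
        self.window = window          # readings retained per device
        self.threshold = threshold    # allowed deviation from the recent mean
        self.history = {}             # device_id -> deque of recent readings

    def ingest(self, device_id: str, reading: float) -> bool:
        """Add one reading; return True if it deviates sharply from recent values."""
        buf = self.history.setdefault(device_id, deque(maxlen=self.window))
        is_anomaly = bool(buf) and abs(reading - mean(buf)) > self.threshold
        buf.append(reading)
        return is_anomaly

# Decisions are made as events arrive, not after a nightly batch load.
monitor = SlidingWindowMonitor()
for device, value in [("crane-7", 1.1), ("crane-7", 1.3), ("crane-7", 9.8)]:
    if monitor.ingest(device, value):
        print(f"alert: {device} reading {value} outside recent range")
```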

We discussed many other points. It was a fun panel.

 

The top five most-valued companies are tech – almost

On this first day of August 2016, I saw that four of the five most-valued companies are tech companies, and a fifth tech company is almost there. Here is the list:

  1. Apple ($AAPL): $566B
  2. Alphabet ($GOOG): $562B
  3. Microsoft ($MSFT): $433B
  4. Amazon ($AMZN): $365B
  5. Exxon Mobil ($XOM): $356B
  6. Facebook ($FB): $353B

The big move is Amazon beating Exxon Mobil (number one for many years) for the fourth spot. The switch came after Amazon posted its fifth straight quarter of profits last week, while the oil giant's profits tumbled 59 percent over the same rough period. If Exxon continues its slide, Facebook will pass it within days.

This is quite remarkable! Other than Microsoft and Apple, the other three companies are much younger, Facebook being the youngest. Their rapid rise is due to the growth of the Internet and its associated areas of search, e-commerce, and social networking. Interestingly, Amazon survived the dot-com bust of 2000-2001, unlike Yahoo, AOL, etc. Contrast this with the $4.8B valuation of Yahoo's core business, acquired by Verizon last week! Also, the fastest-growing and most profitable of Amazon's three businesses (books, general e-commerce, and AWS) is the cloud infrastructure piece, AWS (Amazon Web Services), with a run rate of $10B this year. This is way ahead of Microsoft's Azure cloud or Google's cloud offerings.

The importance of the cloud is obvious: Oracle just paid $9.3B last week to acquire NetSuite, a company that was funded by Larry Ellison. With roughly 40% ownership of NetSuite, he gets a hefty $3.5B from this deal. Paradoxically, Amazon led the way to cloud computing – not IBM, not HP, not EMC/VMware, and not Microsoft or Google. No wonder Amazon is reaping the benefits!