AWS re:Invent 2017

In a few decades, when the history of computing is written, a major section will be devoted to cloud computing. The headline of the first chapter might read something like this – how did a dot-com era book-selling company become the father of cloud computing? While giants like IBM, HP, and Microsoft were sleeping, Amazon started a new business in 2006 called AWS (Amazon Web Services). I still remember the afternoon back in 2004 when I spent a couple of hours with the CTO of Amazon (not Werner Vogels, but his predecessor, a Dutch gentleman) discussing the importance of SOA (Service Oriented Architecture). When I asked why he was interested, he mentioned that CEO Jeff Bezos had given a marching order to monetize the under-utilized infrastructure in their data centers. Thus AWS arrived in 2006 with S3 for storage and EC2 for computing.

Advance the clock by 11 years. At this week’s AWS re:Invent event in Las Vegas, it was amazing to listen to Andy Jassy, CEO of AWS, who gave a 2.5-hour keynote on how far AWS has come. There were 43,000 people attending this event (in its 6th year) and another 60,000 tuned in via the web. AWS has a revenue run rate of $18B with 42% year-over-year growth. Its margin is over 60%, thus contributing significantly to Amazon’s bottom line. It has hundreds of thousands of customers, ranging from web startups to Fortune 500 enterprises in all verticals, and the strongest partner ecosystem. Gartner said AWS has a market share of 44.1% (39% last year), larger than all the others combined. Customers like Goldman Sachs, Expedia, and the National Football League were on stage showing how they fully switched to AWS for all their development and production.

Andy covered four major areas – computing, database, analytics, and machine learning – with many new service announcements. AWS already offers over 100 services. Here is a brief overview.

  • Computing – 3 major areas: EC2 instances (including new GPU instances for AI), containers (services such as Elastic Container Service and new ones like EKS – Elastic Kubernetes Service), and serverless (Function as a Service with Lambda). The last one, serverless, has gained traction fast over just the last 12 months.
  • Database – AWS is starting to pose a real challenge to incumbents like Oracle, IBM, and Microsoft. It has three main offerings – Aurora RDBMS for transaction processing, DynamoDB, and Redshift. Andy announced Aurora Multi-Master for replicated reads and writes across data centers and availability zones, claiming it is the first RDBMS to scale out across multiple data centers, at a lot lower cost than Oracle’s RAC solution. They also announced Aurora Serverless for on-demand, auto-scaling app development. For NoSQL, AWS has DynamoDB (a key-value store), plus Amazon ElastiCache for in-memory needs. Andy announced DynamoDB Global Tables, a fully managed, multi-master, multi-region DB for customers with global users (such as Expedia). Another new service, Amazon Neptune, was announced for highly connected data (a fully managed graph database). They also have Redshift for data warehousing and analytics.
  • Analytics – AWS provides a data lake service on S3 which enables API access to any data in its native form, with services like Athena, Glue, and Kinesis to work on the data lake. Two new services were announced – S3 Select (a new API to select and retrieve a subset of S3 data from within an object) and Glacier Select (to query less frequently used data in the archives).
  • Machine Learning – Amazon claims it has been using machine learning for 20 years in its e-commerce business to understand users’ preferences. A new service called Amazon SageMaker was announced, which brings together storage, data movement, hosted notebook management, and the ten most commonly used ML algorithms (e.g., time-series forecasting). It also accommodates other popular libraries like TensorFlow, Apache MXNet, and Caffe2. Once you pick an algorithm, training is much easier with SageMaker, and then deployment happens with one click. Dr. Matt Wood of the AWS AI team demonstrated on stage how this is all done. They also announced AWS DeepLens, a video camera for developers with an on-board computer vision model, which can do face and image recognition for apps. Beyond these two, the new services announced are Amazon Kinesis Video Streams (video ingestion), Amazon Transcribe (automatic speech recognition), Amazon Translate (translation between languages), and Amazon Comprehend (fully managed NLP – Natural Language Processing).
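To make the serverless model in the computing bullet concrete: with Lambda you deploy just a function, and AWS provisions, runs, and scales the compute behind it, billing only for execution time. A minimal sketch of a Python handler follows – the `(event, context)` signature is Lambda's actual Python convention, but the event shape and greeting logic here are purely illustrative:

```python
import json

def handler(event, context):
    """Minimal Lambda-style handler: build a response from the incoming event.

    In production, a trigger such as API Gateway supplies `event` and the
    Lambda runtime supplies `context`; no servers are managed by the developer.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Invoked locally here just to show the call shape.
print(handler({"name": "re:Invent"}, None))
```

The same function body could be pointed at by other triggers (S3 events, Kinesis records) without changing the deployment model.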

It was a very impressive and powerful presentation, and it shows how deeply committed and dedicated the AWS team is. Microsoft’s Azure cloud, Google’s cloud, IBM’s cloud, and Oracle’s cloud all seem way behind AWS in breadth and depth. It will be to customers’ benefit to have a couple of AWS alternatives as we march along the cloud computing highway. Who wants single-vendor lock-in?


Meet the new richest man on earth

This morning Jeff Bezos overtook his Seattle-area rival Bill Gates as the richest man on the planet, with his net worth exceeding $90B. This was due to a huge surge in Amazon’s stock price (a rise of over $128) to over $1,100 today. Amazon’s 3Q results came out yesterday: revenue grew 34% and profits inched up as well, despite fears that heavy investments in new warehouses and worker hiring would push the company to a loss. This year Amazon’s stock started at $750. What a run!

Here are the numbers. Revenue soared 34% to a record $43.74B, a first for a non-holiday period, as the internet retail giant spread its ambitions with the acquisition of Whole Foods Market Inc. and widened its lead in cloud computing. Profit increased 1.6% to $256M, despite costs bulging by 35%, a five-year high. I was surprised to learn that Amazon employs 541,900 people, up from last quarter’s 382,400; roughly 87,000 of those employees came from Whole Foods. Amazon now commands some 43.5% of e-commerce sales this year, compared with 38.1% last year.

I remember how, during the dot-com crash, everyone wrote off Amazon. When they ridiculed Bezos for running a no-profit company with a bleak future, he jokingly replied, “I spell profit as ‘prophet’.” He has come a long way with his prophetic vision and masterful execution.

The best addition to Amazon’s two core businesses (books and e-commerce) was the introduction of AWS as cloud computing infrastructure. First came S3 (Simple Storage Service), when Bezos convinced start-up companies to rent storage at a fraction of the cost of buying from big vendors. Then EC2 (Elastic Compute Cloud) was added, and that took off in a big way, especially with capital-starved startups facing unpredictable computing needs. Pretty soon, Amazon took the credit of being the ‘father of cloud computing’, beating big incumbents like IBM and HP. Now AWS is a huge, fast-growing business bringing in about $16B in revenue with over 60% margin, making a real difference to the bottom line. Microsoft is trying hard to catch up with its Azure cloud, and so is Google with Google Compute Engine. Today’s AWS is a very rich stack with its own databases as a service (Redshift, DynamoDB, and Aurora), Elastic MapReduce, a serverless offering with Lambda, and much more. There are predictions that AWS could one day be the biggest business for Amazon.

While the Pacific Northwest remains the home of the richest man on earth, the title shifts from Gates to Bezos.

Blockchain 101

There is a lot of noise about blockchain these days. Back in 2015, The Economist ran a whole special on blockchain, saying, “The “blockchain” technology that underpins bitcoin, a sort of peer-to-peer system of running a currency, is presented as a piece of innovation on a par with the introduction of limited liability for corporations, or private property rights, or the internet itself”. It all started after the 2008 financial crisis, when a seminal paper by Satoshi Nakamoto, published on Halloween day (Oct 31, 2008), caught the attention of many (the real identity of the author is still unknown). The paper was titled “Bitcoin: A Peer-to-Peer Electronic Cash System”. Thus began a cash-less, bank-less world of money exchange over the internet using blockchain technology. Bitcoin’s value has exceeded $6,000 and its market cap is over $100B. VCs are rushing to invest in cryptocurrency like never before.

The September 1, 2017 issue of Fortune magazine screamed “Blockchain Mania” on its cover. The article said, “A blockchain is a kind of ledger, a table that businesses use to track credits and debits. But it’s not just any run-of-the-mill financial database. One of blockchain’s distinguishing features is that it concatenates (or “chains”) cryptographically verified transactions into sequences of lists (or “blocks”). The system uses complex mathematical functions to arrive at a definitive record of who owns what, when. Properly applied, a blockchain can help assure data integrity, maintain auditable records, and [turn] contracts into programmable software. It’s a ledger, but on the bleeding edge”.

So welcome to the new phase of network computing, where we switch from “transfer of information” to “transfer of value”. Just as TCP/IP became the fundamental protocol for communication and helped create today’s internet with its first killer app, email (SMTP), blockchain will enable the exchange of assets (the first app being Bitcoin for money). So get used to new terms like cryptocurrency, DLT (distributed ledger technology), nonce, Ethereum, smart contracts, pseudonymity, etc. The “information internet” becomes the “value internet”. Patrick Byrne, CEO of Overstock, said, “Over the next decade, what the internet did to communications, blockchain is going to do to about 150 industries”. In a recent Harvard Business Review article, authors Joi Ito, Neha Narula, and Robleh Ali said, “The blockchain will do to the financial system what the internet did to media”.

The key elements of blockchain are the following:

  • Distributed Database – each party on a blockchain has access to the entire database and its complete history. No single party controls the data, and each party can verify records without an intermediary.
  • Peer-to-Peer Transmission (P2P) – communication happens directly between peers instead of through a central node.
  • Transparency with Pseudonymity – each transaction and its associated value are visible to anyone with access to the system. Each node/user has a unique 30-plus-character alphanumeric address, and users can choose to remain anonymous or provide proof of identity. Transactions occur between blockchain addresses.
  • Irreversibility of Records – once a transaction is entered in the database, it cannot be altered, because it is linked to every transaction record that came before it (hence the term ‘chain’).
  • Computational Logic – blockchain transactions can be tied to computational logic and, in essence, programmed.

The heart of the system is a distributed database that is write-once, read-many, with a copy replicated at each node. It is transaction processing in a highly distributed network with guaranteed data integrity, security, and trust. Blockchain also provides an automated, secure coordination system with remuneration and tracking. Even though it started with “money transfer” via Bitcoin, the underpinnings can be applied to any asset. A central coordinating agency such as a bank becomes unnecessary. Assets such as mortgages, bonds, stocks, loans, home titles, auto registrations, birth and death certificates, passports, visas, etc. can all be exchanged without intermediaries. The February 2017 HBR article said, “Blockchain is a foundational technology (not disruptive). It has the potential to create new foundations for our economic & social systems.”
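The “chain” and the irreversibility property can be sketched in a few lines of Python: each block embeds the hash of the previous block, so altering any historical record invalidates every link after it. This is a toy illustration of only the hash-chaining idea – no mining, consensus, or networking:

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 over the block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    """Append a block that embeds the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"transactions": transactions, "prev_hash": prev_hash})

def is_valid(chain):
    """Valid iff every block's prev_hash matches its predecessor's hash."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = []
add_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
assert is_valid(chain)

# Tamper with the first block: every later link now breaks.
chain[0]["transactions"][0]["amount"] = 500
assert not is_valid(chain)
```

In a real blockchain, the same linkage is combined with proof-of-work and replication across many nodes, which is what makes rewriting history computationally impractical rather than merely detectable.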

We did not go into the depths of the technology here, but plenty of literature is available to read. Major vendors such as IBM, Microsoft, Oracle, and HPE are offering blockchain as an infrastructure service for enterprise asset management.

API-driven Economy?

I just went to a couple of sessions at API World, going on at the San Jose Convention Center. I heard all kinds of new terms thrown around within a couple of hours – the new API-driven economy, iSaaS (integration software as a service), iPaaS (integration platform as a service), APIM (API management), BaaS (backend as a service), etc. Then there was a confusing, overlapping mixture of ideas across microservices, containers, connectors, and APIs, all in the space of system integration. There were lots of young software developers at this conference, and booths from companies I had never heard of – Jitterbit (enterprise iPaaS), Back4App (backend development via Parse Server), PubNub (real-time APIs), Rigor (API monitoring and testing). I took a deep breath and thought of all the related ideas of the last three decades – APIs, subroutines, reusable web services, service-oriented architecture, integration via connectors, assembly of interchangeable parts from common libraries, etc. Welcome back to the future!

I see the urgency of this now that we have so many products and platforms in every category. A speaker from Jitterbit showed how Cisco’s marketing software stack has 39 different technologies – Salesforce, 6sense, Eloqua, App Annie, Live Agent, etc. – doing functions like campaign management, CRM, email blasts, and mobile notifications. This is definitely not an ideal solution. Jitterbit wants to be the mediator, consolidating all of these via APIs based on activities and workflow. No wonder this Alameda-based startup is doing very well. I was not surprised to learn that Salesforce and the private equity firm KKR are investors in Jitterbit.

Gartner predicts the enterprise application integration market will reach $33.5B by 2020 (a CAGR of 7.1% from $25.5B in 2016), whereas integration platform as a service (iPaaS) will reach $3.6B by 2021 (a CAGR of 41.5% from $526M in 2016). The data integration market is projected to reach $12.2B in 2022, up from $6.4B in 2017 (a CAGR of 13.7%). Gartner says, “IT leaders should combine on-premise integration platform, iPaaS, iSaaS and API Management capabilities into a single, yet modular enterprise capability.” Gartner defines this whole space as Application Integration Platforms.
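Figures like these can be sanity-checked with the compound-growth formula, CAGR = (end/start)^(1/years) − 1. For instance, the data integration numbers above check out:

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end/start)**(1/years) - 1."""
    return (end / start) ** (1 / years) - 1

# Data integration: $6.4B in 2017 growing to $12.2B in 2022 (5 years).
rate = cagr(6.4, 12.2, 5)
print(f"{rate:.1%}")  # ~13.8%, matching Gartner's quoted 13.7% within rounding
```

The same one-liner applied to the other pairs of figures shows how sensitive multi-year forecasts are to small changes in the assumed rate.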

I think it’s time we consolidated all these terms and brought some real clarity. The current marketing hype around the API-driven economy does not help much. What used to be a programmer’s term (API – application programming interface) is now marketed as a broad term to solve the world-hunger problem.

The goal has not changed – we want integration of heterogeneous systems (both inside and outside the enterprise) to be highly efficient, transparent, and less labor intensive.

iPhone’s tenth anniversary – iPhone X

Yesterday (September 12, 2017), Apple celebrated the tenth anniversary of its original iPhone, launched by Steve Jobs back in 2007 at the Moscone Center in San Francisco. It was a big day, as Apple opened its brand new Steve Jobs Theater at the new Apple campus. The show began in front of 1,000 invitees with a Steve Jobs video from the first iPhone event, thus inaugurating the theater he himself had envisioned. His wife Laurene and co-founder Steve Wozniak were present. It was a big moment.

Besides introducing incremental upgrades to the Apple Watch and Apple TV (4K support), Apple introduced two versions of the iPhone 8, basically very similar to the iPhone 7. The brand new thing was the iPhone X (pronounced “ten”, not “ex”). This was a very different design. The screen is bigger (5.8″), using OLED technology for the first time; ironically, the OLED screen is made by Samsung. The iPhone X is only slightly bigger than the iPhone 7, but its screen is larger than that of the jumbo-size iPhone 7 Plus.

Here are the highlights of iPhone X:

  • A gorgeous screen and beautiful design.
  • Great cameras, wireless charging, better battery life, and water resistance.
  • No home button; the side button is multi-tasked to do a few functions.
  • The best mobile operating system.
  • All on a device that you’ll end up using several hours a day.

Facial recognition is the most prominent new feature. Called Face ID, it will be the primary tool to unlock the nearly $1,000 iPhone X, which is scheduled to start shipping Nov. 3. A camera system with depth sensors projects 30,000 infrared dots across a user’s face, which the phone uses to create a mathematical model that is stored securely on the device. Each time users hold the device up to their faces, the technology verifies that mathematical model before unlocking the phone in an instant. Considering iPhone users on average unlock their devices 80 times a day, the success of Face ID could make or break the device, analysts say, especially after early users get their hands on it and begin sharing their experiences publicly. This is a crucial function that must be flawless; yesterday the on-stage demo failed, and that’s not very auspicious.

If it catches on, the facial-scanning technology in iPhone X could unlock other changes in how we use smartphones. In one small example, Apple also is using the system to capture facial expressions and use them to animate images of chickens, unicorns and other common emojis. Those animojis, as Apple calls them, can be captured and shared with friends.

iOS remains the best smartphone operating system and the iPhone’s biggest advantage over its competition. Apple’s operating system is the only smartphone platform that comes with consistent, guaranteed updates. And it’s the only one that routinely brings cutting-edge features, like augmented reality, to older phones.

Jony Ive’s design elegance is clearly seen in the iPhone X, as well as in the round glass auditorium lobby of the Steve Jobs Theater.

Splice Machine – What is it?

If you have never heard of Splice Machine, don’t worry – you are in the company of many. So I decided to listen to a webinar last week that promised the following in its announcement: learn about the benefits of a modern IoT application platform that can capture, process, store, analyze and act on the large streams of data generated by IoT devices. The demonstration would include:

  • High Performance Data Ingestion
  • Analytics and Transformation on Data-In-Motion
  • Relational DBMS, Supporting Hybrid OLTP and OLAP Processing
  • In-Memory and Non-Volatile, Row-based and Columnar Storage mechanisms
  • Machine Learning to support decision making and problem resolution

That was a tall order. Gartner has a new term for this: HTAP – Hybrid Transactional and Analytical Processing. Forrester uses “translytical” to describe a platform where you can do both OLTP and OLAP. I had written a blog on translytical databases almost two years back. So I did attend the webinar, and it was quite impressive. The only confusion was the liberal use of “IoT” in the marketing slogan; by that they want to emphasize streaming data (ingest, store, manage).

On Splice Machine’s website, you see four things: hybrid RDBMS, ANSI SQL, ACID transactions, and real-time analytics. A white paper advertisement says, “Your IoT applications deserve a better data platform”. Looking at the advisory board members, I recognized three names – Roger Bamford (ex-Oracle, and an investor), Ken Rudin (ex-Oracle), and Marie-Anne Neimat (ex-TimesTen). The company is funded by Mohr Davidow Ventures and InterWest Partners, among others.

There is a real need to bring together the worlds of OLTP (transactional workloads) and analytics (OLAP workloads) on a common platform. They have been separated for decades, and that’s how data warehouses, MDM, OLAP cubes, etc. got started. The movement of data between the OLTP world and the OLAP world has been handled by ETL vendors such as Informatica. With the popularity of Hadoop, the DW/analytics world is crowded with terms like data lake, ELT (first load, then transform), data curation, data unification, etc. An architecture called Lambda (not to be confused with AWS Lambda for serverless computing) attempts to unify these worlds by combining batch processing with real-time stream processing.

Into this world comes Splice Machine with its scale-out data platform. You can do your standard ACID-compliant OLTP processing, ingest data via Spark Streaming and Kafka topics, query via ANSI SQL, and run your analytical workload without ETL. They even claim support for procedural languages like Oracle’s PL/SQL. With their support of machine learning, they demonstrated predictive analytics. The current focus is on verticals like healthcare, telco, retail, and finance (e.g., Wells Fargo).
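To make the HTAP/translytical idea concrete in miniature, here is a toy sketch using Python’s built-in sqlite3 as a stand-in engine – emphatically not Splice Machine’s actual stack, which scales out over distributed storage and Spark – showing ACID transactional inserts and an analytical aggregate hitting the very same store, with no ETL step in between:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP side: small ACID transactions as orders arrive.
with conn:  # the `with` block commits atomically, or rolls back on error
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("west", 120.0), ("east", 75.5), ("west", 30.0)],
    )

# OLAP side: an analytical aggregate over the same table -- no ETL,
# no separate warehouse load, no data movement.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 75.5), ('west', 150.0)]
```

The hard part, which platforms like Splice Machine sell, is preserving exactly this one-store programming model while the transactional and analytical workloads each scale out across a cluster.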

In the cacophony of Big Data and IoT noise, it is hard to separate fact from fiction. But I do see a role for a “unified” approach like Splice Machine’s. Again, the proof is always in the pudding – some real-life customer deployment scenarios with performance numbers would prove the hypothesis and their claim of 10x the speed at one-fourth the cost.

Apache Drill + Arrow = Dremio

A new company called Dremio emerged from stealth mode yesterday, backed by Redpoint and Lightspeed with a Series A funding of $10M back in 2015. The founders came from MapR but were active in Apache projects like Drill and Arrow. The same VCs backed MapR and had the Dremio founders work out of their facilities during the stealth phase. Now the company has around 50 people in its Mountain View, California office.

Apache Drill acts as a single SQL engine that can query and join data from several other systems, and it can certainly make use of an in-memory columnar data standard. While Dremio was still in stealth, though, it wasn’t obvious what Drill’s intersection with Arrow might be. Yesterday the company launched a namesake product that likewise acts as a single SQL engine querying and joining data across systems, and it accelerates those queries using Apache Arrow. So it is a combo of Drill + Arrow: schema-free SQL over a variety of data sources, plus a columnar in-memory analytics execution engine.

Dremio believes that BI today involves too many layers. Source systems feed, via ETL processes, into data warehouses, which may then feed into OLAP cubes; BI tools themselves may add yet another layer, building their own in-memory models to accelerate query performance. Dremio thinks that’s a huge mess and disintermediates things by providing a direct bridge between BI tools and the source systems they query. The BI tools connect to Dremio as if it were a primary data source and query it via SQL. Dremio then delegates the work to the true back-end systems through push-down queries that it issues. Dremio can connect to relational databases (DB2, Oracle, SQL Server, MySQL, PostgreSQL), NoSQL stores (MongoDB, HBase, MapR-FS), Amazon Redshift, Hadoop, cloud blob stores like S3, and Elasticsearch.

Here’s how it works: all data pulled from the back-end data sources is represented in memory using Arrow. Combined with vectorized (in-CPU parallel) query processing, the company claims this design can yield up to a 5x performance improvement over conventional systems. But a perhaps even more important optimization is Dremio’s use of what it calls “Reflections” – materialized data structures that optimize Dremio’s row and aggregation operations. Reflections are sorted, partitioned, and indexed, stored on disk as Parquet files, and handled in memory as Arrow-formatted columnar data. This sounds similar to ROLAP aggregation tables.
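The core trick behind such materialized structures is the same one ROLAP aggregation tables use: compute an aggregate once, up front, then answer matching queries from that smaller structure instead of rescanning the raw data. A deliberately tiny Python sketch of the idea – illustrative only, not Dremio’s implementation or API:

```python
from collections import defaultdict

# Raw fact rows, standing in for a large back-end table.
raw_rows = [
    {"region": "west", "amount": 120.0},
    {"region": "east", "amount": 75.5},
    {"region": "west", "amount": 30.0},
]

def materialize_sum(rows, key, measure):
    """Precompute SUM(measure) GROUP BY key -- built once, up front."""
    agg = defaultdict(float)
    for row in rows:
        agg[row[key]] += row[measure]
    return dict(agg)

# The "reflection": a small aggregate maintained ahead of query time.
reflection = materialize_sum(raw_rows, "region", "amount")

def total_for(region):
    # Answered from the materialized aggregate, not by rescanning raw_rows.
    return reflection.get(region, 0.0)

print(total_for("west"))  # 150.0
```

The trade-off is the classic one: queries that match the precomputed grouping get answered from a far smaller structure, at the cost of keeping the materialization fresh as the underlying data changes.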

Andrew Brust from ZDNet said, “While Dremio’s approach to this is novel, and may break a performance barrier that heretofore has not been well-addressed, the company is nonetheless entering a very crowded space. The product will need to work on a fairly plug-and-play basis and live up to its performance promises, not to mention build a real community and ecosystem. These are areas where Apache Drill has had only limited success. Dremio will have to have a bigger hammer, not just an Arrow”.