Data Sharehouse?

This is yet another new term in our lexicon. The San Mateo, California-based startup Snowflake announced this week a new offering with this name, as a free add-on to the data warehouse it built for cloud computing. Now companies using Snowflake’s technology, officially called Snowflake Data Sharing, can share any part of their data warehouses, subject to defined security policies and controls on access, with each other.

Snowflake’s data sharehouse allows companies to provide direct access to structured and unstructured data without the need to copy the data to a new location. Current approaches include file sharing, electronic data interchange, application programming interfaces and email, but all of them have issues, ranging from lack of security to cumbersome methods of providing data access to the right people. Jon Bock, Snowflake’s marketing chief, compared the difference between data sharing on Snowflake and other methods to the difference between streaming music and compact discs. “It looks [to the data recipient] just as if the data resides on their own data warehouse,” he said.
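To make the "streaming versus compact disc" idea concrete, here is a toy sketch of sharing by reference rather than by copy. It is not Snowflake's API; the `Warehouse` and `Share` classes, the table names, and the consumer name are all hypothetical, chosen only to illustrate a grant that exposes selected tables to selected consumers while the data stays in the provider's warehouse.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """A provider's warehouse (hypothetical): table name -> list of row dicts."""
    tables: dict = field(default_factory=dict)


@dataclass
class Share:
    """A read-only grant over selected provider tables. Nothing is exported."""
    provider: Warehouse
    granted_tables: set
    allowed_consumers: set

    def read(self, consumer: str, table: str) -> tuple:
        # Access control: only listed consumers and listed tables are visible.
        if consumer not in self.allowed_consumers:
            raise PermissionError(f"{consumer} is not a member of this share")
        if table not in self.granted_tables:
            raise PermissionError(f"{table} is not included in this share")
        # Each read goes to the provider's live table at query time,
        # returned as an immutable tuple so the consumer cannot modify it.
        return tuple(self.provider.tables[table])


supplier = Warehouse(tables={
    "inventory": [{"sku": "A1", "qty": 10}],
    "payroll": [{"employee": "x", "salary": 1}],
})
share = Share(provider=supplier,
              granted_tables={"inventory"},
              allowed_consumers={"retailer"})

first = share.read("retailer", "inventory")
supplier.tables["inventory"].append({"sku": "B2", "qty": 5})  # provider updates
second = share.read("retailer", "inventory")                  # consumer sees it
```

The second read reflects the provider's update without any file transfer, which is the point of the streaming analogy; a request for the ungranted `payroll` table raises `PermissionError`.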

The catch is that every participant must be a Snowflake customer using its data warehouse in the cloud. So this is another way for Snowflake to grow its market. We saw this approach in the 1990s, when exchanges were introduced by the likes of Oracle for B2B data interchange. That did not go very far. Of course cost was a big factor, but reaching agreement on common formats and security policies for data exchange was another issue. Snowflake claims to solve this by having one source of truth in the cloud.

Of course companies such as manufacturers and suppliers, or advertisers and publishers, have been sharing data for quite a long time, but it has been cumbersome with technologies like EDI (electronic data interchange, developed in the 1940s), email, file sharing, APIs and more. That kind of sharing takes time and was not created for the current situation, in which businesses need live data processed in real time to keep a competitive edge.

According to Bob Muglia, Snowflake’s CEO (ex-Microsoft), the data sharehouse changes the game and democratizes the possibilities, because anyone can access the service. Rather than being charged a subscription fee, users pay only according to the amount of data they have processed. Snowflake’s data sharing service is free to data providers; data consumers pay for the compute resources they use. Not only that, but data providers and consumers make their arrangements independently of Snowflake Computing, which is only the infrastructure provider.

In an increasingly collaborative world there is little doubt that sharing data easily, and in real time, without sacrificing security, privacy, governance and compliance is of great value. Whether it will create entirely new markets has yet to be seen, but actionable data-driven insights are likely to be huge differentiators in the digital economy.

It is a clever move, but time will tell if this will enable smooth data exchange or create more chaos.

Amazon+Whole Foods – How to read this?

Last Friday (June 16, 2017), Amazon announced that it would acquire Whole Foods for a whopping $13.7B ($42 per share, a 27% premium to its closing price). The same day, stock prices of Walmart, Target, and Costco took a hit, while Amazon shares went up by more than 2%. So why did Amazon buy Whole Foods? Clearly Amazon sees groceries as an important long-term driver of growth in its retail segment. What is funny is that a web pioneer that started with no physical retail outlets has decided to go back to the brick-and-mortar model. Amazon has also opened physical bookstores in a few cities. We have come full circle.

Amazon’s grocery business has so far focused on the Amazon Fresh subscription service to deliver online food orders. Amazon will eventually use the stores to promote private-label products, integrate and grow its AI-powered Echo speakers, boost Prime membership and entice more customers into the fold. Hence this acquisition is the start of a long-term strategy. Amazon is known for its non-linear thinking. Just see how it started a brand new business with AWS about 12 years back and now it is a $14B business with a 50%+ margin. It commands a powerful leadership position in the cloud computing business and competitors like Microsoft Azure or Google’s GCE are trying hard to catch up.

The interesting thing to ponder is how the top tech companies are spreading their tentacles. This was a front-page article in today’s WSJ. Apple, a computer company that became a phone company, is now working on self-driving cars, TV programming, and augmented reality. It is also pushing into payments territory, challenging the banks. Google parent Alphabet built Android, which now runs most smartphones. It ate the maps industry; it’s working on internet-beaming balloons, energy-harvesting kites, and self-driving technologies. Facebook is creating drones, VR hardware, original TV shows, and even telepathic brain computers. Of course Elon Musk brings his tech notions to any market he pleases – finance, autos, energy, and aerospace.

What is special about Amazon is that it is willing to work on everyday problems. According to the author of the WSJ article, this may be the smarter move in the long run. While Google and Facebook have yet to drive significant revenue outside their core, Amazon has managed to create business after business that is profitable, or at least not a drag on the bottom line. The article ends with a cautionary note, “Imagine a future in which Amazon, which already employs north of 340,000 people worldwide, is America’s biggest employer. Imagine we are all spending money at what’s essentially the company store, and when we get home we’re streaming Amazon’s media….”

With a few tech giants controlling so many businesses, are we comfortable getting all our goods and services from the members of an oligopoly?

A conference in Bangalore

I was invited to speak at a conference called Solix Empower 2017 held in Bangalore, India on April 28th, 2017. It was an interesting experience. The conference focused on Big Data, Analytics, and Cloud. Over 800 people attended the one-day event with keynotes and parallel tracks on wide-ranging subjects.

I did three things. First, I was part of the inaugural keynote, where I spoke on “Data as the new Oxygen”, showing the emergence of data as a key platform for the future. I emphasized the new architecture of containers and micro-services, on top of which sit machine learning libraries and analytic tool kits for building modern big data applications.

Then I moderated two panels. The first was titled “The rise of real-time data architecture for streaming applications” and the second was called “Top data governance challenges and opportunities”. In the first panel, the members came from Hortonworks, Tech Mahindra, and ABOF (Aditya Birla Fashion). Each member described the criticality of real-time analytics, where trends and anomalies are caught on the fly and action is taken within seconds or minutes. I learnt that for online e-commerce players like ABOF, a key challenge is identifying customers most likely to refuse goods delivered at their door (many do not have credit cards, hence there is COD or cash on delivery). Such refusals cause major losses to the company. They do some trend analysis to identify specific customers who are likely to behave that way. By using real-time analytics, ABOF has been able to reduce such occurrences by about 4%, with significant savings. The panel also discussed technologies for data ingestion, streaming, and building stateful apps. Some comments were made on combining Hadoop/EDW (OLAP) plus streaming (OLTP) into one solution like the Lambda architecture.

The second panel on data governance had members from Wipro, Finisar, Solix and Bharti AXA Insurance. These panelists agreed that data governance is no longer viewed as the “bureaucratic police and hence universally disliked” inside the company, and that it is taken seriously by upper management. Hence policies for metadata management, data security, data retirement, and authorization are being put in place. Accuracy of data is a key challenge. While the organizational structure for data governance (like a CDO, chief data officer) is still evolving, there remain many hard problems (especially for large companies with diverse groups).

It was interesting to have executives from Indian companies reflect on these issues that seem no different than what we discuss here. Big Data is everywhere and global.

The end of Cloud Computing?

A provocative title for sure when everyone thinks we just started the era of cloud computing. I recently listened to a talk by Peter Levine, general partner at Andreessen Horowitz on this topic which makes a ton of sense. The proliferation of intelligent devices and the rise of IoT (Internet of Things) lead us to a new world beyond what we see today in cloud computing (in terms of scale).

I have said many times that the onset of cloud computing was like going back to the future of centralized computing. We had IBM mainframes dominating the centralized computing era during the 1960s and 1970s. The introduction of PCs created the world of client-server computing (remember the Wintel duopoly?) from the 1980s till 2000. Then the popularity of mobile devices started the cloud era in 2005, thus taking us back to centralized computing again. The text message I send you does not go from my device to your device directly, but gets to a server somewhere in the cloud first and then to your phone. The trillions of smart devices forecast to appear as sensors in automobiles, home appliances, airplanes, drones, engines, and almost anything you can imagine (like in your shoe) will drastically change the computing paradigm again. Each of these “edge intelligent devices” cannot go back and forth to the cloud for every interaction. Rather, they will want to process data at the edge to cut down latency. This brings us back to a new form of “distributed computing” model – kind of back to a vastly expanded version of the “PC era”.

Peter emphasized that the cloud will continue to exist, but its role will change from being the central hub to a “learning center” where curated data from the edge (only relevant data) resides. The learning gets pushed back to the edge so that the edge gets better at its job. The edge does three things – sense, infer, and act. The sense level handles massive amounts of data, as in a self-driving car (10GB per mile), thus making it like a “data center on wheels”. The sheer volume of data is too much to push back to the cloud. The infer piece is all machine learning and deep learning to detect patterns and improve accuracy and automation. Finally, the act phase is all about taking actions in real time. Throughout, the cloud keeps its role as the “learning center” and the custodian of important data for the enterprise.
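The sense, infer, and act loop with the cloud as a "learning center" can be sketched in a few lines. This is only an illustration under loud assumptions: the `EdgeDevice` class, the single threshold standing in for a learned model, the action names, and the `cloud_learn` update rule are all made up for the example and are not from Peter's talk.

```python
from statistics import mean


class EdgeDevice:
    """Hypothetical edge device: decides locally, uploads only curated data."""

    def __init__(self, threshold: float):
        self.threshold = threshold   # stand-in for a model pushed from the cloud
        self.outbox = []             # curated readings destined for the cloud

    def infer(self, reading: float) -> bool:
        # "Infer": a trivial model that flags readings above the threshold.
        return reading > self.threshold

    def act(self, anomalous: bool) -> str:
        # "Act": decide immediately at the edge, with no cloud round trip.
        return "brake" if anomalous else "cruise"

    def step(self, reading: float) -> str:
        # "Sense" is represented by the reading handed to this method.
        anomalous = self.infer(reading)
        if anomalous:
            self.outbox.append(reading)   # only relevant data leaves the edge
        return self.act(anomalous)


def cloud_learn(curated: list, old_threshold: float) -> float:
    # Toy stand-in for retraining: move halfway toward the curated mean.
    return (old_threshold + mean(curated)) / 2


device = EdgeDevice(threshold=50.0)
actions = [device.step(r) for r in [10.0, 20.0, 80.0, 30.0, 90.0]]

# The cloud learns from the curated subset and pushes the result back down.
device.threshold = cloud_learn(device.outbox, device.threshold)
```

Of the five readings, only the two flagged ones are sent to the cloud; every decision is made on the device, and the only thing that travels back is the updated model parameter.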

Given the sheer volume of data created, peer-to-peer networks will be utilized to lessen the load on the core network and share data locally. The challenge is huge in terms of network management and security. Programming becomes more data-centric, meaning less code and more math. As the processing power of the edge devices increases, the cost will come down drastically. I like his last statement that the entire world becomes the domain of IT, meaning we will have consumer-oriented applications with enterprise-scale manageability.

This is exciting and scary. But whoever could have imagined the internet in the 1980s or the smartphone during the 1990s, let alone self-driving cars?

IoT Analytics – A panel discussion

I was invited to participate in a panel called “IoT Analytics” last Thursday, March 23rd. This was organized for the IoT Global Council by Erick Schonfeld of Traction Technology Partner (New York). Besides me there were two other speakers: Brandon Cannaday, cofounder and chief product officer of Losant, and Patrick Stuart, head of products at SkyCatch. For those of you not familiar with IoT, it stands for Internet of Things. There is another term, IIoT, for Industrial Internet of Things. IoT has been in the lexicon for the last few years, signifying the era of “pervasive computing” where devices with an IP address can be everywhere – the fridge, microwave, thermostats, door knobs, cars, airplanes, electric motors, various sensors… – constantly sending data. The phrases “connected home” and “connected car” are an upshot of the IoT phenomenon. However, Gartner showed IoT to be at the peak of the “hype cycle” a couple of years back.

I emphasized the “pieces of the puzzle”, or the components of IoT Analytics – data ingestion at scale, handling the streaming data pipeline, data curation and unification, and storing the results in a highly scalable NoSQL data store – as the steps before analytics can happen. Just dumping everything into a Hadoop data lake only addresses 5% of the problem (data ingestion). Transforming the data and curating it to make sense is a non-trivial step. Then I spoke about analytics, which has several components – descriptive (what happened and why?), predictive (what is probably going to happen?), and prescriptive (what should I do about it?). Streaming analytics must filter, aggregate, enrich, and analyze high-throughput data from disparate sources to identify patterns, detect urgent situations (like a temperature spike in an engine), and automate immediate action in real time.
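As a rough illustration of the filter, enrich, aggregate, and detect steps, here is a minimal streaming sketch built from Python generators. Everything specific in it is an assumption made for the example: the event fields (`engine`, `temp_c`), the `ENGINE_SITES` lookup table, the window size, and the 20-degree jump that counts as a spike. A production pipeline would sit on a streaming platform rather than an in-memory list.

```python
from collections import deque
from statistics import mean

# Hypothetical reference data used to enrich raw sensor events.
ENGINE_SITES = {"engine-1": "plant-A", "engine-2": "plant-B"}


def filter_valid(events):
    """Filter: drop events with a missing temperature reading."""
    for event in events:
        if event.get("temp_c") is not None:
            yield event


def enrich(events):
    """Enrich: attach the site each engine belongs to."""
    for event in events:
        yield {**event, "site": ENGINE_SITES.get(event["engine"], "unknown")}


def detect_spikes(events, window=3, jump_c=20.0):
    """Aggregate and detect: compare each reading with the rolling mean of
    the previous `window` readings for the same engine."""
    history = {}
    for event in events:
        recent = history.setdefault(event["engine"], deque(maxlen=window))
        if len(recent) == window and event["temp_c"] - mean(recent) > jump_c:
            yield {"alert": "temperature spike", **event,
                   "baseline_c": mean(recent)}
        recent.append(event["temp_c"])


raw_events = [
    {"engine": "engine-1", "temp_c": 90.0},
    {"engine": "engine-1", "temp_c": 91.0},
    {"engine": "engine-1", "temp_c": None},    # bad reading, filtered out
    {"engine": "engine-1", "temp_c": 92.0},
    {"engine": "engine-2", "temp_c": 70.0},
    {"engine": "engine-1", "temp_c": 130.0},   # jump well above the baseline
]

alerts = list(detect_spikes(enrich(filter_valid(raw_events))))
```

Because each stage is a generator, events flow through one at a time and an alert is produced as soon as the spiking reading arrives, which is the "immediate action" part of the description above.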

Patrick of SkyCatch showed how they are serving the construction industry in taking images (via drones) and accurately creating “earth maps” for self-driving bulldozers, thus saving human labor cost. Another example was taking images of actual progress in large construction sites and contrasting it against plan, to show offsets, thus detecting delays and taking corrective actions in time.

Brandon of Losant showed an example of a large utility company in Australia that supplies high-powered (expensive) pumps with sensors. By collecting data from the sensors and monitoring it centrally, they can identify problems and notify the maintenance teams to take corrective action. Previously they had to fly people around for maintenance, and the new IoT analytics has saved the company a lot of money. Both are startup companies in the IoT Analytics space and are tackling immediate issues in real time.

It was a good panel and I learnt a lot from my co-panelists.

Data-driven enterprise

I moderated a panel of three CIOs last Sunday at the Solix Empower conference on the subject of the data-driven enterprise. The three CIOs came from different industries. Marc Parmet of the TechPar group spent many years at Avery Dennison after stints at Apple and IBM. Sachin Mathur leads IT innovation at Terex Corp., a large company supplying cranes and other heavy equipment. PK Agarwal, currently a dean at Northeastern University, used to be the CIO for the Government of California. Here are some of the points covered:

  • I reminded the audience that we are at the fourth paradigm in science (as per the late Jim Gray). A thousand years ago, science was experimental; a few hundred years back, science became theoretical (Newton’s laws, Maxwell’s equations, …); fifty years ago, science became computational (simulation via a computer). Now the fourth paradigm is data-driven science, where experiment, theory, and computation must be combined into one holistic discipline. Actually science hit the “big data” problem long before the commercial world.
  • Top level management is starting to understand that data is the oxygen, but they are yet to fully make their organizations data-driven. Just having a data warehouse with analytics and reporting does not make it data-driven, but they do see the value of predictive analytics and deep learning for competitive advantage.
  • While business-critical applications continue to run on-premise, newer, less critical apps such as collaboration and email (e.g. Lotus Notes) are moving to the public cloud. One said that they are evaluating migrating current Oracle ERP to a cloud version. Data security and reliability are critical needs. One panelist talked about not just private, public or hybrid cloud, but “scattered” cloud which will be highly distributed.
  • Out of the 3 V’s of big data (volume, variety, and velocity), variety seems to be the bigger need – images, pictures, and videos combined with data from sensors deployed in manufacturing and factory automation. For industries such as retail and telcos, volume dominates. The velocity part will become more and more critical, as streaming of these data in real time will need fast ingestion and analysis on the fly for timely decision making. This is the emerging world of IoT, where devices with an IP address will be everywhere – individuals, connected homes, autonomous cars, connected factories. They will produce huge volumes of data. Cluster computing with Hadoop/Spark will be the most economical technology to deal with this load. Much work lies ahead.
  • There will be a serious shortage of “big data” or “data science” skills, of the order of 4-5 million people in the next few years. Hence universities such as Northeastern are setting up new curricula on data science. Today’s data scientist must have knowledge of the business, algorithms, computer science, and statistical modeling, and he/she must also be a good storyteller. Unlike in the past, it’s not just answering questions, but figuring out what questions to ask. Such skills will be at a premium as enterprises become more data-driven.

We discussed many other points. It was a fun panel.


Oracle’s push into cloud solutions

I watched Larry Ellison’s keynotes at this week’s Oracle OpenWorld conference in San Francisco. Oracle is definitely serious about pushing its cloud offerings, even though it came in late. But Oracle claimed that it has been working on them for almost ten years. The big push is at all three levels – SaaS, PaaS, and IaaS. The infrastructure-as-a-service offering claims faster and cheaper resources (computing, storage, and networking) to beat Amazon’s AWS. They make a good point on better security for enterprises, given that security breaches have been happening with greater frequency lately. One comment I have is that AWS goes beyond just IaaS; it is into PaaS as well (e.g. Docker services, etc. for devops). Oracle’s big advantage is in offering SaaS for all its application suites – ERP, HCM and CRM (they call it CX, for customer experience). This is not something AWS offers for the enterprise market, although apps like Salesforce and Workday are available. Microsoft has Dynamics as an ERP on its cloud.

I do agree that Oracle has the upper hand when it comes to database as a service. Larry showed performance numbers for AWS Redshift, Aurora, and DynamoDB compared to Oracle’s database (which he claimed is much faster). Oracle does have a chance to beat AWS when it comes to serious enterprise-scale implementations, given its stronghold in that market. Most of these enterprises still run much of their systems on-premise. Oracle offers them the alternative of switching to the cloud version within their firewall. It also suggests the co-existence of both on-prem and cloud solutions. The total switch-over to cloud will take ten years or more, as confidence and comfort levels grow over time.

AWS has a ten-year lead here and has grown in scale and size. The current run rate for AWS is over $10B in revenue with hefty profit (over 50%). However, many clients complain about the high cost as they use more AWS services. Microsoft Azure and Google’s cloud services are marching fast to catch up. Most of the new-age web companies use AWS. Oracle is better off focusing on the enterprise market, its stronghold. Not to discount IBM here, which is pushing its SoftLayer cloud solutions to enterprise customers. Mark Hurd of Oracle showed several examples of cloud deployment at large and medium-size companies as well. One interesting presence at OpenWorld yesterday was the chief minister (like a state governor) of the Indian state of Maharashtra (Mumbai being the big city there). He signed a deal with Oracle to help implement cloud solutions to turn many cities into “smart” cities and to connect 29,000 villages digitally. This is a big win for Oracle and will set the stage for many other government outfits to follow suit.

I think more competition for AWS is welcome, as no one wants single-vendor lock-in. Mark Hurd said that by 2020, cloud solutions will dominate the enterprise landscape. The analysts are skeptical of Oracle’s claims over AWS, but an Oracle focused on cloud is not to be taken lightly.

Jnan Dash