Category Archives: Conference

A conference in Bangalore

I was invited to speak at a conference called Solix Empower 2017 held in Bangalore, India on April 28th, 2017. It was an interesting experience. The conference focused on Big Data, Analytics, and Cloud. Over 800 people attended the one-day event with keynotes and parallel tracks on wide-ranging subjects.

I did three things. First, I was part of the inaugural keynote, where I spoke on “Data as the new Oxygen,” showing the emergence of data as a key platform for the future. I emphasized the new architecture of containers and microservices, layered with machine learning libraries and analytics toolkits, on which modern big data applications are built.

Then I moderated two panels. The first was titled “The rise of real-time data architecture for streaming applications” and the second “Top data governance challenges and opportunities”. In the first panel, the members came from Hortonworks, Tech Mahindra, and ABOF (Aditya Birla Fashion). Each member described the criticality of real-time analytics, where trends and anomalies are caught on the fly and action is taken within seconds or minutes. I learnt that for online e-commerce players like ABOF, a key challenge is identifying the customers most likely to refuse goods delivered at their door (many do not have credit cards, hence COD, or cash on delivery). Such refusals cause major losses to the company, so ABOF does trend analysis to identify the specific customers likely to behave that way. By using real-time analytics, ABOF has been able to reduce such occurrences by about 4%, with significant savings. The panel also discussed technologies for data ingestion, streaming, and building stateful applications. Some comments were made on combining Hadoop/EDW (OLAP) with streaming (OLTP) into one solution, as in the Lambda architecture.
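The Lambda architecture mentioned by the panel can be sketched minimally: a batch layer periodically recomputes a complete view from historical data, a speed layer keeps an incremental view over recent events, and queries merge the two. Here is a hypothetical Python sketch (all names are illustrative, not from any panelist’s system; a real deployment would use Hadoop for the batch layer and Storm or Spark Streaming for the speed layer):

```python
from collections import defaultdict

class LambdaView:
    """Toy Lambda-architecture sketch: batch view plus speed (real-time) view.
    Counts events per key purely for illustration."""

    def __init__(self):
        self.batch_view = defaultdict(int)  # recomputed periodically from the master dataset
        self.speed_view = defaultdict(int)  # incremental counts since the last batch run

    def run_batch(self, master_dataset):
        """Batch layer: recompute the view from scratch, then reset the speed layer."""
        self.batch_view = defaultdict(int)
        for key in master_dataset:
            self.batch_view[key] += 1
        self.speed_view.clear()

    def ingest(self, key):
        """Speed layer: update incrementally as each event streams in."""
        self.speed_view[key] += 1

    def query(self, key):
        """Serving layer: merge the batch and real-time views."""
        return self.batch_view[key] + self.speed_view[key]

view = LambdaView()
view.run_batch(["order", "refusal", "order"])  # historical (batch) data
view.ingest("refusal")                         # a new streaming event
print(view.query("refusal"))                   # -> 2
```

The key design point is that the batch layer can be slow but authoritative, while the speed layer only has to cover the window since the last batch run.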

The second panel, on data governance, had members from Wipro, Finisar, Solix, and Bharti AXA Insurance. The panelists agreed that data governance is no longer viewed as the “bureaucratic police, and hence universally disliked” inside the company; it is now taken seriously by upper management. Hence policies for metadata management, data security, data retirement, and authorization are being put in place. Accuracy of data remains a key challenge. While the organizational structure for data governance (such as a CDO, or chief data officer) is still evolving, there remain many hard problems (especially for large companies with diverse groups).

It was interesting to have executives from Indian companies reflect on these issues that seem no different than what we discuss here. Big Data is everywhere and global.

The resurgence of AI/ML/DL

We have been seeing a sudden rise in the deployment of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). It looks like the long “AI winter” is finally over.

  • According to IDC, AI-related hardware, software and services business will jump from $8B this year to $47B by 2020.
  • I have also read comments like, “AI is like the Internet in the mid 1990s and it will be pervasive this time”.
  • According to Andrew Ng, chief scientist at Baidu, “AI is the new electricity. Just as 100 years ago electricity transformed industry after industry, AI will now do the same.”
  • Peter Lee, co-head of Microsoft Research, said, “Sales teams are using neural nets to recommend which prospects to contact next or what kind of products to recommend.”
  • IBM Watson used AI, not DL, when it won in 2011. Now all 30 of its components are augmented by DL (with investment growing from $500M to $6B by 2020).
  • Google had 2 DL projects in 2012; now it has more than 1,000 (Search, Android, Gmail, Translation, Maps, YouTube, self-driving cars, …).

It is interesting to note that AI was anticipated by Alan Turing in a paper he wrote back in 1950, suggesting that it might be possible to build machines with true intelligence. Then in 1956, John McCarthy organized a conference at Dartmouth and coined the phrase Artificial Intelligence. Much of the next three decades saw little activity, hence the phrase “AI winter”. Around 1997, IBM’s Deep Blue won its chess match against Kasparov. During the last few years, we have seen deployments such as Apple’s Siri, Microsoft’s Cortana, and IBM’s Watson (which beat the Jeopardy game show champions in 2011). In 2014, the DeepMind team used a deep learning algorithm to create a program that could win at Atari games.

During the last two years, use of this technology has accelerated greatly. The key players pushing AI/ML/DL are Nvidia, Baidu, Google, IBM, Apple, Microsoft, Facebook, Twitter, Amazon, Yahoo, etc. Many new players have appeared, such as DeepMind, Numenta, Nervana, MetaMind, AlchemyAPI, Sentient, OpenAI, SkyMind, and Cortica; these companies are all acquisition targets for the big ones. Sundar Pichai of Google says, “Machine learning is a core transformative way in which we are rethinking everything we are doing.” Google products deploying these technologies include Visual Translation, RankBrain, Speech Recognition, Voicemail Transcription, Photo Search, the spam filter, etc.

AI is the broadest term, applying to any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning. The subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience is machine learning. A subset of machine learning called deep learning is composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multi-layered neural networks to vast amounts of data.
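The distinction between hand-coded rules and machine learning can be made concrete with a tiny sketch: instead of writing an if-then rule, we fit a one-weight model from examples so the program improves with experience. This is an illustrative toy (gradient descent on squared error), not any specific product’s algorithm:

```python
# Minimal "machine learning" illustration: learn the rule y = w*x from
# example pairs, rather than hard-coding it.
def fit(xs, ys, lr=0.01, epochs=500):
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # gradient of the squared error (w*x - y)**2 with respect to w
            w -= lr * 2 * (w * x - y) * x
    return w

w = fit([1, 2, 3, 4], [2, 4, 6, 8])  # examples of y = 2x
print(round(w, 2))                    # -> 2.0
```

Deep learning stacks many such learned weights into multi-layered networks, which is why it needs the vast data and compute the article describes.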

I think the resurgence is the result of a confluence of several factors: advanced chip technology such as Nvidia’s Pascal GPU architecture and IBM’s TrueNorth (a brain-inspired computer chip), software architectures built on microservices and containers, ML libraries, and data analytics toolkits. Well-known academics are being heavily recruited by companies: Geoffrey Hinton of the University of Toronto (Google), Yann LeCun of New York University (Facebook), Andrew Ng of Stanford (Baidu), Yoshua Bengio of the University of Montreal, etc.

The outlook for AI/ML/DL is very bright, and we will see real benefits in every business sector.

Data-driven enterprise

I moderated a panel of three CIOs last Sunday at the Solix Empower conference on the subject of the data-driven enterprise. The three CIOs came from different industries. Marc Parmet of the TechPar Group spent many years at Avery Dennison after stints at Apple and IBM. Sachin Mathur leads IT innovation at Terex Corp., a large company supplying cranes and other heavy equipment. PK Agarwal, currently a dean at Northeastern University, used to be the CIO for the Government of California. Here are some of the points covered:

  • I reminded the audience that we are at the fourth paradigm in science (as per the late Jim Gray). A thousand years ago, science was experimental; a few hundred years ago, science became theoretical (Newton’s laws, Maxwell’s equations, …); fifty years ago, science became computational (simulation via computer). Now the fourth paradigm is data-driven science, where experiment, theory, and computation must be combined into one holistic discipline. In fact, science hit the “big data” problem long before the commercial world did.
  • Top-level management is starting to understand that data is the oxygen, but they have yet to fully make their organizations data-driven. Just having a data warehouse with analytics and reporting does not make an enterprise data-driven, but executives do see the value of predictive analytics and deep learning for competitive advantage.
  • While business-critical applications continue to run on-premises, newer, less critical apps such as collaboration and email (e.g. Lotus Notes) are moving to the public cloud. One panelist said they are evaluating migrating their current Oracle ERP to a cloud version. Data security and reliability are critical needs. One panelist talked about not just private, public, or hybrid cloud, but a “scattered” cloud that will be highly distributed.
  • Of the 3 Vs of big data (volume, variety, and velocity), variety seems to be the higher need: images, pictures, and videos combined with sensors deployed in manufacturing and factory automation. For industries such as retail and telcos, volume dominates. The velocity part will become more and more critical, as streaming this data in real time will need fast ingestion and analysis on the fly for timely decision making. This is the emerging world of IoT, where devices with an IP address will be everywhere: individuals, connected homes, autonomous cars, connected factories. They will produce huge volumes of data, and cluster computing with Hadoop/Spark will be the most economical technology to handle the load. Much work lies ahead.
  • There will be a serious shortage of “big data” or “data science” skills, on the order of 4-5 million people in the next few years. Hence universities such as Northeastern are setting up new curricula in data science. Today’s data scientist must have knowledge of the business, algorithms, computer science, and statistical modeling, and must also be a good storyteller. Unlike in the past, it is not just answering questions, but figuring out what questions to ask. Such skills will be at a premium as enterprises become more data-driven.
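The velocity point above, analyzing a stream on the fly rather than in batch, can be illustrated with a sliding time window that keeps only recent events. A hypothetical Python sketch (the class and its parameters are illustrative, not from any panelist’s system):

```python
from collections import deque

class SlidingWindowRate:
    """Toy streaming analysis: count events seen within the last N seconds,
    updating incrementally as each event arrives (no batch recomputation)."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps of events still inside the window

    def ingest(self, timestamp):
        self.events.append(timestamp)
        # evict events that have fallen out of the window
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def rate(self):
        """Number of events currently inside the window."""
        return len(self.events)

w = SlidingWindowRate(window_seconds=60)
for t in (0, 10, 30, 65):
    w.ingest(t)
print(w.rate())  # event at t=0 has expired; 10, 30, 65 remain -> 3
```

A rule (e.g. “alert if rate exceeds a threshold”) checked after each `ingest` is the decision-within-seconds pattern the panel described.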

We discussed many other points. It was a fun panel.


TiEcon 2016 – some keynotes

After a gap of a few years, I attended the annual TiEcon conference. TiE stands for The Indus Entrepreneurs, a non-profit organization formed 23 years ago by some of the valley’s technocrats originating from India to foster and help budding entrepreneurs. I helped organize the content of this conference 15 years ago. The scale has since gone up: last week there were almost 3,000 attendees from the US and beyond, many from faraway places like India and Singapore. Let me highlight some of the keynotes I attended.

  • Shantanu Narayen, CEO of Adobe – This was the first keynote on day 1, in which he narrated how far Adobe has come, from the desktop publishing company of the 1980s and 1990s to a cloud-based digital solutions company. He emphasized the challenge of transformation and said that among the most difficult obstacles are the “antibodies” inside the company averse to change; hence he spent many cycles convincing the troops why change is so key to survival and growth. Adobe now has product lines called Creative Cloud (developers), Document Cloud (Acrobat, etc., delivered in the cloud), and Marketing Cloud (a number of analytics products in the cloud). Adobe has also been acquiring companies, such as Omniture, for inorganic growth. They claim to be changing the digital experience for everyone, from emerging artists to global brands.
  • Vishal Sikka, CEO of Infosys – I liked Vishal’s talk a lot. He has a Ph.D. in computer science from Stanford; his thesis was on AI, which was out of fashion for many years but is emerging as the latest big trend. Vishal joined Infosys about 21 months ago, after being CTO at SAP for many years, and described the tough transition from a product/technology company to a services company. One can see his stamp of injecting AI technology into the services sector; he expands AI as Automation and Innovation. He announced a new solution called Infosys Mana, a platform that brings machine learning together with the deep knowledge of an organization to drive automation and innovation, enabling businesses to continuously reinvent their system landscapes. Mana, with the Infosys Aikido service offerings, dramatically lowers the cost of maintenance for both physical and digital assets; captures the knowledge and know-how of people and of fragmented, complex systems; simplifies the continuous renovation of core business processes; and enables businesses to deliver new and delightful user experiences leveraging state-of-the-art technology. I was surprised to learn that Infosys has 200,000 employees and educates on the order of 17,000 people every year in its huge facility in Mysore. Vishal is certainly transforming Infosys, and their recent quarterly results have reflected that.
  • Sanjay Mehrotra, CEO of SanDisk – This was a real treat, as I was unfamiliar with the evolution of SanDisk, a company built by three immigrants: Sanjay from India, Eli Harari from Israel, and Jack Yuan from Taiwan. Sanjay described how he was rejected three times for a US visa when he was planning to come to UC Berkeley for his undergraduate studies. He got his BS and MS in electrical engineering and started a career at Intel, where he met the other two founders. The three started SanDisk, which created a new revolution in the flash memory business. After 27 years, SanDisk was acquired by Western Digital last October for $19B. I liked the candid answers Sanjay gave about the ups and downs of his journey and the lessons he learned while going from engineer to business leader and growing a company to such scale. He narrated how Sequoia rejected them for the initial investment, suggesting that funding would happen only if they followed the Intel model; of course they refused. He said that VCs don’t always see the future and are risk-averse when you are charting a new path.
  • Besides these keynotes, I also enjoyed listening to Diane Greene, the new cloud czar at Google, on how they plan to compete with the de facto cloud king, AWS. Sandy Carter of IBM described how IBM is moving towards building cognitive apps on its Watson platform.

There were several tracks on Cloud, IoT, Data Economy, Social Entrepreneurship, etc. Overall it was a good two-day experience.

The NoSQLNow conference in San Jose this week

I attended the NoSQLNow conference this week at the San Jose Convention Center. The organizers claimed there were 800 attendees, clearly many more than in the last couple of years. Given the number of sessions, exhibits, speakers, and attendees, interest in the newer data management products and solutions (aka Big Data) has been growing fast.

I spoke at a well-attended session titled “Are NoSQL databases ready for the enterprise? Examples of MongoDB deployment”. I also participated in a panel on enterprise adoption of the cloud; my co-panelists were from Oracle and NeoDB. The conference opening session was given by one of the co-hosts, Dan McCreary, who spoke about the state of NoSQL. He mentioned that a total of $2.4B has been invested in NoSQL DB companies over the last couple of years: MongoDB ($231M), Couchbase ($116M), Aerospike ($22M), Basho ($32.5M), DataStax ($83.7M), Clustrix ($59.3M), FoundationDB ($22.3M), etc. Even a big player like Intel has invested in Cloudera.

Here are some new trends in the NoSQL world:

  • Hadoop is starting to move from batch to real time and streaming
  • Real time systems are adding Hadoop integration points
  • Storm (Twitter) and Spark are addressing data streaming
  • Spark/Scala is popular on multiple systems
  • MongoDB is the big leader in NoSQL operational systems based on the document data model, followed by DataStax and Couchbase

The market pressures, according to Dan, point to:

  • Big Data and predictive analytics
  • Internet of Things (time-series data and log files)
  • Security for highly regulated areas like finance/banking, healthcare, and government
  • Streaming data
  • Keeping operational costs low (goodbye to license fees)
  • High availability (moving away from master-slave to clusters of peer-to-peer networks)

There are other trends as well: old-school MapReduce programming is being taken over by Spark, and JSON data formats are gaining popularity for agile development, though there is no standardized JSON query language. On the other hand, XQuery 3.1 supports both XML and JSON formats. There is new emphasis on agile transformation, since data storage is no longer the issue; the question is how non-programmers can transform data into various useful formats. The acronym ETL will be replaced by ETTTTTTT… (extract, store in a data lake, and transform in many ways).
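The “extract once, transform many ways” idea can be made concrete: raw JSON records land in the lake once, and each consumer derives its own shape from them. A small Python sketch (the records and field names are made up for illustration):

```python
import json

# Extract: raw JSON events land once in the "lake" (here, a list of JSON lines).
lake = [
    '{"user": "a", "amount": 10, "city": "San Jose"}',
    '{"user": "b", "amount": 25, "city": "Austin"}',
    '{"user": "a", "amount": 5, "city": "San Jose"}',
]
records = [json.loads(line) for line in lake]

# Transform #1: per-user totals, for an analytics consumer.
totals = {}
for r in records:
    totals[r["user"]] = totals.get(r["user"], 0) + r["amount"]

# Transform #2: flat comma-separated rows, for a reporting tool.
rows = [f'{r["user"]},{r["city"]},{r["amount"]}' for r in records]

print(totals)  # -> {'a': 15, 'b': 25}
```

Each new consumer adds another transform over the same stored extract, which is exactly the ETTTT… pattern: storage is cheap, so nothing is thrown away at ingestion time.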

Other keynotes included one from Oracle’s head of database development, Andy Mendelsohn, who showed Oracle’s three areas under “big data”: the Oracle DBMS and Exadata, Oracle Hadoop, and Oracle NoSQL (formerly BerkeleyDB), all with one interface called Oracle Big Data SQL. SQL seems to be making a comeback as an interface to several products, such as Cloudera Impala.

Amazon presented DynamoDB, built for the cloud with fast and predictable performance; they claim seamless scalability and easy administration. Amazon’s motto has always been “build services, not software”, and Amazon itself uses DynamoDB to minimize opex.

I presented many examples of enterprises deploying MongoDB to build “systems of engagement” on top of “systems of record” (a concept that Geoffrey Moore, of Crossing the Chasm fame, has been talking about lately). There is great momentum behind MongoDB deployment at enterprises because of agile development (a flexible data model and high coding velocity), fast scalability and high availability using shards and replicas, and the open source culture.

MongoDB World in NY City this week

I attended MongoDB World in New York City earlier this week, on June 24th and 25th. There were 2,000 attendees at this first-time event, and I met many people who had flown in from all over the world. It was quite a phenomenon for a six-year-old company; the momentum of MongoDB is truly amazing.

For those who do not know MongoDB, it is a new-generation open source NoSQL database for building highly interactive modern applications. Enterprises have built “systems of record” over the last three decades using traditional RDBMSs such as DB2, Oracle, MySQL, and SQL Server. These systems are like the interstate highways built across the USA starting in the 1950s: they serve the business in handling basic functions, but are inadequate for several new needs. Hence the need to build “systems of engagement” has arrived. These systems connect customers, employees, and partners to the business using mobile devices, providing visibility into critical business information. MongoDB provides two crucial things: an easy-to-use development platform with very high coding velocity, and horizontally scalable operations spanning hundreds or thousands of commodity servers at much lower cost than traditional systems. Many new features for easy management of such distributed systems were announced.
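Horizontal scaling of this kind rests on sharding: each document is routed to a server by its shard key, with replicas on each shard for availability. Here is a simplified hash-sharding sketch in Python (illustrative only; MongoDB itself supports range-based and hashed shard keys managed by its config servers):

```python
import hashlib

# Each shard would be a replica set of several servers in a real deployment.
SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(shard_key: str) -> str:
    """Route a document to a shard by hashing its shard key."""
    h = int(hashlib.md5(shard_key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# Documents with the same shard key always land on the same shard, so
# single-key reads and writes touch only one server; adding shards
# spreads the keyspace across more commodity machines.
doc = {"_id": "order-42", "customer": "acme", "total": 99.5}
print(shard_for(doc["_id"]) in SHARDS)  # -> True
```

The design trade-off is that queries on the shard key hit one shard, while queries on other fields must be scattered to all shards and gathered, which is why shard-key choice matters so much.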

While Hadoop brings a highly economical distributed platform for batch-oriented offline analytics, MongoDB addresses online transactional workloads. They complement each other, much as today’s operational systems do with data warehousing solutions: Hadoop is a step up in the EDW and analytics world, while MongoDB is a step up in the mission-critical business transaction world. At the conference, many customers across all verticals presented success stories on how they used MongoDB to address the data-variety problem and built systems at record speed: it takes only weeks to a couple of months from inception to finish, which is not possible using standard RDBMS-based technologies. Special features, like handling spatial data with indexing and text search, were highlighted by some customers.

Cloudera founder and CSO Mike Olson gave a keynote on the co-existence of Hadoop and MongoDB, as did Amazon CTO Werner Vogels on using MongoDB via AWS. Other keynotes were given by Max Schireson, CEO of MongoDB, and Eliot Horowitz, CTO of MongoDB. A number of CIOs and CTOs of large enterprises were also present. The excitement and developer endorsement were visible all around, clearly showing that among the new database solutions, MongoDB is the unquestioned leader. Besides Amazon AWS, MongoDB is also available in other clouds such as Google Cloud Platform and Microsoft Azure. Many partners showed their support for MongoDB at the exhibition booths: Teradata, Cloudera, AppDynamics, Pure Storage, SAP, etc.

For a first-time event, MongoDB World was quite remarkable!

Internet of Things – IoT

The phrase “Internet of Things” is making big rounds these days, especially this week at CES (the Consumer Electronics Show) in Las Vegas. I am not sure who came up with it; maybe Cisco, in one of their self-serving predictions of the enormous growth of devices connected to the Internet (from about 10 billion today to 50 billion by 2020) and hence the need for their networking gear. John Chambers will elaborate on this opportunity in a CES keynote speech tomorrow. Gartner puts the number of connected devices at fewer than 30 billion, but sees $309 billion in additional revenue for product and service suppliers by 2020.

So the next wave of computing has started in a big way: from smartphones and tablets we move to wearables and other gadgets connected to various home entities. Examples of devices on the market or on the drawing board include smart door locks, toothbrushes, wrist-watches, fitness trackers (I wear one called Fitbit to track the steps and distance I walk, the floors I climb, and the calories I burn), smoke detectors, surveillance cameras, ovens, toys, and robots.

One of the best-known startups, Nest Labs (founded by ex-Apple executive Tony Fadell), supplies beautifully designed Wi-Fi-enabled thermostats (costing a couple of hundred dollars at Home Depot) and smoke detectors. Another company, August, is developing smart door locks. Consumers can now use smartphones to check remotely whether they locked the doors, left the lights on, or turned down the thermostat. Parking meters can communicate with smartphone users.

There will be several hurdles in connecting all these devices seamlessly; the main one seems to be the fragmented assortment of wireless communications technologies. Someone said that things are getting connected badly. Privacy issues also need to be sorted out. But there is no doubt that this evolution is on in a big way. This week’s CES features key executives from Google, Twitter, and Yahoo, besides the usual consumer product giants like Sony, Samsung, LG, etc. A whole range of connected devices, from wearables to home appliances, will be on display, all assuming cloud computing and Big Data as base technologies. There is enormous optimism about the vast opportunity IoT will generate.

I am heading there tomorrow to see first-hand this new revolution on display.