I moderated a panel of three CIOs last Sunday at the Solix Empower conference on the subject of the data-driven enterprise. The three CIOs came from different industries. Marc Parmet of the TechPar group spent many years at Avery Dennison after stints at Apple and IBM. Sachin Mathur leads IT innovation at Terex Corp., a large company supplying cranes and other heavy equipment. PK Agarwal, currently a dean at Northeastern University, used to be the CIO for the Government of California. Here are some of the points covered:
- I reminded the audience that we are at the fourth paradigm in science (as per the late Jim Gray). A thousand years ago, science was experimental. A few hundred years ago, science became theoretical (Newton’s laws, Maxwell’s equations). Fifty years ago, science became computational (simulation via a computer). Now the fourth paradigm is data-driven science, where experiment, theory, and computation must be combined into one holistic discipline. In fact, science hit the “big data” problem long before the commercial world did.
- Top-level management is starting to understand that data is the oxygen of the business, but they have yet to make their organizations fully data-driven. Just having a data warehouse with analytics and reporting does not make an organization data-driven, but they do see the value of predictive analytics and deep learning for competitive advantage.
- While business-critical applications continue to run on-premises, newer, less critical apps such as collaboration and email (e.g. Lotus Notes) are moving to the public cloud. One panelist said that they are evaluating migrating their current Oracle ERP to a cloud version. Data security and reliability are critical needs. One panelist talked about not just private, public, or hybrid cloud, but a “scattered” cloud that will be highly distributed.
- Of the three Vs of big data (volume, variety, and velocity), variety seems to be the most pressing need: images, pictures, and videos combined with data from sensors deployed in manufacturing and factory automation. For industries such as retail and telcos, volume dominates. Velocity will become more and more critical, as streaming this data in real time will require fast ingestion and analysis on the fly for timely decision making. This is the emerging world of IoT, where devices with an IP address will be everywhere: on individuals and in connected homes, autonomous cars, and connected factories. They will produce huge volumes of data. Cluster computing with Hadoop/Spark will be the most economical technology to deal with this load. Much work lies ahead.
- There will be a serious shortage of “big data” or “data science” skills, on the order of 4-5 million people in the next few years. Hence universities such as Northeastern are setting up new curricula in data science. Today’s data scientist must have knowledge of the business, algorithms, computer science, and statistical modeling, and must also be a good storyteller. Unlike in the past, the job is not just answering questions but figuring out what questions to ask. Such skills will be at a premium as enterprises become more data-driven.
We discussed many other points. It was a fun panel.