
Big Data & Analytics – what’s ahead?

I recently read this statement somewhere: “As we end 2017 and look ahead to 2018, topics that are top of mind for data professionals are the growing range of data management mandates, including the EU’s new General Data Protection Regulation that is directed at personal data and privacy, the growing role of artificial intelligence (AI) and machine learning in enterprise applications, the need for better security in light of the onslaught of hacking cases, and the ability to leverage the expanding Internet of Things.”

Here are the key areas as we look ahead:

  • Business owners demand outcomes – not just a data lake to store all kinds of data in its native format and APIs.
  • Data Science must produce results – Play and Explore is not enough. Learn to ask the right questions. Visualization of analytics from search.
  • Everyone wants Real Time – Days and weeks are too slow; immediate, actionable outcomes are needed. Analytics & recommendations based on real-time data.
  • Everyone wants AI (artificial intelligence) – Tell me what I don’t know.
  • Systems must be secure – no longer a mere platitude.
  • ML (machine learning) and IoT at massive scale – Thousands of ML models. Need model accuracy.
  • Blockchain – need to understand its full potential for business, since it is not a transformational but a foundational technology shift.

In the area of big data, a combination of new and long-established technologies is being put to work. Hadoop and Spark are expanding their roles within organizations. NoSQL and NewSQL databases bring their own unique attributes to the enterprise, while in-memory capabilities (such as Redis) are increasingly being used to deliver insights to decision makers faster. And through it all, tried-and-true relational databases continue to support many of the most critical enterprise data environments.

Cloud is becoming the de facto deployment choice for both users and developers. Serverless technology with FaaS (Function as a Service) is seeing rapid adoption among developers. According to IDC, enterprises are undergoing IT transformation as they rethink their business operations, including how they use information and what technology to deploy. In line with that transformation, nearly 80% of large organizations already have a hybrid cloud strategy in place. The modern application architecture, sometimes referred to as SMAC (social, mobile, analytics, cloud), is becoming standard everywhere.

DBaaS (database as a service) is still not as widespread as other cloud services. Microsoft is arguably making the strongest explicit claim for a converged database system with its Azure Cosmos DB as DBaaS. Cosmos DB claims to support four data models – key-value, column-family, document, and graph. However, databases have been slower to migrate to the cloud than other elements of computing infrastructure, mainly for security and performance reasons. But DBaaS adoption is poised to accelerate. Some of these cloud-based DBaaS systems – Cosmos DB, Spanner from Google, and AWS DynamoDB – now offer significant advantages over their on-premises counterparts.

One thing is for sure: big data and analytics will continue to be vibrant and exciting in 2018.


Splice Machine – What is it?

If you have never heard of Splice Machine, don’t worry; you are in the company of many. So I decided to listen to a webinar last week that said the following in its announcement: learn about benefits of a modern IoT application platform that can capture, process, store, analyze and act on the large streams of data generated by IoT devices. The demonstration will include:

  • High Performance Data Ingestion
  • Analytics and Transformation on Data-In-Motion
  • Relational DBMS, Supporting Hybrid OLTP and OLAP Processing
  • In-Memory and Non-Volatile, Row-based and Columnar Storage mechanisms
  • Machine Learning to support decision making and problem resolution

That was a tall order. Gartner has a new term, HTAP (Hybrid Transactional and Analytical Processing). Forrester uses “Translytical” to describe this kind of platform, where you can do both OLTP and OLAP. I wrote a blog post on translytical databases almost two years ago. So I did attend the webinar, and it was quite impressive. The only confusion was the liberal use of IoT in the marketing slogan; by that they want to emphasize “streaming data” (ingest, store, manage).

On Splice Machine’s website, you see four things: Hybrid RDBMS, ANSI SQL, ACID Transactions, and Real-Time Analytics. A white paper advertisement says, “Your IoT applications deserve a better data platform”. Looking at the advisory board members, I recognized three names: Roger Bamford, ex-Oracle and an investor; Ken Rudin, ex-Oracle; and Marie-Anne Neimat, ex-TimesTen. The company is funded by Mohr Davidow Ventures and InterWest Partners, among others.

There is a need to bring the worlds of OLTP (transactional workloads) and OLAP (analytical workloads) together on a common platform. They have been separate for decades, and that is how the data warehouse, MDM, OLAP cubes, etc. got started. The movement of data between the OLTP and OLAP worlds has been handled by ETL vendors such as Informatica. With the popularity of Hadoop, the DW/analytics world is crowded with terms like Data Lake, ELT (first load, then transform), Data Curation, Data Unification, etc. A new architecture called Lambda (not to be confused with AWS Lambda for serverless computing) claims to unify two worlds of its own: batch processing and real-time streaming analytics.

Into this world comes Splice Machine with its scale-out data platform. You can do your standard ACID-compliant OLTP processing, data ingestion via Spark streaming and Kafka topics, and query processing via ANSI SQL, and get your analytical workload without ETL. They even claim support for a procedural language like PL/SQL for Oracle data. With their support of machine learning, they demonstrated predictive analytics. The current focus is on verticals like Healthcare, Telco, Retail, and Finance (Wells Fargo).
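
To make the “transactions and analytics on one store, without ETL” idea concrete, here is a minimal sketch. It uses Python’s built-in sqlite3 module purely as a stand-in; it is not Splice Machine, and the `orders` table, its columns, and the `run_demo` function are hypothetical names chosen for illustration.

```python
import sqlite3


def run_demo():
    """Write transactionally, then aggregate over the same table (no ETL step)."""
    # In-memory SQLite database as a stand-in for a hybrid OLTP/OLAP store.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )

    # OLTP-style work: a small transaction that commits as a unit.
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            [("west", 120.0), ("east", 80.0), ("west", 30.0)],
        )

    # OLAP-style work: an aggregate query against the very same table,
    # with no export or transform step in between.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, total in run_demo():
        print(region, total)
```

The point of the sketch is only the shape of the workflow: the analytical query sees the committed transactional data immediately. A real HTAP platform has to deliver the same thing at scale and under concurrent load, which is where the engineering difficulty lies.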

In the cacophony of Big Data and IoT noise, it is hard to separate fact from fiction. But I do see a role for a “unified” approach like Splice Machine. Again, the proof is always in the pudding – some real-life customer deployment scenarios with performance numbers will be needed to prove the hypothesis and their claim of 10x faster speed at one-fourth the cost.

IoT Analytics – A panel discussion

I was invited to participate in a panel called “IoT Analytics” last Thursday, March 23rd. It was organized for the IoT Global Council by Erick Schonfeld of Traction Technology Partners (New York). Besides me there were two other speakers: Brandon Cannaday, cofounder and chief product officer of Losant, and Patrick Stuart, head of products at SkyCatch. For those of you not familiar with IoT, it stands for Internet of Things. There is another term, IIoT, for Industrial Internet of Things. IoT has been in the lexicon for the last few years, signifying the era of “pervasive computing” where devices with an IP address can be everywhere – the freezer, microwave, thermostats, door knobs, cars, airplanes, electric motors, various sensors – constantly sending data. The phrases “connected home” and “connected car” are an upshot of the IoT phenomenon. However, Gartner showed IoT to be at the peak of its “hype cycle” a couple of years back.

I emphasized the “pieces of the puzzle”, the components of IoT analytics: data ingestion at scale, handling the streaming data pipeline, data curation and unification, and storing the results in a highly scalable NoSQL data store. These are the steps that must happen before analytics can. Just dumping everything into a Hadoop data lake only addresses 5% of the problem (data ingestion). Transforming the data and curating it to make sense is a non-trivial step. Then I spoke about analytics, which has several components – descriptive (what happened and why?), predictive (what is probably going to happen?), and prescriptive (what should I do about it?). Streaming analytics must filter, aggregate, enrich, and analyze high-throughput data from disparate sources to identify patterns, detect urgent situations (like a temperature spike in an engine), and automate immediate action in real time.
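
The filter, aggregate, enrich, and detect steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reading format (`device`, `temp_c`), the `DEVICE_INFO` lookup table, and the window and threshold values are all hypothetical, and a production pipeline would sit on a streaming engine rather than a plain generator.

```python
from collections import deque
from statistics import mean

# Hypothetical device metadata used for the "enrich" step.
DEVICE_INFO = {"engine-1": {"site": "plant-a"}}


def detect_spikes(readings, window=5, threshold=15.0):
    """Yield an alert when a reading exceeds its device's rolling mean by `threshold`."""
    recent = {}  # device id -> the last `window` temperatures seen
    for reading in readings:
        temp = reading.get("temp_c")
        if temp is None:
            # Filter: drop malformed readings that carry no temperature.
            continue
        history = recent.setdefault(reading["device"], deque(maxlen=window))
        if len(history) == window:
            # Aggregate: rolling mean over the previous `window` readings.
            baseline = mean(history)
            if temp - baseline > threshold:
                # Enrich: attach device metadata and the baseline to the alert.
                extra = DEVICE_INFO.get(reading["device"], {})
                yield {**reading, **extra, "baseline": baseline}
        history.append(temp)


if __name__ == "__main__":
    stream = [{"device": "engine-1", "temp_c": 80.0}] * 5
    stream.append({"device": "engine-1", "temp_c": 100.0})
    for alert in detect_spikes(stream):
        print("ALERT:", alert)
```

Because `detect_spikes` is a generator, it handles readings one at a time as they arrive, which matches the streaming model: the alert is raised as soon as the offending reading is seen, not after a batch job finishes.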

Patrick of SkyCatch showed how they serve the construction industry by taking images (via drones) and accurately creating “earth maps” for self-driving bulldozers, thus saving human labor cost. Another example was taking images of actual progress at large construction sites and comparing it against the plan to show offsets, thus detecting delays so that corrective action can be taken in time.

Brandon of Losant showed an example of a large utility company in Australia that supplies high-powered (expensive) pumps with sensors. By collecting data from the sensors and monitoring it centrally, they can identify problems and notify the maintenance teams to take corrective action. Previously they had to fly people around for maintenance, and the new IoT analytics setup has saved the company a lot of money. Both are startup companies in the IoT analytics space, tackling immediate issues in real time.

It was a good panel and I learnt a lot from my co-panelists.