Back in the late 1980s, two IBMers wrote a seminal paper in the IBM Systems Journal, using the phrase “Data Warehouse” for the first time. I remember reading it; the authors were Devlin and Murphy of IBM Ireland. They described how they had isolated production data (internal IBM Europe data) into a separate Data Warehouse for retrospective analysis. That way, the production systems were not disrupted by users running trend analyses. This “non-interference” factor was very important from a performance point of view. The production system used the IMS database, and the extracted data was kept in relational form for easy querying.
Soon after that publication, many of IBM’s customers called to say that they had the same issue: how to give a new set of users access to production data for trend and usage analysis without affecting performance. For example, someone in the retail industry wants to find out how many red sweaters were sold at a specific store during the month of December. Answers to such questions can help inventory management and marketing promotions. The two industries that jumped into such analysis first were retail and telco (call-center analysis). I remember going on IBM customer roadshows explaining Data Warehousing back in 1990.
The database vendors assumed that they were the default leaders in this game, as they “owned” the customers’ data. But what they did not have were front-end analytic tools. So two new industries emerged during the 1980s and 1990s: one handling the boring task of data extraction, transformation, and loading (ETL) from multiple legacy sources, and the other building front-end tools for a variety of analytics for the knowledge worker. The ETL market gave rise to companies such as Prism and Informatica. The analytics industry saw new players like Business Objects (now part of SAP), Cognos (part of IBM), and Hyperion (part of Oracle).
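The ETL pattern those companies commercialized can be sketched in a few lines. This is a minimal illustration, not any vendor’s actual pipeline; the raw file, field names, and cleanup rules below are invented for the red-sweater example above.

```python
import csv
import io

# Hypothetical raw export from a legacy sales system (fields invented).
RAW = """store,item,color,qty,date
NYC-01,sweater,RED,3,1992-12-04
NYC-01,sweater,red,2,1992-12-11
BOS-02,scarf,blue,5,1992-12-15
"""

def extract(text):
    """Extract: read rows out of the legacy CSV dump."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize inconsistent codes and cast types."""
    for row in rows:
        row["color"] = row["color"].lower()
        row["qty"] = int(row["qty"])
    return rows

def load(rows, warehouse):
    """Load: append cleaned rows into the warehouse table."""
    warehouse.extend(rows)

warehouse = []
load(transform(extract(RAW)), warehouse)

# The December red-sweater question becomes a simple query:
red_sweaters = sum(r["qty"] for r in warehouse
                   if r["item"] == "sweater" and r["color"] == "red")
print(red_sweaters)  # 5
```

The point of the separation is exactly the one the vendors sold: the messy, source-specific cleanup lives in the transform step, so the warehouse itself holds uniform, query-ready data.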
Another trend started soon after: predicting the future based on past trends. This was called “Data Mining”. All BI vendors claimed that they could do not only retrospective analysis but predictive analysis as well. The analogy popular at the time: you don’t drive your car by looking in the rear-view mirror.
Two recent trends are visible in this journey of the BI players. The first is verticalization via an “all-in-one” solution, the appliance. Netezza led the way here, followed by newer players such as Greenplum. These products target specific industry sectors and provide better price-performance than the generalized solutions of the past. Even HP joined the race with its offering called NeoView (a Teradata fighter), and Oracle and HP joined forces last year to come up with a Data Warehouse appliance. A new company called Vertica (started by Michael Stonebraker) takes yet another approach, storing data by column rather than by row, which offers much faster performance on analytic queries than traditional relational databases.
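Why column orientation helps analytic workloads can be shown with a toy sketch (the data is invented, and this is only the storage idea, not Vertica’s actual engine): the same table is kept either as rows or as one array per column, and an aggregate query only has to touch the columns it uses.

```python
# Row store: each record kept together (toy data).
rows = [
    {"store": "NYC-01", "item": "sweater", "qty": 3},
    {"store": "BOS-02", "item": "scarf",   "qty": 5},
    {"store": "NYC-01", "item": "sweater", "qty": 2},
]

# Column store: the same data decomposed into one array per column.
columns = {
    "store": ["NYC-01", "BOS-02", "NYC-01"],
    "item":  ["sweater", "scarf", "sweater"],
    "qty":   [3, 5, 2],
}

# "Total sweater quantity" needs only two of the three columns, so a
# columnar engine reads far less data from disk than a row scan would;
# homogeneous columns also compress much better.
total = sum(q for item, q in zip(columns["item"], columns["qty"])
            if item == "sweater")
print(total)  # 5
```

In a real warehouse with hundreds of columns and billions of rows, skipping the unneeded columns is where most of the speedup comes from.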
The second trend is bringing real-time search into BI. Some people call it ESO (Enterprise Search Option). Real-time search of events can send alerts and trigger corrective actions in critical business areas. The other aspect of ESO is searching the cloud (external sources) and blending the results with internal search for greater relevance. Such a need has gained prominence since the rise of Google and the other search tools so common these days.
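The alerting side of this can be sketched as a simple scan over a stream of business events. The event shape, threshold, and alert text below are all invented for illustration; real products watch live feeds rather than a list.

```python
# Minimal sketch of event-driven alerting (threshold and fields invented).
ALERT_THRESHOLD = 3  # e.g., failed transactions before raising an alert

def scan_events(events):
    """Scan a stream of events and yield an alert when failures pile up."""
    failures = 0
    for event in events:
        if event["status"] == "failed":
            failures += 1
            if failures == ALERT_THRESHOLD:
                yield f"ALERT: {failures} failures, last at {event['ts']}"

stream = [
    {"ts": "09:00", "status": "ok"},
    {"ts": "09:01", "status": "failed"},
    {"ts": "09:02", "status": "failed"},
    {"ts": "09:03", "status": "failed"},
]
for alert in scan_events(stream):
    print(alert)  # ALERT: 3 failures, last at 09:03
```

The corrective action mentioned above would hang off the alert: instead of printing, the handler could page an operator or pause the offending process.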
Gone are the days when the only options were extra-expensive solutions like Teradata, or even Oracle and IBM. Open-source offerings like Jaspersoft and Pentaho provide cheaper alternatives for certain sectors of the market. The post-SaaS trend of Cloud Computing also brings new opportunities to provide BI as a service, though the challenges of data integration and security must be addressed.
The BI journey continues with more vigor and innovative approaches.