Enterprise Information Access

The industry has been using the term “Enterprise Search” to identify vendors offering that capability, but the term often does not do justice to the variety of functions it implies. Ovum Research of the UK instead uses the phrase “Enterprise Information Access”, which it breaks into four components:

• finding information that is known to be recorded
• searching for information that one hopes is (or sometimes isn’t) there
• discovery of information one didn’t know was there (but which is appropriate and useful)
• retrieval – presenting that information in an accessible form, in a timely manner, to persons authorised to receive it.

Now, let us understand the nature of enterprise information. More than 80% of all data in an enterprise is unstructured information. This includes telephone conversations, voicemails, emails, Word documents, paper documents, images, web pages, video, and hundreds of other formats. Unfortunately, attempts to leverage this immense and strategic resource often fail because many businesses lack the technology needed to understand and effectively use content that resides outside structured databases.

We have seen several innovative vendors address this issue over the last couple of years. The Norwegian company FAST was acquired by Microsoft last year for $1.2B. The UK company Autonomy has been doing very well, with impressive growth in revenue and customers even during these hard economic times: its market value is at $4.5B and its revenue reached $500M last year. The company was founded 13 years ago, in 1996. Back in 2005 Autonomy bought its largest competitor, Verity, and earlier this year it acquired Interwoven.

I see a parallel to how relational databases started from a theoretical foundation in mathematics (set theory). Autonomy draws on the 18th-century mathematics of Thomas Bayes and the 20th-century information theory of Claude Shannon. Shannon’s information theory says that “information” can be treated as a quantifiable value in communication. Autonomy’s approach to concept modeling relies on Shannon’s theory that the less frequently a unit of communication occurs, the more information it conveys. Therefore ideas that are rarer within the context of a communication tend to be more indicative of its meaning. It is this theory that enables Autonomy’s software to determine the most important (or informative) concepts within a document. Autonomy calls this approach Meaning Based Computing (MBC).
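To make Shannon’s idea concrete, here is a minimal sketch in Python. It is not Autonomy’s algorithm, which is proprietary; it only illustrates the standard self-information measure, I(x) = -log2 p(x), where p(x) is estimated from how often a term occurs in a text. The function name `term_information` and the sample sentence are made up for illustration.

```python
import math
from collections import Counter


def term_information(tokens):
    """Return the self-information, in bits, of each distinct token.

    The probability of a token is estimated as its relative frequency
    in the token list, so rarer tokens receive higher values.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: -math.log2(count / total) for term, count in counts.items()}


# Hypothetical sample text: "the" occurs 4 times, "merger" twice,
# "bank" and "regulator" once each (8 tokens in total).
tokens = "the merger the bank the merger the regulator".split()
info = term_information(tokens)

# List terms from most to least informative.
for term, bits in sorted(info.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {bits:.2f} bits")
```

In this toy sample the frequent word “the” carries 1 bit, while the rare words “bank” and “regulator” carry 3 bits each, which matches the intuition that rarer terms say more about what a text is about. A real system would estimate probabilities from a large corpus rather than a single sentence.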

Autonomy offers a layer called IDOL (Intelligent Data Operating Layer) that automates the management, processing, and delivery of structured and unstructured information from disparate internal and external sources. According to Autonomy, it integrates with all known legacy systems, eliminating the need for organizations to patch together multiple systems and support their distinct components.

I am glad to see this, as ad hoc approaches with connectors, or the old ESB slogans from the likes of Tibco, are grossly inadequate. Even keyword search, so popular on the Internet (thanks to Google), is inadequate for enterprise needs. “Contextual search”, or drawing out the meaning of the information, becomes crucial.

Other vendors in this space, besides the big names (IBM, Microsoft, Google, Oracle), are Vivisimo, Endeca, Sinequa, Exalead and Brainware. But Autonomy appears to be taking a clear lead, with an impressive list of OEM partners and customers. Microsoft’s announcement of the new search engine “Bing” is something to watch. Google has not addressed this issue for the enterprise, even with its GSA (Google Search Appliance).

Some of the existing BI vendors, such as Information Builders and SAS, have added search capabilities. Search vendors like FAST and Autonomy have also added analytics to their solutions. This suggests that BI and search must come together.

MBC (Meaning Based Computing) is a new phrase, but one with relevance to the future.


One response to “Enterprise Information Access”

  1. Good article.

    Just a footnote that “federated search” is becoming an important component for finding information for enterprise users, in that it allows real-time searching of disparate databases.

    It used to be called “distributed search,” and in the enterprise it enables individual users to consolidate their searches across different sources of information without the necessity of including all information in one index.

    An example of this technology can be found at http://www.scienceresearch.com/, http://www.science.gov/, and http://www.mednar.com/, just to name a few publicly available websites.

