
Importance of ILM & Data Archiving

With the noise of cloud computing rising by the day, there are basic operational issues one should not forget, cloud or no cloud. One such issue is the discipline of ILM (Information Life-cycle Management). How do you manage data over a lifetime of many years, even decades? Do you keep all data current, which drastically impacts the performance of the applications using it? As everyone knows, the appetite for data is growing by leaps and bounds. Before long, a “personal petabyte” will be quite viable given the need to store audio and video. A petabyte is one thousand terabytes, a terabyte is one thousand gigabytes, and a gigabyte is one thousand megabytes. Now do the math: a petabyte is ten to the power of 15 bytes, and 1000 petabytes make one “exabyte”. Back in 2002, one petabyte would have cost $2M, whereas in 2012 (ten years later) it will cost $2K. This is the real Moore’s law in disk storage!
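To make the arithmetic concrete, here is a small sketch in Python. The cost figures are simply the ones quoted above, taken as given; the variable names are my own, and the "months per halving" figure is only what those two numbers imply.

```python
import math

# Decimal (SI) storage units, as used in the text.
MEGABYTE = 10**6
GIGABYTE = 1000 * MEGABYTE
TERABYTE = 1000 * GIGABYTE
PETABYTE = 1000 * TERABYTE
EXABYTE = 1000 * PETABYTE

# Cost per petabyte as quoted in the post (assumed, not verified here).
cost_2002 = 2_000_000  # dollars
cost_2012 = 2_000      # dollars
years = 2012 - 2002

decline_factor = cost_2002 / cost_2012       # how many times cheaper
halvings = math.log2(decline_factor)         # number of price halvings
months_per_halving = 12 * years / halvings   # implied halving period

print("bytes in a petabyte:", PETABYTE)
print("price decline factor:", decline_factor)
print("implied months per halving:", round(months_per_halving, 1))
```

A thousandfold drop in ten years works out to the price halving roughly once a year.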

Most enterprise business data resides as structured data managed by a DBMS (e.g. Oracle or DB2). There are production databases of 100-plus terabytes, mostly in places such as Walmart’s data warehouse for retail transactions. Telcos also have huge databases for call records. As size grows, performance degradation is normal. Hence enterprises must create a multi-tiered archiving policy. For example, current data can stay in active databases for 2-3 years, followed by 2-4 years as inactive data, followed by several years as historical data. The further back we go, the more such data can move to cloud storage. But access is paramount even when data is stored across multiple tiers. For compliance and legal reasons, historical data should be easily accessible at high speed with smart search.
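A multi-tiered policy like this can be sketched as a simple age-based rule. The thresholds below (3 years active, 4 more years inactive) are just one pick from the ranges mentioned above, and the function and tier names are hypothetical; a real policy would also weigh legal holds, access patterns, and so on.

```python
from datetime import date

# Hypothetical tier boundaries in years, chosen from the ranges in the text.
ACTIVE_YEARS = 3
INACTIVE_YEARS = 4

def storage_tier(record_date: date, today: date) -> str:
    """Return the archive tier for a record based purely on its age."""
    age_years = (today - record_date).days / 365.25
    if age_years < ACTIVE_YEARS:
        return "active"
    if age_years < ACTIVE_YEARS + INACTIVE_YEARS:
        return "inactive"
    return "historical"

today = date(2010, 6, 1)
for record_date in (date(2009, 1, 15), date(2005, 3, 1), date(2001, 7, 4)):
    print(record_date, "->", storage_tier(record_date, today))
```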

Another aspect of ILM is the management of copies of data. Some companies may need 8-20 copies of active data for test, development, disaster recovery, quality control, etc. A 200 GB database ends up as 1200 GB of data with just six copies. Such issues are normally not reflected in planning, and IT shops get shocked when they see such numbers and the associated costs. Another area at many enterprises is the “application retirement” issue. This happens with M&A or as a precursor to a move into the cloud. It is usually addressed in a very ad hoc way, resulting in unforeseen delays and cost. Any automation here should be highly welcome.
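The copy arithmetic is trivial, but worth seeing at the upper end of the range. This sketch assumes every copy is a full-size clone with no compression, subsetting, or deduplication; the function name is mine.

```python
def total_footprint_gb(database_gb: float, copies: int) -> float:
    """Total storage used when a database exists as `copies` full-size copies."""
    return database_gb * copies

# The example from the text: 200 GB with six copies.
print(total_footprint_gb(200, 6))

# The same database at the 8-20 copy range mentioned above.
for copies in (8, 20):
    print(copies, "copies:", total_footprint_gb(200, copies), "GB")
```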

Gartner Group said this last year, “The return on the investment for implementing a structured data archiving solution is exceptionally high, especially for application retirement or when deployed for a packaged application for which vendor-supplied templates are available to ease implementation and maintenance.”

One company leading in this space is Solix (I am an adviser), which provides all the tools mentioned above. Their Enterprise Data Management System (EDMS) platform provides a comprehensive set of ILM tools for enterprises. Solix even introduced an appliance to ease the cost and administrative burdens for clients. The rapid adoption of Solix products is a testimony to the growing importance of data archiving, application retirement, data masking, and test data management.

ILM should be a well-thought-out discipline at every IT organization.

Welcome 2010 & Cloud Computing

We just finished the first decade of this century/millennium. The early part of this decade saw great worry about the Year 2000 problem. Much gloom and doom was predicted, but things passed off smoothly. No apocalyptic upheaval.

As we usher in the next decade, the biggest buzzword is “Cloud Computing”, a rapprochement of ASP, SaaS, SOA, Virtualization, Grid Computing, Enterprise 2.0, etc. All these buzzwords have been making the rounds over the past few years. Finally, computing as a “utility” seems practical and doable. Amazon took the lead in introducing AWS (Amazon Web Services) way back in the early 2000s. It then brought in the Storage as a Service concept via S3 (Simple Storage Service). It also introduced EC2 (Elastic Compute Cloud), with which Infrastructure as a Service became viable.

I just read a nice summary of this written by M.R. Rangaswami of the Sand Hill Group. While the momentum is on, MR says large enterprises are going to be slow adopters. Much cloud adoption is in the SMB arena, where lower TCO and capex override any concern for security and scale. Older vendors like IBM will offer a hybrid model: in-house systems plus cloud. This is a no-brainer, as there is a huge legacy of production systems in Fortune 1000 companies running on premises. But “pure cloud” vendors like Google, Amazon, and Salesforce.com will push for a “cloud-only” approach.

Another area of interest is data management, at volumes that have never been seen before. There is the NoSQL movement to deal with unstructured data, and frameworks like Hadoop, built on the MapReduce programming model, are getting quick adoption for fast search.
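For readers who have not seen MapReduce, here is a toy word count in plain Python. It is a single-process illustration of the map, shuffle, and reduce steps only; it does not use Hadoop or any of its APIs, and all the function names are mine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["cloud data cloud", "data archive"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on many machines, which is where the speed comes from.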

This decade will see a big landscape change in the computing arena – from the model of computing to how we store and manage data for access and analytics.

Welcome to 2010.

Enterprise Information Access

The industry has been using the term “Enterprise Search” to identify vendors offering that capability. But often that term does not do justice to the variety of functions implied. Ovum Research of the UK identifies the phrase “Enterprise Information Access” with four components:

• finding information that is known to be recorded
• searching for information that one hopes is (or sometimes isn’t) there
• discovery of information one didn’t know was there, but which is appropriate and useful
• retrieval – presenting that information in an accessible form, in a timely manner, to persons authorised to receive it.

Now, let us understand the nature of enterprise information. More than 80% of all data in an enterprise is unstructured information. This includes telephone conversations, voicemails, emails, Word documents, paper documents, images, web pages, video, and hundreds of other formats. Unfortunately, attempts to leverage this immense and strategic resource often fail because many businesses lack the requisite technology to understand and effectively utilize content that resides outside the scope of structured databases.

We have seen several innovative vendors addressing this issue over the last couple of years. The Norwegian company FAST was acquired by Microsoft last year for $1.2B. The UK company Autonomy has been doing very well, with impressive growth in revenue and customers even during these hard economic times: its market value is at $4.5B and revenue reached $500M last year. They started 13 years ago, in 1996. Back in 2005 Autonomy bought its largest competitor, Verity, and earlier this year they acquired Interwoven.

I see a parallel to how relational databases started from a theoretical foundation in mathematics (set theory). Autonomy draws on the 18th-century mathematics of Thomas Bayes and the 20th-century information theory of Claude Shannon. Shannon’s information theory says that “information” can be treated as a quantifiable value in communication. Autonomy’s approach to concept modeling relies on Shannon’s theory that the less frequently a unit of communication occurs, the more information it conveys. Therefore ideas that are rarer within the context of a communication tend to be more indicative of its meaning. It is this theory that enables Autonomy’s software to determine the most important (or informative) concepts within a document. They have been using the phrase MBC (Meaning Based Computing).
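The Shannon idea can be shown in a few lines. This is the textbook self-information formula, -log2(p), applied to term frequencies in a made-up snippet of text; it is only an illustration of the principle and not Autonomy's actual algorithm, which I am not describing here.

```python
import math
from collections import Counter
from typing import Dict, List

def self_information(terms: List[str]) -> Dict[str, float]:
    """Bits of information per term: rarer terms score higher (-log2 of frequency)."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: -math.log2(count / total) for term, count in counts.items()}

# A made-up eight-word "document".
terms = "the the the the report report archive retention".split()
bits = self_information(terms)

# List terms from most to least informative.
for term, score in sorted(bits.items(), key=lambda item: -item[1]):
    print(f"{term:10s} {score:.1f} bits")
```

The common word “the” carries one bit, while the words that appear only once carry three bits each, so they say more about what the text is about.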

Autonomy offers a layer called IDOL (Intelligent Data Operating Layer) that automates the management, processing, and delivery of structured and unstructured information from disparate internal and external sources. It integrates with all known legacy systems, eliminating the need for organizations to patch together multiple systems and support their distinct components.

I am glad to see this, as ad hoc approaches with connectors, or the old ESB slogans from the likes of Tibco, are grossly inadequate. Even keyword search, so popular on the Internet (thanks to Google), is very inadequate for enterprise needs. “Contextual search”, or drawing out the meaning of the information, becomes crucial.

Other vendors in this space besides the big names (IBM, Microsoft, Google, Oracle) are Vivisimo, Endeca, Sinequa, Exalead and Brainware. But it appears Autonomy is clearly taking the big lead with an impressive list of OEM partners and customers. Microsoft’s announcement of the new search engine “Bing” is something to watch. Google has not addressed this issue for the enterprise even with its GSA (Google Search Appliance).

Some of the existing BI vendors such as Information Builders and SAS have added search capabilities. Search vendors like FAST and Autonomy have also added analytics to their solutions. This says BI and search must come together.

MBC (Meaning Based Computing) is a new phrase, but has relevance to the future.

Enterprise Software Infrastructure

Many friends and colleagues ask the question: what will happen to companies such as Tibco, Informatica, etc., which have been in the middleware space for many years? The biggest middleware player, BEA, is gone. In the old days, we used to joke that after death you go to CA (Computer Associates), because of CA’s acquisition of so many companies. Now we see such acquisitions happening all around, especially by Oracle. When the acquisition of Sun is completed, Oracle will own the Java software platform and the MySQL open source database product.

Look at the BI players. Business Objects is now part of SAP. Cognos, the Canadian BI company, is part of IBM. Hyperion Solutions (formerly Arbor Software) was acquired by Oracle. In that space, we see the good old SAS standing alone as a private company for almost 3 decades. Informatica, which started as an ETL (Extraction, Transformation, and Loading) company, has transformed itself to do other things like data integration, data migration, data quality management, etc. The space looks sparse for smaller players. There are new companies in the data warehouse appliance space such as Netezza (an MPP data warehouse appliance) and Greenplum (based on PostgreSQL and also an MPP solution). All the big players have data warehousing and business analytics as core competencies. That includes HP with its new offering called Neoview (based on Tandem’s NonStop SQL technology) and its data warehouse appliance with Oracle. IBM has made several acquisitions over the last five years to strengthen its Information Analytics space.

In the middleware space, the landscape looks even more sparse. IBM claims to be the provider of enterprise infrastructure for middleware (DB2, WebSphere, Tivoli, Rational, ...). Its software business contributes over 50% of profit, although it accounts for only 20% of total revenue. HP wants to get into that space after acquiring Mercury Interactive and Opsware. It is surrounding its flagship product OpenView with other software tools. BEA was the big player with the success of WebLogic, but now it’s all in the all-pervasive Fusion world at Oracle.

So what do start-up companies in the infrastructure software business do? It will be hard to succeed independently unless the value is extremely critical and obvious. At the same time, the product must co-exist in the broader ecosystem of the customers. Microsoft delivers the whole stack, and so do IBM and Oracle. It’s back to the good old days when each vendor supplied the entire stack, from chips to hardware to operating system to subsystems to applications. Starting in the 1980s, we saw that structure shift to a more horizontal play, where each layer had many players. But customers became the default “systems integrators” and they did not like it. Of course, the systems integration business flourished. Now customers prefer fewer vendors for ease of management and service.

Where we see a lot of activity is in data center management software (call it cloud if you want), be it in security, manageability, reliability, backup and recovery, data archiving, virtualization at various levels, or governance. Any innovation in these areas will see a good future, but focus and specificity are the watchwords.

Given the dismal IPO market, every start-up in the enterprise software infrastructure space should be pursuing a strong partnership strategy for market success.

Back to the Future

In our technology business (software), we keep seeing many things come back from the past with new labels as the latest trend. It’s like my graduate school room-mate (a Canadian) who refused to buy a new tie in line with the style of the day. When “thin tie” was in, he was wearing a “wide tie”. When I asked, “how come you are wearing something out of date?”,  he would reply, “I don’t change. The styles will eventually come back to me”.

It is similar in our business. When we talk of cloud computing or SaaS, someone quickly remembers the “time-sharing” days of the likes of ADP. When we rave about VMware and virtualization, I recall IBM’s VM operating system of the 1970s. Whenever we brag about caching for better performance, I recall the “prefetching” we did in DB2 during the 1980s. When the “internet kids” thought “statefulness” was a cool idea (in an otherwise stateless web), we remembered the days of transaction processing, where the database guaranteed the ACID properties and techniques like two-phase commit were invented to ensure transactional integrity. When we present SOA and web services as the key to re-use, we remember the “subroutines” of the past. The concepts are the same; the execution might be a little different.

This is not to undermine the advances of technology, such as the Internet and the World Wide Web. Two things have clearly pushed the envelope: processing speed and bandwidth. We have been debating SMP (symmetric multiprocessing) and MPP (massively parallel processing) for years. We have also debated the “scale-up” vs the “scale-out” model, the latter being used at newer sites such as Google. We still debate the merits of “shared nothing” vs “shared disk”.

A few years ago Professor Eric Brewer of UC Berkeley put forward the idea that the old-world rigid model of two-phase commit etc. may not be the best choice for the internet era. He proposed a new acronym called BASE (Basically Available, Soft-state, Eventually consistent). This is based on his CAP theorem, which says that among Consistency, Availability, and tolerance to network Partition (a distributed network), one can achieve at most 2 out of the 3. If you want consistency and availability (like a financial institution), then give up distribution and keep everything centralized. If you want distributed data and availability (like Amazon’s book business), then give up consistency. Finally, if you want consistency and distributed data (like a nation-wide bank), then be ready to pay with availability: leave some hours at night to run those utilities for synchronizing databases.
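Here is a toy sketch of the “distributed and available, eventually consistent” corner of the triangle. Two replicas keep accepting writes while they cannot talk to each other, disagree for a while, and converge once they synchronize. The `Replica` class and the “newest timestamp wins” merge rule are my own simplifications for illustration; real systems use more careful reconciliation.

```python
from typing import Dict, Optional, Tuple

class Replica:
    """A tiny key-value store that remembers when each key was last written."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Tuple[int, int]] = {}  # key -> (timestamp, value)

    def write(self, key: str, value: int, timestamp: int) -> None:
        self.store[key] = (timestamp, value)

    def read(self, key: str) -> Optional[int]:
        entry = self.store.get(key)
        return entry[1] if entry else None

def synchronize(a: Replica, b: Replica) -> None:
    """Reconcile two replicas; for each key the newest timestamp wins."""
    for key in set(a.store) | set(b.store):
        candidates = [r.store[key] for r in (a, b) if key in r.store]
        newest = max(candidates, key=lambda entry: entry[0])
        a.store[key] = newest
        b.store[key] = newest

east, west = Replica("east"), Replica("west")

# During a network partition both sites stay available and accept writes.
east.write("books_in_stock", 5, timestamp=1)
west.write("books_in_stock", 4, timestamp=2)
before = (east.read("books_in_stock"), west.read("books_in_stock"))

# Once the partition heals, the replicas converge on one value.
synchronize(east, west)
after = (east.read("books_in_stock"), west.read("books_in_stock"))

print("before sync:", before)
print("after sync: ", after)
```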

As we march into the future, Eric’s wisdom from several years ago would be a great guideline, even to database purists with an “all or nothing” philosophy.

Catalyst Conference 2008, San Diego

I had the opportunity to attend Burton Group’s annual conference called Catalyst 2008 at the Manchester Grand Hyatt in San Diego. There were probably 1200 people from all over the world.

This was my first time attending this conference. I am somewhat skeptical of conferences hosted by analyst firms such as Gartner, Forrester, etc. Having been a vendor writing software for over 25 years, I was always sought out by the key analysts who wanted to understand what we were building and the future road map for products such as DB2 or Oracle DB. Then you pay to hear the same analyst tell you what you told them. But to be fair, there are good analysts who synthesize lots of information and give some insights. What I get very tired of is hearing the obvious. Tell me something I don’t know, please.

So at the Catalyst conference, I noticed many parallel tracks. One track that must be very good is security or identity management, because everyone I ran into mentioned that they were there for the identity management topics. I attended the tracks on collaboration, Rich Internet Applications, SOA, etc. Again, I was told that REST is a better way than SOAP. Please! This news is at least three years old. Amazon Web Services (AWS) had done this way back when. Just because the analyst thought it’s new does not make it new news.

Overall, there were many good sessions. Customers presenting real deployment stories were the most valuable. Paisley, a software company in the GRC space (Governance, Risk, Compliance), showed how they have used the Curl RIA platform to build a very attractive UI for the user. This was a good example of using the Web as a platform instead of client-server to satisfy user demand at a much lower cost.

The evenings were packed with vendor “open houses” and I did not even step out of the hotel for 3 days. There was a track on mobile computing and it always drew large audiences. Burton Group seems much more technically focused, and many attendees have been its clients for years. This was more like their annual user group meeting.

TiECon2007 – May 18-19, 2007

I was at the annual conference of the organization called TiE (The Indus Entrepreneurs) at the Santa Clara Convention Center last Friday and Saturday along with almost 3600 people. This event has grown immensely and draws all the VCs, entrepreneurs, corporate executives, students, and IT professionals. Due to the focus on entrepreneurs, all kinds of law groups and bankers also flood the event seeking new business. Here are some highlights.

- This year’s theme was “Face of the New Entrepreneur” and appropriately the morning keynote started with Tim O’Reilly talking about Web 2.0. Then he moderated a panel of young founders like Ashwin Navin of BitTorrent, Jaideep Singh of Spock, and the founder of Photobucket (being acquired by News Corp.’s MySpace). It was a good panel focusing on the elements of Web 2.0 behind their success: harnessing collective intelligence, asymmetric competition, and the science of the network effect.

- Nobuyuki Idei, former CEO of Sony gave a very insightful talk on Sony’s history and growth. He commented that Japan suffers from the disease of ABC – Aging population, Bureaucracy, and Closed Society.

- Marc Benioff gave the keynote on the usual “end of software” and the wonderful benefits of SaaS – Software as a Service.

- Many parallel sessions saw all kinds of luminaries from the venture and corporate worlds. Vinod Khosla talked about his investments in clean energy such as biofuels and solar. Vinod, as usual, gets very deep and passionate about anything he takes up.

- Matt Cohler of Facebook gave a rousing talk on how Facebook becomes a real platform for social networking.

- Meg Whitman of eBay gave one of the best talks on eBay’s forward strategy of auction, payment, and communication, thus tying the three pieces – core business, PayPal, and Skype into one holistic entity. She came across as a real leader with a firm grip on her business.

- There was a 13-year-old entrepreneur, Ansu (written up in the San Jose Mercury News), with a booth to show off his new software game company and product. His goal is to raise $1M before he graduates from middle school.

What was amazing was the number of people who traveled from many parts of the world to attend this event. During a special reception for charter members on Thursday, Laura Tyson, a professor at UC Berkeley and former chief economic adviser to President Clinton, gave a great talk on the state of the world economy and the changing equation caused by the rapid rise of India and China on the economic scene.

I introduced and moderated a session for Zia Yousuf, executive VP at SAP in charge of the new partner ecosystem. Zia spoke of how SAP is moving ahead to create future solutions with a well-orchestrated partner ecosystem and technology platforms such as SOA.

It was quite an event. What I loved the most was meeting many friends and colleagues from past years and catching up.

Jnan Dash