Jnan Dash’s Weblog

Technology to manage a wedding

August 17, 2009

Away from working on software, I got busy last month (and the months before) with a different kind of project: the wedding of my son. Such a project is no less complex than any software project. Too many moving parts. Early planning. A road-map with details, including milestones, closure, and quality control. The scale was a challenge: a six-day event with a peak load of 500 guests coming from all over the world. There were at least 3 locations for various evening programs, handling 160, 360, and 500 guests in succession. Each location had its own unique attributes to deal with.

Since several individuals were involved, collaborative software became key to timely decisions and a smooth flow of information. Google Docs (documents and spreadsheets) was used heavily. Communication was both synchronous and asynchronous: real-time phone calls, emails, SMS, etc. The biggest challenge was keeping track of multiple events. The brain is a natural processor of multiple events, although at any given time it focuses on one event. This is no different from how message queues work. Brain memory was fully utilized, augmented by digital event lists and progress status.

Some items cannot be replaced by the computer, like tasting food or picking flower colors. One always checks the vendors' websites to prepare for a meaningful discussion with them. But I truly felt we were in the information age when the invitation cards were printed eight thousand miles away, with editing and design managed via email and PDF files. Location transparency was in full practice here. Handouts, name cards, and address labels were all produced with software and dispatched to the appropriate parties. We resorted to good old Microsoft Excel for guest lists, seating charts, and various number-crunching exercises. When nothing worked, we resorted to the age-old software called “handwrite”.

We had the usual confusion when multiple copies were created, and I was reminded of the file-era problems of duplication and redundancy before databases came along. We managed to stick to a “single copy” in the cloud via Google Docs, even though it was functionally deficient.

Overall, things went as smoothly as one can expect. We all had fun.

You can check this blog and some pictures to get an idea of the end product.

http://weddingdocumentaryblog.com/?p=1862

Categories: Uncategorized

Paying homage to the late Rajeev Motwani

June 9, 2009

It was a sombre afternoon yesterday as we silently walked into the Motwani residence in Atherton. There were many faces, all silent, grieving the untimely death of Professor Rajeev Motwani.

Shailesh Mehta, Kanwal Rekhi, Vivek Ranadive, Ron Conway, Vish Misra, M.R.Rangaswami, Naren Gupta, Vinita Gupta, Manish Chandra, Vas Bhandarkar, Prashant Shah, Prabhakar Raghavan, and hundreds more kept coming by. All were shaken by the realization of how fleeting this life can be: here now, gone the next minute. Ron Conway whispered that he had just arranged a meeting between Rajeev and Marc Andreessen, as they had never met.

As we stared at his picture and listened to the chanting of the Taittiriya Upanishad and Shiva Manas Puja, the air was filled with the melancholy of a departed soul. A brilliant scientist, suddenly gone in the prime of his career. We silently hugged Asha (Rajeev’s wife), trying to fathom her grief.

As our scriptures say, each one’s life is pre-ordained for a set number of years. Each one of us has a deadline, highly unpredictable, but sure to come. So let us live this life with love, compassion, and care.

Categories: Uncategorized

Good Bye, dear Rajeev Motwani

June 6, 2009

I am just shocked to hear of the sad and untimely demise of Rajeev Motwani, well-known angel investor and professor of Computer Science at Stanford. He was an adviser to the Google founders from the start and invested in several start-ups. After graduating from IIT Kanpur in 1983, Rajeev joined EECS at UC Berkeley, where he earned his Ph.D. He then joined the faculty at Stanford, where he had been teaching and doing research for the last 20 years. His special interests were data mining, computational theory, and algorithms.

Back in 2003, at the suggestion of a few friends, Rajeev and I started a technology think-tank group to share new ideas. It did not last very long due to our hectic schedules. But at the first meeting, held in San Jose, Rajeev spoke of his work on “data streams” and Eric Brewer of Berkeley talked about his CAP theorem and BASE theory. We had several very smart folks who enjoyed listening to Rajeev’s passion for new technology. I was fortunate to have been invited to a few of Rajeev’s investment companies as an informal adviser. I always enjoyed talking to him and was impressed by his inquisitiveness; he asked many real-world questions about new technology.

He will be missed in Silicon Valley technology circles. I pray that his wife Asha finds the courage to sustain this terrible loss.

At this moment I feel extremely humbled by the ephemeral nature of life.

Rest in peace, my friend.

Categories: Uncategorized

Web 3.0

June 2, 2009

At the “D conference” (All Things Digital) hosted by The Wall Street Journal last week, the term Web 3.0 suddenly came into use. The hosts said, “So what’s the seminal development that’s ushering in the era of Web 3.0? It’s the real arrival, after years of false predictions, of the thin client, running clean, simple software, against cloud-based data and services”.

I like the tone of this description. It is the opposite of everything we have been doing for years in the computing business. It is not a fat client; not complex, bug-prone, attack-prone software like Windows (whose lineage traces back to QDOS, the “Quick and Dirty Operating System”); not power-hungry with poor battery life; not redundant data; not closed proprietary systems; and not labor-intensive maintenance. Just look at the Apple iPhone. It has the attributes of Web 3.0: an elegant touch-screen user interface and myriad applications developed by others, and it just feels simple and hassle-free.

I like this quote from today’s Wall Street Journal:

“…the complete integration of computing into every part of our lives in a way that is seamless, ubiquitous and ideally, dead simple. From using easy gestures to grab any piece of information from the Web to having powerful computers in the palm of your hand to being able to quickly dip into complex social networks to getting real-time information from across the globe as it happens, this is an era when computing could become as integrated and invisible as electricity and just as important.”

This reminds me of the late Mark Weiser (of Xerox PARC), who predicted back in 1991 (18 years ago!): “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.”

Some of us never thought we would see that prediction come true in our lifetimes. But we are there now.

A long march over 50 years: from “big computers”, to “small computers”, to “connected computers”, and finally to “invisible computers embedded in everyday objects”.


Categories: Conference · Web 3.0 · cloud computing

Enterprise Information Access

May 28, 2009

The industry has been using the term “Enterprise Search” to identify vendors offering that capability. But the term often does not do justice to the variety of functions implied. The UK research firm Ovum uses the phrase “Enterprise Information Access”, with four components:

• finding information that is known to be recorded
• searching for information that one hopes is (or sometimes isn’t) there
• discovery of information one didn’t know was there, but which is appropriate and useful
• retrieval – presenting that information in an accessible form, in a timely manner, to persons authorised to receive it.

Now, let us understand the nature of enterprise information. More than 80% of all data in an enterprise is unstructured information. This includes telephone conversations, voicemails, emails, Word documents, paper documents, images, web pages, video, and hundreds of other formats. Unfortunately, attempts to leverage this immense and strategic resource often fail because many businesses lack the requisite technology to understand and effectively use content that resides outside the scope of structured databases.

We have seen several innovative vendors addressing this issue over the last couple of years. The Norwegian company FAST was acquired by Microsoft last year for $1.2B. The UK company Autonomy has been doing very well, with impressive growth in revenue and customers even during these hard economic times: its market value is $4.5B, and its revenue reached $500M last year. It started 13 years ago, in 1996. Back in 2005, Autonomy bought its largest competitor, Verity, and earlier this year it acquired Interwoven.

I see a parallel with how relational databases grew out of a theoretical foundation in mathematics (set theory). Autonomy draws on the 18th-century mathematics of Thomas Bayes and the 20th-century information theory of Claude Shannon. Shannon’s information theory says that “information” can be treated as a quantifiable value in communication. Autonomy’s approach to concept modeling relies on Shannon’s idea that the less frequently a unit of communication occurs, the more information it conveys. Therefore, ideas that are rarer within the context of a communication tend to be more indicative of its meaning. It is this theory that enables Autonomy’s software to determine the most important (or informative) concepts within a document. Autonomy uses the phrase MBC, Meaning Based Computing, for this approach.
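To make the “rarer means more informative” idea concrete, here is a minimal Python sketch of Shannon’s self-information, I(w) = -log2 p(w), used to rank the words of a short document against a small background corpus. This is not Autonomy’s IDOL algorithm; the corpus, the add-one smoothing, and the function names are all illustrative assumptions.

```python
import math
from collections import Counter

def self_information(word: str, counts: Counter, total: int) -> float:
    """Shannon self-information in bits: I(w) = -log2 p(w).

    p(w) is estimated from background counts with add-one (Laplace)
    smoothing, so unseen words get a finite but high score.
    """
    vocab = len(counts) + 1  # one extra slot for unseen words
    p = (counts[word] + 1) / (total + vocab)
    return -math.log2(p)

def most_informative(document: str, background: list[str], k: int = 3) -> list[str]:
    """Return the k distinct words of `document` that are rarest in the background corpus."""
    counts = Counter(w for text in background for w in text.lower().split())
    total = sum(counts.values())
    words = set(document.lower().split())
    # Highest information first; ties broken alphabetically for a stable result.
    ranked = sorted(words, key=lambda w: (-self_information(w, counts, total), w))
    return ranked[:k]

background = [
    "the report is due on the first of the month",
    "the meeting is on the third floor",
    "please send the report to the team",
]
print(most_informative("the merger report is due", background, k=2))  # ['merger', 'due']
```

Common words like “the” score low, while “merger”, which never appears in the background text, carries the most information and surfaces first.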

Autonomy offers a layer called IDOL (Intelligent Data Operating Layer) that automates the management, processing, and delivery of structured and unstructured information from disparate internal and external sources. Autonomy says it integrates with all known legacy systems, eliminating the need for organizations to patch together multiple systems and support their distinct components.

I am glad to see this, as ad hoc approaches using connectors, or the old ESB slogans from the likes of Tibco, are grossly inadequate. Even keyword search, so popular on the Internet (thanks to Google), falls well short of enterprise needs. “Contextual search”, or drawing out the meaning of the information, becomes crucial.

Besides the big names (IBM, Microsoft, Google, Oracle), other vendors in this space are Vivisimo, Endeca, Sinequa, Exalead, and Brainware. But Autonomy appears to be taking a clear lead, with an impressive list of OEM partners and customers. Microsoft’s announcement of its new search engine, “Bing”, is something to watch. Google has not addressed this issue for the enterprise, even with its GSA (Google Search Appliance).

Some of the existing BI vendors, such as Information Builders and SAS, have added search capabilities. Search vendors like FAST and Autonomy have also added analytics to their solutions. This suggests that BI and search must come together.

MBC (Meaning Based Computing) is a new phrase, but it is relevant to the future.

Categories: BI · Collaboration · Database · SOA · cloud computing

Facebook valued at $10B?

May 27, 2009

Facebook recently got a $200M investment from a Russian Internet investor, Digital Sky Technologies. The deal values Facebook, which has yet to turn a profit, at $10B. Including this round, Facebook has raised more than $600M. The previous round came from Microsoft in 2007: $240M for a 1.6% stake, implying a $15B valuation at the time. But Facebook’s own appraisal gave it a market value of $3.7B.
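The implied-valuation arithmetic here is simply investment divided by stake. A quick Python check using only the figures quoted above; the roughly 2% stake for the Digital Sky round is derived from those figures, not a reported number, and both functions are illustrative helpers.

```python
def implied_valuation(investment: float, stake: float) -> float:
    """Post-money valuation implied by buying `stake` (a fraction) for `investment`."""
    return investment / stake

def implied_stake(investment: float, valuation: float) -> float:
    """Fraction of the company that `investment` buys at a given valuation."""
    return investment / valuation

microsoft_2007 = implied_valuation(240e6, 0.016)  # $240M for 1.6%
dst_2009 = implied_stake(200e6, 10e9)             # $200M at a $10B valuation
print(f"Microsoft 2007 implied valuation: ${microsoft_2007 / 1e9:.1f}B")  # $15.0B
print(f"Digital Sky 2009 implied stake: {dst_2009:.1%}")                  # 2.0%
```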

Such astronomical valuations remind us of the dot-com boom days. Jeff Bezos of Amazon used to say, “I spell profit as prophet”. Young start-ups were valued higher than household names such as Ford, GM, and Coca-Cola. At its peak, Sun was valued at $200B, when it claimed to be the “dot” in the dot-com movement. We know from history that such boasts are transient. The same Sun is now valued at less than $7B.

Facebook is a private company, so it is hard to know its revenue numbers. It expects to become cash-flow positive in 2010. The youthful CEO, Mark Zuckerberg, claims 70% revenue growth this year. The company claims 200 million users, 70% of them outside the US. Such numbers are very impressive. They make Facebook very attractive to marketers, and one would expect healthy ad dollars to follow. But I read somewhere that online ad dollars are declining in the current economy.

Mark Z., the CEO, says that an IPO is a long way off. Healthy revenue plus profitability would make Facebook an attractive acquisition target. Still, it somehow feels as if Facebook is losing some steam, given the rise of Twitter with its real-time search and pub/sub model of social interaction. Let’s wait and see how well Facebook monetizes its success with eyeballs.

Categories: Collaboration · cloud computing

Plenty of research on database

May 13, 2009

It’s great to see lots of research going on in the database software space. The new era of huge volumes of structured and unstructured data flying through the web brings new problems of scalability, performance, and security, besides search and query capabilities.

Google engineers wrote a paper a few years back on MapReduce, a programming model for efficiently processing large data sets on large clusters of machines. Java frameworks for data-intensive distributed applications, like Hadoop, have implemented MapReduce as another programmatic way to deal with large clusters. Greenplum, a Valley start-up, has blended SQL (from its PostgreSQL base) with MapReduce in its shared-nothing, massively parallel architecture for terabyte-scale databases. Netezza, an East Coast vendor in the warehouse appliance space, also uses an MPP architecture as an alternative to the expensive Teradata solution.
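To show what the MapReduce programming model looks like, here is the classic word-count example as a single-process Python sketch: a map phase emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce phase sums each group. It only mimics the model on one machine; real frameworks such as Hadoop distribute these phases across a cluster and handle failures, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key; here, sum the counts."""
    return key, sum(values)

def word_count(splits):
    pairs = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

splits = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(splits))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can run in parallel on different machines; that independence is what lets the model scale out.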

Here is an interesting website called The Database Column, hosted by Vertica Systems. Seven well-known experts write blogs there, including Jerry Held (a former colleague at Oracle) and Don Haderle (a former colleague at IBM). There is Michael Stonebraker, the well-known researcher and professor (founder of Ingres, and now Vertica), and also David DeWitt, another well-known researcher, from the University of Wisconsin. There is plenty of discussion on column-oriented databases, cloud computing, and BI.

There is a movement towards “more focused” solutions for database handling, compared to the 25-year-old solution of the relational DBMS. Google is pushing Bigtable and in-memory databases to minimize latency and improve scale. Cloud computing is all about handling large volumes of data. It’s back to the old days of centralized computing, but at a much higher scale than before. We do see plenty of academic research happening, and that is a healthy sign. There are plenty of research opportunities in the challenging world of massive scale, with multi-core processors, varieties of data types, and the need for extreme reliability and fault tolerance.
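Google’s Bigtable paper describes its data model as a sparse, distributed, persistent, multidimensional sorted map indexed by row key, column key, and timestamp. The toy in-memory Python sketch below shows only that indexing scheme; the class and method names are hypothetical (this is not Google’s API), and distribution, persistence, and tablets are left out. The row and column names echo the web-page example used in the paper.

```python
import bisect
from typing import Optional

class ToyBigtable:
    """Hypothetical sketch of a Bigtable-style data model:
    (row key, column key, timestamp) -> value, with row keys kept sorted."""

    def __init__(self) -> None:
        self._rows: dict = {}      # row -> {column -> [(timestamp, value), ...]}
        self._row_keys: list = []  # sorted row keys, enabling range scans

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        if row not in self._rows:
            self._rows[row] = {}
            bisect.insort(self._row_keys, row)
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # keep the newest version first

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[str]:
        """Newest value at or before `as_of` (or the latest value if as_of is None)."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if as_of is None or ts <= as_of:
                return value
        return None

    def scan(self, start: str, end: str) -> list:
        """Row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, end)
        return self._row_keys[lo:hi]

t = ToyBigtable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("com.bbc.www", "contents:", 4, "<html>bbc")
print(t.get("com.cnn.www", "contents:"))           # <html>v5
print(t.get("com.cnn.www", "contents:", as_of=4))  # <html>v3
print(t.scan("com.a", "com.c"))                    # ['com.bbc.www']
```

Keeping rows sorted by key is what makes range scans cheap, and storing reversed domain names as row keys keeps pages from the same site next to each other.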

The new world of Data as a Service (DaaS) is coming.

Categories: BI · Database · cloud computing

Enterprise Software Infrastructure

May 6, 2009

Many friends and colleagues ask: what will happen to companies such as Tibco, Informatica, etc., that have been in the middleware space for many years? The biggest middleware player, BEA, is gone. In the old days, we used to joke that after death you go to CA (Computer Associates), because CA acquired so many companies. Now we see such acquisitions happening all around, especially by Oracle. Once the Sun acquisition is completed, Oracle will own the Java software platform and the MySQL open source database.

Look at the BI players. Business Objects is now part of SAP. Cognos, the Canadian BI company, is part of IBM. Hyperion Solutions (formerly Arbor Software) was acquired by Oracle. In that space, we see the good old SAS standing alone as a private company for over three decades. Informatica, which started as an ETL (Extraction, Transformation, and Loading) company, has expanded into other areas such as data integration, data migration, and data quality management. The space looks sparse for smaller players. There are new companies in the data warehouse appliance space, such as Netezza (an MPP Data Warehouse appliance) and Greenplum (based on PostgreSQL and also an MPP solution). All the big players have data warehousing and business analytics as core competencies. That includes HP, with its new offering Neoview (based on Tandem’s NonStop SQL technology) and its Data Warehouse appliance with Oracle. IBM has made several acquisitions over the last five years to strengthen its Information Analytics business.

In the middleware space, the landscape looks even sparser. IBM claims to be the provider of enterprise infrastructure for middleware (DB2, WebSphere, Tivoli, Rational, ...). Its software business contributes over 50% of profit, although it is only 20% of total revenue. HP wants to get into that space after acquiring Mercury Interactive and Opsware; it is surrounding its flagship product OpenView with other software tools. BEA was the big player thanks to the success of WebLogic, but now it is all part of the all-pervasive Fusion world at Oracle.

So what do start-up companies in the infrastructure software business do? It will be hard to succeed independently unless the value is extremely critical and obvious. At the same time, a start-up must co-exist within the customer’s broader ecosystem. Microsoft delivers the whole stack, and so do IBM and Oracle. It’s back to the good old days when each vendor supplied the entire stack: from chips to hardware to operating system to subsystems to applications. Starting in the 1980s, we saw that structure shift to a more horizontal play, where each layer had many players. But customers became the default “systems integrators”, and they did not like it. Of course, the systems integration business flourished. Now customers prefer fewer vendors for ease of management and service.

Where we see a lot of activity is data center management software (call it cloud if you want), be it security, manageability, reliability, backup and recovery, data archiving, virtualization at various levels, or governance. Any innovation in these areas will have a good future, but focus and specificity are the watchwords.

Given the dismal IPO market, every start-up in the enterprise software infrastructure space should be pursuing a strong partnership strategy for market success.

Categories: BI · Database · IBM · SOA · SaaS · cloud computing

Back to the Future

April 27, 2009

In our technology business (software), we keep seeing things from the past come back with new labels as the latest trend. It’s like my graduate school roommate (a Canadian), who refused to buy a new tie in line with the style of the day. When the “thin tie” was in, he wore a “wide tie”. When I asked, “How come you are wearing something out of date?”, he would reply, “I don’t change. The styles will eventually come back to me”.

Similarly in our business. When we talk of cloud computing or SaaS, someone quickly recalls the “time-sharing” days of the likes of ADP. When we rave about VMware and virtualization, I recall IBM’s VM operating system of the 1970s. Whenever we brag about caching for better performance, I recall the “prefetching” we did in DB2 during the 1980s. When the “internet kids” thought “statefulness” was a cool idea (in an otherwise stateless web), we remembered the days of transaction processing, when databases guaranteed the ACID properties and techniques like two-phase commit were invented to ensure transactional integrity. When we tout SOA and web services as the key to reuse, we remember the “subroutines” of the past. The concepts are the same; the execution might be a little different.
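Two-phase commit is the classic technique for transactional integrity across several resources. Below is a minimal single-process Python sketch of the protocol: the coordinator asks every participant to prepare (phase 1) and commits only if all of them vote yes; otherwise it tells everyone to abort (phase 2). The `Participant` class is hypothetical, and real implementations also need durable logs and timeout and recovery handling, which are omitted here.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """Hypothetical resource manager that votes on, then applies, a transaction."""
    name: str
    can_commit: bool = True
    state: str = "initial"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"

def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: prepare and collect votes
    if all(votes):
        for p in participants:                   # phase 2: global commit
            p.commit()
        return True
    for p in participants:                       # phase 2: global abort
        p.abort()
    return False

ok_group = [Participant("checking"), Participant("savings")]
committed = two_phase_commit(ok_group)
print(committed, [p.state for p in ok_group])    # True ['committed', 'committed']

bad_group = [Participant("checking"), Participant("savings", can_commit=False)]
aborted = two_phase_commit(bad_group)
print(aborted, [p.state for p in bad_group])     # False ['aborted', 'aborted']
```

The all-or-nothing outcome is the point: a single “no” vote rolls back every participant. It is also why the protocol blocks when a participant or the coordinator is unreachable.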

This is not to undermine the advances of technology, such as the Internet and the World Wide Web. Two things have clearly pushed the envelope: processing speed and bandwidth. We have been debating SMP (symmetric multiprocessing) vs. MPP (massively parallel processing) for years. We have also debated the “scale-up” vs. the “scale-out” model, the latter being used by newer sites such as Google. We still debate the merits of “shared nothing” vs. “shared disk”.
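In a shared-nothing, scale-out design, each node owns a disjoint slice of the data and no disk is shared. A common way to assign rows to nodes is hash partitioning. The minimal Python sketch below uses hypothetical node names and CRC32 as the hash, illustrative choices rather than any specific product’s scheme. Each key routes to exactly one node, so a point lookup touches one node while a full aggregate fans out to all of them.

```python
import zlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical cluster members

def owner(key: str, nodes=NODES) -> str:
    """Pick the single node that stores `key`.

    Python's built-in hash() for str is randomized per process, so CRC32
    is used to get a placement that is stable across runs.
    """
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]

def partition(rows: dict[str, int], nodes=NODES) -> dict[str, dict[str, int]]:
    """Split rows into disjoint per-node shards; nothing is shared between nodes."""
    shards = {n: {} for n in nodes}
    for key, value in rows.items():
        shards[owner(key, nodes)][key] = value
    return shards

def total(shards: dict[str, dict[str, int]]) -> int:
    """A full aggregate scatters to every shard and gathers the partial sums."""
    return sum(sum(shard.values()) for shard in shards.values())

rows = {f"customer-{i}": i for i in range(10)}
shards = partition(rows)
print({n: len(s) for n, s in shards.items()})  # rows per node
print(total(shards))                           # 45
```

Simple modulo placement moves most keys whenever a node is added, which is one reason large systems often prefer schemes such as consistent hashing.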

A few years ago, Professor Eric Brewer of UC Berkeley put forward the idea that the old-world rigid model of two-phase commit etc. may not be the best choice for the internet era. He proposed a new acronym, BASE (Basically Available, Soft state, Eventually consistent). This is based on his CAP theorem, which says that of Consistency, Availability, and tolerance to network Partitions (distributed networks), one can achieve at most 2 of the 3. If you want consistency and availability (like a financial institution), then give up distributability and keep everything centralized. If you want distributed data and availability (like Amazon’s book business), then give up consistency. Finally, if you want consistency and distributed data (like a nation-wide bank), then be ready to pay in availability: leave some hours at night to run those utilities for synchronizing databases.
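To illustrate the “eventually consistent” end of Brewer’s trade-off, here is a minimal Python sketch of two replicas that keep accepting writes independently (staying available during a partition) and later reconcile using a last-writer-wins rule on timestamps. Last-writer-wins is just one simple reconciliation policy, chosen for illustration. It silently discards concurrent updates, which is why systems such as Amazon’s Dynamo use richer techniques like vector clocks.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One copy of a key-value store that keeps accepting writes during a partition."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key: str, value, timestamp: int) -> None:
        self.data[key] = (timestamp, value)

    def read(self, key: str):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge(self, other: "Replica") -> None:
        """Anti-entropy: adopt the other replica's version of a key if it is newer.
        Assumes unique timestamps; real systems need a tie-breaker (e.g. replica id)."""
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)

us, eu = Replica("us"), Replica("eu")
us.write("cart:42", ["book"], timestamp=1)
eu.write("cart:42", ["book", "lamp"], timestamp=2)  # accepted on the far side of a partition
print(us.read("cart:42"), eu.read("cart:42"))       # ['book'] ['book', 'lamp']  (inconsistent)
us.merge(eu)
eu.merge(us)                                        # partition heals; replicas exchange state
print(us.read("cart:42"), eu.read("cart:42"))       # ['book', 'lamp'] ['book', 'lamp']
```

Both replicas stayed available the whole time; the price was a window during which readers could see different answers.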

As we march into the future, Eric’s wisdom from several years ago will be a great guideline, even for database purists with an “all or nothing” philosophy.

Categories: Database · IBM · SOA · SaaS · Web Services · cloud computing

The Journey of Business Intelligence (BI)

April 24, 2009

Back in the late 1980s, two IBMers wrote a seminal paper in the IBM Systems Journal, using the phrase “Data Warehouse” for the first time. I remember reading it; the authors were Devlin and Murphy of IBM Ireland. They described how they had isolated production data (internal IBM Europe data) into a separate Data Warehouse for retrospective analysis. This way, the production systems were not disrupted by users doing trend analysis. This “non-interference” factor was very important from a performance point of view. The production system used the IMS database, and the extracted data was in relational form for easy querying.

Soon after that publication, many IBM customers called to say that they had the same issue: how to give a new set of users, wanting to analyze trends and usage, access to production data without affecting performance. For example, someone in the retail industry wants to find out how many red sweaters were sold at a specific store during the month of December. Answers to such questions can help inventory management and marketing promotions. The two industries that jumped into such analysis were retail and telco (call center analysis). I remember going on IBM customer roadshows explaining Data Warehousing back in 1990.
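The red-sweater question is a typical warehouse query. A minimal Python sketch using the standard-library `sqlite3` module and an in-memory table; the `sales` table layout, store names, and sample rows are hypothetical, chosen only to show the shape of such an aggregate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, item TEXT, color TEXT, sale_date TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Store 12", "sweater", "red",  "2008-12-03", 2),
        ("Store 12", "sweater", "red",  "2008-12-21", 5),
        ("Store 12", "sweater", "blue", "2008-12-21", 4),  # wrong color: excluded
        ("Store 12", "sweater", "red",  "2008-11-30", 7),  # November: excluded
        ("Store 7",  "sweater", "red",  "2008-12-10", 3),  # other store: excluded
    ],
)

# How many red sweaters did Store 12 sell in December 2008?
# ISO-format date strings compare correctly as text, so BETWEEN works here.
(red_sweaters,) = conn.execute(
    """
    SELECT COALESCE(SUM(quantity), 0)
    FROM sales
    WHERE store = ? AND item = 'sweater' AND color = 'red'
      AND sale_date BETWEEN '2008-12-01' AND '2008-12-31'
    """,
    ("Store 12",),
).fetchone()
print(red_sweaters)  # 7
```

Running this kind of scan-and-aggregate against a separate copy of the data, rather than the production system, is exactly the “non-interference” the warehouse idea was about.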

The database vendors assumed that they were the default leaders in this game, as they “owned” the customers’ data. But what they did not have were front-end analytic tools. So two new industries came up during the 1980s and 1990s: one for the boring task of data extraction, transformation, and loading (ETL) from multiple legacy sources, and one for front-end tools providing a variety of analytics for the knowledge worker. The ETL market gave rise to companies such as Prism and Informatica. The analytics industry saw new players like Business Objects (now part of SAP), Cognos (part of IBM), and Hyperion (part of Oracle).
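The ETL step can be sketched in a few lines. The Python below is a hypothetical example, not any vendor’s tool: it extracts records from two differently shaped “legacy” feeds, transforms them into one common schema (typed numbers, normalized item names, real dates), and loads them into an in-memory warehouse list standing in for a warehouse table.

```python
from datetime import datetime

# Extract: two hypothetical legacy feeds with different record layouts.
mainframe_feed = ["0012|RED SWEATER |20081203|0002"]  # pipe-delimited, zero-padded
branch_feed = [{"store": "7", "item": "Red Sweater", "date": "12/10/2008", "qty": "3"}]

def extract_mainframe(line: str) -> dict:
    store, item, day, qty = line.split("|")
    return {"store": store, "item": item,
            "date": datetime.strptime(day, "%Y%m%d").date(), "qty": qty}

def extract_branch(rec: dict) -> dict:
    return {**rec, "date": datetime.strptime(rec["date"], "%m/%d/%Y").date()}

def transform(row: dict) -> dict:
    """Conform every source to one warehouse schema."""
    return {
        "store": int(row["store"]),          # "0012" -> 12
        "item": row["item"].strip().lower(),  # "RED SWEATER " -> "red sweater"
        "date": row["date"],
        "qty": int(row["qty"]),
    }

def load(rows, warehouse: list) -> None:
    """Load: append conformed rows to the (here, in-memory) warehouse table."""
    warehouse.extend(rows)

warehouse: list[dict] = []
load((transform(extract_mainframe(r)) for r in mainframe_feed), warehouse)
load((transform(extract_branch(r)) for r in branch_feed), warehouse)
for row in warehouse:
    print(row)
```

Most of the real-world effort hides in `transform`: every new source brings its own codes, formats, and quirks that have to be mapped onto the shared schema.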

Another trend started soon after: predicting the future based on past trends. This was called “Data Mining”. All BI vendors claimed that they could do not only retrospective analysis but also predictive analysis. A popular saying at the time was that you don’t drive your car by looking in the rear-view mirror.
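The simplest form of “predicting the future from past trends” is fitting a straight line to historical figures and extrapolating it. The sketch below uses ordinary least squares on made-up monthly sales numbers. Real data-mining tools use far richer models, so treat this only as an illustration of the idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: average change per month
    return mean_y - b * mean_x, b

months = [1, 2, 3, 4, 5, 6]
units = [100, 110, 125, 130, 145, 150]  # hypothetical monthly sales
a, b = fit_line(months, units)
forecast = a + b * 7                    # extrapolate one month ahead
print(f"trend: {b:.1f} units/month, month 7 forecast: {forecast:.0f}")
# trend: 10.3 units/month, month 7 forecast: 163
```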

Two recent trends are visible in this journey of the BI players. The first is more verticalization using an “all-in-one” solution like an appliance. Netezza led the way here, followed by newer players such as Greenplum. These products target a specific industry sector and provide better price-performance than the generalized solutions of the past. Even HP has joined the race with its new offering, Neoview (a Teradata fighter). Oracle and HP joined forces last year to come up with a Data Warehouse appliance. A new company called Vertica (started by Michael Stonebraker) offers another way to do “search by column”, offering much faster performance than traditional relational databases.
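Much of the speed-up claimed for column stores comes from data layout. An analytic query that needs one or two columns reads only those columns, instead of every full row. A minimal Python sketch contrasting the two layouts on hypothetical data; it illustrates the access pattern only, not Vertica’s storage engine, compression, or query execution.

```python
rows = [  # row store: each record kept together
    {"store": 12, "item": "sweater", "color": "red", "qty": 2},
    {"store": 12, "item": "sweater", "color": "blue", "qty": 4},
    {"store": 7, "item": "scarf", "color": "red", "qty": 3},
]

# Column store: one array per column, aligned by position.
columns = {name: [r[name] for r in rows] for name in rows[0]}

def red_qty_row_store(rows):
    # On disk, a row store would read whole records even though only two fields are needed.
    return sum(r["qty"] for r in rows if r["color"] == "red")

def red_qty_column_store(columns):
    # Only the 'color' and 'qty' columns are touched; 'store' and 'item' are never read.
    return sum(q for c, q in zip(columns["color"], columns["qty"]) if c == "red")

print(red_qty_row_store(rows), red_qty_column_store(columns))  # 5 5
```

Both layouts give the same answer; the difference is how much data has to be read to get it. Values of a single column are also all of one type, which tends to compress well.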

The second trend is bringing real-time search into BI. Some people call it ESO (Enterprise Search Option). Real-time search of events can send alerts and trigger corrective actions in critical business areas. The other aspect of ESO is to search the cloud (external sources) and blend those results with internal search to get relevant results. This need has gained prominence with the rise of Google and the other search tools so common these days.

Gone are the days of extra-expensive solutions like Teradata and even Oracle or IBM. Open source solutions like Jasper and Pentaho are offering cheaper alternatives to certain sectors of the market. The post-SaaS trend of Cloud Computing also brings new opportunities to provide BI as a service, but the challenges of data integration and security must be addressed.

The BI journey continues with more vigor and innovative approaches.

Categories: BI · Database · SaaS · cloud computing