I attended the NoSQLNow conference this week at the San Jose Convention Center. The organizers claimed there were 800 attendees, clearly many more than in the last couple of years. Judging by the number of sessions, exhibits, speakers and attendees, interest in newer data management products and solutions (aka Big Data) has been growing fast.
I spoke at a session titled “Are NoSQL databases ready for the enterprise? Examples of MongoDB deployment”, which was well attended. I also participated in a panel on “enterprise adoption of cloud”. My co-panelists were from Oracle and NeoDB. The conference opening session was given by one of the co-hosts, Dan McCreary, who spoke about the state of NoSQL. He mentioned that a total of $2.4B has been invested in NoSQL DB companies over the last couple of years: MongoDB ($231M), CouchBase ($116M), Aerospike ($22M), Basho ($32.5M), Datastax ($83.7M), Clustrix ($59.3M), FoundationDB ($22.3M), etc. Even a big player like Intel has invested in Cloudera.
Here are some new trends in the NoSQL world:
- Hadoop is starting to move from batch to real time and streaming
- Real time systems are adding Hadoop integration points
- Storm (Twitter) and Spark are addressing data streaming
- Spark/Scala is popular on multiple systems
- MongoDB is the big leader in NoSQL operational systems based on document data model, followed by Datastax and CouchBase
The market pressures, according to Dan, point to:
- Big Data & Predictive analytics
- Internet of Things (time series data and log files)
- Security for highly regulated areas like finance/banking, healthcare, and the government
- Streaming data
- Keeping operational costs low (bye-bye to license fees)
- High availability (moving away from master-slave to clusters of peer-to-peer networks)
Among other trends, old-school MapReduce programming is being overtaken by Spark. JSON data formats are gaining popularity for agile development, but there is no standard JSON query language. XQuery 3.1, on the other hand, supports both XML and JSON. There is a new emphasis on agile transformation, since data storage is no longer the issue; the question is how non-programmers can transform data into various useful formats. The acronym ETL will be replaced by ETTTTTTT… (extract, store in a data lake, and transform in many ways).
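To make the "extract, store in a data lake, and transform in many ways" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the event records, the in-memory list standing in for a data lake, and the function names are all hypothetical and were not shown at the conference.

```python
import json
from collections import defaultdict

# Hypothetical raw events, as they might arrive from a source system.
RAW_EVENTS = [
    '{"user": "ann", "action": "view", "ms": 120}',
    '{"user": "bob", "action": "buy", "ms": 340}',
    '{"user": "ann", "action": "buy", "ms": 200}',
]

def extract():
    """Extract: hand over the raw records untouched."""
    yield from RAW_EVENTS

def load(records):
    """Store: keep the raw records as-is (an in-memory list stands in for the lake)."""
    return list(records)

def actions_per_user(lake):
    """Transform #1: count events per user."""
    counts = defaultdict(int)
    for line in lake:
        counts[json.loads(line)["user"]] += 1
    return dict(counts)

def total_ms_by_action(lake):
    """Transform #2: sum the 'ms' field per action, from the same raw data."""
    totals = defaultdict(int)
    for line in lake:
        event = json.loads(line)
        totals[event["action"]] += event["ms"]
    return dict(totals)

lake = load(extract())
per_user = actions_per_user(lake)
ms_by_action = total_ms_by_action(lake)
```

The point of the sketch is that the raw data is stored once and never reshaped up front; each new question gets its own transform over the same lake.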
Other keynotes included Oracle’s head of database development, Andy Mendelsohn, who showed Oracle’s 3 areas under “big data” – Oracle DBMS & Exadata, Oracle Hadoop, and Oracle NoSQL (formerly BerkeleyDB), all with one interface called Oracle Big Data SQL. SQL seems to be making a comeback as an interface to several products, such as Cloudera Impala.
Amazon presented DynamoDB, built for the cloud with fast and predictable performance. They claim seamless scalability and easy administration. Amazon’s motto has always been, “build services, not software”. Amazon.com uses DynamoDB to minimize opex.
I presented many examples of enterprises deploying MongoDB to build “systems of engagement” on top of “systems of record” (a concept Geoff Moore of Crossing the Chasm fame has been talking about lately). MongoDB deployment at enterprises has great momentum because of agile development (flexible data model and high coding velocity), fast scalability and high availability using shards and replicas, and the open source culture.
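As a rough illustration of what a "flexible data model" means in practice, the sketch below uses a plain Python list of dictionaries as a stand-in for a document collection. It deliberately does not use the MongoDB driver; the `customers` list, the `find` helper and the field names are hypothetical.

```python
# In-memory stand-in for a document collection (not the MongoDB driver API).
customers = []

# An early version of the application stores only basic fields.
customers.append({"_id": 1, "name": "Ann", "email": "ann@example.com"})

# A later release adds a nested field; no schema migration is needed,
# and older documents simply lack the new field.
customers.append(
    {"_id": 2, "name": "Bob", "loyalty": {"tier": "gold", "points": 1200}}
)

def find(collection, predicate):
    """Return every document for which the predicate is true."""
    return [doc for doc in collection if predicate(doc)]

# Queries must tolerate documents that do not have the newer field.
gold = find(customers, lambda d: d.get("loyalty", {}).get("tier") == "gold")
```

Documents of different shapes live side by side, which is what lets a team add fields release by release without a coordinated schema change.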