Today's websites like Facebook, Google, and Twitter are seeing unprecedented growth in the number of concurrent active users and in the volume of data, in the form of photos and videos. This level of demand for scalability defies our collective experience from the past.
For example, Facebook had 150 million active users in January 2009. By September of the same year, it had reached 300 million, and it is currently over 400 million. Back in 2006, the company had a single data center, in the Bay Area. In 2008, it added a second in Virginia, and soon it will add a third in Oregon. Users download 3 billion photos a month, part of 200 terabytes of live data, and over 60,000 servers support this rapidly growing community. All of this was described by Facebook engineer Tom Cook at Velocity 2010 last week, in a talk titled “A day in the life of Facebook Operation”.
Often, these operational aspects are invisible to the user community; the assumption is that it all somehow works, until some breakdown or failure makes headlines. In reality, it takes a huge amount of innovation and discipline to run these operations. Unglamorous work like configuration management, version control, early optimization, failure management, instrumentation, and automated tooling requires tremendous focus. Google spends a great deal of money and talent to keep its operation efficient, and so does Facebook. At the same conference, my friend James Hamilton (we were at IBM years back) gave an interesting talk on “Datacenter Infrastructure Innovation”. James, after working at Microsoft for a number of years, is now a VP and distinguished engineer at Amazon. He identifies the top cost components of data center operation and shows where innovation can yield significant savings.
As more and more cloud service providers face these challenges, they would do well to study how these pioneers at Facebook, Amazon, and Google are charting new courses toward extreme scalability.