Large Scale Hadoop Data Migration at Facebook

In 2010 Facebook had the largest Hadoop cluster in the world, with over 20PB of storage. Yes, that is 20 Petabytes or 20,000 Terabytes or  20 to the power of 15 bytes!  By March 2011, the cluster had grown to 30 PB – that’s 3000 times the Library of Congress. As they ran out of power and space to add more nodes, a migration to a larger data center was needed. In a Facebook post Paul Yang describes how they accomplished this monumental task.

The Facebook infrastructure team chose a replication approach for this migration. Two steps were followed. First, a bulk copy transferred most of the data from the source cluster to the destination. Second all changes since the start of the bulk copy were captured via a cluster Hive plug-in that recorded the changes in an audit log. The replication system continuously polled the audit log and applied modified files to the destination. The plug-in also recorded Hive metadata changes. The Facebook Hive team developed both the plug-in and the replication system.

Remember, the challenge here is not to interrupt the 750 million  Facebook users access to the data 24/7, while the migration was happening. It is like changing the engine of an airplane while flying 600 miles an hour at 36000 feet above ground.  Quite a remarkable effort.

The big lesson learned was to develop a fast replication system that proved invaluable in addressing the issues. For example, corrupt files could easily be remedied with additional copy without affecting the schedule. The replication system became a potential disaster-recovery solution for warehouses using Hive. The Hadoop HDFS-based warehouses lack built-in data-recovery functionality usually found in traditional RDBMS systems.

Facebook’s next challenge will be to support a data warehouse distributed across multiple data centers.

Advertisements

One response to “Large Scale Hadoop Data Migration at Facebook

  1. Pingback: Windows Azure and Cloud Computing Posts for 8/4/2011+ - Windows Azure Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s