In 2010 Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. Yes, that is 20 petabytes, or 20,000 terabytes, or 20 × 10^15 bytes! By March 2011 the cluster had grown to 30 PB – roughly 3,000 times the size of the Library of Congress. As Facebook ran out of power and space to add more nodes, a migration to a larger data center became necessary. In a Facebook post, Paul Yang describes how the team accomplished this monumental task.
The Facebook infrastructure team chose a replication approach for this migration, carried out in two steps. First, a bulk copy transferred most of the data from the source cluster to the destination. Second, all changes made after the start of the bulk copy were captured by a Hive plug-in running on the cluster, which recorded them in an audit log; the replication system continuously polled the audit log and copied modified files to the destination. The plug-in also recorded Hive metadata changes. The Facebook Hive team developed both the plug-in and the replication system.
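The two-phase pattern described above – bulk copy first, then replay of changes from an audit log – can be sketched roughly as follows. This is a minimal illustration using local files in place of HDFS; the function names and the audit-log shape are hypothetical, not Facebook's actual implementation:

```python
import shutil
from pathlib import Path

def bulk_copy(source: Path, dest: Path) -> None:
    """Phase 1: copy everything present when the migration starts."""
    for src_file in source.rglob("*"):
        if src_file.is_file():
            target = dest / src_file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)

def replicate_changes(audit_log, source: Path, dest: Path) -> None:
    """Phase 2: apply files recorded as changed since the bulk copy began.

    `audit_log` stands in for the change records the plug-in would append;
    here it is just an iterable of relative paths.
    """
    for rel_path in audit_log:
        src_file = source / rel_path
        target = dest / rel_path
        if src_file.exists():
            # File was created or modified at the source: re-copy it.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)
        elif target.exists():
            # File was deleted at the source: mirror the deletion.
            target.unlink()
```

In the real system, phase 2 ran continuously (polling the log) so the destination stayed nearly current, shrinking the final cut-over window to the small set of changes still in flight.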
Remember, the challenge was to avoid interrupting 750 million Facebook users' round-the-clock access to the data while the migration was happening. It is like changing the engine of an airplane in mid-flight at 600 miles an hour, 36,000 feet above the ground. Quite a remarkable effort.
The big lesson learned was the value of building a fast replication system, which proved invaluable in addressing issues as they arose. For example, corrupt files could easily be remedied with an additional copy without affecting the schedule. The replication system also became a potential disaster-recovery solution for warehouses using Hive, since Hadoop HDFS-based warehouses lack the built-in data-recovery functionality usually found in traditional RDBMS systems.
Facebook’s next challenge will be to support a data warehouse distributed across multiple data centers.