Digging NoSQL

I started hearing about Cassandra recently. Cassandra is an open source distributed database management system designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure. The short list of adopters is impressive: Facebook, Digg, Twitter, Rackspace, Reddit, and others.

The Digg staff in particular have been blogging about their progress with this new database technology and their reasons for adopting it. Basically, the penalties of using MySQL had become burdensome given the large amount of data Digg handles and the difficulty of scaling MySQL to meet demand. As they put it:

"Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead."

In September 2009, Digg evaluated Cassandra, and the evaluation was very successful. On the heels of that success, they are replacing most of their infrastructure components, moving away from LAMP and toward NoSQL. Soon, Digg will unveil its overhaul of the site, presumably running on the new platform they've built.