Using memcachedb and memcached to make things scale

I don't remember exactly how I found memcachedb, however, it is one of those projects that somebody else beat me to the punch in writing. I mean, it was going to happen, as the need was there. Steve Chu, the author, did a great job of melding two open source projects, BerkeleyDB, and memcached, to produce something really very powerful

Now, memcached has become almost completely ubiquitous in scaling web apps. Memcached is essentially a network enabled non-persistent data store. It is generally used as a write-back data cache, meaning that you look in the faster cache, if nothing is there, you look in the slower place, then write the value back to the faster cache. Some industrious people have used it for session storage, and I'm sure a few other clever uses.

One of my favorite parts of memcached is how dead simple it is. The protocol is very easy to read, making debugging issues and writing new clients very easy. It uses the "least recently used" algorithm to move things out of the cache when it starts to fill up, so its extremely easy to understand how the whole thing works.

The cleverest part of using memcached has nothing to do with the service itself, but the API. The smart guys who developed it figured out that they could hash the key, and pick the same server for reads/writes every time as long as the number of servers doesn't change. This allows it to scale out to a ridiculous size and retains its simplicity and performance

Two problems arise when a site uses any caching, be it memcached or aggressive HTTP headers.

First, the site starts to rely on caching too heavily for performance. As an example, I had a situtation where the entire corpus of settings for each client site (hundreds of clients, hundreds of settings) was kept in memcached as one massive 200kB+ serialized PHP object. Every page view that needed to access any settings would grab this object at the beginning of the code, and use the object throughout.

This worked really great in some instances, as most of the biggest pages needed to access 30 - 50 settings each time. However, the trouble would come when there was a page that would get a high degree of concurrency, such as an iframe that gets displayed on every page of a major website, or on a page that gets slashdotted. It would be blazing fast, generating almost no load at all for a while, but whenever a setting would be changed (the settings application would clear the cache of settings for whichever client was edited), or the cache object would expire, the database would spike out of control.

The reason was this object took about 1-3 seconds to fetch from the database. Well with 1000 requests per second, thats 3000 requests that get a negative hit on the cache, and so, ask the database for the information. The solution was to cache each setting individually, and use a random skew on the expire time. This prevented the storm of requests whenever there was an expire, and it allowed items looked up in rapid succession to not expire all at once.

This brings us to the second problem with caching, and specifically memcached. The cache is sometimes mistaken for a data store. In the above example, by clearing out entries from memcached, the caching was essentially neutered. Any time during the day somebody might come along and blow out the cache. Thats fine with MySQL's query cache, for instance, because that just makes queries come back faster. The connection is already made, one of the most painful parts has already happened. With memcached however, the cache can scale to many thousands of connections very cheaply, whereas doing this with most databases is expensive, if not impossible.

So to combat this, what is really needed is a persistent place to keep your data up to date when it is needed in an extremely high reads to write ratio. Thats where memcachedb is so attractive. Instead of keeping everything in RAM, memcachedb stores anything you put into it in a berkeleydb database. To boot, it can replicate this data to another machine, adding to its reliability and availability. This means that writes will be slower, and it won't scale out nearly as cheaply, but thats ok for situations like this.

With memcachedb, we can change the setting management program to save the data into the database and memcachedb, confident in the fact that it will be there later. Then we don't have write-back caching code in our application, we just remove the part that connects to the database for that data at all.

This has a huge benefit beyond just performance. With this scheme, we can write simple applications that won't rely on the read/write database server ever being up. It also means that we don't have to have a giant database server, or a huge replication fanout to get this data available in realtime.

There is of course the danger that memcachedb gets out of sync with the main db. Thats why in addition to writing to memcachedb whenever you write to the database server, you can also run a refresh script periodically that grabs all of the data from the database and walks through, writing items to memcachedb. Care must be taken here to make sure one doesn't write stale data to memcachedb. The safest way is to include a timestamp with each record that can easily be compared. Another way to go is to just have this script alert you to items that are out of sync, requiring manually re-saving these records.

Memcachedb is, unfortunately, still a little raw. The replication setup is rather complex. It took me a little while to get it working the way I wanted with just two boxes. It definitely could use command line options to set replication options, so that slaves don't accidentally promote themselves to masters. Right now one can only do that through the protocol, so I have a nagios plugin that checks it and changes it if it is wrong.

I think its important to note just how cool it is that 90% of memcachedb was written before it was conceived of. BerkeleyDB is one of the great open source success stories, having a successful business model built on free code, and eventually attracting enough attention from Oracle to get purchased. Then to merge that with memcached, which is one of those projects that makes you wish you had written it first, well, I think thats a stroke of genius. Good job Mr. Chu.