<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>FewBar.com - Make it good &#187; caching</title>
	<atom:link href="http://fewbar.com/tag/caching/feed/" rel="self" type="application/rss+xml" />
	<link>http://fewbar.com</link>
	<description>Technology, life, and mischief, not in that order</description>
	<lastBuildDate>Fri, 23 Dec 2011 01:41:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>TokyoOops</title>
		<link>http://fewbar.com/2009/10/tokyo-tyrant-ignores-memcache-protocol-flags/</link>
		<comments>http://fewbar.com/2009/10/tokyo-tyrant-ignores-memcache-protocol-flags/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 04:28:52 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Memcache]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[berkeleydb]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[memcachedb]]></category>
		<category><![CDATA[process]]></category>
		<category><![CDATA[RTFM]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[tokyotyrant]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=117</guid>
		<description><![CDATA[We had a fun time this week with TokyoTyrant. Recently it has become apparent that MemcacheDB has been all but abandoned. As fantastic as the early work was by Steve Chu, the project is in disrepair. That, coupled with the less than obvious failover for its replication combined to make us seek alternatives. Brian Aker [...]]]></description>
			<content:encoded><![CDATA[<p>We had a fun time this week with <a href="http://1978th.net/tokyotyrant/">TokyoTyrant</a>. Recently it has become apparent that <a href="http://www.memcachedb.org/">MemcacheDB</a> has been all but abandoned. As fantastic as Steve Chu&#8217;s early work was, the project is in disrepair. That, coupled with the <a href="http://fewbar.com/2009/03/memcachedb-fault-tolerance-procedures/">less than obvious failover for its replication</a>, led us to seek alternatives.</p>
<p><a href="http://fewbar.com/wp-content/uploads/2009/10/virtual_stupidity.jpg"><img class="alignnone size-full wp-image-121" title="virtual_stupidity" src="http://fewbar.com.s3.amazonaws.com/wp-content/uploads/2009/10/virtual_stupidity.jpg" alt="virtual_stupidity" width="280" height="280" /></a></p>
<p><span id="more-117"></span><br />
<a href="http://krow.net">Brian Aker</a> had once mentioned to me that TokyoTyrant was way better than MemcacheDB and that we should run it instead. I took notice, and it turns out he&#8217;s right! It does basically the same thing, applying the memcache protocol to an on-disk key/value store. However, the code is incredibly clean, well maintained, and extremely fast. There&#8217;s also a lot more flexibility, with the ability to choose between in-memory and on-disk storage, hash tables, B+trees, etc.</p>
<p>The availability of log-based asynchronous master/master replication (similar in concept to MySQL&#8217;s replication) was probably one of the biggest wins. It allows much simpler failover (just move the IP, or DNS, or whatever) than MemcacheDB, which sticks with BerkeleyDB&#8217;s replication setup: a single-master system that uses an election algorithm.</p>
<p>Somewhere during the migration, though, we missed one tiny detail. Sometimes the devil is in the details. This passage is really the only mention in <a href="http://1978th.net/tokyotyrant/spex.html#protocol">the documentation</a> that TokyoTyrant supports the memcache protocol, and it is very clear:</p>
<blockquote><p>Memcached Compatible Protocol</p>
<p>As for the memcached (ASCII) compatible protocol, the server implements the following commands; &#8220;set&#8221;, &#8220;add&#8221;, &#8220;replace&#8221;, &#8220;get&#8221;, &#8220;delete&#8221;, &#8220;incr&#8221;, &#8220;decr&#8221;, &#8220;stats&#8221;, &#8220;flush_all&#8221;, &#8220;version&#8221;, and &#8220;quit&#8221;. &#8220;noreply&#8221; options of update commands are also supported. However, &#8220;flags&#8221;, &#8220;exptime&#8221;, and &#8220;cas unique&#8221; parameters are ignored.</p></blockquote>
<p>Now, as I said, there&#8217;s nothing ambiguous about this. It would have helped if anyone on my team had ever read it. We installed TokyoTyrant, pointed our basic test code at it, and it worked. This is really a process problem, not so much a technical one. The process must be to assume it won&#8217;t work, and then to test every use case until you have shown that it does.</p>
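<p>For reference, here is a minimal Python sketch of the memcached ASCII &#8220;set&#8221; request and the &#8220;VALUE&#8221; reply header. It is based on the general memcached protocol description, not on TokyoTyrant&#8217;s source, and the function names are hypothetical. It shows that the flags and exptime fields travel with every storage command. A server that accepts those fields but ignores them, as the quote above says TokyoTyrant does, presumably cannot hand the original flags back on a later get.</p>

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0,
              noreply: bool = False) -> bytes:
    """Build a memcached ASCII "set" request.

    Layout: set <key> <flags> <exptime> <bytes> [noreply]\r\n<data>\r\n
    """
    header = f"set {key} {flags} {exptime} {len(value)}"
    if noreply:
        header += " noreply"
    return header.encode("ascii") + b"\r\n" + value + b"\r\n"


def parse_value_line(line: bytes) -> tuple[str, int, int]:
    """Parse a "VALUE <key> <flags> <bytes>" header from a get reply."""
    parts = line.rstrip(b"\r\n").split()
    if not parts or parts[0] != b"VALUE":
        raise ValueError("not a VALUE line")
    return parts[1].decode("ascii"), int(parts[2]), int(parts[3])


# A client that set flags=1 expects to see 1 again in the VALUE line;
# a server that ignores flags has nothing to echo back.
print(build_set("k", b"hello", flags=1))
print(parse_value_line(b"VALUE k 1 5\r\n"))
```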
<p>So why is that bit of the manual important? Well, we use PHP. Specifically, we use the PECL &#8220;Memcache&#8221; module to access memcache protocol storage. The Memcache module is mostly oriented toward caching in the original, memory-based memcached. It works great for MemcacheDB too, which simply ignores the exptime parameter. However, MemcacheDB <em>does not</em> ignore &#8220;flags&#8221;.</p>
<p>And therein lies the problem. Users of the <a href="http://pecl.php.net/package/memcache">PECL Memcache module</a> may not know this, but the flags are <em>important</em>. The Memcache module may set two bits in that flags field. Bit 0 indicates that the content has been serialized and therefore must be unserialized on read. Bit 1 indicates that the content has been compressed.</p>
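<p>To see why losing those two bits hurts, here is a small Python model of a flag-aware client. It assumes bit 0 means &#8220;serialized&#8221; and bit 1 means &#8220;compressed&#8221;, as described above. It uses json as a stand-in for PHP&#8217;s serialize() and zlib for the module&#8217;s compression. Both are assumptions for illustration, not the PECL module&#8217;s actual byte format. Decoding with the flags forced to 0 mimics a server that dropped them.</p>

```python
import json
import zlib

FLAG_SERIALIZED = 1 << 0  # bit 0: payload was serialized
FLAG_COMPRESSED = 1 << 1  # bit 1: payload was compressed


def encode(value, compress: bool = False) -> tuple[bytes, int]:
    """Turn a value into (payload, flags) the way a flag-aware client might."""
    flags = 0
    if isinstance(value, bytes):  # plain strings are stored as-is
        payload = value
    else:  # arrays/objects: serialize and remember that we did
        payload = json.dumps(value).encode("utf-8")
        flags |= FLAG_SERIALIZED
    if compress:
        payload = zlib.compress(payload)
        flags |= FLAG_COMPRESSED
    return payload, flags


def decode(payload: bytes, flags: int):
    """Undo encode() using only the flags the server hands back."""
    if flags & FLAG_COMPRESSED:
        payload = zlib.decompress(payload)
    if flags & FLAG_SERIALIZED:
        return json.loads(payload)
    return payload  # no flags: the caller just gets raw bytes


payload, flags = encode([1, 2, 3])
print(decode(payload, flags))  # the original list
print(decode(payload, 0))      # raw bytes, as if the server ignored flags
```

<p>With the flags intact, decode() gives back the list. With flags of 0 it returns the raw bytes b'[1, 2, 3]', which is the &#8220;arrays coming back as strings&#8221; symptom.</p>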
<p>So, while all of the strings that had been stored in MemcacheDB and then copied to TokyoTyrant worked great, the serialized objects and arrays and the gzipped values were completely broken: they came back to the code as plain strings and binary compressed data. The gzipped data was easy to deal with (we turned off automatic gzip compression). The serialized data took some quick tap dancing to remedy, with code something like this:</p>
<p><code lang="php"><br />
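class Memcache_BrokenFlags extends Memcache<br />
{<br />
    // TokyoTyrant ignores the memcache "flags" field, so PECL Memcache<br />
    // never learns that a stored value was serialized; try to undo it here.<br />
    public function get($key, &amp;$flags = null)<br />
    {<br />
        $v = parent::get($key, $flags);<br />
        if (!is_string($v)) {<br />
            return $v; // a miss (false) or a multi-get array, left untouched<br />
        }<br />
        if ($v === 'b:0;') {<br />
            return false; // serialize(false) would otherwise look like a failure<br />
        }<br />
        $uv = @unserialize($v);<br />
        return $uv === false ? $v : $uv;<br />
    }<br />
}<br />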
</code></p>
<p>Luckily, all of our code uses a single factory method to create &#8220;MemcacheDB&#8221; connections, so it was easy to substitute this class in.</p>
<p>Eventually we can change the code by separating values into those we always serialize and those we never do, and then doing the serialization ourselves. This should eventually allow us to use the new <a href="http://pecl.php.net/package/tokyo_tyrant">tokyo_tyrant module in PECL</a>, which only reliably stores scalars. (I noticed that recent versions have added a call to the internal PHP function convert_to_string(). This is, I think, a mistake, but one that still leaves it up to the programmer to serialize explicitly when serialization is desired.)</p>
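<p>Here is a rough sketch of that plan in Python. It assumes a backend that stores only byte strings and keeps no flags, modelled here with a dict. The class and helper names are hypothetical. The point is that the caller serializes structured values explicitly, so nothing depends on the server remembering flags.</p>

```python
import json


class StringOnlyStore:
    """Stand-in for a backend that stores only byte strings, with no flags."""

    def __init__(self):
        self._data: dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise TypeError("store only accepts bytes; serialize explicitly")
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)


# Keys whose values are always structured go through these helpers...
def set_structured(store: StringOnlyStore, key: str, value) -> None:
    store.set(key, json.dumps(value).encode("utf-8"))


def get_structured(store: StringOnlyStore, key: str):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


# ...while plain string values use store.set()/store.get() directly.
s = StringOnlyStore()
set_structured(s, "settings", {"theme": "dark"})
s.set("greeting", b"hello")
```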
<p>This was a pretty big gotcha, and one that illustrates a point. We cowboy coders and sysadmins sometimes get annoyed when those pesky business people ask us for plans, schedules, expected impact, etc., and we keep assuring them we know what&#8217;s up. It&#8217;s still important to actually know what&#8217;s up, and to make sure to RTFMC .. C as in, CAREFULLY.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2009/10/tokyo-tyrant-ignores-memcache-protocol-flags/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Query Cache defeats Serverzilla</title>
		<link>http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/</link>
		<comments>http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/#comments</comments>
		<pubDate>Tue, 15 Jul 2008 20:47:55 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=11</guid>
		<description><![CDATA[So a few days ago, my big mean MySQL server started having problems that were very hard to explain. It was slowing down, taking a minute to run queries that usually take a few seconds, and Linux load averages were in the teens, despite having quiet disks (less than 0.1% cpu IO wait time) and [...]]]></description>
			<content:encoded><![CDATA[<p>So a few days ago, my big mean MySQL server started having problems that were very hard to explain. It was slowing down, taking a minute to run queries that usually take a few seconds, and Linux load averages were in the teens, despite quiet disks (less than 0.1% CPU I/O wait time) and plenty of RAM (128G for about 200G of data total&#8230;).</p>
<p>The developers were stumped. The other systems guys were stumped. So was I. But it still seemed ok. We found all sorts of things to point fingers at, but nothing made sense.<br />
<span id="more-11"></span><br />
Then this Monday, everything came to a screeching halt. Three-second queries were taking 15 minutes. Thirty-second queries were never completing. The CPUs were only a little busy. What gives?! This box has 8 CPU cores and 128G of RAM; nothing can take it down, right?!</p>
<p>We threw our hands in the air and failed over to the active standby (the other side of our master&lt;-&gt;master replication pair). Suddenly all was well. But something smelled wrong. We blamed some kind of bug in MySQL.</p>
<p>I spent all day trying to make Memcached more efficient, and trying to explain why this beast had suddenly been felled by such tiny arrows as instantaneous queries that should have been cached anyway.</p>
<p>Oh wait, did somebody say cached? As in the MySQL query cache? I mentioned this in the #mysql channel on <a href="http://freenode.net">Freenode</a>, and Mr. Eric Bergen (ebergen) from <a href="http://www.provenscaling.com/">Proven Scaling</a> immediately said something like &#8220;well duh, turn off the cache, moron&#8221;. I was dumbfounded. Shouldn&#8217;t it be helping us with all those tiny queries?</p>
<p>Well, apparently not. <a href="http://lists.mysql.com/internals/35777">This recent thread on the MySQL internals list</a> talks about mutex contention in the query cache while it is <em>searched</em>, not just while it is updated. This is disastrous for an environment where thousands and thousands of tiny queries are run constantly. Even with query_cache_type set to 2, or &#8220;cache on demand&#8221; mode, every query in the system must pass through this mutex.</p>
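<p>To make the contention concrete, here is a toy Python model in which a single lock guards lookups as well as inserts. It is not MySQL&#8217;s actual code. The class name and the 1 ms simulated search cost are assumptions. Every lookup holds the lock, so eight threads doing 20 lookups each take at least 160 &#215; 1 ms. That is the same time one thread would need for all 160 lookups, so extra cores add no lookup throughput.</p>

```python
import threading
import time


class GlobalLockCache:
    """Toy query cache where one mutex guards lookups as well as inserts."""

    def __init__(self, lookup_cost: float = 0.001):
        self._lock = threading.Lock()
        self._results: dict = {}
        self._lookup_cost = lookup_cost  # simulated time spent searching

    def lookup(self, sql: str):
        with self._lock:  # taken even for a read-only search
            time.sleep(self._lookup_cost)
            return self._results.get(sql)

    def store(self, sql: str, rows) -> None:
        with self._lock:
            self._results[sql] = rows


def timed_run(cache: GlobalLockCache, threads: int = 8, lookups: int = 20) -> float:
    """Run concurrent lookups and return the elapsed wall-clock seconds."""
    def worker():
        for i in range(lookups):
            cache.lookup(f"SELECT {i}")

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start


print(timed_run(GlobalLockCache()))  # at least ~0.16 s: lookups run one at a time
```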
<p>So, this morning, when the standby box again cried for mercy, hitting max_connections and spinning all queries around in circles, I ran &#8216;SET GLOBAL query_cache_type=2&#8217;. Instantly the server became healthier. I half-expected to trade one problem for another, with the server being consumed by tiny queries. Instead, those tiny queries behaved as expected and took very little time to complete. And large queries against tables that change every second or two didn&#8217;t have to contend for the query cache; they just ran through like nothing.</p>
<p>So, it would appear that for any multi-core MySQL installation, the query cache is not only a waste but a hazard!</p>
<p>Thanks again to Mr. Bergen. I would not have thought about that until he said it.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using memcachedb and memcached to make things scale</title>
		<link>http://fewbar.com/2008/06/using-memcachedb-and-memcached-make-things-scale/</link>
		<comments>http://fewbar.com/2008/06/using-memcachedb-and-memcached-make-things-scale/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 05:40:01 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[memcachedb]]></category>
		<category><![CDATA[sclability]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=9</guid>
		<description><![CDATA[I don&#8217;t remember exactly how I found memcachedb, however, it is one of those projects that somebody else beat me to the punch in writing. I mean, it was going to happen, as the need was there. Steve Chu, the author, did a great job of melding two open source projects, BerkeleyDB, and memcached, to [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t remember exactly how I found <a href="http://www.memcachedb.org">memcachedb</a>, but it is one of those projects that somebody else beat me to writing. It was going to happen, as the need was there. Steve Chu, the author, did a great job of melding two open source projects, <a href="http://www.oracle.com/database/berkeley-db/index.html">BerkeleyDB</a> and <a href="http://www.danga.com/memcached/">memcached</a>, to produce something really very powerful.<br />
<span id="more-9"></span><br />
Now, memcached has become almost completely ubiquitous in scaling web apps. It is essentially a network-enabled, non-persistent data store, generally used as a look-aside <a href="http://en.wikipedia.org/wiki/Cache">data cache</a>. You look in the faster cache first; if nothing is there, you look in the slower place and then write the value back to the faster cache. Some industrious people have also used it for session storage, and I&#8217;m sure for a few other clever things.</p>
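<p>The look-aside pattern fits in a few lines. This is a minimal Python sketch with a dict standing in for memcached and a caller-supplied loader standing in for the database; both are assumptions. It treats None as a miss, so a None result is never cached.</p>

```python
from typing import Callable, Hashable


def get_with_cache(cache: dict, key: Hashable, load_from_db: Callable):
    """Look-aside read: try the cache, fall back to the slow store, write back."""
    value = cache.get(key)
    if value is None:              # cache miss
        value = load_from_db(key)  # slow path
        cache[key] = value         # write the value back for next time
    return value


cache = {}
print(get_with_cache(cache, "user:1", lambda k: {"id": 1}))  # loads, then caches
print(get_with_cache(cache, "user:1", lambda k: {"id": 1}))  # served from cache
```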
<p>One of my favorite parts of memcached is how dead simple it is. The protocol is very easy to read, which makes debugging and writing new clients very easy. It uses the &#8220;least recently used&#8221; (LRU) algorithm to evict things from the cache when it starts to fill up, so it&#8217;s extremely easy to understand how the whole thing works.</p>
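<p>A minimal LRU cache in Python, built on collections.OrderedDict, shows the idea. It is a simplification. As far as I know, memcached itself keeps separate LRU lists per slab class and limits memory in bytes rather than by item count.</p>

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


c = LRUCache(2)
c.set("a", 1)
c.set("b", 2)
c.get("a")     # touching "a" makes "b" the oldest
c.set("c", 3)  # over capacity: "b" is evicted
```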
<p>The cleverest part of using memcached has nothing to do with the service itself, but with the client API. The <a href="http://www.danga.com/">smart guys</a> who developed it figured out that clients could hash the key and pick the same server for reads and writes every time, as long as the number of servers doesn&#8217;t change. This lets it scale out to a ridiculous size while retaining its simplicity and performance.</p>
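<p>Here is a sketch of that idea in Python. It assumes CRC32 as the hash; real clients vary, and the function names are hypothetical. The second function shows the catch in &#8220;as long as the number of servers doesn&#8217;t change&#8221;. With plain modulo hashing, going from three servers to four remaps roughly three quarters of the keys, which is why many later clients use consistent hashing instead.</p>

```python
import zlib


def pick_server(key: str, servers: list[str]) -> str:
    """Pick a server by hashing the key modulo the number of servers."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def moved_fraction(keys: list[str], before: list[str], after: list[str]) -> float:
    """Fraction of keys that map to a different server after a resize."""
    moved = sum(1 for k in keys if pick_server(k, before) != pick_server(k, after))
    return moved / len(keys)


three = ["a:11211", "b:11211", "c:11211"]
four = three + ["d:11211"]
keys = [f"key-{i}" for i in range(1000)]
print(pick_server("user:42", three))
print(moved_fraction(keys, three, four))  # roughly 0.75
```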
<p>Two problems arise when a site uses any caching, be it memcached or aggressive HTTP headers.</p>
<p>First, the site starts to rely on caching too heavily for performance. As an example, I had a situation where the entire corpus of settings for each client site (hundreds of clients, hundreds of settings) was kept in memcached as one massive 200kB+ serialized PHP object. Every page view that needed any settings would grab this object at the beginning of the code and use it throughout.</p>
<p>This worked really great in some instances, as most of the biggest pages needed to access 30&#8211;50 settings each time. The trouble came with pages that saw a high degree of concurrency, such as an iframe displayed on every page of a major website, or a page that got slashdotted. Such a page would be blazing fast, generating almost no load at all for a while. But whenever a setting was changed (the settings application would clear the cached settings for whichever client was edited), or the cache object expired, the database load would spike out of control.</p>
<p>The reason was that this object took about 1-3 seconds to fetch from the database. At 1000 requests per second, a 3-second fetch means up to 3000 requests that miss the cache and so ask the database for the information. The solution was to cache each setting individually and to add a random skew to the expire time. This prevented the storm of requests whenever an item expired, and it kept items looked up in rapid succession from all expiring at once.</p>
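<p>Here is a sketch of both fixes in Python. Each setting gets its own key, and each key gets an expiry time skewed at random by up to &#177;20%. The percentage, the key format and the dict-based cache are assumptions, not the original code. Items written together then expire at different moments instead of all at once.</p>

```python
import random
import time


def jittered_ttl(base_seconds: int, skew: float = 0.2, rng=random) -> int:
    """Return base_seconds shifted at random by up to +/- skew of itself."""
    spread = int(base_seconds * skew)
    return base_seconds + rng.randint(-spread, spread)


def cache_settings(cache: dict, client_id, settings: dict,
                   base_ttl: int = 300, rng=random) -> None:
    """Store each setting under its own key with its own skewed expiry time.

    `cache` is a plain dict standing in for memcached; the key format
    "settings:<client>:<name>" is hypothetical.
    """
    now = time.time()
    for name, value in settings.items():
        key = f"settings:{client_id}:{name}"
        cache[key] = (value, now + jittered_ttl(base_ttl, rng=rng))


cache = {}
cache_settings(cache, 12, {"color": "red", "lang": "en", "tz": "UTC"})
```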
<p>This brings us to the second problem with caching, and specifically with memcached: the cache is sometimes mistaken for a data store. In the above example, by clearing entries out of memcached, the settings application essentially neutered the caching. At any time during the day, somebody might come along and blow out the cache. That&#8217;s fine with MySQL&#8217;s query cache, for instance, because that cache just makes queries come back faster. The connection is already made, so one of the most painful parts has already happened. Memcached, however, can scale to many thousands of connections very cheaply, whereas doing so with most databases is expensive, if not impossible.</p>
<p>So to combat this, what is really needed is a persistent place to keep data up to date when it has an extremely high read-to-write ratio. That&#8217;s where memcachedb is so attractive. Instead of keeping everything in RAM, memcachedb stores anything you put into it in a BerkeleyDB database. To boot, it can replicate this data to another machine, adding to its reliability and availability. This means that writes will be slower, and it won&#8217;t scale out nearly as cheaply, but that&#8217;s OK for situations like this.</p>
<p>With memcachedb, we can change the settings management program to save the data into the database <strong>and </strong>memcachedb, confident that it will be there later. Then we don&#8217;t need write-back caching code in our application at all; we just remove the part that connects to the database for that data.</p>
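<p>Here is a minimal Python sketch of that write path. It assumes an SQLite table named settings as the database and a dict standing in for memcachedb. The table, key format and function names are all hypothetical. The settings program writes to both stores; readers only ever consult the key/value store.</p>

```python
import json
import sqlite3


def save_setting(db: sqlite3.Connection, kv: dict, client_id: int,
                 name: str, value) -> None:
    """Write a setting to the database and to the persistent key/value store."""
    with db:  # commit the database write first
        db.execute(
            "INSERT OR REPLACE INTO settings (client_id, name, value) VALUES (?, ?, ?)",
            (client_id, name, json.dumps(value)),
        )
    kv[f"settings:{client_id}:{name}"] = json.dumps(value)


def read_setting(kv: dict, client_id: int, name: str):
    """Readers never touch the database; they only read the key/value store."""
    raw = kv.get(f"settings:{client_id}:{name}")
    return None if raw is None else json.loads(raw)
```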
<p>This has a huge benefit beyond just performance. With this scheme, we can write simple applications that won&#8217;t rely on the read/write database server ever being up. It also means that we don&#8217;t have to have a giant database server, or a huge replication fanout to get this data available in realtime.</p>
<p>There is, of course, the danger that memcachedb gets out of sync with the main database. That&#8217;s why, in addition to writing to memcachedb whenever you write to the database server, you can also periodically run a refresh script that grabs all of the data from the database and walks through it, writing items to memcachedb. Care must be taken here not to write stale data to memcachedb. The safest way is to include a timestamp with each record that can easily be compared. Another option is to have this script simply alert you to items that are out of sync, so that those records can be re-saved manually.</p>
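<p>Here is a sketch of that refresh logic in Python. It assumes that both the database rows and the key/value entries carry an updated_at timestamp; the names and the tuple layout are hypothetical. It only overwrites an entry when the database copy is newer, and in alert-only mode it just reports the out-of-sync keys. The check and the write are separate steps, so a save that lands in between can still race with it. This narrows the window rather than closing it.</p>

```python
from typing import Iterable


def refresh(db_rows: Iterable[tuple], kv: dict, alert_only: bool = False) -> list[str]:
    """Copy database rows into the key/value store unless the store is newer.

    db_rows: (key, value, updated_at) tuples read from the database.
    kv maps key -> (value, updated_at). Returns the keys found out of sync.
    """
    stale = []
    for key, value, updated_at in db_rows:
        current = kv.get(key)
        if current is not None and current[1] >= updated_at:
            continue  # store already has this version or a newer one
        stale.append(key)
        if not alert_only:
            kv[key] = (value, updated_at)
    return stale
```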
<p>Memcachedb is, unfortunately, still a little raw. The replication setup is rather complex, and it took me a little while to get it working the way I wanted with just two boxes. It could definitely use command-line options for replication settings, so that slaves don&#8217;t accidentally promote themselves to masters. Right now one can only set that through the protocol, so I have a Nagios plugin that checks it and changes it if it is wrong.</p>
<p>I think it&#8217;s important to note just how cool it is that 90% of memcachedb was written before it was conceived of. <a href="http://www.oracle.com/database/berkeley-db/index.html">BerkeleyDB</a> is one of the great open source success stories, having a successful business model built on free code, and eventually attracting enough attention from Oracle to be acquired. Then to merge that with memcached, which is one of those projects that makes you wish you had written it first: well, I think that&#8217;s a stroke of genius. Good job, Mr. Chu.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2008/06/using-memcachedb-and-memcached-make-things-scale/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>


