<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>FewBar.com - Make it good &#187; Scalability</title>
	<atom:link href="http://fewbar.com/category/tech/scalability/feed/" rel="self" type="application/rss+xml" />
	<link>http://fewbar.com</link>
	<description>Technology, life, and mischief, not in that order</description>
	<lastBuildDate>Thu, 08 Jul 2010 14:36:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Gearman K.O.&#8217;s mysql to solr replication</title>
		<link>http://fewbar.com/2010/03/gearman-replicate-mysql-to-solr/</link>
		<comments>http://fewbar.com/2010/03/gearman-replicate-mysql-to-solr/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 05:47:36 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[gearman]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=154</guid>
		<description><![CDATA[Ding ding ding.. in this corner, wearing black shorts and a giant schema, we have over 11 million records in MySQL with a complex set of rules governing which must be searchable and which must not be. And in that corner, we have the contender, a kid from the back streets, outweighed and outreached [...]]]></description>
			<content:encoded><![CDATA[<p>Ding ding ding.. in this corner, wearing black shorts and a giant schema, we have over 11 million records in MySQL with a complex set of rules governing which must be searchable and which must not be. And in that corner, we have the contender, a kid from the back streets, outweighed and outreached by all his opponents, but still victorious in the queue shootout, with just open source, and 12 patch releases.. written in C, it&#8217;s <b><a href="http://gearman.org">gearman</a></b>!</p>
<p><a href="http://fewbar.com.s3.amazonaws.com/wp-content/uploads/2010/03/ko-mike-tyson.png"><img src="http://fewbar.com.s3.amazonaws.com/wp-content/uploads/2010/03/ko-mike-tyson.png" alt="" title="ko-mike-tyson" width="500" height="437" class="alignnone size-full wp-image-155" /></a><br />
<span id="more-154"></span></p>
<p>I&#8217;m pretty excited today, as I&#8217;m preparing to go live with the first real, high-load application of Gearman that I&#8217;ve written. What is it, you say? Well, it is a simple trigger-based replicator from MySQL to <a href="http://lucene.apache.org/solr/">SOLR</a>.</p>
<p>I should say (because I know some of my colleagues read this blog) that I don&#8217;t actually believe in this design. Replication using triggers seems fraught with danger. It totally makes sense if you have a giant application and can&#8217;t track down everywhere that a table is changed. However, if your app is simple and properly abstracted, hopefully you know the 1 or 2 places that write to the table.</p>
<p>I should also say that I really can&#8217;t reveal all of the details. The general idea is pretty simple. Basically we have a trigger that dumps a primary key into gearman via the <a href="https://launchpad.net/gearman-mysql-udf">gearman MySQL UDFs</a>. The idea is just to tell a gearman worker &#8220;look at this record in that table&#8221;.</p>
<p>Once the worker picks it up, it applies some logic to the record: &#8220;should this be searchable or not?&#8221; If the answer is yes, the worker pushes the record into SOLR. If not, the worker makes sure it is not in SOLR.</p>
<p>This at least is pretty simple. The end result is a system where we can rebuild the search index in parallel using multiple CPUs (thank you to solr/lucene for being able to update indexes concurrently and efficiently btw). This is done by pushing all of the records in the table into the queue at once.</p>
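<p>To make that decision step concrete, here is a minimal sketch in Python rather than PHP. Everything in it is hypothetical: <code>FakeSearchIndex</code> stands in for SOLR, <code>is_searchable</code> is a made-up rule (the real ones I can&#8217;t share), and <code>handle_job</code> is roughly what a worker would do with each primary key it receives.</p>
<pre>
class FakeSearchIndex:
    """In-memory stand-in for a search index (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}

    def add(self, pk, doc):
        self.docs[pk] = doc

    def delete(self, pk):
        # Deleting a missing document is a no-op, so a job can be re-run safely.
        self.docs.pop(pk, None)


def is_searchable(record):
    # Hypothetical rule; the real rules are not described in this post.
    return record is not None and record.get("status") == "active"


def handle_job(pk, table, index):
    """Process one "look at this record" job for primary key pk."""
    record = table.get(pk)  # the row may have been deleted since the trigger fired
    if is_searchable(record):
        index.add(pk, record)
        return "indexed"
    index.delete(pk)
    return "removed"


table = {1: {"status": "active"}, 2: {"status": "hidden"}}
index = FakeSearchIndex()
results = [handle_job(pk, table, index) for pk in (1, 2, 3)]
print(results)
print(sorted(index.docs))
</pre>
<p>Because adding and deleting are both safe to repeat, the same handler would work for a single trigger event and for a full rebuild where every primary key is queued at once.</p>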
<p>Anyway, gearmand is performing like a champ, and libgearman and the gearman pecl module are doing great. I&#8217;m just really happy to see gearman rolled out in production, as I really do think it has that nice mix of simplicity and performance. I love the command line client, which makes it easy to write scripts to inject things into queues, or query workers. This allows me to access a worker like this:</p>
<pre>
$ gearman -h gearmanbox -f all_workers -s
Known Workers: 11

boxname_RealTimeUpdate_Queue_TriggerWorker_1 jobs=627366,restarts=0,memory_MB=4.27,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13311 jobs=304134,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700
boxname_RealTimeUpdate_Queue_Subject_13306 jobs=606126,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13314 jobs=576714,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13342 jobs=294846,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13347 jobs=376998,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13359 jobs=470508,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700
boxname_RealTimeUpdate_Queue_Subject_13364 jobs=403182,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700
boxname_RealTimeUpdate_Property_SolrPublish_ jobs=219630,restarts=0,memory_MB=6.19,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_TriggerWorker_2 jobs=393642,restarts=0,memory_MB=4.27,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Property_SolrBatchPub jobs=6,restarts=0,memory_MB=6.23,lastcheckin=Tue, 23 Mar 2010 22:37:28 -0700
</pre>
<p>Brilliant.. no need for HTML or HTTP.. just a nice simple command line interface.</p>
<p>I think gearman still has a ways to go. I&#8217;d really like to see some more administration added to it. Deleting empty queues and quickly flushing all queues without restarting gearmand would be nice-to-haves. We&#8217;ll see what happens going forward, but for now, thanks so much to the gearman team (especially Eric Day who showed me gearman, and Brian Aker for pushing hard to release v0.12).</p>
<p>w00t!</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2010/03/gearman-replicate-mysql-to-solr/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How do you do, that voodoo, that Queues Do?</title>
		<link>http://fewbar.com/2010/01/queue-voodoo-that-queues-do/</link>
		<comments>http://fewbar.com/2010/01/queue-voodoo-that-queues-do/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 08:32:46 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[amqp]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[gearman]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[qpid]]></category>
		<category><![CDATA[queuing]]></category>
		<category><![CDATA[stomp]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=136</guid>
		<description><![CDATA[Queues seem to be all over the place right now. Maybe it&#8217;s like when I wanted a VW GTi VR6 a few years back. I kept seeing them pass me on the freeway and thought &#8220;crap, everybody is getting this hot new thing and I&#8217;m missing out!&#8221;.
I think everybody at one point looked at MySQL [...]]]></description>
			<content:encoded><![CDATA[<p>Queues seem to be all over the place right now. Maybe it&#8217;s like when I wanted a VW GTi VR6 a few years back. I kept seeing them pass me on the freeway and thought &#8220;crap, everybody is getting this hot new thing and I&#8217;m missing out!&#8221;.</p>
<p>I think everybody at one point looked at MySQL and thought.. &#8220;that would work fine as a queue system&#8221;. For low volume stuff, it *is* fine. But then somebody grabs your little transactional, relational, reliable queue system and plugs 5 million messages per hour through it, and somewhere, a man named Heikki cries.</p>
<p>So then you start to look around.. and for those of us who have meager budgets and tend to use open source, there aren&#8217;t a lot of choices.<span id="more-136"></span> <a href="http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes">The guys at Second Life did some research for all of us&#8230;</a> Once you get through that though, you realize that the needs of Second Life, an MMORPG, are quite a bit different from your average web app.</p>
<p>So, without further ado, my &#8220;queue&#8221; system roundup.</p>
<ul>
<li><a href="http://activemq.apache.org/">ActiveMQ</a> &#8211; This shining star of the queueing world seems to come up quickly in conversation. At Adicio, we actually gave it a good try. The main problem was, we&#8217;re a PHP shop. The PHP accessibility comes not through the normal Java Message Service connector, but <a href="http://stomp.codehaus.org/Protocol">&#8220;STOMP&#8221;</a>.
<p />
Honestly, I&#8217;m not a big fan of these giant Apache sponsored java projects. <a href="http://lucene.apache.org/solr/">SOLR</a> has changed my mind a bit, as it seems to work well and doesn&#8217;t really crash. Then again, I&#8217;m not carrying a pager anymore, so maybe it does suck and I&#8217;m just not seeing it.</p>
<p />
Anyway, at first, ActiveMQ was winning me over. It was pretty quick.. had a pretty simple setup curve (just start up the latest version, and you have a working persistent queue system), and despite having mountains of documentation that reads like the text spammers shove into their emails randomly to pass Bayesian filters, it made sense.</p>
<p />
However, its fall was pretty quick, as the first problem we hit was its Producer Throttling. This probably works fine when you&#8217;re using the JMS connector. However, with Stomp, when ActiveMQ decides your queue is too full, and it needs you to stop, it just stops acking your packets. Your stomp client blocks (or spins, in non-blocking mode) and you wait. This is made worse by the fairly naive PHP stomp driver, which doesn&#8217;t really check to see why its write failed, or even try to see if it <b>can</b> write.</p>
<p />
Things got better when that was disabled, but the stomp driver was still haphazard. After figuring out that the Master/Slave protocol requires one to shut down the slave whenever failing back to a downed master, I had had enough. Sayonara ActiveMQ.
</li>
<li><a href="http://www.rabbitmq.com/">RabbitMQ</a> &#8211; This one seems to be a favorite of many. My experience is limited, and I really haven&#8217;t tried it that much. It&#8217;s written in Erlang, which I guess automatically makes something &#8220;telco reliable&#8221;. Cool.</li>
<li><a href="http://qpid.apache.org/">QPID</a> &#8211; Wow, this one is supposedly INCREDIBLE. <a href="http://www.redhat.com/mrg/messaging/features/#aio">&#8220;500,000 messages per second per LUN.&#8221; </a>. WOW. It also has RedHat&#8217;s backing, which is a big win for me.<br />
In fact, as I write this, I&#8217;m doing my best to build and install the latest qpid on CentOS 5.4. </p>
<pre>
 gcc -DHAVE_CONFIG_H -I. -I. -I./src/config -I./include/ -I/usr/src/redhat/BUILD/xerces-c-src_2_8_0/src -I./src/lexer/ -D_GNU_SOURCE -D_REENTRANT -O2 -g -m64 -mtune=generic -MT mapm_add.lo -MD -MP -MF .deps/mapm_add.Tpo -c src/mapm/mapm_add.c  -fPIC -DPIC -o .libs/mapm_add.o
...
</pre>
<p>In case that looks familiar, that&#8217;s where I am. Oops, that&#8217;s not qpid. That&#8217;s xerces-c. Which I have to build.. and I also have to build xqilla after that. Luckily, 40 other packages required to build qpid were available in the standard CentOS yum repository.</p>
<p />
Another unfortunate reality is that there is no qpid connectivity available for PHP. Unless the <a href="http://code.google.com/p/php-amqp/">php-amqp module</a> works. It&#8217;s really not clear yet.<br />
Anyway, this looks like a promising messaging technology. However, this much software leaves a lot of room for things to break.. so, while I will probably complete the build, as I want to find out how it stacks up to the others in terms of simplicity and performance, I think this one is dead.</p>
<p />
</li>
<li><a href="http://gearman.org">Gearman</a> &#8211; Ok I&#8217;m going to say it up front. I like this one. It&#8217;s really not a &#8220;queue&#8221; system per se. The name is an anagram of &#8216;manager&#8217; (say that 5 times fast!). It&#8217;s one of those great things that came out of Danga Interactive, the same people who created MogileFS and Memcached.
<p />
Call me stupid, but I like to be able to read things. QPID is in C++, and is so big, I don&#8217;t even know where to start. Java gives me the shivers, and I don&#8217;t even know what Erlang looks like. But damn, who doesn&#8217;t like poring over well written C? That&#8217;s pretty much what the new C port of gearmand is.</p>
<p />
I&#8217;m especially fond of the ease with which one can write a persistence layer. I recently submitted code to make the tokyocabinet queue store better. It&#8217;s a simple B+Tree store that everybody&#8217;s going crazy about these days. It&#8217;s also written in really nice C.</p>
<p />
The built in ability for gearman clients/workers (producers/consumers) to have a 2 way conversation is especially appealing. It&#8217;s not like they can just freely pass messages back and forth. But clients can choose to wait for the job they submitted to complete. They can also check on the status of the job fairly easily. Workers can send back two integers (numerator and denominator), which is particularly useful for sending back a count of things done over the count of things to do.</p>
<p />
Combine all this cool stuff with the dead simple &#8216;gearman&#8217; command line client, and you have a happy Clint. I wrote a little PHP worker that just sits around collecting data sent to it by the other workers running. When it receives a &#8220;show_all_workers&#8221; message (function in gearman-ese), it just spits back a text report of what it knows. This can be triggered by just saying:</p>
<pre>
$ gearman -s -f show_all_workers

Known Workers: 5

dev3.adicio.com_Adicio_App_Reverse_Worker_29336 jobs=26508,restarts=0,memory_MB=1.47,lastcheckin=Thu, 21 Jan 2010 15:33:32 -0800
dev3.adicio.com_Adicio_App_Reverse_Worker_29333 jobs=19194,restarts=0,memory_MB=1.47,lastcheckin=Thu, 21 Jan 2010 15:33:32 -0800
dev3.adicio.com_Adicio_App_Reverse_Worker_29356 jobs=29208,restarts=0,memory_MB=1.47,lastcheckin=Thu, 21 Jan 2010 15:33:32 -0800
dev3.adicio.com_Adicio_App_Reverse_Worker_29370 jobs=27638,restarts=0,memory_MB=1.47,lastcheckin=Thu, 21 Jan 2010 15:33:32 -0800
dev3.adicio.com_Adicio_App_Reverse_Worker_29332 jobs=10636,restarts=0,memory_MB=1.47,lastcheckin=Thu, 21 Jan 2010 15:33:32 -0800

$
</pre>
<p>This is pretty damn cool. Now double the fun with <a href="https://launchpad.net/gearman-mysql-udf">MySQL UDFs</a>, and you have a workable solution for queueing via MySQL trigger.<br />
<a href="http://fewbar.com.s3.amazonaws.com/wp-content/uploads/2010/01/paris-hilton-thats-hot.jpg"><img src="http://fewbar.com/wp-content/uploads/2010/01/paris-hilton-thats-hot.jpg" alt="" title="paris hilton thats hot" width="256" height="256" class="alignnone size-full wp-image-138" /></a></p>
<p />
So, I can&#8217;t help but give this one the nod for simplicity of design. There are no massive books written to explain what gearman does. Just a nice easy C library, and perhaps one of the most important things, a really useful PHP extension.
</li>
</ul>
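<p>The numerator/denominator status reporting mentioned in the Gearman entry above is easy to picture with a small simulation. This is only a sketch in plain Python with no gearman library involved; <code>worker</code>, <code>report_status</code>, and <code>percent_complete</code> are names invented for illustration, and a real worker would send the two integers back through gearmand instead of calling a local function.</p>
<pre>
def worker(items, report_status):
    """Process items, reporting (numerator, denominator) after each one."""
    total = len(items)
    for done, item in enumerate(items, start=1):
        item.upper()  # placeholder for the real work
        report_status(done, total)


def percent_complete(numerator, denominator):
    # Guard against a job that has not reported a denominator yet.
    if denominator == 0:
        return 0.0
    return 100.0 * numerator / denominator


updates = []
worker(["a", "b", "c", "d"], lambda n, d: updates.append((n, d)))
halfway = percent_complete(*updates[1])
print(updates[-1], halfway)
</pre>
<p>The client only ever sees two integers, so it is up to the worker to pick a denominator that means something, such as the number of records in a batch.</p>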
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2010/01/queue-voodoo-that-queues-do/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Bromine and Selenium &#8211; second and third most useful elements behind Oxygen</title>
		<link>http://fewbar.com/2009/11/bromine-and-selenium-tests-for-the-rests-of-u/</link>
		<comments>http://fewbar.com/2009/11/bromine-and-selenium-tests-for-the-rests-of-u/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 01:48:10 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Engineers]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[selenium]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=125</guid>
		<description><![CDATA[If you&#8217;re an engineer, you hate testing. Seriously, who likes doing what those mere mortal &#8220;users&#8221; do? We&#8217;re POWER users and we don&#8217;t need to use all those silly features on all those sites. Just look at Craigslist, clearly an engineer&#8217;s dream tool.
For web apps, testing actually isn&#8217;t *that* hard. The client program (the browser) [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re an engineer, you hate testing. Seriously, who likes doing what those mere mortal &#8220;users&#8221; do? We&#8217;re POWER users and we don&#8217;t need to use all those silly features on all those sites. Just look at Craigslist, clearly an engineer&#8217;s dream tool.</p>
<p>For web apps, testing actually isn&#8217;t *that* hard. The client program (the browser) is readily available on every platform known to man, and they generally don&#8217;t do much more than store and retrieve data in clever ways. So, it&#8217;s not like we have to fire up a Large Hadron Collider to observe the effects of our web app.<span id="more-125"></span><a href="http://fewbar.com/wp-content/uploads/2009/11/periodictable.jpg"><img class="size-full wp-image-127 alignleft" title="periodictable" src="http://fewbar.com/wp-content/uploads/2009/11/periodictable.jpg" alt="periodictable" width="234" height="142" /></a></p>
<p>Therein lies the problem though, as clicking around on web forms and entering the same email address, password, address, phone number, etc. etc., 100 times, is BORING.</p>
<p>Enter <a href="http://seleniumhq.org/">Selenium</a>. This amazing little tool has been on the scene for a little while now, but it&#8217;s just now getting some momentum. Click through to the website and watch &#8220;the magic&#8221; as they put it, but basically here&#8217;s how it goes:</p>
<ul>
<li>open their firefox plugin and click &#8216;record&#8217;</li>
<li>do something</li>
<li>click &#8216;record&#8217; again.</li>
</ul>
<p>Then just save this little test case to a file, and the next time you change anything that might relate to the series of clicks and data entries you just made, run this test again. There are all kinds of assertions you can make while you&#8217;re doing something. Like &#8216;Make sure the title is X&#8217; or &#8216;make sure a link to Y exists&#8217;.</p>
<p>But wait, I could have done that with something like Test::More,  PHPUnit, or lime. Where&#8217;s the real benefit?</p>
<p>Well, because Selenium remotely controls your browser, all those gotchas regarding JavaScript and CSS incompatibilities come into play. Selenium can control Internet Explorer, Firefox, *and* Safari. In fact it can also control Opera and, according to their website, any browser that fully supports JavaScript.</p>
<p>This is really a nice evolutionary step for web shops, as tools like this generally are OS specific and cost a lot of money. Once again open source software appears where a need becomes somewhat ubiquitous.</p>
<p>You can even take it a step further. The next thing that generally happens in a web dev shop when they get bigger than 20 or 30 people is they hire people who actually <strong>like</strong> testing. Well not really, but they dislike it *less* than software engineers. These are QA engineers. And they <strong>DO</strong> like things to be orderly and efficient.</p>
<p><a href="http://seleniumhq.org/projects/bromine/">Bromine</a> is the answer for that. It&#8217;s still pretty rough around the edges, but it gets the job done.</p>
<p>Again check out their website and watch the screencast, but basically it goes like this:</p>
<ul>
<li>Write selenium tests as specified above</li>
<li>Upload tests to Bromine server</li>
<li>Attach tests to requirements</li>
<li>Run selenium remote control on all required OS/browser version combinations (can you say virtualbox?)</li>
<li>Run tests</li>
</ul>
<p>Another nice thing about using Bromine is that you are now running your tests in a server side language, not just the Selenium IDE, which is limited to the IDE&#8217;s generated &#8220;Selenese&#8221; XML commands for tests. The IDE exports your basic test into PHP or Java, and then on the Bromine server you can do interesting things, like check an IMAP box for an email, run a backend process, or send an SMS.</p>
<p>At first it may not seem like much, but eventually you end up with a multitude of useful tests for your web app that can be run all the time against development branches before release, and catch many problems. Quality means happier users, which hopefully means loyal users that keep coming back.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2009/11/bromine-and-selenium-tests-for-the-rests-of-u/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TokyoTyrant &#8211; MemcacheDB, but without the BDB?</title>
		<link>http://fewbar.com/2009/06/tokyotyrant-memcachedb-but-without-the-bdb/</link>
		<comments>http://fewbar.com/2009/06/tokyotyrant-memcachedb-but-without-the-bdb/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 06:40:26 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Memcache]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[drizzle]]></category>
		<category><![CDATA[memcachedb]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[tokyocabinet]]></category>
		<category><![CDATA[tokyotyrant]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=85</guid>
		<description><![CDATA[Anyway, the next thing I mentioned was that we had also tried MemcacheDB with some success. Brian wasn't exactly impressed with MemcacheDB, and immediately suggested that we should be using <a href="http://tokyocabinet.sourceforge.net/tyrantdoc/">Tokyo Tyrant</a> instead. I had heard of Tokyo Cabinet, the new hotness in local key/value storage and retrieval, but what is this Tyrant you speak of?]]></description>
			<content:encoded><![CDATA[<p>This past April I was riding in a late-model, 2-door rental car with an interesting trio for sure. On my right sat <a href="http://capttofu.livejournal.com/">Patrick Galbraith</a>, maintainer of DBD::mysql and author of the Federated storage engine. Directly in front of me, manning the steering wheel (for those of you keen on spatial description, you may have noted at this point that it&#8217;s most likely I was seated in the back, left seat of a car which is designed to be driven on the right side of the road. EOUF [end of useless fact]), was David Axmark, co-founder of MySQL. Immediately to his right sat <a href="http://krow.net/">Brian Aker</a>, of (most recently) Drizzle fame.<br />
<span id="more-85"></span><br />
This was one of those conversations that I felt grossly unprepared for. It was the 2009 MySQL User&#8217;s conference, and  Patrick and I had been hacking on <a href="https://launchpad.net/dbd-drizzle">DBD::drizzle</a> for most of the day. We had it 98% of the way there and were in need of food, so we were joining the Drizzle dev team for gourmet pizza.</p>
<p>As we navigated from the Santa Clara conference center to Mountain View&#8217;s quaint downtown, Brian, Patrick, and I were discussing memcached stuff. I mentioned <a href="http://fewbar.com/2008/12/memcached-and-mogile-form-memcachemegazord/">my idea, and subsequent implementation of the Mogile+Memcached method for storing data more reliably</a> in memcached. I knew in my head why we had chosen to read from all of the replica servers, not just the first one that worked, but in the moment I forgot (the reason, btw, is that if one of the servers had missed a write for some reason, you might get out-of-date data). I guess I was a little overwhelmed by Brian&#8217;s mountain of experience w/ memcached.</p>
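<p>Here is a minimal sketch of why reading from every replica helps, in Python. It assumes each write is stamped with a version number so a reader can tell which copy is newest; that stamping scheme, and the names <code>write_all</code> and <code>read_newest</code>, are assumptions for illustration and not necessarily how our implementation settles disagreements. Plain dicts stand in for memcached servers, and <code>None</code> stands in for a server that is unreachable.</p>
<pre>
import itertools

_clock = itertools.count(1)  # hypothetical version source; a timestamp would also work


def write_all(replicas, key, value):
    """Write the value, stamped with a version, to every replica that is up."""
    stamped = (next(_clock), value)
    for replica in replicas:
        if replica is not None:
            replica[key] = stamped


def read_newest(replicas, key):
    """Ask every reachable replica and keep the highest-versioned answer."""
    answers = [r[key] for r in replicas if r is not None and key in r]
    if not answers:
        return None
    return max(answers, key=lambda stamped: stamped[0])[1]


a, b = {}, {}
write_all([a, b], "session", "v1")
write_all([a, None], "session", "v2")  # replica b misses this write
first_that_worked = b["session"][1]    # what a first-hit read could return
newest = read_newest([b, a], "session")
print(first_that_worked, newest)
</pre>
<p>Stopping at the first replica that answers would hand back the stale copy whenever that replica happened to miss a write, which is exactly the out-of-date case described above.</p>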
<p>Anyway, the next thing I mentioned was that we had also tried MemcacheDB with some success. Brian wasn&#8217;t exactly impressed with MemcacheDB, and immediately suggested that we should be using <a href="http://tokyocabinet.sourceforge.net/tyrantdoc/">Tokyo Tyrant</a> instead. I had heard of Tokyo Cabinet, the new hotness in local key/value storage and retrieval, but what is this Tyrant you speak of?</p>
<p>I&#8217;ve been playing with Tokyo Tyrant ever since, and advocating for its usage at Adicio. It&#8217;s pretty impressive. In addition to speaking memcached protocol, it apparently speaks HTTP/WEBDAV too. The ability to select hash, btree, and a host of other options is nice, though I&#8217;m sure some of these are available as obscure options to berkeleydb as well.</p>
<p>Anyway, I was curious what performance was like, so I did some tests on my little Xen instance, and came up with pretty graphs.</p>
<p><a href="http://fewbar.com/wp-content/uploads/2009/06/tokyotyrantvsmemcachedb1.gif"><img src="http://fewbar.com/wp-content/uploads/2009/06/tokyotyrantvsmemcachedb1.gif" alt="tokyotyrantvsmemcachedb1" title="tokyotyrantvsmemcachedb1" width="465" height="472" class="alignnone size-full wp-image-92" /></a></p>
<p>I used the excellent <a href="http://code.google.com/p/brutis/">Brutis</a> tool to run these benchmarks using the most interesting platform for me at the moment.. which would be, php with the pecl Memcache  module.</p>
<p>These numbers were specifically focused on usage that is typical of MemcacheDB: a wide range of keys (in this case, 10000 is &#8220;wide&#8221; since the testing system is very small), not-small items (2k or so), and a low write:read ratio (1:50). I had the tests restart each daemon after each run, and these numbers are the average of 3 runs of each test.</p>
<p>I also tried these from another xen instance on the same LAN, and things got a lot slower. Not really sure why as latency is in the sub-millisecond range.. but maybe Xen&#8217;s networking just isn&#8217;t very fast. Either way, the numbers for each combination didn&#8217;t change much.</p>
<p>What I find interesting is that memcachedb in no-sync mode actually went faster than memcached. Of course, in nosync mode, memcachedb is just throwing data at the disk. It doesn&#8217;t have to maintain LRU or slabs or anything.</p>
<p>Tokyo Tyrant was very consistent, and used *very* little RAM in all instances. I do recall reading that it compresses data. Maybe that&#8217;s a default? Anyway, tokyo tyrant also was the most CPU hungry of the bunch, so I have to assume having more cores might have resulted in much better results.</p>
<p>I&#8217;d like to get together a set of 3 or 4 machines to test multiple client threads, and replication as well. Will post that as part 2 when I pull it together. For now, it looks like.</p>
<p>In case anybody wants to repeat these tests, I&#8217;ve included <a href="http://spamaps.org/files/tt-mdb-memcache-tests.tgz">the results, and the scripts used to generate them in this tarball</a>.</p>
<p>&#8211; Additional info, 6/4/2009<br />
Another graph that some might find interesting, is this one detailing CPU usage. During all the tests, brutis used about 60% of the CPU available on the machine, so 40% is really 100%:</p>
<p><a href="http://fewbar.com/wp-content/uploads/2009/06/tokyotyranttests_cpu.gif"><img src="http://fewbar.com/wp-content/uploads/2009/06/tokyotyranttests_cpu.gif" alt="tokyotyranttests_cpu" title="tokyotyranttests_cpu" width="428" height="385" class="alignnone size-full wp-image-98" /></a></p>
<p>This tells me that the CPU was the limiting factor for Tokyo Tyrant, and with a multi-core machine, we should see huge speed improvements. Stay tuned for those tests!</p>
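<p>To spell out the &#8220;40% is really 100%&#8221; arithmetic, here is a tiny Python sketch. The function name <code>normalized_cpu</code> is made up for illustration; the only number taken from the tests is the roughly 60% of the machine that brutis itself consumed.</p>
<pre>
def normalized_cpu(observed_pct, client_pct=60.0):
    """Rescale a server's CPU share to the CPU left over after the benchmark client."""
    available = 100.0 - client_pct
    return 100.0 * observed_pct / available


full = normalized_cpu(40.0)
half = normalized_cpu(20.0)
print(full, half)
</pre>
<p>So a daemon shown at 40% on the graph was using all of the CPU it could get, and one shown at 20% was using about half of it.</p>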
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2009/06/tokyotyrant-memcachedb-but-without-the-bdb/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Parallel mysql replication?</title>
		<link>http://fewbar.com/2009/06/parallel-mysql-replication/</link>
		<comments>http://fewbar.com/2009/06/parallel-mysql-replication/#comments</comments>
		<pubDate>Tue, 02 Jun 2009 19:08:48 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[drizzle]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=80</guid>
		<description><![CDATA[It&#8217;s always been a dream of mine. I&#8217;ve posted about parallel replication on Drizzle&#8217;s mailing list before. I think when faced with the problem of a big, highly concurrent master, and scaling out reads simply with lower cost slaves, this is going to be the only way to go.
So today I was really glad to [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s always been a dream of mine. I&#8217;ve <a href="https://lists.launchpad.net/drizzle-discuss/msg03988.html">posted about parallel replication</a> on Drizzle&#8217;s mailing list before. I think when faced with the problem of a big, highly concurrent master, and scaling out reads simply with lower cost slaves, this is going to be the only way to go.</p>
<p>So today I was really glad to see that somebody is trying out the idea. Seppo Jaakola from <a href="http://www.codership.com/">&#8220;Codership&#8221;</a>, who I&#8217;ve never heard of before today, <a href="https://lists.launchpad.net/drizzle-discuss/msg04214.html">posted a link</a> to an article on his blog about his <a href="http://www.codership.com/content/parallel-applying">experimentation with parallel replication slaves</a>. The findings are pretty interesting.<br />
<span id="more-80"></span><br />
I hope that he&#8217;ll be able to repeat his tests with a real world setup. The software they&#8217;ve written seems to have the right idea. The biggest issue I have with the tests is that they were run on tiny hardware. Hyperthreading? Single disks? That&#8217;s not really the point of having parallel replication slaves.</p>
<p>The idea is that you have maybe a gigantic real time write server for OLTP. This beast may have lots of medium-power CPU cores, and an obscene amount of RAM, and a lot of battery backed write cache for writes.</p>
<p>Now you know that there are tons of reads that shouldn&#8217;t ever be done against this server. You drop a few replication slaves in, and you realize that you need a box with as much disk storage as your central server, and probably just as much write cache. Pretty soon scaling out those reads is just not very cost effective.</p>
<p>However, if you could have lots of CPU cores, and lots of cheap disks, you could dispatch these writes to be done in parallel, and you wouldn&#8217;t need expensive disk systems or lots of RAM for each slave.</p>
<p>So, the idea is not to make slaves faster in a 1:1 size comparison. It&#8217;s to make it easier for a cheap slave to keep up with a very busy, very expensive master.</p>
<p>I do see another huge limiting factor: making sure things synchronize in commit order. I think that&#8217;s an area where a lot of time needs to be spent on optimization. The order should already be known, so that the committer thread is just waiting for the next one in line, and if the next 100 are already done it can just rip through them quickly instead of signaling each of them that they can go. Something like this seems right:</p>
<p><code><br />
id=first_commit_id();<br />
while(wait_for_commit(id)) {<br />
  commit(id);<br />
  id++;<br />
}<br />
</code></p>
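<p>Here is a slightly fuller sketch of the same idea in Python. It is only an illustration under my own assumptions: InOrderCommitter, mark_applied and the commit callback are made-up names, not anything from Codership&#8217;s software. Instead of a dedicated committer thread, whichever worker finishes the next id in line drains everything that is already done.</p>
<pre>
```python
import threading


class InOrderCommitter:
    """Commits transactions strictly in id order, however they finish."""

    def __init__(self, first_commit_id, commit):
        self._next_id = first_commit_id   # next id allowed to commit
        self._applied = set()             # ids applied but not yet committed
        self._commit = commit             # hypothetical commit callback
        self._lock = threading.Lock()

    def mark_applied(self, txn_id):
        """Called by a worker when it has finished applying txn_id."""
        with self._lock:
            self._applied.add(txn_id)
            # Rip through every consecutive id that is already done.
            while self._next_id in self._applied:
                self._applied.remove(self._next_id)
                self._commit(self._next_id)
                self._next_id += 1


committed = []
committer = InOrderCommitter(first_commit_id=1, commit=committed.append)
for finished in (3, 2, 1, 5, 4):   # workers finish out of order
    committer.mark_applied(finished)
print(committed)
```
</pre>
<p>Nothing gets signaled one at a time here: when transaction 1 finally lands, 2 and 3 are committed in the same pass.</p>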
<p>I applaud the efforts of Codership, and I hope they&#8217;ll continue the project and maybe ship something that will rock all our worlds.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2009/06/parallel-mysql-replication/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Memcached and Mogile Form MemcacheMegaZord!</title>
		<link>http://fewbar.com/2008/12/memcached-and-mogile-form-memcachemegazord/</link>
		<comments>http://fewbar.com/2008/12/memcached-and-mogile-form-memcachemegazord/#comments</comments>
		<pubDate>Sun, 14 Dec 2008 17:21:50 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[memcachedb]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[sessions]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=27</guid>
		<description><![CDATA[So I was starting to play with Memcached for session storage, and I found a fairly big problem with just using memcached in its normal caching mode as a session store. It really just boils down to caching and storing of deterministic data being very different things that only look similar on the surface.

So normally, memcached [...]]]></description>
			<content:encoded><![CDATA[<p>So I was starting to play with Memcached for session storage, and I found a fairly big problem with just using memcached in its normal caching mode as a session store. It really just boils down to caching and storing of deterministic data being very different things that only look similar on the surface.<br />
<span id="more-27"></span><br />
So normally, memcached is used in a very clever way by adding a list of servers, and then using a hashing algorithm to pick a server to actually contact based on the key of a get/set request. This allows a ton of scaling out, with minimal moving parts. There&#8217;s no periodic monitor or broadcast protocol to add and remove cluster members to and from pools, so you can just run memcached on a bunch of servers, and use a consistent list across all of your machines to achieve a huge degree of scale out. When a server dies, the code just sees that, and moves on to the next one in the hash algorithm, and all is well.</p>
<p>For caching, this &#8220;failover&#8221; methodology works fine. If I go to set a value in memcached, and the request fails over to the second server, that&#8217;s OK. The next get to the primary will miss, and the value will get set properly, and the old entry on the secondary will *eventually* get pushed out of the cache.</p>
<p>However, for storing data reliably, this becomes a problem. Let&#8217;s say there is a scenario where a network cable is bad on one of the memcached servers. 1 in 100 requests fails. With caching, failover will go a little nuts, but it&#8217;s entirely possible nobody will even notice, as results will be cached, data won&#8217;t get stale.. no big deal.</p>
<p>With storage though, this could happen..</p>
<p>- session is created on memcache1</p>
<p>- session tries to read from memcache1, and fails.. so new session is created on memcache2</p>
<p>- session is then read from memcache1</p>
<p>- session is updated on memcache1 with new information</p>
<p>- session fails to read from memcache1, and old session data is read from memcache2, then the set succeeds on memcache1, overwriting the newer information there.</p>
<p>The point isn&#8217;t really this scenario&#8217;s details, but that this hashing algorithm is vulnerable, even designed to lose data that was written to it. That is the caching paradigm.</p>
<p>As I discussed this with some colleagues, my mind immediately jumped to <a href="http://www.memcachedb.org">MemcacheDB</a>. Maybe that would work for session storage. It has replication, so we could use the traditional active/passive paradigm for it. However, this limits our scale to whatever a single instance of MemcacheDB can handle. Honestly that&#8217;s probably fine for most sites, as MemcacheDB can probably handle tens of thousands of small writes per second.</p>
<p>However, there are multiple problems. The biggest problem with MemcacheDB is there&#8217;s no easy way (yet, they&#8217;re working on it) to pull keys out of it to do garbage collection. Likewise, session data really doesn&#8217;t need to live for a long time. We just need to be reasonably certain that the data we&#8217;re getting is reasonably new.</p>
<p>If we store the data in *all* of the servers, and if we store a highly accurate (meaning if it takes you milliseconds to complete a request, this timestamp needs to be down to microseconds) timestamp of when the data was given to us (meaning we use the same timestamp for each server) alongside it, we can then just read it from all of the servers, and pick the newest one. Ew, that means we are still limited to the scale of one instance of memcached.</p>
<p>Then I had a flashback to the way <a href="http://danga.com/mogilefs/">MogileFS</a> works. It stores data on a number of replica servers. Of course, it also keeps track of where it stored them. But I figured, for sessions, that&#8217;s a lot of overhead. There&#8217;s an easier way. We can use the <a href="http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/">consistent hashing algorithm</a> that the PHP Memcache module uses to pick servers, and just read and write the data from nReplicas servers. If a server fails, we&#8217;ll move on to the next one, and there&#8217;s a reasonable degree of certainty that it will remain the same. If we write stale data to a server and then fail back to it later, we&#8217;re protected by the timestamp rules. The higher nReplicas, the higher the reliability that a server failure won&#8217;t cause issues. I even found <a href="http://paul.annesley.cc/articles/2008/04/30/flexihash-consistent-hashing-php">a PHP implementation of consistent hashing called FlexiHash</a>.</p>
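<p>To make the nReplicas idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the servers are plain dictionaries standing in for memcached instances, and pick_servers ranks servers by a hash of server name plus key (rendezvous-style), which is a simple stand-in for the consistent hashing ring the PHP Memcache module uses, not the same algorithm.</p>
<pre>
```python
import hashlib
import time

N_REPLICAS = 2
# Hypothetical stand-ins for memcached servers: name -> dict of key -> value.
servers = {"memcache1": {}, "memcache2": {}, "memcache3": {}}


def pick_servers(key, n=N_REPLICAS):
    """Rank servers by a hash of (server, key) and take the first n."""
    def score(name):
        return hashlib.md5((name + ":" + key).encode()).hexdigest()
    return sorted(servers, key=score)[:n]


def write_session(key, data, stamp=None):
    """Store (timestamp, data) on every replica with one shared timestamp."""
    stamp = time.time() if stamp is None else stamp
    for name in pick_servers(key):
        servers[name][key] = (stamp, data)


def read_session(key):
    """Read from every replica and return the newest copy, or None."""
    copies = [servers[name][key] for name in pick_servers(key)
              if key in servers[name]]
    if not copies:
        return None
    return max(copies, key=lambda copy: copy[0])[1]


write_session("sess42", {"cart": 1}, stamp=100.0)
# Simulate a stale copy left behind on one replica after a failover.
stale = pick_servers("sess42")[0]
servers[stale]["sess42"] = (50.0, {"cart": 0})
print(read_session("sess42"))
```
</pre>
<p>The stale copy loses because its timestamp is older, which is the whole point of storing the same timestamp on every replica at write time.</p>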
<p>There&#8217;s one last issue that bugs me about using memcached for sessioning, and that the timestamp helps us solve. We recently found that there was a problem where a request would take, say, 45 seconds to complete. At 20 seconds, the user would hit the back button out of frustration. This would put other stuff in the session, then the 45 second request would complete, and write the version of the session it thinks is right to the session store, losing the user&#8217;s new activity.</p>
<p>There are two ways to solve this. One is to introduce locking. This actually isn&#8217;t hard to do with Memcached, it is <a href="http://www.socialtext.net/memcached/index.cgi?faq#emulating_locking_with_the_add_command">described in the memcached faq</a>. However, this introduces something to block or fail on in the read. I think it&#8217;s simpler than that. You simply read the record before you write it, and if it has changed since you read it the first time, you don&#8217;t write it. You just throw the session write away. Obviously the user has moved on, so there&#8217;s no reason to make your update. If you used locking, the user would still be waiting on the old thread to finish.</p>
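<p>The read-before-write rule looks roughly like this in Python. This is a sketch under my own assumptions: store is a plain dictionary standing in for the session store, and a version counter stands in for the timestamp. The names load and save_if_unchanged are made up for the example.</p>
<pre>
```python
store = {}   # hypothetical session store: session id -> (version, data)


def load(session_id):
    """Return (version, data); a missing session is version 0."""
    return store.get(session_id, (0, {}))


def save_if_unchanged(session_id, seen_version, data):
    """Write only if nobody else wrote since we read; else drop the write."""
    current_version, _ = load(session_id)
    if current_version != seen_version:
        return False               # the user has moved on; throw this away
    store[session_id] = (seen_version + 1, data)
    return True


# The slow 45-second request and the back-button request both read first.
slow_version, slow_data = load("abc")
fast_version, fast_data = load("abc")
fast_ok = save_if_unchanged("abc", fast_version, {"page": "home"})
slow_ok = save_if_unchanged("abc", slow_version, {"page": "report"})
print(fast_ok, slow_ok, store["abc"])
```
</pre>
<p>There is still a small window between the re-read and the write where another request could sneak in. The sketch ignores that; closing it completely would take an atomic compare-and-set or the locking described above.</p>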
<p>Of course, this all hinges on you caring that your session data is accurate, and that users don&#8217;t lose their sessions when one server goes down. If neither of those applies to you, then you can just use sessions like cache.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2008/12/memcached-and-mogile-form-memcachemegazord/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Deciding whether to send reads to slave or master</title>
		<link>http://fewbar.com/2008/10/maximizing-usage-of-mysql-replication-slaves/</link>
		<comments>http://fewbar.com/2008/10/maximizing-usage-of-mysql-replication-slaves/#comments</comments>
		<pubDate>Sat, 04 Oct 2008 17:43:42 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=18</guid>
		<description><![CDATA[There are quite a few articles out there that talk about how to give your application some context and send reads to one server, and writes to another. There are even some mentions of marking your connection &#8220;dirty&#8221; and then sending all reads to the write server.
As a first try at scaling things, I recently [...]]]></description>
			<content:encoded><![CDATA[<p>There are quite a few articles out there that talk about how to give your application some context and send reads to one server, and writes to another. There are even some mentions of marking your connection &#8220;dirty&#8221; and then sending all reads to the write server.</p>
<p>As a first try at scaling things, I recently made a change to our web application&#8217;s data access layer where reads went to a group of readonly slaves. However, if a write was made to a database, a value was put into the user&#8217;s session, saying that the database was dirty, and causing all subsequent reads to go to the master server.<br />
<span id="more-18"></span><br />
This was good as users would use the readonly slaves as long as they hadn&#8217;t changed anything in the database. The real problem though, was that as soon as the user logged in, their account was updated to say that they had logged in, marking that database dirty.</p>
<p>Rather than try to cleverly work around this one problem, we changed the &#8220;dirty&#8221; value from a boolean to a timestamp. Whenever the user writes to the database, it records the current time in their session. Then a global timeout is applied to that. This gives the replication slaves time to catch up and get the record that was just changed, so the user will have a consistent view of their data.</p>
<p>This is great, but I think a further step is to have something publish the actual maximum lag of the slaves into a memcache key, and simply double that value as the timeout. This would allow maximum usage of the readonly slaves and keep the master server busy doing mostly writes.</p>
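<p>In Python, the routing decision might look something like the sketch below. The names are made up for illustration (choose_server, read_timeout, the last_write session key), and the maximum slave lag is passed in as a plain number, where in practice it would be read from the memcache key.</p>
<pre>
```python
import time


def read_timeout(max_slave_lag_seconds):
    """Double the published maximum slave lag, as suggested above."""
    return 2 * max_slave_lag_seconds


def choose_server(session, max_slave_lag_seconds, now=None):
    """Return 'master' while the user's last write may not have replicated."""
    now = time.time() if now is None else now
    last_write = session.get("last_write")      # hypothetical session key
    if last_write is None:
        return "slave"                          # user never wrote anything
    if now - last_write >= read_timeout(max_slave_lag_seconds):
        return "slave"                          # slaves have had time to catch up
    return "master"


session = {"last_write": 1000.0}
print(choose_server(session, 1.5, now=1001.0))
print(choose_server(session, 1.5, now=1003.0))
```
</pre>
<p>With a published lag of 1.5 seconds, the user reads from the master for 3 seconds after a write and then goes back to the readonly slaves.</p>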
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2008/10/maximizing-usage-of-mysql-replication-slaves/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can more queries equal a healthier MySQL server?</title>
		<link>http://fewbar.com/2008/08/innodb-concurrency-problems-on-multi-core-boxes-possibly-a-thing-of-the-past/</link>
		<comments>http://fewbar.com/2008/08/innodb-concurrency-problems-on-multi-core-boxes-possibly-a-thing-of-the-past/#comments</comments>
		<pubDate>Sat, 30 Aug 2008 06:21:34 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[innodb]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=14</guid>
		<description><![CDATA[This week was an ugly one for my monster database servers. It should have been triumphant, but oddly enough, I think it shows how prone to mistuning InnoDB on MySQL 5.0 is with multiple cores.
This server is a multi-core, high concurrency server. The application has been designed a little bit naively in that it just [...]]]></description>
			<content:encoded><![CDATA[<p>This week was an ugly one for my monster database servers. It should have been triumphant, but oddly enough, I think it shows how prone to mistuning InnoDB on MySQL 5.0 is with multiple cores.</p>
<p>This server is a multi-core, high concurrency server. The application has been designed a little bit naively in that it just throws almost all queries at the main db server. Several bits have been designed to scale by not doing that, but unfortunately, huge amounts of functionality were built around those apps to prevent them from scaling.</p>
<p>As a result, we&#8217;ve had to scale up the central database server and its redundant systems significantly. We started with the Proliant DL380 G4 with two Xeon 3.4GHz CPUs and 12GB of RAM, and plenty of disks in an external RAID. As more traffic was added, we moved up to the DL580 servers with 4 Xeon 3.4GHz CPUs and 64GB of RAM. This worked well, but still more traffic, and more data, was coming and the app wasn&#8217;t ready to change significantly. We finally landed on the latest DL580 server, with 1GB of total battery backed write cache, 14 SAS disks, 128GB of RAM, and two quad core Xeon CPUs.<br />
<span id="more-14"></span><br />
Some things got better. Writes were now incredibly fast. The server was churning out 1000 queries per second easily. Sometimes during peak times, query response time would suffer, but ultimately, the box was keeping up and performing well, <a href="http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/">especially after we turned off query caching</a>. After this week though, I wonder how much of the problem was query caching&#8230; more later.</p>
<p>Anyway, whenever the server would need to have maintenance, some high traffic applications would suffer needlessly for their need of rarely changing data (memcached was out of the question for the complexity and &#8220;realtime&#8221; nature of this data). So we set up a selective replication fanout onto multiple boxes and pointed these apps at that cluster for these queries.</p>
<p>Well the next day, without all of these tiny queries pounding on it, the database server had horrible problems. 400 threads stacked up inside InnoDB &#8220;Waiting for InnoDB queue&#8221;. System resources were fine, but it was clear, InnoDB was having trouble. Queries that normally take 0.75 seconds were taking 300+ seconds, or just never completing. I knew there was real trouble, when killing the thread would result in it just changing state to &#8220;Killed&#8221;, but never dying. Based on what I&#8217;d read in High Performance MySQL, and <a href="http://www.mysqlperformanceblog.com/2006/06/05/innodb-thread-concurrency/">articles like this one</a>, I tried twiddling with innodb_thread_concurrency, innodb_concurrency_tickets, and innodb_thread_sleep_delay. None of them seemed to help, though innodb_thread_concurrency set to a value of about half the CPU cores seemed to delay the problems.</p>
<p>I noticed that we were running MySQL v5.0.51a still. We had planned an upgrade to 5.0.67, which was just recently released, but hadn&#8217;t gotten there yet. I went ahead and upgraded one of the boxes to it, and failed over to it. Instantly things were more healthy, and the health seemed to stay for hours, without any more InnoDB freakouts.</p>
<p>After some research, it would seem that between 5.0.51a and 5.0.67, a lot of really big fixes were made to InnoDB to help it scale up on multi-core machines. The box has been healthy for a couple of days, though there&#8217;s still a lot of work to do removing query load from the server.</p>
<p>But why would a _reduction_ in queries cause concurrency problems? I have a theory, but no real ideas on how to test it.</p>
<p>Before, we were doing 1000 queries per second. Things were healthy. We removed about 400 queries per second from that. These 400 queries were basically instantaneous.. oftentimes returning no results at all and reading from tables and indexes completely stored in the innodb_buffer_pool. But, with query cache turned off, they were still being processed fully by InnoDB. When we removed these tiny queries from the queue imposed by innodb_thread_concurrency, I think we removed the equivalent of spin waits from the queue. These tiny, easy queries were just hard enough to process to prevent a lot of bigger queries from hitting the queue at the same time. That&#8217;s why reducing innodb_thread_concurrency to 4 helped a bit.. with only 4 threads vying for mutexes and CPU resources constantly, InnoDB was able to (sort of) keep up.</p>
<p>My final bit of evidence for this is that we actually, I think, had this problem before with the <a href="http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/">aforementioned article</a>. Turning off the query cache moved these tiny queries out of the query cache, and into the InnoDB queue, providing the needed pseudo-spin-waits to prevent it from locking in on itself.</p>
<p>I have to wonder if raising innodb_sync_spin_loops to something ridiculously high, like 50000, would have the same effect. Unfortunately, it&#8217;s very hard to test this without dedicating a lot of time to it.</p>
<p>So, in this case, it would seem that more work can, in fact, make the server healthier.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2008/08/innodb-concurrency-problems-on-multi-core-boxes-possibly-a-thing-of-the-past/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Query Cache defeats Serverzilla</title>
		<link>http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/</link>
		<comments>http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/#comments</comments>
		<pubDate>Tue, 15 Jul 2008 20:47:55 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=11</guid>
		<description><![CDATA[So a few days ago, my big mean MySQL server started having problems that were very hard to explain. It was slowing down, taking a minute to run queries that usually take a few seconds, and Linux load averages were in the teens, despite having quiet disks (less than 0.1% cpu IO wait time) and [...]]]></description>
			<content:encoded><![CDATA[<p>So a few days ago, my big mean MySQL server started having problems that were very hard to explain. It was slowing down, taking a minute to run queries that usually take a few seconds, and Linux load averages were in the teens, despite having quiet disks (less than 0.1% cpu IO wait time) and plenty of RAM (128G for about 200G of data total&#8230;).</p>
<p>The developers were stumped. The other systems guys were stumped. So was I. But it still seemed ok. We found all sorts of things to point fingers at, but nothing made sense.<br />
<span id="more-11"></span><br />
Then this Monday, everything came to a screeching halt. 3 second queries were taking 15 minutes. 30 second queries were never completing. The CPUs were only a little busy. What gives?! This box has 8 CPU cores and 128G of RAM.. nothing can take it down, right?!</p>
<p>We threw our hands in the air and failed over to the active standby (the other side of our master&lt;-&gt;master replication pair). Suddenly all was well. But something smelled wrong. We blamed some kind of bug in MySQL.</p>
<p>I spent all day trying to make Memcached more efficient, and trying to explain why suddenly this beast was felled by such tiny arrows as instantaneous queries that should have been cached anyway.</p>
<p>Oh wait, did somebody say cached? As in the MySQL query cache? I mentioned this in the #mysql channel on <a href="http://freenode.net">Freenode</a>, and Mr. Eric Bergen (ebergen) from <a href="http://www.provenscaling.com/">Proven Scaling</a> immediately said something like &#8220;well duh, turn off the cache, moron&#8221;. I was dumbfounded. Shouldn&#8217;t it be helping us with all those tiny queries?</p>
<p>Well apparently not. <a href="http://lists.mysql.com/internals/35777">This recent thread on the MySQL internals list</a> talks about mutex contention in the query cache while it is *searched*, not just while it is updated. This is disastrous for an environment where thousands and thousands of tiny queries are being run constantly. Even with query_cache_type set to 2, or &#8220;cache on demand&#8221; mode, every query in the system must run through this mutex.</p>
<p>So, this morning when the standby box again cried for mercy, hitting max_connections and spinning all queries around in circles, I ran &#8216;SET GLOBAL query_cache_type=2&#8217;. Instantly the server became more healthy. I half expected to trade one problem for another.. with the server being consumed by tiny queries. But instead, these tiny queries did as expected, and took very little time to complete. And large queries against tables that change every second or two didn&#8217;t have to contend for the query cache, they just ran through like nothing.</p>
<p>So, it would appear that for any sort of multi-core installations of MySQL, the query cache is not only a waste, but a hazard!</p>
<p>Thanks again to Mr. Bergen. I would not have thought about that until he said it.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2008/07/mysql-query-cache-scales-like-a-286-with-turbo-off/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using memcachedb and memcached to make things scale</title>
		<link>http://fewbar.com/2008/06/using-memcachedb-and-memcached-make-things-scale/</link>
		<comments>http://fewbar.com/2008/06/using-memcachedb-and-memcached-make-things-scale/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 05:40:01 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[memcachedb]]></category>
		<category><![CDATA[sclability]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=9</guid>
		<description><![CDATA[I don&#8217;t remember exactly how I found memcachedb, however, it is one of those projects that somebody else beat me to the punch in writing. I mean, it was going to happen, as the need was there. Steve Chu, the author, did a great job of melding two open source projects, BerkeleyDB, and memcached, to [...]]]></description>
			<content:encoded><![CDATA[<p>I don&#8217;t remember exactly how I found <a href="http://www.memcachedb.org">memcachedb</a>; however, it is one of those projects that somebody else beat me to the punch in writing. I mean, it was going to happen, as the need was there. Steve Chu, the author, did a great job of melding two open source projects, <a href="http://www.oracle.com/database/berkeley-db/index.html">BerkeleyDB</a>, and <a href="http://www.danga.com/memcached/">memcached</a>, to produce something really very powerful.<br />
<span id="more-9"></span><br />
Now, memcached has become almost completely ubiquitous in scaling web apps. Memcached is essentially a network enabled non-persistent data store. It is generally used as a <a href="http://en.wikipedia.org/wiki/Cache">look-aside data cache</a> (which I&#8217;ll loosely call write-back caching), meaning that you look in the faster cache, and if nothing is there, you look in the slower place, then write the value back to the faster cache. Some industrious people have used it for session storage, and I&#8217;m sure there are a few other clever uses.</p>
<p>One of my favorite parts of memcached is how dead simple it is. The protocol is very easy to read, making debugging issues and writing new clients very easy. It uses the &#8220;least recently used&#8221; algorithm to move things out of the cache when it starts to fill up, so it&#8217;s extremely easy to understand how the whole thing works.</p>
<p>The cleverest part of using memcached has nothing to do with the service itself, but the API. The <a href="http://www.danga.com/">smart guys</a> who developed it figured out that they could hash the key, and pick the same server for reads/writes every time as long as the number of servers doesn&#8217;t change. This allows it to scale out to a ridiculous size and retain its simplicity and performance.</p>
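<p>The trick is small enough to show in a few lines of Python. This is just a sketch of the general idea, not the actual client code: the host names are invented, and CRC32 with a modulo is my assumption for the hash.</p>
<pre>
```python
import zlib

servers = ["cache1:11211", "cache2:11211", "cache3:11211"]  # hypothetical hosts


def server_for(key, pool=servers):
    """Hash the key and map it onto the pool, so every client agrees."""
    return pool[zlib.crc32(key.encode()) % len(pool)]


# Any client with the same list picks the same server for the same key.
print(server_for("user:42"))
```
</pre>
<p>The catch is right there in the modulo: change the length of the list and most keys land on a different server, which is why the number of servers has to stay put.</p>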
<p>Two problems arise when a site uses any caching, be it memcached or aggressive HTTP headers.</p>
<p>First, the site starts to rely on caching too heavily for performance. As an example, I had a situation where the entire corpus of settings for each client site (hundreds of clients, hundreds of settings) was kept in memcached as one massive 200kB+ serialized PHP object. Every page view that needed to access any settings would grab this object at the beginning of the code, and use the object throughout.</p>
<p>This worked really great in some instances, as most of the biggest pages needed to access 30 &#8211; 50 settings each time. However, the trouble would come when there was a page that would get a high degree of concurrency, such as an iframe that gets displayed on every page of a major website, or on a page that gets slashdotted. It would be blazing fast, generating almost no load at all for a while, but whenever a setting would be changed (the settings application would clear the cache of settings for whichever client was edited), or the cache object would expire, the database would spike out of control.</p>
<p>The reason was this object took about 1-3 seconds to fetch from the database. Well with 1000 requests per second, that&#8217;s up to 3000 requests that miss the cache while the fetch is running, and so, ask the database for the information. The solution was to cache each setting individually, and use a random skew on the expire time. This prevented the storm of requests whenever there was an expire, and it allowed items looked up in rapid succession to not expire all at once.</p>
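<p>A rough Python sketch of the per-setting cache with a random skew follows. All of it is illustrative: the dictionary stands in for memcached, fetch is a hypothetical database lookup, and the 300 second lifetime and 60 second skew are numbers I picked for the example.</p>
<pre>
```python
import random
import time

cache = {}   # hypothetical stand-in for memcached: key -> (expires_at, value)
BASE_TTL = 300          # seconds; an assumed value
SKEW = 60               # up to this many extra seconds, chosen at random


def skewed_ttl(rng=random):
    """Base lifetime plus a random skew so entries do not expire together."""
    return BASE_TTL + rng.uniform(0, SKEW)


def get_setting(client, name, fetch, now=None):
    """Look up one setting, going to the database only on a miss."""
    now = time.time() if now is None else now
    key = "setting:%s:%s" % (client, name)
    entry = cache.get(key)
    if entry is not None and entry[0] > now:
        return entry[1]
    value = fetch(client, name)          # hypothetical database lookup
    cache[key] = (now + skewed_ttl(), value)
    return value


calls = []


def fetch(client, name):
    calls.append(name)
    return "blue"


get_setting("acme", "color", fetch, now=0.0)     # miss, goes to the database
get_setting("acme", "color", fetch, now=10.0)    # hit
get_setting("acme", "color", fetch, now=400.0)   # expired, fetched again
print(len(calls))
```
</pre>
<p>Because each key gets its own slightly different lifetime, settings that were loaded in the same instant drift apart and stop expiring as a herd.</p>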
<p>This brings us to the second problem with caching, and specifically memcached. The cache is sometimes mistaken for a data store. In the above example, by clearing out entries from memcached, the caching was essentially neutered. Any time during the day somebody might come along and blow out the cache. That&#8217;s fine with MySQL&#8217;s query cache, for instance, because that just makes queries come back faster. The connection is already made, one of the most painful parts has already happened. With memcached however, the cache can scale to many thousands of connections very cheaply, whereas doing this with most databases is expensive, if not impossible.</p>
<p>So to combat this, what is really needed is a persistent place to keep your data up to date when it is needed at an extremely high read-to-write ratio. That&#8217;s where memcachedb is so attractive. Instead of keeping everything in RAM, memcachedb stores anything you put into it in a BerkeleyDB database. To boot, it can replicate this data to another machine, adding to its reliability and availability. This means that writes will be slower, and it won&#8217;t scale out nearly as cheaply, but that&#8217;s OK for situations like this.</p>
<p>With memcachedb, we can change the setting management program to save the data into the database <strong>and </strong>memcachedb, confident in the fact that it will be there later. Then we don&#8217;t have write-back caching code in our application, we just remove the part that connects to the database for that data at all.</p>
<p>This has a huge benefit beyond just performance. With this scheme, we can write simple applications that won&#8217;t rely on the read/write database server ever being up. It also means that we don&#8217;t have to have a giant database server, or a huge replication fanout to get this data available in realtime.</p>
<p>There is of course the danger that memcachedb gets out of sync with the main db. That&#8217;s why in addition to writing to memcachedb whenever you write to the database server, you can also run a refresh script periodically that grabs all of the data from the database and walks through, writing items to memcachedb. Care must be taken here to make sure one doesn&#8217;t write stale data to memcachedb. The safest way is to include a timestamp with each record that can easily be compared. Another way to go is to just have this script alert you to items that are out of sync, requiring manually re-saving these records.</p>
<p>Memcachedb is, unfortunately, still a little raw. The replication setup is rather complex. It took me a little while to get it working the way I wanted with just two boxes. It definitely could use command line options to set replication options, so that slaves don&#8217;t accidentally promote themselves to masters. Right now one can only do that through the protocol, so I have a nagios plugin that checks it and changes it if it is wrong.</p>
<p>I think it&#8217;s important to note just how cool it is that 90% of memcachedb was written before it was conceived of. <a href="http://www.oracle.com/database/berkeley-db/index.html">BerkeleyDB</a> is one of the great open source success stories, having a successful business model built on free code, and eventually attracting enough attention from Oracle to get purchased. Then to merge that with memcached, which is one of those projects that makes you wish you had written it first, well, I think that&#8217;s a stroke of genius. Good job Mr. Chu.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2008/06/using-memcachedb-and-memcached-make-things-scale/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
