<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>FewBar.com - Make it good &#187; PHP</title>
	<atom:link href="http://fewbar.com/category/tech/php/feed/" rel="self" type="application/rss+xml" />
	<link>http://fewbar.com</link>
	<description>Technology, life, and mischief, not in that order</description>
	<lastBuildDate>Thu, 08 Jul 2010 14:36:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Gearman K.O.&#8217;s mysql to solr replication</title>
		<link>http://fewbar.com/2010/03/gearman-replicate-mysql-to-solr/</link>
		<comments>http://fewbar.com/2010/03/gearman-replicate-mysql-to-solr/#comments</comments>
		<pubDate>Wed, 24 Mar 2010 05:47:36 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[gearman]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=154</guid>
		<description><![CDATA[Ding ding ding.. in this corner, wearing black shorts and a giant schema, we have over 11 million records in MySQL with a complex set of rules governing which must be searchable and which must not be. And in that corner, we have the contender, a kid from the back streets, outweighed and out reached [...]]]></description>
			<content:encoded><![CDATA[<p>Ding ding ding.. in this corner, wearing black shorts and a giant schema, we have over 11 million records in MySQL with a complex set of rules governing which must be searchable and which must not be. And in that corner, we have the contender, a kid from the back streets, outweighed and out reached by all his opponents, but still victorious in the queue shootout, with just open source, and 12 patch releases.. written in C, its <b><a href="http://gearman.org">gearman</a></b>!</p>
<p><a href="http://fewbar.com.s3.amazonaws.com/wp-content/uploads/2010/03/ko-mike-tyson.png"><img src="http://fewbar.com.s3.amazonaws.com/wp-content/uploads/2010/03/ko-mike-tyson.png" alt="" title="ko-mike-tyson" width="500" height="437" class="alignnone size-full wp-image-155" /></a><br />
<span id="more-154"></span></p>
<p>I&#8217;m pretty excited today, as I&#8217;m preparing to go live with the first real, high load application of Gearman that I&#8217;ve written. What is it you say? Well it is a simple trigger based replicator from mysql to <a href="http://lucene.apache.org/solr/">SOLR</a>.</p>
<p>I should say (because I know some of my colleagues read this blog) that I don&#8217;t actually believe in this design. Replication using triggers seems fraught with danger. It totally makes sense if you have a giant application and can&#8217;t track down everywhere that a table is changed. However, if your app is simple and properly abstracted, hopefully you know the 1 or 2 places that write to the table.</p>
<p>I should also say that I really can&#8217;t reveal all of the details. The general idea is pretty simple. Basically we have a trigger that dumps a primary key into gearman via the <a href="https://launchpad.net/gearman-mysql-udf">gearman MySQL UDFs</a>. The idea is just to tell a gearman worker &#8220;look at this record in that table&#8221;.</p>
<p>Once the worker picks it up, it applies some logic to the record.. &#8220;should this be searchable or not&#8221;. If the answer is yes it should be searchable, the worker pushes the record into SOLR. If not, the worker will make sure it is not in solr.</p>
<p>This at least is pretty simple. The end result is a system where we can rebuild the search index in parallel using multiple CPU&#8217;s (thank you to solr/lucene for being able to update indexes concurrently and efficiently btw). This is done by pushing all of the records in the table into the queue at once.</p>
<p>Anyway, gearmand is performing like a champ, libgearman and the gearman pecl module are doing great. I&#8217;m just really happy to see gearman rolled out in production, as I really do think it has that nice mix of simplicity and performance. I love the commandline client which makes it easy to write scripts to inject things into queues, or query workers.  This allows me to access a worker like this:</p>
<p><code>$ gearman -h gearmanbox -f all_workers -s<br />
Known Workers: 11</p>
<p>boxname_RealTimeUpdate_Queue_TriggerWorker_1 jobs=627366,restarts=0,memory_MB=4.27,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700<br />
boxname_RealTimeUpdate_Queue_Subject_13311 jobs=304134,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700<br />
boxname_RealTimeUpdate_Queue_Subject_13306 jobs=606126,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700<br />
boxname_RealTimeUpdate_Queue_Subject_13314 jobs=576714,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700<br />
boxname_RealTimeUpdate_Queue_Subject_13342 jobs=294846,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700<br />
boxname_RealTimeUpdate_Queue_Subject_13347 jobs=376998,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700<br />
boxname_RealTimeUpdate_Queue_Subject_13359 jobs=470508,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700<br />
boxname_RealTimeUpdate_Queue_Subject_13364 jobs=403182,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700<br />
boxname_RealTimeUpdate_Property_SolrPublish_ jobs=219630,restarts=0,memory_MB=6.19,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700<br />
boxname_RealTimeUpdate_Queue_TriggerWorker_2 jobs=393642,restarts=0,memory_MB=4.27,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700<br />
boxname_RealTimeUpdate_Property_SolrBatchPub jobs=6,restarts=0,memory_MB=6.23,lastcheckin=Tue, 23 Mar 2010 22:37:28 -0700</code></p>
<p>Brilliant.. no need for html or HTTP.. just a nice simple commandline interface.</p>
<p>I think gearman still has a ways to go. I&#8217;d really like to see some more administration added to it. Deleting empty queues and quickly flushing all queues without restarting gearmand would be nice to haves. We&#8217;ll see what happens going forward, but for not, thanks so much to the gearman team (especially Eric Day who showed me gearman, and Brian Aker for pushing hard to release v0.12).</p>
<p>w00t!</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2010/03/gearman-replicate-mysql-to-solr/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bromine and Selenium &#8211; second and third most useful elements behind Oxygen</title>
		<link>http://fewbar.com/2009/11/bromine-and-selenium-tests-for-the-rests-of-u/</link>
		<comments>http://fewbar.com/2009/11/bromine-and-selenium-tests-for-the-rests-of-u/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 01:48:10 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Engineers]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[selenium]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=125</guid>
		<description><![CDATA[If you&#8217;re an engineer, you hate testing. Seriously, who likes doing what those mere mortal &#8220;users&#8221; do? We&#8217;re POWER users and we don&#8217;t need to use all those silly features on all those sites. Just look at Craigslist, clearly an engineer&#8217;s dream tool.
For web apps, testing actually isn&#8217;t *that* hard. The client program (the browser) [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re an engineer, you hate testing. Seriously, who likes doing what those mere mortal &#8220;users&#8221; do? We&#8217;re POWER users and we don&#8217;t need to use all those silly features on all those sites. Just look at Craigslist, clearly an engineer&#8217;s dream tool.</p>
<p>For web apps, testing actually isn&#8217;t *that* hard. The client program (the browser) is readily available on every platform known to man, and they generally don&#8217;t do much more than store and retrieve data in clever ways. So, its not like we have to fire up a Large Hadron Collider to observe the effects of our web app.<span id="more-125"></span><a href="http://fewbar.com/wp-content/uploads/2009/11/periodictable.jpg"><img class="size-full wp-image-127 alignleft" title="periodictable" src="http://fewbar.com/wp-content/uploads/2009/11/periodictable.jpg" alt="periodictable" width="234" height="142" /></a></p>
<p>Therein lies the problem though, as clicking around on web forms and entering the same email address, password, address, phone number, etc. etc., 100 times, is BORING.</p>
<p>Enter <a href="http://seleniumhq.org/">Selenium</a>. This amazing little tool has been on the scene for a little while now, but its just now getting some momentum. Click through to the website and watch &#8220;the magic&#8221; as they put it, but basically here&#8217;s how it goes:</p>
<ul>
<li> open their firefox plugin and click &#8216;record</li>
<li>do something</li>
<li>click &#8216;record&#8217; again.</li>
</ul>
<p>Then just save this little test case to a file, and the next time you change anything that might relate to the series of clicks and data entries you just made, run this test again. There are all kinds of assertions you can make while you&#8217;re doing something. Like &#8216;Make sure the title is X&#8217; or &#8216;make sure a link to Y exists&#8217;.</p>
<p>But wait, I could have done that with something like Test::More,  PHPUnit, or lime. Where&#8217;s the real benefit?</p>
<p>Well because Selenium remotely controls your browser, all those gotchya&#8217;s regarding javascript CSS incompatibilities can come into play here. Because Selenium can control Internet Explorer, Firefox, *and* Safari. In fact it can also control Opera, and according to their website, any browser that properly supports javascript fully.</p>
<p>This is really a nice evolutionary step for web shops, as tools like this generally are OS specific and cost a lot of money. Once again open source software appears where a need becomes somewhat ubiquitous.</p>
<p>You can even take it a step further. The next thing that generally happens in a web dev shop when they get bigger than 20 or 30 people is they hire people who actually <strong>like</strong> testing. Well not really, but they dislike it *less* than software engineers. These are QA engineers. And they <strong>DO</strong> like things to be orderly and efficient.</p>
<p><a href="http://seleniumhq.org/projects/bromine/">Bromine</a> is the answer for that. Its still pretty rough around the edges, but it gets the job done.</p>
<p>Again check out their website and watch the screencast, but basically it goes like this:</p>
<ul>
<li>Write selenium tests as specified above</li>
<li>Upload tests to Bromine server</li>
<li>Attach tests to requirements</li>
<li>Run selenium remote control on all required OS/browser version combinations (can you say virtualbox?)</li>
<li>Run tests</li>
</ul>
<p>Another nice thing about using bromine is now you are running your tests in a server side language, not just the Selenium IDE, which is limited to the IDE&#8217;s generated &#8220;Selenese&#8221; XML commands for tests. The IDE exports your basic test into PHP or Java, and then on the bromine server you can do interesting things, like check an IMAP box for an email, run a backend process, or send an SMS.</p>
<p>At first it may not seem like much, but eventually you end up with a multitude of useful tests for your web app that can be run all the time against development branches before release, and catch many problems. Quality means happier users, which hopefully means loyal users that keep coming back.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2009/11/bromine-and-selenium-tests-for-the-rests-of-u/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TokyoOops</title>
		<link>http://fewbar.com/2009/10/tokyo-tyrant-ignores-memcache-protocol-flags/</link>
		<comments>http://fewbar.com/2009/10/tokyo-tyrant-ignores-memcache-protocol-flags/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 04:28:52 +0000</pubDate>
		<dc:creator>clint</dc:creator>
				<category><![CDATA[Memcache]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[berkeleydb]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[memcachedb]]></category>
		<category><![CDATA[process]]></category>
		<category><![CDATA[RTFM]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[tokyotyrant]]></category>

		<guid isPermaLink="false">http://fewbar.com/?p=117</guid>
		<description><![CDATA[We had a fun time this week with TokyoTyrant. Recently it has become apparent that MemcacheDB has been all but abandoned. As fantastic as the early work was by Steve Chu, the project is in disrepair. That, coupled with the less than obvious failover for its replication combined to make us seek alternatives.


Brian Aker had [...]]]></description>
			<content:encoded><![CDATA[<p>We had a fun time this week with <a href="http://1978th.net/tokyotyrant/">TokyoTyrant</a>. Recently it has become apparent that <a href="http://www.memcachedb.org/">MemcacheDB</a> has been all but abandoned. As fantastic as the early work was by Steve Chu, the project is in disrepair. That, coupled with the <a href="http://fewbar.com/2009/03/memcachedb-fault-tolerance-procedures/">less than obvious failover for its replication</a> combined to make us seek alternatives.</p>
<p><a href="http://fewbar.com/wp-content/uploads/2009/10/virtual_stupidity.jpg"><img class="alignnone size-full wp-image-121" title="virtual_stupidity" src="http://fewbar.com.s3.amazonaws.com/wp-content/uploads/2009/10/virtual_stupidity.jpg" alt="virtual_stupidity" width="280" height="280" /></a></p>
<p><span id="more-117"></span><br />
<a href="http://krow.net">Brian Aker</a> had mentioned to me at one time that TokyoTyrant was way better than memcachedb and we should run it instead. I took notice and it turns out he&#8217;s right! It does basically the same thing, applying the memcache protocol to an on disk key/value store. However, the code is incredibly clean, well maintained, and runs extremely fast. There&#8217;s also a lot more flexibility, with the ability to choose between in-memory or on disk storage, hash tables, B+Tree&#8217;s, etc.</p>
<p>The availability of log based asynchronous master/master replication (somewhat similar to MySQL&#8217;s replication in concept) was probably one of the biggest wins, allowing much simpler failover (just move the IP, or DNS, or whatever) when compared to MemcacheDB&#8217;s adherence to BerkeleyDB&#8217;s replication setup, which is a single-master system implementing an election algorithm.</p>
<p>Somewhere during migration, we missed one tiny detail though. Sometimes, the devil is in the details. This is really the only evidence in <a href="http://1978th.net/tokyotyrant/spex.html#protocol">the documentation that tokyo tyrant has support for the memcache protocol</a>. It is very clear:</p>
<blockquote><p>Memcached Compatible Protocol</p>
<p>As for the memcached (ASCII) compatible protocol, the server implements the following commands; &#8220;set&#8221;, &#8220;add&#8221;, &#8220;replace&#8221;, &#8220;get&#8221;, &#8220;delete&#8221;, &#8220;incr&#8221;, &#8220;decr&#8221;, &#8220;stats&#8221;, &#8220;flush_all&#8221;, &#8220;version&#8221;, and &#8220;quit&#8221;. &#8220;noreply&#8221; options of update commands are also supported. However, &#8220;flags&#8221;, &#8220;exptime&#8221;, and &#8220;cas unique&#8221; parameters are ignored.</p></blockquote>
<p>Now, as I said, there&#8217;s nothing ambiguous about this. That would have helped, if anyone on my team had ever read it. We installed TokyoTyrant, pointed our basic test code at it, and it worked. This is really a process problem, not so much a technical one. The process must be to assume it won&#8217;t work, and test all the different use cases to make sure it works.</p>
<p>Now, why is that bit of the manual important? Well we use PHP. Specifically, we use the PECL &#8220;Memcache&#8221; module to access memcache protocol storage. Now, the Memcache module is mostly oriented toward caching in the memory based original memcached. It works great for memcachedb too, which simply ignores the exptime parameter. However, memcacheDB *does not* ignore &#8220;flags&#8221;.</p>
<p>And therein lies the problem. Users of the <a href="http://pecl.php.net/package/memcache">PECL Memcache module</a> may not know this, but the flags are *important*. There are two bits in that flags field that the Memcache module may set. Bit 0 is used to indicate whether or not the content has been serialized, and, therefore, on read, must be unserialized. Bit 1 is used to indicate whether or not the content has been gzipped.</p>
<p>So, while all of the strings that were stored in MemcacheDB and subsequently copied to TokyoTyrant worked great, the serialized objects, arrays, and gzipped values, were completely inoperative, as they were coming back to the code as strings and binary compressed data. The gzipped data was easy (turn off automatic gzip compression). The serialized data took some quick tap dancing to remedy, with code something like this:</p>
<p><code lang="php"><br />
class Memcache_BrokenFlags extends Memcache<br />
{<br />
public function get($key, &amp;$flags)<br />
{<br />
$v = parent::get($key, $flags);<br />
$uv = @unserialize($v);<br />
return $uv === false ? $v : $uv;<br />
}<br />
}<br />
</code></p>
<p>Luckily our code all uses one Factory method to spawn all &#8220;MemcacheDB&#8221; connections, so it was easy to substitute this in.</p>
<p>Eventually we can just change the code by segregating into things that always serialize, and things that don&#8217;t, and just do the serialization ourselves. This should eventually allow us to use the new <a href="http://pecl.php.net/package/tokyo_tyrant">tokyo_tyrant module in PECL</a>, which only reliably stores scalars (I noticed recent versions have added a call to the internal PHP function convert_to_string().. this is, I think, a mistake, but one that still leaves it up the programmer to explicitly serialize when serialization is desired).</p>
<p>This was a pretty big gotchya, and one that illustrates that even though sometimes us cowboy coders and sysadmins get annoyed when those pesky business people ask us for plans, schedules, expected impact, etc., and we keep assuring them we know whats up, its still important to actually know whats up, and make sure to RTFMC .. C as in, CAREFULLY.</p>
]]></content:encoded>
			<wfw:commentRss>http://fewbar.com/2009/10/tokyo-tyrant-ignores-memcache-protocol-flags/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
