Jul 15 2008

Query Cache defeats Serverzilla

So a few days ago, my big mean MySQL server started having problems that were very hard to explain. It was slowing down, taking a minute to run queries that usually take a few seconds, and Linux load averages were in the teens, despite having quiet disks (less than 0.1% cpu IO wait time) and plenty of RAM (128G for about 200G of data total…).

The developers were stumped. The other systems guys were stumped. So was I. But it still seemed ok. We found all sorts of things to point fingers at, but nothing made sense.

Then this Monday, everything came to a screeching halt. 3-second queries were taking 15 minutes. 30-second queries were never completing. The CPUs were only a little busy. What gives?! This box has 8 CPU cores and 128G of RAM.. nothing can take it down, right?!

We threw our hands in the air and failed over to the active standby (the other side of our master<->master replication pair). Suddenly all was well. But something smelled wrong. We blamed some kind of bug in MySQL.

I spent all day trying to make Memcached more efficient, and trying to explain why suddenly this beast was felled by such tiny arrows as instantaneous queries that should have been cached anyway.

Oh wait, did somebody say cached? As in the MySQL query cache? I mentioned this in the #mysql channel on Freenode, and Mr. Eric Bergen (ebergen) from Proven Scaling immediately said something like “well duh, turn off the cache, moron”. I was dumbfounded. Shouldn’t it be helping us with all those tiny queries?

Well, apparently not. This recent thread on the MySQL internals list talks about mutex contention in the query cache while it is *searched*, not just while it is updated. This is disastrous for an environment where thousands and thousands of tiny queries are being run constantly. Even with query_cache_type set to 2, or “cache on demand” mode, every query in the system must run through this mutex.

So, this morning when the standby box again cried for mercy, hitting max_connections and spinning all queries around in circles, I ran ‘SET GLOBAL query_cache_type=2’. Instantly the server became healthier. I half expected to trade one problem for another.. with the server being consumed by tiny queries. But instead, these tiny queries did as expected, and took very little time to complete. And large queries against tables that change every second or two didn’t have to contend for the query cache; they just ran through like nothing.

So, it would appear that for any sort of multi-core installation of MySQL, the query cache is not only a waste, but a hazard!
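
For anybody who wants to check whether their own query cache is earning its keep, here is a rough sketch in Python. It assumes you have copied the Qcache_hits, Qcache_inserts and Com_select counters out of SHOW GLOBAL STATUS into a dict by hand; the function name and the sample numbers are made up for illustration. Note that it only measures how often the cache is hit. It says nothing about the mutex contention described above, so a decent hit ratio does not prove the cache is harmless.

```python
def query_cache_summary(status):
    """Summarise query cache usefulness from SHOW GLOBAL STATUS counters.

    `status` is a plain dict of counter name -> integer value.
    """
    hits = status.get("Qcache_hits", 0)
    # Com_select counts SELECTs that were actually executed,
    # i.e. the ones that were NOT answered from the query cache.
    executed = status.get("Com_select", 0)
    inserts = status.get("Qcache_inserts", 0)

    total_selects = hits + executed
    hit_ratio = hits / total_selects if total_selects else 0.0
    # How many hits each cached result earns before it is thrown away.
    hits_per_insert = hits / inserts if inserts else 0.0

    return {
        "selects": total_selects,
        "hit_ratio": hit_ratio,
        "hits_per_insert": hits_per_insert,
    }


# Illustrative numbers only, not taken from any real server.
sample = {"Qcache_hits": 2500, "Com_select": 7500, "Qcache_inserts": 5000}
summary = query_cache_summary(sample)
print(summary)
```

If hits_per_insert comes out below 1, most cached results are being stored and invalidated without ever being reused, which is roughly the situation with tables that change every second or two.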

Thanks again to Mr. Bergen. I would not have thought about that until he said it.


Jun 20 2008

OpenOffice’s Achilles heel

Anybody who is in IT in America has probably experienced that sinking feeling when somebody somehow introduces the latest version of Microsoft Office into their organization. It usually comes in like some corporate ninja while you’re not looking. Whether it’s an application that your accounting department writes with the new version of Access, or that Outlook plugin that somebody locked in to, you have to deal with it.

The most frustrating part of this for me is never that people are going to use Office. It’s not a bad product. What’s frustrating is that every 3 or 4 years, Microsoft somehow gets people to pay $300-$400 per user. As somebody who has used OpenOffice since it was called “StarOffice”, this is perplexing. There’s even a high quality Mac version called NeoOffice, in case anybody still thinks you have to have X11 installed to run OpenOffice on Mac. The file format problem isn’t even an issue anymore. Microsoft has priced Office so high that people stay on very old versions as long as they can, ensuring that even Office to Office incompatibilities are common.

So why not use it? OpenOffice is totally free, and has all the features that most users care to use in Office.

Oh wait.. except one. The email client. OpenOffice has no Outlook competitor. Calc is like Excel. Writer is like Word. There’s a PowerPoint equivalent too. But no Outlook. StarOffice was written before email was really on the radar. But they’ve had enough time by now, so why haven’t they solved this?

I’m sure some OpenOffice users are happy with Thunderbird. I’m not. It just doesn’t work very well. I’ve been using Linux with Gnome/Ximian/SuSE/Novell’s Evolution for years, and it just keeps getting better. Even for users of MS Exchange, IMAP access works, and Evolution actually implements the shared calendaring and address book of Exchange. On my Mac, I use Apple’s excellent mail application. But on Windows, people are kind of stuck.

I recently had a friend who tried 4 or 5 email clients on Windows, trying to get away from constantly dealing with his “pst” files crashing or just going slow. He gave up in defeat. There’s just nothing. Another friend has all his email forwarded to his Gmail account. I’m not a big fan, but he is. I think it is just crazy that people would choose a webmail-only client for email.

Maybe I’m just being too hard on Thunderbird. I got all excited when I saw a blog entry that said OpenOffice 3.0 would compete with Outlook. But it’s just Thunderbird, with Lightning. At first I was deflated by this. But with Lightning, maybe Thunderbird will work well. I still think Outlook is light-years ahead of it in terms of usability.


Jun 17 2008

The Thread_Concurrency myth

Just a few weeks ago, I found out that thread_concurrency’s purported magical effects at correcting MySQL’s concurrency limitations (especially in 4.1) were something of a myth. It was a post on MySQL’s lists that alerted me to this. Apparently it only works on Solaris; Linux’s threading library ignores this parameter completely. This is not to be confused with innodb_thread_concurrency, which is quite useful in controlling the flow of transactions through InnoDB. I think the problem really lies in the fact that the default my.cnf example configs tell us to set thread_concurrency to twice the number of CPUs. They fail to mention that this only matters on Solaris, though the manual is quite clear.

I think I set this parameter to 1, 4, and 8 trying to see if it would affect things positively or negatively on quite a few 4.1 boxes. I always just sort of assumed it was going to help prevent any sort of snowballing of server load if it ever got hit hard.
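
To save somebody else the same wasted effort, here is a small Python sketch that looks for thread_concurrency in my.cnf-style text and says whether it will do anything on the given platform. The function names and the sample config are my own inventions, and the parsing is deliberately simplistic: it ignores which [section] the option sits in and only understands “#” comments. The platform string is meant to be whatever sys.platform reports, which starts with “sunos” on Solaris.

```python
def find_thread_concurrency(cnf_text):
    """Return the thread_concurrency value found in my.cnf-style text, or None."""
    value = None
    for raw in cnf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop "#" comments
        if "=" not in line:
            continue
        key, val = line.split("=", 1)
        # MySQL accepts dashes and underscores interchangeably in option names.
        if key.strip().replace("-", "_") == "thread_concurrency":
            value = val.strip()  # a later setting overrides an earlier one
    return value


def advise(cnf_text, platform):
    """Describe whether the setting matters on `platform` (e.g. sys.platform)."""
    value = find_thread_concurrency(cnf_text)
    if value is None:
        return "thread_concurrency not set"
    if platform.startswith("sunos"):
        return f"thread_concurrency={value} (Solaris: setting is honored)"
    return f"thread_concurrency={value} has no effect on {platform}"


# Made-up example config; the values are illustrative only.
sample_cnf = """
[mysqld]
thread_concurrency = 16
innodb_thread_concurrency = 8
"""
print(advise(sample_cnf, "linux"))
```

Note that innodb_thread_concurrency is deliberately left alone, since that one is a different setting and does matter.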

Yet another example where it’s important to RTFM!