Nov 11 2008

I should write an innodb backup tool

One of my favorite bloggers, Peter over at Percona/MySQL Performance Blog, has thrown down the gauntlet. Basically saying that it would be fairly trivial to write an incremental InnoDB backup tool.

If you see me, and I haven’t run up to you and told you that I am writing/have written an amazing InnoDB incremental backup tool, I give you permission to make fun of me. This sounds like a fun, interesting project that will challenge me and sort of scratches an itch I have, which is, faster MySQL backups.


Oct 4 2008

Deciding whether to send reads to slave or master

There are quite a few articles out there that talk about how to give your application some context and send reads to one server, and writes to another. There are even some mentions of marking your connection “dirty” and then sending all reads to the write server.

As a first try at scaling things, I recently made a change to our web application’s data access layer where reads went to a group of readonly slaves. However, if a write was made to a database, a value was put into the user’s session, saying that the database was dirty, and causing all subsequent reads to go to the master server.

This was good as users would use the readonly slaves as long as they hadn’t changed anything in the database. The real problem though, was that as soon as the user logged in, their account was updated to say that they had logged in, marking that database dirty.

Rather than try to cleverly change this one problem, we changed the “dirty” value from a boolean to a timestamp. Whenever the user writes to the database, it records the current time in their session. Then a global timeout is applied to that. This gives the replication slaves time to catch up and get the record that was just changed, then the user will have a consistent view fo their data.

This is great, but I think a further step is to have something publish the actual maximum lag of the slaves into a memcache key, and simply double that value as the timeout. This would allow maximum usage of the readonly slaves and keep the master server busy doing mostly writes.


Aug 29 2008

Can more queries equal a healthier MySQL server?

This week was an ugly one for my monster database servers. It should have been triumphant, but oddly enough, I think it shows how prone to mistuning InnoDB on MySQL 5.0 is with multiple cores.

This server is a multi-core, high concurrency server. The application has been designed a little bit naively in that it just throws almost all queries at the main db server. Several bits have been designed to scale by not doing that, but unfortunately, huge amounts of functionality were built around those apps to prevent them from scaling.

As a result, we’ve had to scale up the central database server and its redundant systems significantly. We started with the Proliant DL380 G4 with two Xeon 3.4Ghz CPU’s and 12GB of RAM, and plenty of disks in an external RAID. As more traffic was added, we moved up to the DL580 servers with 4 Xeon 3.4Ghz and 64GB of RAM. This worked well, but still more traffic, and more data, was coming and the app wasn’t ready to change significantly. We finally landed on the latest DL580 server, with 1GB of total battery backed write cache, 14 SAS disks, 128GB of RAM, and two quad core Xeon CPU’s.

Some things got better. Writes were now incredibly fast. The server was churning out 1000 queries per second easily. Sometimes during peak times, query response time would suffer, but ultimately, the box was keeping up and performing well. Especially after we turned of query caching. After this week though, I wonder how much of the problem was query caching… more later.

Anyway, whenever the server would need to have maintenance, some high traffic applications would suffer needlessly for their need of rarely changing data (memcached was out of the question for the complexity and “realtime” nature of this data). So we setup a selective replication fanout onto multiple boxes and pointed these apps at that cluster for these queries.

Well the next day, without all of these tiny queries pounding on it, the database server had horrible problems. 400 threads stacked up inside InnoDB “Waiting for InnoDB queue”. System resources were fine, but it was clear, InnoDB was having trouble. Queries that normally take 0.75 seconds were taking 300+ seconds, or just never completing. I knew there was real trouble, when killing the thread would result in it just changing state to “Killed”, but never dying. Based on what I’d read in High Performance MySQL, and articles like this one, I tried twiddling with innodb_thread_concurrency, innodb_concurrency_tickets, and innodb_thread_sleep_delay. None of them seemed to help, though innodb_thread_concurrency set to a value of about half the CPU cores seemed to delay the problems.

I noticed that we were running MySQL v5.0.51a still. We had planned an upgrade to 5.0.67, which was just recently released, but hadn’t gotten there yet. I went ahead and upgraded one of the boxes to it, and failed over to it. Instantly things were more healthy, and the health seemed to stay for hours, without any more InnoDB freakouts.

After some research, it would seem that between 5.0.51a and 5.0.67, a lot of really big fixes were made to InnoDB to help it scale up on multi-core machines. The box has been healthy for a couple of days, though there’s still a lot of work to do removing query load from the server.

But why would a _reduction_ in queries cause concurrency problems? I have a theory, but no real ideas on how to test it.

Before, we were doing 1000 queries per second. Things were healthy. We removed about 400 queries per second from that. These 400 queries were basically instantaneous.. often times returning no results at all and reading from tables and indexes completely stored in the innodb_buffer_pool. But, with query cache turned off, they were still being processed fully by InnoDB. When we removed these tiny queries from the queue imposed by innodb_thread_concurrency, I think we removed the equivalent of spin waits from the queue. These tiny, easy queries were just hard enough to process, to prevent a lot of bigger queries from hitting the queue at the same time. Thats why reducing innodb_thread_concurrency to 4 helped a bit.. with only 4 threads vying for mutexes and CPU resources constantly, InnoDB was able to (sort of) keep up.

My final bit of evidence for this is that we actually, I think, had this problem before with the aforementioned article. Turning off the query cache moved these tiny queries out of the query cache, and into the InnoDB queue, providing the needed pseudo-spin-waits to prevent it from locking in on itself.

I have to wonder if raising innodb_sync_spin_loops to something ridiculously high, like 50000, would have the same effect. Unfortunately, its very hard to test this without dedicating a lot of time to it.

So, in this case, it would seem that more work can, in fact, make the server healthier.


Aug 12 2008

The new fad: Outsourced Parachute Packing

Holy cow, did you read about this company “The Linkup” losing 45% of its customers’ data?! How about they change their name to “The @$%! Up”.

First off, let me say that these guys didn’t have to be retarded to lose this much data. In fact, there are (were?) probably a lot of really talented people who designed and built this system to avoid such things.

I’m an optimist, so I have to believe somebody raised their voice at a meeting when data was shipped off to some loosely linked company from some past relationship. The finger pointing going on now is exactly what nobody ever wants to see happen to something they built.

Nirvanix says it has not deleted any customer data, and promises that its Storage Delivery Network is immune to the problem that plagued The Linkup. At The Linkup, a “system administrator ran a script that misidentified active account data and disassociated physical files from their owners,” Nirvanix says. “This led to files being marked offline in the old Streamload/MediaMax file system when they shouldn’t have been.” Iverson, meanwhile, claims it was a Nirvanix engineer who caused the data loss.

Hiring managers beware.. if you receive a resume of a Systems Administrator from the San Diego area with a couple of years missing and immediate availability, you might want to ask him if he knows what the -f argument to rm does.

We’ve all been there at the meeting that produces this type of situation though. Somebody finds some amazing 3rd party vendor that will do for the company what now takes a little resources, but will, when the company gets successful, take a lot of resources. Rather than scale our business up, why not just jump onto the coat tails of some other already successful business and rake in the dough, not needing any real technology, just gluing things together.

This is the essence of things like Amazon S3 and EC2, and Google’s AppEngine. These companies have built massive clusters to deal with their biggest days.. and somebody over there got smart and said “hey, 1% of our architecture is 200 times the power of most startups. We should sell it to them.”

The problem is, one day Amazon will need that power, and they won’t think twice about cutting off their tiny little source of revenue to make sure they can sell copies of Oprah’s latest selection. Their core business is selling stuff over the web, so why should they care if EC2 and S3 go slow, stop working, or function better than most people’s multi-thousand-dollar-a-month contracts with ISP’s.

So, if something is key to your success, even if its not “your core competency”, I say build it into your business model to gain that competency. I wouldn’t pay somebody to pack my parachute if I were a sky diver. I’d pay somebody to teach me, maybe even to check it for me, but I want to know that I packed it right when I pull the cord.. I don’t want a little flag to fly out saying “Sorry, We gave up our parachute packing business 2 days ago. Might want to start praying..”


Jul 15 2008

Query Cache defeats Serverzilla

So a few days ago, my big mean MySQL server started having problems that were very hard to explain. It was slowing down, taking a minute to run queries that usually take a few seconds, and Linux load averages were in the teens, despite having quiet disks (less than 0.1% cpu IO wait time) and plenty of RAM (128G for about 200G of data total…).

The developers were stumped. The other systems guys were stumped. So was I. But it still seemed ok. We found all sorts of things to point fingers at, but nothing made sense.

Then this Monday, everything came to a screeching halt. 3 second queries were taking 15 minutes. 30 second queries were never completing. The CPU’s were only a little busy. What gives?! This box has 8 CPU cores and 128G of RAM.. nothing can take it down, right?!

We threw our hands in the air and failed over to the active standby (the other side of our master<->master replication pair). Suddenly all was well. But something smelled wrong. We blamed some kind of bug in MySQL.

I spent all day trying to make Memcached more efficient, and trying to explain why suddenly this beast was felled by such tiny arrows as instantaneous queries that should have been cached anyway.

Oh wait, did somebody say cached? As in the MySQL query cache? I mentioned this in the #mysql channel on Freenode, and Mr. Eric Bergen (ebergen) from Proven Scaling immediately said something like “well duh, turn off the cache, moron”. I was dumbfounded. Shouldn’t it be helping us with all those tiny queries?

Well apparently not. This recent thread on the MySQL internals list talks about mutex contention in the query cache while it is *searched*, not just while it is updated. This is disasterous for an environment where thousands and thousands of tiny queries are being run constantly. Even with query_cache_type set to 2, or “cache on demand” mode, every query in the system must run through this mutex.

So, this morning when the standby box again cried for mercy, hitting max_connections and spinning all queries around in circles, I ran ‘SET GLOBAL query_cache_type=2′. Instantly the server became more healthy. I half expected to trade one problem for another.. with the server being consumed by tiny queries. But instead, these tiny queries did as expected, and took very little time to complete. And large queries against tables that change every second or 2 didn’t have to contend for the query cache, they just ran through like nothing.

So, it would appear that for any sort of multi-core installations of MySQL, the query cache is not only a waste, but a hazard!

Thanks again to Mr. Bergen. I would not have thought about that until he said it.


Jun 25 2008

Using memcachedb and memcached to make things scale

I don’t remember exactly how I found memcachedb, however, it is one of those projects that somebody else beat me to the punch in writing. I mean, it was going to happen, as the need was there. Steve Chu, the author, did a great job of melding two open source projects, BerkeleyDB, and memcached, to produce something really very powerful

Now, memcached has become almost completely ubiquitous in scaling web apps. Memcached is essentially a network enabled non-persistent data store. It is generally used as a write-back data cache, meaning that you look in the faster cache, if nothing is there, you look in the slower place, then write the value back to the faster cache. Some industrious people have used it for session storage, and I’m sure a few other clever uses.

One of my favorite parts of memcached is how dead simple it is. The protocol is very easy to read, making debugging issues and writing new clients very easy. It uses the “least recently used” algorithm to move things out of the cache when it starts to fill up, so its extremely easy to understand how the whole thing works.

The cleverest part of using memcached has nothing to do with the service itself, but the API. The smart guys who developed it figured out that they could hash the key, and pick the same server for reads/writes every time as long as the number of servers doesn’t change. This allows it to scale out to a ridiculous size and retains its simplicity and performance

Two problems arise when a site uses any caching, be it memcached or aggressive HTTP headers.

First, the site starts to rely on caching too heavily for performance. As an example, I had a situtation where the entire corpus of settings for each client site (hundreds of clients, hundreds of settings) was kept in memcached as one massive 200kB+ serialized PHP object. Every page view that needed to access any settings would grab this object at the beginning of the code, and use the object throughout.

This worked really great in some instances, as most of the biggest pages needed to access 30 - 50 settings each time. However, the trouble would come when there was a page that would get a high degree of concurrency, such as an iframe that gets displayed on every page of a major website, or on a page that gets slashdotted. It would be blazing fast, generating almost no load at all for a while, but whenever a setting would be changed (the settings application would clear the cache of settings for whichever client was edited), or the cache object would expire, the database would spike out of control.

The reason was this object took about 1-3 seconds to fetch from the database. Well with 1000 requests per second, thats 3000 requests that get a negative hit on the cache, and so, ask the database for the information. The solution was to cache each setting individually, and use a random skew on the expire time. This prevented the storm of requests whenever there was an expire, and it allowed items looked up in rapid succession to not expire all at once.

This brings us to the second problem with caching, and specifically memcached. The cache is sometimes mistaken for a data store. In the above example, by clearing out entries from memcached, the caching was essentially neutered. Any time during the day somebody might come along and blow out the cache. Thats fine with MySQL’s query cache, for instance, because that just makes queries come back faster. The connection is already made, one of the most painful parts has already happened. With memcached however, the cache can scale to many thousands of connections very cheaply, whereas doing this with most databases is expensive, if not impossible.

So to combat this, what is really needed is a persistent place to keep your data up to date when it is needed in an extremely high reads to write ratio. Thats where memcachedb is so attractive. Instead of keeping everything in RAM, memcachedb stores anything you put into it in a berkeleydb database. To boot, it can replicate this data to another machine, adding to its reliability and availability. This means that writes will be slower, and it won’t scale out nearly as cheaply, but thats ok for situations like this.

With memcachedb, we can change the setting management program to save the data into the database and memcachedb, confident in the fact that it will be there later. Then we don’t have write-back caching code in our application, we just remove the part that connects to the database for that data at all.

This has a huge benefit beyond just performance. With this scheme, we can write simple applications that won’t rely on the read/write database server ever being up. It also means that we don’t have to have a giant database server, or a huge replication fanout to get this data available in realtime.

There is of course the danger that memcachedb gets out of sync with the main db. Thats why in addition to writing to memcachedb whenever you write to the database server, you can also run a refresh script periodically that grabs all of the data from the database and walks through, writing items to memcachedb. Care must be taken here to make sure one doesn’t write stale data to memcachedb. The safest way is to include a timestamp with each record that can easily be compared. Another way to go is to just have this script alert you to items that are out of sync, requiring manually re-saving these records.

Memcachedb is, unfortunately, still a little raw. The replication setup is rather complex. It took me a little while to get it working the way I wanted with just two boxes. It definitely could use command line options to set replication options, so that slaves don’t accidentally promote themselves to masters. Right now one can only do that through the protocol, so I have a nagios plugin that checks it and changes it if it is wrong.

I think its important to note just how cool it is that 90% of memcachedb was written before it was conceived of. BerkeleyDB is one of the great open source success stories, having a successful business model built on free code, and eventually attracting enough attention from Oracle to get purchased. Then to merge that with memcached, which is one of those projects that makes you wish you had written it first, well, I think thats a stroke of genius. Good job Mr. Chu.


Jun 20 2008

OpenOffice’s achilles heel

Anybody who is in IT in America, has probably experienced that sinking feeling when somebody somehow introduces the latest version of Microsoft Office into their organization. It usually comes in like some corporate ninja while you’re not looking. Whether its an application that your accounting department writes with the new version of Access, or that Outlook plugin that somebody locked in to, you have to deal with it.

The most frustrating part of this for me is never that people are going to use Office. Its not a bad product. Whats frustrating, is that every 3 or 4 years, Microsoft somehow gets people to pay $300-$400 per user. As somebody who has used OpenOffice since it was called “StarOffice”, this is perplexing. There’s even a high quality mac version called NeoOffice, in case anybody still thinks you have to have X11 installed to run OpenOffice on Mac. The file format problem isn’t even an issue anymore. Microsoft has has priced office so high, people stay on very old versions as long as they can, ensuring that even office to office incompatibilities are common.

So why not use it? OpenOffice is totally free, and has all the features that most users care to use in Office.

Oh wait.. except one. The email client. OpenOffice has no Outlook competitor. Calc is like Excel. Writer is like Word. There’s a powerpoint equivilent too. But no Outlook. StarOffice was written before email was really on the radar. But they’ve had enough time by now, why haven’t they solved this?

I’m sure some OpenOffice users are happy with Thunderbird. I’m not. It just doesn’t work very well. I’ve been using Linux with Gnome/Ximian/SuSE/Novell’s Evolution for years, and it just keeps getting better. Even for users of MS Exchange, IMAP access works, and Evolution actually implements the shared calendaring and address book of Exchange. On my mac, I use Apple’s excellent mail application. But on Windows, people are kind of stuck.

I recently had a friend who tried 4 or 5 email clients on Windows, trying to get away from constantly dealing with his “pst” files crashing or just going slow. He gave up in defeat. There’s just nothing. Another friend has all his email forwarded to his Gmail account. I’m not a big fan, but he is. I think this is just crazy that people woudl choose a webmail-only client for email.

Maybe I’m just being too hard on Thunderbird. I got all excited when I saw a blog entry that said OpenOffice 3.0 would compete with Outlook. But its just Thunderbird, with Lightning. At first I was deflated by this. But with Lightning, maybe Thunderbird will work well. I still think Outlook is lightyears ahead of it in terms of usability.


Jun 17 2008

The Thread_Concurrency myth

Just a few weeks ago, I found out that thread_concurrency’s purported magical effects at correcting MySQL’s concurrency limitations (especially in 4.1) were something of a myth. It was a post on mysql’s lists that alerted me to this. Apparently it only works on Solaris, Linux’s threading library ignores this parameter completely. This is not to be confused with innodb_thread_concurrency, which is quite useful in controlling the flow of transactions through InnoDB. I think the problem really lies in the fact that the default my.cnf example configs tell us to set thread_concurrency to the number of CPU’s*2. They fail to mention that this only matters on Solaris, though the manual is quite clear.

I think I set this parameter to 1, 4, and 8 trying to see if it would affect things positively or negatively on quite a few 4.1 boxes. I always just sort of assumed it was going to help prevent any sort of snowballing of server load if it ever got hit hard.

Yet another example where its important to RTFM!


Jun 17 2008

FewBar.com comes alive

I’ve been promising for some time to create a blog about Technology, Life, and mischief in general. Fewbar.com is just my good-faith first step in getting that done. Only time will tell whether this will be the next clearing house for genius technical advice, or just another empty Wordpress blog on the web.

While you wait patiently for the amazingly useful and poignant posts to start rolling out, please do take a moment to brighten everybody in the office’s day with this ingenious little perl script. Its especially useful for informing your users that they are “TREE KILLERS!”, or for passing along any ex-girlfriends’ phone numbers to the other men in the office.. you know.. in the interest of making sure she can find a date.