It’s summer and the Fail Whales have migrated back towards shore. There have been sightings aplenty and we’re all feeling like whale watchers this week. My own most recent Fail Whale sighting (about 2 minutes ago) gave me a chance to reflect on what the Twitter engineers must be facing right now as they try to manage unprecedented volume. Fifteen years of web app development experience (and some very similar challenges with high-volume sites) lead me to the following thoughts. Please note, I have no actual insider knowledge about how Twitter is built or programmed. This is pure conjecture:
- The Fail Whale is triggered by the system either a) having no available threads to process a request or b) having a request time out while waiting to be fulfilled.
- A request that has timed out has still occupied system resources while waiting to be processed, so the longer that timeout period is, the more likely a Fail Whale is to spawn more Fail Whales.
- A longer timeout period gives a request a better chance to be fulfilled, but means more resources will be occupied by pending requests. A longer wait also gives more chance for us impatient clickers to hit “Refresh” (I’m guilty, but I’m not the only one).
- A shorter timeout period means fewer stacked up requests, but a greater chance that any given request will time out (calling more Fail Whales to the surface).
- Finally, most of us are never quite content to sit back and bask in the majesty of a single Fail Whale. Nope, we always try back in a few seconds or a few minutes, hoping to break out of the pod into open water and get back to our Twittering. Obviously, that doesn’t help because we just increase the load that much more.
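To put rough numbers on the timeout tradeoff in the list above, here's a back-of-the-envelope sketch in Python. Nothing in it comes from Twitter: the function name, the service times, and the arrival rate are all made up for illustration. It assumes a request holds its slot until it either finishes or times out, and it uses Little's law (average requests in flight = arrival rate × average time each one holds a slot). It also ignores all of us hitting "Refresh", which would only make things worse.

```python
def timeout_tradeoff(service_times, arrivals_per_sec, timeout):
    """Return (fraction_timed_out, avg_slots_occupied) for one timeout setting.

    service_times: hypothetical seconds each request would need if left alone.
    """
    # Any request that needs longer than the timeout becomes a Fail Whale.
    timed_out = sum(1 for s in service_times if s > timeout)

    # A request holds its slot until it finishes or the timeout fires,
    # whichever comes first.
    avg_hold = sum(min(s, timeout) for s in service_times) / len(service_times)

    # Little's law: average in-flight requests = arrival rate * average hold time.
    return timed_out / len(service_times), arrivals_per_sec * avg_hold


# Made-up service times (seconds) for an overloaded backend.
SERVICE_TIMES = [0.2, 0.5, 1.0, 4.0, 10.0]

if __name__ == "__main__":
    for timeout in (1.0, 5.0, 30.0):
        failed, occupied = timeout_tradeoff(SERVICE_TIMES, 100, timeout)
        print(f"timeout={timeout:>4}s  whales={failed:.0%}  slots busy={occupied:.0f}")
```

With these invented numbers, stretching the timeout makes the share of timed-out requests fall while the number of occupied slots climbs, which is exactly the bind described above: whichever way the dial is turned, something gets worse.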
So, my solution for the Fail Whale problem is…. hmm…. yeah, no solution that I can offer. I can just offer my sympathies for the Twitter engineers working on the problem and my hopes that they get those whales back out into the open ocean before we start to give up and look for other channels.










