Monday, 22 July 2013

When web traffic hits eleven

There's the famous Spinal Tap joke about the guitarist who believes that because his amplifier goes up to eleven (when the volume knob usually only goes up to ten) it must be the most powerful. That iconic joke has become shorthand for 'giving it all the wellie you can' or 'going for the max'. (It even has its own Wikipedia page.)

A web service I manage recently had an 'up to eleven' moment: I was called one evening because the system had ground to a halt, and a lot of queries seemed to be coming in from one source. It turned out that one of the users of the service had decided to carry out an unscheduled load test and had inadvertently overloaded us.

This got me thinking about spikes in web traffic: what causes them and how you can deal with them (following on from Elaine's notes on testing last time). In the real (i.e. non-testing) world, the kind of thing that generates a rush of traffic is a nationwide TV ad promoting the URL. It can cause an almost instantaneous rush of people to their computers, tablets and so on, similar to the legendary surge of electricity when people all over the country put the kettle on at half time in the cup final (it's a UK thing, but I'm sure every country has its equivalent).

Of course, even if everyone pushed the button simultaneously, the requests would not hit the servers at the same time. Network latency tends to spread the traffic out over a short period; we can assume a bell-shaped Gaussian distribution. And not everyone acts at exactly the same moment, which spreads the curve further. In the end what proves crucial is the combination of peak traffic per second and how quickly you can process each request. If the application takes time to process requests (because it's consulting a database and making calculations), several queries will be working their way through the system at once, all slightly out of sync with each other. If you have a bottleneck because you're not processing queries as fast as they're arriving, the queue of pending requests grows, your response times slow down, and eventually the whole thing may grind to a halt.
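To make that concrete, here's a minimal sketch in Python (not from any real system; every number is a made-up illustration): it pushes a Gaussian burst of arrivals through a single first-come-first-served server and reports the worst queueing delay.

    # Minimal queueing sketch: a Gaussian burst of arrivals hitting
    # one server. All the numbers here are illustrative assumptions.
    import random

    def worst_wait(total_requests=10_000, spread_s=60.0, service_s=0.05):
        """Arrivals are Gaussian around t=0; each request needs
        service_s seconds on a single first-come-first-served server."""
        arrivals = sorted(random.gauss(0.0, spread_s)
                          for _ in range(total_requests))
        server_free_at = arrivals[0]
        worst = 0.0
        for t in arrivals:
            start = max(t, server_free_at)      # queue if the server is busy
            worst = max(worst, start - t)       # how long this request waited
            server_free_at = start + service_s  # server occupied until here
        return worst

    # Peak arrival rate is roughly 10,000 / (60 * 2.5), about 66 req/s,
    # but at 50 ms per request the server only manages 20 req/s, so the
    # queue (and the waiting time) keeps growing until the peak passes.
    print(f"worst wait: {worst_wait():.1f} s")

Run it and the worst wait comes out in whole seconds, not milliseconds, even though the server is only overloaded for part of the burst: the queue built up at the peak has to drain before things recover.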

Some of this load is handled by your web server and some by any back-end database. Either or both of these can be a bottleneck, and diagnosing which can be especially difficult if both run on the same machine or virtual machine.
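One cheap way to find out which tier is hurting is to time the two halves of each request separately. The sketch below is hypothetical Python (fetch_rows and render_page are stand-ins for whatever your real request path does), but the idea carries over to any stack.

    # Split each request's time into 'db' and 'app' buckets so you
    # can see which tier dominates. fetch_rows and render_page are
    # hypothetical stand-ins for your real code.
    import time
    from contextlib import contextmanager

    @contextmanager
    def timed(bucket, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            bucket[name] = bucket.get(name, 0.0) + time.perf_counter() - start

    def handle_request(bucket):
        with timed(bucket, "db"):
            rows = fetch_rows()       # the database round-trip
        with timed(bucket, "app"):
            return render_page(rows)  # CPU work in the web tier

If the 'db' bucket dominates you look at queries and indexes; if 'app' dominates you add workers or a cache; and if both tiers share one box, remember they're also fighting each other for CPU and disk.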

The Whitesites blog has some musings on How much traffic can one server handle (Feb 2013).

Using the cloud can help, as long as you know the spikes are coming, since you can spin up more capacity at relatively short notice ... but some notice is required. Doug Heise on the iMedia Connection discusses How to prepare your website for a traffic spike and goes beyond the technical issues, since some of the planning for such events should be driven by marketing. He notes Gartner's prediction that marketing budgets will exceed IT budgets by 2017.
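If you do get notice, a back-of-envelope sum tells you roughly how much capacity to spin up. Every figure in this sketch is an assumption you'd replace with your own measurements:

    # Rough capacity sum for a known, advertised spike.
    # All of these figures are illustrative assumptions.
    import math

    expected_viewers = 500_000  # people who see the TV ad
    conversion       = 0.02     # fraction who visit straight away
    spread_seconds   = 120      # window the visits arrive over
    reqs_per_visit   = 5        # pages/assets each visit pulls down
    server_rps       = 200      # requests/second one instance sustains

    peak_rps  = expected_viewers * conversion * reqs_per_visit / spread_seconds
    instances = math.ceil(peak_rps / server_rps)

    print(f"peak ~ {peak_rps:.0f} req/s -> spin up {instances} instances")

With these placeholder numbers that's around 417 requests per second at the peak, or three instances; the useful habit is doing the sum at all, before the ad airs rather than after.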

Your hosting company can help too (as this post on Atlantica outlines: How To Find A Hosting Solution That Handles Traffic Spikes). I would add that their technical support engineers will probably know a lot more about how the equipment works at a low level, and can bring a variety of tools you may not have heard of to bear on speed problems.

Eventually it may come down to a cost-benefit analysis: how likely is it that you'll be hit by a spike, and can you afford to go off-line for a short while if the worst happens? After all, real people don't usually behave like load-testing programs ... do they?
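For what it's worth, that cost-benefit sum is simple enough to sketch too. The probability and costs below are pure placeholders; the point is only that you're comparing the expected cost of an outage against the price of headroom:

    # Crude expected-cost comparison. All figures are placeholders.
    p_spike_per_year = 0.5     # chance of a crippling spike in a year
    outage_cost      = 20_000  # cost of going off-line when it hits
    headroom_cost    = 6_000   # yearly cost of capacity to absorb it

    if headroom_cost < p_spike_per_year * outage_cost:
        print("paying for headroom is the cheaper bet")
    else:
        print("risking the occasional outage is the cheaper bet")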