Load testing for web servers

Before moving anything important into production, you should test whether it works.

While this sounds easy enough, it isn’t. In this specific case, we will talk about load testing web servers. Web serving is usually a non-issue when you serve static .html pages without big downloads, but things get much more interesting when you’re using some kind of automatically generated content. A blog is usually a simplified content management system with few features. But there are bigger CMSs, which might, for example, generate pictures automatically. They might also have search built in, and search results are generally not cacheable.

Things get even more interesting with completely interactive, mostly non-cacheable sites like discussion portals, but I don’t have any experience with those.

Load testing isn’t difficult, but the important thing is to use the right numbers. Getting them is much more difficult than doing the testing itself.

When we’re talking about smaller projects, you usually have to gather these numbers on your own – more about this below. When we’re talking about bigger projects, you can use the numbers used to size the system (You did size the system, right?).

Your first step would usually be to gather the necessary data for normal load. If you’re a startup, and don’t have any data for a normal load, then estimate. Then use three times your estimate.

But what should you estimate? What kind of numbers do we need?

  • Number of concurrent users
  • Delay between hits

You can usually calculate these numbers from your previous statistics. Never use your hits statistics; always use your visitor statistics. Why? Because the number of hits changes with your page layout.

Another important input is the time a visitor spends on your site during a visit, and how many page hits they generate in that time. If you’re not using frames, you can use the page hits of your old site for your new site. You will need to calculate the number of hits (as opposed to page hits) for the new site, though.

There’s no simple hands-on formula that gives you valid values for your load testing software. See if the numbers “feel” right. For a small company, the number of concurrent visitors can easily be below ten. For sites like bluewin.ch it can be much, much higher.
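That said, a rough starting point can be sketched from visitor statistics. The snippet below is only an illustration: it assumes visits arrive evenly over the hour you measured (ideally your busiest hour), and the function name, parameters and sample figures are made up for this example. Treat the result as a first guess to sanity-check, not as a valid sizing.

```python
def estimate_load(visits_per_hour, avg_visit_seconds, page_hits_per_visit,
                  safety_factor=1.0):
    """Rough load-test parameters from visitor statistics.

    Returns (concurrent_users, delay_between_hits_in_seconds).
    Assumes visits are spread evenly over the hour, which real
    traffic never quite is.
    """
    # Visitors on the site at once = arrival rate * time spent per visit.
    arrivals_per_second = visits_per_hour / 3600.0
    concurrent = arrivals_per_second * avg_visit_seconds * safety_factor
    # Average pause between two page hits of the same visitor.
    delay = avg_visit_seconds / page_hits_per_visit
    return max(1, round(concurrent)), delay


# Made-up figures: 120 visits in the busiest hour, 3 minutes per visit,
# 6 page hits per visit.
users, delay = estimate_load(120, 180, 6)
print("concurrent users:", users, "delay between hits:", delay)

# A startup with estimated numbers would triple the estimate.
users_startup, _ = estimate_load(120, 180, 6, safety_factor=3)
print("with tripled estimate:", users_startup)
```

If the numbers that come out don’t “feel” right for your site, trust the feeling and go back to your statistics.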

You can configure your web browser to use Sproxy as its proxy and then easily gather an appropriate URL list for Siege. It supports both GET and POST requests, which means it will work with everything you can throw at it. When creating a URLs.txt, include pages that need heavy processing on the backends (like searches) multiple times to simulate a more realistic load. For the same reason, don’t filter images or static pages out of your URLs.txt.

Then you can run Siege. At first, run it with low concurrency and a delay of 1 second, just to make sure that everything works as it is supposed to. You can also use this run to set up your monitors and testing tools.
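Weighting the heavy pages by hand gets tedious, so here is a small sketch of how it could be scripted. It assumes the captured list has one request per line, with POST requests written as the URL followed by POST and the form data. The example.com URLs, the "/search" marker and the weight of 3 are placeholders, not values from a real site.

```python
def weight_urls(lines, heavy_markers, weight=3):
    """Repeat the entries of a URL list that hit expensive pages.

    lines: requests as captured, one per line.
    heavy_markers: substrings that identify expensive pages
                   (hypothetical; pick ones that match your site).
    weight: how often each heavy entry should appear.
    """
    weighted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        is_heavy = any(marker in line for marker in heavy_markers)
        weighted.extend([line] * (weight if is_heavy else 1))
    return weighted


captured = [
    "http://www.example.com/",
    "http://www.example.com/images/logo.png",  # static files stay in
    "http://www.example.com/search POST q=load+testing",
]

for entry in weight_urls(captured, heavy_markers=["/search"]):
    print(entry)
```

Writing the result back to URLs.txt is left out on purpose, so that you look at the list before you use it.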

  • Monitor the operating system – usually your statistics system (I use Cacti; MoM and other tools will work fine, too) won’t gather everything. On Linux, have one window open with vmstat 1 and one with top. On Windows, I would recommend Process Explorer. If you’re running into limits, it’s important to see where your system reaches its limit. Processes can be I/O-bound (DB, filesystem, network), CPU-bound or application-bound (the last is the only one not easily fixable with money).
  • Monitor your web server. Apache has a nice server-status handler. I don’t know about IIS, but I’m sure it has equivalent tools.
  • Monitor your back end. This one depends a lot on your setup. For the standard SMB/SOHO setup, the backend usually is a MySQL database. SHOW STATUS is your friend, though you will want to have a look at the manual to be able to interpret these values. For bigger CMSs, this will be completely different. Trust the experts you’ve hired to monitor it.
  • Monitor your load balancer. This one is also different depending on the solution you use.
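For the web server part, Apache’s server-status handler can also be read by a script when you request it with ?auto, which returns plain "Key: value" lines. The sketch below only parses such text. It assumes mod_status is enabled and that the output contains BusyWorkers and IdleWorkers. The sample values are invented, and fetching the page is left to whatever tool you prefer.

```python
def parse_server_status(text):
    """Turn 'Key: value' lines from server-status?auto into a dict."""
    status = {}
    for line in text.splitlines():
        key, separator, value = line.partition(": ")
        if separator:  # skip lines that are not 'Key: value'
            status[key.strip()] = value.strip()
    return status


def worker_utilisation(status):
    """Fraction of Apache workers that are currently busy."""
    busy = int(status["BusyWorkers"])
    idle = int(status["IdleWorkers"])
    return busy / (busy + idle)


# Invented sample output, shortened to a few fields.
sample = """Total Accesses: 1024
BusyWorkers: 6
IdleWorkers: 18
Scoreboard: __W_K_"""

status = parse_server_status(sample)
print("busy workers:", status["BusyWorkers"])
print("utilisation:", worker_utilisation(status))
```

If the utilisation creeps towards 1.0 while the operating system still looks bored, the limit is probably in the web server configuration and not in the hardware.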

Note that you should monitor the operating system on all systems affected. Start the test slowly, and increase the load at a steady rate. Siege includes a tool called bombardment which does the increasing on its own.
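If you would rather control the steps yourself, the ramp can be sketched as a list of Siege command lines. This is not how bombardment works internally, just a hand-rolled alternative. The step sizes, the one-minute duration and the file name urls.txt are assumptions for the example. The script only prints the commands and doesn’t run them.

```python
def ramp_commands(start, stop, step, delay=1, duration="1M",
                  url_file="urls.txt"):
    """Build one siege command line per concurrency level.

    start/stop/step: concurrent users for the first run, the last run
                     and the increase between runs.
    delay:    value for siege's -d option, in seconds.
    duration: value for siege's -t option, e.g. "1M" for one minute.
    """
    commands = []
    for users in range(start, stop + 1, step):
        commands.append([
            "siege",
            "-c", str(users),   # concurrent users
            "-d", str(delay),   # delay between requests
            "-t", duration,     # how long this step runs
            "-f", url_file,     # URL list to replay
        ])
    return commands


# Example ramp: 5, 10, 15, 20 and 25 concurrent users.
for command in ramp_commands(5, 25, 5):
    print(" ".join(command))
```

Run the printed commands one after another and watch your monitors between the steps. The step where the numbers start to look ugly is the interesting one.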

When you’ve successfully passed a load test at twice the expected maximum load, you should make a backup of everything. Start defining sane resource limits for all processes and servers you’re using. Verify again with the expected maximum load. The results should not change.

Now run Siege with the -b option to run a benchmark. You will probably have to raise the resource limits for the shell your Siege process is running in, because the defaults don’t suffice for many, many virtual users. Wait for the first component to get killed by resource limits.

If nothing gets killed by resource limits, you either have insane hardware, or you’re not trying hard enough. If a machine runs into its hardware limits before the resource limits you have defined, it may crash (thus the backup). If you’re using a multi-tier infrastructure, make sure to also run load tests behind your caching frontend servers. This is necessary to ensure proper resource limits on your back end machines. Otherwise, a bug, runaway process or something similar can cause one of your backend machines to crash.
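Which limit you hit first depends on your system. The sketch below assumes it is the number of open file descriptors, which is a guess on my part and not something Siege documents. It raises the soft limit of the current process up to the hard limit. Child processes started from that process inherit the new value. The function names are made up for this example, and it only works on Unix-like systems.

```python
import resource


def target_soft_limit(wanted, hard, infinity=resource.RLIM_INFINITY):
    """Soft limit to request: the wanted value, capped at the hard limit."""
    if hard == infinity:
        return wanted
    return min(wanted, hard)


def raise_open_files(wanted):
    """Raise the soft open-files limit of this process if it is too low.

    Returns the (soft, hard) pair that is in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target_soft_limit(wanted, hard)
    # Only ever raise the limit, never lower it.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Show the current limits without changing anything.
print("open files (soft, hard):", resource.getrlimit(resource.RLIMIT_NOFILE))
```

Calling raise_open_files(4096) before starting the benchmark from the same script would ask for 4096 descriptors, or as many as the hard limit allows. Going past the hard limit needs root or a change in your system configuration.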

 
