ab – Apache Bench, understanding and getting tangible results.

Apache Bench (AB) is a very powerful tool when used right. I use it as a guideline for how to set up my apache2/httpd.conf files.
All too often I see people boasting that they can get an outrageous number of RPS out of AB (the Apache Bench tool).

“OMG, I totally get 3,000 rps in an AWS micro instance!!!” (I’ve seen this on the likes of Serverfault)

Debunking misunderstandings:
Concurrency != users on site at same time
Requests per second != users on site at same time

Apache Bench is meant to give a seat-of-the-pants diagnostic for the page(s) you point it at.

Every page on a website is different and may require a different number of requests to load (resources: CSS, JS, images, HTML, etc.).

Aspects of effective Apache benchmarking:

  1. Concurrent users
  2. Response time
  3. Requests

“Concurrent users” – Have you ever stopped to ask yourself: what the hell is a user, in the Apache sense? We don’t stop to think about them individually; we just think about them as a ‘request’ or as the ‘concurrent’ part of benchmarking.

When a user loads a page, their browser may start X connections to your server at the same time to load resources. This is a complex dynamic, because browsers have a built-in limit on how many concurrent requests they will make to a single host at a given time (a hackable aspect).

So at any given time, let’s say a user = 6 concurrent connections/requests.

“Response time” – What is an acceptable response time? What is a response? In the context of this article, it’s the round-trip process involved in transferring a resource. This collapses the intricacies of database queries, programming logic, etc. into a single measurable aspect to consider in your benchmarking.

Is 300ms to load the HTML output of a Drupal website acceptable? 400? 500? 600? 700?

How fast does a static asset load? What is the typical size of an asset for your webpages? 10KiB? 20?

“Requests” – Requests happen at the sub-second level and are measured in milliseconds.
Let’s say the average page has 15 resources (images, js, css files, html data, etc) – aka: 15 requests per page

This means if a single user comes to load a page, there’s a good chance his/her browser will make 15 requests total.

Another part of this aspect is being aware that the browser performs these requests in a ‘trickle’ fashion: one request to get the HTML, then an immediate burst of 6 requests (browser concurrency), with the remaining 8 happening one at a time as concurrent connections free up.


Putting aspects together:
We have to draw an understanding of how these aspects all tie together to determine the start-to-finish load of a typical page.

Let’s say a page is 15 requests (14 images/JS/CSS files plus the HTML) with an average payload of 10KB of data each – 150KB of data in total.

A user/browser makes 6 simultaneous requests (all completing at slightly different times – ideally, at nearly the same time).

Response time is the metric we’re interested in.

The questions we ask of ab are – given the current configuration and environmental conditions:

  • What’s the highest level of concurrency I can support while loading a given asset in less than X milliseconds?
  • How many requests per second can I support at that level of concurrency?

By attempting to answer these questions, we’ll derive the tipping point of the server – and what its bottlenecks are. (This article will not cover determining bottlenecks – just how to get meaning from ab.)
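For reference, the two knobs ab offers for exploring these questions are -c (how many requests to keep in flight at once) and -n (how many requests to make in total). A minimal sketch – the hostname and path below are placeholders, not the actual test target:

    ab -n 1000 -c 30 http://test-host.example.com/some-page/

The “Requests per second” line and the percentile table at the bottom of ab’s output are what answer those two questions.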

Caveats
Naturally, these seat-of-the-pants numbers are generated from a machine on the same network with plenty of elbow room, making them best-case scenarios – though with how widespread high-speed internet is today, they’re close enough. This also assumes the network can handle the throughput of the transfers involved with a page, and that all assets are hosted on the same host – nothing cross-domain, e.g. Google Analytics, Web Fonts, etc.


How to test 
First, we need to classify the load. As mentioned a few times, there are two types of site data: static files and generated files.

Static files have extremely fast response times; generated files take longer because they’re performing logic, database work and other things.

A browser doesn’t know what to load until it retrieves the HTML file containing the references to other resources – this changes how we look at timings, and in most cases, the HTML document is generated.



First
Let’s simulate a single user/browser loading a page from an idle server.
First, we must get the HTML data…
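As a sketch, a run like that is just a single request at a concurrency of one – the URL below is a placeholder for the Drupal page being tested:

    ab -n 1 -c 1 http://test-host.example.com/drupal-page/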

In this run, it took 29ms, at 299KB/s, to generate the HTML from Drupal and send it across the pipe.


Now, let’s simulate a browser at 6 concurrent connections loading 15 assets at 10KiB each.
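Again as a sketch (the asset path is a placeholder), that’s 15 requests with 6 in flight at a time:

    ab -n 15 -c 6 http://test-host.example.com/files/asset-10k.png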

Here, each request completed in 35ms or less – the whole test took about 560ms.


This means that under pristine conditions, the user would have the entire webpage loaded in 589ms.

Finding the limits…
So, looking at our numbers – it’s clear that 1 user is a piece of cake. 
It’s also clear that there are two big considerations for acceptable timings:

  1. Time to get HTML (our 1 generated request)
  2. Time to get references from that HTML (our 15 static requests)

We’re going to multiply our numbers and start pushing the server harder – let’s say by 30 times:

So once again, let’s emulate 30 users making 30 requests to our Drupal page:
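As a sketch (same placeholder URL as before), that’s 30 requests with all 30 in flight at once:

    ab -n 30 -c 30 http://test-host.example.com/drupal-page/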
 

Every user got the HTML file in ~298ms or less.


Next up, static files – 30 users = 180 concurrent connections, 450 requests.
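Sketched out against the placeholder asset, that works out to:

    ab -n 450 -c 180 http://test-host.example.com/files/asset-10k.png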



So, here’s where things start getting interesting – 2% of the requests slowed down to 600+ms each. The exact cause is out of the scope of this article – it could be IO – but regardless, the numbers are still good, and it’s clear this indicates the start of a tipping point for the host.


Take the HTML load time – 298ms.
Do some fuzzy math here: 265ms × 2.5 = 662ms (mean total for the 15 static requests, loaded 6 at a time: 15 / 6 ≈ 2.5 ‘sets’).
That comes to roughly 957ms average for the entire page to load.

This is ‘fuzzy math’ because I’m not simulating the procedural/trickle loading effect of browsers mentioned above (an initial burst of 6 connections, then keeping them busy as they free up at different times until all 15 requests are complete), but instead treating it as sets of 6 requests at a time. Someone more mathematically inclined might calculate better numbers – I use this method because it acts as my “buffer factor” for variables (bursts of activity, latency changes, etc.).

So from this data, we can say the server can sustain 30 constant users given our assumptions.

Phase two:
So now we have a ballpark figure of where our server will tip over. We’re going to run two simultaneous ab sessions to put this to the test; this simulates both worlds of content: generating content and loading the assets. This is the ‘honing’ phase, where we dial our configuration down or up in smaller increments (the two sessions are sketched just after the list below):

  • 5 minutes of load at our proposed 30 users
  • 30 concurrent connections for our generated content page.
  • 180 concurrent connections for static data.
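Run from two terminals at the same time, that could look roughly like this (URLs are placeholders; -t caps the run at 300 seconds, and since -t implies a 50,000-request cap you may want to follow it with a larger -n to get the full 5 minutes):

    ab -t 300 -c 30 http://test-host.example.com/drupal-page/
    ab -t 300 -c 180 http://test-host.example.com/files/asset-10k.png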

Ding ding ding! We’ve got a tipper!
Ok, so as you can see, for the most part everything was fairly responsive. If the system could not keep up, most of the percentiles would show awful numbers like the ones in the highest 2%. However, these numbers should still be improved upon to deal with sporadic spikes above our estimates, and other environmental factors that could sway our throughput.

How to interpret:
Discard the 100% – longest request; at these volumes it’s irrelevant.
However, the 98-99% matter – your goal should be to make the 98% fall under your target response time for the request. The 99% shows us that we’ve found our happy limit. 


(Remember, at these levels there are many, many variables involved – and the server should never reach this limit; that’s why we’re finding it out!)


Let’s tone our bench down to simulate 25 users and see what happens…
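Scaled down to 25 users, the same pair of sessions would be roughly 25 concurrent connections for the Drupal page and 150 (25 × 6) for the static asset – placeholders again:

    ab -t 300 -c 25 http://test-host.example.com/drupal-page/
    ab -t 300 -c 150 http://test-host.example.com/files/asset-10k.png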



Wrap up
25 simultaneous users may not sound like a lot, but imagine a classroom of 25 students who all click to your webpage at the exact same moment – that is the kind of load we’re talking about: having every one of those machines (under the right conditions) load the website within 1 second.


To turn that into real requests per second: 375 (25 users × 15 requests).


The configuration – workload (website code, images) and hardware (2 1GHz CPUs…) – is capable of performing this at a constant rate; these benchmarks indicate that the hardware (or configuration) should be changed before it gets to this point, to accommodate growth. They also indicate that ~430 page loads out of 21,492 will take longer than a second to load. In reality, the ebbs and flows of request peaks and valleys make these less likely to happen.


As you can see, the static files are served very fast in comparison to the generated content from Drupal.
If this Apache instance were backed by the likes of Varnish, the host would be revitalized to handle quite a bit more (depending on cache retention).


Testing hardware

  • 1x AWS EC2 Large instance on EBS – Apache host
    • 2 virtual cores with 2 EC2 Compute Units each
      • AKA: (2) 2GHz Xeon equiv
    • 7GB RAM
  • 1x AWS EC2 4X Large CPU – AB runner
    • 8 virtual cores with 3.25 EC2 Compute Units each
      • AKA: (8) 3.25GHz Xeon equiv
    • 68GB RAM

4 thoughts on “ab – Apache Bench, understanding and getting tangible results.”

  1. How did you calculate 30 users?

    1. I just pulled the number out of a hat to increase the load. It’s a long post, but a little bit up from the line you pasted I just came up with the number:

      We’re going to multiply our numbers and start pushing the server harder – let’s say by 30 times:

  2. Nice article. I’m load testing a server based on Netty. I use ab as well as wrk (nginx). I see a pattern where 99% of the requests are well under required latency, however the 100% or the max is always much much higher. You suggested discarding that. Can you please explain this behavior? Why is the 100 percentile so high?

    1. That’s a good question! I’m not sure. I could wager a guess between latency, thread contention, and the ability of the AB program to keep its stack of connections and stats updated. AB is a mere tool to get a feel for the performance given certain styles of testing, so I wouldn’t expect it to do everything perfectly.

      I could throw some ideas out for where I’d start investigating. I’d start by having the target server log the microseconds for responses in its access logs (“%D”, http://httpd.apache.org/docs/2.2/mod/mod_log_config.html) – that would indicate whether it was truly the Apache server involved (or just the AB/benchmarking box).
      Apache will do this even on static assets. It would be interesting to see if someone had the time to trace the long running processes; chances are it’ll point back to a hardware interrupt of some sort.
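      As a rough httpd.conf sketch of that suggestion (the format name here is arbitrary; %D appends the time taken to serve each request, in microseconds):

        LogFormat "%h %l %u %t \"%r\" %>s %b %D" timing
        CustomLog logs/access_log timing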

      However, for me; I simply toss them out because I just want to know the ‘near limit’ of the server. There’s too many variables involved ranging from cron, disk, network, memory and thread management, code, database, etc. The goal is (for me at least) to take that number of concurrent requests/rps and reduce it to a more conservative number to leave some wiggle room.
