Tales of an IT Nobody

devbox:~$ iptables -A OUTPUT -j DROP

Certbot on Amazon Linux without using Yum – Fix [Errno 2] No such file or directory May 18, 2019

So let’s say you’re running an aging version of Amazon Linux and don’t want to blow up your system by wedging in yum repos from distributions that aren’t quite in line with the CentOS-derived Amazon Linux.
Instructions on the web tell CentOS users to pull in Fedora or RHEL yum repos; on Amazon Linux, you’re kind of twice removed.

So, long story short, here’s some fodder for those who want the benefits of LetsEncrypt without the fluff of a repo.

My instructions will be for Apache/HTTPD, but you’ll see the key linchpin item below.

First, start by downloading Certbot by hand:
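The download snippet isn’t preserved in this copy; fetching certbot-auto by hand looks like this (dl.eff.org is the official distribution point):

  wget https://dl.eff.org/certbot-auto
  chmod a+x certbot-auto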

Second, back up your Apache HTTPD configuration:
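The backup command isn’t shown either; a simple copy of the config tree does the job (assuming the stock /etc/httpd layout):

  sudo cp -a /etc/httpd /etc/httpd.bak-$(date +%F)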

Third, test certbot-auto and let it ‘bootstrap’ dependencies:
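The test invocation isn’t preserved here; going by the ‘real’ command shown later, it would have been along these lines (--test-cert is certbot’s staging/test flag):

  sudo ./certbot-auto --debug --apache --test-cert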
**An error is likely here.**

Error from certbot – “creating virtual environment” fails with “No such file or directory”:
After running the command above, you may see this error once certbot has installed its dependencies:
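The full output isn’t reproduced here, but it ends with the error from the title, along these lines:

  Creating virtual environment...
  ...
  [Errno 2] No such file or directory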

After some searching, I’ve found that this is really easy to solve!

To fix, upgrade pip and REMOVE virtualenv:
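The fix commands themselves aren’t shown above; with the system Python’s pip, that amounts to roughly:

  sudo pip install --upgrade pip
  sudo pip uninstall virtualenv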

Now you’ll see that certbot works like a champ!

Once you’ve established a working test configuration with certbot – you should see a LetsEncrypt test certificate on your site – it’s time to run the real command without the --test-cert flag:

certbot-auto --debug --apache

If all goes well, you’ll have a completely valid and proper SSL certificate for free via LetsEncrypt!

I won’t cover the automation aspect as there are already endless write-ups on how to do that.

Categories: apache servers

ab – Apache Bench, understanding and getting tangible results. December 3, 2011

Apache Bench (AB) is a very powerful tool when used right. I use it as a guideline for how to set up my apache2/httpd.conf files.
All too often I see people boasting that they can get an outrageous number of RPS in AB.

“OMG, I totally get 3,000 rps in an AWS micro instance!!!” (I’ve seen this on the likes of Serverfault)

Debunking misunderstandings:
Concurrency != users on site at same time
Requests per second != users on site at same time

Apache Bench is meant to give a seat-of-the-pants diagnostic for the page/s you point it to.

Every page on a website is different and may require a different number of requests to load (resources: CSS, JS, images, HTML, etc.).

Aspects of effective Apache benchmarking:

  1. Concurrent users
  2. Response time
  3. Requests

“Concurrent users” – Have you ever stopped to ask yourself: what the hell is a user, in the Apache sense? We don’t stop to think about users individually; we just think of them as a ‘request’ or as the ‘concurrent’ part of benchmarking.

When a user loads a page, their browser may start X connections at the same time to your server to load resources. This is a complex dynamic, because browsers have a built-in limit on how many concurrent requests to make to a host at a given time (a hackable aspect).

So at any given time, let’s say a user = 6 concurrent connections/requests.

“Response time” – What is an acceptable response time? What is a response? In the context of this article, it’s the round trip process involved with the transfer of a resource. This summarizes the intricacies of database queries and programming logic, etc into a measurable aspect for consideration in your benchmarking.

Is 300ms to load the HTML output of a Drupal website acceptable? 400? 500? 600? 700?

How fast does a static asset load? What is the typical size of an asset for your webpages? 10KiB? 20?

“Requests” – Requests happen at the sub-second level, measured in milliseconds.
Let’s say the average page has 15 resources (images, js, css files, html data, etc) – aka: 15 requests per page

This means if a single user comes to load a page, there’s a good chance his/her browser will make 15 requests total.

Another part of this aspect is being aware that the browser performs these requests in a ‘trickle’ fashion: one request to get the HTML, then an instant 6 requests (browser concurrency), with the remaining 8 happening one at a time as concurrent connections free up.


Putting aspects together:
We have to draw an understanding of how these aspects all tie together to determine the start-to-finish load of a typical page.

Let’s say a page is 15 requests (14 images/js/css files and HTML) with an average payload of 10KB of data each – 150KB of data total.

A user/browser makes 6 simultaneous requests (all completing at slightly different times – ideally, at nearly the same time).

Response time is the metric we’re interested in.

The questions we ask of ab are – given the current configuration and environmental conditions:

  • What’s the highest level of concurrency I can support while loading a given asset in less than X milliseconds?
  • How many requests per second can I support at that level of concurrency?

By attempting to answer these questions, we’ll derive the tipping point of the server – and what its bottlenecks are. (This article will not cover determining bottlenecks – just how to get meaning from ab.)

Caveats
Naturally these seat-of-the-pants numbers are generated on a machine on the same network with plenty of elbow room – making them best-case scenarios – though in today’s world that’s close enough, given how widespread high-speed internet is. It also assumes the network can handle the throughput of the transfers involved with a page, and that all assets are hosted on the same host – nothing cross-domain, e.g. Google Analytics, web fonts, etc.


How to test 
First, we need to classify the load. As mentioned a few times, there are two types of site data: static files and generated files.

Static files have extremely fast response times; generated files take longer because they’re performing logic, database work and other things.

A browser doesn’t know what to load until it retrieves the HTML file containing the references to other resources – this changes how we look at timings, and in most cases, the HTML document is generated.



First
Let’s simulate a single user/browser loading a page from an idle server.
We must first get the HTML data…
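The ab command and its output aren’t reproduced in this copy; a single-user fetch of the generated page looks like this (the hostname is a stand-in):

  ab -n 1 -c 1 http://test.example.com/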

So you can see, this took 29ms, at 299KB/s to generate the HTML from Drupal and send it across the pipe.


Now, let’s simulate a browser at 6 concurrent connections loading 15 assets at 10KiB each.
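Again, the exact invocation isn’t shown; pointing ab at a representative ~10KiB static asset (the path is a stand-in) with browser-like concurrency would be:

  ab -n 15 -c 6 http://test.example.com/files/sample-10k.png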

So here you can see, each request completed in 35ms or less – the whole test took about 560ms.


This means that under pristine conditions, the user would have the entire webpage loaded in 589ms.

Finding the limits…
So, looking at our numbers – it’s clear that 1 user is a piece of cake.
It’s also clear that there are two big considerations for acceptable timings:

  1. Time to get HTML (our 1 generated request)
  2. Time to get references from that HTML (our 15 static requests)

We’re going to multiply our numbers and start pushing the server harder – let’s say by 30 times:

So once again, let’s emulate 30 users making 30 requests to our Drupal page:
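The original command isn’t preserved; something along these lines (URL is a stand-in):

  ab -n 30 -c 30 http://test.example.com/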
 

Every user got the HTML file in ~298ms or less.


Next up, static files – 30 users = 180 concurrent connections, 450 requests.
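In ab terms, roughly (same stand-in asset as before):

  ab -n 450 -c 180 http://test.example.com/files/sample-10k.png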



So, here’s where things start getting interesting – 2% of the requests slowed down to 600+ms per request. The exact cause of that is outside the scope of this article – could be IO – but regardless, the numbers are still good, and it’s clear this indicates the start of a tipping point for the host.


Take the HTML load time – 298ms.
Do some fuzzy math here: 265ms * 2.5 = 662ms (mean total).
957ms average for the entire page to load.

This is ‘fuzzy math’ because I’m not simulating the procedural/trickle loading effect of browsers mentioned above (an initial burst of 6 connections, keeping them busy as they free up at different times to complete the 15 requests), but instead treating it as sets of 6 requests. Someone more mathematically inspired might calculate better numbers – I use this method because it’s my “buffer factor” for variables (bursts of activity, latency changes, etc.).

So from this data, we can say the server can sustain 30 constant users given our assumptions.

Phase two:
So now we have a ballpark figure of where our server will tip over. We’re going to perform two simultaneous ab sessions to put this to the test – this simulates both worlds of content: generating the content and loading the assets (see the sketch after the list below). This is the ‘honing’ phase, where we dial our configuration down/up by smaller increments.

  • 5 minutes of load at our proposed 30 users
  • 30 concurrent connections for our generated content page.
  • 180 concurrent connections for static data.
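The original commands aren’t preserved; using ab’s -t time limit, the two sessions (run in separate terminals, URLs are stand-ins) would look roughly like:

  # terminal 1 – generated content, 30 concurrent, 5 minutes
  ab -t 300 -n 100000 -c 30 http://test.example.com/

  # terminal 2 – static assets, 180 concurrent, 5 minutes
  ab -t 300 -n 500000 -c 180 http://test.example.com/files/sample-10k.png

(-t by itself implies a 50,000-request cap; the large -n after it raises that cap so the five-minute timer governs the run.)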

Ding ding ding! We’ve got a tipper!
Ok, so as you can see, for the most part everything was fairly responsive. If the system could not keep up, most of the percentiles would show awful numbers like the ones in the highest 2%. However, overall these numbers should be improved upon to deal with sporadic spikes above our estimates and other environmental factors that can sway our throughput.

How to interpret:
Discard the 100% – longest request; at these volumes it’s irrelevant.
However, the 98-99% matter – your goal should be to make the 98% fall under your target response time for the request. The 99% shows us that we’ve found our happy limit. 


(Remember, at these levels there are many, many variables involved – and the server should never actually reach this limit; that’s why we’re finding it out!)


Let’s tone our bench down to 25 users and see what happens…
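Scaled down to 25 users, the same pair of sessions would be roughly:

  ab -t 300 -n 100000 -c 25 http://test.example.com/
  ab -t 300 -n 500000 -c 150 http://test.example.com/files/sample-10k.png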



Wrap up
25 simultaneous users may not sound like a lot, but imagine a classroom of 25 students all clicking to your webpage at the exact same moment – this is the kind of load we’re talking about: having every one of those machines (under correct conditions) load the website within 1 second.


To turn that into a real requests-per-second figure: 375 (25 users @ 15 requests).


The configuration – workload (website code, images) and hardware (2 1Ghz CPU’s…) – is capable of performing this at a constant rate. These benchmarks indicate that the hardware (or configuration) should be changed before it gets to this point, to accommodate growth; they also indicate that ~430 pageloads out of 21,492 will take longer than a second to load. In reality, the ebbs and flows of request peaks and valleys make these less likely to happen.


As you can see, the static files are served very fast in comparison to the generated content from Drupal.
If this Apache instance was backed by the likes of Varnish, the host would be revitalized to handle quite a bit more (depending on cache retention).


Testing hardware

  • 1x AWS EC2 Large instance on EBS – Apache host
    • 2 virtual cores with 2 EC2 Compute Units each
      • AKA:  (2) 2Ghz Xeon equiv
    • 7GB RAM
  • 1x AWS EC2 4X Large CPU – AB runner
    • 8 virtual cores with 3.25 EC2 Compute Units each 
      • AKA: (8) 3.25Ghz Xeon Equiv
    •  68GB RAM

Tuning apache directory indexes June 11, 2009

Are you a fan of Options +Indexes like I am?

There are a few tweaks you can apply to this feature to make it behave more like you want.
Throw a gander at the IndexOptions directive documentation for the fine details.

You can place these in a site configuration, or if allowed, in a .htaccess file.

Notable options:

Make sure you pay attention when using the FancyIndexing option, since an unprefixed keyword resets the options set before it, e.g.:
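That example didn’t survive here; the gotcha looks like this (FoldersFirst is just an arbitrary option for illustration):

  # +FoldersFirst is discarded – the bare FancyIndexing keyword clears
  # any incremental (+/-) options set before it:
  IndexOptions +FoldersFirst
  IndexOptions FancyIndexing

  # Prefix everything to keep the earlier options:
  IndexOptions +FoldersFirst +FancyIndexing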

Categories: apache servers

Running parallel versions of PHP – Part 2 May 14, 2009

The hard part is over, now for the easier stuff – for this part you’ll need:

  • Coffee
  • Patience

Now we compile and install:
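The commands themselves aren’t shown in this copy; it’s the usual pair (drop sudo if you’re building as root):

  make
  sudo make install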

Now if all goes well – the prefix you used in your configure (--prefix=/whatever)
will have the following directories in it:
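That listing isn’t preserved either; a prefix install of PHP typically leaves something like this (the exact set varies with your configure switches):

  $ ls /home/rovangju/misc/php53
  bin  include  lib  man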

The file we care about the most is located in the bin/ directory – called php-cgi.

Apache
First we need to enable the action module in Apache if it isn’t already:
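On the Debian-style layout this post assumes (apache2.conf, sites-enabled/), that’s:

  sudo a2enmod actions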

Now we need to modify either a specific config file in the sites-enabled/ directory or your actual apache2.conf; add this line where you deem appropriate:
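The line itself didn’t survive this copy; given the note below about keeping /bin on the end, it was presumably a ScriptAlias pointing at the new PHP’s bin directory, something like:

  ScriptAlias /php53-cgi /home/rovangju/misc/php53/bin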

* Make sure you change “/home/rovangju/misc/php53” to wherever you set your --prefix=/whatever/ value. Leave the /bin at the end.

Next, simply reload apache: /etc/init.d/apache2 force-reload – if done right, you shouldn’t get any errors.

We’re almost there!

Ok, now for the magic part. For demonstrative purposes, make the following file structure in a browser-visible spot on your server:
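The structure listing isn’t shown here; judging from the .htaccess path referenced below, it was along the lines of:

  mkdir -p php_test/php53
  touch php_test/index.php php_test/php53/index.php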

Fill the index.php files with phpinfo(); – the idea is just to show the PHP version.

Now place this into php_test/php53/.htaccess:
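The snippet itself isn’t preserved; a matching AddHandler/Action pair would look like this (the handler name is arbitrary, and /php53-cgi assumes the ScriptAlias sketched earlier):

  AddHandler application/x-httpd-php53 .php
  Action application/x-httpd-php53 /php53-cgi/php-cgi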

This will re-map all .php files under that path to the newly compiled php-cgi file I mentioned earlier.

Now point a browser to both of those index files and hopefully you should see the version differences.

Troubleshooting notes:

  • Depending on where you actually put these files – the .htaccess may not be allowed to override the server defaults.
  • Your Apache error log is your friend !
  • If you get the timezone error like I did for strtotime, you can set date.timezone in your php.ini
  • Try restarting Apache once more – sometimes .htaccess files can be feisty.
Categories: apache php

Running parallel versions of PHP – Part 1

With RC2 of the php5.3 release coming out, I wanted to run the new version alongside the stable version on our development server at M State, controlling that behavior per-project via .htaccess files.

What you’ll need:

  • Reasonable compiling experience
  • Ability to satisfy dependencies
  • Understanding of, and access to, your httpd.conf, and the ability to reload/restart Apache.

Do NOT compile now unless you completely understand how to install to a directory other than the default!

Start off by downloading and extracting the source, then cd into the extracted directory.
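In shell terms (the version number is just an example – grab the tarball from php.net first):

  tar xzf php-5.3.0RC2.tar.gz
  cd php-5.3.0RC2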

Getting ready to configure:

Modules:
Before you configure, make sure you know which modules you want.
These are enabled by the myriad of --with-gd type switches.
If you don’t know what modules you have, run phpinfo(); and look for the configure string – OR you can try the command line tool php-config (or php5-config) if it’s been installed.
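For example, either of these shows the switches your current PHP was built with:

  php -i | grep "Configure Command"
  php-config --configure-options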

Prefix (*important*)
You’re not going to want PHP to be installed into the default path – that would overwrite your stable php5. Instead you want it to go into a custom directory, which you can do by setting the --prefix option when you run configure.

Example: ./configure --prefix=/home/rovangju/misc/php53

Using current php.ini (Optional, recommended)
Change the paths to wherever your php.ini file is located. If you’re running an older php and don’t have the conf.d folder, don’t use that line. This step may preserve any nuances set up in your php.ini files.

--with-config-file-path=/etc/php5/apache2 \
--with-config-file-scan-dir=/etc/php5/apache2/conf.d \

Configuration summary
Now we’re ready to run the gauntlet of trying to configure and build. Expect to fulfil dependencies. At a minimum our configure will look like this: ./configure --prefix=/path/you/want

When your ./configure complains that a library isn’t found – this is the least fun part, where you have to install the dev libraries for those modules. So if you’re getting an error about libxml2, you’ll likely want to install libxml2-dev. If you use apt, this can be handy: apt-cache search libxml2 | grep "dev" – once you’ve installed it, try the configure again. It took me about half an hour of configuring and installing the dev libs…

You can see my configure string as a comment to this post for example.

Continue to Part 2 when your configure doesn’t throw errors!
Categories: apache php