Tales of an IT Nobody

devbox:~$ iptables -A OUTPUT -j DROP

Commando style: triage dashboard May 20, 2015

If you’re working on a foreign system, or one that doesn’t have the bells and whistles that make you feel at home, sometimes you need to improvise tools on the spot by chaining together commands, etc.

This little one-snip serves as a "dashboard" approach for quickly assessing resource consumption.
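Something along these lines works as a sketch – the exact commands (and the 2-second interval) are just illustrative, so chain in whatever the box has on hand:

  watch -n 2 'uptime; echo; free -m; echo; df -h; echo; ps aux --sort=-%cpu | head -n 6'

One terminal, one command: load, memory, disk and the top CPU hogs, refreshing every couple of seconds.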

 

 

No Comments on Commando style: triage dashboard
Categories: servers tools

How to use PHPUnit installed by composer in PHPStorm April 5, 2015

Ever wonder how to properly use those packages installed from the require-dev section of composer.json?

Ideally you'd integrate them with your IDE, or perhaps set up your system path to access them via vendor\bin\phpunit. If you use PHPUnit, take a quick look at this on how to properly set up PHPUnit in PHPStorm on a per-project basis (because not all projects use the same PHPUnit version).
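As a quick sanity check from the project root (forward slashes on *nix, vendor\bin\phpunit on Windows), the composer-managed binary is the one you want PHPStorm pointed at:

  composer require --dev phpunit/phpunit
  vendor/bin/phpunit --version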

No Comments on How to use PHPUnit installed by composer in PHPStorm

Composer and getting to vendor/bin March 22, 2015

Want to stop typing ‘vendor\bin\toolname’ to access tools like PHPUnit, phpcs, etc when installed through composer?

It’s a simple process really – merely add “vendor\bin” into your PATH variable and profit! (as long as you’re running the command from the project root).
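For the bash crowd, a rough equivalent – note the relative entry only resolves while you're sitting in the project root, which is the caveat above:

  # add composer's bin dir to PATH (relative, so it only works from the project root)
  export PATH="vendor/bin:$PATH"
  phpunit --version
  phpcs --version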

No Comments on Composer and getting to vendor/bin
Categories: php programming tools

PHPCS custom standards and PHPStorm integration March 9, 2015

At about 6 minutes long, this screencast I threw together shows a method for incorporating your custom PHP CodeSniffer standards into your project workflow when using Composer. Essentially it covers the convenience of putting your standards into a Composer package and adding a wrapper to 'extend' the PHPCS shell/batch script so it automatically detects your custom standards, without having to install them system-wide in your development environment. *Best viewed in full screen*
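The idea of the wrapper boils down to a couple of lines of shell (a sketch – the package and standard names below are made up, and the screencast's version may differ):

  #!/bin/sh
  # thin wrapper committed with the project: point phpcs at the standards
  # pulled in by composer instead of a system-wide install
  exec vendor/bin/phpcs --standard=vendor/acme/coding-standard/Acme "$@"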

No Comments on PHPCS custom standards and PHPStorm integration
Categories: php programming tools

Using Logstash to log SMTP/email bounces like a boss July 26, 2013

I've recently worked on a customized emailing suite for a client that involves bulk email (shudder) and thought I'd do a write-up on a few things that I thought were slick.

Originally we decided to use AWS SES but were quickly kicked off of the service because my client doesn’t have clean enough email lists (a story for another day on that).

The requirement for the email suite was pretty much the same as what you'd expect from SendGrid, except the solution was a bit more tailored to my client. Dare I say I came up with an AWESOME way of dealing with creating templates? Trade secret.

Anyways – when the dust settled after the initial trials of the system, we were without a way to deliver bulk emails and track the SMTP/email bounces. After scouring for pre-canned solutions there wasn't a whole lot to pick from. There were some convoluted modifications for Postfix and other knick-knacks that didn't lend themselves well to tracking the bounces effectively (or weren't implementable in a sane amount of time).

 

Getting at the bounces…

At this point in the game, knowing what bounced can come from only one place: the maillog from postfix. Postfix is kind enough to record the Delivery Status Notification (‘DSN’) in an easily parsable way.
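A quick way to eyeball them straight from the log (the log path varies by distro):

  # each bounce line carries the postfix queue ID and a dsn= field
  grep 'status=bounced' /var/log/maillog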

 

Pairing a bounce with database entries…

The application architecture called for very atomic analytics, so every email that's sent is recorded in a database with a generated ID. This ID links click events and open reports on a per-email basis (run-of-the-mill email tracking). To make the log entries sent from the application distinguishable, I changed the message-id header to the generated SHA1 ID – this lets me pick out which message IDs are from the application, and which ones are from other sources in the log.

There’s one big problem though:

Postfix uses its own queue IDs to track emails – these are the first 10 hex digits of the log entry. So we have to perform a lookup as such:

  1. Determine the queue ID from the application-generated message-id log entry
  2. Find the queue ID log entry that contains the DSN

 

(Those of you who know where this is going – skip the next paragraph).

This is a problem because we can't do both lookups at the same time. The time when a DSN comes in is variable – so this would lead to a LOT of grepping and file processing: one pass to get the queue ID, another to find the DSN – if it's even been recorded yet.

 

The plan…

The plan is as follows with the above in mind:

  • Continuously trail the log file
  • Any entries matching message-id=<hash regex> – RECORD: messageId, queueId to a database table ('holding tank').
  • Any entries with a DSN – RECORD: queueId, dsn to holding tank table
  • Periodically run a script that will:
    • Process all rows that have a DSN and a messageId that is NOT NULL. (This gives us all messages that were sent from our application and have DSNs recorded.)
    • Delete all rows that do NOT match that criteria AND are older than 30 days (they're a lost cause and hogging up space in the table).

Our database table looks like this:
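(As a rough sketch – the database name, column names and sizes here are illustrative, not the exact schema:)

  mysql emailer -e "CREATE TABLE holding_tank (
    queueId   CHAR(10)    NOT NULL,   -- postfix queue ID from the maillog
    messageId CHAR(40)    NULL,       -- application-generated SHA1 message-id
    dsn       VARCHAR(16) NULL,       -- delivery status notification code
    createdAt TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (queueId)
  )"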

Making the plan stupid simple, enter: Logstash

We use Logstash where I work for log delivery from our web applications. In my experience, Logstash has a ton of potential for exactly this kind of work: progressive tailing of logs, plus so many built-in inputs, outputs and filters that it was a no-brainer.

So I set up Logstash and three stupid-simple scripts to implement the plan.

Hopefully it's self-explanatory what the scripts take for input – and where they're putting that input (see the holding tank table above).

Setting up logstash, logical flow:

Logstash has a handy notion of 'tags' – you can have the system's elements (inputs, outputs, filters) act on data fragments that are tagged when they match certain criteria.

So I created the following configuration logic:

Input:
/path/to/maillog

Tags:

  • bounce – matches log entries containing a status of ‘bounced’ – (not interested in success or defer)
  • send – matches log entries matching the message-id regex for the application-generated ID format

Output:

  • EXEC: LogDSN.php – for ‘bounce’ tags only
  • EXEC: LogOutbound.php – for ‘send’ tags only 

 

Full config file (it's up to you to look at the Logstash docs to determine what directives like grep, kv and grok are doing):
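Roughly, the shape of it is below – option names shifted a fair bit between Logstash versions, so treat the specifics here as assumptions and lean on the docs for your release:

  input {
    file {
      type => "postfix"
      path => "/path/to/maillog"
    }
  }

  filter {
    # tag application sends: message-id set to the 40-char sha1 (regex is an assumption)
    grep {
      type    => "postfix"
      match   => [ "@message", "message-id=<[0-9a-f]{40}>" ]
      drop    => false
      add_tag => [ "send" ]
    }
    # tag bounces
    grep {
      type    => "postfix"
      match   => [ "@message", "status=bounced" ]
      drop    => false
      add_tag => [ "bounce" ]
    }
    # grok/kv go here to break out the queue ID, message-id and dsn fields
  }

  output {
    # hand the parsed fields to the two scripts from the plan above
    exec { tags => [ "send" ]   command => "php /path/to/LogOutbound.php" }
    exec { tags => [ "bounce" ] command => "php /path/to/LogDSN.php" }
  }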

 

Then the third component – the result recorder – runs on a cron schedule and simply queries records where the message ID and DSN are not null. The mere presence of the message ID indicates the email was sent from the application; the DSN means there's something for the result recorder script to act on.
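Concretely, the recorder's queries boil down to something like this (table and database names follow the sketch above; the cron cadence is arbitrary):

  # pick up everything that's ready to be acted on
  mysql emailer -e "SELECT queueId, messageId, dsn FROM holding_tank
                    WHERE messageId IS NOT NULL AND dsn IS NOT NULL"
  # housekeeping: purge the lost causes
  mysql emailer -e "DELETE FROM holding_tank
                    WHERE (messageId IS NULL OR dsn IS NULL)
                      AND createdAt < NOW() - INTERVAL 30 DAY"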

 

Simple overview

* The way I implemented this would change depending on the scale – but for this particular environment, executing scripts instead of putting things into an AMQP channel or the like was acceptable.

1 Comment on Using Logstash to log SMTP/email bounces like a boss

My Google exit strategy March 14, 2013

I’ve been saying this for a long time now, Google can’t be trusted. I think it’s becoming commonplace in other blogs to start talking about having an exit strategy. I’ve been planning for a while (starting with the removal of my blog from blogspot).

The Google products I’ve unfortunately come to rely on:

  • Gmail (personal, and business)
  • Calendar
  • Drive
  • Reader
  • Analytics
  • Google Charts API
  • Google Web Fonts API

There's something to be said about the flighty nature of these 'free' services. While Google and Yahoo have very long track records for their services, I don't care for Yahoo's UI, and Google is obviously the reason this is all in question anyways. Lucky for me, I have -very- few sites utilizing Google APIs because I've seen this coming from a mile away. That said, replacing Google Charts will be a bear because graphs are a PITA to code.


Gmail, Google Calendar, Reader:

To replace Gmail, Google Calendar and Google Reader, I will be using Microsoft Hosted Exchange. At $4/mo I can get push email to my phone from multiple domains and aliases (which is what my email structure is anyways), peace of mind that Microsoft isn't using my email to target advertisements at me, and Outlook (admit it, Outlook is superior at what it does). Outlook supports RSS feeds, and I don't need to go into detail on what Exchange offers for sharing, etc.

Drive:

This one is easy, because I naturally do this anyways: Amazon S3. Dirt cheap for small file management; I don't need the web UI to edit files; and IF I needed that, I would upgrade to the $6/mo plan for Live 365 for online office. There's a myriad of programs that make it easy to make Amazon S3 a part of your workflow: Firefox's "S3Fox" plugin and the Cyberduck app make it painless.

Analytics:

I'm not worried, because free analytics apps are a dime a dozen and they all do just as good a job. Don't believe me? Look!

Google charts API:

Time to man up and buy a Highcharts license. If you do web design you can build that price in per-client, or bite the bullet and get a developer version. Remember: how much is it worth NOT having to go crawling to your client telling them their graphs won't work anymore because YOU used a Google API that may be retired god-knows-when?

Google web fonts API:

To be honest, the only reason this is in use is pure laziness. In that I’m too lazy to DOWNLOAD the stupid .woff files and host them myself. No real issue here, just some busy footwork to remove the dependency.

No Comments on My Google exit strategy
Categories: google tools

MySQL command line – zebra stripe admin tool February 7, 2013

I came up with a cool usage for the zebra stripe admin tool.  In MySQL you can set a custom pager for your MySQL CLI output; so one can simply set it to the zebra stripe tool and get the benefit of alternated rows for better visual clarity.

Something like ‘PAGER /path/to/zebra’ should yield you results like the image below.

Zebra stripe tool used as a MySQL pager


 

You can always adjust the script to skip more lines before highlighting; you can also modify it if you’re savvy to the color codes to just set the font color instead of the entire background (which may be preferable but not a ‘global’ solution so the script doesn’t do it).
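For reference, setting and clearing the pager from the MySQL prompt looks like this – and if you don't have the zebra script handy, a one-line awk stand-in (illustrative, not the original tool) does roughly the same trick:

  mysql> pager /path/to/zebra
  mysql> SELECT user, host FROM mysql.user;
  mysql> nopager

  # rough awk equivalent: highlight every other line
  mysql> pager awk 'NR % 2 == 0 { printf "\033[47;30m%s\033[0m\n", $0; next } { print }'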

1 Comment on MySQL command line – zebra stripe admin tool
Categories: linux mysql servers tools

Atlassian Fisheye starter license and 10 committer limit November 28, 2012

The problem with Atlassian Fisheye starter license:

I love using Atlassian Fisheye at work. It's a very nice frill to have for a small team, especially since it saves us time and adds a very easy, fast way to document the reviews and be open about feedback.

I have one gripe however: the 10 committer limit (5 repos is bad enough). Our team has 4 developers – so we're _technically_ 4 committers.

When we first started to use source control (Mercurial), our system setups had inconsistencies in usernames: "Justin Rovang", "RovangJu" and "rovangju" are all treated as unique usernames. Add to that the fact that after we converted from Hg to Git, all of the email addresses associated with those turned into <devnull@localhost> thanks to the conversion script.

Git is sensitive to username AND email address for unique users. So our new setups would be 'Justin Rovang <justin.rovang@domain.com>', but the converted history would have 'Justin Rovang <devnull@localhost>'. It's easy to see how even a small team could exceed that 10-committer limit very fast in that circumstance.

Enter the .mailmap file:

So here's the rundown: first you need to know what to map to what, so take an inventory of all of the incorrect/outdated usernames that should point to a more modern/recent one. To do that I used this one-liner:
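Something along these lines does the inventory (the exact format string is a matter of taste):

  git log --format='%aN <%aE>' | sort | uniq -c | sort -rn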

That provides an output like so:

I want to map those according to the page linked in the subtitle above; so here’s an example .mailmap entry:
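Using the identities from this post as the example – canonical name/email on the left, the variant to fold in on the right:

  Justin Rovang <justin.rovang@domain.com> Justin Rovang <devnull@localhost>
  Justin Rovang <justin.rovang@domain.com> RovangJu <devnull@localhost>
  Justin Rovang <justin.rovang@domain.com> rovangju <devnull@localhost>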

You can verify the results by running the command above again (git log --format … etc); and you'll see that the list has changed. This applies to -ALL- git log output, and therefore fixes the '10 committers' issue I was having with Atlassian Fisheye and Crucible.

No Comments on Atlassian Fisheye starter license and 10 committer limit
Categories: git rant tools

Zend Studio + PHPUnit upgrade: A faster method May 11, 2012

As a counterpart and refinement of my previous how-to, this update shows how to update the PHPUnit library faster than using the symlink method:

View in HD!

No Comments on Zend Studio + PHPUnit upgrade: A faster method

Mercurial (hg) checkstyle hook, at last! May 7, 2012

As far as I can tell, there's not much in the way of checkstyle hooks for Mercurial.

There’s a lot of hits for git and SVN, but not much for Mercurial.

Check it out in my ‘hg-checkstyle-hook‘ bitbucket repo.

I thought I'd share my (imperfect) rendition of a Mercurial checkstyle hook. It's meant to be set up as a pretxnchangegroup hook (a minimal hgrc wiring sketch follows the list below).

Basically it does this:

  • Find what files have changed from the beginning of the changegroup to the tip
  • Copy those files to a staging directory in /tmp
  • Run PHPCS ( PHP_CodeSniffer, a PHP checkstyle command) on those files specifically
  • Provide a report on any violations, resulting in a non-zero exit code.
  • The script should be configurable for any checkstyle command, as long as it takes a space-delimited list of files at the end of its arguments.
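Wiring it up is just an hgrc entry on the repo that receives the pushes – the path below is a placeholder for wherever you drop the script:

  [hooks]
  pretxnchangegroup.checkstyle = /path/to/hg-checkstyle-hook-script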
No Comments on Mercurial (hg) checkstyle hook, at last!

Why I won’t (can’t) adopt Google Chrome yet… January 10, 2012

Privacy aside, simply put: in my role I do my fair share of design work, AJAX debugging, CSS, you name it – I need tools at my fingertips to quickly do more than just rip apart the DOM of a page. These are my deal-breaker extensions/capabilities that aren't in Chrome:

Dealbreakers:

1. Web Developer Toolbar – Session toggle, disable/enable cache
Chrome has no way to turn the cache on/off at the click of a button. The closest thing I have found is to create an icon that has a switch in the launch parameters. Another biggie for me is to clear a specific set of session cookies for a domain instead of all of them. The Chrome version of Web Developer Toolbar completely lacks these options.

 2. Selenium IDE
Only Firefox has the Selenium IDE plugin; for those of us who perform automation or frequently check forms for SQL injection and other issues, there are a few alternatives out there for Chrome, but none as extensive as Selenium (you can also reuse the IDE tests for Selenium RC).

3. S3Fox (or equiv.)

4. View image info, without a darn plugin… (’nuff said; even IE has it!)

Google Chrome IS the browser of the future; it's still not quite there yet for me…

No Comments on Why I won’t (can’t) adopt Google Chrome yet…
Categories: google purdy tools

ab – Apache Bench, understanding and getting tangible results. December 3, 2011

Apache Bench (AB) is a very powerful tool when used right. I use it as a guideline for how to set up my apache2/httpd.conf files.
All too often I see people boasting that they can get an outrageous number of RPS in AB (the Apache Bench tool).

“OMG, I totally get 3,000 rps in an AWS micro instance!!!” (I’ve seen this on the likes of Serverfault)

Debunking misunderstandings:
Concurrency != users on site at same time
Requests per second != users on site at same time

Apache Bench is meant to give a 'seat-of-the-pants' diagnostic for the page/s you point it to.

Every page on a website is different; and may require a different number of requests to load (resources: css, js, images, html, etc).  

Aspects of effective Apache benchmarking:

  1. Concurrent users
  2. Response time
  3. Requests

“Concurrent users” – Have you ever stopped to ask yourself: What the hell is a user? (in the Apache sense) We don’t stop to think about them individually, we just think about them as a ‘request’ or the ‘concurrent’ part of benchmarking.

When a user loads a page, their browser may start X connections at the same time to your server to load resources. This is a complex dynamic, because browsers have a built in limit of how many concurrent requests to make to a host at a given time (a hackable aspect). 

So at any given time, let’s say a user = 6 concurrent connections/requests.

“Response time” – What is an acceptable response time? What is a response? In the context of this article, it’s the round trip process involved with the transfer of a resource. This summarizes the intricacies of database queries and programming logic, etc into a measurable aspect for consideration in your benchmarking.

Is 300ms to load the HTML output of a Drupal website acceptable? 400? 500? 600? 700?

How fast does a static asset load? What is the typical size of an asset for your webpages? 10KiB? 20?

"Requests" – Requests happen at the sub-second level, measured in milliseconds.
Let’s say the average page has 15 resources (images, js, css files, html data, etc) – aka: 15 requests per page

This means if a single user comes to load a page, there’s a good chance his/her browser will make 15 requests total.

Another part of this aspect is to be aware that the browser will perform these requests in a ‘trickle’ fashion, meaning one request to get the HTML, then an instant 6 requests (browser concurrency) but the next 8 will happen one at a time once concurrent connections free up.


Putting aspects together:
We have to draw an understanding of how these aspects all tie together to determine the start-to-finish load of a typical page.

Let's say a page is 15 requests (14 images/js/css files and the HTML) with an average payload of 10KB of data each. 150KB of data.

A user/browser makes 6 simultaneous requests (All completing at slightly different times, ideally, at the same time).

Response time is the metric we’re interested in.

The questions we ask of ab are – given the current configuration and environmental conditions:

  • What’s the highest level of concurrency I can support loading a given asset in less than X milliseconds
  • How many requests per second can I support at that given level of concurrency

By attempting to answer these questions, we'll derive the tipping point of the server – and what its bottlenecks are. (This article will not cover determining bottlenecks – just how to get meaning from ab.)

Caveats
Naturally these seat-of-the-pants numbers are generated on a machine on the same network with plenty of elbow room – making them best-case scenarios – but in today's world it's close enough given how widespread high-speed internet is. It also assumes the network can handle the throughput of the transfers involved with a page, and that all assets are hosted on the same host – nothing cross-domain, e.g.: Google Analytics, Web Fonts, etc.


How to test 
First, we need to classify the load. As mentioned a few times, there’s two types of site data: static files, and generated files.

Static files have extremely fast response times; generated files take longer because they're performing logic, database work and other things.

A browser doesn’t know what to load until it retrieves the HTML file containing the references to other resources – this changes how we look at timings, and most cases, the HTML document is generated.



First
Let’s simulate a single user/browser loading a page from an idle server.
First, we must get the HTML data…
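Something like the following, where the URL is a placeholder for the Drupal page under test:

  ab -n 1 -c 1 http://test-server/drupal-page/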

So you can see, this took 29ms, at 299KB/s to generate the HTML from Drupal and send it across the pipe.


Now, let’s simulate a browser at 6 concurrent connections loading 15 assets at 10KiB each.
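Again as a sketch – 15 requests, 6 at a time, against a ~10KiB static asset (the path is a placeholder):

  ab -n 15 -c 6 http://test-server/files/asset-10k.png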

So here you can see, each request completed at 35ms or less – The whole test took about 560 ms.


This means that under pristine conditions, the user would have the entire webpage loaded in 589ms.

Finding the limits…
So, looking at our numbers – it’s clear that 1 user is a piece of cake. 
It’s also clear that there’s two big considerations into acceptable timings:

  1. Time to get HTML (our 1 generated request)
  2. Time to get references from that HTML (our 15 static requests)

We’re going to multiply our numbers and start pushing the server harder – let’s say by 30 times:

So once again, let’s emulate 30 users making 30 requests to our Drupal page:
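In ab terms, roughly:

  ab -n 30 -c 30 http://test-server/drupal-page/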
 

Every user got the HTML file in ~298ms or less.


Next up, static files – 30 users = 180 concurrent connections, 450 requests.
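Which maps to something like:

  ab -n 450 -c 180 http://test-server/files/asset-10k.png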



So, here's where things start getting interesting – 2% of the requests slowed down to 600+ms per request. The exact cause of that is outside the scope of this article – could be IO – regardless, the numbers are still good and it's clear that this is indicating the start of a tipping point for the host.


Take the HTML load time – 298ms.
Do some fuzzy math on the static requests: 265ms * 2.5 = 662ms (mean total; 15 requests across 6 concurrent connections is 2.5 'sets').
957ms average for the entire page to load.

This is ‘fuzzy math’ because I’m not simulating the procedural/trickle loading effect of browsers mentioned above (initial burst of 6 connections, and making them busy as they become available at different times to complete the 15 requests). But instead treating it as 6 requests in sets. Someone more mathematically inspired might calculate better numbers – I use this method because it’s my “buffer factor” that I use for variables (bursts of activity, latency changes, etc).

So from this data, we can say the server can sustain 30 constant users given our assumptions.

Phase two:
So now we have a ballpark figure of where our server will tip over. We're going to perform two simultaneous ab sessions to put this to the test – this simulates both worlds of content: generating content and loading the assets. This is the 'honing' phase, where we dial our configuration down/up in smaller increments (the two parallel runs are sketched after the list below).

  • 5 minutes of load at our proposed 30 users
  • 30 concurrent connections for our generated content page.
  • 180 concurrent connections for static data.
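A sketch of running both at once (note that ab quietly caps itself at 50,000 requests when only -t is given, hence the explicit -n values; URLs are placeholders as before):

  # generated content and static assets, hammered in parallel for ~5 minutes
  ab -t 300 -n 100000  -c 30  http://test-server/drupal-page/        > generated.txt &
  ab -t 300 -n 5000000 -c 180 http://test-server/files/asset-10k.png > static.txt &
  wait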

Ding ding ding! We’ve got a tipper!
Ok, so as you can see, for the most part everything was fairly responsive. If the system could not keep up, most of the percentiles would have awful numbers like the ones shown in the highest 2%. Overall though, these numbers should be improved upon to deal with sporadic spikes above our estimates and other environmental factors that may sway our throughput.

How to interpret:
Discard the 100% – longest request; at these volumes it’s irrelevant.
However, the 98-99% matter – your goal should be to make the 98% fall under your target response time for the request. The 99% shows us that we’ve found our happy limit. 


(Remember, at these levels there are many, many variables involved – and the server should never reach this limit; that's why we're finding it out!)


Let's tone our bench down to simulate 25 users and see what happens…



Wrap up
25 simultaneous users may not sound like a lot, but imagine a classroom of 25 students – and they all click to your webpage at the exact same moment – this is the kind of load we're talking about: having every one of those machines (under correct conditions) load up the website within 1 second.


To turn that into a real requests per second: 375. (25 users @ 15 requests).


The configuration – workload (website code, images) and hardware (2 1GHz CPUs…) are capable of performing this at a constant rate – these benchmarks indicate that the hardware (or configuration) should be changed before it gets to this point to supplement growth. These benchmarks indicate that ~430 pageloads out of 21,492 will take longer than a second to load. In reality, the ebbs and flows of request peaks and valleys make these less likely to happen.


As you can see, the static files are served very fast in comparison to the generated content from Drupal.
If this Apache instance was backed by the likes of Varnish, the host would be revitalized to handle quite a bit more (depending on cache retention).


Testing hardware

  • 1x AWS EC2 Large instance on EBS – Apache host
    • 2 virtual cores with 2 EC2 Compute Units each
      • AKA:  (2) 2Ghz Xeon equiv
    • 7GB RAM
  • 1x AWS EC2 4X Large CPU – AB runner
    • 8 virtual cores with 3.25 EC2 Compute Units each 
      • AKA: (8) 3.25Ghz Xeon Equiv
    •  68GB RAM
5 Comments on ab – Apache Bench, understanding and getting tangible results.

Staying on top of things February 16, 2011

One of the things that's crucial to my workflow where I work is knowing as soon as possible when changes are made to our mainline repositories. Same with tickets in our Trac ticket system: when one hits the pipes, I need to know.

I could set up a fancy hook for Mercurial, the source control system we use, to send email. But that doesn't handle Trac or the other oddball items I want to watch.

Having notifications emailed is nice, but I often find the chatter from unrelated emails annoying. I usually have my email off and check every hourish. This helps me focus.

What I've found is this: Trac, Mercurial (via hgweb), etc. all have some form of RSS/Atom feed available. I have a hunch we often overlook these feeds from our tools.

Bring in feed notifier:

Pretty configurable – supports https:// with login – and the popup notifications are exactly what I need. Noticeable enough but not annoying like having to look to an email screen. (Sad thing is, I have 3 gorgeous 24″ dell monitors – all too busy to have a silly email client up.)

I think when I get some me time, I may put something together to watch these feeds – and use XMPP to notify a configurable list of users. That’d be a fun project with very cool results. But for now – this developer is happy with his RSS addon.

No Comments on Staying on top of things

Hashing out those regex April 30, 2010

Priceless utility I found not too long ago and thought I'd drop a note about it.

There’s an online version – as well as an Adobe AIR app that is simple and lightweight to install.

Check it out: http://gskinner.com/RegExr/

A must-have for people who have to hammer out any level of regular expressions.

No Comments on Hashing out those regex
Categories: programming tools

Minimize -anything- to systray with ‘trayconizer’ July 20, 2009

A handy little utility I've been using for years. Often I need programs running all the time while I work, and I don't want them eating up taskbar real estate – for programs that don't natively support minimizing to the tray, look no more!

TRAYCONIZER
– This handy tool is easy to use and works like a charm.

No Comments on Minimize -anything- to systray with ‘trayconizer’
Categories: tools windows

VirtualBox Rocks! June 9, 2009

If you're looking for a lightweight virtualization solution – for something simple like running Ubuntu from within Windows – look no further.

VirtualBox has everything a growing boy/gal needs. I asked a co-worker if he knew of anything better than MS Virtual PC, but not as heavy as VMWare and he mentioned VirtualBox to me.

I quickly downloaded it and was surprised at how lightweight and perfect a solution it was.

Quick notes of awesome-ness (At least through my eyes from Vista host to my Ubuntu guest OS)

Mouse handling:
Once you install the virtualization drivers – you can seamlessly move your mouse from guest to host OS without having to hit any hotkeys or anything.

Fullscreen mode:
I'm currently typing from Firefox within Vista, and on my right screen I have Ubuntu fired up in fullscreen – we're not talking crappy emulation either – I've got effects jammin' and it's just like working within full boot-mode.

Networking:
Networking threw me for a quick loop through my own stupidity. You can easily configure multiple adapters of many types – NAT, Bridged, you name it.

Graphics support:
Still a little lacking, but for non-gaming desktop purposes the power is there for sure. I only say this because I've had a little difficulty with fullscreen OpenGL-style screensavers – and that might be due to my lack of installed drivers. We'll see. But at any rate, I'm running window effects on a 1920×1200 resolution screen without any lag or issues.

Screenshot: a simple print-screen grabs both – note fullscreen Ubuntu =]

No Comments on VirtualBox Rocks!
Categories: linux tools