Tales of an IT Nobody

devbox:~$ iptables -A OUTPUT -j DROP

Skeletons: Older versions of ntp and not using DNS March 17, 2014

A while ago (years), I reluctantly set up ntp on some servers using an IP address for the source server; at the time, using a DNS name in ntp.conf was incompatible with the ntp/ntpd version and I didn’t want to go out of my way to compile it from scratch.

Today, I realized that I’ve been slowly getting bit in the butt, several years later.
Back then, those IP addresses were supposed to be rock-solid NTP references. But over the years – and finally about a month ago – they all went offline.

I would not have caught the drift if it wasn’t for my use of pt-heartbeat (mk-heartbeat) and doing a review of our cacti graphs.

NTP not synchronized

This is what pt-heartbeat will show when your NTP service isn’t working

 

Usually I check them every Monday as a routine, but I’ve been so busy for the last several months that I haven’t had time. I figured if it hits the fan, our alerts/thresholds will let me know – which, on a few occasions, has worked as needed for an Apache server.

pt-heartbeat, a tool from the Percona Toolkit, uses a common table across replicated servers; each server updates a record in that table with a datetime value.

The time difference between the value for server A and the value replicated to server B is the ‘lag’. Lag can/should be due to temporary spikes in traffic (or intentional delaying). Needless to say, my gut sank when I saw that something weird was going on, causing a small yet unrecoverable, linearly increasing lag. After quickly confirming with SHOW SLAVE STATUS that everything was up to date, it was apparent that the mechanisms involved in the graph generation were at fault. A quick side-by-side examination of server A’s and server B’s ‘heartbeat’ tables showed that the times were off from each other by a few seconds.

I restarted the pt-heartbeat daemons and the issue persisted – the next culprit was simple to identify: ntp.

Output of ntpq -p quickly showed that the ntp server hadn’t synchronized for over a month.

Over the years, periodic apt upgrades had brought a newer version of ntp through the pipeline, so all that was needed was an update of ntp.conf to use a hostname (I opted for the stratum 1 server ‘time.nist.gov’).
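The fix itself is a one-liner plus a restart – roughly this (a hedged sketch; adjust it if you carry more than one server line in ntp.conf):

  # swap the dead IP for a hostname in /etc/ntp.conf
  sed -i 's/^server .*/server time.nist.gov iburst/' /etc/ntp.conf

  # restart and confirm we're actually synchronizing again
  service ntp restart
  ntpq -p    # look for a '*' next to a peer and a non-zero 'reach' column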

 

 

 

No Comments on Skeletons: Older versions of ntp and not using DNS
Categories: linux servers

MySQL’s max_connect_errors … 6 years later… August 2, 2013

Having recently been bitten by the awful default value (10) for max_connect_errors on a production server – I’m having a very hard time coming to terms with who the heck thought this would be a good way to do it.

This type of “feature” allows you to effectively DoS yourself quickly with just one misconfigured script – or Debian’s stupid debian-sys-maint account not replicating.

I’ve been thinking about how I could avoid this scenario in the future – upping the limit was a no brainer. But another item of curiosity: How do I know what hosts are on the list?

 

Have a look at this, 6 years later: http://bugs.mysql.com/bug.php?id=24906

 

So up until 5.6 (still brand new in the DBMS world) – there was no practical way to find out who was on this list.
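For the record, the workaround amounts to something like this (a hedged sketch – the new limit value is arbitrary):

  # raise the limit at runtime (and persist it in my.cnf)
  mysql -e "SET GLOBAL max_connect_errors = 100000;"

  # unblock any hosts that have already tripped the limit
  mysql -e "FLUSH HOSTS;"

  # 5.6.5+ only: actually see who is in the host cache
  mysql -e "SELECT ip, sum_connect_errors, count_host_blocked_errors FROM performance_schema.host_cache;"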

The default needs to be changed, and this feature should be able to be turned off…

 

No Comments on MySQL’s max_connect_errors … 6 years later…
Categories: linux mysql rant servers

Using Logstash to log SMTP/email bounces like a boss July 26, 2013

I’ve recently worked on a customized emailing suite for a client that involves bulk email (shudder) and thought I’d do a write-up on a few things that I thought were slick.

Originally we decided to use AWS SES but were quickly kicked off of the service because my client doesn’t have clean enough email lists (a story for another day on that).

The requirement from the email suite was pretty much the same as what you’d expect from SendGrid except the solution was a bit more catered toward my client. Dare I say I came up with an AWESOME way of dealing with creating templates? Trade secret.

Anyways – when the dust settled after the initial trials of the system, we were without a way to deliver bulk emails and track the SMTP/email bounces. After scouring for pre-canned solutions, there wasn’t a whole lot to pick from. There were some convoluted modifications for postfix and other knick-knacks that didn’t lend well to tracking the bounces effectively (or weren’t implementable in a sane amount of time).

 

Getting at the bounces…

At this point in the game, knowing what bounced can come from only one place: the maillog from postfix. Postfix is kind enough to record the Delivery Status Notification (‘DSN’) in an easily parsable way.

 

Pairing a bounce with database entries…

The application architecture called for very atomic analytics. So every email that’s sent is recorded in a database with a generated ID. This ID links click events and open reports on a per-email basis (run of the mill email tracking).  To make the log entries sent from the application distinguishable, I changed the message-id header to the generated SHA1 ID – this lets me pick out which message ID’s are from the application, and which ones are from other sources in the log.

There’s one big problem though:

Postfix uses its own queue IDs to track emails – this is the first 10 hex digits of the log entry. So we have to perform a lookup as such:

  1. Determine the queue ID from the application-generated message-id log entry
  2. Find the queue ID log entry that contains the DSN
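For illustration, the two greps look something like this (the queue ID, hash and addresses are made up):

  # step 1: the cleanup line ties our generated message-id to a postfix queue ID
  #   postfix/cleanup[4242]: 3A1B2C3D4E: message-id=<da39a3ee5e6b...@mailer.example.com>
  grep 'message-id=<da39a3ee5e6b' /var/log/maillog

  # step 2: some time later, the delivery attempt for that queue ID carries the DSN
  #   postfix/smtp[4250]: 3A1B2C3D4E: to=<user@example.net>, ..., dsn=5.1.1, status=bounced (...)
  grep '3A1B2C3D4E: ' /var/log/maillog | grep 'dsn='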

 

(Those of you who know where this is going – skip the next paragraph.)

This is a problem because we can’t do both lookups at the same time. The time when a DSN comes in is variable – so this would lead to a LOT of grepping and file processing: one pass to get the queue ID, another to find the DSN – if it’s even been recorded yet.

 

The plan…

The plan is as follows with the above in mind:

  • Continuously trail the log file
  • Any entries matching message-id=<hash regex> – RECORD: messageId, queueId to a database table(‘holding tank’).
  • Any entries with a DSN – RECORD: queueId, dsn to holding tank table
  • Periodically run a script that will:
    • Process all rows that have a DSN and a messageId that is NOT NULL. (This gives us all messages that were sent from our application and have DSNs recorded.)
    • Delete all rows that do NOT match that criteria AND are older than 30 days old (they’re a lost cause and hogging up space in the table).

Our database table looks like this:
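A hedged sketch of what such a holding-tank schema can look like (table, column and database names here are my own):

  mysql mailer_db <<'SQL'
  CREATE TABLE holding_tank (
      queue_id   VARCHAR(16) NOT NULL,    -- postfix queue ID
      message_id CHAR(40)    NULL,        -- our generated SHA1, when the send came from the app
      dsn        VARCHAR(16) NULL,        -- e.g. 5.1.1, once the bounce shows up
      created_at TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (queue_id)
  );
  SQL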

Making the plan stupid simple, enter: Logstash

We use Logstash where I work for log delivery from our web applications. From my experience with Logstash, I knew it had exactly what I was looking for: progressive tailing of logs, and so many built-in inputs, outputs and filters for this kind of work that it was a no-brainer.

So I set up Logstash and three stupid-simple scripts to implement the plan.

Hopefully it’s self-explanatory what the scripts take for input – and where they put it (see the holding-tank table above).

Setting up logstash, logical flow:

Logstash has a handy notion of ‘tags’ – where you can have the system’s elements (input, output, filter) enact on data fragments tagged when they match a criteria.

So I created the following configuration logic:

Input:
/path/to/maillog

Tags:

  • bounce – matches log entries containing a status of ‘bounced’ – (not interested in success or defer)
  • send – matches log entries matching the message-id regex for the application-generated ID format

Output:

  • EXEC: LogDSN.php – for ‘bounce’ tags only
  • EXEC: LogOutbound.php – for ‘send’ tags only 

 

Full config file (it’s up to you to look at the logstash docs to determine what directives like grep, kv and grok are doing)

 

Then the third component – the result recorder – runs on a cron schedule and simply queries records where the message ID and DSN are not null. The mere presence of the message ID indicates the email was sent from the application; the DSN means there’s something for the result recorder script to act on.

 

Simple overview

* The way I implemented this would change depending on the scale – but for this particular environment, executing scripts instead of putting things into an AMQP channel or the likes was acceptable. 

1 Comment on Using Logstash to log SMTP/email bounces like a boss

MySQL command line – zebra stripe admin tool February 7, 2013

I came up with a cool usage for the zebra stripe admin tool.  In MySQL you can set a custom pager for your MySQL CLI output; so one can simply set it to the zebra stripe tool and get the benefit of alternated rows for better visual clarity.

Something like ‘PAGER /path/to/zebra’ should yield you results like the image below.
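For example (assuming the script lives at /usr/local/bin/zebra.pl and is executable):

  mysql> pager /usr/local/bin/zebra.pl
  PAGER set to '/usr/local/bin/zebra.pl'
  mysql> SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST;
  ...
  mysql> nopager
  PAGER set to stdout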

Zebra stripe tool used as a MySQL pager

 

You can always adjust the script to skip more lines before highlighting; you can also modify it if you’re savvy to the color codes to just set the font color instead of the entire background (which may be preferable but not a ‘global’ solution so the script doesn’t do it).

1 Comment on MySQL command line – zebra stripe admin tool
Categories: linux mysql servers tools

Re: Linux and hardware support – specifically Graphics and lack thereof August 24, 2012

The passion here says enough: (Linus gives NVidia a what for)

Edit: Apparently embedded view won’t handle the timestamp I used – skip to 49m 10s

No Comments on Re: Linux and hardware support – specifically Graphics and lack thereof
Categories: linux

Disable PHP 5.4’s built-in web server, while keeping CLI … February 6, 2012

Administrators: Don’t get blind-sided by PHP 5.4’s CLI web server!

I’ve gone over a similar issue before regarding the likes of git/hg – though those are developer tools and are less likely to be present on a production machine.

PHP 5.4 is jumping on the bandwagon to include a ‘cute’ little internal server – which is enabled by default.
The ‘everything needs a standalone server’ thing is starting to get on my security nerves.

It has limited use, and most developers will have limited use for it due to its lack of mod_rewrite (and equivalent) behavior … The worst part is: you can’t disable it if you want to keep the CLI (e.g.: no pear!)
 
Wish I spoke up on the list!

Anywho, here’s a hob-knobbed patch (for PHP 5.4.0RC6) that will change that for you.
(GNU/*nix only!) The patch adds a new configure option ‘--disable-cli-server’.

Download the patch here: patch-php5.4.0RC6-no-cli-server.diff
Place it in the PHP source base directory.
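Applying it goes roughly like this (hedged – the -p level depends on how the diff was rolled, and you’ll want your usual configure options tacked on):

  cd php-5.4.0RC6
  patch -p0 < patch-php5.4.0RC6-no-cli-server.diff
  ./buildconf --force               # regenerate configure if the patch touches the m4/ac files
  ./configure --disable-cli-server  # ...plus your normal options
  make && make install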

In the future I’ll plan on formalizing this patch and propose it in php.internals when I get a chance to make the windows part of the patch.

References:
https://wiki.php.net/rfc/builtinwebserver
http://svn.php.net/viewvc/php/php-src/branches/PHP_5_4/sapi/cli/
https://gist.github.com/835698

2 Comments on Disable PHP 5.4’s built-in web server, while keeping CLI …
Categories: linux php security servers

/bin/false – Report false bugs to … December 20, 2011

While sifting through the manpages for /bin/false (looking for crafty uses for this oddball command) – I just had to share a funny line from within the manpage:

Maybe it’s been a long day =|

No Comments on /bin/false – Report false bugs to …
Categories: linux

ab – Apache Bench, understanding and getting tangible results. December 3, 2011

Apache Bench (AB) is a very powerful tool when used right. I use it as a guideline for how to set up my apache2/httpd.conf files.
All too often I see people boasting that they can get an outrageous number of RPS in AB (the Apache Bench tool).

“OMG, I totally get 3,000 rps in an AWS micro instance!!!” (I’ve seen this on the likes of Serverfault)

Debunking misunderstandings:
Concurrency != users on site at same time
Requests per second != users on site at same time

Apache Bench is meant to give a ‘seat of the pants’ diagnostic for the page/s you point it to.

Every page on a website is different; and may require a different number of requests to load (resources: css, js, images, html, etc).  

Aspects of effective Apache benchmarking:

  1. Concurrent users
  2. Response time
  3. Requests

“Concurrent users” – Have you ever stopped to ask yourself: What the hell is a user? (in the Apache sense) We don’t stop to think about them individually, we just think about them as a ‘request’ or the ‘concurrent’ part of benchmarking.

When a user loads a page, their browser may start X connections at the same time to your server to load resources. This is a complex dynamic, because browsers have a built in limit of how many concurrent requests to make to a host at a given time (a hackable aspect). 

So at any given time, let’s say a user = 6 concurrent connections/requests.

“Response time” – What is an acceptable response time? What is a response? In the context of this article, it’s the round trip process involved with the transfer of a resource. This summarizes the intricacies of database queries and programming logic, etc into a measurable aspect for consideration in your benchmarking.

Is 300ms to load the HTML output of a Drupal website acceptable? 400? 500? 600? 700?

How fast does a static asset load? What is the typical size of an asset for your webpages? 10KiB? 20?

“Requests” – Requests happen at the sub-second level, measured in milliseconds.
Let’s say the average page has 15 resources (images, js, css files, html data, etc) – aka: 15 requests per page

This means if a single user comes to load a page, there’s a good chance his/her browser will make 15 requests total.

Another part of this aspect is to be aware that the browser will perform these requests in a ‘trickle’ fashion, meaning one request to get the HTML, then an instant 6 requests (browser concurrency) but the next 8 will happen one at a time once concurrent connections free up.


Putting aspects together:
We have to draw an understanding of how these aspects all tie together to determine the start-to-finish load of a typical page.

Let’s say a page is 15 requests (14 images/js/css files plus the HTML) with an average payload of 10KB of data each – 150KB of data in total.

A user/browser makes 6 simultaneous requests (All completing at slightly different times, ideally, at the same time).

Response time is the metric we’re interested in.

The questions we ask of ab are – given the current configuration and environmental conditions:

  • What’s the highest level of concurrency I can support loading a given asset in less than X milliseconds
  • How many requests per second can I support at that given level of concurrency

By attempting to answer these questions, we’ll derive the tipping point of the server – and what its bottlenecks are. (This article will not cover determining bottlenecks – just how to get meaning from ab.)

Caveats
Naturally these seat of pants numbers are generated on a machine on the same network with plenty of elbow room – making them best case scenarios – in today’s world it’s close enough with how widespread HSI is. It also assumes the network can handle the throughput of the transfers involved with a page. It also assumes all assets are hosted on the same host, nothing cross domain, e.g.: Google Analytics, Web Fonts, etc.


How to test 
First, we need to classify the load. As mentioned a few times, there’s two types of site data: static files, and generated files.

Static files have extremely fast response times,  generated files take longer because they’re performing logic, database work and other things.

A browser doesn’t know what to load until it retrieves the HTML file containing the references to other resources – this changes how we look at timings, and most cases, the HTML document is generated.



First
Let’s simulate a single user/browser loading a page from an idle server.
First, we must get the HTML data…
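In ab terms that’s a single request at a concurrency of one (URL illustrative):

  ab -n 1 -c 1 http://test.example.com/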

So you can see, this took 29ms, at 299KB/s to generate the HTML from Drupal and send it across the pipe.


Now, let’s simulate a browser at 6 concurrent connections loading 15 assets at 10KiB each.
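Roughly, against one of the 10KiB static assets (path illustrative):

  ab -n 15 -c 6 http://test.example.com/sites/default/files/sample-10k.png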

So here you can see, each request completed at 35ms or less – The whole test took about 560 ms.


This means that under pristine conditions, the user would have the entire webpage loaded in 589ms.

Finding the limits…
So, looking at our numbers – it’s clear that 1 user is a piece of cake. 
It’s also clear that there’s two big considerations into acceptable timings:

  1. Time to get HTML (our 1 generated request)
  2. Time to get references from that HTML (our 15 static requests)

We’re going to multiply our numbers and start pushing the server harder – let’s say by 30 times:

So once again, let’s emulate 30 users making 30 requests to our Drupal page:
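Which, in ab terms, is simply (same illustrative URL):

  ab -n 30 -c 30 http://test.example.com/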
 

Every user got the HTML file in ~298ms or less.


Next up, static files – 30 users = 180 concurrent connections, 450 requests.
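That works out to (again, illustrative URL):

  ab -n 450 -c 180 http://test.example.com/sites/default/files/sample-10k.png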



So, here’s where things start getting interesting – 2% of the requests were slowed down to 600+ms per request. The exact cause of that is out of the scope of this article – could be IO – regardless, the numbers are still good and it’s clear that this is indicating the start of a tipping point for the host.


Take the HTML load time – 298ms.
Do some fuzzy math here: 265ms * 2.5 = 662ms (mean total).
957ms average for the entire page to load.

This is ‘fuzzy math’ because I’m not simulating the procedural/trickle loading effect of browsers mentioned above (an initial burst of 6 connections, then keeping them busy as they free up at different times until all 15 requests are done), but instead treating it as sets of 6 requests. Someone more mathematically inspired might calculate better numbers – I use this method because it’s my “buffer factor” for variables (bursts of activity, latency changes, etc).

So from this data, we can say the server can sustain 30 constant users given our assumptions.

Phase two:
So now we have a ballpark figure of where our server will tip over. We’re going to perform two simultaneous ab sessions to put this to the test, this will simulate both worlds of content: generating content and loading the assets. This is the ‘honing’ phase where we dial down/up our configuration by smaller increments. 

  • 5 minutes of load at our proposed 30 users
  • 30 concurrent connections for our generated content page.
  • 180 concurrent connections for static data.
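Two ab runs side by side will do it – something like the following (note -t caps a run at 50,000 requests by default, so bump -n for the static run):

  # generated content: 30 concurrent connections for ~5 minutes
  ab -t 300 -c 30 http://test.example.com/ > drupal.out &

  # static assets: 180 concurrent connections for ~5 minutes
  ab -t 300 -n 200000 -c 180 http://test.example.com/sites/default/files/sample-10k.png > static.out &

  wait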

Ding ding ding! We’ve got a tipper!
Ding ding ding! We’ve got a tipper!
Ok, so as you can see, for the most part everything was fairly responsive. If the system could not keep up, most of the percentiles would show awful numbers like the ones in the highest 2%. Overall, though, these numbers should be improved upon to deal with sporadic spikes above our estimates and other environmental factors that may sway our throughput.

How to interpret:
Discard the 100% – longest request; at these volumes it’s irrelevant.
However, the 98-99% matter – your goal should be to make the 98% fall under your target response time for the request. The 99% shows us that we’ve found our happy limit. 


(Remember, at these levels there’s many, many variables involved – and the server should never reach this limit; that’s why we’re finding it out!)


Let’s tone our bench down to 25 users and see what happens …



Wrap up
25 simultaneous users may not sound like a lot, but imagine a classroom of 25 students – and they all click to your webpage at the exact same moment. This is the kind of load we’re talking about: having every one of those machines (under correct conditions) load up the website within 1 second.


To turn that into a real requests per second: 375. (25 users @ 15 requests).


The configuration – workload (website code, images) and hardware ((2) 1Ghz CPUs…) – is capable of performing this at a constant rate; these benchmarks indicate that the hardware (or configuration) should be changed before it gets to this point in order to support growth, and that ~430 pageloads out of 21,492 will take longer than a second to load. In reality, the ebbs and flows of request peaks and valleys make these less likely to happen.


As you can see, the static files are served very fast in comparison to the generated content from Drupal.
If this Apache instance was backed by the likes of Varnish, the host would be revitalized to handle quite a bit more (depending on cache retention).


Testing hardware

  • 1x AWS EC2 Large instance on EBS – Apache host
    • 2 virtual cores with 2 EC2 Compute Units each
      • AKA:  (2) 2Ghz Xeon equiv
    • 7GB RAM
  • 1x AWS EC2 4X Large CPU – AB runner
    • 8 virtual cores with 3.25 EC2 Compute Units each 
      • AKA: (8) 3.25Ghz Xeon Equiv
    •  68GB RAM
5 Comments on ab – Apache Bench, understanding and getting tangible results.

Grepping extremely large files November 28, 2011

So you forgot to set up logrotate on an active log eh? You’ve got a many gigabyte file to weed through and you need to extract a chunk of time from it?

Here’s a quick cheat sheet to help you get by, quickly and sanely.

It’s about byte offsets!

  • Get the byte offset in the file where your time range starts
  • Get the byte offset in the file where your time range ends
  • dd the data out!

Caveats

  • You should tack on extra bytes to the byte length, because the offset_end number is actually the beginning byte of your boundary log entry
  • Figuring out the boundary is a bit tricky because a log entry -has- to be present in order to match, so if you’re looking for what happened at 20:00 hours on X date, you may have to round up to the date level depending on how busy your log is
  • This is just a trick to extract a chunk of entries to speed up further filtering.

Full example
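A hedged sketch of the whole dance (paths and timestamps are made up; grep -b prefixes each match with its byte offset):

  LOG=/var/log/apache2/access.log

  # byte offset of the first entry inside the window...
  START=$(grep -b -m 1 '28/Nov/2011:20:00' "$LOG" | cut -d: -f1)

  # ...and of the first entry past it
  END=$(grep -b -m 1 '28/Nov/2011:21:00' "$LOG" | cut -d: -f1)

  # dd the chunk out, padding the count a little so the boundary entry isn't truncated
  dd if="$LOG" of=/tmp/window.log bs=1 skip="$START" count=$(( END - START + 4096 ))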

No Comments on Grepping extremely large files
Categories: linux servers

The inherent risks of ‘daemonize’ features in developer tools – Git, Mercurial (hg) September 24, 2011

A handful of tools such as mercurial, git (and soon PHP – which chances are will be its own binary) have their own ‘daemonize’ functionality.

Whatever your reasons – if you want to disable these; there’s little to no help in figuring out how… til now…

If you want to disable Mercurial’s hg serve:
Open the file (Your python install path may differ, but this should give you an idea of what to search for)

/usr/local/lib/python2.x/dist-packages/mercurial/hgweb/server.py: 

Find the function ‘create-server’ and add ‘sys.exit()’ in the first line:

How to verify this works:

1. Before patching – run ‘hg serve’ from a mercurial repository. It will report the port number and remain active in console.
2. After patching – ‘hg serve’ from a mercurial repository will simply exit and say nothing.
3. netstat, ps -A ux |grep ‘hg serve’

If you want to disable git’s git daemon:
This one is probably the easiest of the two: find and ‘chmod a-x’ (remove execute permissions from) the ‘git-daemon’ binary on your system – mine is in /usr/libexec/git-core. You can also relocate it somewhere inaccessible.
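In other words, something like:

  chmod a-x /usr/libexec/git-core/git-daemon
  # or move it out of reach entirely:
  # mv /usr/libexec/git-core/git-daemon /root/git-daemon.disabled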

How to verify this works:

1. Before relocating/removing/chmodding – run ‘git daemon’ – your console will remain active as if it’s listening. (You can try a base dir for a proper daemon setup if you want …)
2. After relocating/removing – run ‘git daemon’, you’ll get an error saying there are insufficient privileges, or in the case of relocating/removing you’ll see “not a git command”.
3. netstat, ps -A ux |grep ‘git daemon’

No Comments on The inherent risks of ‘daemonize’ features in developer tools – Git, Mercurial (hg)
Categories: git hg linux security servers

Kernel.org, linux.com down, still… also, Git! – Updated! September 11, 2011

With news breaking about the compromised systems at kernel.org and linux.com, both sites are “down for maintenance”. Completely – and it’s been this way for many days now (kernel.org since the 28th).

I think it’s safe to say the range and scope of the issues are pretty disappointing – the longer these systems stay down, the more obvious it is that the damage is probably greater than perceived before; I’m having a hard time believing the administrators of these groups are just this slow at being cautious. (Especially for the ‘forward facing’ hosts.)

(Segue to Git)

It’s interesting to note that a LOT of bullets from the security breach are dodged thanks to the cryptographic integrity checking built into Git. Pretty cool! (Linus seems to be flirting with the idea of using Github for latest kernel development.)

After keeping an eye on the “choosing a DVCS” discussions for PHP, a lot of people are in favor of leveraging Git purely because of Github – whereas Mercurial, something we use at M State, has been around a bit less but has a stronger, more mature toolset (albeit a bumpy ride for sure!) and, from my standpoint, a better cross-platform implementation. The speed differences are somewhat minimal.

The allure of Github is the social endeavour; this has fueled a much more active community (compared to Bitbucket, less ‘social-ly’). It looks like Git has finished clobbering competitors like Bazaar and Perforce and finally, I’m willing to throw in the towel for my support of Mercurial and say there’s little room for traction for Hg.

The programming world is a lot of work to follow =\

(Also: It’s September 11th – take a moment to reflect – cast some sympathy and reverence for the lives lost! ) 

UPDATE: 2011-09-21 – linux.com has been updated and now states that they will be “restoring service shortly” – different from the original FAQ page they had up about the breach.

I’d expect kernel.org to follow.

After over half a month (almost a month for kernel.org) of being down with poorly communicated “maintenance” pages, I hope there’s a fallout for the culprits – and I hope the maintainers of those domains take a more serious approach to how they handle this situation next time…

No Comments on Kernel.org, linux.com down, still… also, Git! – Updated!
Categories: linux security

System admin ‘helper’: Zebra stripe log / console output August 25, 2011

Looking at an ASCII data table can be difficult – so, to start a small trip into Perl programming, I tossed together a simple Perl script with no module requirements – zebra.pl, as I call it – and it zebra-stripes the output. It adds a nice touch to, say, vmstat, or viewing something like the interrupts on a multicore box. It’s super simple and done in the nature of Aspersa (now a part of Percona Toolkit).

It doesn’t work 100% like I want – I would have liked it to take an $ARGV; to do that it seems like I’d have to pull in a module dependency (something like GetOpts) – so I decided one can simply modify the script to change how many rows are striped.

You can fetch it here.

No Comments on System admin ‘helper’: Zebra stripe log / console output
Categories: linux purdy servers

/usr/bin/chage – Sending emails when a password expires, or is about to June 6, 2011

There’s a lot of scripts out there that do this but they either don’t revolve around /etc/shadow enough or they’re sloppy.

Here’s my spin on a script for nightly cron that will parse /etc/shadow and send out emails based on the per-user values. It’s resistant to garbage dates (99999 ‘expiration’ dates).

Below is my best attempt at making the script ‘cohesive’ in this layout, however you can find the script here as well.
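The gist of it is along these lines (a simplified sketch, not the full script – field meanings are from shadow(5), and where the mail actually goes is up to you):

  #!/bin/bash
  # nightly cron: warn when a password has expired or is about to
  WARN_DAYS=14
  TODAY=$(( $(date +%s) / 86400 ))     # days since epoch, the same unit /etc/shadow uses

  while IFS=: read -r user pass lastchg min max warn inactive expire flag; do
      # skip accounts with no aging info, or the 99999 'never expires' garbage
      [ -z "$lastchg" ] && continue
      [ -z "$max" ] && continue
      [ "$max" -ge 99999 ] && continue

      days_left=$(( lastchg + max - TODAY ))

      if [ "$days_left" -le 0 ]; then
          echo "Password for $user expired $(( -days_left )) day(s) ago." \
              | mail -s "Password expired: $user" root
      elif [ "$days_left" -le "$WARN_DAYS" ]; then
          echo "Password for $user expires in $days_left day(s)." \
              | mail -s "Password expiring soon: $user" root
      fi
  done < /etc/shadow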

No Comments on /usr/bin/chage – Sending emails when a password expires, or is about to
Categories: linux security servers

MySQL 5.5.12 – init script warning May 25, 2011

I’ve just reported a bug regarding the init script that comes in MySQL 5.5’s source distribution.

Basically, if you call the ‘start’ clause of the script twice, it will hose the service by allowing multiple instances to run trying to utilize the same resources (pid file, socket and tcp port) – naturally this makes the service that -was- working fine screech to a halt, and mysqladmin shutdown won’t work. The only way to fix this is to do something like this to get things back to normal:

My solution to avoid this for the time being is to put this in the beginning of the ‘start’ case clause in the ‘mysql.server’ script that we’re copying to /etc/init.d:
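Roughly this (a sketch – it leans on the $mysqld_pid_file_path variable the script already sets up):

  # bail out early if an instance already owns the pid file
  if [ -s "$mysqld_pid_file_path" ] && kill -0 "$(cat "$mysqld_pid_file_path")" 2>/dev/null; then
      echo "MySQL is already running (pid $(cat "$mysqld_pid_file_path")) - not starting a second instance."
      exit 0;
  fi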

I chose exit 0; because technically, it’s still a successful command.

No Comments on MySQL 5.5.12 – init script warning
Categories: linux mysql servers

libflashplayer.so – quickie for install. April 11, 2011

Need a place to put it on debian/ubuntu?

Chromium:

Chrome: (You might have to create the ‘plugins’ dir – I did =\ )

Firefox:
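The usual spots are roughly these (hedged – distro packaging moves them around, so double-check yours):

  # Chromium (Debian/Ubuntu package)
  cp libflashplayer.so /usr/lib/chromium-browser/plugins/

  # Google Chrome
  cp libflashplayer.so /opt/google/chrome/plugins/

  # Firefox (system-wide; ~/.mozilla/plugins/ works per-user)
  cp libflashplayer.so /usr/lib/mozilla/plugins/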

Restart all instances of the browser/s and check it out.

PS: Flash on 64-bit linux doesn’t matter, you can put it in lib64/ if you -REALLY- want..

No Comments on libflashplayer.so – quickie for install.
Categories: linux purdy

Another rant on cutesy March 3, 2011

Codenames for releases are ok.

But the countless cutesy names for *nix tools get tiring … (And don’t help their adoption). 

From an email from debian security list today:

“Several vulnerabilities have been found in the Iceape internet suite, an
unbranded version of Seamonkey:”

Imagine if these tools were used in a corporate environment “I removed Iceape and just went with Seamonkey”.

EOF.

No Comments on Another rant on cutesy
Categories: linux rant

Cacti – DNS response time February 25, 2011

When you google for a cacti template for DNS response time, there’s not a whole lot out there, and what is out there is pretty outdated or involves too much fidgetry.

This post assumes you’re comfortable with cacti – you should be able to at least initialize a graph and fill one in using data sources for a host. You must also be using linux; BSD has a different pecking order of commands.

This guide shows you how to slap together a quick DNS response data input method that will allow you to setup graphs on a nameserver/domain pair granularity. (Meaning, you can graph the same domain across several NS’s, or vice versa).

So here’s a quick rundown on creating a “data input method” and a “data template” for cacti to utilize for your nameservers.

1. Create a new data input method
1a.
   Name: (anything you want)
   Input type: Script/Command
   Input string:
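(Not the exact string from the screenshot, but something along these lines – dig reports the query time in msec, and cacti fills in <ns> and <dom> from the input fields below:)

  dig @<ns> <dom> | grep 'Query time' | awk '{print "ResponseTime:" $4}'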

The 1-liner above should get you the msec for a given domain (dom) at a given nameserver (ns). To test it by hand, replace <ns> and <dom> with something valid.

   1b. Add the two ‘input fields’, ns and dom.
   1c. Add “ResponseTime” as an ‘output field’.

If done correctly, it should look similar to this:

2. Create the data template – Fill out the values to look similar to the screenshot below. Note, you will probably have to hit ‘create’ after selecting the data input method under “data source”. This will detect the “output field” for the “Data Source Item” values.

Here’s what one of mine looks like:

 I’ve omitted the target host/ns from this example image of course :)

6 Comments on Cacti – DNS response time
Categories: cacti linux servers

On top – Worthy of distribution

No really, this post is about top. I’ve always fancied myself well versed in top; every once in a while I need a reminder. Here’s a pretty cool, well-presented, in-depth rundown on how powerful this utility really is for people in charge of inspecting system behavior.

View in HD here!

Some more things to add:

Also, something really useful that I feel was left out and is pretty important: ‘Shift + <’ and ‘Shift + >’ will move the sort-by column in the direction of the ‘arrows’. Oh, and ‘Shift + W’ will ‘save’ these settings to .toprc :P.
‘x’ will toggle highlighting of the column currently being sorted by; coupled with the above, you can easily move through the different consumption columns.

No Comments on On top – Worthy of distribution
Categories: linkspam linux

Funny break January 28, 2011

Best strip ever:

VIA: http://xkcd.com/838/

No Comments on Funny break
Categories: linkspam linux

On: ntp, ntpd. link dump! January 14, 2011

So, in order to quickly have a (debian) machine up and running on ntp, you’re bound to do something like this ‘apt-get install ntp ntpdate’.

The problem is that this installs ‘ntpd’ too. The default configuration allows your server to answer NTP queries from anywhere.

If you want to crack down on that, you’ll be somewhat frustrated with pre-4.6 config options, as they’re nontraditional compared to what we usually see; without further ado, here’s a simple ‘link dump’ for a configuration guide.

On ntp 4.x? Guess what? Doesn’t work =[ – must be done with iptables.
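A hedged example of the iptables route – let replies from your upstream server in, drop NTP from everyone else:

  iptables -A INPUT -p udp --dport 123 -s my.server.address -j ACCEPT
  iptables -A INPUT -p udp --dport 123 -j DROP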


Here’s the cheatsheet /etc/ntp.conf :

driftfile /var/lib/ntp/ntp.drift
server my.server.address


restrict default ignore
restrict -6 default ignore

restrict 127.0.0.1

restrict my.server.address

This will allow you to poll things, e.g.: ntpq -p; and keep everyone else from sending packets to your box either on purpose or by accident. Note: You -have- to have your ‘servers’ in restrict lines or else it’ll hang on the first poll. (Indicated by ntpq -p )

When ntp isn’t working right, this is what ntpq -p looks like:

 box:/etc# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 123.123.123.123  .INIT.          16 -    -   64    0    0.000    0.000   0.000

Note the 0.000’s in the delay/offset/jitter – it’s also stuck on the sync request at INIT.

A properly functioning ntpq -p should look something like this:

box:/etc# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 123.123.123.123  123.12.1.12      3 u    3   64    1    1.349  2446.01   0.000


No Comments on On: ntp, ntpd. link dump!