Grepping extremely large files

So you forgot to set up logrotate on an active log, eh? Now you’ve got a multi-gigabyte file to wade through, and you need to extract a chunk of time from it.

Here’s a cheat sheet to help you get by quickly and sanely.

It’s about byte offsets!

  • Get the byte offset in the file where your time range starts
  • Get the byte offset in the file where your time range ends
  • dd the data out!
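
The three steps above can be sketched roughly like this. It is only a sketch: it assumes GNU grep and GNU dd (the skip_bytes and count_bytes flags are GNU extensions), and the function names first_offset and extract_range are made up for illustration.

```shell
#!/usr/bin/env bash

# Print the byte offset of the first line matching a pattern.
# grep -b prefixes each match with its byte offset, and -m 1 stops at the
# first hit, so grep does not have to read the rest of the file.
# -a forces text mode in case the log contains stray binary bytes.
first_offset() {
    local pattern=$1 file=$2
    grep -a -b -m 1 -- "$pattern" "$file" | cut -d: -f1
}

# Copy bytes [start, end) of a file to stdout.
# With iflag=skip_bytes,count_bytes (GNU dd), skip= and count= are taken
# as byte counts, so bs can stay large and the copy stays fast.
extract_range() {
    local file=$1 start=$2 end=$3
    dd if="$file" bs=1M iflag=skip_bytes,count_bytes \
       skip="$start" count="$((end - start))" status=none
}
```

Something like extract_range big.log "$(first_offset '20:00' big.log)" "$(first_offset '21:00' big.log)" would then write everything from the first 20:00 entry up to, but not including, the first 21:00 entry.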

Caveats

  • You should tack some extra bytes onto the byte length, because the end offset (offset_end) is the first byte of your boundary log entry, not the last
  • Figuring out the boundary is a bit tricky, because a log entry has to be present in order to match. If you’re looking for what happened at 20:00 hours on X date and nothing was logged right then, you may have to round up to the hour or even the date level, depending on how busy your log is
  • This is just a trick to extract a chunk of entries to speed up further filtering.
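
One way to deal with the first two caveats, sketched under the assumption of GNU grep and newline-terminated entries. Both helper names are hypothetical. The first returns the offset just past the boundary entry, so nothing has to be guessed for padding. The second tries a list of patterns from most to least specific and returns the first offset that matches; picking sensible patterns (for an end boundary, the next hour or the next day rather than the current one) is left to the caller.

```shell
#!/usr/bin/env bash

# Offset just past the first line matching the pattern, so the boundary
# entry itself lands inside the extracted range.
offset_after_first_match() {
    local pattern=$1 file=$2 hit offset line len
    hit=$(grep -a -b -m 1 -- "$pattern" "$file") || return 1
    offset=${hit%%:*}                      # text before the first colon
    line=${hit#*:}                         # the matched line itself
    len=$(printf '%s' "$line" | wc -c)     # length in bytes, not characters
    echo "$((offset + len + 1))"           # +1 for the trailing newline
}

# Try each pattern in turn and print the offset of the first one that
# matches anything, e.g. minute level first, then hour level.
first_offset_any() {
    local file=$1 pattern offset
    shift
    for pattern in "$@"; do
        offset=$(grep -a -b -m 1 -- "$pattern" "$file" | cut -d: -f1)
        if [ -n "$offset" ]; then
            echo "$offset"
            return 0
        fi
    done
    return 1
}
```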

Full example
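
What follows is a reconstruction of the idea rather than a definitive script. It assumes GNU grep and GNU dd, and the function name extract_window, the default padding of 4096 bytes, and the file name and timestamp format in the usage note are all placeholders.

```shell
#!/usr/bin/env bash

# Usage: extract_window FILE START_PATTERN END_PATTERN [PAD_BYTES]
# Writes the bytes between the first START_PATTERN entry and the first
# END_PATTERN entry to stdout, plus PAD_BYTES of slack past the end.
extract_window() {
    local file=$1 start_pat=$2 end_pat=$3 pad=${4:-4096}
    local start end

    # Steps 1 and 2: byte offsets of the first matching entries.
    start=$(grep -a -b -m 1 -- "$start_pat" "$file" | cut -d: -f1)
    end=$(grep -a -b -m 1 -- "$end_pat" "$file" | cut -d: -f1)

    if [ -z "$start" ] || [ -z "$end" ]; then
        echo "extract_window: boundary pattern not found" >&2
        return 1
    fi
    if [ "$end" -lt "$start" ]; then
        echo "extract_window: end boundary is before start boundary" >&2
        return 1
    fi

    # Step 3: dd the data out. The end offset is the first byte of the
    # boundary entry, so pad the length a little to pull that entry in.
    # The padded chunk may end in the middle of a line.
    dd if="$file" bs=1M iflag=skip_bytes,count_bytes \
       skip="$start" count="$((end - start + pad))" status=none
}
```

For a log with ISO-style timestamps, a call might look like extract_window app.log '^2024-01-01 20:' '^2024-01-01 21:' > chunk.log, after which the usual grep, awk and friends can work on chunk.log instead of the whole file.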
