Grepping extremely large files

So you forgot to set up logrotate on an active log, eh? Now you’ve got a multi-gigabyte file to weed through, and you need to extract a chunk of time from it?

Here’s a quick cheat sheet to help you get by, quickly and sanely.

It’s about byte offsets!

  • Get the byte offset in the file where your time range starts
  • Get the byte offset in the file where your time range ends
  • dd the data out!
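The three steps above can be sketched roughly like this. The sample log, its timestamp format, and the variable names are made up for illustration; `grep -b` (print the byte offset of each match) and `-m 1` (stop after the first match) are available in GNU and BSD grep, but check your own system.

```shell
# Hypothetical sample log standing in for the real multi-gigabyte file
log=$(mktemp)
printf '%s\n' \
  '2024-01-01 19:59:58 startup' \
  '2024-01-01 20:00:03 request a' \
  '2024-01-01 20:30:10 request b' \
  '2024-01-01 21:00:01 request c' > "$log"

# Steps 1 and 2: grep -b prints "OFFSET:line", so keep only the part before the first colon
offset_start=$(grep -b -m 1 '2024-01-01 20:' "$log" | cut -d: -f1)
offset_end=$(grep -b -m 1 '2024-01-01 21:' "$log" | cut -d: -f1)

# Step 3: copy only the bytes between the two offsets
# bs=1 keeps the arithmetic obvious but is slow on large ranges
chunk=$(dd if="$log" bs=1 skip="$offset_start" \
           count=$((offset_end - offset_start)) 2>/dev/null)

printf '%s\n' "$chunk"
rm -f "$log"
```

With this sample the extracted chunk holds the two 20:xx entries and stops just before the 21:00:01 line, which is the boundary behaviour the caveats deal with.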

Caveats

  • Tack a few extra bytes onto the byte length, because the offset_end number is the first byte of your boundary log entry, so that entry itself would otherwise be cut off
  • Finding the boundary is a bit tricky, because a log entry has to exist at that timestamp for the pattern to match. If you’re looking for what happened at 20:00 on a given date, you may have to loosen the pattern (all the way up to the date level) depending on how busy your log is
  • This is just a trick to extract a chunk of entries to speed up further filtering.

Full example
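One way to put the pieces together, offered as a sketch under assumptions rather than a definitive script: `extract_range` is a hypothetical helper name, the sample log and patterns are invented, and the `iflag=skip_bytes,count_bytes` option is specific to GNU dd (it lets `skip` and `count` be given in bytes while still reading in large blocks). The padding is computed as the length of the boundary line plus its newline, so the boundary entry is included whole.

```shell
#!/bin/sh
# extract_range FILE START_PATTERN END_PATTERN   (hypothetical helper)
# Prints everything from the first line matching START_PATTERN up to and
# including the first line matching END_PATTERN.
extract_range() {
    file=$1
    start_pat=$2
    end_pat=$3

    # Each hit looks like "OFFSET:matching line"; fail if a pattern is absent
    start_hit=$(grep -b -m 1 -e "$start_pat" "$file") || return 1
    end_hit=$(grep -b -m 1 -e "$end_pat" "$file") || return 1

    offset_start=${start_hit%%:*}
    offset_end=${end_hit%%:*}
    [ "$offset_end" -ge "$offset_start" ] || return 1

    # Pad the length by the boundary line itself plus its trailing newline
    end_line=${end_hit#*:}
    pad=$(( $(printf '%s' "$end_line" | wc -c) + 1 ))
    length=$(( offset_end - offset_start + pad ))

    # GNU dd only: skip and count are interpreted as bytes, reads use 1M blocks
    dd if="$file" bs=1M iflag=skip_bytes,count_bytes \
       skip="$offset_start" count="$length" 2>/dev/null
}

# Usage on a made-up sample log
log=$(mktemp)
printf '%s\n' \
  '2024-01-01 19:59:58 startup' \
  '2024-01-01 20:00:03 request a' \
  '2024-01-01 20:30:10 request b' \
  '2024-01-01 21:00:01 request c' > "$log"

chunk=$(extract_range "$log" '2024-01-01 20:' '2024-01-01 21:')
printf '%s\n' "$chunk"
rm -f "$log"
```

On a real log you would redirect the output to a smaller file and run your detailed grep or awk filtering against that instead of the original.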
