Tales of an IT Nobody

devbox:~$ iptables -A OUTPUT -j DROP

pt-online-schema-change and partitions – a word of caution October 30, 2012

Just a quick word of wisdom to those seeking to cleverly change the partitioning on a live table using a tool like pt-online-schema-change from the Percona Toolkit.

You will lose data if you don’t account for ALL of your data’s ranges upfront. (E.g: MAXVALUE oriented partition). The reason being is how online schema changing tools work; they create a ‘shadow copy’ of the table, set up a trigger and re-fill the table contents in chunks.

 

If you try to manually apply partitions to a table that has data from years ranging from 2005-2012 (And up), say something like:

ALTER TABLE _foo PARTITION BY RANGE(YEAR(dt)) ( PARTITION p0 VALUES LESS THAN (2012) )

And it has data from 2012 or above, MySQL will give you this safety error: (errno 1526): Table has no partition for value 2012 – this is a safeguard!

 

Now, if you use a hot schema tool, the safeguard isn’t in place; the copying process will happily rebuild the data with the new partitions and completely discard everything not covered by the partition range!

Meaning, this command will successfully finish, and discard all data from 2012+:

 

Just a heads up for those wanting to do something similar, you’ll definitely want to ALWAYS have something like this too fall back on as your final partition:

No Comments on pt-online-schema-change and partitions – a word of caution
Categories: databases mysql servers

Why are we spending so much time refuting? July 5, 2012

There’s a nice juicy war going on in the ‘data / web’ sector, that seems more heated than I can remember.

It essentially boils down to sensationalist claims from the likes of MongoDB and MemSQL, which in turn draw refuting remarks from industry professionals that are typically embedded with RDBMS technologies.

The typical responses to these new ‘hipster’ systems are usually transaction/consistency centric – as that’s where the RDBMS systems shine – they can perform wonderfully while being ACID compliant.

Or in the case of Node, refuting the ‘Apache doesn’t have concurrency, node is better’ arguments. I have a hunch 99% of the Node fanboys have a damn clue how capable Apache itself is.

There’s also things like Node.js that rub the seasoned people the wrong way, perhaps it’s the sensationalism without actually proving anything? (Check the first few comments) Or the utter lack of security focus? (That’s what bugs me) – I also think it has to do with their approach to enter the market: guns blazing, criticizing other solutions and hoisting their own as THE single option with more tenacity than appropriate for such an immature project. Guys in the trenches can’t stand that crap, we know it’s just another tool to get the/(‘a’) job done in a particular scenario.

But really, I think about how much time is wasted on these subjects going back and forth, so let’s stop wasting time. Be open minded to the new technologies as tools for a particular job and stop making all or nothing stories out of future tech, like it or not – we all have to share the same space.

No Comments on Why are we spending so much time refuting?

NoSQL: stretttcch – pop! May 11, 2012

You know the noise you hear when something is stretched close to it’s breaking point, like a balloon? In the context of the NoSQL bandwagon bubble: here’s some of that noise.
A detailed account of jumping on a bandwagon with an immature scene of products.

Although admittedly I have my eye on MongoDB, I still can’t believe the utter lack of ACL control these “databases”/”document stores” have. Tisk!

Another factoid: Netflix uses NoSQL… and most of their data in it is Tomcat logs.

No Comments on NoSQL: stretttcch – pop!
Categories: databases linkspam nosql

MySQL – multibyte cheat sheet for conditionals April 5, 2012

This is basically a short cheat sheet to launch some ideas for comparisons/matching with multibyte characters.

Connection collation of utf8_unicode_ci and charset of utf8 – Table + data and server:

Check out the behavior of the following queries:

Other resources:

No Comments on MySQL – multibyte cheat sheet for conditionals

Nay say for ext2/ext3, seemingly ext4 for MySQL servers July 19, 2011

 Basically I felt compelled to make a note regarding what filesystem to evaluate when you are performing a MySQL install. There seems to be a lot of reasons NOT to use the ext filesystems, and instead use XFS..

This is a straight out quote from a MySQL at Facebook blog entry:

ext-2 and ext-3 lock a per-inode mutex for the duration of a write. This means that ext-2 and ext-3 do not allow concurrent writes to a file and that can prevent you from getting the write throughput you expect when you stripe a file over several disks with RAID. XFS does not do this which is one of the reasons many people prefer XFS for InnoDB.

More on IO concurrency from  MySQL big name Domas Mituzas

  • O_DIRECT serializes writes to a file on ext2, ext3, jfs, so I got at most 200-250w/s.
  • xfs allows parallel (and out-of-order, if that matters) DIO, so I got 1500-2700w/s (depending on file size – seek time changes.. :) of random I/O without write-behind caching. There are few outstanding bugs that lock this down back to 250w/s

 A patch for ext4 was created, but it doesn’t appear that it made it in; it seems to yield minimal benefit.

And some other performance and risk observations involved with the most wildly used ext3.

If you’re looking to install or upgrade a MySQL server, it may very well be worth the time investment to research the depths of what filesystem you select, since it has just as much to do with the database performance as the MySQL configuration itself! 

No Comments on Nay say for ext2/ext3, seemingly ext4 for MySQL servers
Categories: databases mysql servers

Worthy of distribution: Reset root MySQL password July 18, 2011

Oh snap! Need to reset your mysql root/admin (or any?) MySQL password? Well, you’ll need root and control over MySQLd to some extent, but this is worthy of a rainy-day bookmark indeed: http://mysqlpreacher.com/wordpress/2011/03/recovering-a-mysql-root-password-three-solutions/

No Comments on Worthy of distribution: Reset root MySQL password

Programming through inductive reasoning August 5, 2009

I’ll never forget my Geometry class in high school, not just because it was fun, but the teacher was kind of an oddball (in a good kind of way). I learned so much from this class it was unbelievable, it was a life changing experience because it taught the true powers of using inductive reasoning (in geometry, things like the sums of the angles of a triangle are always 180 degrees, etc)

Kevin Mitnick demonstrated this power in the 90’s – Organization and inductive reasoning work hand in hand. If you know that something is completely organized around a concept you can safely assume a lot about it without truly knowing.

With that said, my previous post about table organization should already (hopefully) make more sense.

You can safely assume that because a table starts with ‘category’, it contains a p_id, id and title, etc. All of this is nice and consistent, other programmers will know what’s going on as well.

Admittedly I have plenty of stuff that doesn’t revolve around this concept – maybe someday I’ll re-organize.

No Comments on Programming through inductive reasoning
Categories: databases programming

Canonically speaking

A co-worker and I recently got into a discussion regarding table naming – specifically prefixes and how to group them.

A common approach you see, is to group the tables around a system using a prefix for it’s name:

_newsevents_news
_newsevents_categories
_newsevents_acl

The hardest thing about keeping everything ‘integrated’ is to get other developer’s off of the concept that they need to build their own access control system, or specialized category organization – a category system is a category system, it’s more so a metadata heirarchy to organize data.

I’ve always adopted a canonical approach, similar to how dns server operate, start with what would be your end:

com.mydomain
com.mydomain2
org.anotherdomain

in table terms:

_acl_news
_acl_content_management
_categories_files
_categories_news

The reason I do this is to keep myself in check with how consistent my designing is –
the schema for ‘newsevents_categories’ and ‘file_categories’ in a conventionally grouped table set may vary widely – the idea is to wean off of these drastic changes and group them by their TRUE function, not just the ‘sub-system’ they compliment – and have their schemas as similar as possible. This ensures better object design and abstraction possibilities as well as truly faster development. You can trust yourself with this naming convention just as much as you can trust yourself with the conventional naming, it just takes a different perspective to have it make sense.

The real issue here is how we name our datastore tables for our various formats – if you have news posts, or faq items you’re likely to see tables like:

news_items
faq_questions

Ideally, these generic datastore tables would follow a naming convention revolving around what they truely are, the ‘data’ that drives the system (not to be confused with various metadata tables like category), e.g.:

_data_news
_data_faq

The idea here is to keep the schemas the same – everything has metadata and other supplemental information that makes it ‘different’, that’s what code is for, bringing those tables together to make them into specific objects.

There’s always the unknown – a good example of a wrench thrown into the mix is if you have file information stored in a table – possibly containing image dimensions, file mime types or category relationships – The thing that makes table designs so inconsistent is typically the metadata that revolves around the specifics of what it’s relevant to. This inconsistent, or ‘unique’ information should be stored boldly in a metadata table.

This post is a little long so I’m going to post the “bring it together” stuff in another post

No Comments on Canonically speaking
Categories: databases programming