When Relational Databases Attack!

January 2012 · 3 minute read

I have been working with relational databases for years and they serve a great purpose, but with the new kid on the block, NoSQL, there is no reason why some data stores should still be using relational databases.

I have worked on a system that stores all person directory information. This information includes things like

Obviously this is a very object oriented structure since a single person can have multiple email address, numbers, and so on so initially a relational database would make sense to use. Eventually you would end up with tables for each object. However, this is possible:

Tablitis

Now you might be saying to yourself:

What’s wrong with that?

But, I continually say:

Seriously, what the fuck!?

Technically there is nothing wrong with this approach since all the data is normalized and related to each other. I do have ask though, is a relational database really the best method for storing this type of data? I don’t think it is because of some very basic points.

  1. Computational queries will never be run against this data on a consistant basis
  2. Most of the time, whenever a person is requested I want ALL of their attributes
  3. This type of data has high read/low write needs

This is where a NoSQL solution would shine! Why not represent this data as it should be, in object notation.

{
    "person":
    {
        "first_name":"Blain",
        "last_name":"Smith",
        "numbers":
        {
            "phone":"111-222-3333",
            "mobile":"333-222-1111",
            "fax":"555-333-1111"
        },
        "address":
        {
            "home":
            {
                "address_1":"12 Somewhere Street",
                "address_2":"",
                "city":"Ballston Spa",
                "state":"New York",
                "postal_code":"12020",
                "country":"United States of America"
            },
            "work":
            {
                "address_1":"12 Somewhere Street",
                "address_2":"",
                "city":"Boston",
                "state":"Massachusetts",
                "postal_code":"02115",
                "country":"United States of America"
            }
        }
    }
}

Ahhhh, I like that much better. Now, the current process that is used to get this data is:

  1. The central relational data store runs a script
  2. The script generates a single differential XML file of the data, encrypts it, and stores on an FTP site
  3. A local script I wrote checks the FTP sites for a new XML file, downloads it, then decrypts it
  4. Another bulk insert script runs by combining the XML file and an XSD file which makes the XML to my relational database tables
  5. Once completed it does UPDATEs and DELETEs

A few problems I see with this process:

Now if all this data was stored with a NoSQL database then a lot of this processing overhead can be eliminated.

I plan to try and run some real world tests with these ideas and get some real numbers on processing time, storage space, and bandwidth usage…when I have some free time!