Tuesday, April 6, 2010

How I'm learning to love simple databases

Once upon a time I was a total believer in strict database rules: everything fully normalized, schemas and diagrams thought out ahead of time, strict data validation at the database level, and so on. I was also a total believer in the primacy of MS SQL Server, DB2, and the like. Over the last 6 months a change of mind has been brewing, and only lately have I fully realized it.

When you're developing a webapp nowadays, chances are you use various frameworks and technologies, each layered on top of the other: MySQL -> Hibernate -> Java application -> JSP/Servlets -> HTML/markup, as one example. The data you accept from your users must pass through each layer to finally reach your storage, whether that's memory at the application layer or static storage at the db. Each layer has its own data validation. So if a user fills out a date field, it comes in as a string, which the Servlet converts to a valid Java date and passes to your application layer, where you put it in an object that checks that it's a valid date (if the field is defined as a Date object). Then it goes to your persistence layer, where in our example Hibernate checks that it's a valid date. Finally it gets passed to your database layer, where MySQL checks that it's a valid date yet again. That's a lot of work to verify the same date several times. Why? What's the point? If you've checked that it's a valid date once, and you don't do any meaningful transformation of the data afterward, there is no need to check it again.
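To make the "check it once" idea concrete, here's a minimal sketch in Java. It assumes a modern JDK with java.time, and the names (DateBoundary, parseOnce, toStorage) are made up for illustration; they aren't part of Servlets, Hibernate, or any other framework mentioned here. The raw string is validated exactly once at the edge, and everything below that point trusts the typed value.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class DateBoundary {
    // Hypothetical edge-of-the-app helper: the only place the raw string is validated.
    static Optional<LocalDate> parseOnce(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            // Expects ISO-8601 text such as 2010-04-06; impossible dates are rejected.
            return Optional.of(LocalDate.parse(raw.trim()));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Hypothetical storage step: no re-validation, the date is written out as plain text.
    static String toStorage(LocalDate alreadyValidated) {
        return alreadyValidated.toString();
    }

    public static void main(String[] args) {
        Optional<LocalDate> parsed = parseOnce("2010-04-06");
        if (parsed.isPresent()) {
            System.out.println("storing " + toStorage(parsed.get()));
        } else {
            System.out.println("rejected at the edge, nothing reaches storage");
        }
    }
}
```

The design choice is that toStorage takes a LocalDate rather than a String, so the compiler guarantees the value went through parseOnce (or something like it) before it got there.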

If I think about it, when exactly do I care whether it's a valid date? When I accept data from the user, when I do fancy things at the application layer like date/time arithmetic, and when I display data back to the user. None of that touches the database layer. Do I care whether the static data is stored in a traditional RDBMS, a bunch of XML files, or a straight-up B-tree like GT.M? Other than the possible performance gains from storing static data one way or another, it doesn't matter.

I currently use Wicket, which has easy-to-use form validation built into the framework itself, and that covers all the cases in which I would care to check whether the data is appropriate. Right now, I do datatype management at each layer because that's how I was taught, but I'm seriously considering switching everything underneath Wicket to string, integer, or float, crossing my fingers, and hoping nothing blows up.

Beyond data validation, what about primary/foreign keys? Many databases don't even have pk/fk relationships built in. Originally, I thought those databases were mad (in a bad way). However, if you think about it, the business logic that says one table must relate to another through a field-based relationship has to exist in your RDBMS, your persistence layer, and your application (because you must pass data with the appropriate relationships to the layer below). Why should each layer check the relational integrity of the data? The only reason I can think of is to protect myself from making a mistake at the application layer. Fine then: I can set those rules in the persistence layer as well as my application, and the database can simply be a static container, nothing more.
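Here's a rough sketch of what keeping the relationship rule above the database could look like. Everything in it is an assumption for illustration: SimpleStore, saveCustomer, and saveOrder are hypothetical names, and two plain in-memory maps stand in for a store that has no pk/fk support of its own.

```java
import java.util.HashMap;
import java.util.Map;

class SimpleStore {
    // The "database": two plain maps that know nothing about keys or relationships.
    private final Map<Integer, String> customers = new HashMap<>();
    private final Map<Integer, Integer> orderToCustomer = new HashMap<>();

    void saveCustomer(int customerId, String name) {
        customers.put(customerId, name);
    }

    // The foreign-key rule is enforced here, once, above the storage.
    void saveOrder(int orderId, int customerId) {
        if (!customers.containsKey(customerId)) {
            throw new IllegalArgumentException("No customer with id " + customerId);
        }
        orderToCustomer.put(orderId, customerId);
    }

    int orderCount() {
        return orderToCustomer.size();
    }

    public static void main(String[] args) {
        SimpleStore store = new SimpleStore();
        store.saveCustomer(1, "Branden");
        store.saveOrder(10, 1);       // accepted: customer 1 exists
        try {
            store.saveOrder(11, 99);  // rejected: there is no customer 99
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("orders stored: " + store.orderCount());
    }
}
```

The tradeoff is the one described above: the storage underneath stays a dumb container, so if this one check is wrong or gets bypassed, nothing below it will catch the mistake.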

2 comments:

  1. Your comments are spot on, Branden. Incidentally, could you correct the GT.M link to http://fis-gtm.com please? Thank you very much.

    Regards
    -- Bhaskar

  2. Sure thing Bhaskar, always good to see you around the web!
