Wednesday, February 27, 2008

Comparisons between GT.M and other software ecosystems

Since the beginning of modern computer programming (circa 1980), people involved in EHRs have been trying to compare Mumps to, say, C/C++, Java, or VB. Then they try to compare Mumps to common database engines such as SQL Server, MySQL, and Oracle. Those comparisons are generally not like-for-like, because Mumps is a combination of static file storage, programming language, runtime environment, and related utilities.

As a static file storage system, Mumps alone is not comparable to SQL Server or other DBMSs, since Mumps only stores data in B-trees. Note that at the extreme back end of SQL Server, data is also stored in B-trees, but that level is never exposed to the developer. Instead, you interact with your B-tree data through the SQL language. VistA solves this by using Fileman, which is itself a combination of DBMS, display API, database API, and various programmer-level utilities. So a better comparison would be the disk IO speed of Java vs Mumps globals, or the query speed of SQL Server vs Fileman. As a side note, Fidelity just released a beta version of PIP, which is their own relational engine, available on SourceForge. I'm really excited to try it out!

Mumps is also not directly equatable to Java or C. Since Mumps is a runtime environment and a set of database utilities in addition to a programming language, it's better to compare Mumps to Java and the JVM, or C# and .NET, or C# and WINE.

We were running some numbers at work on a virtual machine running Ubuntu 7.10, and in a nutshell, GT.M is slower than Java and C for number crunching, but GT.M has a higher level of inherent numeric accuracy. The Java and C implementations overflowed when they used a datatype that was too small to hold the large integers we were computing. Note that this did not throw a runtime error, and the results looked real enough until they were compared to a sample set. GT.M did not have this problem, and got the correct answers from the beginning.
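
To show the kind of silent overflow I mean, here is a minimal sketch in Python. It is not the benchmark we ran; the factorial workload and the function names are assumptions made purely for illustration. `ctypes.c_int32` stands in for a fixed-width type like a Java or C `int`: it does no overflow checking and simply wraps, while Python's built-in integers grow as needed and serve as the reference answer.

```python
import ctypes


def factorial_int32(n):
    """Compute n! while forcing every intermediate result into 32 bits.

    ctypes.c_int32 does no overflow checking, so values that do not fit
    wrap around silently, much like a Java or C int.
    """
    result = 1
    for i in range(2, n + 1):
        result = ctypes.c_int32(result * i).value
    return result


def factorial_exact(n):
    """Compute n! with Python's arbitrary-precision integers."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


if __name__ == "__main__":
    # 12! still fits in 32 bits; 13! does not and wraps without any error.
    for n in (12, 13):
        print(n, factorial_int32(n), factorial_exact(n))
```

For 13 the 32-bit version returns 1932053504 instead of 6227020800. Nothing fails, and the wrong number looks perfectly plausible, which is exactly why we only caught it by checking against a sample set.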

As for disk IO, I have not had a chance to compare Mumps global speed against Java or C disk IO, but in GT.M I can update 4,000,000 subnodes in 17 seconds. That is blazingly fast, but again, I have nothing to compare it against.
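
For a rough sense of scale, 4,000,000 updates in 17 seconds works out to about 235,000 updates per second. The sketch below shows that arithmetic, plus one hypothetical way to collect a comparison number: timing a batch of key/value updates against an in-memory SQLite table from Python. The table layout, the function names, and the choice of SQLite are all assumptions, not something we tested at work, and an in-memory table is not a like-for-like match for GT.M globals on disk.

```python
import sqlite3
import time


def updates_per_second(count, seconds):
    """Convert a raw timing into a throughput figure."""
    return count / seconds


def time_sqlite_updates(count):
    """Time `count` key/value upserts into an in-memory SQLite table.

    Hypothetical baseline only: one table, one transaction, integer keys.
    Returns (rows_stored, elapsed_seconds).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (k INTEGER PRIMARY KEY, v TEXT)")
    start = time.perf_counter()
    with conn:  # commit once at the end of the batch
        conn.executemany(
            "INSERT OR REPLACE INTO nodes (k, v) VALUES (?, ?)",
            ((i, "value") for i in range(count)),
        )
    elapsed = time.perf_counter() - start
    rows = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    conn.close()
    return rows, elapsed


if __name__ == "__main__":
    # The GT.M figure quoted above: 4,000,000 updates in 17 seconds.
    print(round(updates_per_second(4_000_000, 17)))
    rows, elapsed = time_sqlite_updates(100_000)
    print(rows, round(updates_per_second(rows, elapsed)))
```

Any number this produces depends heavily on the machine and on whether the data ever touches the disk, so treat it as a sanity check rather than a benchmark.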

When compared as a database, SQL Server far outperforms Fileman. The general rule of thumb is that SQL Server is faster than Fileman by a factor of about 4 to 1. I know that many people say "SQL Server and Oracle are slower than Fileman", but that is simply not true. My coworkers and I have used the exact same server with the same problem set, implemented in both Fileman and SQL Server, and run the same query to receive the same dataset. SQL Server outperforms Fileman, hands down.

In Fileman's defense, it is more than just a DBMS: it handles user IO and has its own programming API (date/time utilities, etc.).

Our next step will be to set up a complete problem that requires a large database query and number crunching as a cohesive unit. Perhaps then the Mumps model of tight integration will show an advantage, since you can directly manipulate globals and Fileman from your code, whereas C or Java has to interact with SQL Server through ADO or ODBC.

I know I haven't posted numbers yet, but we have more testing that we'd like to do. I don't want to have anything indexed by Google until I have a complete set of numbers to post at once in a coherent manner.

Free hosting on blogger.com

So I did some digging around on my options for hosting, and I realized Blogger and WordPress are both free. I bought a domain name that I thought was pretty cool (www.beigehat.com), and I figure I might as well tie my blog to it. WordPress charges $10 for this feature, whereas Blogger is free. That sealed the deal for me, and now I'm the owner of my own blog that I hope no one reads.

Seriously, between Google Apps, Google the search engine, Blogger, etc., Google is taking over the world. I think Google is going to be Skynet.


Please don't kill me Skynet...... or at least save me for last!

Webhosting vs Server@Home

Lately, I've been thinking about where I'm going to host this site. My choices are to host it from home or to host it on a webhost. Here are the pros and cons as far as I can tell.

Webhosting:
Pros -
Cheap
24 x 7 uptime
Someone to bitch at if things go wrong

Cons -
Tech support may be less competent than me
Storage and bandwidth caps
Hassle to move large amounts of data around

Server@Home:
Pros -
I own the server, thus I am root and can do whatever I want
Easy to plug and play new server components, or extend drive space, etc.
Monthly costs are actually comparable to paying for a Virtual Private Server at a hosting company. Electricity and hardware costs factored in, it will cost me roughly $35 a month.

Cons -
I am responsible for my own tech support
More costly upfront, though most cheap, low-power servers can be built for $200-$250
If I cap out my home bandwidth, that would suck.

Right now, I'm leaning towards a home server. It would be more fun anyway, and depending on how long the hardware lasts, it might be cheaper. In any case, I have some things to take care of, so this site probably won't be hosted anywhere in the near future.

Playing with Drupal

So far, Drupal totally kicks ass!

Why did I wait so long to try it? I dunno. I'd like to thank www.bitnami.org for turning me on to this and other great open source, enterprise-level applications.


Lack of BNF Grammar for Mumps

In my quest to find a document generator for Mumps, I've found that there's nothing really built for it. Doxygen looks great, but only supports C/C++ style languages. The Doxygen FAQ says to use an input transform (their term, not Fileman's!) to translate your language into a C style. So I figure, well, if I gotta do that, I might as well write my own parser. The logical place to start is with a grammar. But guess what? There is no BNF grammar for Mumps to be found. Supposedly it's part of the ANSI and ISO standards, both of which you have to pay to receive a copy of. Through my searching, I've discovered that supposedly the MDC decided to keep all non-ANSI/ISO copies of the Mumps BNF grammar on paper only, with no electronic form available. Other people who have asked for the grammar have been told to look at a vendor-specific implementation and infer the grammar from that. Great.
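
Lacking a formal grammar, a first pass at a documentation extractor may not need to parse full Mumps at all. The sketch below is a starting point under stated assumptions, not a grammar: it assumes a label starts in column 1 and is either a name (optional leading `%`, then letters and digits) or an integer, that `;` starts a comment unless it sits inside a double-quoted string, and that `""` inside a string is an escaped quote. The function names and the sample routine are made up for illustration.

```python
import re

# Assumed label shape: name or integer in column 1, optional formal list.
LABEL_RE = re.compile(r"(%?[A-Za-z][A-Za-z0-9]*|[0-9]+)(\([^)]*\))?")


def split_comment(line):
    """Return (code, comment), splitting at the first ';' outside a string.

    A doubled quote ("") inside a string flips the flag twice, so it is
    handled without special casing.
    """
    in_string = False
    for pos, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == ";" and not in_string:
            return line[:pos], line[pos + 1:].strip()
    return line, None


def extract_docs(source):
    """Collect (label, formals, comment) for every labelled line."""
    docs = []
    for line in source.splitlines():
        code, comment = split_comment(line)
        match = LABEL_RE.match(code)  # match() anchors at column 1
        if match:
            docs.append((match.group(1), match.group(2), comment))
    return docs


SAMPLE = '''ADD(X,Y) ; return the sum of X and Y
 QUIT X+Y
SHOW ; print a string that contains a ";"
 WRITE "a;b ""quoted""",!
 QUIT
'''

if __name__ == "__main__":
    for entry in extract_docs(SAMPLE):
        print(entry)
```

This only pulls out labels and their trailing comments, which might be enough to feed a Doxygen-style generator. Anything beyond that (commands, postconditionals, indirection) would still need a real grammar.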

Testing a blogger account

First post! Yay!