Wednesday, February 27, 2008

Comparisons between GT.M and other software ecosystems

Since the beginning of modern computer programming (circa 1980), people involved in EHRs have been trying to compare Mumps to, say, C/C++, Java, or VB. Then they try to compare Mumps to common database engines such as SQL Server, MySQL, and Oracle. Those comparisons are generally not like-for-like, because Mumps is a combination of static file storage, programming language, runtime environment, and related utilities.

As a static file storage system, Mumps alone is not comparable to SQL Server or other DBMSs, since Mumps only stores data in B-trees. Note that at the extreme back end of SQL Server, data is also stored in B-trees, but that level is never exposed to the developer. Instead, you interact with your B-tree data through the SQL language. VistA fills this gap with Fileman, which is itself a combination of DBMS, display API, database API, and various programmer-level utilities. So a better comparison would be the disk IO speed of Java vs Mumps globals, or the query speed of SQL Server vs Fileman. As a side note, Fidelity just released a beta version of PIP, which is their own relational engine, available on SourceForge. I'm really excited to try it out!

Mumps is also not directly equatable to Java or C. Since Mumps is a runtime environment and a set of database utilities in addition to a programming language, it's better to compare Mumps to Java and the JVM, or C# and .NET, or C# and WINE.

We were running some numbers at work on a virtual machine running Ubuntu 7.10, and in a nutshell, GT.M is slower than Java and C for number crunching, but GT.M has a higher level of inherent numeric accuracy. The Java and C implementations overflowed when they used a datatype that was too small to hold the large integers we were computing. Note that this did not throw a runtime error, and the results looked real enough until they were compared against a known sample set. GT.M did not have this problem, and got the correct answers from the beginning.
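
To illustrate the kind of silent overflow described above, here is a minimal Java sketch. It is not the benchmark code from these tests: the factorial workload and the names OverflowDemo, factorialInt, factorialChecked, and factorialBig are assumptions chosen purely for illustration. It shows a 32-bit int wrapping around without any error, and two common ways to avoid that in Java (Math.multiplyExact and BigInteger).

```java
import java.math.BigInteger;

public class OverflowDemo {
    // Hypothetical workload: factorial in a 32-bit int.
    // Wraps around silently once the result exceeds 2^31 - 1.
    static int factorialInt(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i; // no exception on overflow
        }
        return result;
    }

    // Same loop, but Math.multiplyExact throws ArithmeticException on overflow.
    static int factorialChecked(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {
            result = Math.multiplyExact(result, i);
        }
        return result;
    }

    // Arbitrary-precision version: slower, but never overflows.
    static BigInteger factorialBig(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorialInt(12)); // 479001600 (still fits in an int)
        System.out.println(factorialInt(13)); // 1932053504 (wrong, but looks plausible)
        System.out.println(factorialBig(13)); // 6227020800 (correct)
        try {
            factorialChecked(13);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

The wrong answer for 13! is a positive, ordinary-looking number, which is exactly why this sort of bug goes unnoticed until the output is checked against known values.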

As for disk IO, I have not had a chance to compare Mumps global speed against Java or C disk IO, but in GT.M I can update 4,000,000 subnodes in 17 seconds. That is blazingly fast, but again, I have nothing to compare it against.

When compared as a database, SQL Server far outperforms Fileman. The general rule of thumb is that SQL Server is faster than Fileman by a factor of about 4. I know that many people like to say that "SQL Server and Oracle are slower than Fileman", but that is simply not true. My coworkers and I have used the exact same server with the same problem set, implemented in both Fileman and SQL Server, and run the same query to receive the same dataset. SQL Server outperforms Fileman every time.

In Fileman's defense, it is more than just a DBMS: it handles user IO and has its own programming API (date/time utilities, etc.).

Our next step will be to set up a complete problem that requires both a large database query and number crunching as a cohesive unit. Perhaps then the Mumps model of tight integration will show an advantage, since you can directly manipulate globals and Fileman from your code, whereas C or Java has to interact with SQL Server through ADO or ODBC.

I know I haven't posted numbers yet, but we have more testing that we'd like to do. I don't want anything indexed by Google until I have a complete set of numbers to post all at once in a coherent manner.
