Saturday, April 17, 2010

Rethinking the Evil of XML Configuration Files

Do you remember when your system's configuration grew from a simple file containing a few name/value pairs, edited by a small set of people, into a large collection of files in multiple locations, each with its own business processes surrounding it?

When the config files were small and informal, you could easily get by with a YAML file or a Windows .INI file. But now that there are hundreds or thousands of settings, maintaining them all has become challenging.

The main problems with YAML and .INI configuration files are:
  • Lack of Type Safety - If the configuration reading code expects 'true' or 'false' and the configuration file contains '1', how will the code handle that input? Will it log an error? Will it silently misinterpret the '1' as 'false'? Do you want to have to write type-checking code throughout your application? You could add a comment to the configuration file specifying the setting's type, but not everyone understands all the various types and formats (e.g. dates and times).
  • Lack of Range Checking - If the configuration code expects a value between 1 and 4 inclusive and someone has configured the setting as 5, how will your system react? What if your configuration reading code expects 'high' or 'low' and someone enters 'medium'? That's another form of range violation. You could write comments that specify the range, but those comments had better agree with the code that does the actual range checking.
  • Lack of Validation Support - If one of your configuration settings is mandatory for the system to operate correctly (e.g. a web service endpoint) and it's missing, you don't discover the error until run-time. You could add a comment to the configuration file stating that the setting is mandatory, but will people read it?
  • Lack of Appropriate Defaults - If some of your configuration values have appropriate defaults that you want to communicate to the user, you are stuck writing comments in the YAML or .INI file that list the defaults. Unfortunately, you have now introduced duplication between the code that must apply the default when the configuration value is missing and the comment in the configuration file.
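To make the cost of those four problems concrete, here is a minimal sketch of the hand-written checking code they imply. It uses Java's `java.util.Properties` as a stand-in for an .INI file (the JDK has no .INI or YAML parser), and the setting names `endpoint`, `verbose`, `retries` and `priority` are hypothetical, chosen to match the examples above. Note that `Boolean.parseBoolean("1")` really does return `false`, which is exactly the silent misinterpretation described in the first bullet.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical settings mirroring the examples above. Every rule here
// (type, range, mandatory, default) lives only in code, so any comment in
// the config file describing these rules is a second copy that can drift.
class HandRolledConfig {
    final String endpoint;
    final boolean verbose;
    final int retries;
    final String priority;

    HandRolledConfig(Properties p) {
        // Validation: a mandatory setting must be checked explicitly.
        String ep = p.getProperty("endpoint", "").trim();
        if (ep.isEmpty()) {
            throw new IllegalArgumentException("endpoint is required");
        }
        endpoint = ep;

        // Type safety: Boolean.parseBoolean("1") silently returns false,
        // so the accepted spellings have to be checked by hand.
        String v = p.getProperty("verbose", "false").trim(); // default duplicated here
        if (!v.equalsIgnoreCase("true") && !v.equalsIgnoreCase("false")) {
            throw new IllegalArgumentException("verbose must be true or false, got: " + v);
        }
        verbose = Boolean.parseBoolean(v);

        // Range checking: parse, then compare against hard-coded bounds.
        String r = p.getProperty("retries", "2").trim(); // another duplicated default
        int n;
        try {
            n = Integer.parseInt(r);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("retries must be an integer, got: " + r, e);
        }
        if (n < 1 || n > 4) {
            throw new IllegalArgumentException("retries must be between 1 and 4, got: " + n);
        }
        retries = n;

        // Range checking for an enumerated value.
        String pr = p.getProperty("priority", "low").trim();
        if (!pr.equals("high") && !pr.equals("low")) {
            throw new IllegalArgumentException("priority must be high or low, got: " + pr);
        }
        priority = pr;
    }

    static HandRolledConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new IllegalStateException(e);
        }
        return new HandRolledConfig(p);
    }
}
```

Every one of these checks, and every default, has to be kept in sync by hand with whatever comments the configuration file carries.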
XML files, and more specifically XSD (XML Schema) files, address all of the above.
  • XSD allows you to specify the type of a given configuration value. If your configuration value has the wrong type, XSD validation of the configuration file will alert you to your mistake.
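  • XSD supports range checking through facets: minInclusive and maxInclusive (and their exclusive counterparts) bound numeric and date/time values, and enumeration restricts a setting to a fixed set of values such as 'high' and 'low'.
  • XSD by its very nature handles validation, including marking settings as required. If your application performs XSD validation at startup, you can catch configuration errors quickly.
  • XSD allows you to specify default values for configuration settings. (An attribute's default applies when the attribute is absent; an element's default applies only when the element is present but empty.)
  • With an XSLT transform of the schema, you could generate an HTML document listing all your settings along with their defaults.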
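As a rough illustration of the first four points, here is a minimal sketch of startup validation using the XSD support built into the JDK (`javax.xml.validation`). The schema, the `<config>` element and its attributes are hypothetical. Settings are modeled as attributes because an XSD attribute default applies when the attribute is absent. The sketch only checks validity; it does not read the defaults back. One wrinkle: `xs:boolean` accepts `1` and `0` as well as `true` and `false`, so the '1' from the type-safety example is accepted as true rather than rejected, but at least its meaning is now defined by the schema instead of by whoever wrote the parsing code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ConfigValidator {
    // Hypothetical schema: each setting is an attribute of a single <config> element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='config'><xs:complexType>"
      // Validation: the endpoint is mandatory.
      + "<xs:attribute name='endpoint' type='xs:anyURI' use='required'/>"
      // Type safety plus a declared default.
      + "<xs:attribute name='verbose' type='xs:boolean' default='false'/>"
      // Range checking: an integer between 1 and 4 inclusive.
      + "<xs:attribute name='retries' default='2'><xs:simpleType>"
      + "<xs:restriction base='xs:int'>"
      + "<xs:minInclusive value='1'/><xs:maxInclusive value='4'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      // Range checking for an enumerated value.
      + "<xs:attribute name='priority' default='low'><xs:simpleType>"
      + "<xs:restriction base='xs:string'>"
      + "<xs:enumeration value='high'/><xs:enumeration value='low'/>"
      + "</xs:restriction></xs:simpleType></xs:attribute>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compile the schema once; Schema objects are immutable and thread-safe.
    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory f = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return f.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("invalid schema", e);
        }
    }

    /** Returns null if the config is valid, otherwise the first validation error. */
    static String validate(String configXml) {
        // Validators are not thread-safe, so create one per call.
        Validator v = SCHEMA.newValidator();
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            v.validate(new StreamSource(new StringReader(configXml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Run this at startup so configuration errors surface immediately.
        String error = validate("<config endpoint='http://example.com/svc' retries='5'/>");
        System.out.println(error == null ? "config OK" : "config error: " + error);
    }
}
```

The type, range, required-ness and default of each setting now live in one declarative place instead of being scattered across parsing code and comments.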
In short, XSD codifies and enforces all of the constraints that we would otherwise add to our configuration files as comments.
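The XSLT idea from the last bullet can also be sketched with the JDK's `javax.xml.transform` API. The small schema and the stylesheet below are hypothetical; the stylesheet simply turns every `xs:attribute` declaration in the schema into an HTML table row showing the setting's name, type and default.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SettingsDoc {
    // Hypothetical schema: each setting is an attribute of <config>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='config'><xs:complexType>"
      + "<xs:attribute name='endpoint' type='xs:anyURI' use='required'/>"
      + "<xs:attribute name='verbose' type='xs:boolean' default='false'/>"
      + "<xs:attribute name='retries' type='xs:int' default='2'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Hypothetical stylesheet: one table row per xs:attribute declaration.
    // exclude-result-prefixes keeps the xs namespace declaration out of the HTML.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " exclude-result-prefixes='xs'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<table><tr><th>Setting</th><th>Type</th><th>Default</th></tr>"
      + "<xsl:for-each select='//xs:attribute'>"
      + "<tr><td><xsl:value-of select='@name'/></td>"
      + "<td><xsl:value-of select='@type'/></td>"
      + "<td><xsl:value-of select='@default'/></td></tr>"
      + "</xsl:for-each></table>"
      + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet to a schema document and returns the HTML. */
    static String render(String xsd) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xsd)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(XSD));
    }
}
```

Because the documentation is generated from the schema itself, it cannot disagree with what validation enforces. This simple stylesheet only reads the `type` attribute, so a setting whose type is declared inline (for example a restricted integer range) would get an empty Type cell.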

Maybe XML configuration files aren't completely evil. My main complaint about them was that they were so hard to get right. Isn't that ironic? YAML and .INI files are hard to prove right.

How many of my YAML and .INI files are wrong and I just don't know about it?