XML Woes

June 15, 2006

Amongst the many roles I seem to undertake at my current employer (.NET Evangelist, NHibernate Evangelist, ATLAS Evangelist and so on), I also spend a large amount of my time dealing with XML technologies - whether it be transformation, serialization, querying or whatever.

Over time I've seen some truly awful implementations of so-called XML 'solutions'; one of my favourites involved the DVLA here in the UK. They proudly announced that they'd implemented an XML-based system to enable the querying of Driving Licence details (whether the driver was banned, how many points were on the licence and so on). So, given an XSD describing the object graph required for the query, I coded up a simple .NET application to produce the correct XML fragment.

The DVLA instantly rejected it because:

  1. Every element needed to start on a new line
  2. There was an 'encoding' attribute in the XML declaration
  3. Even though the source XSD had specified a target namespace, namespace declarations were not understood

Aaaargh!

Even worse was the size of the response - every single line in the response document was padded to over 200 characters, and with one element on each line this produced a document that was 62,771,702 bytes long.

A simple

cat results.xml | tidy -xml -indent -o parsed.xml

produced a document that was a mere 8,604,602 bytes in length.

Now this is a bit of an extreme example, but I've seen several variations on this theme in active use, and nearly all of them stem from one single problem - string-concatenated XML, like this:

Console.WriteLine(@"<?xml version='1.0' encoding='utf8'?>");
Console.WriteLine("<root><element attribute='value' /></root>");

XML's problem is that it is very easy to read and understand, which hides plenty of pitfalls for the unwary or lazy. In the past I've been sent documents that declared their encoding as UTF-8 but actually contained ISO-8859-1 characters, which led to some truly awful hacking with 'awk' and 'sed' later in a complex overnight batch process.
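To make that encoding pitfall concrete, here is a minimal sketch of what goes wrong. It is an illustration only, not the code from that batch process; the class and method names are made up for the example. It encodes a string as ISO-8859-1 and then decodes the bytes as UTF-8, which is what a consumer does when it trusts an incorrect declaration.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Hypothetical helper: returns what a reader trusting a UTF-8 declaration
    // sees when the bytes were really written as ISO-8859-1.
    public static string DecodeLatin1BytesAsUtf8(string text)
    {
        byte[] latin1Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(text);
        return Encoding.UTF8.GetString(latin1Bytes);
    }

    static void Main()
    {
        string original = "café";
        string misread = DecodeLatin1BytesAsUtf8(original);

        // The single byte 0xE9 ('é' in ISO-8859-1) is not a valid UTF-8
        // sequence on its own, so the round trip does not give the text back.
        Console.WriteLine(misread == original);
    }
}
```

A lenient decoder like this one silently substitutes a replacement character; a strict XML parser would normally reject the document outright. Either way, the data is damaged before anyone has looked at the markup.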

The MSDN article "Five Ways to Emit Test Results as XML" by James McCaffrey provides some interesting XML generation options for the .NET Framework, which I fully agree with - except for the end of Technique 1.

There are many, many toolkits out there for XML processing and generation. Don't resort to string concatenation; you'll get it wrong!
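As one possible alternative to the concatenated version, here is a minimal sketch using the framework's own XmlWriter. The class name, method name and sample attribute value are hypothetical and chosen purely for illustration; the point is that the writer produces the XML declaration from the encoding actually in use and escapes special characters itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlWriterSketch
{
    // Builds a <root><element attribute="..."/></root> document, leaving the
    // declaration, escaping and well-formedness to XmlWriter.
    public static byte[] BuildDocument(string attributeValue)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte order mark
            Indent = true
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("root");
                writer.WriteStartElement("element");
                writer.WriteAttributeString("attribute", attributeValue);
                writer.WriteEndElement(); // </element>
                writer.WriteEndElement(); // </root>
                writer.WriteEndDocument();
            } // disposing the writer flushes it into the stream

            return stream.ToArray();
        }
    }

    static void Main()
    {
        // A value that would break naive concatenation: it contains & and <.
        byte[] bytes = BuildDocument("Fish & Chips <large>");
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

Because the bytes and the declaration both come from the same XmlWriterSettings, the declared encoding cannot drift away from the real one, and the ampersand and angle bracket in the attribute value come out as entities without any hand-written escaping.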

