
The XML Instance Gamut

19 Oct, 2009

If you happen to be in the business of writing software serving XML documents or consuming XML documents – and if you read this post, then there is a fair chance you are – then there is always one big challenge: how do you make sure your service or client is capable of dealing with all of the XML documents you could possibly expect to be passed around?
And if you happen to come from the test-driven world, the answer is obviously: by testing it. However, if you try to do that, things might be harder than you expect at first.
What about schemas?

I clearly remember having to integrate with Google’s Local Search Service. We managed to get them to send us their schema, but the schema was merely illustrative, rather than normative. In fact, it didn’t even ‘parse’ correctly. It was supposed to be a DTD, but in reality, it wasn’t. In that case, you are basically lost. The only thing that you can really do is ‘test by poking around’: see what the web service replies, and then work those replies into your test harness.
If you do manage to get a schema, then you are still not done. Sure, if it’s about SOAP-based web services, then you might be able to generate stubs and skeletons, and those would give you some guarantee that you are covering most cases. But there is still a chance that you do not cover all cases: the content models inside your XML document might allow alternatives, and your implementation might only be dealing with one of them.
If the schema is small, then you can probably figure it out by careful examination. However, if the schema is huge, then the range and variety of XML document instances that you might get will make that impossible. And even if you created the schema yourself, it might sometimes cover a wider range of options than you expected. (I’m sure I am not the only one who has experienced this. ;-))
XML Instance Generator to the rescue
So, back to test-driven. The good news is, there are tools that take a schema and generate random instances, basically walking all of the different options. Xmlgen is one of those tools. It’s a little bit hard to find these days. If you follow the ‘XML Instance Generator’ link on Kohsuke’s homepage, you will end up in no man’s land. I dug a little further, and found out it’s currently hosted at Sun’s dev.java.net.
Xmlgen is extremely simple. It takes a schema (any schema language), and will generate any number of sample documents from that. It’s exactly what you want, except… It doesn’t support all datatypes defined by the XML Schema Datatypes specification. And that’s something I had run into before.
In fact, I tried to use xmlgen on a couple of earlier occasions, and each time it broke on missing support for xs:dateTime or xs:pattern restrictions. And there doesn’t seem to be an awful lot of work going into xmlgen to fix that.
Fixing XML Instance Generator
So I figured I’d fix this myself. It turned out adding support for dateTime wasn’t all that hard, even though xmlgen does not really have extension points to implement. That basically leaves you with two options: a) hacking the source code big time, or b) hacking it just a little, in order to add plug points, and then having something else implement those plug points. The latter is what I did.
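To give an idea of what such a plug point could look like, here is a minimal sketch. It is not the actual change made to xmlgen: `ValueGenerator`, `DateTimeGenerator` and `GeneratorRegistry` are hypothetical names, and the sketch assumes the generator only needs a datatype name and a source of randomness.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical plug point; none of these names come from xmlgen itself.
interface ValueGenerator {
    String generate(Random random);
}

// Produces values in the xs:dateTime lexical form (without a time zone).
class DateTimeGenerator implements ValueGenerator {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss");

    @Override
    public String generate(Random random) {
        // Pick a random second between 1970 and early 2038.
        long seconds = random.nextInt(Integer.MAX_VALUE);
        return LocalDateTime.ofEpochSecond(seconds, 0, ZoneOffset.UTC).format(FORMAT);
    }
}

// Maps datatype names to the generators plugged in for them.
class GeneratorRegistry {
    private final Map<String, ValueGenerator> byType = new HashMap<>();

    void register(String typeName, ValueGenerator generator) {
        byType.put(typeName, generator);
    }

    String generate(String typeName, Random random) {
        ValueGenerator generator = byType.get(typeName);
        if (generator == null) {
            throw new IllegalArgumentException("No generator registered for " + typeName);
        }
        return generator.generate(random);
    }
}

class PlugPointSketch {
    public static void main(String[] args) {
        GeneratorRegistry registry = new GeneratorRegistry();
        registry.register("dateTime", new DateTimeGenerator());
        System.out.println(registry.generate("dateTime", new Random()));
    }
}
```

The point of the registry is that the core generator only needs one small hook (look up a generator by datatype name), while support for further datatypes can live outside the original source.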
Whoops, xs:pattern
Adding support for xs:pattern turned out to be a little tricky. If you are new to this type of restriction, then you should know that it is about restricting content to fit a certain regular expression, as illustrated below.
[xml]
<simpleType name='better-us-zipcode'>
  <restriction base='string'>
    <pattern value='[0-9]{5}(-[0-9]{4})?'/>
  </restriction>
</simpleType>
[/xml]
Now, if you want to generate valid data for this restriction, then you need to be able to generate text from that regular expression. It turns out there are quite a few Java libraries out there capable of matching text against a regular expression, but there is nothing at all for generating text from one. So I implemented my own. I blogged about it here, and it is hosted here.
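To make the idea of ‘generating text from a regular expression’ concrete, here is a hand-written sketch for the zip code pattern above. It is not the library mentioned in this post, which handles regular expressions in general; the class and method names are made up for illustration. Each construct in the pattern becomes a random choice, and `java.util.regex` is used only to check the result.

```java
import java.util.Random;
import java.util.regex.Pattern;

// Illustrative only: a generator hard-wired to one pattern.
class ZipCodeSketch {
    static final Pattern ZIP = Pattern.compile("[0-9]{5}(-[0-9]{4})?");

    // Appends 'count' random characters from the class [0-9].
    static void appendDigits(StringBuilder out, Random random, int count) {
        for (int i = 0; i < count; i++) {
            out.append((char) ('0' + random.nextInt(10)));
        }
    }

    static String generate(Random random) {
        StringBuilder out = new StringBuilder();
        appendDigits(out, random, 5);      // [0-9]{5}
        if (random.nextBoolean()) {        // (...)? : take or skip the optional group
            out.append('-');
            appendDigits(out, random, 4);  // -[0-9]{4}
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            String zip = generate(random);
            System.out.println(zip + " matches: " + ZIP.matcher(zip).matches());
        }
    }
}
```

A general-purpose generator does the same thing for arbitrary patterns: wherever the expression offers a choice (a character class, an optional group, a repetition), it picks one of the allowed options at random.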
Once that was done, extending xmlgen to have support for xs:pattern restrictions was easy. That means that – with just a few changes – I am now able to generate a test set for a fairly complicated schema. And I’m pretty sure that it will cover all cases, as long as I make the number of instance documents big enough.
So, now for a restriction like this:
[xml]
<xsd:simpleType name = "TimeValue">
<xsd:restriction base = "xsd:string">
<xsd:pattern value = "[0-2][0-9]\:[0-5][0-9](\:[0-5][0-9])?"/>
</xsd:restriction>
</xsd:simpleType>
[/xml]
… it will generate instances like this:

  • 07:36
  • 10:16:26
  • etc.
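A quick way to double-check generated values is to match them against the same pattern. The sketch below assumes the TimeValue pattern is meant to be hours and minutes followed by an optional seconds part, i.e. `[0-2][0-9]\:[0-5][0-9](\:[0-5][0-9])?`; the class name is made up. XML Schema patterns always have to match the whole value, which corresponds to `Pattern.matches` in Java.

```java
import java.util.regex.Pattern;

class TimeValueCheck {
    // Assumed form of the TimeValue pattern: HH:MM with optional :SS.
    static final String TIME_VALUE = "[0-2][0-9]\\:[0-5][0-9](\\:[0-5][0-9])?";

    static boolean isValid(String value) {
        // Pattern.matches requires the entire input to match, like an XSD pattern facet.
        return Pattern.matches(TIME_VALUE, value);
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"07:36", "10:16:26", "7:36", "29:59"}) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

Note that the pattern only constrains the shape of the value: something like 29:59 is accepted as well, so a random generator working from this pattern can legitimately produce it.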

You can download the modified version of xmlgen here.
