XML validation, DTD, XSLT and Schematron

Need for XML schema validation.

Why do we need to validate XML? Let's say we have an XML document. I need to come up with some sort of example … hm … OK, here we go: an XML document that represents a rugby game (I like rugby, watching it from the safety of my sofa).

<?xml version="1.0"?>
<rugbyGame>
	<team name='terminators'>
		<numberOfPlayers>16</numberOfPlayers>
	</team>
	<team name='channel4'>
		<numberOfPlayers>16</numberOfPlayers>
	</team>
</rugbyGame>

We know that each team should have a name and a declared number of players. Validating this in code with a large number of conditional statements would seriously increase complexity and embed the business logic for document validation in our application. On top of that, we would have to walk the document and check every node by hand, which just consumes time and irritates. Instead of producing more code that can introduce new bugs, we could use one of the document validation tools.

DTDs goodness

DTD stands for Document Type Definition. In a few words, it defines which elements and attributes a valid document should have and how they are related. For our rugbyGame XML it looks something like this:

<!ELEMENT numberOfPlayers (#PCDATA)>
<!ELEMENT team (numberOfPlayers) >
<!ATTLIST team
                name CDATA #REQUIRED>
<!ELEMENT rugbyGame (team+)>

Practically every mainstream programming language has a library that can consume a DTD and an XML document and validate one against the other. We need to add one line to our rugbyGame XML to tell the parser which schema it was created for, in other words, what its type is. This line can look like this:

<?xml version="1.0"?>
<!DOCTYPE rugbyGame SYSTEM "rugbyGame.dtd">
...

The document can then be validated from Java like this:

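import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;

// Validator, ValidationResult and DTDValidationResults are this project's own types.
public class DTDValidator implements Validator {
    public ValidationResult validate(File xmlFile) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Turn on DTD validation; the DTD is located through the document's DOCTYPE line.
        factory.setValidating(true);
        ValidationResult result = new DTDValidationResults();
        try {
            DocumentBuilder documentBuilder = factory.newDocumentBuilder();
            // The result object doubles as the SAX ErrorHandler that collects validation errors.
            documentBuilder.setErrorHandler((ErrorHandler) result);
            documentBuilder.parse(xmlFile);
        } catch (ParserConfigurationException e) {
            // handle problems
        } catch (IOException e) {
            // handle problems
        } catch (SAXException e) {
            // handle problems
        }
        return result;
    }
}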
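
The DTDValidationResults class is not shown above. As a rough idea of what such an error-collecting handler could look like, here is a minimal, self-contained sketch. The names DtdCheck and CollectingHandler are made up for illustration, and the DTD is embedded as an internal subset only so that the example runs without any files on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Hypothetical stand-in for DTDValidationResults: remembers what the parser reports.
    static class CollectingHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();
        public void warning(SAXParseException e) { /* warnings are ignored in this sketch */ }
        public void error(SAXParseException e) { problems.add(e.getMessage()); }
        public void fatalError(SAXParseException e) { problems.add(e.getMessage()); }
    }

    // The rugbyGame DTD from above, written as an internal subset (an assumption of this sketch).
    static final String DOCTYPE =
          "<!DOCTYPE rugbyGame [\n"
        + "  <!ELEMENT numberOfPlayers (#PCDATA)>\n"
        + "  <!ELEMENT team (numberOfPlayers)>\n"
        + "  <!ATTLIST team name CDATA #REQUIRED>\n"
        + "  <!ELEMENT rugbyGame (team+)>\n"
        + "]>\n";

    // Returns the list of validation problems; an empty list means the document is valid.
    static List<String> validate(String rootElementXml) {
        CollectingHandler handler = new CollectingHandler();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(handler);
            String document = "<?xml version=\"1.0\"?>\n" + DOCTYPE + rootElementXml;
            builder.parse(new InputSource(new StringReader(document)));
        } catch (SAXException e) {
            // Fatal (well-formedness) errors were already recorded by the handler.
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        String valid = "<rugbyGame><team name='terminators'>"
            + "<numberOfPlayers>16</numberOfPlayers></team></rugbyGame>";
        String noName = "<rugbyGame><team>"
            + "<numberOfPlayers>16</numberOfPlayers></team></rugbyGame>";
        System.out.println("valid document:  " + validate(valid));
        System.out.println("missing name:    " + validate(noName));
    }
}
```

Validation errors do not stop the parse; they only reach you through the ErrorHandler, which is why the handler has to be registered before calling parse.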

To be honest, it's not straightforward to look at a DTD and say how the document should be structured. It is NOT user friendly at all. You wouldn't believe how unfriendly and unclear it gets once it describes a more sophisticated document type.

It is also limited. Let's say I would like to validate that both teams in rugbyGame have the same number of players. Sounds easy, but it's impossible to express in the DTD language.

Another approach

Is there an easier way to validate a document or create its schema? The W3C has defined the more expressive and powerful XML Schema language (XSD), and there are alternatives such as RELAX NG and Schematron. I'm going to focus on Schematron, as this is the approach I have been using in my current project.

Schematron is quite different from DTD, as it focuses on validating the document rather than describing its schema. It relies on XPath queries and XSLT transformations. You might ask how an XSLT transformation can validate a document. Well, let's look at an example. Let's say we would like to make sure that every team element has the name attribute set.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="rugbyGame">
        <xsl:for-each select="team">
            <xsl:if test="not(@name)">Each team should have a name ! Invalid Document.</xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

And a little bit of Java code 🙂

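    // 'transformer' is a javax.xml.transform.Transformer field, built elsewhere from the stylesheet above.
    public ValidationResult validate(File xmlFile) {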
        StringWriter stringWriter = new StringWriter();
        StreamResult streamResult = new StreamResult(stringWriter);
        StreamSource xmlForValidation = new StreamSource(xmlFile);

        try {
            transformer.transform(xmlForValidation, streamResult);
        } catch (TransformerException e) {
            e.printStackTrace();
        }
        return new XSLTValidationResults(stringWriter.toString());
    }
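
The snippet above leaves out how the transformer is created. A minimal sketch of the whole round trip could look like the code below. It assumes the stylesheet is held in a string rather than a file, and it adds an xsl:output element with method="text" (not present in the stylesheet above) so that the result contains only the error messages, with no XML declaration in front. The class name XsltCheck is made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltCheck {
    // Same rule as the stylesheet above, plus text output (an addition made for this sketch).
    static final String STYLESHEET =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='rugbyGame'>"
        + "<xsl:for-each select='team'>"
        + "<xsl:if test='not(@name)'>Each team should have a name ! Invalid Document.</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Returns the text produced by the stylesheet; an empty string means no rule fired.
    static String validate(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().trim();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = "<rugbyGame><team name='terminators'/><team name='channel4'/></rugbyGame>";
        String noName = "<rugbyGame><team name='terminators'/><team/></rugbyGame>";
        System.out.println("valid document: '" + validate(valid) + "'");
        System.out.println("missing name:   '" + validate(noName) + "'");
    }
}
```

The "validation result" here is simply whatever text the transformation emits, so an empty output is treated as a valid document.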

Schematron validation

Schematron takes the same approach: its validation is based on document transformation. You create a Schematron validation document containing all the rules, then transform it using the Schematron skeleton file (which is an XSL file). This transformation produces an XSLT stylesheet that can be used to validate your XML.

That doesn't seem like much, does it? Why not write the XSLT by hand instead? Because a Schematron document is based purely on XPath queries and assertions, it's easier to write, read and understand. It is also far better suited to expressing validation rules, and there is no need for nasty hacks when the validation gets more advanced.

So, what would a Schematron document look like for the same validation we just did using XSLT?

<?xml version="1.0" encoding="iso-8859-1"?>
<iso:schema xmlns="http://purl.oclc.org/dsdl/schematron"
            xmlns:iso="http://purl.oclc.org/dsdl/schematron"
            xmlns:sch="http://www.ascc.net/xml/schematron"
            queryBinding='xslt2'
            schemaVersion="ISO19757-3"
            defaultPhase="basic">
    <iso:title>Test ISO Schematron file. Introduction mode</iso:title>
    <!-- Not used in first run -->
    <iso:ns prefix="dp" uri="http://www.dpawson.co.uk/ns#"/>

    <iso:phase id="basic">
        <iso:active pattern="basic.rugby.checks"/>
    </iso:phase>

    <iso:pattern id="basic.rugby.checks">
        <iso:title>Basic rugby game checks</iso:title>
        <iso:p>All team names checks.</iso:p>
        <iso:rule context="team">
            <iso:assert test="@name">Team should have a name</iso:assert>
        </iso:rule>
    </iso:pattern>
</iso:schema>

I'm not showing the Java code, as it is very similar to the XSLT validation example. The validation document is very straightforward, and the XPath queries are simple and easy to write.

You can define validation phases and invoke them independently. Each validation rule is defined in the context of an XML element, and every rule can contain multiple assertions.
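
To get a feel for the "rule context plus assertions" idea without a Schematron toolchain, here is a rough Java sketch that evaluates XPath assertions against every team element using the standard javax.xml.xpath API. This is not how Schematron itself is implemented; it only imitates the idea. The class name RuleSketch and the second assertion (the "same number of players" check that DTD could not express) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RuleSketch {
    // Returns the messages of all failed assertions; an empty list means the document passed.
    static List<String> check(String xml) {
        // One "rule": XPath test -> message reported when the test is false.
        Map<String, String> asserts = new LinkedHashMap<>();
        asserts.put("@name", "Team should have a name");
        asserts.put("numberOfPlayers = /rugbyGame/team[1]/numberOfPlayers",
                    "Teams should have the same number of players");
        List<String> failures = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The rule context: every team element in the document.
            NodeList teams = (NodeList) xpath.evaluate("//team", doc, XPathConstants.NODESET);
            for (int i = 0; i < teams.getLength(); i++) {
                for (Map.Entry<String, String> a : asserts.entrySet()) {
                    // Each test is evaluated relative to the current context node.
                    Boolean ok = (Boolean) xpath.evaluate(
                        a.getKey(), teams.item(i), XPathConstants.BOOLEAN);
                    if (!ok) {
                        failures.add("team #" + (i + 1) + ": " + a.getValue());
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return failures;
    }

    public static void main(String[] args) {
        String xml = "<rugbyGame>"
            + "<team name='terminators'><numberOfPlayers>16</numberOfPlayers></team>"
            + "<team><numberOfPlayers>15</numberOfPlayers></team>"
            + "</rugbyGame>";
        for (String failure : check(xml)) {
            System.out.println(failure);
        }
    }
}
```

In a real Schematron document the two tests would simply be two assert elements inside the same rule with context="team".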

There is much more to Schematron; if you are looking for more information, visit the project web site.

I hope this post brings XML validation a little closer to those little ones who are afraid of it.

Cheers, Gregster
