XML Validation

XML is often used in the automated exchange of data. Whether this is between different local software components or between two remote systems over the Internet, it is always important that both the creator and the processor of the data agree on its structure. Just providing data that can be parsed as XML is usually not enough: it has to conform to a given set of structural rules.

The exact rules depend on the context, but in many cases they are defined through XML Schemas, Schematron definitions, or both.

XML Schemas

A common way to define the expected structure of an XML document is an XML Schema. From Wikipedia:

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

In other words, XML Schemas can be used to make sure that XML documents adhere to a predefined structure, in terms of which elements occur where, how often, and in which order. There is also a (limited) capacity to check the values of elements and element attributes.

Schematron

However, there are a number of use cases where XML Schemas are not sufficient: for instance, when there are more complex requirements on the values of specific fields, such as requirements that span multiple fields, requirements that involve calculations, or even requirements that depend on other requirements.

For such cases, there is a standard called Schematron. From Wikipedia:

Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath languages. In many implementations, the Schematron XML is processed into XSLT code for deployment anywhere that XSLT can be used.

Schematron is capable of expressing constraints in ways that other XML schema languages like XML Schema and DTD cannot. For example, it can require that the content of an element be controlled by one of its siblings. Or it can request or require that the root element, regardless of what element that is, must have specific attributes. Schematron can also specify required relationships between multiple XML files. Constraints and content rules may be associated with “plain-English” (or any language) validation error messages, allowing translation of numeric Schematron error codes into meaningful user error messages.

Validating XML files

There are many tools and software libraries to parse and create XML documents, and most of them can perform XML Schema validation as well. So in practice, in most frameworks, XML Schema validation is a single step: the validator takes the document and the schema, and reports whether the document conforms.
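As an illustration of that single step, here is a minimal sketch using the `javax.xml.validation` API that ships with the JDK. The class name `XsdValidationSketch`, the method `validate`, and the toy `note` schema are hypothetical and only serve the example; this is not ion-docval's own API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XsdValidationSketch {

    // Hypothetical toy schema: <note> must contain <to> followed by <body>.
    static final String NOTE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note'><xs:complexType><xs:sequence>"
        + "<xs:element name='to' type='xs:string'/>"
        + "<xs:element name='body' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Returns the schema violations found in xml; an empty list means the document conforms. */
    static List<String> validate(String xml, String xsd) {
        final List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // Collect violations instead of stopping at the first one.
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Reached for an unusable schema or a document that is not well-formed XML.
            throw new IllegalStateException("Schema or document could not be processed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        // Conforming document: the list of violations is empty.
        System.out.println(validate("<note><to>Ann</to><body>Hi</body></note>", NOTE_XSD));
        // <to> is missing: at least one violation is reported.
        System.out.println(validate("<note><body>Hi</body></note>", NOTE_XSD));
    }
}
```

The error handler collects violations rather than throwing on the first one, which is closer to what a validation report needs than the default behaviour.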

For Schematron, the story is slightly different. Schematron definitions are generally not used directly. Instead, they are transformed into XSLT (Extensible Stylesheet Language Transformations) files, which an XSLT transformer can then use to transform a given document into an SVRL (Schematron Validation Report Language) document.
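The second half of that process, applying the XSLT to a document, can be sketched with the JDK's built-in `javax.xml.transform` API. Note the assumptions: the stylesheet below is hand-written XSLT 1.0 that merely stands in for what a Schematron compiler would generate, the `invoice` rule is invented for the example, and the class and method names are hypothetical. Compiling a real Schematron file into XSLT is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SchematronXsltSketch {

    // Stand-in for a compiled Schematron rule (assumption, hand-written):
    // an invoice total must equal the sum of its line amounts.
    static final String RULES_XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:svrl='http://purl.oclc.org/dsdl/svrl'>"
        + "<xsl:template match='/'>"
        + "<svrl:schematron-output>"
        + "<xsl:apply-templates select='//invoice'/>"
        + "</svrl:schematron-output>"
        + "</xsl:template>"
        + "<xsl:template match='invoice'>"
        + "<xsl:if test='not(number(total) = sum(line/amount))'>"
        + "<svrl:failed-assert test='number(total) = sum(line/amount)' flag='fatal'>"
        + "<svrl:text>Invoice total must equal the sum of the line amounts</svrl:text>"
        + "</svrl:failed-assert>"
        + "</xsl:if>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the document and returns the resulting SVRL as a string. */
    static String toSvrl(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String invoice =
              "<invoice><line><amount>10</amount></line>"
            + "<line><amount>5</amount></line><total>20</total></invoice>";
        // The total does not match, so the report contains a failed-assert element.
        System.out.println(toSvrl(invoice, RULES_XSLT));
    }
}
```

The rule compares one field against a calculation over several others, which is the kind of cross-field constraint that an XML Schema cannot express.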

Put simply, the SVRL document is a new XML document that lists the warnings and errors found in the validated document. If the document adheres to all rules defined in the Schematron file, these lists are empty. By checking for the presence or absence of errors in the SVRL result, you can determine whether a given XML document is valid.
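Checking an SVRL result can be sketched as follows, using the DOM parser from the JDK. The sketch collects every `failed-assert` element in the SVRL namespace. It also reads the `flag` attribute, which many rule sets use to separate warnings from errors; how a given rule set uses `flag` is an assumption to verify. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SvrlReportSketch {

    static final String SVRL_NS = "http://purl.oclc.org/dsdl/svrl";

    /** Returns one "[flag] message" entry per failed assertion; empty means no rule failed. */
    static List<String> failedAsserts(String svrl) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed to look elements up by namespace
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(svrl)));

            List<String> messages = new ArrayList<>();
            NodeList failed = doc.getElementsByTagNameNS(SVRL_NS, "failed-assert");
            for (int i = 0; i < failed.getLength(); i++) {
                Element assertion = (Element) failed.item(i);
                // getAttribute returns "" when the attribute is absent.
                String flag = assertion.getAttribute("flag");
                NodeList texts = assertion.getElementsByTagNameNS(SVRL_NS, "text");
                String text = texts.getLength() > 0 ? texts.item(0).getTextContent().trim() : "";
                messages.add("[" + (flag.isEmpty() ? "unflagged" : flag) + "] " + text);
            }
            return messages;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SVRL report could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String svrl =
              "<svrl:schematron-output xmlns:svrl='" + SVRL_NS + "'>"
            + "<svrl:failed-assert flag='fatal' test='a = b' location='/invoice'>"
            + "<svrl:text>Total mismatch</svrl:text>"
            + "</svrl:failed-assert>"
            + "</svrl:schematron-output>";
        for (String message : failedAsserts(svrl)) {
            System.out.println(message);
        }
    }
}
```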

Further complications

In general, when Schematron is used, it is implied that an XML Schema validation is performed before the Schematron transformation runs, so that the Schematron rules do not have to account for structurally invalid input. Software that performs Schematron validation must therefore generally perform XML Schema validation as well.

Also, it is not uncommon to need to validate against multiple Schematron or XML Schema files. For instance, to validate a Peppol BIS Invoice document, you must validate it against three definitions:

  1. The XML Schema for UBL 2.1 (or CII D16B) Invoice documents
  2. The Schematron definition for European Norm 16931 (EN-16931)
  3. The Schematron definition for Peppol BIS 3 Invoice itself
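One way to combine such a chain of definitions is sketched below: the steps run in order, their findings are merged into a single result, and a step marked as a gate (typically the XML Schema check) stops the chain when it fails, so that later Schematron steps never see structurally invalid input. All names here (`ValidationPipelineSketch`, `Step`, `Result`, `run`) are hypothetical and do not reflect ion-docval's actual API; the checks in `main` are stubs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ValidationPipelineSketch {

    /** Findings of one step, or of the whole chain. */
    static final class Result {
        final List<String> errors = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();
        boolean isValid() { return errors.isEmpty(); }
    }

    /** A named check; a gate step stops the chain when it reports errors. */
    static final class Step {
        final String name;
        final boolean gate;
        final Function<String, Result> check;

        Step(String name, boolean gate, Function<String, Result> check) {
            this.name = name;
            this.gate = gate;
            this.check = check;
        }
    }

    /** Runs the steps in order and merges their findings, prefixed with the step name. */
    static Result run(String xml, List<Step> steps) {
        Result combined = new Result();
        for (Step step : steps) {
            Result result = step.check.apply(xml);
            for (String error : result.errors) combined.errors.add(step.name + ": " + error);
            for (String warning : result.warnings) combined.warnings.add(step.name + ": " + warning);
            if (step.gate && !result.isValid()) {
                break; // structure is wrong, so rule checks would not be meaningful
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Stub checks standing in for real XML Schema and Schematron validation.
        Function<String, Result> schemaCheck = xml -> {
            Result r = new Result();
            if (!xml.contains("<Invoice")) r.errors.add("root element is not Invoice");
            return r;
        };
        Function<String, Result> ruleCheck = xml -> {
            Result r = new Result();
            r.warnings.add("stub rule set always warns");
            return r;
        };
        Result result = run("<Order/>", Arrays.asList(
            new Step("UBL schema", true, schemaCheck),
            new Step("EN-16931 rules", false, ruleCheck),
            new Step("Peppol BIS 3 rules", false, ruleCheck)));
        System.out.println("valid: " + result.isValid());
        System.out.println("errors: " + result.errors);
        System.out.println("warnings: " + result.warnings);
    }
}
```

Prefixing each finding with the step name keeps the merged result traceable to the definition that produced it.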

A third complicating factor is that a number of Schematron definitions rely on XSLT 2.0. While XSLT 1.0 transformers are available for many programming languages, there are only a few complete and efficient XSLT 2.0 transformers, and only for a handful of programming languages.

ion-docval

ion-docval aims to address these issues by providing a software library that validates XML against multiple XML Schema and Schematron definitions at once and returns a unified result of warnings and errors.

It also provides a standalone local HTTP validation service for integration in non-Java environments (or as a microservice in Java environments).

It is meant to be run in your own environment, so that you do not need to send potentially confidential documents to a third party.