Configuration
Configuration File Location
You can specify the location of the configuration file (default: ion-docval.conf) with the -c/--config argument. If you do not specify a configuration file, the software will try these three locations, in order:
- $HOME/.config/ion-docval.conf
- /etc/ion-docval.conf
- ion-docval.conf in the current working directory.
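The lookup order above can be sketched as follows. This is a minimal illustration, not the actual implementation in ion-docval; the function names `candidate_config_paths` and `find_config` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

CONFIG_NAME = "ion-docval.conf"


def candidate_config_paths(home: Path, cwd: Path) -> List[Path]:
    """Return the three default locations, in the order they are tried."""
    return [
        home / ".config" / CONFIG_NAME,
        Path("/etc") / CONFIG_NAME,
        cwd / CONFIG_NAME,
    ]


def find_config(explicit: Optional[str] = None) -> Optional[Path]:
    """Use the explicitly given path if there is one, else the first default that exists."""
    if explicit is not None:
        return Path(explicit)
    for path in candidate_config_paths(Path.home(), Path.cwd()):
        if path.is_file():
            return path
    # Hypothetical choice: signal "nothing found" with None
    return None
```

For example, `find_config("/tmp/custom.conf")` returns that path unchanged, while `find_config()` walks the three defaults.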
Options section
- AutoReload: [true/false] specifies whether validation files should be automatically reloaded when they change on disk. Note that this may cause an initial delay with the first request after the change, while the validation file is reloaded.
- UnknownKeywords: [warn/error/fail/ignore] specifies what the validator should do when a document is validated with a keyword that is not in the configuration.
  - warn: Return a normal validation result, but add a warning about the unknown keyword. This warning will be the only content of the result.
  - error: Return a normal validation result, but add an error about the unknown keyword. This error will be the only content of the result.
  - ignore: Return an empty validation result, with 0 errors and 0 warnings.
  - fail: Return a hard error that the document cannot be validated. In a web browser, this will be an error alert. On the HTTP level, it will be a 400 client error.
- LazyLoad: [true/false] when true, the validation files will not be loaded into memory until they are first used. This speeds up startup, at the cost of a slower first validation for each document type.
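To make the four UnknownKeywords settings concrete, here is a small sketch of how a caller might model them. The `ValidationResult` class, the exception, and the message text are assumptions made for illustration; they are not the types used by ion-docval itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValidationResult:
    # Hypothetical result type: just two lists of messages
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


class UnknownKeywordFailure(Exception):
    """Hard failure; an HTTP front end would map this to a 400 response."""


def handle_unknown_keyword(mode: str, keyword: str) -> ValidationResult:
    """Return or raise according to the configured UnknownKeywords value."""
    message = f"Unknown keyword: {keyword}"  # assumed wording
    if mode == "warn":
        return ValidationResult(warnings=[message])
    if mode == "error":
        return ValidationResult(errors=[message])
    if mode == "ignore":
        return ValidationResult()
    if mode == "fail":
        raise UnknownKeywordFailure(message)
    raise ValueError(f"Unsupported UnknownKeywords value: {mode}")
```

With `warn` and `error`, the single message is the only content of the result; with `ignore`, both lists stay empty.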
Server section
In the Server section, you can specify one or more address/port combinations to listen on for incoming HTTP requests for validation. This section is only relevant for ion-docval-server.
Note that changing the IP address to anything other than 127.0.0.1 is strongly inadvisable, especially if the new address is publicly routable. The internal web server is very minimal and offers no security features such as TLS or IP whitelisting. If you want the validation server to be reachable from a wider network, we strongly advise doing so through a reverse proxy such as nginx, which can provide these features.
Each Listen entry has the following values:
- Address: [IP address] The IP address to listen on
- Port: [integer] The port number to listen on
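A deployment script could check a Listen entry against the advice above before starting the server. The sketch below is an assumption about how such a check might look; `check_listen_entry` is a hypothetical helper and is not part of ion-docval.

```python
import ipaddress
from typing import List


def check_listen_entry(address: str, port: int) -> List[str]:
    """Return human-readable problems found in one Listen entry."""
    problems = []
    # ip_address() raises ValueError if the string is not a valid IP address
    ip = ipaddress.ip_address(address)
    if not ip.is_loopback:
        problems.append(
            f"{address} is not a loopback address; put a reverse proxy in front"
        )
    if not 0 < port < 65536:
        problems.append(f"{port} is not a valid TCP port number")
    return problems
```

An empty list means the entry binds to the local machine only, which is the setup the server is designed for.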
Document type section
This is where you define which types of document ion-docval will validate for you. You can specify as many DocumentType entries as you wish, as long as each Keyword value is unique.
Each entry contains the following elements:
- Name: [string] A user-friendly name for this document type, such as ‘SI-UBL 2.0’
- Description: [string] A description of this document type
- Keyword: [string] The keyword by which the server determines which document type a given document needs to be validated against. See the section Keywords for more information.
- ValidationFile: [filename] A validation file that documents of this type should be validated against. The filename must end in either .xsd (for XML Schema files), .sch (for Schematron files), or .xsl (for SVRL stylesheets). You can specify multiple ValidationFile entries for each document type.
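The extension rule for ValidationFile can be expressed as a small lookup. This is an illustrative sketch only; the helper name is hypothetical, and treating the extension as case-sensitive is an assumption.

```python
from pathlib import Path

# The three extensions named above and the kind of validation file each denotes
VALIDATION_FILE_KINDS = {
    ".xsd": "XML Schema",
    ".sch": "Schematron",
    ".xsl": "SVRL stylesheet",
}


def validation_file_kind(filename: str) -> str:
    """Classify a validation file by its extension, or raise ValueError."""
    suffix = Path(filename).suffix
    if suffix not in VALIDATION_FILE_KINDS:
        raise ValueError(f"{filename} must end in .xsd, .sch, or .xsl")
    return VALIDATION_FILE_KINDS[suffix]
```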
Keywords
Keywords are the way ion-docval-server will choose which set of validation files to use when validating any given document.
If you use the command-line client, or use the jar library directly, you may be in a situation where the caller knows exactly which document type a given document has. In that case, you are free to choose the keyword you wish to use for each document type, as long as every keyword is unique.
For ion-docval-server, or any case where the caller may not know exactly which keyword to use, there is a strict process of deriving the keyword from any given document. The keyword you use in your configuration MUST follow this process as well. For convenience, the ion-docval-cli tool provides an option to derive the keyword from a document in the same way that the server will, so you can use that value in your configuration.
The format depends on whether the document is a UBL document, a CII document, or any other XML document, and comprises up to four elements:
- namespace: The XML namespace of the root element (if any)
- root element: The tag name of the root element (e.g. Invoice)
- Customization ID: The customization ID for UBL, or the ID in the GuidelineSpecifiedDocumentContextParameter element in case of CII
- version: The UBL version in case of UBL, or D16B in case of CII.
Depending on the general document type, this makes the following formats:
- UBL:
<namespace>::<root element>##<customization id>::<version>
- CII:
<namespace>::<root element>##<customization id>::D16B
- Any other XML with namespaces:
<namespace>::<root element>
- Any other XML without namespaces:
<root element>
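The four formats above can be illustrated with a short sketch. It is not the derivation code used by ion-docval: detecting UBL and CII by namespace prefix, reading the version from UBLVersionID, and falling back to the plain namespaced format when a value is missing are all assumptions. To get the authoritative keyword for a document, use the derivation option of the command-line tool. The sketch parses with Python's standard ElementTree and is meant for trusted input only.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumed namespace prefixes used to recognise the two special document families
UBL_PREFIX = "urn:oasis:names:specification:ubl:schema:xsd:"
CII_PREFIX = "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:"


def split_tag(tag: str) -> Tuple[Optional[str], str]:
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


def first_text(parent: ET.Element, local_name: str) -> Optional[str]:
    """Text of the first descendant with the given local name, if any."""
    for element in parent.iter():
        if split_tag(element.tag)[1] == local_name and element.text:
            return element.text.strip()
    return None


def derive_keyword(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    namespace, name = split_tag(root.tag)
    if namespace is None:
        return name  # XML without namespaces
    base = f"{namespace}::{name}"
    if namespace.startswith(UBL_PREFIX):
        customization = first_text(root, "CustomizationID")
        version = first_text(root, "UBLVersionID")
        if customization and version:
            return f"{base}##{customization}::{version}"
    elif namespace.startswith(CII_PREFIX):
        for element in root.iter():
            if split_tag(element.tag)[1] == "GuidelineSpecifiedDocumentContextParameter":
                customization = first_text(element, "ID")
                if customization:
                    return f"{base}##{customization}::D16B"
    return base  # any other XML with namespaces (assumed fallback)
```

For a document whose root element is `<note/>`, this returns `note`; for a root element `doc` in namespace `urn:example`, it returns `urn:example::doc`.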
When using the command-line client ion-docval-client, or when calling the library directly, you can specify a keyword of your own choosing, as long as it matches a keyword defined in the configuration.
Performance
Using .sch files directly is easier to set up, as you won’t have to convert them to .xsl yourself. However, this slows down the (re)loading of the validation file considerably: loading a single file may take tens of seconds, depending on the size of the Schematron file. Once loaded, a .sch file performs as well as an SVRL .xsl file loaded directly, but we still recommend using .xsl files.
Security
The HTTP service in ion-docval-server is not meant to be run publicly; administrators should be aware that the HTTP service itself has no security measures, such as support for TLS or authorization functionality. If anything but a local service is needed, a reverse proxy such as Apache or NGINX is expected to provide such functionality.
The XML parser has protection against XXE attacks and will refuse to process documents with external entity references, but only for the XML that is parsed for validation. Configured XSD, SCH, and XSLT files can reference external files, in order to support include statements. Make sure that any such file is either provided by the administrator or, when external users can configure such files, checked for XXE attacks before use.