RSS 1.0 Validator

A Schematron Schema for RSS 1.0

Introduction

DTD-based validation does not provide the flexibility that many applications require. A DTD is limited in the structures it can specify, and it cannot validate element content, for example to check that a value has the correct length or format. A further limitation is that a validating parser, on encountering a validation error, typically emits only an error message, and often a cryptic one.

Schematron is a schema language that allows a document to be validated by testing it against a set of patterns (XPath expressions). Schematron validation rules allow the author to specify a helpful error message which will be provided to the user if an error is encountered.
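To make the idea of "patterns plus helpful messages" concrete, here is a minimal Python sketch of the same approach. It is not Schematron and not the validator itself: the RULES table, the validate function and the sample feed are all made up for illustration, and the tests use the limited XPath subset supported by Python's standard library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and RSS 1.0.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Hypothetical rules: (context path, test path relative to the context,
# message shown to the user when the test finds nothing).
RULES = [
    (".//rss:channel", "rss:title", "A channel must contain a title element."),
    (".//rss:channel", "rss:link", "A channel must contain a link element."),
]


def validate(xml_text):
    """Return the messages of all rules that fail for the given document."""
    root = ET.fromstring(xml_text)
    messages = []
    for context, test, message in RULES:
        for node in root.findall(context, NS):
            if node.find(test, NS) is None:
                messages.append(message)
    return messages


# Illustrative feed: the channel has a title but no link.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/rss">
    <title>Example</title>
  </channel>
</rdf:RDF>"""

print(validate(SAMPLE))  # ['A channel must contain a link element.']
```

The point of the sketch is that each failed test carries its own human-readable message, which is what distinguishes this style of validation from a parser's generic error output.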

The RSS Validator is a Schematron schema for RSS, the XML vocabulary used to syndicate web content in applications such as My.Yahoo, My.Netscape, My.Userland and Meerkat.

The details of the RSS vocabulary can be found on the RSS Info weblog and in the RSS 1.0 specification.

User Guide

A Schematron schema is used to generate an XSLT stylesheet which is then used to apply the validation rules expressed in the schema to a source document.

If you are interested in further developing the RSS Validator schema then follow the Developer User Guide. If you are interested in simply validating some RSS documents using the schema, then follow the Author User Guide.

Developer User Guide

You will need the following components: We'll assume for the following examples that you can invoke your XSLT processor using a batch file or shell script as follows:

xt input-document stylesheet output-document

If you have downloaded the schematron-basic implementation, together with the skeleton it depends on, then you can generate the validating stylesheet as follows:

xt rss_validator.xml sch-basic.xsl validator.xsl

You can then run the validator against an RSS document as follows:

xt rss_doc.xml validator.xsl [report.txt]
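The two steps above can also be scripted. The following Python sketch only assembles and optionally runs the two commands; it assumes the xt wrapper script described earlier is on your path, and the function names and the feed file name my_feed.rdf are hypothetical.

```python
import subprocess


def pipeline_commands(schema, skeleton, rss_doc,
                      validator="validator.xsl", report="report.txt"):
    """Build the two xt invocations: schema -> stylesheet, then document -> report."""
    return [
        # Step 1: compile the Schematron schema into a validating stylesheet.
        ["xt", schema, skeleton, validator],
        # Step 2: apply the validating stylesheet to the RSS document.
        ["xt", rss_doc, validator, report],
    ]


def run_pipeline(schema, skeleton, rss_doc):
    """Run both steps in order, stopping at the first failure."""
    for command in pipeline_commands(schema, skeleton, rss_doc):
        subprocess.run(command, check=True)


# Show the commands without running them (my_feed.rdf is a made-up file name).
for command in pipeline_commands("rss_validator.xml", "sch-basic.xsl", "my_feed.rdf"):
    print(" ".join(command))
```

Step 1 only needs to be repeated when the schema changes; the generated stylesheet can be reused for any number of documents.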

If you spot any errors in the validator, or add any new rules then please contact me. I'd welcome any feedback or contributions.

Author User Guide

You will need the following components: We'll assume for the following examples that you can invoke your XSLT processor using a batch file or shell script as follows:

xt input-document stylesheet output-document

You can then download the validator stylesheet and run it against an RSS document as follows:

xt rss_doc.xml validator_text.xsl report.txt

The stylesheet produces a plain-text report. If you would prefer an HTML version of the report, take a look at schematron-report.
 

Experimental Online Validation Service

An online validation service for RSS 1.0 is available, using this Schematron schema. The service makes use of an online XSLT transformation service hosted by the W3C. That transformation service is still an early experiment, so the online RSS 1.0 validator should be considered experimental as well.

A more permanent home for the online validator should be announced shortly.

Development Notes

Schema Structure

To make the RSS Schematron Schema manageable, the validation rules have been separated into individual modules. Each module contains a number of self-contained rules which apply to a specific aspect of the RSS validation.

Each module is named with a *.sch suffix, and is imported into the main Schematron schema using an XML entity reference. RSS itself consists of a number of modules. Validation code specific to each of these modules is defined in a file using the following naming convention:

module_id.sch

For example, Dublin Core validation rules are specified in module_dc.sch.
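As a rough illustration of the naming convention and the entity-based import, this Python sketch generates the kind of DOCTYPE declaration that could pull module files into a main schema. The entity names, the root element name schema and the helper functions are assumptions made for the example, not a copy of the actual schema file.

```python
def module_filename(module_id):
    """Apply the module_id.sch naming convention, e.g. 'dc' -> 'module_dc.sch'."""
    return f"module_{module_id}.sch"


def entity_name(module_id):
    """Hypothetical entity name for a module; the real schema may use other names."""
    return f"module_{module_id}"


def doctype_for(module_ids, root="schema"):
    """Build a DOCTYPE whose internal subset declares one external entity per module."""
    lines = [f"<!DOCTYPE {root} ["]
    for module_id in module_ids:
        lines.append(
            f'  <!ENTITY {entity_name(module_id)} SYSTEM "{module_filename(module_id)}">'
        )
    lines.append("]>")
    return "\n".join(lines)


print(doctype_for(["dc"]))
```

Inside the main schema, a reference such as &module_dc; would then be expanded by the XML parser into the contents of the module file.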

The schema currently includes rules to check the length of text contained within the core RSS elements. These field length restrictions are the same as those applied by the My.Netscape application, and are mainly required for backwards compatibility. As these restrictions are likely to be relaxed, the validation rules are defined in a separate Schematron file, field_lengths.sch.
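The kind of check these rules perform can be sketched in Python as follows. The limits in MAX_LENGTHS are placeholder values chosen for the example and should not be read as the contents of field_lengths.sch; the function name and sample feed are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"

# Assumed, illustrative limits keyed by (parent element, child element).
MAX_LENGTHS = {
    ("channel", "title"): 40,
    ("channel", "description"): 500,
    ("item", "title"): 100,
}


def check_field_lengths(xml_text, limits=MAX_LENGTHS):
    """Report every checked element whose text is longer than its limit."""
    root = ET.fromstring(xml_text)
    problems = []
    for (parent, child), limit in limits.items():
        for parent_el in root.iter(f"{{{RSS}}}{parent}"):
            for el in parent_el.findall(f"{{{RSS}}}{child}"):
                text = (el.text or "").strip()
                if len(text) > limit:
                    problems.append(
                        f"{parent}/{child} is {len(text)} characters; limit is {limit}"
                    )
    return problems


LONG_TITLE = "x" * 45  # deliberately longer than the assumed channel title limit
SAMPLE = f"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/rss">
    <title>{LONG_TITLE}</title>
    <description>Short description</description>
  </channel>
  <item rdf:about="http://example.org/item1">
    <title>Short</title>
  </item>
</rdf:RDF>"""

print(check_field_lengths(SAMPLE))  # ['channel/title is 45 characters; limit is 40']
```

Keeping the limits in one table mirrors the reason for the separate file: if the restrictions change, only that table needs editing.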

Namespace Checking

Each RSS module is identified by a separate namespace. For example the Dublin Core module namespace is:

http://purl.org/dc/elements/1.1/

The schema will report on elements which are from unknown namespaces, to highlight to the user that these elements have not undergone validation. Namespace checking is strict and is carried out by the XSLT engine, rather than by doing string comparisons within the validation rules. In other words, it is the responsibility of the XSLT engine to determine whether two namespace URIs compare as equal.
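The reporting behaviour can be approximated outside XSLT with a short Python sketch. Note the difference from the schema: here the namespace URIs are compared by exact string equality in Python, whereas the schema leaves the comparison to the XSLT engine. The KNOWN_NAMESPACES set, the function name and the ex:rating element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespaces this sketch treats as known: RDF, RSS 1.0 core and Dublin Core.
KNOWN_NAMESPACES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://purl.org/rss/1.0/",
    "http://purl.org/dc/elements/1.1/",
}


def unknown_namespace_elements(xml_text, known=KNOWN_NAMESPACES):
    """Return the tags of all elements whose namespace is not in the known set."""
    root = ET.fromstring(xml_text)
    unknown = []
    for el in root.iter():
        tag = el.tag
        # ElementTree writes qualified names as '{namespace-uri}local-name'.
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        if namespace not in known:
            unknown.append(tag)
    return unknown


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/ns#">
  <item rdf:about="http://example.org/item1">
    <title>Example item</title>
    <dc:creator>A. Author</dc:creator>
    <ex:rating>5</ex:rating>
  </item>
</rdf:RDF>"""

print(unknown_namespace_elements(SAMPLE))  # ['{http://example.org/ns#}rating']
```

Elements flagged this way are not necessarily wrong; the report only signals that no rules were applied to them.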

Schematron Implementation

The zip file distribution includes a modified version of Schematron Report (by David Carlisle) which uses a modified version of xmlverbatim (by Oliver Becker) to include a syntax-highlighted version of the original XML in the final HTML report.

This modified Schematron implementation requires the use of SAXON, as XT does not properly implement the namespace axis.

Download

Version History

1.1, 19th November 2000
Added checks to ensure rdf:about uniqueness on channel, item, image and textinput elements. Added content checks to note that rdf:about for an image should match its url element, and likewise for textinput and its link element.
Added structural validation checks for the DC module.
Tidied up documentation to include some additional notes on the schema structure, and added link to online validator.
Announced to RSS-DEV mailing list for feedback/testing.
1.0b, 3rd November 2000
Unreleased version. Amended schema to bring it into line with the latest revision of RSS 1.0, mainly incorporating the additional RDF elements for the item 'table of contents'. Added placeholder for the Dublin Core validation module. Tidied up the namespace handling.
1.0, 3rd September 2000
Produced initial version of this web page, and posted first version of the validator.

TODO List

References

Page Maintained by Leigh Dodds. Last Updated 19 November 2000