Slug: Configuration

This page describes how to configure the Slug crawler.

The Configuration File

Slug requires a configuration file that defines the settings describing how the crawler will operate. Collectively, these settings are known as a profile.

These settings include details such as:

The Slug distribution includes a sample configuration file, config.rdf, that demonstrates how to configure all of the current components.

The configuration file is expressed as RDF/XML. A given configuration file may contain entries for more than one profile. Therefore, when running the Scutter, one must provide the identifier of a Scutter described in the configuration. The identifier is specified with the -id parameter; see Running the Scutter.

The Configuration Schema

The complete schema for the Scutter configuration is available in etc/schema/config.rdfs in the distribution. It is also available online.

The namespace URI is http://purl.org/NET/schemas/slug/config/.

The preferred namespace prefix is slug.

The following sections describe some of the key classes and relationships.

Scutter

The slug:Scutter class describes an individual crawler. A given configuration file may describe more than one crawler.
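To make the idea of several crawlers in one file concrete, here is a minimal sketch in Python of a configuration that declares two slug:Scutter resources, together with a helper that lists their identifiers. Only the namespace URI and the slug:Scutter class come from the schema described above. The identifiers "default" and "shallow", the use of rdf:about to carry them, and the scutter_ids helper are assumptions made for illustration. The helper also treats the file as plain XML rather than parsing it as RDF, so it only recognises Scutters written as top-level typed elements. Consult config.rdf for the properties a real profile needs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SLUG_NS = "http://purl.org/NET/schemas/slug/config/"

# Hypothetical configuration: the identifiers "default" and "shallow" are
# invented for this example, and the Scutters carry no properties.
SAMPLE_CONFIG = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:slug="http://purl.org/NET/schemas/slug/config/">
  <slug:Scutter rdf:about="default"/>
  <slug:Scutter rdf:about="shallow"/>
</rdf:RDF>
"""


def scutter_ids(config_xml):
    """Return the rdf:about value of each top-level slug:Scutter element."""
    root = ET.fromstring(config_xml)
    ids = []
    # ElementTree addresses namespaced names as "{namespace-uri}local-name".
    for node in root.findall(f"{{{SLUG_NS}}}Scutter"):
        ids.append(node.get(f"{{{RDF_NS}}}about"))
    return ids


if __name__ == "__main__":
    print(scutter_ids(SAMPLE_CONFIG))
```

A listing like this can help when choosing which identifier to pass with the -id parameter. How Slug itself resolves that parameter against the configuration is not shown here.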

Configuration Example

For now, see config.rdf for example configurations.


Image courtesy of Elroy Serrao.