Use Case for Published Subjects and Topic Maps

Issues in Classical Index Management

  • document identification and addressability
  • internal metadata vs external indexing
  • universal subject definition vs local vocabulary

Content is a set of document units: files, pages, sections, images, XML elements...

Each document unit should be reusable outside its creation context. It may make less sense there than in its original context, but the emphasis is on portability.

Classical tools to achieve this are:

  • unique identification, network addressability
  • external indexing with controlled vocab, domain-experts, indexing tools
  • internal annotation by standard metadata (e.g. RDF) and metadata-aware technology

These three can be managed and implemented separately, which makes them difficult to keep consistent.

Ideal view: documents are annotated with metadata and indexed against a controlled vocabulary (index subjects).

This assumes stable document contents and locations; a universally understood vocabulary for the subjects (very important); and integration of metadata with external indexing.

Real view: documents are moving targets (e.g. 404 errors); index subjects only make sense 'locally'. This limits discoverability.

Introducing Published Subjects

Subjects vs. vocabulary: subjects are defined as concepts, independently of their names in any language. Subjects are defined in stable documents (dictionaries). Subject-indexing is more robust than vocabulary-indexing.

A Published Subject is:

  • defined by an authoritative source
  • described in a stable resource: subject indicator (human readable)
  • published at a stable URI: subject identifier

PSI has a double expansion: Published Subject Indicator and Published Subject Identifier.

Topic Map view of Index Management

  • document units and index subjects as topics
  • topic map layer between documents and subjects
  • using Published Subjects for indexing and metadata

The topic map layer links documents and PSIs. Topics represent index subjects and document units. Associations link these together, representing metadata and document index links.

Each document unit has a formal representation as a topic.

Relationships from documents to index subjects are declared as topic map associations.

Actual document address is attached once to the document topic, independently of other relationships.

Benefits

  • document can move without breaking indexing
  • new address updated in single place
  • index management independent from document management
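
A minimal sketch of this layer in Python may make the "single place" point concrete. The class names, the `indexed-by` association type and the example.org addresses are hypothetical illustrations, not taken from the talk or from any topic map tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Topic:
    """A topic standing for either a document unit or an index subject."""
    topic_id: str
    # Network address of the document unit; stored here and nowhere else.
    address: Optional[str] = None


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    # Each association is (type, document topic id, subject topic id).
    associations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_topic(self, topic_id, address=None):
        self.topics[topic_id] = Topic(topic_id, address)

    def index(self, doc_id, subject_id):
        # Associations refer to topic ids only, never to addresses.
        self.associations.append(("indexed-by", doc_id, subject_id))

    def documents_about(self, subject_id):
        """Resolve addresses at query time, via the document topics."""
        return [self.topics[doc].address
                for kind, doc, subj in self.associations
                if kind == "indexed-by" and subj == subject_id]


tm = TopicMap()
tm.add_topic("doc-42", address="http://example.org/old/report.xml")
tm.add_topic("subj-topic-maps")
tm.index("doc-42", "subj-topic-maps")

# The document moves: only the document topic changes, associations do not.
tm.topics["doc-42"].address = "http://example.org/new/report.xml"
```

Because the association stores topic ids rather than addresses, a lookup by subject after the move returns the new address without any change to the index.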

Each Published Subject has a representation as a topic.

Index topics are given local names customized to the users' language and culture.

Index subjects have a stable URI, independent of indexing relationships and naming.

Benefits

  • indexing is made on stable universal subjects, but achieved with localised vocabularies
  • definition of subjects retrievable (this isn't always true for controlled vocabularies)
  • mergeability and reusability beyond local vocabulary
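
The mergeability point can be sketched as follows, assuming each index is simply a mapping from subject identifier to localized names. The function name and the psi.example.org identifier are made up for illustration; this is not the merging procedure of any particular topic map engine.

```python
def merge_by_identifier(*indexes):
    """Merge index topics that share a subject identifier, pooling their names."""
    merged = {}
    for index in indexes:
        for identifier, names in index.items():
            # Same identifier means same subject, whatever the local names are.
            merged.setdefault(identifier, {}).update(names)
    return merged


# Hypothetical identifier; not a real published subject.
PSI_FRANCE = "http://psi.example.org/country/fr"

english_index = {PSI_FRANCE: {"en": "France"}}
german_index = {PSI_FRANCE: {"de": "Frankreich"}}

merged = merge_by_identifier(english_index, german_index)
```

The two indexes share no vocabulary at all, yet they merge into a single topic because both point at the same identifier.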

Metadata and Published Subjects

Use PSIs to identify subjects in metadata at creation time, or index on PSIs with automatic indexing tools using a controlled vocabulary, interfaced with topic map software.

Use metadata to create associations between document topics and index topics.
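
As a rough sketch of that step, assume each metadata record names a document topic and lists the PSIs of its subjects. The record layout, the function name and the psi.example.org identifiers are assumptions made for this example only.

```python
def associations_from_metadata(records):
    """Turn per-document metadata into (document topic, index topic) pairs."""
    pairs = []
    for record in records:
        # A record without subject metadata yields no association.
        for psi in record.get("subjects", []):
            pairs.append((record["document"], psi))
    return pairs


# Hypothetical metadata records and identifiers.
metadata = [
    {"document": "doc-1", "subjects": ["http://psi.example.org/subject/xml"]},
    {"document": "doc-2", "subjects": ["http://psi.example.org/subject/xml",
                                       "http://psi.example.org/subject/rdf"]},
    {"document": "doc-3"},
]

links = associations_from_metadata(metadata)
```

Each pair would then be declared as an association between a document topic and an index topic.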

Metadata becomes reusable.

PSIs are new, so their use is limited as yet, and only a limited number are available.

OASIS Technical Committees

  • PubSubj TC (tm-pubsubj) -- best practices for publication of Published Subjects; general recommendations
  • GeoLang TC (geolang) -- published subjects for countries and languages
  • XMLvoc TC (xmlvoc) -- published subjects for XML standards

Questions

Different granularities of subjects; broader/narrower relationships.

Agreement on subjects is needed in order to use classifications: what belongs to the subject definition, and what is localized? Meaning/content.

This page (revision-1) was last changed on 21-Aug-2002 18:21 by unknown

Referenced by
XMLEurope2002