Use Case for Published Subjects and Topic Maps
Issues in Classical Index Management
- document identification and addressability
- internal metadata vs external indexing
- universal subject definition vs local vocabulary
Content is a set of document units: files, pages, sections, images, XML elements...
Each document unit should be reusable outside its creation context. It may make less sense than in its original context, but the emphasis is on portability.
Classical tools to achieve this are:
- unique identification, network addressability
- external indexing with controlled vocab, domain-experts, indexing tools
- internal annotation by standard metadata (e.g. RDF) and metadata-aware technology
These three can be managed and implemented separately, which makes them difficult to keep consistent.
Ideal view: documents are annotated with metadata and indexed against a controlled vocab (index subjects).
Assumes: stable document contents and locations; a universally understood vocabulary for the subjects (very important); integration of metadata and external indexing.
Real view: documents are moving targets (e.g. 404); index subjects only make sense 'locally'. This limits discoverability.
Introducing Published Subjects
Subjects vs. Vocabulary: subjects are defined as concepts, independently of their names in any language. Subjects are defined in stable documents (dictionaries). Subject-indexing is more robust than vocabulary-indexing.
Published Subject is:
- defined by an authoritative source
- described in a stable resource: subject indicator (human readable)
- published at a stable URI: subject identifier
PSI has a double expansion: Published Subject Indicator and Published Subject Identifier.
Topic Map view of Index Management
- document units and index subjects as topics
- topic map layer between docs and subjs
- using Pub. Subj for indexing and metadata
The topic map layer links documents and PSIs. Topics represent index subjects and document units. Associations link these together, representing metadata and document-index links.
Each document unit has a formal representation as a topic.
Relationships from document to index subjects are declared as topic map associations.
Actual document address is attached once to the document topic, independently of other relationships.
Benefits
- document can move without breaking indexing
- new address updated in single place
- index management independent from document management
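A minimal sketch of the document side of this layer, not a real topic map engine. The class names, method names and example.org addresses are all hypothetical. It only illustrates that the address lives on the document topic, so a move touches one field and leaves the indexing associations alone.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentTopic:
    topic_id: str
    address: str  # the only place the document's current location is stored


@dataclass
class TopicMap:
    # topic_id -> DocumentTopic
    documents: dict = field(default_factory=dict)
    # (document topic_id, subject identifier URI) pairs
    associations: set = field(default_factory=set)

    def add_document(self, topic_id, address):
        self.documents[topic_id] = DocumentTopic(topic_id, address)

    def index(self, topic_id, subject_uri):
        # Associations refer to the document topic, never to its address.
        if topic_id not in self.documents:
            raise KeyError(topic_id)
        self.associations.add((topic_id, subject_uri))

    def move(self, topic_id, new_address):
        # A moved document is updated in a single place.
        self.documents[topic_id].address = new_address

    def find(self, subject_uri):
        # Resolve index subject -> document topics -> current addresses.
        return sorted(
            self.documents[doc_id].address
            for doc_id, subject in self.associations
            if subject == subject_uri
        )


# Hypothetical identifiers, for illustration only.
SUBJECT = "http://psi.example.org/subject/topic-maps"

tm = TopicMap()
tm.add_document("doc-1", "http://example.org/old/report.html")
tm.index("doc-1", SUBJECT)
tm.move("doc-1", "http://example.org/new/report.html")
current = tm.find(SUBJECT)  # the lookup now resolves to the new address
```

The association set is unchanged by the move, which is the sense in which index management is independent of document management.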
Each Pub Subj. has a representation as a topic.
Index topics carry local names customized to the user's language and culture.
Index subjects have a stable URI, independent of indexing relationships and naming.
Benefits
- indexing made on stable universal subject, but achieved with localised vocabs
- definition of subjects retrievable (this isn't always true for controlled vocabularies)
- mergeability and reusability beyond local vocabulary
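The mergeability point can be sketched as follows, assuming each index topic is keyed by its subject identifier and carries names per language. IndexTopic, merge_index_topics and the psi.example.org URI are hypothetical names for illustration. Real topic map merging has more rules than this.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTopic:
    subject_identifier: str  # stable URI of the published subject
    names: dict = field(default_factory=dict)  # language code -> local name


def merge_index_topics(*topic_lists):
    """Merge topics from several local indexes by subject identifier."""
    merged = {}
    for topics in topic_lists:
        for topic in topics:
            target = merged.setdefault(
                topic.subject_identifier,
                IndexTopic(topic.subject_identifier),
            )
            # Same subject, different vocabularies: keep every local name.
            target.names.update(topic.names)
    return merged


# Two local indexes naming the same subject differently (hypothetical URI).
GERMANY = "http://psi.example.org/country/de"
english_index = [IndexTopic(GERMANY, {"en": "Germany"})]
french_index = [IndexTopic(GERMANY, {"fr": "Allemagne"})]

merged = merge_index_topics(english_index, french_index)
```

Because both indexes point at the same identifier, the result holds one topic with both names, without either side having to adopt the other's vocabulary.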
Metadata and Published Subjects
Use PSIs to identify subjects in metadata at creation time, or index on PSIs with automatic indexing tools using controlled vocab, interfaced with topic map software.
Use metadata to create associations between document topics and index topics.
Metadata becomes reusable.
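A rough sketch of turning metadata into associations, assuming the metadata is a simple mapping from property name to one or more subject identifiers. The function name, the property names and the URIs are hypothetical. No particular metadata format or topic map tool is implied.

```python
def associations_from_metadata(document_id, metadata):
    """Turn a document's metadata into (document, property, subject) triples.

    Assumes every metadata value is a subject identifier URI, or a list of them.
    """
    associations = []
    for prop, values in metadata.items():
        if isinstance(values, str):
            values = [values]  # allow a single value without a list
        for subject_uri in values:
            associations.append((document_id, prop, subject_uri))
    return associations


# Hypothetical metadata record for one document unit.
metadata = {
    "subject": [
        "http://psi.example.org/subject/topic-maps",
        "http://psi.example.org/subject/indexing",
    ],
    "language": "http://psi.example.org/language/en",
}

links = associations_from_metadata("doc-1", metadata)
```

Each triple names the document topic, the role the subject plays, and the subject itself, so the same record can be reused by any index that knows the identifiers.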
PSIs are new, so use is limited as yet, and only a limited number are available.
OASIS Tech. Committees
- PubSubj TC (tm-pubsubj) -- best practices for publication of Pub Subject; general recommendations
- GeoLang TC (geolang) -- published subjects for countries and languages
- XMLvoc TC (xmlvoc) -- published subjects for XML standards
Questions
Different granularities of subjects. Broader/narrower relationships
Have to have agreements on subjects to use classifications -- what is subject definition, and what is localized? Meaning/content.
