July 06, 2005

Work At Ingenta

We're trying to recruit a software developer. If you're in the Oxford area and know some Java, Perl, XML and XSLT, then get in touch. The role involves working on a full-text processing system, including reference extraction and supporting tools.

Posted by ldodds at 05:00 PM | Feedback? | TrackBack

Abulafia Demo

The only real hypertext system I've worked with is the web. I've obviously used hypertext help and documentation browsers, but I've never really done any development within a proper hypertext environment.

I'm therefore always keen to see how richer hypertext linking capabilities can be built using web technologies. Bob DuCharme's linking blog is always a good source of relevant discussion in that area. His one-to-many linking demo is cool too.

I'm therefore glad to see that Geoff (after some nagging from yours truly!) has put up some screencasts of his old hypertext application, Abulafia.

If, like me, you're a web weenie, then watch the screencasts and see whether they get your creative juices flowing. The multi-headed and conditional linking is particularly cool.

If you're an old-hand hypertext user, then take a nostalgic trip down memory lane.

Posted by ldodds at 03:59 PM | Feedback? | TrackBack

RDF and Library Metadata Interoperability

I've been having an interesting email discussion with Bruce D'Arcus and Richard Newman as a result of Bruce pointing us both at this posting on Metadata Interoperability by Kevin Clarke. I thought I'd write up some points to respond to that article here, in particular on the use (or lack of use) of RDF in the library community.

Presently there is a dizzying array of different metadata formats in the library sector, covering cataloguing, authority records, bibliographic metadata, etc. Examples include MARC, MARCXML, XOBIS, and MODS. There are also a number of different schemas in use in the publishing sector for describing research papers, conference proceedings and the like.

Very often there's a lot of detailed data modelling that has formed the groundwork of these schemas. FRBR is an excellent example of this.

Increasingly these formats are being designed as XML schemas, unfortunately with an emphasis on XML Schema rather than RELAX NG. "Crosswalks" -- a term I see in the library area, but little used elsewhere -- are used to transform documents between various schemas, enabling some degree of interoperability. The success of these depends on the comparative richness of the different formats; there's the usual loss of fidelity in transformations.

Clarke's posting discusses the concept of a "switchboard schema": an all-embracing schema into which all the others can be converted with minimum loss of fidelity, possibly backed by a "switchboard" that can negotiate the best path for a given end-to-end transformation.

I can't help thinking, as did D'Arcus, "why not use RDF?". Clarke's response is interesting:

...I don’t think [RDF] can/will really accomplish anything that agreement on any XML schema couldn't/wouldn't.

This surprised me greatly. One thing that RDF doesn't mandate is a single all-embracing format; it positively embraces a plurality of schemas, and the independent adoption and repurposing of schemas. This is a design goal. It does propose a single underlying model for modelling data, in the same way that an RDBMS has the relational model behind it. So one thing that RDF can achieve that XML can't is the jettisoning of the one-size-fits-all approach.

Clarke describes using RDF as a "leap of faith", and again raises issues about its complexity. This is a meme which I really wish we could put to bed. I've had no trouble introducing engineers to RDF: a quick overview of "what is a triple, what is a graph" and a pointer to the Jena tutorial and they're off. Questions that have arisen have mostly been modelling problems; issues that are largely independent of the technology.
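To give a flavour of how small that "what is a triple, what is a graph" overview really is, here's a minimal sketch in plain Python. It isn't Jena or any real RDF library, and the example.org URIs and the match() helper are made up for illustration; only the Dublin Core and FOAF property URIs are real.

```python
# A triple is one (subject, predicate, object) statement.
# A graph is simply a set of triples.
# The example.org URIs are hypothetical, used only for illustration.
EX = "http://example.org/"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

graph = {
    (EX + "book/1", DC_TITLE, "Foucault's Pendulum"),
    (EX + "book/1", DC_CREATOR, EX + "person/eco"),
    (EX + "person/eco", FOAF_NAME, "Umberto Eco"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything said about the book, then just its title.
about_book = match(graph, s=EX + "book/1")
titles = [o for (_, _, o) in match(graph, p=DC_TITLE)]
```

Because the object of one triple can be the subject of another, the set of statements forms a graph rather than a tree, which is most of what there is to explain.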

RDF has a definite image problem. My own take on the technology is much more pragmatic. I'm not sure whether the Semantic Web, in its grandest vision, will succeed. But whether it will or not is really beside the point. The important question is: what does RDF provide that XML or a relational schema doesn't?

As Bill says in his opening paragraphs in this post:


I'm telling myself I'm using RDF+XML because I want to be able to pull data in from anywhere. That's true, but to be brutally honest I can't be bothered designing and maintaining yet another relational schema for yet another webapp - doing so is starting as much sense as designing my own filesystem or TP monitor. Life's too short, too short to be working on technology that can only possibly make sense when you're in dressed in combats and vans listening to Pearljam... there's a real wish to conduct oneself at a higher level of abstraction before complete dementia sets in. What's the point in designing tables for a webapp when an RDF-backed store will manage the data for you and RDF queries will come back as tabular data anyway?

RDF lets me avoid the nitty-gritty of physical data model design. I can concentrate on the logical model, and be confident that I can throw any RDF/XML instance document into a triple store without further configuration. I can also be confident that I can throw anyone else's data into that store also, and further, that if we reference common resources (via URIs) that data will be immediately merged and available to me.
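The merging behaviour can be sketched in a few lines of plain Python. This is a simplification under stated assumptions: graphs are modelled as sets of tuples rather than held in a real triple store, blank nodes are ignored, and the paper URI, the values and the describe() helper are all hypothetical.

```python
# Two independently produced graphs, each a set of
# (subject, predicate, object) triples. The paper URI and the
# literal values are hypothetical.
PAPER = "http://example.org/paper/42"
DC = "http://purl.org/dc/elements/1.1/"

my_data = {
    (PAPER, DC + "title", "Reference Extraction in Practice"),
}
their_data = {
    (PAPER, DC + "creator", "A. N. Author"),
    (PAPER, DC + "date", "2005"),
}

# Merging is set union: no schema mapping or configuration step.
store = my_data | their_data


def describe(store, subject):
    """List every (property, value) pair known about a subject,
    regardless of which source asserted it."""
    return sorted((p, o) for (s, p, o) in store if s == subject)


description = describe(store, PAPER)
```

Because both sources use the same URI for the paper, their statements end up attached to the same node without any extra work.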

I can express my logical model as an RDF or OWL schema, and if I do I can reap many more benefits. Just as XML can be used without a schema language, so can RDF, although the benefits in each case are different. As mentioned above, with RDF alone I get the data flexibility I've already outlined; with RDF+RDFS or RDF+OWL I get inferencing and validation as well. And with inferencing comes the ability to use a declarative approach for data mapping.
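As a rough illustration of declarative mapping, here's a sketch of a single RDFS inference rule (subPropertyOf) in plain Python. A real reasoner does far more than this; the local "author" property, the paper URI and the infer_subproperties() function are assumptions of mine, not part of any schema or library.

```python
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
LOCAL_AUTHOR = "http://example.org/schema#author"  # hypothetical local property

# The mapping is itself just data: one schema triple.
schema = {(LOCAL_AUTHOR, RDFS_SUBPROP, DC_CREATOR)}
data = {("http://example.org/paper/42", LOCAL_AUTHOR, "A. N. Author")}


def infer_subproperties(triples):
    """Apply the rule (s p o) + (p subPropertyOf q) => (s q o)
    repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in inferred if p == RDFS_SUBPROP}
        new = {
            (s, q, o)
            for (s, p, o) in inferred
            for (p2, q) in sub
            if p == p2
        }
        if new <= inferred:
            return inferred
        inferred |= new


result = infer_subproperties(schema | data)
```

The mapping from the local vocabulary to Dublin Core lives in the schema triple, not in transformation code, so adding another mapping means adding another triple.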

With work like FRBR completed, the library community is in the right space to embrace RDF without having to mandate any particular database technology, which seems like another benefit. Sure, with an XML database one has similar benefits, but it's much harder to map any given model to XML, because XML is limited in how well it can express relationships: containment of elements, some cross-referencing, and that's it. Anything more complex requires implicit semantics.

RDF is essentially a relational model, although not in the classic RDBMS sense. This means it's much easier, IMO, to clearly express a model in RDF.

Much of the functionality that the library community is seeking, namely the ability to move data between formats and identify authorities, is already present in RDF. It's there in the ability to create local schemas and/or inferencing rules that massage data into the model required for a particular application; RDF allows late binding of your application schema to your data. The functionality is also present in the means to derive variations on existing vocabularies, and annotate existing metadata with new properties. Authorities like the Library of Congress can publish their own schemas.

But the message isn't getting across. I think the failing is that there's too much emphasis on the big vision of the semantic web, and the more immediate, more pragmatic benefits of RDF (with a sprinkling of OWL) are being lost. There are some tasty morsels at the bottom of that semantic web layer cake. The only way to show that is to come up with more convincing demonstrations, e.g. a recast of MODS as RDF, backed by some useful code.

Posted by ldodds at 02:03 PM | Feedback? | TrackBack

July 04, 2005

The fruit of our labours

The fruit of our labours,
originally uploaded by ldodds.

Testing out the Flickr photo blogging feature. Thought I'd show off the massive crop of raspberries and tayberries we picked this weekend, after I'd finished landscaping the new patio. Good to get out and do something non-geeky for a change.

The kids enjoyed the fruit picking, and I'm looking forward to the blackberries ripening. Shouldn't be too long now. Raspberry-based recipes gratefully received!

Posted by ldodds at 09:39 PM | Feedback? | TrackBack