February 26, 2003

Contamination Zone

I'm currently down with this flu that seems to be doing the rounds at the moment, which is a shame as there have been some interesting responses to my "When to use RDF?" question. Look at the comments and trackbacks for input from Dan Brickley, Shelley and Dorothea.

I'm just hoping it isn't going to get worse.

btw, this is my choice of recuperative reading.

Posted by ldodds at 09:14 AM | Feedback? | TrackBack

February 21, 2003

When to use RDF?

I came across RDF vs XML Illustrated via both Dave Beckett's Journal and the RDF IG IRC Scratchpad today. And it's brought forward a question I've been meaning to ask for a couple of weeks now.

Take a look at the bottom right of the diagram (e.g. the JPEG version), where it says:


Some projects are better suited for XML data; others scream out for RDF. RDF will not replace XML, each has its advantages in certain scenarios.

My question is simple: what makes a project scream out for RDF? What property of an application or its data makes it better suited to an RDF rather than an XML vocabulary?

I honestly don't have any feeling for the right answers.

I'm working with RDF tools now, but that's because FOAF is an RDF vocabulary. I'm just using the right tools for the job. If I were given the task of designing a new system, I wouldn't have any feel for why I might choose RDF over XML. I haven't had that "aha" moment yet.

We might loosely classify markup vocabularies into three types:


  • Pure XML vocabularies

  • RDF Friendly vocabularies, e.g. RSS 1.0

  • Pure RDF vocabularies

And we could then generalise the question to: which type of vocabulary is best suited to which applications? Are RDF Friendly vocabularies just a transition step?
And if RDF will never supplant XML, then surely we're going to have to invest a lot of time in RDF Data Mining?

I've been wondering whether the answer might be in Shelley's book somewhere, but haven't had time to get beyond the opening chapter.

Maybe I'm just being dumb, I dunno. But I'd love to know what others think.

Posted by ldodds at 04:49 PM | Feedback? | TrackBack

Link Droppers

I'm interested in building a list of "Link Dropper" sites and would welcome
suggestions if you have any.

What do I mean by a Link Dropper? Basically a site that will drop some data into your existing webpage using a scripting language, e.g. Javascript or PHP.

My canonical example is the Meerkat Javascript Source flavour, which drops RSS news feeds into your web page. This was the first example of this kind of integration that I'd seen at the time. If you know of an earlier example then let me know.

Other examples of Link Droppers include blogrolling.com (list of blogs) and All Consuming (lists of books).

I'm interested partly because I'd like to see what kind of "annotations" I can add to this blog, but also because I'm interested in seeing what people are doing with this kind of loose integration.

Posted by ldodds at 04:11 PM | Feedback? | TrackBack

User-Centred Linking

There's usually more than one way to get something on the net. There are dozens of online bookstores, search engines, news sites, document repositories, etc, etc. And we all have different preferences. Even for sites like Google and Amazon there is room for choice, e.g. different Google mirrors or regional Amazon sites.

Yet when we construct links we are always linking to a single one of those resources. In some cases that is because of an explicit recommendation: we know that one site is cheaper, has better information/context, etc. In others it's simply because that's the first place we looked in order to be able to link a reader to the resource we're talking about.

But why not give the user more choice, and let them decide the destination?

After all, hypertext originally offered the promise of multiple destinations for a single given link. Bob DuCharme has recently demonstrated how this can be achieved using XSLT and Javascript. However, this seems most applicable when the author knows of a variety of useful end-points for a link and can directly author them into the document. It's unlikely that anyone is going to take the trouble to deal with all possible end-points for a book link, however. It's also very time-sensitive: even if I do include links to as many book sites as I can, if an "Amazon-killer" site appeared (Google Books?) then my readers won't be able to get to it without some extra work on their part.

Of course in a wider sense the "right" way to do this is not to use URLs at all. Instead we introduce URNs (Uniform Resource Names) and simply name the resource. It's up to the browser to find a URN resolver that will let it locate the resource. This is viable but I don't see a great deal of activity here, although the DOI is catching on in publishing circles.

So I've been musing about other ways to achieve the same ends. Jon Udell has demonstrated one alternative in the Library Lookup project: provide bookmarklets that scrape the URL or the page for an identifier, then build a link to an alternate preferred site. In that instance, users are directed to an OPAC rather than Amazon.
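
As a rough illustration of the scrape-and-rebuild idea, here is a minimal sketch in Python (rather than bookmarklet Javascript, purely for readability). It assumes the identifier is a 10-character ISBN that appears as a path segment of the page URL. The catalogue URL template and the function names are hypothetical and are not taken from Library Lookup itself.

```python
import re
from typing import Optional

# Hypothetical catalogue search URL; a real OPAC will have its own syntax.
OPAC_TEMPLATE = "http://opac.example.org/search?isbn={isbn}"

# Assumption: the ISBN is a path segment of nine digits plus a check
# character (a digit or X), followed by "/", "?", "#" or the end of the URL.
ISBN10_PATTERN = re.compile(r"/(\d{9}[\dXx])(?=[/?#]|$)")


def extract_isbn(url: str) -> Optional[str]:
    """Return the first ISBN-10-shaped path segment in the URL, or None."""
    match = ISBN10_PATTERN.search(url)
    return match.group(1).upper() if match else None


def alternate_link(url: str, template: str = OPAC_TEMPLATE) -> Optional[str]:
    """Build a link to the preferred site for the book the URL identifies."""
    isbn = extract_isbn(url)
    return template.format(isbn=isbn) if isbn else None
```

Calling `alternate_link` with a bookstore URL that carries an ISBN returns the equivalent catalogue link; a URL with no recognisable identifier returns `None`, so the caller can fall back to the original link.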

Another option is to not link to the actual resource at all, but instead direct users to a resolver service that forwards them on according to a preference stored in a cookie. I've written about this previously in my "Make A Book Link" suggestion. You can think of it as a pseudo-URN in a way. I recently came across Link Baton which seems to achieve this in a fairly general way.

Unfortunately it's not a very de-centralised solution: the resolver is a point of failure. One way to get around this would be for browsers to support additional preferences that allow the user to specify a given resolver for a particular type of link. This would allow users to route around outages by changing their preferences either permanently or temporarily. But again this doesn't seem like a feasible solution, as it's unlikely that browser vendors would support it.

I've started to think that a viable solution might be a hybrid approach:

Firstly exploit the fact that different sites, while having their own link structures, generally include unique identifiers for content (e.g. ISSN, ISBN, etc) that allow us to parameterise the link creation. This is essentially what Udell is relying on.

Secondly exploit the fact that we allow a user to store preferences in their browser, in a reasonably extensible way using cookies. This is what Link Baton is relying on.

Thirdly we can dynamically generate links at the point when a page is delivered, using a templating engine or even just Javascript.

So all we need to do is provide in our blogs/sites a way to "Manage My Endpoints" which simply lists a menu of link types and some entry fields for the target URL. These URLs would include wild cards for where the identifier needs to be inserted. This data can be stored in a cookie. Then I can include some standard Javascript in the site templates, and add a suitable onClick handler to my links, that will use the resource identifier plus the user preference to build the real link. This seems nicely decentralised, and very RESTful.
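
The link-building step might look something like the sketch below. It is written in Python to keep it short, although on a real page it would be Javascript reading the cookie. Everything named here is an assumption for illustration: the `{id}` wild card, the link type names, the example.com URLs, and the idea that the cookie has already been parsed into a dictionary.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Assumption: "{id}" is the wild card marking where the identifier goes.
WILDCARD = "{id}"

# The site author's own choices, used when the reader has set no preference.
DEFAULT_ENDPOINTS: Dict[str, str] = {
    "isbn": "http://books.example.com/isbn/{id}",
    "issn": "http://journals.example.com/issn/{id}",
}


def build_link(link_type: str, identifier: str,
               preferences: Optional[Dict[str, str]] = None) -> str:
    """Expand the reader's preferred URL template for this type of link.

    `preferences` stands in for the templates stored in the reader's cookie.
    """
    template = (preferences or {}).get(link_type) or DEFAULT_ENDPOINTS.get(link_type)
    if template is None or WILDCARD not in template:
        raise ValueError(f"no usable endpoint template for link type {link_type!r}")
    # Percent-encode the identifier so it is safe in a path or query string.
    return template.replace(WILDCARD, quote(identifier, safe=""))
```

A reader with no stored preference gets the author's default end-point, which matches the case where the author of the site happens to share the reader's choice.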

Of course it places some work on the shoulders of the site owner, but much of the work seems like boilerplate code/pages which can be easily introduced into any blogging tool. And it does mean that you have to reset preferences for every site you read, but I'm sure there are ways around this, and besides in some cases you may not need to alter the preference: the author of that site may share your choice in end-point.

On a related note I've also been wondering about the usefulness of standardising query strings. The library community are already doing this with OpenURL which basically defines how to pack bibliographic metadata into a query string. This means you can construct a URL to an OpenURL compliant service just by knowing the base URL of that service.
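
To make the "base URL plus standard query string" idea concrete, here is a small Python sketch. The key names (`genre`, `issn`, `volume`, `spage`) are written in the style of OpenURL but should be treated as illustrative; check the specification for the exact keys a compliant service expects. The resolver address is a placeholder.

```python
from typing import Dict
from urllib.parse import urlencode


def make_openurl(base_url: str, metadata: Dict[str, str]) -> str:
    """Append bibliographic metadata to a resolver's base URL as a query string."""
    # Sort the keys so the same metadata always produces the same URL.
    query = urlencode(sorted(metadata.items()))
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


article = {"genre": "article", "issn": "1234-5678", "volume": "12", "spage": "1"}
link = make_openurl("http://resolver.example.org/openurl", article)
```

Pointing the same metadata at a different service only means changing the base URL, which is the whole attraction.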

I wonder whether this could be generalised to other types of linking? E.g. having search engines provide standard syntaxes for adding keywords, page sizes, sort orders, etc.

Posted by ldodds at 03:14 PM | Feedback? | TrackBack

Schematron and Architectural Forms

Rick Jelliffe has been working on Schematron once more, showing how to add support for variables and an implementation of the "abstract patterns" concept which was apparently the central idea behind Schematron's original design.

The idea is that there are basic patterns, e.g. head/body, table/row/cells that are common to many different markup vocabularies. The individual elements and attributes that form the pattern may vary between vocabularies but the basic structural relationships are the same. You can read about a number of these over on the XML Patterns site. The overall premise is very similar to that of Design Patterns in OO languages.

Ideally one would like to be able to write validation rules for the general pattern (e.g. a table must have rows) and then be able to apply them to any particular instance of that pattern (e.g. a CALS table, or an HTML table). This is what Jelliffe's new preprocessor does: it converts the abstract rules into concrete assertions tailored for a particular vocabulary.
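
The substitution step can be pictured with a few lines of Python. This is not Schematron syntax and not Jelliffe's preprocessor, just a sketch of the idea: abstract rules carry placeholders, and a per-vocabulary binding turns them into concrete context/test/message triples. The element bindings are deliberately simplified (real HTML and CALS tables have wrapper elements such as tbody and tgroup that are ignored here).

```python
from string import Template
from typing import Dict, List, Tuple

Rule = Tuple[str, str, str]  # (context, test, message)

# Abstract rules for the table pattern, with $-placeholders standing in
# for the vocabulary-specific element names.
ABSTRACT_TABLE_RULES: List[Rule] = [
    ("$table", "$row", "A $table element must contain at least one $row."),
    ("$row", "$cell", "A $row element must contain at least one $cell."),
]

# Illustrative bindings for two vocabularies (simplified, see above).
HTML_TABLE: Dict[str, str] = {"table": "table", "row": "tr", "cell": "td"}
CALS_TABLE: Dict[str, str] = {"table": "table", "row": "row", "cell": "entry"}


def instantiate(rules: List[Rule], bindings: Dict[str, str]) -> List[Rule]:
    """Turn abstract rules into concrete assertions for one vocabulary."""
    return [
        tuple(Template(part).substitute(bindings) for part in rule)
        for rule in rules
    ]
```

The same two abstract rules yield `tr`/`td` assertions for HTML and `row`/`entry` assertions for CALS, without the rules themselves being rewritten.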

I find this interesting, not only because of its overlap with Design Patterns in general (which I'm a big fan of) but also because of its relationship with Architectural Forms.

Architectural Forms allows the common structures of entirely different or variant schemas to be "transformed" into an architectural document that describes those common structures: for example, a general table rather than a specific CALS table. The architectural document can be validated, because we define a schema for the architecture, and it can also be processed by an architectural processor designed to do something useful with that document, e.g. to lay out the table for viewing.

Using Schematron we now have the validation component, but not the transformation.

I'm interested in how this may be combined with John Cowan's Architectural Forms: A New Generation, particularly as Jeni Tennison has an XSLT implementation. Looks like it ought to be possible to combine the two.

I wrote a summary of Architectural Forms for XML-DEV a while back which I've been threatening to write up for a while. Robin Cover also captured a number of follow-up postings. This paper "Architectures in an XML World" is also worth a read and includes pointers to another XSLT implementation of Architectural Forms.

Posted by ldodds at 02:28 PM | Feedback? | TrackBack

Semantic Blogging: A Day Out

Yesterday was one of those rare occasions when I get to unshackle myself from my desk here at Ingenta and get out into the Real World and meet Real People.

Yesterday's trip was to HP Labs in Bristol where I got the opportunity to meet with Steve Cayzer and Paul Shabajee about the Semantic Blogging and Bibliographies project they're involved with. It was a very stimulating meeting with lots of ideas being thrown around. It was one of those nice occasions where my personal research interests coincide perfectly with those of Ingenta's. Their paper has a lot of great references which I'll be digging into over the next few weeks. Agora in particular caught my eye.

I also got the chance to meet with Andy Seaborne who was hard at work reviewing Shelley's book and testing Jena 2.

The idea of Semantic Blogging, and particularly infrastructure to support it, has come up elsewhere recently: Danny Ayers has decided to have a crack at writing JemBlog (Jena Semantic Weblog Server). I've promised Danny that I'll submit my FOAF Java classes to the project once I've finished factoring them out from the core of FOAF-a-Matic Mark 2, which is undergoing a major package reorganisation at the moment. (Yes, I'm still working on it!)

Looks like I'll also have to do some work on the original FOAF-a-Matic soon as we have an open issue about the handling of first and last names in FOAF. I've committed to fixing the tool and providing migration scripts for early adopters once a resolution has been decided.

Posted by ldodds at 01:57 PM | Feedback? | TrackBack

Real World Annotations

Courtesy of Danny Ayers I came across this blog entry from Russell Beattie: Real World Annotations: Manywhere Places.

I've always liked this idea, and have repeatedly suggested it to friends and colleagues as a "cool thing" to watch out for. (Actually I think it first came up during one of those "Let's Start a dotcom" discussions that seemed to dominate pub debates a couple of years back. Funny how we don't have those anymore! :)

My take on it was this: I'm in a new town and want to find somewhere decent to eat. Finding somewhere isn't that hard. Finding somewhere worth visiting is another matter. How many times have you lurked outside a restaurant (usually with a group) collectively assessing the menu, decor, etc and trying to decide whether to actually go in?

So my thought was, what if I could post a message through a GPS aware mobile device which reviews that particular restaurant or pub (or any public location really)? Then other people can stand on the doorstep and read previous reviews. Or more likely a synthesized rating, after all we're hungry right? Don't want to stand around in the cold for too long.
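
A back-of-the-envelope sketch of the "synthesized rating" part, in Python. The 1 to 5 rating scale, the 25 metre radius and all the names are assumptions made for illustration; a plain average of nearby reviews is just the simplest possible way to synthesize a rating.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


@dataclass
class Review:
    latitude: float
    longitude: float
    rating: int  # assumed scale: 1 (avoid) to 5 (excellent)
    text: str


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def synthesized_rating(reviews: List[Review], lat: float, lon: float,
                       radius_m: float = 25.0) -> Optional[float]:
    """Average the ratings of reviews posted within radius_m of the reader."""
    nearby = [r.rating for r in reviews
              if distance_m(lat, lon, r.latitude, r.longitude) <= radius_m]
    return sum(nearby) / len(nearby) if nearby else None
```

The device only needs to send its current position; the service does the filtering and hands back a single number to read on the doorstep, or `None` if nobody has reviewed the place yet.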

There are all sorts of other potential uses. How about if you see an accident or a crime, and want to be a good citizen and be a witness? Being able to quickly blog your current position would not only confirm that you were where you said you were, but also help with reconstructions of the scene.

Russell also mentions the idea of virtual tours using SMS messaging. There's a company here in Bath called Textploitation already doing this; they call them "Texting Trails". Some former colleagues of mine (Jim and Alison) work there.

The reason I like the general premise here is that it allows us to begin overlaying the virtual environment on the physical one.

Posted by ldodds at 01:29 PM | Feedback? | TrackBack