Markup


2
Dec 10

RDF and JSON: A Clash of Model and Syntax

I had been meaning to write this post for some time. After reading Jeni Tennison’s post from earlier this week I had decided that I didn’t need to, but Jeni and Thomas Roessler suggested I publish my thoughts. So here they are. I’ve got more things to say about where efforts should be expended in meeting the challenges that face us over the next period of growth of the semantic web, but I’ll keep those for future posts.

Everyone agrees that a JSON serialization of RDF is a Good Thing. And I think nearly everyone would agree that a standard JSON serialization of RDF would be even better. The problem is that no-one can agree on what constitutes a good JSON serialization of RDF. As the RDF Next Working Group is about to convene to try to define a standard JSON serialization, now is a very good time to think about what it is we really want them to achieve.

RDF in JSON is RDF in XML all over again

There are very few people who like RDF/XML. Personally, while it’s not my favourite RDF syntax, I’m glad it’s there for when I want to convert XML formats into RDF. I’ve even built an entire RDF workflow that began with the ingestion of RDF/XML documents; we even validated them against a schema!

There are several reasons why people dislike RDF/XML.

Firstly, there is a mismatch in the data models: serialization involves turning a graph into a tree. There are many different ways to achieve that so, without applying some external constraints, the output can be highly variable. The problem is that those constraints can be highly specific, so are difficult to generalize. This results in a high degree of syntax variability of RDF/XML in the wild, and that undermines the ability to use RDF/XML with standard XML tools like XPath, XSLT, etc. They (unsurprisingly) operate only on the surface XML syntax, not the “real” data model.

Secondly, people dislike RDF/XML because of the mismatch in (loosely speaking) the native data types. XML is largely about elements and attributes whereas RDF has resources, properties, literals, blank nodes, lists, sequences, etc. And of course there are those ever-present URIs. This leads to additional syntax short-cuts and hijacking of features like XML Namespaces to simplify the output, whilst simultaneously causing even more variability in the possible serializations.

Thirdly, when it comes to parsing, RDF/XML just isn’t a very efficient serialization. It’s typically more verbose and can involve much more of a memory overhead when parsing than some of the other syntaxes.

Because of these issues, we end up with a syntax which, while flexible, requires some profiling to be really useful within an XML toolchain. Or you just ignore the fact that it’s XML at all and throw it straight into a triple store, which is what I suspect most people do. If you do that then an XML serialization of RDF is just a convenient way to generate RDF data from an XML toolchain.

Unfortunately, when we look at serializing RDF as JSON we discover that we have nearly all of the same issues. JSON is a tree, so we have the same variety of potential options for serializing any given graph. The data types are also still different: key-value pairs, hashes, lists, strings, dates (of a form!), etc. versus resources, properties, literals, etc. While there is potential to use more native datatypes, the practical issues of repeatable properties, blank nodes, etc. mean that a 1:1 mapping isn’t feasible. Lack of support for anything like XML Namespaces means that hiding URIs is also impossible without additional syntax conventions.
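As a rough illustration of that variability, here is a minimal Python sketch that turns the same three triples into two equally plausible JSON shapes. The example URIs, the key names `s`, `p` and `o`, and both function names are invented for the illustration; neither shape is taken from any particular proposal.

```python
import json

# A tiny graph as (subject, predicate, object) triples.
# The example.org URIs are made up for this sketch.
TRIPLES = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
]


def as_triple_list(triples):
    """Flat shape: one JSON object per triple."""
    return [{"s": s, "p": p, "o": o} for s, p, o in triples]


def as_subject_tree(triples):
    """Nested shape: subject -> predicate -> list of objects."""
    tree = {}
    for s, p, o in triples:
        tree.setdefault(s, {}).setdefault(p, []).append(o)
    return tree


def serialize(structure):
    """Render either shape as JSON text."""
    return json.dumps(structure, indent=2, sort_keys=True)
```

Both outputs are valid JSON and carry the same statements, but code written against one shape will not work against the other. Note too that neither shape says whether an object is a URI or a literal: that needs yet another convention.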

So, ultimately, both XML and JSON are poor fits for handling RDF. I think most people would agree that a specific format like Turtle is much easier to work with. It’s also better as a starting point for learning RDF because most of the syntax is re-used in SPARQL. That’s why standardising Turtle, ideally extended to support Named Graphs, needs to be the first item on the RDF Next Working Group’s agenda.

What do we actually want?

What purpose are we trying to achieve with a JSON serialization of RDF? I’d argue that there are several goals:

  1. Support for scripting languages: Provide better support for processing RDF in scripting languages
  2. Creating convergence: Build some convergence around the dizzying array of existing RDF in JSON proposals, to create consistency in how data is published
  3. Gaining traction: Make RDF more acceptable for web developers, with the hope of increasing engagement with RDF and Linked Data

I don’t think that anyone considers a JSON serialization of RDF as a better replacement for RDF/XML. I think everyone is looking to Turtle to provide that.

I also don’t think that anyone sees JSON as a particularly efficient serialization of RDF, particularly for bulk loading. It might be, but I think N-Triples (a subset of Turtle) fulfills that niche already: it’s easy to stream and to process in parallel.
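To show why a line-oriented format suits streaming: every N-Triples line is a complete statement, so a consumer can deal with one line at a time without holding the whole document in memory. The following is a minimal sketch that assumes simple, well-formed input. The regular expression and the `iter_triples` name are my own, and the pattern is a simplification that does not cover the full N-Triples grammar (escape sequences, for instance, are passed through untouched).

```python
import re

# Simplified pattern: subject (URI or blank node), predicate (URI),
# object (everything up to the terminating " .").
# This is an illustration, not the full N-Triples grammar.
TRIPLE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def iter_triples(lines):
    """Yield (subject, predicate, object) terms, one input line at a time."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError("not a simple N-Triples statement: %r" % line)
        yield match.groups()
```

Because `iter_triples` accepts any iterable of lines, it can be handed an open file object directly, and a large file could be split on line boundaries and each chunk given to a separate worker.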

Let’s look at each of those goals in turn.

Support for scripting languages

It’s unarguably much, much easier to process JSON in scripting languages like Javascript, Ruby and PHP than it is to process RDF/XML.

Parser support for JSON is ubiquitous as it’s the syntax du jour, just as XML was when the RDF specifications were being written. Typically JSON parsing is much more efficient. That’s especially true when we look at Javascript in the browser.

From that perspective RDF in JSON is an instant win as it will simplify consumption of Linked Data and the results of SPARQL CONSTRUCT and DESCRIBE queries. There are other issues with getting widespread support for RDF across different programming languages, e.g. proper validation of URIs, but fast parsing of the basic data structure would be a step in the right direction.

Creating Convergence

I think I’ve seen about a dozen or more different RDF in JSON proposals. There’s a list on the ESW wiki and some comparison notes on the Talis Platform wiki, but I don’t think either is complete. If I get the chance I’ll update them. The sheer variety confirms my earlier points about the mismatches between models: everyone has their own conception of what constitutes a useful JSON serialization.

Because there are fewer syntax options in JSON, the proposals run the full spectrum from capturing the full RDF model but making poor use of JSON syntax, through to making good use of JSON syntax but at the cost of either ignoring aspects of the RDF model or layering additional syntax conventions on top of JSON itself. As an aside, I find it interesting that so many people are happy with subsetting RDF to achieve this one goal.

The thing we should recognise is that none of the existing RDF in JSON formats are really useful without an accompanying API. I’ve used a number of different formats and no matter what serialization I’ve used I’ve ended up with helper code that simplifies some or all of the following:

  • Lookup of all properties of a single resource
  • Mapping between URIs and short names (e.g. CURIEs or locally defined keys) for properties
  • Mapping between conventions for encoding particular datatypes (or language annotations) and native objects in the scripting language
  • Cross-referencing between subjects and objects; and vice-versa
  • Looking up all values of a property or a single value (often the first)

In addition, if I’m consuming the results of multiple requests then I may also end up with a custom data structure and code for merging together different descriptions, even if it’s just an array of parsed JSON documents and code to perform the above lookups across that collection.
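To make the list above concrete, here is a minimal sketch of the sort of helper code I mean. It assumes parsed documents shaped as subject → predicate → list of value objects with `type` and `value` keys, loosely modelled on RDF/JSON. The `Graph` class, its method names and the prefix table are hypothetical, and mapping datatypes to native objects is left out to keep it short.

```python
class Graph:
    """Hypothetical helper over subject -> predicate -> [value objects]."""

    def __init__(self, prefixes=None):
        self.data = {}
        self.prefixes = dict(prefixes or {})  # e.g. {"foaf": "http://..."}

    def merge(self, document):
        """Fold another parsed document into this graph, skipping duplicates."""
        for subject, properties in document.items():
            target = self.data.setdefault(subject, {})
            for predicate, values in properties.items():
                existing = target.setdefault(predicate, [])
                for value in values:
                    if value not in existing:
                        existing.append(value)

    def expand(self, name):
        """Turn a short name such as 'foaf:name' into a full URI."""
        prefix, sep, local = name.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return name  # already a full URI, or an unknown prefix

    def properties(self, subject):
        """All properties of a single resource."""
        return self.data.get(subject, {})

    def values(self, subject, predicate):
        """All values of a property, as plain strings."""
        found = self.properties(subject).get(self.expand(predicate), [])
        return [v["value"] for v in found]

    def first(self, subject, predicate, default=None):
        """A single value of a property (the first), or a default."""
        found = self.values(subject, predicate)
        return found[0] if found else default

    def subjects_with(self, predicate, value):
        """Reverse lookup: subjects that have `value` for `predicate`."""
        uri = self.expand(predicate)
        return [s for s, props in self.data.items()
                if any(v["value"] == value for v in props.get(uri, []))]
```

With something like this in place, calling `merge` once per response and then `first(uri, "foaf:name")` covers most day-to-day lookups, whatever the underlying serialization looks like.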

So, while we can debate the relative aesthetics of different approaches, I think that’s focusing attention on the wrong area. What we should really be looking at is an API for manipulating RDF. One that will work in Javascript, Ruby or PHP. While I acknowledge the lingering horror of the DOM, I think the design space here is much simpler. Maybe I’m just an optimist!

If we take this approach then what we need is a JSON serialization of RDF that covers as much of the RDF model as possible and, ideally, is already as well supported as possible. From what I’ve seen the RDF/JSON serialization is actually closest to that ideal. It’s supported in a number of different parsing and serialising libraries already and only needs to be extended to support blank nodes and Named Graphs, which is trivial to do. While it’s not the prettiest serialization, given a vote, I’d look at standardising that and moving on to focus on the more important area: the API.

Gaining Traction

Which brings me to the last use case. Can we create a JSON serialization of RDF that will help Linked Data and RDF get some traction in the wider web development community?

The answer is no.

If you believe that the issues with gaining adoption are purely related to syntax then you’re not listening to the web developer community closely enough. While a friendlier syntax would undoubtedly help, an API would be even better. The majority of web developers these days are very happy indeed to work with tools like jQuery to handle client-side scripting. A standard jQuery extension for RDF would help adoption much more than spending months debating the best way to profile the RDF model into a clean JSON serialization.

But the real issue is that we’re asking web developers to learn not just new syntax but also an entirely new way to access data: we’re asking them to use SPARQL rather than simple RESTful APIs.

While I think SPARQL is an important and powerful tool in the RDF toolchain I don’t think it should be seen as the standard way of querying RDF over the web. There’s a big data access gulf between de-referencing URIs and performing SPARQL queries. We need something to fill that space, and I think the Linked Data API fills that gap very nicely. We should be promoting a range of access options.

I have similar doubts about SPARQL Update as the standard way of updating triple stores over the web, but that’s the topic of another post.

Summing Up

As the RDF Next Working Group gets underway I think it needs to carefully prioritise its activities to ensure that we get the most out of this next phase of development and effort around the Semantic Web specifications. It’s particularly crucial right now as we’re beginning to see the ideas being adopted and embraced more widely. As I’ve tried to highlight here, I think there’s a lot of value to be had in having a standard JSON serialization of RDF. But I don’t think that there’s much merit in attempting to create a clean, simple JSON serialization that will meet everyone’s needs.

Standardising Turtle and an API for manipulating RDF data has more value in my view. RDF/JSON as a well implemented specification meets the core needs of the semantic web developer; a simple scripting API meets the needs of everyone else.


29
Jan 07

XForms on the Intranet

Elliotte Harold has published a nice introduction to XForms in Firefox on IBM developerWorks. In the conclusion he notes that:
“Client-side XForms processing won’t be possible for public-facing sites until XForms is more widely deployed in browsers. However, that doesn’t mean you can’t deploy it on your intranet today. If you’re already using Firefox (and if you aren’t, you should be), all that’s required is a simple plug-in. After that’s installed, you can take full advantage of XForms’ power, speed, and flexibility.”
I’d agree with this whole-heartedly. I wrote and deployed a little XForms application just before Christmas and it was a very painless exercise indeed.
Over the past few years we’ve rolled out a number of RESTful XML-based APIs internally. We’ve also toyed with different ways to build tools to manage systems using these APIs, including using Java Swing desktop tools, simple HTML forms, etc. Mainly we’ve been trying for a while to find a sweet spot between ease of implementation and a reasonably good user experience.
Recently I’d been toying with a Javascript library to one of our REST interfaces based around the Prototype library. It was fun if occasionally frustrating banging my head against Javascript. However it wasn’t finished and I needed to quickly roll out some forms for managing some key data. So I took another look at XForms. I’d researched it a few years ago and had rejected it because of the lack of browser support and the different ways that the plugins required you to deploy the forms.
As almost everyone internal has gravitated towards Firefox, cross-browser support isn’t a strong requirement, so I went ahead and built the system using XForms. It was a very satisfying experience: the syntax is easy to get to grips with, and it’s possible to create some fairly slick AJAX style forms with a minimum of fuss. And more fun than messing with Javascript.
So for us at least XForms does seem to hit a sweet spot for rapid tools development, particularly as we already have a lot of existing XML interfaces. In fact the exercise highlighted a few flaws in our interfaces (e.g. delivering correct mime types, under use of “hypermedia” to link between resources in some areas) so was a good learning exercise in its own right.
It would be nice to see some slicker custom controls for different data types though. I think AJAX and client-side scripting still corner the market on slick dynamic UIs, and will do for some time. But for sheer ease of use, and getting things done, XForms gets the thumbs up from me.


12
Dec 05

OpenDocument and XMP

This is the second part of my look at XMP. This time I’m focusing on the potential for using XMP as the metadata format for OpenDocument (ODF).
This is part of a broader discussion to help define the future direction for the ODF metadata format; one proposal on the table is to use RDF, via a constrained RDF/XML syntax. There’s a wiki available for discussing this issue, particularly how to map existing metadata to RDF.
At least some of the impetus for exploring richer metadata support has come from the bibliographic sub-project, which aims to build support for bibliography management into OpenOffice 3.0.
RDF is a good fit for the flexible storage and formatting requirements that arise from bibliographic metadata. As XMP is an RDF profile it’s worthy of consideration, and in fact this is the proposal behind Alan Lilich’s posting to the OpenDocument TC member list. Lilich’s discussion document frames the rest of this posting.



8
Dec 05

Looking at XMP

I’ve been taking a look at XMP as I’ve been considering different ways to “enrich” content. Embedding metadata is one option and XMP aims to fulfill the role of a metadata format suitable for embedding in a diverse range of media formats.
It’s also under discussion as a way to embed metadata in the OpenDocument format. The alternatives available in that quarter have been under discussion in various circles for some time. Bruce D’Arcus points to the latest entry to that discussion in his recent “OpenDocument and XMP” posting.
I thought I’d write up some notes on XMP in general and contribute some thoughts towards that debate. This is the first of two postings on this topic.



5
Nov 05

Florescu: Re-evaluating the Big Picture

Ken North just posted this email to XML-DEV drawing attention to a presentation by Daniela Florescu titled Declarative XML Processing with XQuery — Re-evaluating the Big Picture (Warning: PDF). It makes for interesting reading.
In the presentation, Florescu argues that XML is in a growth crisis and that there’s a need for more architectural work to tie together components of the XML landscape ranging from XQuery and XSLT through to RDF and OWL. Florescu believes that XML is about more than syntax and will in fact become the key model for information, not just bits on a wire. In short Florescu believes that XML has yet to achieve its full potential and to do that some further work needs to be done.
The presentation is worth reading in its entirety. The majority of the presentation does focus on XQuery, in particular the fact that it’s not really a query language: it’s a programming language and folk are already using it in this context. But there’s much more to it. Semantic web folk will find much that will have them nodding in agreement.
Florescu suggests a number of concrete areas that require work. Amongst these are:

  • Make XML a graph not a tree, by making links a first class part of the model
  • Integrate the XML data model with RDF
  • Extend programming capabilities of XQuery, e.g. to include assertions, error-handling, metadata extraction functions and continuous queries. This latter area is interesting as it would allow an XQuery to run continuously, acting on a stream of XML documents as they arrive
  • Integrate XQuery with OWL and RDF. E.g. to allow searching an XML document by semantic classification of nodes, rather than their names.
  • Make browsers XQuery aware, and develop a simple HTTP protocol for invoking XQuery on a remote repository. (I’ve been working with the SPARQL protocol recently and it’s occurred to me several times that an equivalent for XQuery is an obvious area for further work)

All in all I find this to be a very thought-provoking presentation; there are a lot of interesting ideas in there. For the Semantic Web crowd many of these will be old news: being able to query/manipulate data based on semantics is the core of RDF; linking as a first class model element is something we rely on constantly. But there are also some new angles to consider. For example there’s a lot of work happening to tie programming languages in with XML, and XML vocabularies such as XQuery are becoming more like scripting languages: what’s the equivalent in semantic web circles? Could an ontology aware version of XQuery provide a useful data manipulation environment?
I expect the XML-DEV thread to grow pretty quickly. Will be interested to see if this gets picked up and discussed by other communities also.


15
Sep 05

Goodbye XML-Deviant

I see Micah’s latest XML-Deviant is up on XML.com this week, and it’s also to be the last in the series. It’s a shame to see it go as I’ve enjoyed reading the column over the last few years. I also thoroughly enjoyed contributing to the column during my own period of XML-Deviancy. But all things come to an end; I’m looking forward to seeing what replaces the column in future.
Tip of the hat to the other XML-Deviants: Edd, Kendall and Micah for all of their efforts along the way; especially Edd for originally conceiving of the column.


1
Jun 05

XTech Day Three

Belatedly (I only got back from Amsterdam last Monday), here are some notes from XTech Day 3.
On the Friday morning I initially attended two talks about RDF frameworks, firstly Dave Beckett’s Bootstrapping RDF applications with Redland and then David Wood’s introduction to Kowari: A Platform for Semantic Web Storage and Analysis. I’ve not really used either of these toolkits yet, but at work we’re looking at trying out Kowari as one of the candidate triple stores for holding our massive dataset. John Barstow’s work on the port of Redland to Windows makes it more likely that I’ll be trying out Dave’s toolkit for some personal hacking projects too.



3
Feb 05

Where Should XML Go?

Liam Quin has been thinking about XML 2.0 and has posted an article to Advogato titled “Where Should XML Go?”.
Quin is obviously trying to reach a wider community than just the hardcore XML users, noting in his diary: “Where would you go (or post) to ask people why they’re not using XML? There are lots of good reasons not to use XML, and lots of good reasons to use it, so I’m particularly interested in people who would like to go with XML but who feel they can’t.”
Advogato seems like a good starting place to me. Of course there’s an XML-DEV thread starting on the topic already, so the usual suspects will be weighing in very shortly.
I’m not sure what my most requested improvement to the core specification(s) would be. When asked about this before I’ve often responded that I’d be happy to see the work on packaging resume. Especially as there’s work continuing in the area that could be standardised, such as the Open Office format and Rick Jelliffe’s DZIP2.
I loved Jelliffe’s From Wiki to XML via SGML article, which demonstrates how to use SGML SHORTREFs to parse Wiki markup as SGML, and that’s made me wonder whether SHORTREF might be an SGML feature worth unearthing. Not likely to be a popular suggestion though! And of course one can simply use an SGML parser when one needs the extra power.
But the syntax could certainly be friendlier, and I wonder whether that might address some users’ dislike of XML, the format; they can still use XML tools to process their config files, Wiki markup, CSV documents, etc.


3
Aug 04

XML Hacks

I see by the fact that my complimentary copy arrived today that XML Hacks has hit the stores. This makes me incredibly pleased as my two contributed hacks mean that this is the most I’ve ever had in print, and that’s like, proper writing, not this new-fangled web malarkey.
My two hacks are #64 (“Identify Yourself with FOAF”) and #93 (“Use Cocoon to Create a Well-Formed View of a Web Page, Then Scrape It For Data”). Both are RDF flavoured. The first is basically an edited version of my XML.com article, “Introduction to FOAF”, while the second is an original piece that provides a lightning introduction to Cocoon then shows how to create a simple web service that will scrape RDF metadata from a web page using a combination of HTML Tidy, XSLT and some rummaging around in the head element.
Kudos to Michael Fitzgerald for pulling together a book that contains such a wide range of useful hacks, and having the patience to do it whilst working with a number of very, very busy people!


13
Apr 04

XML Processing Model

The W3C have posted a Note discussing requirements for an XML Processing Model. This is good news; there’s been a lot of desire to see this work progressing for some time now. I wonder whether XML Pipeline will serve as a possible basis for a specification? It seems to be a good match to the requirements.
It’ll be interesting to see how this ties in with the DSDL work, which is essentially defining a validation oriented processing framework. Frustratingly there have been few updates to the DSDL web site to show how far they’ve progressed. This xmlhack report seems to have the most recent summary of activities. Please update your site, DSDL people!