<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lost Boy &#187; Semantic Web</title>
	<atom:link href="http://www.ldodds.com/blog/category/semantic-web/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ldodds.com/blog</link>
	<description>A journal of no fixed aims or direction, by Leigh Dodds</description>
	<lastBuildDate>Sat, 22 Jan 2011 20:23:23 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>RDF Data Access Options, or Isn&#8217;t HTTP already the API?</title>
		<link>http://www.ldodds.com/blog/2010/12/rdf-data-access-options-or-isnt-http-already-the-api/</link>
		<comments>http://www.ldodds.com/blog/2010/12/rdf-data-access-options-or-isnt-http-already-the-api/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 21:13:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[linkeddata]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=507</guid>
		<description><![CDATA[This is a follow-up to my blog post from yesterday about RDF and JSON. Ed Summers tweeted to say:

&#8230;your blog post suggests that an API for linked data is needed; isn&#8217;t http already the API?

I couldn&#8217;t answer that in 140 characters, so am writing this post to elaborate a little on the last section of [...]]]></description>
			<content:encoded><![CDATA[<p>This is a follow-up to my blog post from yesterday about <a href="http://www.ldodds.com/blog/2010/12/rdf-and-json-a-clash-of-model-and-syntax/">RDF and JSON</a>. <a href="http://twitter.com/#!/edsu/status/10763933761667072">Ed Summers</a> tweeted to say:</p>
<blockquote><p>
<cite>&#8230;your blog post suggests that an API for linked data is needed; isn&#8217;t http already the API?</cite>
</p></blockquote>
<p>I couldn&#8217;t answer that in 140 characters, so am writing this post to elaborate a little on the last section of my post in which I suggested that &#8220;there&#8217;s a big data access gulf between de-referencing URIs and performing SPARQL queries&#8221;. What exactly do I mean there? And why do I think that the <a href="http://code.google.com/p/linked-data-api/wiki/Specification">Linked Data API</a> helps?</p>
<h2>Is Your Website Your API?</h2>
<p>Most Linked Data presentations that discuss the publishing of data to the web typically run through the Linked Data principles. At point three we reach the recommendation that: </p>
<blockquote><p>
<cite><br />
&#8220;When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)&#8221;<br />
</cite>
</p></blockquote>
<p>This has encouraged us to create sites that consist of a mesh of interconnected resources described using RDF. We can &#8220;follow our nose&#8221; through those relationships to find more information. </p>
<p>This gives us two fundamental data access options:</p>
<ul>
<li>Resource Lookups: by dereferencing URIs we can obtain a (typically) complete description of a resource</li>
<li>Graph Traversal: following relationships and recursively de-referencing URIs to retrieve descriptions of related entities; this is (typically, but not necessarily) reconstituted into a graph on the client</li>
</ul>
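<p>A minimal sketch of these two options, assuming a hypothetical in-memory <code>WEB</code> dictionary that stands in for HTTP lookups; the URIs, the <code>lookup</code> and <code>traverse</code> names and the depth limit are all illustrative rather than part of any specification.</p>

```python
from collections import deque

# Hypothetical stand-in for the web: each URI maps to the triples that a
# lookup of that URI would return. A real client would issue HTTP GETs.
WEB = {
    "http://example.org/alice": [
        ("http://example.org/alice", "foaf:name", "Alice"),
        ("http://example.org/alice", "foaf:knows", "http://example.org/bob"),
    ],
    "http://example.org/bob": [
        ("http://example.org/bob", "foaf:name", "Bob"),
        ("http://example.org/bob", "foaf:knows", "http://example.org/alice"),
    ],
}


def lookup(uri):
    """Resource Lookup: return the description of a single resource."""
    return WEB.get(uri, [])


def traverse(start, max_depth=2):
    """Graph Traversal: follow object URIs breadth-first and merge the
    descriptions into one client-side graph (a set of triples)."""
    graph, seen, queue = set(), {start}, deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        for triple in lookup(uri):
            graph.add(triple)
            obj = triple[2]
            if obj.startswith("http://") and obj not in seen and depth < max_depth:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return graph
```

<p>The <code>seen</code> set is what stops the traversal from looping when two resources link to each other, which is the common case in a mesh of interconnected descriptions.</p>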
<p>However, if we take the &#8220;Your Website Is Your API&#8221; idea seriously, then we should be able to reflect all of the different points of interaction of that website as RDF, not just resource lookups (viewing a page) and graph traversal (clicking around). </p>
<p>As Tom Coates noted back in 2006 in &#8220;<a href="http://www.plasticbag.org/archives/2006/02/my_future_of_web_apps_slides/">Native to a Web of Data</a>&#8220;, good data-driven websites will have &#8220;list views and batch manipulation interfaces&#8221;. So we should be able to provide RDF views of those areas of functionality too. This gives us another kind of access option:</p>
<ul>
<li>Listing: ability to retrieve lists/collections of things; navigation through those lists, e.g. by paging; and list manipulation, e.g. by filtering or sorting.</li>
</ul>
<p>It&#8217;s possible to handle much of that by building some additional structure into your dataset, e.g. creating RDF Lists (or similar) of useful collections of resources. But if you bake this into your data then those views will potentially need to be re-evaluated every time the data changes. And even then there is still no way for a user to manipulate the views, e.g. to page or sort them.</p>
<p>So to achieve the most flexibility you need a more dynamic way of extracting and ordering portions of the underlying data. This is the role that SPARQL often fulfills: it provides some really useful ways to manipulate RDF graphs, and you can achieve far more with it than just extracting and manipulating lists of things.</p>
<p>SPARQL also supports another kind of access option that would otherwise require traversing some or all of the remote graph. </p>
<p>One example would be: &#8220;does this graph contain any <code>foaf:name</code> predicates?&#8221; or &#8220;does anything in this graph relate to <code>http://www.example.org/bob</code>?&#8221;. These kinds of existence checks, as well as more complex graph pattern matching, also tend to be the domain of SPARQL queries. It&#8217;s more expressive and potentially more efficient to just use a query language for that kind of question. So this gives us a fourth option:</p>
<ul>
<li>Existence Checks: ability to determine whether a particular structure is present in a graph</li>
</ul>
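<p>The two example questions above can be sketched against a small in-memory graph, alongside the SPARQL ASK queries that would pose them to an endpoint. This is only an illustration: the sample triples and the <code>ask</code> and <code>relates_to</code> helpers are assumptions made for the example.</p>

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
BOB = "http://www.example.org/bob"

# Illustrative data only.
GRAPH = {
    ("http://www.example.org/alice", FOAF_NAME, "Alice"),
    ("http://www.example.org/alice", FOAF_KNOWS, BOB),
}


def ask(graph, s=None, p=None, o=None):
    """Return True if any triple matches the pattern; None is a wildcard."""
    return any(
        (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to)
        for ts, tp, to in graph
    )


def relates_to(graph, uri):
    """True if the URI appears as the subject or the object of any triple."""
    return ask(graph, s=uri) or ask(graph, o=uri)


# The same two questions written as SPARQL ASK queries.
ASK_HAS_NAMES = "ASK { ?s <%s> ?o }" % FOAF_NAME
ASK_RELATES_TO_BOB = "ASK { { <%s> ?p ?o } UNION { ?s ?p <%s> } }" % (BOB, BOB)
```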
<p>Interestingly, though, existence checks are not often the kinds of questions that you can &#8220;ask&#8221; of a website. There&#8217;s no real correlation with typical web browsing features, although searching comes close for simple existence check queries.</p>
<h2>Where the Linked Data API fits in</h2>
<p>So there are at least four kinds of data access option. I doubt whether it&#8217;s exhaustive, but it&#8217;s a useful starting point for discussion. </p>
<p>SPARQL can handle all of these options and more. The graph pattern matching features, and the provision of four query types, let us perform any of these kinds of interaction. For example, a common way of implementing Resource Lookups over a triple store is to use a DESCRIBE or a CONSTRUCT query.</p>
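<p>A minimal sketch of a Resource Lookup expressed as a DESCRIBE query, assuming a hypothetical endpoint at <code>http://example.org/sparql</code>. The <code>query</code> request parameter is the one defined by the SPARQL protocol; the <code>describe_url</code> helper and the endpoint address are illustrative.</p>

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint


def describe_url(resource_uri, endpoint=ENDPOINT):
    """Build the GET URL that asks the endpoint to DESCRIBE one resource."""
    if ">" in resource_uri or " " in resource_uri:
        # Refuse values that would break out of the <...> URI delimiters.
        raise ValueError("not a usable URI: %r" % resource_uri)
    query = "DESCRIBE <%s>" % resource_uri
    return endpoint + "?" + urlencode({"query": query})
```

<p>Note how the resource being asked for ends up inside an encoded query string rather than in the URL path, which is the &#8220;stepping around HTTP&#8221; concern discussed next.</p>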
<p>However the problem, as I see it, is that when we resort to writing SPARQL graph patterns in order to request, say, a list of people, then we&#8217;ve kind of stepped around HTTP. We&#8217;re no longer specifying and refining our query by interacting with web resources via parameterised URLs, we&#8217;re tunnelling the request for what we want in a SPARQL query sent to an endpoint.</p>
<p>From a hypermedia perspective it would be much better if there were a way to be able to handle the &#8220;Listing&#8221; access option using something that was better integrated with HTTP. It also happens that this might actually be easier for the majority of web developers to get to grips with, because they no longer have to learn SPARQL. </p>
<p>This is what I meant by a &#8220;RESTful API&#8221; in yesterday&#8217;s blog post. In my mind, &#8220;Listing things&#8221; sits in between Resource Lookups and Existence Checks or complex pattern matching in terms of access options. </p>
<p>It&#8217;s precisely this role that the Linked Data API is intended to fulfil. It defines a way to dynamically generate lists of resources from an underlying RDF graph, along with ways to manipulate those collections of resources, e.g. by sorting and filtering. It&#8217;s possible to use it to define a number of useful list views for an RDF dataset that nicely complements the relationships present in the data. It&#8217;s actually defined in terms of executing SPARQL queries over that graph, but this isn&#8217;t obvious to the end user. </p>
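<p>To make the idea concrete, here is a rough sketch of turning a parameterised list URL into a SPARQL query behind the scenes. The parameter names (<code>page</code>, <code>sort</code>), the <code>SHORT_NAMES</code> mapping and the generated query shape are invented for illustration and are not taken from the Linked Data API specification.</p>

```python
from urllib.parse import urlparse, parse_qsl

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical configuration: short parameter names mapped to property URIs.
SHORT_NAMES = {"name": FOAF + "name", "nick": FOAF + "nick"}


def escape(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def list_query(url, rdf_type, page_size=10):
    """Translate e.g. /people?nick=bob&sort=name&page=2 into a SELECT query."""
    params = dict(parse_qsl(urlparse(url).query))
    page = int(params.pop("page", "1"))
    sort = params.pop("sort", None)
    lines = ["SELECT ?item WHERE {", "  ?item a <%s> ." % rdf_type]
    # Every remaining parameter is treated as a property=value filter.
    for name, value in sorted(params.items()):
        lines.append('  ?item <%s> "%s" .' % (SHORT_NAMES[name], escape(value)))
    if sort:
        lines.append("  ?item <%s> ?sortKey ." % SHORT_NAMES[sort])
    lines.append("}")
    if sort:
        lines.append("ORDER BY ?sortKey")
    lines.append("LIMIT %d OFFSET %d" % (page_size, (page - 1) * page_size))
    return "\n".join(lines)
```

<p>The client only ever sees the URL and its parameters; the SPARQL stays an implementation detail of the server.</p>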
<p>These features are accompanied by simple XML and JSON formats that supplement the RDF serializations it supports. This is really intended to encourage adoption by making it easier to process the data using non-RDF tools.</p>
<h2>So, Isn&#8217;t HTTP the API?</h2>
<p>Which brings me to the answer to Ed&#8217;s question: isn&#8217;t HTTP the API we need? The answer is yes, but we need more than just HTTP: we also need well-defined media types. </p>
<p>Mike Amundsen has created a nice categorisation of media types and a description of different types of factors they contain: <a href="http://amundsen.com/hypermedia/hfactor/">H Factor</a>.</p>
<p>Section 5.2.1.2 of Fielding&#8217;s dissertation explains that:</p>
<blockquote><p>
<cite><br />
Control data defines the purpose of a message between components, such as the action being requested or the meaning of a response. It is also used to parameterize requests and override the default behavior of some connecting elements.<br />
</cite>
</p></blockquote>
<p>As it stands today neither RDF nor the Linked Data API specification ticks all of the H Factor boxes. What we&#8217;ve really done so far is define how to parameterise some requests, e.g. to filter or sort based on a property value, but we&#8217;ve not yet defined that in a standard media type; the API configuration captures a lot of the requisite information but isn&#8217;t quite there.</p>
<p>That&#8217;s a long rambly blog post for a Friday night! Hopefully I&#8217;ve clarified what I was referring to yesterday. I absolutely don&#8217;t want to see anyone define an API for RDF that steps around HTTP. We need something that is much more closely aligned with the web. And hopefully I&#8217;ve also answered Ed&#8217;s question.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/12/rdf-data-access-options-or-isnt-http-already-the-api/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>RDF and JSON: A Clash of Model and Syntax</title>
		<link>http://www.ldodds.com/blog/2010/12/rdf-and-json-a-clash-of-model-and-syntax/</link>
		<comments>http://www.ldodds.com/blog/2010/12/rdf-and-json-a-clash-of-model-and-syntax/#comments</comments>
		<pubDate>Thu, 02 Dec 2010 20:35:53 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Markup]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[json]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[rest]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=497</guid>
		<description><![CDATA[I had been meaning to write this post for some time. After reading Jeni Tennison&#8217;s post from earlier this week I had decided that I didn&#8217;t need to, but Jeni and Thomas Roessler suggested I publish my thoughts. So here they are. I&#8217;ve got more things to say about where efforts should be expended in [...]]]></description>
			<content:encoded><![CDATA[<p><i>I had been meaning to write this post for some time. After reading <a href="http://www.jenitennison.com/blog/node/149">Jeni Tennison&#8217;s post from earlier this week</a> I had decided that I didn&#8217;t need to, but Jeni and Thomas Roessler suggested I publish my thoughts. So here they are. I&#8217;ve got more things to say about where efforts should be expended in meeting the <a href="http://blogs.talis.com/nodalities/2010/12/challenges-and-opportunities-for-linked-data.php">challenges</a> that face us over the next period of growth of the semantic web, but I&#8217;ll keep those for future posts</i>.</p>
<p>Everyone agrees that a JSON serialization of RDF is a Good Thing. And I think nearly everyone would agree that a <i>standard</i> JSON serialization of RDF would be even better. The problem is no-one can agree on what constitutes a good JSON serialization of RDF. As <a href="http://www.w3.org/2010/09/rdf-wg-charter.html">the RDF Next Working Group</a> is about to convene to try and define a standard JSON serialization, now is a very good time to think about what it is we really want them to achieve. </p>
<h2>RDF in JSON, is RDF in XML all over again</h2>
<p>There are very few people who like RDF/XML. Personally, while it&#8217;s not my favourite RDF syntax, I&#8217;m glad it&#8217;s there for when I want to convert XML formats into RDF. I&#8217;ve even built an entire RDF workflow that began with the ingestion of RDF/XML documents; we even validated them against a schema!</p>
<p>There are several reasons why people dislike RDF/XML.</p>
<p>Firstly, there is a mis-match in the data models: serialization involves turning a graph into a tree. There are many different ways to achieve that so, without applying some external constraints, the output can be highly variable. The problem is that those constraints can be highly specific, so are difficult to generalize. This results in a high degree of syntax variability of RDF/XML in the wild, and that undermines the ability to use RDF/XML with standard XML tools like XPath, XSLT, etc. They (unsurprisingly) operate only on the surface XML syntax not the &#8220;real&#8221; data model.</p>
<p>Secondly, people dislike RDF/XML because of the mis-match in (loosely speaking) the native data types. XML is largely about elements and attributes whereas RDF has resources, properties, literals, blank nodes, lists, sequences, etc. And of course there are those ever present URIs. This leads to additional syntax short-cuts and hijacking of features like XML Namespaces to simplify the output, whilst simultaneously causing even more variability in the possible serializations.</p>
<p>Thirdly, when it comes to parsing, RDF/XML just isn&#8217;t a very efficient serialization. It&#8217;s typically more verbose and can involve much more of a memory overhead when parsing than some of the other syntaxes.</p>
<p>Because of these issues, we end up with a syntax which, while flexible, requires some profiling to be really useful within an XML toolchain. Or you just ignore the fact that it&#8217;s XML at all and throw it straight into a triple store, which is what I suspect most people do. If you do that then an XML serialization of RDF is just a convenient way to <i>generate</i> RDF data from an XML toolchain.</p>
<p>Unfortunately when we look at serializing RDF as JSON we discover that we have nearly all of the same issues. JSON is a tree; so we have the same variety of potential options for serializing any given graph. The data types are also still different: key-value pairs, hashes, lists, strings, dates (of a form!), etc. versus resources, properties, literals, etc. While there is potential to use more native datatypes, the practical issues of repeatable properties, blank nodes, etc. mean that a 1:1 mapping isn&#8217;t feasible. Lack of support for anything like XML Namespaces means that hiding URIs is also impossible without additional syntax conventions.</p>
<p>So, ultimately, both XML and JSON are poor fits for handling RDF. I think most people would agree that a specific format like Turtle is much easier to work with. It&#8217;s also better as a starting point for learning RDF because most of the syntax is re-used in SPARQL. That&#8217;s why standardising Turtle, ideally extended to support Named Graphs, needs to be the first item on the RDF Next Working Group&#8217;s agenda.</p>
<h2>What do we actually want?</h2>
<p>What purpose are we trying to achieve with a JSON serialization of RDF? I&#8217;d argue that there are several goals:</p>
<ol>
<li>Support for scripting languages: Provide better support for processing RDF in scripting languages</li>
<li>Creating convergence: Build some convergence around the dizzying array of existing RDF in JSON proposals, to create consistency in how data is published</li>
<li>Gaining traction: Make RDF more acceptable for web developers, with the hope of increasing engagement with RDF and Linked Data</li>
</ol>
<p>I don&#8217;t think that anyone considers a JSON serialization of RDF as a better replacement for RDF/XML. I think everyone is looking to Turtle to provide that.</p>
<p>I also don&#8217;t think that anyone sees JSON as a particularly efficient serialization of RDF, particularly for bulk loading. It <i>might</i> be, but I think N-Triples (a subset of Turtle) fulfills that niche already: it&#8217;s easy to stream and to process in parallel.</p>
<p>Let&#8217;s look at each of those goals in turn.</p>
<h3>Support for scripting languages</h3>
<p>Unarguably it&#8217;s much, much easier to process JSON in scripting languages like Javascript, Ruby, PHP than RDF/XML. </p>
<p>Parser support for JSON is ubiquitous as it&#8217;s the syntax <i>du jour</i>, just as XML was when the RDF specifications were being written. Typically JSON parsing is much more efficient. That&#8217;s especially true when we look at Javascript in the browser. </p>
<p>From that perspective RDF in JSON is an instant win as it will simplify consumption of Linked Data and the results of SPARQL CONSTRUCT and DESCRIBE queries. There are other issues with getting wide-spread support for RDF across different programming languages, e.g. proper validation of URIs, but fast parsing of the basic data structure would be a step in the right direction.</p>
<h3>Creating Convergence</h3>
<p>I think I&#8217;ve seen about a dozen or more different RDF in JSON proposals. There&#8217;s <a href="http://esw.w3.org/JSON%2BRDF">a list on the ESW wiki</a> and <a href="http://n2.talis.com/wiki/RDF_JSON_Brainstorming">some comparison notes on the Talis Platform wiki</a>, but I don&#8217;t think either is complete. If I get the chance I&#8217;ll update them. The sheer variety confirms my earlier points about the mis-matches between models: everyone has their own conception of what constitutes a useful JSON serialization. </p>
<p>Because there are fewer syntax options in JSON, the proposals run the full spectrum from capturing the full RDF model but making poor use of JSON syntax, through to making good use of JSON syntax but at the cost of either ignoring aspects of the RDF model or layering additional syntax conventions on top of JSON itself. As an aside, I find it interesting that so many people are happy with subsetting RDF to achieve this one goal.</p>
<p>The thing we should recognise is that <i>none</i> of the existing RDF in JSON formats are really useful without an accompanying API. I&#8217;ve used a number of different formats and no matter what serialization I&#8217;ve used I&#8217;ve ended up with helper code that simplifies some or all of the following:</p>
<ul>
<li>Lookup of all properties of a single resource</li>
<li>Mapping between URIs and short names (e.g. CURIEs or locally defined keys) for properties</li>
<li>Mapping between conventions for encoding particular datatypes (or language annotations) and native objects in the scripting language</li>
<li>Cross-referencing between subjects and objects; and vice-versa</li>
<li>Looking up all values of a property or a single value (often the first)</li>
</ul>
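<p>As a rough illustration of the kind of helper code listed above, here is a small sketch over a plain list of triples. The class name, method names and prefix table are all hypothetical, and the mapping of datatypes to native objects is left out to keep it short.</p>

```python
FOAF = "http://xmlns.com/foaf/0.1/"
PREFIXES = {"foaf": FOAF}  # hypothetical short-name table


class Graph:
    def __init__(self, triples):
        self.triples = list(triples)

    def expand(self, name):
        """Map a short name such as foaf:name to a full URI."""
        prefix, _, local = name.partition(":")
        return PREFIXES[prefix] + local if prefix in PREFIXES else name

    def properties(self, subject):
        """All properties of one resource, as {predicate: [values]}."""
        result = {}
        for s, p, o in self.triples:
            if s == subject:
                result.setdefault(p, []).append(o)
        return result

    def values(self, subject, prop):
        """All values of a property, given a short name or a full URI."""
        return self.properties(subject).get(self.expand(prop), [])

    def first(self, subject, prop, default=None):
        """A single value (the first), or a default if there is none."""
        vals = self.values(subject, prop)
        return vals[0] if vals else default

    def subjects_of(self, obj):
        """Cross-reference from an object back to the subjects that use it."""
        return [s for s, p, o in self.triples if o == obj]
```

<p>For example, <code>Graph(triples).first(uri, "foaf:name")</code> returns one name or <code>None</code>, whatever the underlying serialization looked like.</p>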
<p>In addition, if I&#8217;m consuming the results of multiple requests then I may also end up with a custom data structure and code for merging together different descriptions, even if it&#8217;s just an array of parsed JSON documents and code to perform the above lookups across that collection.</p>
<p>So, while we can debate the relative aesthetics of different approaches, I think it&#8217;s focusing attention on the wrong areas. What we should really be looking at is an API for manipulating RDF. One that will work in Javascript, Ruby or PHP. While I acknowledge the lingering horror of the DOM, I think the design space here is much simpler. Maybe I&#8217;m just an optimist!</p>
<p>If we take this approach then what we need is a JSON serialization of RDF that covers as much of the RDF model as possible and, ideally, is already as well supported as possible. From what I&#8217;ve seen <a href="http://n2.talis.com/wiki/RDF_JSON_Specification">the RDF/JSON serialization</a> is actually closest to that ideal. It&#8217;s supported in a number of different parsing and serialising libraries already and only needs to be extended to support blank nodes and Named Graphs, which is trivial to do. While it&#8217;s not the prettiest serialization, given a vote, I&#8217;d look at standardising that and moving on to focus on the more important area: the API.</p>
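<p>For readers who have not seen a resource-centric JSON layout, the following sketch groups triples by subject and then by predicate, with each value recorded as an object carrying a <code>type</code> and a <code>value</code>. It is an approximation of that general shape under simplifying assumptions (only URIs and plain literals, and a naive test for what counts as a URI), not a conforming implementation of the linked specification.</p>

```python
import json

FOAF = "http://xmlns.com/foaf/0.1/"
BOB = "http://example.org/bob"

# Illustrative data: note the repeated foaf:knows property.
TRIPLES = [
    (BOB, FOAF + "name", "Bob"),
    (BOB, FOAF + "knows", "http://example.org/alice"),
    (BOB, FOAF + "knows", "http://example.org/carol"),
]


def to_resource_json(triples):
    """Group triples as {subject: {predicate: [value objects]}}."""
    doc = {}
    for s, p, o in triples:
        # Simplifying assumption: anything that looks like an http URI is one.
        kind = "uri" if o.startswith("http://") else "literal"
        doc.setdefault(s, {}).setdefault(p, []).append({"type": kind, "value": o})
    return doc


def serialize(triples):
    """Return the grouped structure as a JSON string."""
    return json.dumps(to_resource_json(triples), indent=2, sort_keys=True)
```

<p>Values are always held in a list, even when there is only one, which is how repeatable properties are accommodated at the cost of less natural JSON.</p>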
<h3>Gaining Traction</h3>
<p>Which brings me to the last use case. Can we create a JSON serialization of RDF that will help Linked Data and RDF get some traction in the wider web development community?</p>
<p>The answer is no.</p>
<p>If you believe that the issues with gaining adoption are purely related to syntax then you&#8217;re not listening to the web developer community closely enough. While a friendlier syntax may undoubtedly help, an API would be even better. The majority of web developers these days are very happy indeed to work with tools like jQuery to handle client-side scripting. A standard jQuery extension for RDF would help adoption much more than spending months debating the best way to profile the RDF model into a clean JSON serialization.</p>
<p>But the real issue is that we&#8217;re asking web developers to learn not just new syntax but also an entirely new way to access data: we&#8217;re asking them to use SPARQL rather than simple RESTful APIs.</p>
<p>While I think SPARQL is an important and powerful tool in the RDF toolchain I don&#8217;t think it should be seen as the standard way of querying RDF over the web. There&#8217;s a big data access gulf between de-referencing URIs and performing SPARQL queries. We need something to fill that space, and I think the <a href="http://code.google.com/p/linked-data-api/wiki/Specification">Linked Data API</a> fills that gap very nicely. We should be promoting a range of access options.</p>
<p>I have similar doubts about SPARQL Update as the standard way of updating triple stores over the web, but that&#8217;s the topic of another post.</p>
<h2>Summing Up</h2>
<p>As the RDF Next Working Group gets underway I think it needs to carefully prioritise its activities to ensure that we get the most out of this next phase of development and effort around the Semantic Web specifications. It&#8217;s particularly crucial right now as we&#8217;re beginning to see the ideas being adopted and embraced more widely. As I&#8217;ve tried to highlight here, I think there&#8217;s a lot of value to be had in having a standard JSON serialization of RDF. But I don&#8217;t think that there&#8217;s much merit in attempting to create a clean, simple JSON serialization that will meet everyone&#8217;s needs. </p>
<p>Standardising Turtle and an API for manipulating RDF data has more value in my view. RDF/JSON as a well implemented specification meets the core needs of the semantic web developer; a simple scripting API meets the needs of everyone else.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/12/rdf-and-json-a-clash-of-model-and-syntax/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Gridworks Reconciliation API Implementation</title>
		<link>http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/</link>
		<comments>http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/#comments</comments>
		<pubDate>Wed, 25 Aug 2010 21:19:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[gridworks]]></category>
		<category><![CDATA[linkeddata]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[talis]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=489</guid>
		<description><![CDATA[Gridworks is a really fantastic tool and there&#8217;s scope to extend it in all kinds of interesting ways. Jeni Tennison has recently published a great blog post describing how to use Gridworks for generating Linked Data. I strongly encourage you to read her posting as it not only provides a good introduction to Gridworks itself, [...]]]></description>
			<content:encoded><![CDATA[<p>Gridworks is a really fantastic tool and there&#8217;s scope to extend it in all kinds of interesting ways. Jeni Tennison has recently published a great blog post describing <a href="http://www.jenitennison.com/blog/node/145">how to use Gridworks for generating Linked Data</a>. I strongly encourage you to read her posting as it not only provides a good introduction to Gridworks itself, but also shows a nice real world example of generating RDF using its built-in data cleaning and templating tools.</p>
<p>I was lucky enough to meet David Huynh at a workshop recently and chatted to him briefly about another aspect of Gridworks: its ability to match field values in a dataset to entities in Freebase, e.g. identifying a place based on just its name. Within Gridworks this process is known as &#8220;reconciliation&#8221;.</p>
<p>Reconciliation is an important step for generating good Linked Data as you&#8217;ll often need to correlate values in a dataset with URIs in existing datasets in order to generate links. E.g. matching company names to their URIs. While it is possible to generate identifiers algorithmically during a conversion this typically just defers the reconciliation work until a later stage, when you carry out cross-linking to introduce <a href="http://patterns.dataincubator.org/book/equivalence-links.html">equivalence links</a>.</p>
<p>Recognising that the ability to introduce new reconciliation services would be a powerful extension to Gridworks, David Huynh has been creating <a href="http://code.google.com/p/freebase-gridworks/wiki/ReconciliationServiceApi">a draft specification</a> that will allow third-parties to create and deploy their own reconciliation services. He&#8217;s been documenting his <a href="http://freebase-gridworks.blogspot.com/2010/06/progress-on-generic-reconciliation.html">progress on implementing the client side of this protocol</a> and has published <a href="http://standard-reconcile.freebaseapps.com/">a testing service</a>.</p>
<p>It occurred to me that the reconciliation API is essentially a structured search over a dataset and thus could be implemented against the <a href="http://n2.talis.com/wiki/Contentbox#Searching_The_Contentbox">search interface</a> exposed by Talis Platform stores. The RSS 1.0 feeds that the Platform returns include enough information to rank and filter results as required by the API.</p>
<p>I&#8217;ve created a simple Ruby application, using the Sinatra web framework, that implements the reconciliation API for any Talis Platform store. You can find <a href="http://github.com/ldodds/pho-reconcile">the code on github</a> if you want to have a play with it. As I note in the README there are some areas where customisation is useful to get the most from the service. So while in principle it can be used against any existing Platform store you can create a simple JSON config to tweak it for particular datasets.</p>
<p>There&#8217;s a live version of the code running on my server here: <a href="http://ldodds.com/gridworks/">http://ldodds.com/gridworks/</a>.</p>
<p>That page has a simple API console for carrying out queries, but consult <a href="http://code.google.com/p/freebase-gridworks/wiki/ReconciliationServiceApi">the draft specification</a> for more details. I think I&#8217;ve covered all of the basic features (but bug reports welcome!). Consult the README for notes on configuration options and implementation decisions.</p>
<p>As a simple illustration, let&#8217;s say that I have the value &#8220;<code>Bath</code>&#8221; in a dataset and want to match that to some area in the UK administrative geography. This information is available from the Linked Data exposed by <code>statistics.data.gov.uk</code> and this happens to be hosted in <a href="http://api.talis.com/stores/govuk-statistics">this platform store</a>. The reconciliation API we need can therefore be found at: <a href="http://ldodds.com/gridworks/govuk-statistics/reconcile">http://ldodds.com/gridworks/govuk-statistics/reconcile</a>. An HTTP GET on that location retrieves the service metadata.</p>
<p>If we use <a href="http://ldodds.com/gridworks/">the API explorer</a> we can use a simple HTML form to try out examples. Select <code>govuk-statistics</code> from the Store drop-down and then type <code>Bath</code> into the search box. You&#8217;ll <a href="http://ldodds.com/gridworks/govuk-statistics/reconcile?query={%22query%22%3A%22Bath%22%2C%22limit%22%3A%225%22%2C%22type%22%3A%22%22}">get this result</a>. This is not very readable by default, so if you&#8217;re using Firefox I recommend you <a href="https://addons.mozilla.org/en-US/firefox/addon/10869/">install the JSONView extension</a> which provides a nicely formatted display.</p>
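<p>The request behind that link can also be assembled by hand. The sketch below mirrors the shape of the linked URL (a JSON object with <code>query</code>, <code>limit</code> and <code>type</code> keys passed in a <code>query</code> parameter); the <code>reconcile_url</code> and <code>decode</code> helpers are illustrative names, and the draft specification remains the authority on the request format.</p>

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Service address taken from the post.
BASE = "http://ldodds.com/gridworks/govuk-statistics/reconcile"


def reconcile_url(text, type_uri="", limit=5, base=BASE):
    """Build a reconciliation request URL for a single value."""
    query = {"query": text, "limit": str(limit), "type": type_uri}
    return base + "?" + urlencode({"query": json.dumps(query)})


def decode(url):
    """Recover the JSON query object from a request URL (handy for debugging)."""
    return json.loads(parse_qs(urlparse(url).query)["query"][0])
```

<p>Calling <code>reconcile_url("Bath")</code> produces a URL equivalent to the one linked above, although the percent-encoding of individual characters may differ.</p>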
<p>Our initial search returns a number of results. The highest ranked of these being <a href="http://statistics.data.gov.uk/id/parliamentary-constituency/019">the Westminster Constituency for Bath</a>. That seems like a pretty good initial result to me. As it is the most relevant result in the search it&#8217;s marked as an exact match, so once integrated with Gridworks it will capture and store the reconciled identifier for you.</p>
<p>However, we may know that in the imaginary dataset we&#8217;re working with, that a particular field doesn&#8217;t contain names of constituencies. It may instead refer to <a href="http://statistics.data.gov.uk/def/geography/LocalEducationAuthority">a Local Education Authority</a>. We can refine our search by adding the URI that defines that type of resource into the <code>type</code> field in the API explorer. </p>
<p>Try pasting <code>http://statistics.data.gov.uk/def/geography/LocalEducationAuthority</code> into the <code>type</code> field and running the search again. You&#8217;ll find that this time you <a href="http://ldodds.com/gridworks/govuk-statistics/reconcile?query={%22query%22%3A%22bath%22%2C%22limit%22%3A%225%22%2C%22type%22%3A%22http%3A//statistics.data.gov.uk/def/geography/LocalEducationAuthorityArea%22}">get a single result</a>, which is <a href="http://statistics.data.gov.uk/id/local-education-authority/800">Bath and North East Somerset</a>. Job done. </p>
<p>Of course, to get the most from this you need to know what URIs you can use for filtering by types (and properties). But this is something that the Gridworks UI will help with. It can integrate with &#8220;suggestion services&#8221; that can be used to help map values to properties and types within a schema. I&#8217;ll be looking at how to expose those as my next piece of work.</p>
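<p>The behaviour described in this walkthrough (filter by type, rank by relevance, flag the top hit as the match) can be sketched as follows. The scores, the constituency type URI and the field names are assumptions made for the example; the real service derives its ranking from the Platform search results and follows the draft specification for its response format.</p>

```python
LEA = "http://statistics.data.gov.uk/def/geography/LocalEducationAuthority"

# Illustrative candidates: the ids come from the post, the scores and the
# constituency type URI are invented.
CANDIDATES = [
    {"id": "http://statistics.data.gov.uk/id/local-education-authority/800",
     "name": "Bath and North East Somerset", "type": [LEA], "score": 1.5},
    {"id": "http://statistics.data.gov.uk/id/parliamentary-constituency/019",
     "name": "Bath", "type": ["http://example.org/def/Constituency"], "score": 2.0},
]


def reconcile(candidates, type_uri=None, limit=5):
    """Filter by type, rank by score, and flag the top hit as the match."""
    hits = [c for c in candidates if not type_uri or type_uri in c["type"]]
    hits.sort(key=lambda c: c["score"], reverse=True)
    return [
        {"id": c["id"], "name": c["name"], "type": c["type"],
         "score": c["score"], "match": i == 0}
        for i, c in enumerate(hits[:limit])
    ]
```

<p>Without a type the constituency ranks first; passing the Local Education Authority type leaves a single candidate, matching the two searches above.</p>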
<p>Hopefully you can see how the overall system works. Feel free to have a play with the API to try it out for yourself. If you have comments on the implementation then I&#8217;d love to hear them, but I&#8217;d suggest that comments on the specification are best addressed to <a href="http://groups.google.com/group/freebase-gridworks">the gridworks mailing list</a>.</p>
<p>I also suspect the Reconciliation API has uses outside of just Gridworks. For example, I wonder how easy it would be to introduce reconciliation into Google Spreadsheets using <a href="http://code.google.com/googleapps/appsscript/">Google Apps Script</a>? It&#8217;s also another nice demonstration of how easy it is to map simple RESTful APIs onto RDF datasets: this implementation works for any data in the Platform, no matter what schema it conforms to. Neat.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/08/gridworks-reconciliation-api-implementation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>RDF Dataset Notifications</title>
		<link>http://www.ldodds.com/blog/2010/04/rdf-dataset-notifications/</link>
		<comments>http://www.ldodds.com/blog/2010/04/rdf-dataset-notifications/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 19:43:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[linkeddata]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=483</guid>
		<description><![CDATA[Like many people in the RDF community I&#8217;ve been thinking about the issue of syndicating updates to RDF datasets. If we want to support truly distributed aggregation and processing of data then we need an efficient way to share updates.
There&#8217;s been a lot of experimentation around different mechanisms, and PubSubHubbub seems to be a current [...]]]></description>
			<content:encoded><![CDATA[<p>Like many people in the RDF community I&#8217;ve been thinking about the issue of syndicating updates to RDF datasets. If we want to support truly distributed aggregation and processing of data then we need an efficient way to share updates.</p>
<p>There&#8217;s been a lot of experimentation around different mechanisms, and <a href="http://code.google.com/p/pubsubhubbub/">PubSubHubbub</a> seems to be a current favourite approach. I&#8217;ve been playing with it myself recently and have hacked up a basic push mechanism around Talis Platform stores. More on that another time.</p>
<p>But I&#8217;ve not yet seen any general discussion about the merits of different approaches, or even discussion about what it is that we really want to syndicate.</p>
<p>So let&#8217;s take it from the top.</p>
<p>It seems to me that there are basically three broad categories of information we want to syndicate:</p>
<ul>
<li><i>Dataset Notifications</i> &#8212; has a new dataset been added to a directory? has one been updated in some way, e.g. through the addition or removal of triples?</li>
<li><i>Resource Notifications</i> &#8212; what resources have been added or modified within a dataset?</li>
<li><i>Triple Notifications</i> &#8212; what triples have been changed within a dataset?</li>
</ul>
<p>Each one of these categories is syndicating a different level of detail and may benefit from a different technical approach. For example there&#8217;s a different volume of information being exchanged if one is simply notifying dataset changes vs every triple. We&#8217;ll also likely need a different format or syntax.</p>
<p>Actually there may be a fourth category: notifications of graph structural changes to a dataset, e.g. adding or removing named graphs. I&#8217;ve not yet seen anyone exploring that level of syndication, but suspect it may be very useful.</p>
<p>Now, for each of those different categories, there are two different styles of notifications: <i>push</i> or <i>pull</i>. Pull mechanisms are typified by feed subscriptions, crawlers, or repeated queries of datasets. Push mechanisms are usually based on some form of publish-subscribe system.</p>
<p>Given those different scenarios, we can take a look at some existing technologies and categorise them. I&#8217;ve done just that and <a href="http://spreadsheets.google.com/pub?key=tLWdskoM-2--vLjUI05e7qQ&#038;output=html">published a simple Google spreadsheet with my first stab at this analysis</a>. (This probably needs a little more context in places but hopefully the classifications are fairly obvious).</p>
<p>PubSubHubbub seems to offer the most flexibility in that it mixes a standard Pull based Feed architecture with a Push based subscription system. Clearly worthy of the attention it&#8217;s getting. Other technologies offer similar features but are optimised for different purposes.</p>
<p>However that doesn&#8217;t mean that PubSubHubbub is just perfect out of the box. For example it&#8217;s worth noting that consumers aren&#8217;t <i>required</i> to use the Push aspects of the system, they can just subscribe to the feeds. So you need to be prepared to scale a PubSubHubbub system just as you would a Pull based Feed. </p>
<p>It may also be sub-optimal for systems which are syndicating out high-volume Triple level updates. The Feeds can potentially get very large and the hub system needs to be prepared to handle large exchanges. It also doesn&#8217;t say anything about how to catch-up or recover from missed updates. A hybrid approach may be required to cover for all use cases and scenarios and to produce a robust system.</p>
<p>In order to be able to properly compare different approaches we need to understand their respective trade-offs. I&#8217;m hoping this posting contributes to that discussion and can complement the ongoing community experimentation.</p>
<p>Am interested to hear your thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/04/rdf-dataset-notifications/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Linked Data Patterns: a free book for practitioners</title>
		<link>http://www.ldodds.com/blog/2010/04/linked-data-patterns-a-free-book-for-practitioners/</link>
		<comments>http://www.ldodds.com/blog/2010/04/linked-data-patterns-a-free-book-for-practitioners/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 14:37:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[linkeddata]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=475</guid>
		<description><![CDATA[A few months ago Ian Davis and I were chatting about some new approaches to helping practitioners climb the learning curve around Linked Data, RDF and related technologies. We were both keen to help communicate the value of Linked Data, share knowledge amongst practitioners, and to encourage the community to converge on best practices. We [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago <a href="http://iandavis.com">Ian Davis</a> and I were chatting about some new approaches to helping practitioners climb the learning curve around Linked Data, RDF and related technologies. We were both keen to help communicate the value of Linked Data, share knowledge amongst practitioners, and to encourage the community to converge on best practices. We kicked around a number of different ideas in this vein.</p>
<p>For example, Ian was keen to provide guidance as to how to mix and match different vocabularies to achieve a particular goal, like describing a person or a book. Having a ready reference containing recipes for these common tasks would address a number of goals. He&#8217;s ended up exploring that idea further in the recently released <a href="http://schemapedia.com">Schemapedia</a>. If you&#8217;ve not seen it yet, then you should take a look. It provides a really nice way to navigate through RDF vocabularies and explore their intersections.</p>
<p>The other thing that we discussed was Design Patterns. I&#8217;ve been a Design Pattern nut for some time now. Discovering them was something of a rite of passage for me during my Master&#8217;s dissertation. I&#8217;d spent weeks revising and honing a design for the distributed system I was building, only to discover that what I&#8217;d produced was already documented as a design pattern in an obscure corner of the research literature. While I&#8217;d clearly reinvented the wheel, the discovery not only provided external validation for what I&#8217;d produced, but also neatly illustrated the benefit of using design patterns to share knowledge and experience within a community. Knowing when to apply particular patterns is a key skill for any developer, and the terms are a part of the design vocabulary we all share.</p>
<p>I suggested to Ian that we explore writing some patterns for Linked Data. Patterns for assigning identifiers, modelling data, as well as application development. We experimented with this for a while but ended up parking the discussion for a few months whilst other priorities intervened.</p>
<p>I recently revived the project. It&#8217;s pretty clear to me that there&#8217;s still a big skills gap between experienced practitioners and those seeking to apply the technology. I think the current situation is reminiscent of the move of OO programming from the research lab out into the developer community; design patterns played a key role there too.</p>
<p>Ian and I have decided to share this with the community as an on-line book, a pattern catalogue that covers a range of different use cases. We started out with about half a dozen patterns, but over the last few weeks I&#8217;ve expanded that figure to thirty. I&#8217;ve still got a number on my short-list (more than a dozen, I think) but it&#8217;s time to start sharing this with the community. The work won&#8217;t ever be complete as the space is still unfolding, it will just get refined over time.</p>
<p>You can read the book online at <a href="http://patterns.dataincubator.org">http://patterns.dataincubator.org</a>. </p>
<p>The work is licensed under a Creative Commons Attribution license so you&#8217;re free to use it as you see fit, but please attribute the source. If you want to download it, then <a href="http://patterns.dataincubator.org/book/linked-data-patterns.pdf">there&#8217;s a PDF</a>, and <a href="http://patterns.dataincubator.org/book/linked-data-patterns.epub">an EPUB too</a>. We&#8217;re using DocBook for the text so there will be a number of different access options.</p>
<p>I&#8217;ll stress that this is a very early draft, so be gentle. But we&#8217;d love to hear your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/04/linked-data-patterns-a-free-book-for-practitioners/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>A Tour of the OS 50k Gazetteer Linked Data</title>
		<link>http://www.ldodds.com/blog/2010/04/a-tour-of-the-os-50k-gazetteer-linked-data/</link>
		<comments>http://www.ldodds.com/blog/2010/04/a-tour-of-the-os-50k-gazetteer-linked-data/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 11:34:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[linkeddata]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=460</guid>
		<description><![CDATA[The Ordnance Survey have today published the first in a series of open datasets. In addition to the administrative geography that was published last year, the Linked Data available from data.ordnancesurvey.co.uk now includes data from their 1:50 000 Scale Gazetteer. In this blog post I thought I&#8217;d post an overview of the dataset to summarise [...]]]></description>
			<content:encoded><![CDATA[<p>The Ordnance Survey have today published the first in a series of open datasets. In addition to the administrative geography that was published last year, the Linked Data available from <a href="http://data.ordnancesurvey.co.uk">data.ordnancesurvey.co.uk</a> now includes data from their 1:50 000 Scale Gazetteer. In this blog post I thought I&#8217;d post an overview of the dataset to summarise what it contains.</p>
<h2>Analysis</h2>
<p>The Gazetteer identifiers all have a base URL of: </p>
<p><code style="font-size:12px">http://data.ordnancesurvey.co.uk/id/50kGazetteer/</code>.</p>
<p>The base URL is suffixed with a unique numeric code. I&#8217;m not sure where this originates from, and it&#8217;s not present in the underlying data.</p>
<p>The dataset consists of 2,368,655 triples (individual facts) asserted over 259,080 unique resources. So about 9 triples per resource. Here&#8217;s how the properties break down:</p>
<table style="font-size:12px">
<tr>
<td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td>
<td>259080</td>
</tr>
<tr>
<td>http://xmlns.com/foaf/0.1/name</td>
<td>259080</td>
</tr>
<tr>
<td>http://www.w3.org/2000/01/rdf-schema#label</td>
<td>259080</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/spatialrelations/northing</td>
<td>259080</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/spatialrelations/easting</td>
<td>259080</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/featureType</td>
<td>259080</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/oneKMGridReference</td>
<td>259080</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/twentyKMGridReference</td>
<td>259080</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/mapReference</td>
<td>296015</td>
</tr>
</table>
<p>The first few properties are labels and a type for each resource. The additional predicates are from the OS Spatial Relations ontology, providing the Eastings and Northings for each feature. The remaining four predicates provide a &#8220;feature type&#8221; and OS map &#038; grid references. There are slightly more map references than resources, so some resources have more than one such property, presumably because they&#8217;re large enough to span more than one map. You can see that there are no links to other datasets as yet, or lat/long co-ordinates.</p>
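<p>As a quick sanity check on those figures, the property counts from the table can be summed and compared with the headline numbers. The prefixed names below are just shorthand for the full property URIs in the table.</p>

```python
# Property counts copied from the table in this post.
property_counts = {
    "rdf:type": 259080,
    "foaf:name": 259080,
    "rdfs:label": 259080,
    "spatial:northing": 259080,
    "spatial:easting": 259080,
    "gaz:featureType": 259080,
    "gaz:oneKMGridReference": 259080,
    "gaz:twentyKMGridReference": 259080,
    "gaz:mapReference": 296015,
}
resources = 259080

# Every triple uses exactly one property, so the counts sum to the total.
total_triples = sum(property_counts.values())
triples_per_resource = total_triples / resources
# Map references beyond the one that every resource has.
extra_map_references = property_counts["gaz:mapReference"] - resources

print(total_triples)                   # 2368655
print(round(triples_per_resource, 2))  # 9.14
print(extra_map_references)            # 36935
```

<p>So the counts are consistent with the total, and there are 36,935 more map references than there are resources.</p>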
<p>
Let&#8217;s look closer at some of the predicates. For the RDF types, I discovered that every resource has the same type; they&#8217;re all instances of a &#8220;Named Place&#8221;: </p>
<p><code style="font-size:12px">http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/NamedPlace</code>.</p>
<p>Presumably then the detailed classification for the different types of landscape feature is present in the &#8220;feature type&#8221; predicate. A SPARQL query to count and group the values for that predicate gives me:</p>
<table style="font-size:12px">
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/Other</td>
<td>128662</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/OtherSettlement</td>
<td>41228</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/Farm</td>
<td>34723</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/WaterFeature</td>
<td>24425</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/HillOrMountain</td>
<td>14524</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/ForestOrWood</td>
<td>8708</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/Antiquity</td>
<td>5252</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/Town</td>
<td>1259</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/RomanAntiquity</td>
<td>237</td>
</tr>
<tr>
<td>http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/City</td>
<td>62</td>
</tr>
</table>
<p>
We can see that 128,662 resources (49% of total) are simply &#8220;Other&#8221; with another 41,228 being &#8220;Other Settlement&#8221;; not that inspiring! The rest of the feature types are more interesting, and give us some very basic data on various geographic features. The Roman Antiquity features piqued my interest; Hadrian&#8217;s Wall has the following identifier (click to see the data):</p>
<p><code style="font-size:12px"><a href="http://data.ordnancesurvey.co.uk/id/50kGazetteer/106584">http://data.ordnancesurvey.co.uk/id/50kGazetteer/106584</a></code></p>
<p>
The values for the Easting and Northing properties should be obvious, so I&#8217;ll skip over those. The remaining properties are all map references, and the values of these are all resources. So the Gazetteer has begun assigning URIs to all of the 1KM and 20KM grid references, as well as each of the OS Landranger Maps. Here are some sample URLs for each, taken from the description of Hadrian&#8217;s Wall:</p>
<p><code style="font-size:12px">http://data.ordnancesurvey.co.uk/id/1kmgridsquare/NY3359</code><br />
<code style="font-size:12px">http://data.ordnancesurvey.co.uk/id/20kmgridsquare/NY24</code><br />
<code style="font-size:12px">http://data.ordnancesurvey.co.uk/id/OSLandrangerMap/85</code></p>
<p>The URIs seem predictable and can probably be derived from data found elsewhere. Unfortunately, no further data has been included about these resources. I believe they are place-holders for data that has yet to be released.</p>
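<p>To illustrate how predictable they look, here&#8217;s a minimal sketch that rebuilds the three sample URIs. The function names are made up, and the rule for the 20KM square is my guess from this single example (each kilometre offset rounded down to a multiple of 20, then written in tens of kilometres), so treat it as an assumption rather than a documented scheme.</p>

```python
BASE = "http://data.ordnancesurvey.co.uk/id/"


def one_km_uri(grid_ref):
    """URI for a 1km square, given a four-figure reference such as 'NY3359'."""
    return BASE + "1kmgridsquare/" + grid_ref


def twenty_km_uri(grid_ref):
    """URI for the 20km square containing a four-figure reference.

    Assumption inferred from one example: each digit is the km offset
    rounded down to a multiple of 20, in tens of km (33 gives 2, 59 gives 4).
    """
    letters = grid_ref[:2]
    easting_km = int(grid_ref[2:4])
    northing_km = int(grid_ref[4:6])
    return "{}20kmgridsquare/{}{}{}".format(
        BASE, letters, easting_km // 20 * 2, northing_km // 20 * 2
    )


def landranger_uri(sheet_number):
    """URI for an OS Landranger map sheet."""
    return BASE + "OSLandrangerMap/" + str(sheet_number)


print(one_km_uri("NY3359"))
print(twenty_km_uri("NY3359"))
print(landranger_uri(85))
```

<p>For the Hadrian&#8217;s Wall example this reproduces the three URIs listed above, but I haven&#8217;t checked the 20KM rule against other resources.</p>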
<p>
Overall the data in the Gazetteer is pretty sparse but presumably it will become much richer once more OS data is released. Latitude and longitude are something that I&#8217;d particularly like to see added. There&#8217;s an opportunity here for someone to link up these resources with pages in Wikipedia &#038; resources in DBpedia.
</p>
<h2>Sample Queries</h2>
<p>If you want to play with the data, here are a couple of SPARQL queries to get you started. The first retrieves 10 features classified as Roman Antiquities:</p>
<pre>
<code style="font-size:12px">
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#>
PREFIX spatial: &lt;http://data.ordnancesurvey.co.uk/ontology/spatialrelations/>
PREFIX gaz: &lt;http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/>

SELECT ?uri ?label ?easting ?northing ?one ?twenty ?map
WHERE {
  ?uri
    #filter on type
    gaz:featureType gaz:RomanAntiquity;

    #bind everything we want to return
    rdfs:label ?label;
    spatial:easting ?easting;
    spatial:northing ?northing;
    gaz:oneKMGridReference ?one;
    gaz:twentyKMGridReference ?twenty;
    gaz:mapReference ?map.
}
LIMIT 10
</code>
</pre>
<p><a href="http://api.talis.com/stores/ordnance-survey/services/sparql?output=json&#038;query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+spatial%3A+%3Chttp%3A%2F%2Fdata.ordnancesurvey.co.uk%2Fontology%2Fspatialrelations%2F%3E%0D%0APREFIX+gaz%3A+%3Chttp%3A%2F%2Fdata.ordnancesurvey.co.uk%2Fontology%2F50kGazetteer%2F%3E%0D%0A%0D%0ASELECT+%3Furi+%3Flabel+%3Feasting+%3Fnorthing+%3Fone+%3Ftwenty+%3Fmap%0D%0AWHERE+{%0D%0A++%3Furi%0D%0A++++%23filter+on+type%0D%0A++++gaz%3AfeatureType+gaz%3ARomanAntiquity%3B%0D%0A%0D%0A++++%23bind+everything+we+want+to+return%0D%0A++++rdfs%3Alabel+%3Flabel%3B%0D%0A++++spatial%3Aeasting+%3Feasting%3B%0D%0A++++spatial%3Anorthing+%3Fnorthing%3B%0D%0A++++gaz%3AoneKMGridReference+%3Fone%3B%0D%0A++++gaz%3AtwentyKMGridReference+%3Ftwenty%3B%0D%0A++++gaz%3AmapReference+%3Fmap.%0D%0A}%0D%0ALIMIT+10%0D%0A">Results in JSON</a></p>
<p>The following query lists all of the features on a specific OS Landranger map. So even though we don&#8217;t (yet) have any details about the map, we can use its identifier as a means to filter the results:</p>
<pre>
<code style="font-size:12px">
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#>
PREFIX spatial: &lt;http://data.ordnancesurvey.co.uk/ontology/spatialrelations/>
PREFIX gaz: &lt;http://data.ordnancesurvey.co.uk/ontology/50kGazetteer/>

SELECT ?uri ?label ?easting ?northing ?featureType
WHERE {
  ?uri
    #filter on map reference
    gaz:mapReference &lt;http://data.ordnancesurvey.co.uk/id/OSLandrangerMap/85>;

    #bind everything we want to return
    rdfs:label ?label;
    spatial:easting ?easting;
    spatial:northing ?northing;
    gaz:featureType ?featureType.
}
</code>
</pre>
<p><a href="http://api.talis.com/stores/ordnance-survey/services/sparql?output=json&#038;query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+spatial%3A+%3Chttp%3A%2F%2Fdata.ordnancesurvey.co.uk%2Fontology%2Fspatialrelations%2F%3E%0D%0APREFIX+gaz%3A+%3Chttp%3A%2F%2Fdata.ordnancesurvey.co.uk%2Fontology%2F50kGazetteer%2F%3E%0D%0A%0D%0ASELECT+%3Furi+%3Flabel+%3Feasting+%3Fnorthing+%3FfeatureType%0D%0AWHERE+{%0D%0A++%3Furi%0D%0A++++%23filter+on+map+reference%0D%0A++++gaz%3AmapReference+%3Chttp%3A%2F%2Fdata.ordnancesurvey.co.uk%2Fid%2FOSLandrangerMap%2F85%3E%3B%0D%0A%0D%0A++++%23bind+everything+we+want+to+return%0D%0A++++rdfs%3Alabel+%3Flabel%3B%0D%0A++++spatial%3Aeasting+%3Feasting%3B%0D%0A++++spatial%3Anorthing+%3Fnorthing%3B%0D%0A++++gaz%3AfeatureType+%3FfeatureType.%0D%0A}%0D%0A%0D%0A">Results in JSON</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/04/a-tour-of-the-os-50k-gazetteer-linked-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Enhanced Descriptions: &#8220;Premium Linked Data&#8221;</title>
		<link>http://www.ldodds.com/blog/2010/03/enhanced-descriptions-premium-linked-data/</link>
		<comments>http://www.ldodds.com/blog/2010/03/enhanced-descriptions-premium-linked-data/#comments</comments>
		<pubDate>Sun, 28 Mar 2010 19:46:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[linkeddata]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=454</guid>
		<description><![CDATA[I&#8217;ve had several conversations recently with people who are either interested in, or actually implementing Linked Data, and are struggling with some important questions


How much data should I give away?
If I wanted to charge for more than just the basic data, then how would I handle that?


My usual response to the first of those questions [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had several conversations recently with people who are either interested in, or actually implementing Linked Data, and are struggling with some important questions:
</p>
<ul>
<li>How much data should I give away?</li>
<li>If I wanted to charge for more than just the basic data, then how would I handle that?</li>
</ul>
<p>
My usual response to the first of those questions is: &#8220;as much as you feel comfortable with&#8221;. There&#8217;s still so much data that&#8217;s not yet visible or accessible in machine-readable formats that any progress is good progress. Let&#8217;s get more data out there now. More is better.
</p>
<p>
It usually doesn&#8217;t take long to get to the second question. If you&#8217;ve spent time evangelising to people about the power and value of data, and particularly <i>their</i> data, then it&#8217;s natural for them to begin thinking about how it can be monetized.</p>
<p>Scott Brinker has done a good job of summarising <a href="http://www.chiefmartec.com/2010/03/business-models-for-linked-data-and-web-30.html">a range of options for Linked Data business models</a>. I&#8217;ve chipped into that discussion already. Instead what I wanted to briefly discuss here is some of the mechanics of implementing access to what we might call &#8220;premium Linked Data&#8221;, or as I&#8217;ll refer to it &#8220;Enhanced Descriptions&#8221;.</p>
<h2>Premium Linked Data</h2>
<p>
It&#8217;s possible to publish Linked Data that is entirely access controlled. Access might be limited to users behind the firewall (&#8220;Enterprise Linked Data&#8221;) or only to authorised paying customers. As a paid up customer you&#8217;d be given an entry point into that Linked Data and would supply appropriate credentials in order to access it.
</p>
<p>
This data isn&#8217;t going to be something you&#8217;d discover on the open web. There are many different authentication models that could be used to mediate access to this &#8220;Dark Data&#8221;. The precise mechanisms aren&#8217;t that important and the right one is likely to vary for different industries and use cases, although I think there&#8217;s a strong argument for using something that dovetails nicely with HTTP and web infrastructure in general.
</p>
<p>
What interests me more is the scenario in which a data publisher might be exposing some public data under a liberal open license, but <i>also</i> wants to make available some &#8220;premium&#8221; metadata, i.e. some value-added data that is only available to paid-up customers. In this scenario it would be useful to be able to link together the open and closed data, allowing a user agent to detect that there is extra value hidden behind some kind of authentication barrier. I think this is likely to become a very common pattern as it aids discovery of the value-added material. Essentially it&#8217;s the existing pattern for access controlling content that we have on the web of documents.</p>
<p>It&#8217;s the mechanics of implementing this public/private scenario that have cropped up in my recent conversations.
</p>
<h2>Enhanced Descriptions</h2>
<p>
When I dereference the URI of a resource I will typically get redirected to a document that describes that resource. This document might contain data like this (in Turtle):
</p>
<pre><code>
ex:document
  foaf:primaryTopic ex:thing.

ex:thing
  rdfs:label "Some Thing".
</code></pre>
<p>
i.e. the document contains some data about the resource, and there&#8217;s a primary topic relationship between the document and the resource.
</p>
<p>
If we want to point to additional RDF documents that also describe this resource, or related data, then we can use an <code>rdfs:seeAlso</code> link:
</p>
<pre><code>
ex:document
  foaf:primaryTopic ex:thing.

ex:thing rdfs:label "Some Thing";
  rdfs:seeAlso ex:otherDocument.
</code></pre>
<p>
We can use the <code>rdfs:seeAlso</code> relationship to point to additional documents either within a specific dataset or in other locations on the web. Those documents provide useful <a href="http://www.ldodds.com/blog/2009/12/annotated-data/">annotations about a resource</a>.
</p>
<p>
An &#8220;Enhanced Description&#8221; will contain additional value-added data about a resource. We could just refer to this document using an <code>rdfs:seeAlso</code> link. But if we do that then a user agent can&#8217;t easily distinguish between an arbitrary <code>rdfs:seeAlso</code> link and one that refers to some additional data. We could instead use an additional relationship, a specialisation of <code>rdfs:seeAlso</code>, that can be used to disambiguate between the relationships. I&#8217;ve defined just such a predicate: <a href="http://open.vocab.org/terms/enhancedDescription"><code>ov:enhancedDescription</code></a>.
</p>
<pre><code>
ex:document
  foaf:primaryTopic ex:thing.

ex:thing rdfs:label "Some Thing";
  rdfs:seeAlso ex:otherDocument;
  ov:enhancedDescription ex:premiumDocument.

</code></pre>
<p>
By using a separate document to hold the value-added annotations we have the opportunity for user agents to identify those documents (via the predicate) and to also be challenged for credentials when they retrieve the URI (e.g. with an HTTP 401 response code).</p>
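<p>A user agent could apply that distinction along these lines. This is only a sketch: triples are modelled as plain tuples rather than with an RDF library, and <code>classify_links</code> is a hypothetical helper, not part of any existing toolkit. The two predicate URIs are the real ones.</p>

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
OV_ENHANCED = "http://open.vocab.org/terms/enhancedDescription"


def classify_links(triples, resource):
    """Split a resource's document links into open and premium ones.

    `triples` is an iterable of (subject, predicate, object) tuples.
    Documents listed under "enhanced" are the ones where the agent
    should expect an HTTP 401 challenge and have credentials ready.
    """
    links = {"see_also": [], "enhanced": []}
    for subject, predicate, obj in triples:
        if subject != resource:
            continue
        if predicate == OV_ENHANCED:
            links["enhanced"].append(obj)
        elif predicate == RDFS_SEE_ALSO:
            links["see_also"].append(obj)
    return links


# The example description from this post, as tuples.
description = [
    ("ex:document", "http://xmlns.com/foaf/0.1/primaryTopic", "ex:thing"),
    ("ex:thing", RDFS_SEE_ALSO, "ex:otherDocument"),
    ("ex:thing", OV_ENHANCED, "ex:premiumDocument"),
]

links = classify_links(description, "ex:thing")
print(links["see_also"])  # ['ex:otherDocument']
print(links["enhanced"])  # ['ex:premiumDocument']
```

<p>The agent can fetch the ordinary links anonymously and only go through an authentication step for the enhanced ones.</p>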
<p>
It also means data publishers can safely dip a toe in the open data waters, while leaving richer descriptions protected yet still discoverable behind an access control layer.
</p>
<h2>Another Approach?</h2>
<p>
Interestingly I discovered earlier today that OpenCalais returns a &#8220;402 Payment Required&#8221; status code for some documents.</p>
<p>To see this in practice visit <a href="http://d.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633.html">their description of IBM</a> and try accessing the last of the <code>owl:sameAs</code> links. I&#8217;m guessing they&#8217;re using a similar technique to the one I&#8217;ve outlined here. But the key difference is that rather than use separate documents, they&#8217;ve decided to create new URIs for the access controlled version of the Linked Data. It would be nice if someone out there could confirm that.</p>
<p>Assuming I&#8217;ve interpreted what they&#8217;re doing correctly, I think this approach has some failings. Firstly it creates extra URIs <a href="http://www.ldodds.com/blog/2009/12/annotated-data/">that aren&#8217;t really needed</a>. I&#8217;m not sure that we really need more URIs for things; a pattern in which publishers have 2 URIs (public &#038; private) for each resource isn&#8217;t going to help matters.</p>
<p>Secondly, just like using a generic &#8220;see also&#8221; relation, using <code>owl:sameAs</code> means it&#8217;s impossible to tell the resource that provides access to premium data apart from the others that exist on the web, without doing some fragile URI matching.</p>
<p>
Apologies to the OpenCalais team if I&#8217;ve misunderstood the mechanism they&#8217;re using. I&#8217;ll happily publish a correction, but regardless, I&#8217;m intrigued by the 402 status code! <img src='http://www.ldodds.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />
</p>
<h2>Summary</h2>
<p>
In my view, the &#8220;Enhanced Description&#8221; approach is a simple-to-implement pattern. It&#8217;s one that I&#8217;ve been recommending to people recently but I&#8217;ve not seen documented anywhere, so thought I&#8217;d write it up.</p>
<p>
I&#8217;d be interested to hear from others that have either implemented the same mechanism, or like OpenCalais are using other schemes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/03/enhanced-descriptions-premium-linked-data/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Predicate Based Services</title>
		<link>http://www.ldodds.com/blog/2010/03/predicate-based-services/</link>
		<comments>http://www.ldodds.com/blog/2010/03/predicate-based-services/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 19:58:16 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[linkeddata]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=448</guid>
		<description><![CDATA[sameAs.org is a great service on a number of different levels. It provides a much needed piece of Semantic Web infrastructure and it achieves that through a simple clean interface and API. You don&#8217;t even need to know anything about RDF to get value from the service. In short it&#8217;s one of those nice web [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://sameas.org">sameAs.org</a> is a great service on a number of different levels. It provides a much needed piece of Semantic Web infrastructure and it achieves that through a simple clean interface and API. You don&#8217;t even need to know anything about RDF to get value from the service. In short it&#8217;s one of those nice web services that do one thing and do it really well.</p>
<p>I use the service as a frequent example in my talks and training sessions on Linked Data. For example, while it&#8217;s useful to review techniques for linking together datasets, in practice you can achieve a lot by simply doing a series of look-ups against sameAs.org. I&#8217;ve had some happy experiences of discovering connections between datasets without having to do any manual linking.</p>
<p>More than a few times recently I&#8217;ve been thinking that it would be useful to repeat what Hugh Glaser and Ian Millard achieved with sameAs.org, but for a number of other common RDF predicates.</p>
<p>In my opinion there are a small number of general predicates that will act as the backbone for the web of data. At the head of the predicate long tail we&#8217;ll find properties like: <code>owl:sameAs</code>, but also useful properties like <code>dc:subject</code>, <code>foaf:knows</code> and <code>foaf:primaryTopic</code>.</p>
<p>The topic based predicates (<code>dc:subject</code>, <code>foaf:primaryTopic</code>, <code>foaf:topic</code>, et al) are particularly useful for discovering documents and material that relate to a specific resource. An index of these would be extremely useful for inter-linking between content from different news and media organisations for example. I&#8217;d envisage that &#8220;topicOf.org&#8221; might index a range of different topic related predicates and expose some useful discovery tools, relations and equivalencies. Dan Brickley has <a href="http://www.flickr.com/photos/danbri/3282565132/">a nice diagram that shows how these different predicates inter-relate</a>.</p>
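<p>At its simplest, the core of such a service is just a reverse index from topics to the documents that mention them. Here&#8217;s a minimal in-memory sketch; triples are plain tuples, the function name <code>build_topic_index</code> is hypothetical, and a real service would obviously need crawling, storage and equivalence handling on top.</p>

```python
from collections import defaultdict

# Topic-style predicates that point from a document to what it is about.
TOPIC_PREDICATES = {
    "http://purl.org/dc/elements/1.1/subject",
    "http://xmlns.com/foaf/0.1/primaryTopic",
    "http://xmlns.com/foaf/0.1/topic",
}


def build_topic_index(triples):
    """Map each topic to the set of documents that declare it as a topic."""
    index = defaultdict(set)
    for document, predicate, topic in triples:
        if predicate in TOPIC_PREDICATES:
            index[topic].add(document)
    return index


# Made-up example data from two imaginary publishers.
triples = [
    ("http://news.example.org/story/1",
     "http://xmlns.com/foaf/0.1/primaryTopic",
     "http://example.org/id/bath"),
    ("http://media.example.com/article/9",
     "http://purl.org/dc/elements/1.1/subject",
     "http://example.org/id/bath"),
    ("http://media.example.com/article/9",
     "http://purl.org/dc/elements/1.1/creator",
     "http://example.org/id/someone"),
]

index = build_topic_index(triples)
print(sorted(index["http://example.org/id/bath"]))
```

<p>A lookup on a resource URI then returns every indexed document about it, whichever topic predicate the publisher happened to use.</p>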
<p>&#8220;topicOf&#8221; is currently top of my list of these predicate based services. But the same approach would work in other contexts. For example a service that indexed <code>foaf:knows</code> would be useful for social networking applications, although I think that this area is already well-served by existing services. But what about:</p>
<ul>
<li>&#8220;reviewsOf.org&#8221; &#8212; find reviews about a specific resource. I believe Tom Heath has thought about doing something like this for <a href="http://revyu.com">Revyu</a></li>
<li>&#8220;depictionsOf.org&#8221; &#8212; find pictures of a specific resource (<code>foaf:depiction</code>), e.g. person, place or thing (and reliably, not like the Flickr Wrapper)</li>
<li>&#8220;madeBy.org&#8221; &#8212; find documents, photos, or other resources that were made by a particular person (<code>dc:creator</code>, <code>foaf:maker</code>)</li>
</ul>
<p>I can think of all sorts of useful purposes for these services. I also think that they could offer additional ways of engaging with the broader developer community and getting them to buy into the Linked Data vision.</p>
<p>Anyone want to have a crack at implementing some of these?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/03/predicate-based-services/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Thoughts on Linked Data Business Models</title>
		<link>http://www.ldodds.com/blog/2010/01/thoughts-on-linked-data-business-models/</link>
		<comments>http://www.ldodds.com/blog/2010/01/thoughts-on-linked-data-business-models/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 13:08:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[linked data]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=442</guid>
		<description><![CDATA[Scott Brinker recently published a great blog post covering 7 business models for Linked Data. The post is well worth a read and reviews the potential for both direct and indirect revenue generation from a range of different business models. I&#8217;ve been thinking about these same issues myself recently so I&#8217;m pleased to see that [...]]]></description>
			<content:encoded><![CDATA[<p>Scott Brinker recently published a great blog post covering <a href="http://www.chiefmartec.com/2010/01/7-business-models-for-linked-data.html">7 business models for Linked Data</a>. The post is well worth a read and reviews the potential for both direct and indirect revenue generation from a range of different business models. I&#8217;ve been thinking about these same issues myself recently so I&#8217;m pleased to see that others are doing similar analysis. Scott&#8217;s conclusion that, currently, Linked Data is more likely to drive indirect revenue is sound, and reflects where we are with the deployment of the technology.</p>
<p>The time is ripe though for organizations to begin exploring direct revenue generation models and it&#8217;s there that I wanted to add some thoughts and commentary to Scott&#8217;s posting.</p>
<h2>Traffic</h2>
<p>The traffic model, with its indirect revenue generation by driving traffic to existing content and services, is well understood. The same model has been used to encourage organizations to open up Web APIs, so it&#8217;s natural to consider this for Linked Data also.</p>
<p>Because it is tried and tested it&#8217;s currently one of the strongest arguments for driving adoption of Linked Data, so I&#8217;d put this right at the top of the list. The feedback loop that is in place now with search engines makes that traffic generation a reality.</p>
<h2>Advertising</h2>
<p>Scott mentions adverts as a possible revenue stream and raises the possibility of &#8220;data-layer ads&#8221;, by which I understand him to mean advertising included in the Linked Data itself. While I agree that an advertising model is a potential revenue stream, I don&#8217;t see that &#8220;data-layer ads&#8221; are really viable or actually useful in practice.</p>
<p>Adverts incorporated into raw data will be too easily stripped out or ignored by applications; by definition the adverts will be easily identifiable. RSS advertising doesn&#8217;t seem to have really taken off (I certainly never see them anyway) and I think this is for similar reasons: if the adverts are easily identifiable, then they can be stripped. And if they&#8217;re included in content or data values, then this causes problems for further machine-processing of the data and annoyances for end users. </p>
<p>Of course a business could enforce that users of its Linked Data should display ads through its terms and conditions, e.g. requiring data-layer ads to be displayed in some form to users of an application. In practice this can get problematic, especially if there&#8217;s not an obvious way to surface the ads to end users. But I think it&#8217;s also problematic because, unlike a Web API where I sign up to gain access, an arbitrary Linked Data site requires no prior agreement. My crawler or browser might fetch data without any knowledge of what those terms and conditions might be.</p>
<p>Adverts embedded into data are not a useful way to distribute them to end-users. In an environment where adverts are increasingly profiled by a range of geographic, demographic or behavioural factors, incorporating blanket ads into data feeds loses all of that targeting capability. It also potentially loses the feedback, e.g. on click-throughs or impressions, that is useful for gauging the success of a campaign.</p>
<p>In my view advertising as a model to support Linked Data publishing is more likely to echo that used by the Guardian as part of its <a href="http://www.guardian.co.uk/open-platform/terms-and-conditions">Open Platform terms and conditions</a> (See Section 8, Advertising and Commercial Usage). The terms require users of the content to display ads from the Guardian&#8217;s advertising network on their own websites. This avoids the need to include adverts in the data layer and supports a conventional model for delivering ads, making it play well with current advertising platforms and targeting options.</p>
<h2>Subscriptions</h2>
<p>As Brinker notes, subscription models for data, content and services have been around for some time. The interesting thing is to see how these models have been evolving of late due to pressures in various industries, and how these intersect with the open data movement. For Linked Data to be most useful some of it needs to be free: you need to make at least a bare minimum of data freely available, e.g. to identify objects of interest, to enable <a href="http://www.ldodds.com/blog/2009/12/annotated-data/">annotation</a> and linking, etc. In my opinion a freemium model is the core of any subscription model for Linked Data.</p>
<p>Having previously worked in the academic publishing industry which is very heavily driven by subscription revenues, I&#8217;ve noticed a number of models that have come to the fore there, most recently driven by the Open Access movement. I think many of these are transferable to other contexts. So while the particulars will vary in different industries, the means of slicing up data into subscription packages are likely to be repeatable.</p>
<p>All of the following assume that some basic element of the Linked Data is free, but that one is paying for:</p>
<ul>
<li>Full Access &#8212; Pay for access to detailed, denser data. The value-added data might include richer links to other datasets, more content, etc</li>
<li>Timely Access &#8212; Pay for access to the most recent, or more current version of the data. This leaves the bulk of the data open but delivers a commercial advantage to subscribers. As data gets older, it automatically becomes free</li>
<li>Archival Access &#8212; Putting archives of content, or large archival datasets on-line can be expensive in terms of data conversion, digitization, and service provision. So deep archives of data might only be available to subscribers. Commercial advantage derives from having more data to analyse and explore.</li>
<li>Block Access &#8212; Pay for access to a dataset based on time, e.g. &#8220;for the next 24 hours&#8221;; or based on the number or frequency of accesses; or the number of concurrent accesses.</li>
<li>Convenient Access &#8212; Pay for access to the data through a specific mechanism. This might seem at odds with Linked Data, but it&#8217;s reasonable to assume that some organizations might want data feeds or dumps rather than on-line only access. This might come at a premium.</li>
</ul>
<p>These variants can be combined and might also be separated out into personal (non-commercial) and commercial subscription packages.</p>
<p>It&#8217;s interesting to see how some of these (Timely Access, Convenient Access) are <a href="http://blog.okfn.org/2009/11/27/featured-project-musicbrainz/">already in use in projects like MusicBrainz</a> that blend Open Data with commercial models.</p>
<h2>Sponsorship</h2>
<p>One model that Scott Brinker doesn&#8217;t mention in his posting is Sponsorship. An organization might be funded to publish Linked Data, e.g. for the public good. The organization itself might be a charity and funded by donations. </p>
<p>It&#8217;s arguable that this might be more about cost recovery for service provision rather than a true business model, but I think it&#8217;s worth considering. Some of the open government data publishing efforts, and possibly even the Linked Data from the BBC, could be seen as falling into this category.</p>
<p>It&#8217;s probably most viable for public sector, cultural heritage and similar organizations.</p>
<h2>Closing Thoughts</h2>
<p>What needs to happen to explore these different models? Is it just a matter of individual organizations experimenting to see what works and what doesn&#8217;t? </p>
<p>I think that is largely the case, and we&#8217;ll definitely be seeing that process begin to happen in earnest in 2010; a process that we&#8217;ll be supporting and enabling with the <a href="http://www.talis.com/platform">Talis Platform</a>.</p>
<p>From a technical perspective I&#8217;m interested to see how well protocols like OAuth and FOAF+SSL can be deployed to mediate access to licensed Linked Data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2010/01/thoughts-on-linked-data-business-models/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Thoughts on Enterprise Linked Data</title>
		<link>http://www.ldodds.com/blog/2009/12/thoughts-on-enterprise-linked-data/</link>
		<comments>http://www.ldodds.com/blog/2009/12/thoughts-on-enterprise-linked-data/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 17:31:32 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[linkeddata]]></category>

		<guid isPermaLink="false">http://www.ldodds.com/blog/?p=434</guid>
		<description><![CDATA[There have been a number of discussions about &#8220;Enterprise Linked Data&#8221; recently, and I took part on a panel on precisely that topic at ESTC 2009. Unfortunately the panel was cut short due to time pressures so I didn&#8217;t get chance to say everything I&#8217;d hoped. In lieu of that debate here&#8217;s a blog post [...]]]></description>
			<content:encoded><![CDATA[<p>There have been a number of discussions about &#8220;Enterprise Linked Data&#8221; recently, and I took part in a panel on precisely that topic at <a href="http://www.estc2009.com/">ESTC 2009</a>. Unfortunately the panel was cut short due to time pressures so I didn&#8217;t get the chance to say everything I&#8217;d hoped. In lieu of that debate here&#8217;s a blog post containing a few thoughts on the subject.</p>
<p>When we refer to enterprise use of Linked Data, there are a number of different facets to that discussion which are worth highlighting. In my opinion the issues and justifications relating to each of them are quite different. So different in fact that we&#8217;re in danger of having a confused debate unless we tease out these different aspects.</p>
<h2>Aspects of the Debate</h2>
<p>In my view there are three facets to the discussion:</p>
<ul>
<li><em>Publishing</em> Linked Data, the key question here being: What does an Enterprise have to gain by publishing Linked Data?</li>
<li><em>Consuming</em> Linked Data: What does an Enterprise have to gain from consuming Linked Data?</li>
<li><em>Adopting</em> Linked Data: What benefits can an Enterprise gain by deploying Linked Data technologies internally?</li>
</ul>
<p>I think these facets, whilst obviously closely related, are largely orthogonal. For example I could see a scenario in which an organization consumed Linked Data but didn&#8217;t store or use it as RDF, but just fed it into existing applications. Similarly businesses could clearly adopt Linked Data as a technology without publishing any data to the web, or consuming any data from it, at all.</p>
<p>These issues are also largely orthogonal to the Open Data discussion: an enterprise might use, consume and publish Linked Data but this might not be completely open for others to reuse. The data may only be available behind the firewall, amongst authorised business partners, or only available to licensed third-parties. So, while the issue as to whether to publish open data is a very important aspect of the discussion, it&#8217;s not a defining one.</p>
<p>Here are a few thoughts on each of these different facets.</p>
<h2>Publishing Linked Data</h2>
<p>So why might an enterprise publish Linked Data? And if that is a worthwhile goal, then is it clear how to achieve it? Let&#8217;s tackle the second question first as it&#8217;s the simpler.</p>
<p>There is an increasingly large amount of good advice available online, as well as tools and applications, to support the publishing of Linked Data. We&#8217;re making good strides towards the important transition of moving Linked Data out of the research area and into the hands of actual practitioners. The <a href="http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/">How to Publish Linked Data on the Web</a> tutorial is a great resource but to my mind Jeni Tennison&#8217;s <a href="http://www.jenitennison.com/blog/node/135">recent series on publishing Linked Data</a> is an excellent end-to-end guide full of great practical advice.</p>
<p>We can declare victory when someone writes the O&#8217;Reilly book on the subject and does for Linked Data what <a href="http://oreilly.com/catalog/9780596529260">RESTful Web Services</a> did for REST. (And the two would make great companion pieces).</p>
<p>But technology issues aside, what are the benefits to an organization in publishing Linked Data? There are several ways to approach answering that question but I think in most discussions Linked Data tends to get compared with Web APIs. The value of creating an API is now reasonably well understood, and many of the benefits that come from opening data through an API also apply to Linked Data. </p>
<p>However the argument that Linked Data married with a SPARQL endpoint is as easy for developers to use as a Web API is still a little weak at this stage. SPARQL can be off-putting for developers used to simpler, more tightly defined APIs. As a community we ought to consider it as a power tool and look for ways to make it easier to get started with. It&#8217;s also worth recognising that a search API is a useful addition to a SPARQL endpoint as part of a Linked Data deployment.</p>
<p>But publishing Linked Data can&#8217;t be directly compared to just creating an API, because it&#8217;s also largely <a href="http://www.bbc.co.uk/blogs/radiolabs/2009/01/how_we_make_websites.shtml">a pattern for web publishing in general</a>. It&#8217;s increasingly easy to instrument existing content management systems to expose RDF(a) and Linked Data. So rather than create a custom API, which will involve expensive development costs, particularly if it&#8217;s going to scale, it&#8217;s possible to simply expose Linked Data as part of an existing website.</p>
<p>By following the Linked Data pattern for web publishing, in particular the use of strong identifiers, an enterprise can end up with a single point of presence on the web for publishing all of its human and machine-readable data, resulting in <a href="http://priyankmohan.blogspot.com/2009/12/online-retail-how-best-buy-is-using.html">a website that is strongly Search Engine Optimised</a>. Search engines can better crawl and index well structured websites and are increasingly ingesting embedded RDFa to improve search results and rankings. That&#8217;s a strong incentive to publish Linked Data by itself.</p>
<p>Adopting Linked Data, particularly as part of a reorganization of an existing web presence, could deliver improved search engine rankings and exposure of content whilst saving on the costs of developing and running a custom API. The longer term benefits of being part of the growing web of data can be the icing on the cake.</p>
<h2>Consuming Linked Data</h2>
<p>Next we can consider why an enterprise might want to consume Linked Data. </p>
<p>To my knowledge organizations are currently only publishing Linked Open Data (albeit with <a href="http://www.flickr.com/photos/ldodds/4043803502/">some wide variations in licensing terms</a>), so we&#8217;ll skip for the present whether enterprises have an option of consuming non-open Linked Data, e.g. as part of a privately licensed dataset.</p>
<p>The LOD Cloud is still growing and provides a great resource of highly interlinked data. The main issues that face an organization consuming this data are ones of quantity (there&#8217;s still a lot more data that could be available); quality (how good is the data, and how well is it modelled); and trust (picking and choosing reliable sources). </p>
<p>To some extent these issues face any organization that begins relying on a third-party API or dataset. However at present a lot of the data in the LOD cloud is still from secondary sources. The same can&#8217;t be said for the majority of web APIs, which tend to be published by the original curators of the data.</p>
<p>These issues should resolve themselves over time as more primary sources join the LOD cloud. Because Linked Data is all based on the same data model, bulk loading and merging data from external sources is very simple. This gives enterprises the option of creating their own mirrors of LOD data sources which will provide some additional reassurances around stability and longevity.</p>
<p>Linked Data, with its reliance on strong identifiers, is much easier to navigate and process than other sources, even if you&#8217;re not storing the results of that processing as RDF. There&#8217;s also a much greater chance of serendipity, resulting in the discovery of new data sources and new data items. By contrast there is virtually no serendipity in a Web API, as each API needs to be explicitly integrated.</p>
<p>But this benefit is only going to become evident if we continue to put effort into helping (enterprise) developers understand how to consume Linked Data. How to do so, e.g. as part of existing frameworks or using new data integration patterns, is another area that needs more attention. The <a href="http://www.consuminglinkeddata.org/">Consuming Linked Data</a> tutorial at ISWC 2009 was a good step in that direction, although the message needs to be circulated more widely, outside of the core semantic web community.</p>
<p>In my opinion it will be easier for enterprises to consume Linked Data if they first begin to publish it. By publishing data they are putting their identifiers out into the wild. These identifiers become points for <a href="http://www.ldodds.com/blog/2009/12/annotated-data/">annotation and reuse</a> by the community, creating <a href="http://www.ldodds.com/blog/2009/11/linked-data-liminal-zones/">liminal zones</a> from which the enterprise can harvest and filter useful data. This is a benefit that I think is unique to Linked Data, as with a Web API the end results are typically mashups or widgets displaying in a third-party application; these are just new silos one step removed from the data publisher.</p>
<h2>Adopting Linked Data</h2>
<p>Finally, what value could be gained if an organization adopts Linked Data internally as a means to manage and integrate data behind the firewall?</p>
<p>The issues and potential benefits here are largely a mixture of the above, except that there are few or no issues with trust as all of the data comes from known sources. In a typical enterprise environment Linked Data as an integration technology will be compared to a wider range of systems ranging from integrated developer tools through to middleware systems. There&#8217;s a reason why SOAP-based systems are still well used in enterprise IT: most organizations aren&#8217;t (yet?) internally organized as if they were true microcosms of the web.</p>
<p>It&#8217;s interesting to see that Linked Data can potentially provide a means for solving many of the issues that <a href="http://en.wikipedia.org/wiki/Master_Data_Management">Master Data Management</a> is trying to address. Linked Data encourages strong identifiers; clean modelling; and linking to, rather than replicating data. These are core issues for data consolidation within the enterprise. Coupled with the ability to link out to data that is part of the LOD Cloud, or published by business partners, Linked Data has the potential to provide a unifying infrastructure for managing both internal and external data sources.</p>
<p>It&#8217;s worth noting however that semantic technologies in general, e.g. document analysis, entity extraction, reasoning and ontologies, seem to be much more widely deployed in enterprise systems than Linked Data. This is no doubt in large part because the advantages of those technologies may currently be much more easily articulated as they&#8217;re more easily packaged into a <em>product</em>.</p>
<h2>Summary</h2>
<p>In this post I wanted to tease out some of the questions that underpin the discussions about enterprise adoption of Linked Data. I&#8217;ve presented a few thoughts on those questions and I&#8217;d love to hear your opinions.</p>
<p>Along the way I&#8217;ve attempted to highlight some areas where we need to focus to help transition from a researcher-led to a practitioner-led community. More data, more documentation, and more tools are the key themes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ldodds.com/blog/2009/12/thoughts-on-enterprise-linked-data/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

