Came across DinnerBuzz after reading about it on You're It: Yummier and Yummier.
OK, cool, this is somewhere I can collect my restaurant recommendations/reviews, perhaps more usefully than just tagged as Restaurant on del.icio.us.
So I go to add a review of the Orient Cafe in Oxford, and what do I get:
You've submitted a place that we've never heard of! We're going to try to figure out why, and will publish your post soon.
I should add this one to my other recommendations: if you're building a social content application that includes use of geographical data, then make sure that you're aware of geography outside of the United States! At least 43 Places does (link via the pants people).
So another couple of silos in which I can place my microcontent.
Wonder how long until we reach the tipping point where it's more cost effective and easier to build new social content sites, similar to these efforts, from data that's already published, wild on the web, than to grow a community from scratch.
Personally I don't think we're actually that far away.
Just came across this via Flickr ("Shiny New Toy"): Yahoo Search My Web 2.0 Beta. See also the obligatory product blog and developer APIs.
Combines social networking and search to limit your search results based on trust metrics. I've not been able to get very far into the site yet to try it out. At first pass it looks like it's relying on your cached (you can save pages from the web) and tagged pages to seed the recommendations to others. It also looks like your social network is limited to others with Yahoo accounts, which is a limited view of my social network.
I'd probably say that "Act III", to borrow Mayfield's term, would be letting go of the ownership of the social network. Just let me import or point to contacts, various tagging systems, etc. Use the FOAF.
Some thoughts on the Simple List Extensions Specification. I've been waiting a few days as I wanted to get a feel for what problems are being addressed by the new module; it's not clear from the specification itself. Dare Obasanjo has summarised the issues, so I now feel better armed to comment.
My first bits of feedback on the specification are mainly editorial: include some rationale, include some examples, and include a contact address for the author/editor so feedback can be directly contributed. There's at least one typo: where do I send comments?
The rest of the comments come from two perspectives: as an XML developer and as an RSS 1.0/RDF developer. I'll concentrate on the XML side as others have already made most of the RDF related points.
First up, I think the namespace URI should resolve to the specification.
The intent of the cf:treatAs element is unclear. The specification merely says that a consuming application should "treat the content of the feed as if it represents a complete, ordered list of content from the server". One presumes that the intent is that this element is a switch: if it's present, then the application should be prepared to apply some specific processing rules. But it's unclear.
Dare Obasanjo explains that:
To solve the first problem Microsoft has provided the cf:treatAs element with the value "list" to be used as a signal to aggregators that whenever the feed is updated that the previous contents should be dumped or archived and replaced by the new contents of the list.
But the behaviour he describes -- dumping, archiving, or replacing the contents of the list -- is not in the specification. That's a big hole in my view.
The background section in the specification also alludes to different processing models for certain types of feed, but again these are not properly described:
...a feed that contains the entire collection of items on the server should be processed differently from a feed that contains only the most recently added or updated items.
My suggestion is to remove the need for the cf:treatAs element to have content: make it an empty element. Unless there are going to be future revisions of the specification that allow for other types of treatment, in which case this ought to be described.
The rest of the specification has several aims:
Back in the day this is the kind of data that would go in a schema and not an instance document. This avoids redundancy in terms of repeated definition of the same data, and removes verbosity from feeds.
In other words it avoids millions of RSS feeds all having declarations that dc:created is a date and can be sorted. Weren't we all recently worried about the internet grinding to its knees under the volume of RSS traffic? Or did that problem get solved already?
RSS aggregators really ought to rely on schema, processed at runtime or baked into the application, to guide these decisions.
The specification currently allows feed authors to associate three different data types with elements: date, text, number.
XML Schema provides a way for instance elements to be labelled with types: the xsi:type attribute. I'm not immersed enough in XML Schema to seriously suggest it as an alternative -- there are issues, apparently -- but it did spring to mind immediately when I read the specification.
The full breadth of XML Schema Datatypes is surely overkill for RSS feeds, but one would assume that tying descriptions to a formal type system would be a good thing, making it easier to define sort orders, legal lexical values, etc.
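To make the point about types concrete, here is a minimal Python sketch of type-driven sorting. It assumes feed items have already been parsed into plain dicts, and the mapping from the three type names (date, text, number) to Python parsers is my own assumption rather than anything the specification defines.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping from the specification's three data types to
# Python parsers; the parser choices are assumptions, not part of the spec.
PARSERS = {
    "date": datetime.fromisoformat,  # assumes ISO 8601 lexical values
    "number": Decimal,
    "text": str.casefold,            # case-insensitive text ordering
}


def sort_items(items, field, data_type, descending=False):
    """Sort feed items (plain dicts) on one field, using its declared type."""
    parse = PARSERS[data_type]
    return sorted(items, key=lambda item: parse(item[field]), reverse=descending)


items = [
    {"title": "beta", "created": "2005-06-02T10:00:00", "rank": "10"},
    {"title": "Alpha", "created": "2005-06-01T09:30:00", "rank": "9"},
]
by_rank_as_number = sort_items(items, "rank", "number")
by_rank_as_text = sort_items(items, "rank", "text")
```

Sorting "rank" as a number puts 9 before 10, while sorting the same values as text puts "10" first, which is why the declared type matters more than a separate "sortable" flag.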
I think there are also some internationalisation issues lurking here.
The specification also leaves it unclear which of the sorting options specified in the RSS feed has already been applied when the feed was produced. One must assume it's one of the options, otherwise aggregators may have to add "original order" as an option as well as the feed-specified sort options.
The following may result from reading too much Walter Perry on XML-DEV (time well spent though!). Perry has always advocated the position -- and I'm paraphrasing many a dense posting here, so apologies if I'm misrepresenting his views -- that ultimately it's the consumer who decides how data is processed, not the producer. In short, the consumer may have very different ideas about how the data will be used, and all the producer can provide are suggestions, or cues, on how the data could be processed.
In the context of the Simple List Extensions specification one has to wonder why the "sort by any element" and "completely replace all items when reloading this feed" features aren't already provided for in RSS aggregators. There's no need for a simple list specification at all.
The two issues that Dare describes can be implemented by adding additional options on the client, without requiring changes to feed content.
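As a rough illustration of the client-side option, here is a minimal Python sketch of an aggregator reload step with a per-subscription "treat as list" switch. The function name, the flag and the use of a "link" key to identify items are all assumptions for the sake of the example.

```python
def reload_feed(stored, fetched, treat_as_list):
    """Return the items to keep after a feed reload.

    stored and fetched are lists of dicts that each have a "link" key.
    treat_as_list is a user preference set in the aggregator, not
    something read from the feed.
    """
    if treat_as_list:
        # List mode: the fetched document is the whole list, so the
        # previous contents are simply replaced.
        return list(fetched)
    # Default mode: keep the old items and append any unseen ones.
    seen = {item["link"] for item in stored}
    return stored + [item for item in fetched if item["link"] not in seen]


old = [{"link": "http://example.org/a"}, {"link": "http://example.org/b"}]
new = [{"link": "http://example.org/b"}, {"link": "http://example.org/c"}]
as_list = reload_feed(old, new, treat_as_list=True)
as_stream = reload_feed(old, new, treat_as_list=False)
```

Because the switch lives in the client, it works for any feed, whether or not the publisher has added extension elements.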
To express this differently, once an aggregator developer has extended her application to include sorting, grouping and other options, are these going to be limited solely to those feeds with a cf:treatAs element, or supported, albeit in limited form, across all feeds?
What's needed, IMO, are better controls over how aggregators manage and process lists on my behalf.
The "Background" section opens with this statement:
A feed is a collection of items
This is true. But, to be precise, an RSS 1.0 feed is a specific type of collection: a list. So there's some misunderstanding here already. Sure, many RSS 1.0 consumers are almost certainly ignoring the rdf:Seq in the syntax, but it's still there and there's a defined meaning. Even if you're not using an RDF parser. An RSS 1.0 feed is an ordered list of items. The ordering criteria are unspecified, but it's a list nevertheless.
Just pretend that there's an "RDF List Extensions" module with two elements: rdf:Seq and rdf:li and go at it.
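For what it's worth, reading the list order out of an RSS 1.0 feed doesn't need an RDF parser. The following is a minimal Python sketch using only the standard library; the sample feed and its example.org URLs are made up, and the function accepts both rdf:resource and the unqualified resource attribute on rdf:li, since both turn up in the wild.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# A made-up feed: the rdf:Seq lists item 2 before item 1.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Example</title>
    <link>http://example.org/</link>
    <description>Example feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/2"/>
        <rdf:li rdf:resource="http://example.org/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/1"><title>One</title><link>http://example.org/1</link></item>
  <item rdf:about="http://example.org/2"><title>Two</title><link>http://example.org/2</link></item>
</rdf:RDF>"""


def ordered_item_urls(feed_xml):
    """Return item URLs in the order given by the channel's rdf:Seq."""
    root = ET.fromstring(feed_xml)
    seq = root.find(f"{{{RSS}}}channel/{{{RSS}}}items/{{{RDF}}}Seq")
    if seq is None:
        return []
    return [
        li.get(f"{{{RDF}}}resource") or li.get("resource")
        for li in seq.findall(f"{{{RDF}}}li")
    ]


urls = ordered_item_urls(FEED)
```

Here the list order comes from the rdf:Seq, not from the order in which the item elements happen to appear in the document.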
The other points to make relate to how the RDF syntax and model makes it easier to add the kind of annotation that the Simple List Extension specification is trying to make:
rdf:Seq
rdf:datatype
rdf:label
Only the first is intrusive in the syntax, the last two being attributes. Once you've assigned a type, there's really no need to declare whether a value is sortable or groupable: everything should be ripe for sorting and grouping.
In short, I think that while the Simple List Extensions module is generally trying to solve some real problems, I don't believe that this necessarily requires a new RSS module, just new aggregator functionality, perhaps supplemented by schema annotation.
However innovation in the RSS world seems inextricably linked to creation of new modules, so perhaps this is inevitable.
The specification itself needs work: despite being very simple, it has several grey areas that need clarifying.
Today's lunchtime special involves fun with the Jena 2 rule engine.
I've been wondering for a while whether it'd be possible to extract richer metadata from tagging conventions. Of course it's possible, I'm just playing with different ways to achieve it. Quick XSLT conversions are my normal method of choice, but I wanted to have a play with RDF rules, and this seemed like an opportune time.
Actually what triggered this was something that Damian Steer said at XTech (Damian, apologies if I'm misquoting you, or misattributing the idea): "Rules are like XSLT for RDF". It's a loose analogy of course, as rule engines don't typically have the power of XSLT which is a complete language. Although Jena is extensible.
So I decided to see how far I could go. This RSS 1.0 feed is being produced by del.icio.us. I've bookmarked some friends' blogs, tagging them with "Me/Friends". I've also entered the name of the author in the description field.
Can I turn this into an RDF document that contains the following data:
foaf:weblog
It turns out I can...
I don't have time to write this up fully, so for now, here are the rules:
@prefix rss: <http://purl.org/rss/1.0/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
# Every RSS item is also a FOAF document
[rssItemsAreFoafDocuments:
    (?C rdf:type rss:item)
    ->
    (?C rdf:type foaf:Document) ]

# Copy the RSS title across as a Dublin Core title
[preferDCTitles:
    (?C rdf:type rss:item), (?C rss:title ?title)
    ->
    (?C dc:title ?title)]

# The description field holds the author's name: mint a person for them
[createPersonFromDescription:
    (?C rss:description ?author),
    (?C dc:creator ?myNick),
    makeTemp(?person)
    ->
    (?person rdf:type foaf:Person),
    (?C foaf:maker ?person),
    (?person foaf:weblog ?C),
    (?person foaf:name ?author)]

# The channel is mine: mint a person carrying my nick
[thisChannelIsMe:
    (?C rdf:type rss:channel),
    makeTemp(?person)
    ->
    (?person rdf:type foaf:Person),
    (?person foaf:maker ?C),
    (?person foaf:nick 'ldodds')]

# I know the maker of every item I've bookmarked
[iKnowMyFriends:
    (?C rdf:type rss:item),
    (?C foaf:maker ?friend),
    (?C dc:creator ?creator),
    (?me foaf:nick ?creator)
    ->
    (?me foaf:knows ?friend)]
You can test these out using the jena.RuleMap command-line tool that's bundled with Jena 2.2. Pass it a parameter of "-ol RDF/XML" to get RDF/XML output and a "-d" to only see the inferred triples and you end up with this:
<rdf:RDF
xmlns="http://purl.org/rss/1.0/"
xmlns:admin="http://webns.net/mvcb/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
xmlns:dc="http://purl.org/dc/elements/1.1/" >
<rdf:Description rdf:nodeID="A0">
<foaf:knows rdf:nodeID="A1"/>
<foaf:nick>ldodds</foaf:nick>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<foaf:knows rdf:nodeID="A2"/>
<foaf:knows rdf:nodeID="A3"/>
<foaf:maker rdf:resource="http://del.icio.us/ldodds/Me/Friends"/>
<foaf:knows rdf:nodeID="A4"/>
<foaf:knows rdf:nodeID="A5"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A1">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<foaf:name>Geoff Bilder</foaf:name>
<foaf:weblog rdf:resource="http://breakawayrepublic.com/blog/"/>
</rdf:Description>
<rdf:Description rdf:about="http://journal.dajobe.org/journal/">
<dc:title>Dave Beckett - Journalblog</dc:title>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
<foaf:maker rdf:nodeID="A2"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A5">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<foaf:weblog rdf:resource="http://www.hackdiary.com/"/>
<foaf:name>Matt Biddulph</foaf:name>
</rdf:Description>
<rdf:Description rdf:about="http://planb.nicecupoftea.org/">
<dc:title>Plan B</dc:title>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
<foaf:maker rdf:nodeID="A3"/>
</rdf:Description>
<rdf:Description rdf:about="http://breakawayrepublic.com/blog/">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
<foaf:maker rdf:nodeID="A1"/>
<dc:title>Louche Cannon</dc:title>
</rdf:Description>
<rdf:Description rdf:nodeID="A2">
<foaf:name>Dave Beckett</foaf:name>
<foaf:weblog rdf:resource="http://journal.dajobe.org/journal/"/>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A3">
<foaf:weblog rdf:resource="http://planb.nicecupoftea.org/"/>
<foaf:name>Libby Miller</foaf:name>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
</rdf:Description>
<rdf:Description rdf:about="http://usefulinc.com/edd/blog">
<foaf:maker rdf:nodeID="A4"/>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
<dc:title>Edd Dumbill's Weblog: Behind the Times</dc:title>
</rdf:Description>
<rdf:Description rdf:about="http://www.hackdiary.com/">
<dc:title>hackdiary</dc:title>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
<foaf:maker rdf:nodeID="A5"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A4">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<foaf:weblog rdf:resource="http://usefulinc.com/edd/blog"/>
<foaf:name>Edd Dumbill</foaf:name>
</rdf:Description>
</rdf:RDF>
Which is pretty much exactly what I wanted.
I could achieve exactly the same with XSLT, but the rules are much more succinct. Obviously this example is a bit contrived as I'm unlikely to extract my social network from an RSS feed; del.icio.us only keeps the most recent entries anyway. But as a means to create and use microcontent, e.g. book reviews, events metadata, etc, it seems pretty useful.
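For readers without Jena to hand, here is a rough Python sketch of what three of the rules do, run once over a tiny hand-written set of triples. It is not how Jena evaluates rules: there is no forward chaining to a fixed point, the prefixed names are just strings, and the "_:personN" identifiers are an invented stand-in for makeTemp.

```python
from itertools import count


def infer(triples):
    """Apply three of the rules once and return only the new triples."""
    triples = set(triples)
    new = set()
    blank_ids = count()

    for s, p, o in sorted(triples):
        if p == "rdf:type" and o == "rss:item":
            # rssItemsAreFoafDocuments
            new.add((s, "rdf:type", "foaf:Document"))
            # preferDCTitles
            for s2, p2, title in triples:
                if s2 == s and p2 == "rss:title":
                    new.add((s, "dc:title", title))
        # createPersonFromDescription: needs a description and a creator
        if p == "rss:description" and any(
            s2 == s and p2 == "dc:creator" for s2, p2, _ in triples
        ):
            person = f"_:person{next(blank_ids)}"  # stands in for makeTemp
            new |= {
                (person, "rdf:type", "foaf:Person"),
                (s, "foaf:maker", person),
                (person, "foaf:weblog", s),
                (person, "foaf:name", o),
            }
    return new


feed = {
    ("http://www.hackdiary.com/", "rdf:type", "rss:item"),
    ("http://www.hackdiary.com/", "rss:title", "hackdiary"),
    ("http://www.hackdiary.com/", "rss:description", "Matt Biddulph"),
    ("http://www.hackdiary.com/", "dc:creator", "ldodds"),
}
inferred = infer(feed)
```

The rule file says the same thing in far fewer lines, which is rather the point.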
Phil Wilson is also thinking about how to subscribe to someone's brain.
Phil has hacked his aggregator to attempt to discover as many RSS feeds as possible starting from autodiscovery of someone's FOAF description. Nice work. It's also closer to the original lazyweb request as it's applying some rules to try and find additional data not necessarily linked from the original FOAF description.
Phil writes that:
What's really needed is a quick and simple form for people to either create a new FOAF file with their online account details in it, or which will accept a FOAF URL as well, and merge the details in...
I'd add this to the FOAF-a-Matic but have been wary about extending it due to the need to chase down all the different translations. So for now, here's the result of today's lunchtime hack: FOAF Online Account Description Generator. Which is a long name for a very hacky JSP page and worse HTML form.
Anyway, the idea is that you fill in the location of your FOAF file, sha1sum (for smushing) and a bunch of usernames. Click the button and it generates simple metadata connecting you to each service, including links to RSS channels where these can be easily guessed from the username. Unfortunately this isn't possible for all services, especially Upcoming.org and Flickr that use some internal id in the links.
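The "guess the feed from the username" step might look something like the following Python sketch. It is not the JSP behind the form: the function name is invented, and it assumes that del.icio.us feeds live at http://del.icio.us/rss/USERNAME and that a bare rss:channel inside rdfs:seeAlso is enough metadata.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr


def delicious_see_also(username):
    """Build an rdfs:seeAlso fragment for a del.icio.us username.

    Assumes the feed URL can be derived from the username alone, which
    is not true of every service.
    """
    feed_url = "http://del.icio.us/rss/" + quote(username, safe="")
    return (
        "<rdfs:seeAlso>\n"
        f"  <rss:channel rdf:about={quoteattr(feed_url)}>\n"
        f"    <dc:title>{escape('del.icio.us/' + username)}</dc:title>\n"
        "  </rss:channel>\n"
        "</rdfs:seeAlso>"
    )


fragment = delicious_see_also("ldodds")
```

Services that use an internal id in their feed links can't be handled this way without an extra lookup.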
I'll leave you with an example URL.
You can also pipe these URLs through yesterday's hack to generate an OPML file for importing.
Over Thai food last week, Geoff and I were chatting about subscribing to RSS feeds for all of a person's outputs. Not just blogging but bookmarking, listening and other activities.
It's a topic that's seen some previous discussion. I've written about my life in RDF, Jo Walsh has discussed externalising absolutely everything and Morten Frederiksen maintains his own personal planet feed which is similar to John Resig's Life as RSS plans. Of course there's also MeNow which aims for a more real-time view of someone's activities. Feedburner's ability to splice RSS feeds into a single synthetic feed operates in a similar area.
Geoff had a nice phrase for this: Subscribing to someone's brain. As he notes in his blog entry, the problem is in discovering someone's output then getting to the point where it's easy to add to an RSS aggregator.
I've taken an initial stab at implementing this by writing a little JSP page that, given a FOAF URL and an mbox_sha1sum, produces an OPML document listing all the RSS channels seeAlso'd from that document. This OPML feed can then be imported directly into an aggregator.
The URL is this: http://www.ldodds.com/micro-util/brain-subscribe.jsp?foaf=URL&mbox_sha1sum=SHA1
and here's a live example.
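If you'd rather build the link in code than by hand, here is a small Python sketch. The service URL and parameter names are the ones given above; the helper names and the example.org values are made up. The mbox_sha1sum is computed in the usual FOAF way, as the SHA1 of the full mailto: URI.

```python
import hashlib
from urllib.parse import urlencode

SERVICE = "http://www.ldodds.com/micro-util/brain-subscribe.jsp"


def mbox_sha1sum(email):
    """SHA1 of the mailto: URI, as used for foaf:mbox_sha1sum."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def brain_subscribe_url(foaf_url, sha1):
    """Build the service URL, percent-encoding the FOAF URL parameter."""
    return SERVICE + "?" + urlencode({"foaf": foaf_url, "mbox_sha1sum": sha1})


sha1 = mbox_sha1sum("someone@example.org")
url = brain_subscribe_url("http://example.org/foaf.rdf", sha1)
```

The encoding matters because the FOAF URL is itself a URL, so any query string it carries would otherwise be mistaken for parameters to the service.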
Here's a quick and dirty form that'll help you along. This was just a lunchtime hack, so it's still very rough. An autodiscovery bookmarklet would be nice also.
To add data to your FOAF document to enable people to use this service, you need to add sections like the following within your foaf:Person description:
<rdfs:seeAlso>
<rss:channel rdf:about="http://del.icio.us/rss/ldodds">
<dc:title>del.icio.us/ldodds</dc:title>
<dc:description>del.icio.us bookmarks as
an RSS 1.0 news feed</dc:description>
</rss:channel>
</rdfs:seeAlso>
See my FOAF file for a number of examples.
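For the curious, the FOAF-to-OPML step can be sketched in a few lines of Python. This is not the JSP itself: it treats the FOAF document as plain XML instead of RDF, ignores the mbox_sha1sum filtering, and the sample document and OPML title are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

FOAF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rss="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <rdfs:seeAlso>
      <rss:channel rdf:about="http://del.icio.us/rss/ldodds">
        <dc:title>del.icio.us/ldodds</dc:title>
      </rss:channel>
    </rdfs:seeAlso>
  </foaf:Person>
</rdf:RDF>"""


def foaf_to_opml(foaf_xml):
    """List every rss:channel found inside an rdfs:seeAlso as an OPML outline."""
    root = ET.fromstring(foaf_xml)
    opml = ET.Element("opml", version="1.1")
    ET.SubElement(ET.SubElement(opml, "head"), "title").text = "Subscriptions"
    body = ET.SubElement(opml, "body")
    for channel in root.iterfind(".//rdfs:seeAlso/rss:channel", NS):
        url = channel.get(f"{{{NS['rdf']}}}about")
        title = channel.findtext("dc:title", default=url, namespaces=NS)
        ET.SubElement(body, "outline", type="rss", text=title, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


opml_text = foaf_to_opml(FOAF)
outlines = ET.fromstring(opml_text).findall("body/outline")
```

A real implementation would want an RDF parser, since the same triples can be serialised in ways this XML path won't match.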
I was very pleased to see this post pop up on PlanetRDF: Stress test your triple store. Ten million triples from the Swoogle cache ready for download.
As it happens I'm trying to get sign off at the moment to release part of our data set for research purposes. Not confident of how far I'm going to get as there are a number of different parties that would have to agree, but I have my fingers crossed.
Katie and Priya have been doing some sterling work; a 200M triple data set ain't that easy to work with. So far we've found that Jena on Postgres has proved to be the most stable. We've had problems with both Kowari and Sesame. In some cases we've been able to resolve them. Query performance times on that size of data set are (not surprisingly) really slow, but accessing resources directly (i.e. by URI) is just fine. We'll produce a more structured report as soon as we can.
It strikes me that the search/text retrieval community benefited from having large test collections, and I think the RDF community needs something similar. It's not hard to generate synthetic triples, but they're no substitute for real data sets for comparison purposes. Seeing the Swoogle data released is great news.
I have a confession to make: Revenge of the Sith is the first Star Wars film I've seen at the cinema. And I'm 33.
That's probably enough to get me ostracised from geek society. I was old enough to see at least Jedi when it came out. And surely I should have been frothing at the mouth to see the recent films? I can't put my finger on why I never bothered though.
For the earlier films it's easy: as a family we were never great cinema goers, but we did enjoy gathering to watch the family movies on TV at Christmas. So Star Wars for me brings back memories of sitting with the rest of the extended family in a darkened room, post Xmas slap-up dinner. It was never really connected with the cinema experience.
For the later films I think my expectations were just low. And Lucas managed to fall short of even those. Episode I was pants, but Episode II was better. Simply because it was slightly darker and, frankly, more mature.
So you can see why I was grinning all over my fizzog after watching The Sith last week. And why my wife asked me, the next morning, "Are you going to do that all day?" in response to my unconsciously humming the Imperial March whilst giving the kids their breakfast.
"Dark" doesn't quite cover it.
Sith repaints Vader as quite a different character. IMO, in the later films Vader was more of a comic book villain: evil, but in an implied, even reserved kind of way. OK, so he offed a few people, but that's par for the course for your average villain, let alone a Dark Lord of the Sith. In contrast, the Vader in Episode III is a nasty piece of work. He lives up to his potential.
I'm definitely going to have to go see it again. Not for the effects, as I found them to be standard Star Wars fare. The segue into the stylings of the later films was very nicely handled though. No, amazingly, I'm going to see it for the story. Really.
Here's another final confession that will definitely seal my fate: Revenge of the Sith is better than The Empire Strikes Back. It's. The. Best. Star. Wars. Ever. Evah I say.
Mind you, I still think Lucas should have given Dave Prowse a pop at wearing the Vader suit for old times' sake.
And did you know that Vader has a blog now? Subscribed
In an attempt to put my various projects into the Public Domain, it seems that I've caused some confusion.
All I want to do is the following: label my code as being in the Public Domain, but require that people at least acknowledge the fact that they're using something I wrote. I'd prefer it if people didn't take anything I wrote and make a quick buck out of it, but I'm not averse to my code being bundled in a payware application. But that's a nice-to-have; I basically just want to give stuff away.
This led me to start adding Creative Commons licences to my work. The Attribution-ShareAlike licence seemed to exactly cover my requirements. Previously I'd either not included a licence, or labelled it as "Public Domain". But I'd seen some code I wrote used verbatim with someone else's name on it and that naturally upset me. I won't go into details about who or where, but how hard is it to add an @author tag to Java source (or better yet, leave the one that's already in there)?
So anyway, the CC licence seemed to fit. However when the Jaikoz developers contacted me a few months ago about reusing my MusicBrainz API they weren't sure whether they could, as their application is payware. I said they could.
This week Henning Koch emailed me under similar confusion: would his application have to be similarly licensed? I didn't think so, and that certainly wasn't my intention. Koch pointed me at the CC FAQ entry that I'd stupidly overlooked:
CC licenses are not written for software. They should not be used for software...
But which one of the many licences should I use? Why does it have to be so difficult to give stuff away? I know creating new open source licences is discouraged but to be frank, it's not that easy to pick and choose, and I'm not sure I want to wade through endless legal documents: I want to give stuff away, but be acknowledged. That's it.
Why does open sourcing software have to be so difficult? It seems to me that the Creative Commons folk could help clean up this mess. They're wading into scientific research, so why not software?
A nice example of how broken software licensing is, is this summary of the Creative Commons licences from a Debian perspective. Conclusion: they're not free. Debian has a reputation for being particularly prescriptive, but this seems a little barking.
I guess the answer is either RTFL (Read the F'ing Licence), or just switch back to a plain "This work is in the Public Domain" statement.
Back in February I posted some sample SPARQL queries that might be useful as additional examples of the SPARQL syntax. Since then we've had several new drafts including some syntax changes. In this post I'm including updated versions of all queries (except for one, that is; see later discussion). I've also thrown in a few more for good measure, and some notes on other things that I can't find a way to do (so that's where you can help, dear reader...).
Oh, and despite my initial grumblings I've not found the tweaked syntax too troublesome.
List all of the names, weblogs and encrypted emails of people defined in http://www.ldodds.com/ldodds-knows.rdf
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?weblog ?sha1
FROM <http://www.ldodds.com/ldodds-knows.rdf>
WHERE
{
?x foaf:name ?name.
?x foaf:weblog ?weblog.
?x foaf:mbox_sha1sum ?sha1
}
That doesn't list everyone in the document though, as not all people in my "knows" description have weblogs. We can amend the query to become the equivalent of an SQL OUTER JOIN to retrieve everyone with a name, encrypted email, but optionally a foaf:weblog property:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?weblog ?sha1
FROM <http://www.ldodds.com/ldodds-knows.rdf>
WHERE
{
?x foaf:name ?name.
OPTIONAL {?x foaf:weblog ?weblog.}
?x foaf:mbox_sha1sum ?sha1.
}
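If the OUTER JOIN comparison isn't obvious, this little Python sketch mimics what the OPTIONAL block does, using a list of tuples as a stand-in for the graph. The data and the prefixed-name strings are invented, and real SPARQL engines don't evaluate queries this way.

```python
def names_weblogs_sha1s(triples):
    """Mimic the OPTIONAL query: weblog is None when a person has none."""

    def values(subject, predicate):
        return [o for s, p, o in triples if s == subject and p == predicate]

    rows = []
    for person in sorted({s for s, p, o in triples if p == "foaf:name"}):
        for name in values(person, "foaf:name"):
            for sha1 in values(person, "foaf:mbox_sha1sum"):
                # The "or [None]" is the outer join: keep the row even
                # when there is no matching weblog triple.
                for weblog in values(person, "foaf:weblog") or [None]:
                    rows.append((name, weblog, sha1))
    return rows


graph = [
    ("_:a", "foaf:name", "Alice"),
    ("_:a", "foaf:mbox_sha1sum", "aaa"),
    ("_:a", "foaf:weblog", "http://example.org/alice/"),
    ("_:b", "foaf:name", "Bob"),
    ("_:b", "foaf:mbox_sha1sum", "bbb"),
]
rows = names_weblogs_sha1s(graph)
```

Without the fallback to [None], Bob would drop out of the results, which is exactly what the first version of the query does.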
List the titles and publication dates of documents written by someone with the name "Leigh Dodds"
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?title ?date
FROM <http://www.ldodds.com/ldodds-documents.rdf>
WHERE
{
?x dc:title ?title.
?x dc:created ?date.
?x foaf:maker ?me.
?me foaf:name "Leigh Dodds".
}
List the state and city for all Australian airports
PREFIX air: <http://www.daml.ri.cmu.edu/ont/AirportCodes.daml#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?state ?city
FROM <http://www.daml.ri.cmu.edu/ont/AirportCodes.daml>
WHERE
{
?x air:country "Australia".
?x air:city ?city.
?x air:state ?state.
}
Find all the European princesses and the date they got married
PREFIX ged: <http://www.daml.org/2001/01/gedcom/gedcom#>
SELECT ?name ?marriedOn
FROM <http://www.daml.org/2001/01/gedcom/royal92.daml>
WHERE
{
?royal ged:title "Princess".
?royal ged:name ?name.
?royal ged:spouseIn ?family.
?family ged:marriage ?marriage.
?marriage ged:date ?marriedOn.
}
List the names, symbols, atomic weights and numbers of all the Noble Gases
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?weight ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{ ?element table:group ?group.
?group table:name "Noble gas".
?element table:name ?name.
?element table:symbol ?symbol.
?element table:atomicWeight ?weight.
?element table:atomicNumber ?number. }
New: Now that SPARQL has an ORDER BY clause, we can improve this query to return the Noble Gases in order of their atomic number:
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?weight ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{ ?element table:group ?group.
?group table:name "Noble gas".
?element table:name ?name.
?element table:symbol ?symbol.
?element table:atomicWeight ?weight.
?element table:atomicNumber ?number. }
ORDER BY ASC[?number]
The query that I've not been able to rewrite is the following:
List the name, mbox of all contributors to the 2003 Dublin Core conference, along with the title of the paper they (co-)authored
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?name ?mbox ?title
FROM <http://www.siderean.com/dc2003/dc2003_presentations.rdf>
FROM <http://www.siderean.com/dc2003/dc2003_agents.rdf>
WHERE
{?person foaf:publication ?doc.
?doc dc:title ?title.
?person foaf:mbox ?mbox.
?person foaf:name ?name.}
Notice that this one queries two sources, relying on the RDF graphs to be automatically merged. There's currently an open issue in the RDF DAWG that relates to this kind of "union query": see fromUnionQuery. Earlier SPARQL specifications allowed multiple FROM and FROM NAMED clauses, while the current draft only allows a single FROM clause, but multiple FROM NAMED clauses. It may be possible to recast that query using FROM NAMED but I can't get my head round it yet.
While it's likely that in most cases SPARQL queries are going to be run against a pre-defined data set, the ability to do ad hoc aggregation of sources like this is very useful IMO. So I'm hoping that the issue will be resolved in favour of allowing union queries.
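To show why the merge matters, here is a small Python sketch where neither source can answer the query alone but their union can. The triples are invented stand-ins for the two conference files, and the merge is a plain set union, which is only safe here because the example uses URIs and no blank nodes.

```python
def union(*graphs):
    """Merge graphs by set union of their triples (URIs only, no blank nodes)."""
    return set().union(*graphs)


def contributors(graph):
    """Name, mbox and paper title for everyone with a foaf:publication."""

    def values(subject, predicate):
        return sorted(o for s, p, o in graph if s == subject and p == predicate)

    rows = []
    for person, predicate, doc in sorted(graph):
        if predicate != "foaf:publication":
            continue
        for title in values(doc, "dc:title"):
            for mbox in values(person, "foaf:mbox"):
                for name in values(person, "foaf:name"):
                    rows.append((name, mbox, title))
    return rows


# Hypothetical data: one file describes people, the other describes papers.
agents = {
    ("http://example.org/people/1", "foaf:name", "A. Author"),
    ("http://example.org/people/1", "foaf:mbox", "mailto:author@example.org"),
    ("http://example.org/people/1", "foaf:publication", "http://example.org/papers/1"),
}
presentations = {
    ("http://example.org/papers/1", "dc:title", "An Example Paper"),
}
merged_rows = contributors(union(agents, presentations))
```

Run against either set on its own, contributors() returns nothing, because the join between person and paper title only exists in the merged graph.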
New: Here are some new queries that demonstrate a few other SPARQL features. For example the next one demonstrates the SPARQL FILTER clause, making use of the RDF index that I've added to my blog.
List all documents from http://www.ldodds.com/blog/blog-scutterplan.rdf in order of publication date, that have been published in 2005 and that mention SPARQL in the title.
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?title ?date
FROM <http://www.ldodds.com/blog/blog-scutterplan.rdf>
WHERE
{
?doc dc:title ?title.
?doc dc:created ?date.
FILTER ?date > xsd:dateTime("2005-01-01T00:00:00Z").
FILTER REGEX(?title, "sparql", "i").
}
ORDER BY ASC[?date]
List the name, symbol, weight and atomic number of all elements in the periodic table that are heavier than uranium
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
SELECT ?name ?symbol ?weight ?number
FROM <http://www.daml.org/2003/01/periodictable/PeriodicTable.owl>
WHERE
{
?uranium table:name "uranium".
?uranium table:atomicWeight ?uraniumWeight.
?element table:name ?name.
?element table:symbol ?symbol.
?element table:atomicWeight ?weight.
?element table:atomicNumber ?number.
FILTER ?weight > ?uraniumWeight.
}
ORDER BY ASC[?weight]
List all Egyptian pharaohs of the 13th Dynasty, in name order
PREFIX p: <http://www.historical-id.info/ns/person/1.0#>
SELECT ?name ?born
FROM <http://www.historical-id.info/files/egypt-pharaohs.rdf>
WHERE
{?pharaoh p:first-name ?name.
?pharaoh p:description ?desc.
?pharaoh p:birth-date ?born.
FILTER REGEX(?desc, "13th Dynasty.", "i").
}
ORDER BY ASC[?name]
Now, at this point I was fishing about in some of the other historical metadata I found linked from rdfdata.org and found this list of UK Kings. You'll notice that the descriptions include the dates of their reigns.
Ideally this would be separately marked up, but in theory a bit of string manipulation could extract it. This would be useful in a SELECT expression. And using a CONSTRUCT query one could scrape out a few more useful triples. However, I couldn't find a way to do this. The grammar doesn't seem to allow this and some brief experiments confirmed it. Hopefully I've missed something obvious.
Update: No I'm not missing anything. According to dajobe SPARQL doesn't have any assignment and so there's no way to assign values to variables. So the answer in this case is that if I want the dates of those reigns I'll have to fix up the data in some way. And so, in general, if you want to return a literal value from a SPARQL query, it needs to be explicit in the data source. It's either that, or you have to pre/post-process the data to extract what you want.
In the above example I could improve the data by publishing additional triples that can be used to annotate the original. But then I'd need a union query to merge them!
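The post-processing route is at least straightforward. As a minimal Python sketch, and assuming (without having checked every entry) that the descriptions contain the reign as a pair of years separated by a hyphen, something like this would pull the dates out of each description literal returned by a query:

```python
import re

# Assumed pattern: two 3 or 4 digit years separated by a hyphen.
REIGN = re.compile(r"(\d{3,4})\s*-\s*(\d{3,4})")


def reign_years(description):
    """Return (start, end) as integers, or None if no year range is found."""
    match = REIGN.search(description)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# A made-up description in the assumed format.
example = reign_years("William I, reigned 1066-1087")
```

The extracted values could then be published as extra triples alongside the original data.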
I've just uploaded the latest version of Twinkle: a SPARQL query tool. The project now has its own homepage and DOAP description.
Version 0.3 of Twinkle basically just brings the tool up to date with the latest SPARQL syntax by moving to ARQ 0.9.5.
I also added a few UI niceties such as icons, tooltips, etc. There's also an additional output option which formats the results into a table. This is in addition to the original output formats: text, SPARQL Query format, and Turtle.
There's also an examples directory included in the distribution with a few queries to get you started. These are basically just updated versions of my Sample Sparql Queries. I'll post them somewhere visible as soon as I get a chance.
Feedback very welcome. I've got a TODO list of possible features mapped out, but suggestions gratefully received.
Continuing yesterday's hack here's another stylesheet that converts RSS 1.0 annotated with geourl:latitude and geourl:longitude (e.g. geourl feeds) to the Google Maps format.
I added a form you can use to generate links similar to this one which shows bloggers in my area. Looks like I can shout out the window to contact some of them!
I've been itching to have a play with GoogleMaps for a while now, especially so once their coverage reached the UK.
So as an exercise I thought I'd try gluing together Norm Walsh's Where In The World service and the myGmaps.com proxy. The former provides simple location based data such as where a given user currently is, along with who and what is nearby. The latter provides a quick way to create standalone GoogleMaps from custom XML documents.
The result was a trivial XML stylesheet that converts from one vocabulary to another. Where there are links available, e.g. about locations or homepages, I've included them in the description associated with the map entry.
The FOAF mbox_sha1sum values returned by WITW are hyperlinked to FOAFnaut so you can jump directly from a geographical view of a person's location into their "FOAF space" and view their position within a social network.
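For reference, an mbox_sha1sum is just the SHA1 hash of a person's mailto: URI, which is why it can act as a key into other FOAF tools without revealing the address. A minimal sketch in Python; the address is a made-up example and the helper name is mine.

```python
import hashlib


def mbox_sha1sum(email_or_mailto):
    """Return the FOAF mbox_sha1sum: the hex SHA1 of the full mailto: URI."""
    uri = email_or_mailto
    if not uri.startswith("mailto:"):
        uri = "mailto:" + uri  # the hash covers the URI, not the bare address
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


# Hypothetical address, for illustration only.
example_hash = mbox_sha1sum("alice@example.org")
```

The resulting 40-character hex string is what gets matched against the values in other people's FOAF files.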
As WITW only includes basic data about nearby users (essentially just their username) I included the ability to "re-orient" a map around a given user. This basically means you'll switch to looking at their WITW data, their nearby locations, etc. Crude but basically effective.
If you want to try it out, use the form on this page: WITW using GoogleMaps. It redirects via a JSP page, but all that does is simply build a pipeline URL that connects together WITW -> W3C XSLT Service -> myGMaps.com Proxy. The proxy then fetches the resulting map and displays it in your browser. Not the most efficient way to achieve this, but another nice demonstration of simple HTTP GET based service integration.
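The URL building is the only fiddly part, because each stage takes the previous stage's URL as a query parameter and so it has to be encoded once per level of nesting. A minimal sketch of the idea in Python (not the JSP itself); every endpoint and parameter name below is a placeholder rather than the real service address.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoints and parameter names, for illustration only.
WITW_URL = "http://witw.example.org/nearby"
XSLT_SERVICE = "http://xslt.example.org/transform"
MAP_PROXY = "http://maps.example.org/proxy"
STYLESHEET = "http://example.org/witw-to-map.xsl"


def build_pipeline_url(username):
    """Nest three GET requests: location data -> XSLT transform -> map proxy."""
    # Stage 1: the location data for one user.
    source = WITW_URL + "?" + urlencode({"user": username})
    # Stage 2: ask the XSLT service to transform stage 1 with our stylesheet.
    transformed = XSLT_SERVICE + "?" + urlencode(
        {"xslfile": STYLESHEET, "xmlfile": source}
    )
    # Stage 3: hand the transformed document's URL to the map proxy.
    return MAP_PROXY + "?" + urlencode({"url": transformed})


pipeline_url = build_pipeline_url("alice")
```

Because urlencode escapes each inner URL, the innermost address ends up encoded twice in the final link, and each service decodes one layer when it fetches its input.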
The hack is obviously limited, in that WITW captures locations anywhere on the globe, whereas GoogleMaps has limited coverage (US and UK as far as I'm aware). If you're outside of that coverage you unfortunately just see a whole bunch of broken images instead of a map, although the locations and annotations still display. However with a visualisation available, I thought it might entice more people to sign up and try out WITW, add more locations, etc.
Let me know if you can think of any improvements.
Oh, and if you're looking at GoogleMaps hacking yourself, then this Gazetteer Protocol looks like it has potential. More information available here. I need to try and clarify the licensing of the service and data though.
And in a similar vein, it ought to be possible to transform the geourl RSS feeds, e.g. mine, which now include latitude and longitude to achieve a similar effect but showing all bloggers near your current location. Hmmm, might do that this evening, if someone doesn't get there first... Update: I did do it, see here
The developers of Jaikoz, a Java MP3 tag editor, mailed me yesterday to say that their latest release is now live on their site. I'm mentioning this because Jaikoz bundles my MusicBrainz API for doing metadata lookups using MusicBrainz.
Jaikoz is payware although there's a free trial available. I should note that I'm not getting any kickbacks from this: the API is CreativeCommons licenced so they're free to do what they want with it. They did check in with me first though, which was very friendly. I did suggest that they may want to consider donating money to MusicBrainz if they get enough sales.
I'm just pleased that they found it useful enough to include it in their application.
Just some notes on a few changes I've made to my blog setup.
Firstly I've added the Make Poverty History white band to the home page. Obviously not visible if you're reading this in an aggregator but it's a cause I do believe in, so why not go and click that link and find out how you can get involved?
Secondly, a while ago I altered my MT templates to spit out a scutterplan for my blog, which is now seeAlso'd from my FOAF description. You can see it here. Pretty basic to start with, just a few FOAF and DC terms.
Thirdly, and you'll have already noticed this, I changed my RSS 1.0 feed to include full postings. This was at user request, but was something I've been intending to do for a while. I much prefer being able to read full postings in my aggregator.
Lastly, I've been playing a bit with Feedburner. I was interested to see what facilities they offer for managing and tweaking RSS feeds. The interface is pretty slick and they're offering some statistics on feed usage. I guess they have some relationship with the guys at Bloglines because now that I have a Feedburner version of my feed, when I click my "subscribe with Bloglines" bookmarklet, it's that version, not the one linked from my blog, that's offered up by Bloglines. Presumably this has already caused endless debates elsewhere in the blogosphere. I don't think I'm very happy about it: if there's a "sanctioned" version of my RSS output I'll link to it, thanks very much.
Frankly though the Feedburner features, while pretty inclusive, are not enough for me. I'd like the ability to splice together any number of feeds, not just those from a couple of services. E.g. my Feedburner feed includes my flickr photos and del.icio.us links. But what about my recent listening feed from Audioscrobbler? After signing up I wrote some code to do this; it turned out to be a few lines using Informa. I'll publish that as an example/service when I get time.
I am interested to find out more about how FeedBurner are capturing statistics. IMO, monitoring RSS feed usage is the next big issue (if it's not already). More on that in another post.
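To give a feel for how little is involved in splicing, here's a minimal sketch in Python. It is not the Informa-based Java code mentioned above: the entries are invented, and it assumes each feed has already been parsed into dictionaries with a link and a date.

```python
from datetime import datetime

# Invented entries standing in for three already-parsed feeds.
photos = [
    {"title": "Photo: canal", "link": "http://example.org/photo/1",
     "date": datetime(2005, 6, 2, 9, 0)},
]
links = [
    {"title": "Link: pipelines", "link": "http://example.org/link/1",
     "date": datetime(2005, 6, 3, 12, 0)},
    {"title": "Link: old", "link": "http://example.org/link/0",
     "date": datetime(2005, 5, 30, 8, 0)},
]
listening = [
    {"title": "Track: example", "link": "http://example.org/track/1",
     "date": datetime(2005, 6, 2, 21, 30)},
]


def splice(*feeds, limit=None):
    """Merge any number of feeds, drop duplicate links, newest entry first."""
    seen = set()
    merged = []
    for feed in feeds:
        for entry in feed:
            if entry["link"] in seen:
                continue  # the same item can appear in more than one feed
            seen.add(entry["link"])
            merged.append(entry)
    merged.sort(key=lambda entry: entry["date"], reverse=True)
    return merged if limit is None else merged[:limit]


combined = splice(photos, links, listening)
```

Nothing here cares which service a feed came from, which is really the point: once the entries are parsed, any number of sources can be combined.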
If you're interested in web service descriptions, and in particular RESTful service descriptions you should get yourself over to public-web-http-desc, a new W3C mailing list dedicated to precisely that topic.
From his introduction, Philippe Le Hegaret described the list as being "...dedicated to discussion of Web description languages based on URI/IRI and HTTP, and aligned with the Web and REST Architecture. Unlike WSDL (Web Services Description Language), such languages are not targeted towards description of Web Services."
Le Hegaret's posting includes some introductory pointers that round up a lot of the recent proposals in this space, including those from Bray, Cowan, Baker, Orchard, etc. This thread contains some other useful background.
The initial topic of discussion is the scope of the problem at hand, specifically: are we discussing a description language for XML services or any web service, regardless of representation formats? My vote is with the more inclusive option.
Definitely a space to watch if you're interested in REST services.
Belatedly (I only got back from Amsterdam last Monday), here are some notes from XTech Day 3.
On the Friday morning I initially attended two talks about RDF frameworks, firstly Dave Beckett's Bootstrapping RDF applications with Redland and then David Wood's introduction to Kowari: A Platform for Semantic Web Storage and Analysis. I've not really used either of these toolkits yet, but at work we're looking at trying out Kowari as one of the candidate triple stores for holding our massive dataset. John Barstow's work on the port of Redland to Windows makes it more likely that I'll be trying out Dave's toolkit for some personal hacking projects too.
Later in the morning I went to Jeni Tennison's presentation on Managing Complex Document Generation through Pipelining. Tennison gave an excellent overview of the concept of pipelining, as well as the typical components and transformation operations that arise from a pipeline architecture.
I took the liberty of culling a number of relevant links from the paper and presentation and adding them to del.icio.us. I've long been a fan of pipelining so it was good to see the concept being aired more widely.
In fact work on pipelining is one item I'd like to see from The Future of XML at W3C, which was the theme of a panel discussion that followed Tennison's paper. The panel consisted of Liam Quinn, Norman Walsh and Robin Berjon. I've summarised some of the discussion in the XTech Wiki but was ultimately disappointed as the audience didn't engage with the speakers as much as I'd hoped. There was a greater community feel, and livelier debate, in the previous day's session on XHTML and the WHATWG. This was no fault of the panel, and may just have been symptomatic of the panel occurring on the last day. The positive side to this is that this was one of several public discussions that the W3C had taken part in; it's very good to see that they're listening to community feedback. My own request is straightforward: Pipelining, please!
The conference ended after lunch with Jean Paoli's keynote. I grew frustrated during this talk for several reasons. Firstly, it was mostly Microsoft-centric, to the extent that Paoli even showed a product case study video for the InfoPath features in Microsoft Office. His general theme was that there was a quiet revolution happening in offices around the world and millions more XML documents are being generated every day by people quietly going about their business. I quite agree, but the MS slant was a bit too heavy for my taste. Granted, Paoli was quick to recognise other platforms, e.g. as points of integration, but not surprisingly ignored developments with OpenDocument.
Secondly, Paoli raised the issue of the web service debate (i.e. SOAP+WSDL vs. REST, etc, etc). He admitted that he didn't get it, and just saw that SOAP+WSDL was, no question, an evolutionary forward step in web services. In fact he stressed to the audience that they were doing users a disservice by continuing the debate, and that we should instead just focus on selling them what they need, i.e. Web Services. This left me quietly fuming, so perhaps coloured my view of the entire talk.
I should stress that this isn't meant to be general MS-bashing. I do strongly agree with Paoli's point that the ability to generate, manipulate and process XML documents within desktop suites is a major revolution and a great step forwards. I just think it could have been better positioned, especially with respect to the Open Data theme which was a major topic of the whole conference.
I've uploaded the slides (Powerpoint) from my XTech 2005 talk: Connecting Social Content Services using FOAF, RDF and REST.
In the presentation I basically gave an overview of the paper, touching on some areas where I thought further work was needed and attempted to do a little RDF advocacy, but coming from a slightly different direction than normal.
Overall, my goal is to move the debate and discussion concerning REST web services beyond comparisons with the SOAP+WSDL stack and instead focus on best practice issues; ideally I want to help foster a "community of practice" around this area. A review of existing services seemed a good way to go about this, but what I didn't want to focus on (too much) was disregarding services because they're not fully fledged REST services: even "accidentally RESTful" services have characteristics that are worth reviewing.
On the RDF advocacy front I wanted to diverge from the usual semantic web sales pitch, and instead try to demonstrate how adopting RDF as a uniform data model is a natural extension of the REST uniform interface. My thought experiment was to consider several stages of "refactoring" a service and consider the benefits that RDF brings. In short I think the emphasis can shift from API mechanics towards a focus on the data. Whether I was successful or not remains to be seen. If you were at the talk, or just read through the paper and slides, I'd love to hear your comments, so please drop me an email.
One area I did touch on in the talk is the growing use of API keys in RESTful service interfaces. This concerns me a great deal as it greatly impedes ad hoc service integration by limiting the ability to freely publish links to data. The slides include a proposal on handling this better (i.e. using HTTP headers) which I will write more about at a later date.
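To make the general idea concrete, here is a minimal sketch in Python of what carrying the key in a header, rather than in the URL, might look like. It is only one possible reading of "using HTTP headers" and not the proposal from the slides; the header name, URL and key are all hypothetical.

```python
from urllib.request import Request

# Hypothetical header name; a real service would define its own.
API_KEY_HEADER = "X-Api-Key"


def keyed_request(url, api_key):
    """Build a GET request that carries the key in a header, not the URL."""
    return Request(url, headers={API_KEY_HEADER: api_key})


# The URL stays free of credentials, so it can be published and linked to.
req = keyed_request("http://api.example.org/photos/recent", "my-secret-key")
```

The link itself then identifies only the data, and anyone following it supplies their own key (or none) when they make the request.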