September 06, 2004

foaf-beans 0.1

I'm pleased to announce the first iteration of a Java API for FOAF based around the Jena semantic web toolkit.

The API, which I've dubbed "foaf-beans", is an attempt to provide a number of convenience classes that will allow Java developers to quickly get to grips with reading and writing FOAF data. With this in mind, the API provides a thin layer of abstraction which hides much of the RDF processing, instead presenting the user with simple factory classes that create FOAFGraph and FOAFWriter objects for reading and writing respectively. These objects generate and process simple Java Beans that should play nicely with other Java APIs and toolkits (particularly JSP, JSTL, etc.).

In this first version of the API, the FOAFGraph interface supports:

  • Loading FOAF data from disk or the network
  • Determining the primaryTopic of a FOAF document, either using an explicit property of a foaf:PersonalProfileDocument, or some guesswork
  • Listing all people mentioned in a document
  • Finding people with a specific property (identified by URI) or property and value
  • Working with general RDF graphs as well as foaf:PersonalProfileDocuments
  • Reading basic foaf:Person properties and foaf:knows relationships
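The post doesn't say what the primaryTopic "guesswork" involves. Below is a minimal sketch of one plausible heuristic, written against plain Java collections rather than Jena. It uses an explicit foaf:primaryTopic when the document has one. Otherwise it picks the single foaf:Person who is never the object of foaf:knows. The Triple record, the PrimaryTopicGuesser class and the heuristic itself are hypothetical illustrations, not the foaf-beans API.

```java
import java.util.*;

// Hypothetical stand-in for an RDF statement; foaf-beans itself works on Jena models.
record Triple(String s, String p, String o) {}

// Hypothetical helper illustrating one possible primaryTopic heuristic.
class PrimaryTopicGuesser {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String FOAF = "http://xmlns.com/foaf/0.1/";

    /** Returns the primary topic of docUri, or null if it can't be determined. */
    static String primaryTopic(String docUri, List<Triple> graph) {
        // 1. An explicit foaf:primaryTopic on the document always wins.
        for (Triple t : graph) {
            if (t.s().equals(docUri) && t.p().equals(FOAF + "primaryTopic")) {
                return t.o();
            }
        }
        // 2. Guesswork (assumed heuristic): collect every foaf:Person, then
        //    discard anyone who appears as the object of foaf:knows.
        Set<String> people = new LinkedHashSet<>();
        Set<String> known = new HashSet<>();
        for (Triple t : graph) {
            if (t.p().equals(RDF_TYPE) && t.o().equals(FOAF + "Person")) people.add(t.s());
            if (t.p().equals(FOAF + "knows")) known.add(t.o());
        }
        people.removeAll(known);
        // Only commit to a guess when exactly one candidate remains.
        return people.size() == 1 ? people.iterator().next() : null;
    }
}
```

In a typical profile, the author knows several people but is rarely listed as known by anyone in their own file. That is why the sketch treats the "unknown" person as the likely subject.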

The FOAFWriter interface supports:

  • Writing basic foaf:Person metadata, including foaf:knows relationships
  • Writing to files, or POSTing to a URI (this is shamefully untested, but "release early" and all that...)
  • Writing foaf:PersonalProfileDocument documents including admin:generatorAgent and other useful metadata

Developers familiar with Jena can directly make use of a number of utility classes that provide this functionality, so hopefully there's something for everyone.

The next release of the API will incorporate at least the following:

  • Smushing -- a major failing, but this is version 0.1
  • Dealing more sensibly with multiple properties of the same person (e.g. several 'blogs, emails, etc)
  • Better documentation and examples

It's also worth noting that the API doesn't do any inferencing or schema processing. If you dip into the Jena-specific classes then you can substitute a model containing schema information; however, I want to expose this more explicitly through the API. The basic classes and interfaces also include some hooks that will allow me to wire in other RDF toolkits as and when I get time to play with them (Redland is top of my list).

The majority of the code has been reasonably well tested, but I don't expect it to be bug free, so tread carefully. In fact, in time-honoured tradition, I expect there's at least one schoolboy error in there somewhere which'll have me scrabbling to post 0.1.1 sometime tomorrow. The code has mainly evolved from work I originally did on the Java version of the FOAF-a-Matic, and the included JUnit test cases should illustrate how the API works.

I'm happy to take suggestions, patches, bug reports, whatever. It's all public domain (although attribution would be nice), so do with it what you will.

Download foaf-beans-0.1-src.zip (7.5MB, includes Jena)

Download foaf-beans-0.1-nolibs-src.zip (~200KB, no Jena)

Posted by ldodds at 10:52 PM

Chaals on Schema Documentation

This is an interesting posting from Charles McCathieNevile to the rdf-interest group discussing how to correctly document an RDF Schema.

Basically:

  • Ensure that the terms are annotated with labels and comments
  • Flag annotations with their language code, and seek translations into other languages
  • Use SKOS or custom properties to embed actual examples in the schema
  • Publish schema and documentation at the namespace URI

Must remember to apply these to my own schema, and also see if we can get the FOAF schema similarly up to scratch.

Posted by ldodds at 02:26 PM

del.icio.us and foaf:interest

Using the foaf:interest property it's possible for me to describe my interests (musical, technical, etc) in my FOAF profile.

The term has been specified so that it has a range of foaf:Document, with the implication that the foaf:topic of that foaf:Document is what I'm interested in. Seem a bit convoluted? Maybe, but there are benefits...

Firstly, by using a URI rather than a simple literal value we have more flexibility. For example, we can provide further information about the URI, ranging from simple things like the document's title, topic, etc. through to describing its relationships to other documents on the web. We can also merge data using this URI, allowing us to link people together, e.g. "show me everyone with an interest in the semantic web".
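To make the merging benefit concrete, here is a minimal sketch in plain Java with no RDF toolkit. It assumes each profile has already been reduced to a person identifier plus a set of foaf:interest URIs. Inverting that map answers "show me everyone with an interest in X" by exact URI match. The InterestIndex class is hypothetical.

```java
import java.util.*;

// Hypothetical helper: groups people by the foaf:interest URIs they cite.
class InterestIndex {
    /**
     * Inverts person -> interest URIs into interest URI -> people.
     * Because the key is a URI rather than a free-text label, profiles from
     * different sources merge whenever they cite exactly the same document.
     */
    static Map<String, Set<String>> byInterest(Map<String, Set<String>> interestsByPerson) {
        Map<String, Set<String>> index = new TreeMap<>();
        interestsByPerson.forEach((person, interests) -> {
            for (String uri : interests) {
                index.computeIfAbsent(uri, k -> new TreeSet<>()).add(person);
            }
        });
        return index;
    }
}
```

The lookup is a plain string comparison. Two people who pick different URIs for the same topic won't be merged.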

The second benefit to using URIs is that there's no need to maintain a controlled vocabulary of terms from which the user might select their areas of interest. No centralization for us.

However there is a downside to this approach, and I've previously grumbled along these lines. To usefully merge data about interests, we have to hope that users will consistently choose the same URIs. And that seems unlikely, as not all areas of interest have an obvious URI.

If I'm interested in XML, RDF, or SVG there are some obvious places to link to: either the specifications of those formats, or perhaps the relevant W3C Activity pages. But for more general topics ("Gardening", "Cooking") it's hard to select a definitive URI.

Now we could just rely on users from a given community to crib from each other's profiles and copy-and-paste the URIs that their friends use to describe their interests. That's quite likely, and will work well enough, but still doesn't seem ideal.

Also, if I'm building a user interface to allow authoring of FOAF data, it's friendlier to be able to suggest which URIs people should use (actually it'd be friendlier to hide the fact that there's a URI there at all, but that's a separate issue). That's difficult to do, unless one prompts the user for a couple of keywords and runs a Google "I'm Feeling Lucky" search. One might also use the Google search URI itself, but as the meaning of terms drifts over time, this seems slightly fragile.

It dawned on me the other day that folksonomy might be the answer: i.e. rely on social classification to build us a vocabulary from which topics can be selected.

So I'm proposing to use del.icio.us as the source of foaf:interest URIs.
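Here is a minimal sketch of how an authoring tool might turn a user's keyword into a del.icio.us tag URI for foaf:interest. It assumes tag pages live at http://del.icio.us/tag/{tag}, and that lowercasing and removing spaces is an acceptable way to normalise a keyword into a tag. Both are assumptions to check against the live site. The DeliciousInterest class is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for suggesting del.icio.us-based foaf:interest URIs.
class DeliciousInterest {
    // Assumed URI pattern for del.icio.us tag pages.
    static final String TAG_BASE = "http://del.icio.us/tag/";

    /** Turns a keyword into a single tag: trimmed, lowercased, spaces removed (assumed rules). */
    static String toTag(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    /** The tag URI, percent-encoded so odd characters stay URI-safe. */
    static String tagUri(String keyword) {
        return TAG_BASE + URLEncoder.encode(toTag(keyword), StandardCharsets.UTF_8);
    }

    /** A foaf:interest element to place inside a foaf:Person (rdf/foaf prefixes assumed declared). */
    static String interestElement(String keyword) {
        return "<foaf:interest rdf:resource=\"" + tagUri(keyword) + "\"/>";
    }
}
```

Because URLEncoder only emits letters, digits, `.`, `-`, `*`, `_`, `+` and `%` escapes, the generated attribute value needs no further XML escaping.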

There are some other interesting angles to explore here. For example, I can use the del.icio.us API to get items that a user has classified under a given tag. Each tag URI also exposes an RSS feed that includes metadata about the articles, and some basic details about each user. And then there's Ben experimenting with using del.icio.us tags to categorize his blog postings.

Posted by ldodds at 02:13 PM

September 03, 2004

Public Collections of RDF

Bob DuCharme is looking for public collections of RDF.

He's compiled an initial list and is looking for further examples of data sets, ideally large ones.

Posted by ldodds at 09:51 AM

Bad Fall

Yesterday my father-in-law fell 20ft from the Severn Bridge. If you're in the South West you may have seen it reported on the news; there's some coverage, and even a photo story of the three-hour rescue, on the BBC website.

The good news is that he's basically OK, although he had to have surgery last night to deal with two broken legs and a collapsed lung. He's now stable, and doesn't have any other serious internal injuries.

It's a weird experience seeing something that concerns your family on the news like this. In fact my mother-in-law rang him after hearing a report about an accident, so it was through the local news that we first found out. We still don't have all the details about how the accident happened, but it seems likely that there will be an inquiry. For now we've been poring over the pictures trying to imagine how it happened. He was in the towers carrying out a lighting inspection when he fell; he works as an electrician carrying out maintenance of the bridge. He has no memory of the fall itself.

Posted by ldodds at 09:21 AM

September 02, 2004

Bayesian Agents

Classifier4J is a Java text classification library that includes a text summariser and a Bayesian classifier. It was my interest in the latter that led me to play with the API recently, as I wanted to demonstrate to some colleagues the ease with which one can use Bayesian classification to create a content filter/recommender. Well, it's easy if all the hard work is done for you in a library!

The Classifier4J API is very easy to use, and you can plug a Bayesian classifier into an application with very few lines of code.

One of the things that intrigued me about the API design was that it separates out the Classifier from the storage of the words and their probabilities. The API comes with a simple in-memory implementation and a JDBC Words Data Source which stores the data in a database table.
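The separation between classifier and storage can be sketched as follows. This is not Classifier4J's API, whose data source interface has its own names and signatures. WordStore, InMemoryWordStore and SimpleBayesianClassifier are all hypothetical. Per-word probabilities are combined with the naive Bayes formula popularised for spam filtering, p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn)). Classifier4J's own calculation may differ.

```java
import java.util.*;

// Hypothetical storage interface: the classifier depends only on this.
interface WordStore {
    void addMatch(String word);
    void addNonMatch(String word);
    int matchCount(String word);
    int nonMatchCount(String word);
}

// In-memory implementation; a JDBC- or RDF-backed store could replace it.
class InMemoryWordStore implements WordStore {
    private final Map<String, int[]> counts = new HashMap<>(); // word -> {matches, nonMatches}

    private int[] entry(String word) { return counts.computeIfAbsent(word, k -> new int[2]); }
    public void addMatch(String word) { entry(word)[0]++; }
    public void addNonMatch(String word) { entry(word)[1]++; }
    public int matchCount(String word) { return counts.getOrDefault(word, new int[2])[0]; }
    public int nonMatchCount(String word) { return counts.getOrDefault(word, new int[2])[1]; }
}

// The classifier never touches storage directly, so stores can be swapped freely.
class SimpleBayesianClassifier {
    private final WordStore store;

    SimpleBayesianClassifier(WordStore store) { this.store = store; }

    private static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    void teachMatch(String text) { for (String w : tokens(text)) store.addMatch(w); }
    void teachNonMatch(String text) { for (String w : tokens(text)) store.addNonMatch(w); }

    /** Probability that text matches; 0.5 when no word has been seen before. */
    double classify(String text) {
        double logMatch = 0, logNonMatch = 0;
        for (String w : tokens(text)) {
            int m = store.matchCount(w), n = store.nonMatchCount(w);
            if (m + n == 0) continue;                    // ignore unseen words
            double p = (double) m / (m + n);
            p = Math.min(0.99, Math.max(0.01, p));       // clamp to avoid certainty
            logMatch += Math.log(p);
            logNonMatch += Math.log(1 - p);
        }
        // prod(p) / (prod(p) + prod(1 - p)), computed in log space to avoid underflow
        return 1.0 / (1.0 + Math.exp(logNonMatch - logMatch));
    }
}
```

An RDF-backed store would only need to implement the four WordStore methods. The classifier code stays unchanged.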

It occurred to me that it'd be an interesting experiment to create an implementation of the data source interface that stored the data as RDF.

Why RDF? Because then we'd be able to share and aggregate the results of training classifiers.

For example, I could export and share a classifier trained to spot spam, semantic web topics, or any number of other categories. The classifiers could be imported into desktop applications (e.g. Thunderbird) as well as web applications. For instance, I might train a classifier to spot articles that I'm interested in, then upload that configuration into a content management system and have it mine that data for material I may be interested in; hence "Bayesian agents".

By tying my exported Bayesian probabilities to my FOAF file, an aggregator could merge my data with that of others known to share similar interests. Trust is another factor that might determine whether my data is shared.

Anyone have any comments on this? Is anyone doing anything similar already? (They must be...)

I'll try and hack something up when I get a few minutes.

For the RDF I was thinking of something like the following:


<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdfs:Class rdf:ID="WordProbability"/>

<rdf:Property rdf:ID="classifier">
  <rdfs:domain rdf:resource="#WordProbability"/>
  <rdfs:range rdf:resource="http://xmlns.com/foaf/0.1/Agent"/>
</rdf:Property>

<rdf:Property rdf:ID="word">
  <rdfs:domain rdf:resource="#WordProbability"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

<!-- classifier4j uses strings for categories, but URIs seem better -->
<rdf:Property rdf:ID="category">
  <rdfs:domain rdf:resource="#WordProbability"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
</rdf:Property>

<!-- need to type these two... -->
<rdf:Property rdf:ID="matchCount">
  <rdfs:domain rdf:resource="#WordProbability"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

<rdf:Property rdf:ID="nonMatchCount">
  <rdfs:domain rdf:resource="#WordProbability"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

</rdf:RDF>
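To show what data would actually be exchanged, here is a minimal sketch that serialises training counts as instances of the draft schema above. It also shows one way an aggregator might merge two people's counts, by summing them per word. The schema declares no namespace, so http://example.org/bayes# is a placeholder, as are the agent and category URIs used when calling it. The WordProbabilityWriter class is hypothetical.

```java
import java.util.*;

// Hypothetical serialiser for the draft WordProbability schema.
class WordProbabilityWriter {
    // Placeholder namespace: the draft schema doesn't declare one.
    static final String NS = "http://example.org/bayes#";

    /** Minimal XML escaping for element text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /**
     * Writes one WordProbability per word. counts maps word -> {matchCount, nonMatchCount};
     * agentUri identifies the foaf:Agent that trained the classifier.
     */
    static String toRdfXml(String agentUri, String categoryUri, Map<String, int[]> counts) {
        StringBuilder sb = new StringBuilder()
            .append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n")
            .append("         xmlns:b=\"").append(NS).append("\">\n");
        for (Map.Entry<String, int[]> e : new TreeMap<>(counts).entrySet()) {
            sb.append("  <b:WordProbability>\n")
              .append("    <b:classifier rdf:resource=\"").append(escape(agentUri)).append("\"/>\n")
              .append("    <b:category rdf:resource=\"").append(escape(categoryUri)).append("\"/>\n")
              .append("    <b:word>").append(escape(e.getKey())).append("</b:word>\n")
              .append("    <b:matchCount>").append(e.getValue()[0]).append("</b:matchCount>\n")
              .append("    <b:nonMatchCount>").append(e.getValue()[1]).append("</b:nonMatchCount>\n")
              .append("  </b:WordProbability>\n");
        }
        return sb.append("</rdf:RDF>\n").toString();
    }

    /** One possible aggregation rule (an assumption): sum the counts per word. */
    static Map<String, int[]> merge(Map<String, int[]> a, Map<String, int[]> b) {
        Map<String, int[]> out = new TreeMap<>();
        for (Map<String, int[]> m : List.of(a, b)) {
            m.forEach((word, c) -> {
                int[] total = out.computeIfAbsent(word, k -> new int[2]);
                total[0] += c[0];
                total[1] += c[1];
            });
        }
        return out;
    }
}
```

Summing raw counts rather than averaging probabilities keeps the merge consistent with how the counts were gathered. It also means heavily trained classifiers carry more weight.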

Posted by ldodds at 01:36 PM

URLinfo

I've been playing with a neat tool called URLinfo. It's a simple form and customizable bookmarklet that allows you to reflect on a given URL to discover all sorts of interesting information, ranging from related links and validators to del.icio.us bookmarks and blog backlinks. You can even carry out some basic textual analysis on the page.

The tool does this by delegating the actual hard work to a number of other existing services. So even if you don't find URLinfo useful in itself, it provides a nicely categorized list of other useful web tools.

Which makes me wonder: which of the other services have XML/RSS/RDF export options, and how easy would it be to aggregate the output to create higher level services?

For example URLinfo links to nine different blog aggregator/search engines that provide a "backlinks from this URL" feature. Would be nice to have a single view across all those services, but for now URLinfo is a nice start.
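Here is a rough sketch of the "single view" idea. It assumes each service's backlinks have already been fetched into a list of absolute URLs; how to query each service is out of scope. The code merges the lists and normalises URLs so trivial differences don't create duplicates. It then counts how many services reported each link. The BacklinkAggregator class and its normalisation rules (lowercase scheme and host, drop the fragment, treat an empty path as "/") are assumptions.

```java
import java.net.URI;
import java.util.*;

// Hypothetical helper merging backlink lists from several services.
class BacklinkAggregator {
    /** Normalises an absolute URL so the same page from two services compares equal. */
    static String normalise(String url) {
        URI u = URI.create(url.trim());
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        String port = u.getPort() == -1 ? "" : ":" + u.getPort();
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        return scheme + "://" + host + port + path + query;   // fragment deliberately dropped
    }

    /** Merges service -> backlinks into URL -> number of services reporting it. */
    static Map<String, Integer> merge(Map<String, List<String>> backlinksByService) {
        Map<String, Integer> seen = new LinkedHashMap<>();
        backlinksByService.forEach((service, links) -> {
            Set<String> perService = new HashSet<>();   // count each service once per URL
            for (String link : links) {
                String key = normalise(link);
                if (perService.add(key)) seen.merge(key, 1, Integer::sum);
            }
        });
        return seen;
    }
}
```

Sorting the result by count would put the links that most services agree on at the top of the combined view.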

The only service I can see missing is FOAF Explorer. I've mailed in a suggestion to incorporate this and other FOAF tools.

Posted by ldodds at 10:24 AM

September 01, 2004

Working In A Small World

Stumbled over these musings on how small world theory applies to company organization. They've been languishing in my personal wiki for many months, so I thought I might as well post them as is.

Whilst reading the first few chapters of "Small World" by Mark Buchanan, I was fascinated by the work of Granovetter (see "The Strength of Weak Ties"). This basically highlights the fact that it is the weak ties between individuals that are the important ones in a social network, not the strong ties as one would expect. People with strong ties in common often have strong ties between them too, so these links are less important than weak ties (acquaintances): their removal has little effect on the structure of the graph (as measured by the number of degrees between points). Previous descriptions I've read of small world phenomena have focussed on hubs/authorities, which is a much less human-centric metaphor; quite rightly perhaps, as "small worldism" isn't tied to any particular type of graph, but it's not very evocative.
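Granovetter's point can be illustrated with a toy graph. Take two teams whose members all know each other (strong ties), joined by a single acquaintance (a weak tie). The sketch below measures "degrees of separation" as the sum of shortest-path lengths over all pairs, found by breadth-first search. The graph and the WeakTies class are invented for illustration. This is not the method used in the book.

```java
import java.util.*;

// Hypothetical toy model: how much does removing one tie change the graph?
class WeakTies {
    /** Two fully connected teams, {0..3} and {4..7}, joined by one weak tie: 3-4. */
    static List<int[]> twoTeams() {
        List<int[]> edges = new ArrayList<>();
        for (int base : new int[] {0, 4})
            for (int i = base; i < base + 4; i++)
                for (int j = i + 1; j < base + 4; j++)
                    edges.add(new int[] {i, j});          // strong ties within a team
        edges.add(new int[] {3, 4});                       // the acquaintance bridging the teams
        return edges;
    }

    /** A copy of edges without the undirected edge a-b. */
    static List<int[]> without(List<int[]> edges, int a, int b) {
        List<int[]> out = new ArrayList<>();
        for (int[] e : edges)
            if (!((e[0] == a && e[1] == b) || (e[0] == b && e[1] == a))) out.add(e);
        return out;
    }

    /** Adjacency sets for an undirected graph. */
    static Map<Integer, Set<Integer>> adjacency(List<int[]> edges) {
        Map<Integer, Set<Integer>> g = new HashMap<>();
        for (int[] e : edges) {
            g.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            g.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return g;
    }

    /** Breadth-first search: hop counts from start; unreachable nodes are absent. */
    static Map<Integer, Integer> distances(Map<Integer, Set<Integer>> g, int start) {
        Map<Integer, Integer> dist = new HashMap<>(Map.of(start, 0));
        Deque<Integer> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int next : g.getOrDefault(node, Set.of())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(node) + 1);
                    queue.add(next);
                }
            }
        }
        return dist;
    }

    /** Sum of shortest-path lengths over all pairs of nodes 0..n-1, or -1 if disconnected. */
    static int totalSeparation(int n, List<int[]> edges) {
        Map<Integer, Set<Integer>> g = adjacency(edges);
        int total = 0;
        for (int a = 0; a < n; a++) {
            Map<Integer, Integer> d = distances(g, a);
            for (int b = a + 1; b < n; b++) {
                if (!d.containsKey(b)) return -1;          // the graph has fallen apart
                total += d.get(b);
            }
        }
        return total;
    }
}
```

With this example graph, the total separation is 52. Removing the strong tie between nodes 0 and 1 raises it only to 53, because those two teammates are still two steps apart via a shared colleague. Removing the weak tie between 3 and 4 splits the graph in two, and totalSeparation returns -1.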

This led me to think about relationships within companies. Exploiting social networks to find work, etc. seems well explored; indeed it's behind the current drive for many of the social networking sites and applications that are springing up at the moment. Work relationships seem like a different framework within which to explore the small world phenomenon. Or at least it's the one that occurred to me whilst washing up after dinner.

So some thoughts on this:

  • encouraging small world social graphs in a company is beneficial to the flow of information. This is "small world for spreading memes" in the microcosm. However it can also be detrimental as this similar social structure encourages spreading of other kinds of information: gossip and rumours. We might take from this that even if morale is low in a company, at least the lines of communication are still open
  • that networking is important is not really news to anyone, but small world studies prove it's effective, and support all those fluffy corporate events.
  • that the optimum corporate structure isn't hierarchical, nor is it completely decentralised; it's somewhere in between. Networks that lie in the middle of these two should gain the most benefits (stability, but still good sharing of knowledge)
  • that inter-team communication is as important as bonding in a team, and that the manager alone shouldn't be the gateway between the team and the rest of the company. If the manager leaves, or is on leave, then you've lost the all-important weak links, and while the team may stay cohesive it can become isolated.
  • that as an individual, your role in a company can be secured by networking with others. However, the downside is that as you quickly become a "hub" (people come to you for information because you can find it quicker, or know who to refer them to), the many communication channels can distract you from your key role, and also lift you away from the work that you're interested in. A company would do well to acknowledge its hubs, but immediately route around them so that they have backups and those weak links don't cripple the company during leave or job changes.
  • it's the small odd little tasks that people do, those that have them interact with a slightly wider social circle, that can be the most important: they build weak links between teams. It's too easy to re-organize and draw lines, saying "this isn't something that Team X should do", but then you're further isolating Team X.

Posted by ldodds at 01:08 PM

PhotoBlogFeeds Scutterplan

Danny was looking for an easy way to generate a scutterplan from his PhotoBlogFeeds page on the ESW wiki (see Danny's posting for background).

I pointed him at the FOAFBulletinBoard which provides a quick and dirty way to create scutterplans collaboratively. Anyone can add links to a Wiki page and all you need is a bit of Tidy+XSLT+URL Chaining to convert that page into RDF.

I've created an XSLT script that converts the relevant page into RDF. It relies on the class='external' attributes that the ESW wiki adds to external links to find the links that become rdfs:seeAlso references.

The transform also sniffs for w3photo URLs and automatically pipes these through Danny's w3photo.xsl, so that RDF processors will only see an RSS 1.0 view of the data rather than the RSS 2.0 that the site pumps out by default.
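The XSLT itself isn't reproduced in the post, but the same pipeline can be sketched in Java. The sketch parses the tidied wiki page and keeps links whose class includes "external". It routes w3photo URLs through a stylesheet via URL chaining, then emits rdfs:seeAlso statements.

The sketch rests on several assumptions:

- the page has already been through Tidy, so it is well-formed XHTML with numeric character references;
- W3C's XSLT service at http://www.w3.org/2000/06/webdata/xslt accepts xslfile and xmlfile parameters;
- any URL containing "w3photo" should be chained.

The stylesheet URL is a placeholder and the ScutterPlan class is hypothetical.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical Java equivalent of the wiki-page-to-scutterplan transform.
class ScutterPlan {
    // Assumed XSLT service; the stylesheet location is a placeholder.
    static final String XSLT_SERVICE = "http://www.w3.org/2000/06/webdata/xslt";
    static final String W3PHOTO_XSL = "http://example.org/w3photo.xsl";

    /** Hrefs of links whose class attribute includes "external". */
    static List<String> externalLinks(String xhtml) {
        try {
            DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Don't fetch the XHTML DTD over the network.
            b.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            NodeList anchors = b.parse(new InputSource(new StringReader(xhtml)))
                                .getElementsByTagName("a");
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                if (Arrays.asList(a.getAttribute("class").split("\\s+")).contains("external")) {
                    hrefs.add(a.getAttribute("href"));
                }
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalArgumentException("page is not well-formed XHTML", e);
        }
    }

    /** URL chaining: w3photo feeds are routed through the stylesheet. */
    static String chain(String url) {
        if (!url.contains("w3photo")) return url;
        return XSLT_SERVICE + "?xslfile=" + URLEncoder.encode(W3PHOTO_XSL, StandardCharsets.UTF_8)
             + "&xmlfile=" + URLEncoder.encode(url, StandardCharsets.UTF_8);
    }

    /** A scutterplan: one rdfs:seeAlso per link, hung off the plan document itself. */
    static String toRdf(List<String> links) {
        StringBuilder sb = new StringBuilder()
            .append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n")
            .append("         xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\">\n")
            .append("  <rdf:Description rdf:about=\"\">\n");
        for (String link : links) {
            String href = chain(link).replace("&", "&amp;").replace("\"", "&quot;");
            sb.append("    <rdfs:seeAlso rdf:resource=\"").append(href).append("\"/>\n");
        }
        return sb.append("  </rdf:Description>\n</rdf:RDF>\n").toString();
    }
}
```

Keeping the chaining in a URL means the plan stays a static list of links. The stylesheet only runs when a scutter actually fetches the feed.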

View this link to see the automatically generated scutterplan.

I love URL chaining.

Posted by ldodds at 12:51 PM