Java


28
Sep 05

Using Jena in an Application Server

I’ve been lurking on the jena-dev mailing list for a while now, and I’m constantly impressed with the level of patience displayed by the jena team at handling repeated questions and queries. This is despite the comprehensive documentation which covers all aspects of the toolkit.
Often these queries stray outside the realm of RDF and Jena into basic questions such as “how do I write a JSP or a web application”. Makes me wonder if there’s been a sudden increase in the number of undergraduate semantic web projects. Anyway one question I’ve seen quite often recently is “How do I use Jena within an Application Server?”
Here are some notes and pointers that may help answer that particular question. I don’t have time for a complete tutorial, but hopefully the following pointers will be sufficient to get you oriented.

The Database

I’ll assume that you’re going to be working with data held in a relational database. In Jena terminology this is known as a “persistent model”.
The Jena team have created a HOWTO on using persistent models. See that page for detailed database configuration options and pointers to database specific documentation.
You don’t have to worry about creating the relational database structure into which your RDF data will be stored. Jena will do that for you automatically once you create your first persistent model. This makes it very simple to get up and running.
The persistent model HOWTO contains example code that shows how to create and configure a persistent model.
However within an application server the code you’ll write is going to be slightly different: you’re going to need a connection pool.

Connection Pooling

All Java application servers allow you to configure a database connection pool. The specifics vary from server to server, so you’ll need to consult your server documentation to find out how to do that. Here’s the Tomcat 5.5 JDBC data source documentation. You should be able to find similar documentation for JBoss, WebLogic, et al.
Once correctly configured, a connection pool will allow you to do a JNDI lookup to obtain a DataSource from which you can create a Connection.
Creating a Jena Model is then simply a matter of wrapping that Connection in a DBConnection and opening the model. Here’s a code snippet which illustrates this:


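// Imports assume the Jena 2 package layout (com.hp.hpl.jena.*)
import java.sql.Connection;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import com.hp.hpl.jena.db.DBConnection;
import com.hp.hpl.jena.db.IDBConnection;
import com.hp.hpl.jena.db.ModelRDB;
import com.hp.hpl.jena.rdf.model.Model;

// Obtain our JNDI context
Context initialContext = new InitialContext();
Context env = (Context) initialContext.lookup("java:comp/env");
// Look up our data source
DataSource dataSource = (DataSource) env.lookup("jdbc/MyDataSource");
// Allocate and use a connection from the pool
Connection connection = dataSource.getConnection();
// Create a Jena IDBConnection
IDBConnection jenaConnection = new DBConnection(connection, "MySQL");
// Use open for an existing model, or createModel to create a new one
Model model = ModelRDB.open(jenaConnection);
// Do some useful work, then tidy up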

Business Logic

So far we’ve looked at creating connections and opening a Model to get access to the persistent data. What you do with the Model next is up to you: for example, you may navigate through it using the Jena API or query it using ARQ, the SPARQL query engine built upon Jena. More information on how to do that can be found in Phil McCarthy’s “Search RDF data with SPARQL” tutorial.
The context within which this code lives will depend on the overall architecture of your application.
If you’re just writing a simple Java web application that uses servlets and/or JSPs then you’ll want to structure your code so that the logic is in a servlet or utility code accessed from a JSP, ideally a tag library. This avoids mixing up your user interface code with your application logic. To ensure that your connection pool is available to your web application you’ll need to configure a resource reference in its web.xml.
However if you’re writing a full J2EE application that uses EJBs, then you’ll want to do all of your Jena manipulation from within a bean. As J2EE Container Managed Persistence is designed for relational databases and not triple stores, you’ll have to use Bean Managed Persistence. In other words, write the database manipulation code yourself.
Personally I’d suggest going with a Session bean that delegates to a Data Access Object to do the real work. Your Jena specific code will then be relegated to a small manageable layer in your application. In this scenario you’ll need to configure the bean’s deployment descriptor to ensure that it has a resource reference to your connection pool.
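As a rough illustration of that layering, here is a minimal sketch in plain Java. All of the names (PersonDao, InMemoryPersonDao, PersonService) are hypothetical, and the in-memory implementation is only a stand-in so the sketch runs without a database; a real DAO would do its work against the Jena Model obtained from the connection pool. PersonService plays the role of the session bean, without any of the EJB plumbing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical DAO contract: callers see plain Java types, never Jena classes.
interface PersonDao {
    List<String> findNamesByInterest(String interest);
}

// Stand-in implementation backed by a Map so the sketch runs on its own.
// A Jena-backed version would open the persistent Model inside these methods.
class InMemoryPersonDao implements PersonDao {
    private final Map<String, List<String>> namesByInterest = new HashMap<>();

    void add(String interest, String name) {
        namesByInterest.computeIfAbsent(interest, k -> new ArrayList<>()).add(name);
    }

    @Override
    public List<String> findNamesByInterest(String interest) {
        // Return a copy so callers cannot modify the stored list
        return new ArrayList<>(namesByInterest.getOrDefault(interest, new ArrayList<>()));
    }
}

// Plays the role of the session bean: business logic only, data access delegated.
class PersonService {
    private final PersonDao dao;

    PersonService(PersonDao dao) {
        this.dao = dao;
    }

    int countInterestedIn(String interest) {
        return dao.findNamesByInterest(interest).size();
    }
}
```

Because the service only depends on the interface, swapping the in-memory stand-in for a Jena-backed DAO should not require touching the business logic.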
Hopefully those are some useful pointers that’ll help get you started.


7
Jun 05

Jaikoz

The developers of Jaikoz, a Java MP3 tag editor, mailed me yesterday to say that their latest release is now live on their site. I’m mentioning this because Jaikoz bundles my MusicBrainz API for doing metadata lookups using MusicBrainz.
Jaikoz is payware although there’s a free trial available. I should note that I’m not getting any kickbacks from this: the API is CreativeCommons licenced so they’re free to do what they want with it. They did check in with me first though, which was very friendly. I did suggest that they may want to consider donating money to MusicBrainz if they get enough sales.
I’m just pleased that they found it useful enough to include it in their application.


1
Feb 05

MusicBrainz Java API beta-2

I’ve just uploaded beta-2 of my Java API to the MusicBrainz RDF web service.
The API is Creative Commons licensed and is built around the Jena 2 Semantic Web toolkit.
The API provides raw access to the RDF returned from the service, but also a simple JavaBean layer for developers wanting a simpler interface to the data. You can read the Javadoc and view the changes since the last beta; these mainly consist of some bug fixes and support for a few new properties (including Amazon ASINs).
The API doesn’t aim to mimic everything in the C/C++ API, e.g. track id calculation or submission; it’s merely a read-only version suitable for embedding in Java applications.
I’ve included a trivial demo in this release: a simple command-line application that reads in a list of album names, looks them up in the service and aggregates the basic metadata into a new RDF document which is dumped to the console.


9
Dec 04

Slug: A Simple Semantic Web Crawler

Back in March I was tinkering with writing a Scutter. I’d never written a web crawler before, so was itching to give it a go as a side project. I decided to call it Slug because I was pretty sure it’d end up being slow and probably icky; crafting a decent web crawler is an art in itself.

I got as far as putting together a basic framework that did the essential stuff: reading a scutter plan, fetching the documents using multi-threaded workers, etc. But I ended up getting sucked into a work project that ate up all my time so didn’t get much further with it.

Anyway, because the world is obviously sorely in need of another half-finished Scutter implementation, I’ve spent a few hours this evening tidying up some of the code so that it’s suitable for sharing.



6
Sep 04

foaf-beans 0.1

I’m pleased to announce the first iteration of a Java API for FOAF based around the Jena semantic web toolkit.
The API, which I’ve dubbed “foaf-beans”, is an attempt to provide a number of convenience classes that will allow Java developers to quickly get to grips with reading and writing FOAF data. With this in mind the API provides a thin layer of abstraction which hides much of the RDF processing, instead presenting the user with simple factory classes that create FOAFGraph and FOAFWriter objects for reading and writing respectively. These objects generate and process simple Java Beans that should play nicely with other Java APIs and toolkits (particularly JSP, JSTL, etc).



2
Sep 04

Bayesian Agents

Classifier4J is a Java text classification library that includes a text summariser and a Bayesian classifier. It was my interest in the latter that led me to play with the API recently, as I wanted to demonstrate to some colleagues the ease with which one can use Bayesian classification to create a content filter/recommender. Well, it’s easy if all the hard work is done for you in a library!

The Classifier4J API is very easy to use, and you can plug a Bayesian classifier into an application with very few lines of code.

One of the things that intrigued me about the API design was that it separates out the Classifier from the storage of the words and their probabilities. The API comes with a simple in-memory implementation and a JDBC Words Data Source which stores the data in a database table.
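To make that separation concrete, here is a minimal sketch in plain Java. It does not use Classifier4J itself: the WordStore interface, the in-memory implementation and the SimpleClassifier are all hypothetical names invented for illustration, and the probability maths is a deliberately naive Bayes combination rather than whatever the library actually does. The point is only that the classifier talks to an interface, so the storage behind it can be swapped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage contract (not Classifier4J's real interface).
interface WordStore {
    void addMatch(String word);
    void addNonMatch(String word);
    double probability(String word);
}

// Simple in-memory store: keeps [matches, nonMatches] counts per word.
class InMemoryWordStore implements WordStore {
    private final Map<String, int[]> counts = new HashMap<>();

    public void addMatch(String word) {
        counts.computeIfAbsent(word, k -> new int[2])[0]++;
    }

    public void addNonMatch(String word) {
        counts.computeIfAbsent(word, k -> new int[2])[1]++;
    }

    public double probability(String word) {
        int[] c = counts.get(word);
        if (c == null) {
            return 0.5; // unseen words are treated as neutral
        }
        double p = (double) c[0] / (c[0] + c[1]);
        // Clamp so a single word can never force the result to exactly 0 or 1
        return Math.max(0.01, Math.min(0.99, p));
    }
}

// The classifier only knows about the WordStore interface.
class SimpleClassifier {
    private final WordStore store;

    SimpleClassifier(WordStore store) {
        this.store = store;
    }

    // Combines per-word probabilities with a naive Bayes product rule.
    double classify(String text) {
        double match = 1.0;
        double nonMatch = 1.0;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            double p = store.probability(word);
            match *= p;
            nonMatch *= (1.0 - p);
        }
        return match / (match + nonMatch);
    }
}
```

A database-backed or RDF-backed store would then just be another implementation of the same interface.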

It occurred to me that it’d be an interesting experiment to create an implementation of the data source interface that stored the data as RDF.

Why RDF? Because then we’d have the ability to share and aggregate the results of training classifiers.

For example I could export and share a classifier trained to spot spam, semantic web topics, or any number of other categories. The classifiers could be imported into both desktop applications (e.g. Thunderbird) and web applications. For example I might train a classifier to spot articles that I’m interested in, and then upload that configuration into a content management system and have it mine that data for material I may be interested in: hence “Bayesian agents”.

By tying my exported Bayesian probabilities to my FOAF file, an aggregator may merge my data with that of others known to share similar interests. Trust is another aspect that may affect whether my data is shared.

Anyone have any comments on this? Is anyone doing anything similar already? (They must be…)

I’ll try and hack something up when I get a few minutes.

For the RDF I was thinking of something like the following:

Continue reading →


29
Mar 04

How to make RDF and JSP play nicely together?

Via Gavin (via the chumpologica): An application architecture that should yield superior productivity.
Interesting stuff. I’ve been pondering something similar myself, mainly because I have a slice of an application I’m working on that I want to replace with an RDF data model and storage. To achieve this successfully I need to make sure that the data nicely dovetails with the JSP 2.0/JSTL templating environment we’ve built on top. However I don’t want to model everything as objects if I can help it, because by doing so I’m going to sacrifice some of the flexibility I gain from using RDF.
Ideally I want to gut the current Data Access Objects and replace them with code that navigates the underlying RDF graph, perhaps using an RDF query language, and then returns a subset of that graph in a form that’s suitable for traversing with JSTL. There’s not a great deal of business logic in that slice of the application so there’s little else to change.
I had been wondering whether the technique used in RDF Twig could be generalized to creation of simple object hierarchies (Lists and Maps). Rx4RDF might be another useful place to mine for ideas.
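To give a feel for the kind of “Lists and Maps” structure I mean, here is a minimal sketch in plain Java. It is not how RDF Twig or Rx4RDF work, and it doesn’t touch an RDF API at all: triples are just hypothetical three-element String arrays (subject, property, value), and TripleFlattener is a made-up name. It simply groups them into subject, then property, then a list of values, which is the sort of nested Map that JSP EL can walk with bracket notation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TripleFlattener {
    // Groups {subject, property, value} triples into
    // subject -> property -> list of values, preserving insertion order.
    static Map<String, Map<String, List<String>>> flatten(List<String[]> triples) {
        Map<String, Map<String, List<String>>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], k -> new LinkedHashMap<>())
                     .computeIfAbsent(t[1], k -> new ArrayList<>())
                     .add(t[2]);
        }
        return bySubject;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"#a", "name", "Alice"});
        triples.add(new String[] {"#a", "knows", "#b"});
        triples.add(new String[] {"#a", "knows", "#c"});
        System.out.println(flatten(triples));
    }
}
```

The obvious cost is the one mentioned above: once the graph is flattened like this, values that point at other resources are just strings again, so some of RDF’s flexibility is lost.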
Suggestions for other useful APIs or techniques to explore will be gratefully received.
btw, if you find that you start extending your object model to allow arbitrary property annotation, and some of those properties are actually pointers to other objects in your graph, then that’s probably a sign that you may be better off using an RDF based model. And possibly Python too but I’ve not explored that angle yet.


9
Mar 04

SAXON

Via Cafe con Leche I notice that Saxon 7.9 has been released. The interesting thing is that Mike Kay has founded Saxonica Limited which will offer professional services and additional modules, including a schema-aware processor as a commercial offering.
I’ve used Saxon for a long time now. It’s my XSLT processor of choice. I’ve never bothered with Xalan or other processors as Saxon has always Just Worked.
Like any good tool Saxon is adjustable enough to help you solve any particular problem. Just recently I’ve benefited from both the saxon:preview which helped me deal with a large transform and the very easy extension mechanism that allowed me to invoke some Java code during a transformation (generating a SHA1 sum for an email address).
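For the curious, the Java side of that SHA1 trick can be very small. The following is a minimal sketch, not the code I actually used: the class name Sha1Sum and the example address are made up, and the wiring needed to call it from a stylesheet is left out. It is just a static method that turns a string into a 40-character hex digest, which is the shape of thing an extension function call needs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha1Sum {
    // Returns the SHA-1 digest of the input as 40 lowercase hex characters.
    static String hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder out = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to an unsigned value so each byte yields two hex digits
                out.append(String.format("%02x", b & 0xff));
            }
            return out.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard JDK algorithm, so this is not expected
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical address, hashed in mailto: URI form
        System.out.println(hex("mailto:someone@example.org"));
    }
}
```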
I think it’s good news that Mike is intending to continue offering the basic product for free and wish him well in the commercial venture.


23
Oct 03

Java Web Start and Signing Jars

In response to a feature request from L. M. Orchard I’ve just spent a couple of hours packaging up the FOAF-a-Matic Mark 2 as a Java Web Start application.
Actually creating the requisite JNLP file was straight-forward; the specification is clear and the format simple. I very quickly had the application launching from a web page link. What took a bit longer was working out how to sign the jar files so that I could request permission to access the file system, open local ports and remote connections. Actually with the current version of JNLP you have to request all permissions; there’s no granularity in what you can request or grant access to. Surprising really, as you’d expect this to be relatively easy to implement given that the underlying security manager and permissions model is all in place.
Anyway, the JNLP and jarsigner documentation just refer you to a certificate authority to get a certificate to sign your jar files. This is frustrating as I’m not about to fork out for a certificate when I’m giving the code away for free. A quick bit of googling dug up this excellent document from Richard Dallaway, “Java Web Start and Code Signing”. Dallaway had met exactly this problem and documented how to sign up for a free certificate from Thawte.
Completing the requisite application forms, and waiting for email confirmations, ate up the rest of the time required to get FM Mark 2 running under Web Start. Happily Ant already has tasks for signing jars so it was quite straight-forward to add a new target to my build file to create the Web Start distribution.
The lesson to be learned here is to take the time to write up any non-trivial problems you resolve, because you’re going to save someone (and probably many people) from floundering around. Doing so will bring good karma. Guaranteed.
The Web Start enabled FM Mark 2, plus a couple of bug fixes, will be beta-2.1 arriving at a browser near you shortly.


30
Sep 03

Entity Management in XML applications

I’m very pleased to say that my latest tutorial for IBM developerWorks is now up on their site:
Entity Management in XML applications
It covers the XML catalog specification and using the Apache XML Resolver classes to add catalog support to your XML applications. Why would you do that? Read the tutorial and find out…