SPECTRa on Sourceforge

May 24, 2007

I’ve just finished moving the SPECTRa code to the spectra-chem sourceforge.net site. The technical web sites will be moving over in due course, and I’ll be switching issue tracking to the sourceforge trackers too.

I had a brief look at DBpedia.org thanks to PMR’s excitement in the area. I was particularly interested in how they would deal with the problem of describing and linking representations.

In comments on one of PMR’s posts, Richard Cyganiak writes “Note that the DBpedia URIs also work in a web browser, so you can go to http://dbpedia.org/resource/Uppsala and the DBpedia server will generate a web page showing the information it has about the item.” Well, they kind of work, but what actually happens is that you get redirected to http://dbpedia.org/page/Uppsala (*). DBpedia have chosen not to have their concept URIs resolve to a representation, but to redirect to other URIs that do.

I imagine the architecture options available to DBpedia for this were something like: –

  • Assign unique URLs to alternative representations
      Pro: simple for people to see the different representations.
      Con: needs a Follow-Your-Nose (FYN) linking system to reach the data, for which there is no formal standard.
  • Use a single URL for all the representations and use content negotiation to switch between them
      Pro: no confusion about which URI is the concept URI; the URI resolves.
      Con: precious little support in browsers, limited to MIME type switching, and no way of finding out which representations are available before you make a request.
  • Embed the resource description in the HTML view and extract it with GRDDL
      Pro: formally defined profiles; a developing standard.
      Con: needs all of the resource description in the view representation.
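To make the content negotiation option concrete, here is a simplified server-side sketch of choosing a representation from the client’s Accept header. It is an illustration only, not how DBpedia works: it handles q-values and wildcards, lets the most specific matching media range decide, and breaks ties by the server’s own preference order. It also shows the last con in the list: a client only finds out that nothing suitable exists when it gets nothing back (a 406, in HTTP terms).

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def matches(media_range, mime):
    """True if a media range such as */*, text/* or text/html covers mime."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    mime_type, _, mime_sub = mime.partition("/")
    return range_type == mime_type and range_sub in ("*", mime_sub)


def specificity(media_range):
    """More specific ranges override less specific ones (type/sub > type/* > */*)."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def negotiate(accept, available):
    """Pick the best of the server's MIME types, or None (i.e. 406 Not Acceptable)."""
    ranges = parse_accept(accept) if accept else [("*/*", 1.0)]
    best, best_q = None, 0.0
    for mime in available:  # 'available' is listed in server preference order
        matching = [(specificity(r), q) for r, q in ranges if matches(r, mime)]
        if not matching:
            continue
        _, q = max(matching)  # the most specific matching range decides q
        if q > best_q:
            best, best_q = mime, q
    return best
```

With `available = ["text/html", "application/rdf+xml"]`, a typical browser Accept header gets the HTML page, an RDF-aware client that asks for `application/rdf+xml` gets the RDF, and a client asking only for `image/png` gets `None`.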

So DBpedia went for the first option, and they provide a way for a programmatic client to get from the HTML view to the metadata; you could use something like /html/head/link[@rel='alternate' and @type='application/rdf+xml' and @title='RDF']/@href (apologies for any mistakes, my XPath-fu is not very hot). I have a vague recollection that this is a W3C-endorsed best practice, but I can’t remember the link.
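The XPath route needs an XML parser and a well-formed (XHTML) page. As a sketch of the same lookup that also copes with tag-soup HTML, here is a version using Python’s standard html.parser module. Unlike the XPath it does not check that the link sits inside head, and the /data/Uppsala.rdf path used when trying it out is hypothetical, not necessarily what DBpedia actually serves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate" type="application/rdf+xml"> elements."""

    def __init__(self, title=None):
        super().__init__()
        self.title = title  # optionally also require a matching title, as the XPath does
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <link/>
        # tags also arrive here via the default handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" not in rels or mime != "application/rdf+xml" or not href:
            return
        if self.title is not None and attributes.get("title") != self.title:
            return
        self.hrefs.append(href)


def find_rdf_links(html, base_url, title=None):
    """Return absolute URLs of the RDF alternates advertised by an HTML page."""
    finder = RdfLinkFinder(title)
    finder.feed(html)
    finder.close()
    return [urljoin(base_url, href) for href in finder.hrefs]
```

Relative hrefs are resolved against the page URL, so a client can fetch the RDF straight away.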

Why not GRDDL? GRDDL is about extracting metadata from the original XML source, which means that you have to have all the metadata in the HTML. This might not be desirable if you have a lot of metadata or if it’s important to you to keep the HTML small and tight, or if you have a pressing desire not to use XHTML.

Note that you can also make the link solution work in a GRDDL world by transforming the link element to an rdfs:seeAlso statement that points to the bulk of your RDF representation, but that requires a little more sophistication on the part of the GRDDL client.
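In GRDDL the transformation would normally be written in XSLT; the Python function below is only a sketch of the output such a transform would aim for: one rdfs:seeAlso triple, in N-Triples syntax, per RDF link on the page. Using the page’s own URL as the subject is an assumption; a transform could equally use the concept URI.

```python
from urllib.parse import urljoin

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Characters (besides controls and space) that may not appear in an N-Triples IRI.
_FORBIDDEN = set('<>"{}|^`\\')


def iri(value):
    """Wrap an absolute IRI in angle brackets, rejecting characters N-Triples forbids."""
    if any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in value):
        raise ValueError(f"not a valid N-Triples IRI: {value!r}")
    return f"<{value}>"


def see_also_triples(page_url, rdf_hrefs):
    """Turn each RDF <link> href on a page into '<page> rdfs:seeAlso <rdf> .'"""
    return [
        f"{iri(page_url)} {iri(RDFS_SEE_ALSO)} {iri(urljoin(page_url, href))} ."
        for href in rdf_hrefs
    ]
```

A GRDDL-aware client that understands rdfs:seeAlso can then follow the object of each triple to pick up the bulk of the RDF.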

Of course, both of these approaches assume that once you’re in RDF it’s ponies for everybody; you can describe all the alternative representations, list metadata and properties and describe relationships.

In conclusion: –

  • If we can assume that our description of different resource representations will be in RDF, then the link solution seems to do the trick of describing and linking representations
  • We can make the link solution work with GRDDL, but GRDDL without rdfs:seeAlso won’t be universally applicable.
  • Is content negotiation a dead end?
  • Is there a good reason concept URIs should / should not be directly resolvable?

* In fact, it’s web browsers that get redirected. My command line client (curl) just got ditched with no content, possibly because curl doesn’t follow redirects unless you pass -L.

The legacy2cml converter tools have been released at the CML sourceforge site.

For the mavenites: –
repo: http://wwmm.ch.cam.ac.uk/maven2/
g: cml, a: legacy2cml, v: 1.0b1

Hmmm, that really cries out for a microformat, doesn’t it?
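Short of a microformat, the usual compact notation is groupId:artifactId:version, and Maven 2 repositories lay artifacts out in a predictable way derived from those coordinates. Here is a small Python sketch of that mapping, assuming the artifactId is legacy2cml (matching the tool name) and that the artifact is packaged as a jar; the resulting URL follows from the layout convention and has not been checked against the live repository.

```python
def parse_coordinates(gav):
    """Split 'groupId:artifactId:version' into its three parts."""
    parts = gav.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {gav!r}")
    return tuple(parts)


def artifact_url(repo, gav, packaging="jar"):
    """Map Maven coordinates onto the Maven 2 repository directory layout."""
    group_id, artifact_id, version = parse_coordinates(gav)
    group_path = group_id.replace(".", "/")  # e.g. org.example -> org/example
    file_name = f"{artifact_id}-{version}.{packaging}"
    return "/".join([repo.rstrip("/"), group_path, artifact_id, version, file_name])
```

For example, `artifact_url("http://wwmm.ch.cam.ac.uk/maven2/", "cml:legacy2cml:1.0b1")` gives `http://wwmm.ch.cam.ac.uk/maven2/cml/legacy2cml/1.0b1/legacy2cml-1.0b1.jar`.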

Jumbo is a Java library for dealing with CML and doing some chemistry on top. This release (as with all the others) is available from the sourceforge site, or Mavenites can get it from the wwmm repository using groupId: cml, artifactId: jumbo and version: 5.4-b1.

Have fun!

Now that Buildr (like build, but 2.0. Geddit?) is there to soothe the ache of being so cool that it hurts and Ant caters for the no-fruit-in-my-muesli crowd, it’s increasingly unpopular to like Maven2.

I like M2, though. I haven’t come across a situation where I needed a 5000-line build file; M2 does most of the stuff I want at minimal cost.

Mark Diggory evidently also likes (or at least uses) M2: he’s just posted a neat scheme on the DSpace wiki for using M2 assemblies to easily build alternative add-on distributions of DSpace.

There are a few other interesting possibilities here too. We’ve used assemblies to chuck a project and all its dependencies into a single jar so you can run it using java -jar. Not a big thing, admittedly, but not having to worry about classpaths and lib directories is a handy usability gain.
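Under the hood, the single-jar idea amounts to copying every dependency’s entries into one archive and writing a manifest whose Main-Class entry tells java -jar where to start. The Maven assembly plugin (for instance with its jar-with-dependencies descriptor) does this for you; the Python sketch below only illustrates the mechanics with the standard zipfile module, and any jar names or main class you pass to it are hypothetical.

```python
import io
import zipfile


def build_uber_jar(dependency_jars, main_class, out):
    """Merge the entries of several jars into one jar with a Main-Class manifest.

    dependency_jars: paths or binary file objects of the jars to merge.
    out: path or binary file object for the combined jar.
    """
    manifest_name = "META-INF/MANIFEST.MF"
    seen = {manifest_name}
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as target:
        # Write our own manifest first; java -jar reads Main-Class from it.
        manifest = f"Manifest-Version: 1.0\r\nMain-Class: {main_class}\r\n\r\n"
        target.writestr(manifest_name, manifest)
        for jar in dependency_jars:
            with zipfile.ZipFile(jar) as source:
                for info in source.infolist():
                    name = info.filename
                    upper = name.upper()
                    # Signature files would no longer match the merged jar.
                    if upper.startswith("META-INF/") and upper.endswith((".SF", ".RSA", ".DSA")):
                        continue
                    if info.is_dir() or name in seen:
                        continue  # simplifying assumption: first jar wins on clashes
                    seen.add(name)
                    target.writestr(name, source.read(name))
```

A real build needs a better policy for clashing resources than "keep the first copy", which is one reason to let the Maven plugin handle it.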

It should also be possible to create an assembly that includes an embedded Jetty instance and starts it up with simply:

java -jar dspace-demo.jar

Which would be nice.

Matt Cockerill, in “Open Repository – open source in action”, on a landmark in BioMed Central’s role in the DSpace community: –

In late 2006 and early 2007, BioMed Central participated in an architectural review of DSpace. The review brought together experts from major academic institutional users of DSpace alongside technical specialists from commercial organizations such as Google, HP and BioMed Central. This review group produced a technical roadmap that is now guiding the development of DSpace.

Last month, an important milestone was reached with the beta release of version 1.4.2 of DSpace. This is the first official release of DSpace to incorporate enhancements that have been contributed back as patches to the DSpace project by BioMed Central.

Comment: It’s been great to see BMC raise their profile within the DSpace community in the last 18 months or so. They are at the forefront of a growing number of companies who offer products or consultancy based on DSpace, and this mixture of sectors is a huge asset to the community.

Since Matt doesn’t name names, I will: Graham Triggs deserves some props here for many things, including his work on the architecture review, revamping the Oracle support and helping people out on the mailing lists.