A basic repository feature is providing a list of all the resources in a collection, and a way to incrementally discover changes. The usual way for repos to enable this is OAI-PMH: either the ListRecords or the ListIdentifiers verb, the ‘from’ argument for efficient incremental updates, and the resumptionToken mechanism to let the server control the load each request generates.
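For concreteness, here's a minimal Java sketch of the two kinds of ListIdentifiers request an incremental harvester sends. The first uses ‘from’ to ask only for records changed since the last harvest. Later ones pass back the server's resumptionToken, which OAI-PMH treats as an exclusive argument. The class name and endpoint URL are hypothetical; only the verb and argument names come from the protocol.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds OAI-PMH ListIdentifiers request URLs.
class OaiRequests {
    // First request of an incremental harvest: selective harvesting by datestamp.
    static String firstRequest(String baseUrl, String metadataPrefix, String from) {
        return baseUrl + "?verb=ListIdentifiers"
                + "&metadataPrefix=" + enc(metadataPrefix)
                + "&from=" + enc(from);
    }

    // Follow-up requests: resumptionToken is an exclusive argument,
    // so metadataPrefix and from are not repeated.
    static String nextRequest(String baseUrl, String resumptionToken) {
        return baseUrl + "?verb=ListIdentifiers&resumptionToken=" + enc(resumptionToken);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://repo.example.org/oai"; // hypothetical endpoint
        System.out.println(firstRequest(base, "oai_dc", "2007-05-01"));
        System.out.println(nextRequest(base, "abc|123"));
    }
}
```

The harvester keeps calling `nextRequest` until the server returns an empty resumptionToken, which is how the server paces the client.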

The way the rest of the world does it is with Atom or RSS. Unnecessary retrievals can be avoided using conditional GET. The server chooses the size of the feed documents, so it can control its own load. It’s even possible to avoid lost updates, or to list an entire collection, using ‘first’, ‘last’, ‘next’ and ‘previous’ links (as in this tip). There’s no direct equivalent of PMH’s ‘from’, but as long as each entry in the feed has a timestamp, the client knows when to stop retrieving further feed chunks.
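A rough Java sketch of that stopping rule follows. It assumes each feed page lists entries newest first (Atom itself doesn't guarantee an ordering), and it models the pages as an in-memory map rather than making real HTTP requests. The `Entry` and `Page` types and the `changesSince` method are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a paged feed: each page holds entries
// (assumed newest first) and an optional "next" link to older entries.
class FeedHarvest {
    record Entry(String id, Instant updated) {}
    record Page(List<Entry> entries, String next) {}

    // Follow "next" links, collecting entries updated after `since`, and stop
    // at the first entry that is not newer (a stand-in for PMH's 'from').
    static List<Entry> changesSince(Map<String, Page> pages, String firstUri, Instant since) {
        List<Entry> changed = new ArrayList<>();
        String uri = firstUri;
        while (uri != null) {
            // A real client would GET the URI here, ideally with
            // If-None-Match / If-Modified-Since for the first page.
            Page page = pages.get(uri);
            for (Entry e : page.entries()) {
                if (!e.updated().isAfter(since)) {
                    return changed; // everything further back is already known
                }
                changed.add(e);
            }
            uri = page.next();
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Page> pages = Map.of(
            "/feed", new Page(List.of(
                new Entry("a", Instant.parse("2007-05-03T00:00:00Z")),
                new Entry("b", Instant.parse("2007-05-02T00:00:00Z"))), "/feed?page=2"),
            "/feed?page=2", new Page(List.of(
                new Entry("c", Instant.parse("2007-05-01T12:00:00Z")),
                new Entry("d", Instant.parse("2007-04-30T00:00:00Z"))), null));
        System.out.println(changesSince(pages, "/feed", Instant.parse("2007-05-01T00:00:00Z")));
    }
}
```

With the sample pages, the harvest picks up a, b and c, then stops at d without needing any further pages.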

I’m currently reading the REST book, so I’m in a frenzy of resource-oriented fervour. OAI-PMH is, in the REST patois, a STREST interface (this theme was picked up in the discussion between Carl Lagoze and Andy Powell recently). The rich resource discovery possible with OAI-PMH is also overkill for what I’m after here.

I’m also unsure about syndication – I have a feeling that the resource representations in Atom / RSS feeds are unlikely to satisfy most repository clients’ needs. Isn’t the more resource-oriented approach simply to link to each resource and let the client negotiate with it for an appropriate representation? If so, Sitemaps fit the bill perfectly.
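For comparison, a Sitemap is little more than a list of links with optional last-modified dates. Here's a hypothetical Java sketch that writes one for a set of repository resources, using the `urlset`/`url`/`loc`/`lastmod` elements from the Sitemaps 0.9 protocol. The class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Sitemap writer: one <url> per resource, with its last-modified date.
class SitemapWriter {
    static String toSitemap(Map<String, LocalDate> resources) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Map.Entry<String, LocalDate> r : resources.entrySet()) {
            sb.append("  <url><loc>").append(escape(r.getKey())).append("</loc>")
              // LocalDate.toString() gives YYYY-MM-DD, a valid W3C Datetime
              .append("<lastmod>").append(r.getValue()).append("</lastmod></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    // URLs must be entity-escaped inside XML; '&' has to be replaced first.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    public static void main(String[] args) {
        Map<String, LocalDate> items = new LinkedHashMap<>();
        items.put("http://repo.example.org/item?id=1&v=2", LocalDate.of(2007, 5, 24));
        System.out.print(toSitemap(items));
    }
}
```

The client then fetches each `loc` and negotiates a representation itself, which is the resource-oriented part of the argument.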

Well, maybe, but on balance I still think that Atom / RSS is a better choice; the RESTful repository will almost certainly have a feed around for human clients, and it’s better to adapt this for machine clients than adopt an additional mechanism.

The project web sites for the SPECTRa tools are now online, and I’ll be moving the help and documentation over in due course. The main point of the SPECTRa tools is to make building repository-ready packages of X-ray crystallography, NMR spectroscopy and Gaussian input files as easy as possible.

Once prepared, the packages can be saved to local storage for manual deposition or deposited into a DSpace repository (although this requires some customization of DSpace). Hopefully I’ll have time to write a SWORD client for it too, and I’ve been thinking about writing an S3 client for fun.

In other news: –

  • Peter Suber blogged the SPECTRa report
  • The WWMM server is finally back on decent hardware with all its data. Hopefully there won’t need to be any more outages for a while.

SPECTRa on Sourceforge

May 24, 2007

I’ve just finished moving the SPECTRa code to the spectra-chem project on SourceForge. The technical web sites will be moving over in due course, and I’ll be switching issue tracking to the SourceForge trackers too.

The legacy2cml converter tools have been released on the CML SourceForge site.

For the mavenites: –
repo: – http://wwmm.ch.cam.ac.uk/maven2/
g: cml, a: legacy2ml, v:1.0b1

Hmmm, that really cries out for a microformat, doesn’t it?
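To illustrate the point, here's a throwaway Java sketch that parses the shorthand above into Maven's usual groupId:artifactId:version form. The `Coordinates.parse` helper is hypothetical and only handles the exact "g: x, a: y, v: z" pattern used in this post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for the ad-hoc "g: ..., a: ..., v: ..." shorthand.
class Coordinates {
    static String parse(String shorthand) {
        Map<String, String> parts = new HashMap<>();
        for (String field : shorthand.split(",")) {
            // Split on the first ':' only, so values may contain colons.
            String[] kv = field.split(":", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Bad field: " + field);
            }
            parts.put(kv[0].trim(), kv[1].trim());
        }
        String g = parts.get("g"), a = parts.get("a"), v = parts.get("v");
        if (g == null || a == null || v == null) {
            throw new IllegalArgumentException("Missing g, a or v in: " + shorthand);
        }
        return g + ":" + a + ":" + v;
    }

    public static void main(String[] args) {
        System.out.println(parse("g: cml, a: legacy2ml, v:1.0b1"));
    }
}
```

Ad-hoc parsing like this is brittle, which is exactly why an agreed markup would be nicer.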

Jumbo is a Java library for dealing with CML and doing some chemistry on top. This release (as with all the others) is available from the sourceforge site, or Mavenites can get it from the wwmm repository using groupId:cml, artifactId: jumbo and version: 5.4-b1.
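Assuming the wwmm repository follows the standard Maven 2 directory layout (group path, artifact, version, then artifactId-version.jar), those coordinates map to a download URL as in this small Java sketch. The `RepoLayout` helper is hypothetical, and the sketch can't confirm that the files are still hosted at that address.

```java
// Hypothetical helper: maps Maven coordinates to a URL in the default Maven 2 layout.
class RepoLayout {
    static String artifactUrl(String repo, String groupId, String artifactId,
                              String version, String ext) {
        String base = repo.endsWith("/") ? repo : repo + "/";
        // Dots in the groupId become directory separators.
        return base + groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + "." + ext;
    }

    public static void main(String[] args) {
        String repo = "http://wwmm.ch.cam.ac.uk/maven2/";
        System.out.println(artifactUrl(repo, "cml", "jumbo", "5.4-b1", "jar"));
        System.out.println(artifactUrl(repo, "net.sf", "jcamp-dx", "0.9", "jar"));
    }
}
```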

Have fun!

Now that Buildr (like build, but 2.0. Geddit?) is there to soothe the ache of being so cool that it hurts and Ant caters for the no-fruit-in-my-muesli crowd, it’s increasingly unpopular to like Maven2.

I like M2, though. I haven’t come across a situation where I needed a 5000-line build file – it does most of the stuff I want at minimal cost.

Mark Diggory evidently also likes (or at least uses) M2 – he’s just posted a neat scheme on the DSpace wiki for using M2 assemblies to make alternative add-on distributions of DSpace easily.

There are a few other interesting possibilities here too – we’ve used assemblies to chuck a project and its dependencies into a single jar so you can run it using java -jar. Not a big thing, admittedly, but not having to worry about getting every jar in the lib directory onto the classpath is a handy usability gain.
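What makes `java -jar` work is a `Main-Class` entry in the jar's manifest, which the assembly can be configured to write. As a rough illustration of the mechanism, this Java sketch uses the standard `java.util.jar` API to write an otherwise empty jar with a Main-Class header and then read it back. The class names are hypothetical, and a real assembly would also contain the project's classes and its unpacked dependencies.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo of the manifest entry that `java -jar` relies on.
class ExecutableJar {
    static void writeJar(Path jar, String mainClass) throws IOException {
        Manifest mf = new Manifest();
        // Set Manifest-Version first; without it the other attributes may not be written.
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, mf)) {
            // A real assembly would add the project's classes and dependencies here.
        }
    }

    static String readMainClass(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.getManifest().getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        writeJar(jar, "org.example.Main"); // hypothetical main class
        System.out.println("Main-Class: " + readMainClass(jar));
    }
}
```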

It should also be possible to create an assembly that includes an embedded Jetty instance and starts it up using simply:

java -jar dspace-demo.jar

Which would be nice.
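The launcher would amount to a main method that starts an embedded server. Jetty isn't part of the JDK, so this self-contained sketch swaps in the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in. A real dspace-demo.jar would start Jetty and deploy the DSpace webapps instead. `DemoLauncher` and its response text are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical launcher: the JDK HttpServer stands in for an embedded Jetty.
class DemoLauncher {
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        server.createContext("/", exchange -> {
            byte[] body = "demo repository is up\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start(); // serves requests on background threads until stopped
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on http://localhost:" + server.getAddress().getPort() + "/");
    }
}
```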

The latest version of the JCamp-DX library (https://sourceforge.net/projects/jcamp-dx/) is available from the wwmm Maven repo (groupId: net.sf, artifactId: jcamp-dx, version: 0.9).

More on Erlang

March 9, 2007

Toby White put me on to this blog entry about Erlang, through which I found this Eclipse plugin for Erlang. I’m glad I’m not the only one who found the expression termination characters confusing; they reminded me of my early days learning Pascal. Anyway, interesting stuff – I wonder whether Erlang has the will and the ability to become more mainstream?

I’ve been mulling over for a while what programming languages are going to be like a few years from now. My current angle is to look at the trends in hardware development and conclude that a good programming language for the future will make writing concurrent code as easy as possible. When I mentioned this, Andrew Walkingshaw said I should take a look at Erlang.

I’m impressed so far – a simple example of a two-thread program with message passing fits comfortably into 30 lines, and you can place either thread on a local node or a remote node with only minor code changes. And I’ve realized something else: if concurrency does end up driving language choice in the way I think it might, then functional languages are going to get a lot more of the limelight than they have previously.
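For comparison, here's roughly what a two-thread message-passing program looks like in Java, using one `BlockingQueue` per thread as a stand-in for an Erlang mailbox. It's a sketch of the pattern, not a translation of the Erlang example. Unlike Erlang processes, these threads can't be moved to a remote node without rewriting the code. The `PingPong` class is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical ping-pong: each queue plays the role of a process mailbox.
class PingPong {
    static int run(int rounds) throws InterruptedException {
        BlockingQueue<Integer> toPong = new ArrayBlockingQueue<>(1);
        BlockingQueue<Integer> toPing = new ArrayBlockingQueue<>(1);

        Thread pong = new Thread(() -> {
            try {
                while (true) {
                    int n = toPong.take();   // wait for a message
                    if (n < 0) return;       // negative value is the stop message
                    toPing.put(n + 1);       // reply with an incremented value
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pong.start();

        int value = 0;
        for (int i = 0; i < rounds; i++) {
            toPong.put(value);
            value = toPing.take();
        }
        toPong.put(-1); // tell the other thread to finish
        pong.join();
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3));
    }
}
```

Each round trip increments the value once, so `run(3)` returns 3.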

Erlang comes from the telecoms world, and many of the current production applications reflect that: multithreaded mail, web and messaging servers. But it’s also easy to see Erlang being used in an eScience setting, both to write programs that take advantage of SMP nodes and to orchestrate batch jobs across clusters (etc etc).

Andrew has also pointed me at Fortress, which is even higher-level than Erlang with respect to threading (some constructs, such as “for” loops, are implicitly multi-threaded). Another one to check out!
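Java has no implicitly parallel loops. The closest idiom, available from Java 8 onwards (well after this post), is an explicitly parallel stream. This hypothetical sketch sums squares over a range that the runtime may split across cores. Unlike in Fortress, the parallelism has to be requested with `.parallel()`.

```java
import java.util.stream.IntStream;

// Hypothetical example: an opt-in parallel "loop" over 1..n.
class ImplicitParallelLoop {
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()                  // the runtime may split the range across threads
                .mapToLong(i -> (long) i * i) // widen before multiplying to avoid int overflow
                .sum();                      // sum is associative, so the split doesn't change the result
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000));
    }
}
```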

Elliotte Harold on Java 7

February 1, 2007

Whilst I don’t agree with all of the points he’s making, anyone who’s interested in the future of Java as a language or generally geeks out on programming language development should check out Elliotte Harold’s recent blog posts. Start here.

Elliotte is conservative in his approach to Java, arguing that additional syntax, and syntax that makes constructs implicit, is bad for learning a language and bad for understanding programs. These are valid points. On the other hand, I can also see that there’s no point in understanding programs that no-one runs any more because the programmers got bored and went off to program .NET or Ruby, which is what Sun must be worried about.

In his latest post he takes apart a proposed syntax for cleaning up functor anonymous inner classes (implementations of interfaces with only one method). I’m not entirely convinced by it either, but I am convinced by the requirement – if Java is going to be useful in an increasingly distributed environment it needs better syntax for doing things like callbacks and threads.
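For readers who haven't run into them, here's the kind of functor anonymous inner class at issue: a one-method `Callable` submitted to a thread pool, where most of the lines are ceremony around a single expression. The `Callbacks` class is hypothetical. The lambda in the comment is the syntax Java 8 eventually adopted, not the proposal discussed in Elliotte's post.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example of a callback written as an anonymous inner class.
class Callbacks {
    static String runTask(final String name) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Several lines of boilerplate around one expression.
            Future<String> result = pool.submit(new Callable<String>() {
                @Override
                public String call() {
                    return "hello, " + name;
                }
            });
            // Java 8 lambda equivalent: pool.submit(() -> "hello, " + name)
            return result.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTask("world"));
    }
}
```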

Previously, when Elliotte was disparaging about the properties proposals (which are almost totally over-complex in my eyes) he had a better proposal waiting in the wings – hopefully that’s the case this time!