Andrew Walkingshaw came back from semantic camp brimming with enthusiasm and bearing gifts: stickers bearing a likeness of Roy Fielding and the slogans “Fielding has a posse” and “RFC 2616” (the HTTP 1.1 spec, of course!). I could stick one on my trusty PowerBook; apparently all the cool semantic/web/2.0 kids have stickers all over their Macs these days. My instinct to preserve the pure clean lines is evidently old hat, and as we know, old hat don’t dance.

This is timely since Roy Fielding now has a blog, and there’s been a flurry of RESTful repository discussion in the wake of Andy Powell’s keynote at VALA (responses a, b, c, d).

From Andy’s original post: –

Finally, that the ‘service oriented’ approaches that we have tended to adopt in standards like the OAI-PMH, SRW/SRU and OpenURL sit uncomfortably with the ‘resource oriented’ approach of the Web architecture and the Semantic Web. We need to recognise the importance of REST as an architectural style and adopt a ‘resource oriented’ approach at the technical level when building services.

In the comments there’s the fashionable spat over whether the word “repository” should be pejorative, but I’m surprised nobody’s trodden on the “service-oriented” banana skin. Andy does clarify with “at the technical level” at the end of the point, but care is needed since SOA is a historically infamous weasel phrase: –

“… that’s a service oriented approach for you.”

“I don’t know what you mean by service oriented approach” said Alice

Humpty Dumpty smiled contemptuously “Of course you don’t, until I tell you.”

Repositories Thru the Looking Glass, missing chapter, with apologies to A. Powell and L. Carroll

There are (at least) three distinct meanings of “service oriented” in the repositories context.

The Good
Services as in “a set of services that a university offers to the members of its community for the management and dissemination of digital materials” (Cliff Lynch).

The Bad
Protocols such as those Andy mentions (OAI-PMH, SRW/SRU, OpenURL). These are also sometimes referred to as STREST interfaces (Service Trampled REST) as they work using the same URL and HTTP mechanisms as REST, but do so in a way that doesn’t take advantage of the web architecture (or rather, that doesn’t observe the constraints of the web architecture).

The Ugly
Snake Oil Architecture. SOAP, WSDL, WS-*, standards documentation as thick as your arm. Bleuch.

At a certain level, thinking about services makes sense. The mistake is to be too literal and carry it through to implementation. The JISC e-Framework animation describing SOA looks like they were thinking about resource oriented services – it’s all about common formats, GET and PUT. From a techie’s point of view, your manager can take a Service Oriented Approach and you can implement it RESTfully.

Mike describes a recent storage development, which started off emphasising correctness over performance…

… We spent a good long time on the data model…

Trouble was around the corner …

… our fabulously correct design nevertheless performed like a dog…

One RDBMS expert to the rescue, one index and …

Five minutes later, with that one change, round-trip transaction time for a request/response cycle had gone from 800ms to 50ms.

These things matter. The fact that they matter is immensely cool.

Compare, if you care to, with the impact of web caching when you get the model right in the web architecture.

A quote from Sam Ruby

February 13, 2008

“…the ease with which a Ruby client (or a Python one) can be wired up to a Java middle tier talking to a Erlang back end using only HTTP, Atom, and JSON is a testament to the power of these simple protocols and formats.”

Sam Ruby

Substitute some repository software names for the programming languages. That’s why we had a barcamp on RESTful repository interfaces last week.

CrystalEye is a repository of crystallographic data. It’s built by a software system written by Nick Day that uses sections of Jumbo and CDK for functionality. It isn’t feasible for Nick to curate all this data (>100,000 structures) manually, and software bugs are a fact of life, so errors creep in.

Egon Willighagen and Antony Williams (ChemSpiderMan) have been looking at the CrystalEye data, and have used their blogs (as well as commenting on PM-R’s) to feed issues back. This is a great example of community data checking. Antony suggested that we implement a “Post a comment” feature on each page to make feedback easier. This is a great idea, so we had a quick think about it and propose a web2.0 alternative mechanism: Connotea.

To report a problem in CrystalEye, simply bookmark an example of the problem with the tag “crystaleyeproblem”, using the Description field to describe the problem. All the problems will appear on the tag feed.

When we fix the problem we’ll add the tag “crystaleyefixed” to the same bookmark. If you subscribe to this feed, you’ll know to remove the crystaleyeproblem tag.

In the fullness of time, we’re planning to use Connotea tags to annotate structures where full processing hasn’t been possible (uncalculatable bond orders, charges etc).

Neat ORE shorthand

February 5, 2008

A useless piece of trivia, perhaps, but since ORE requires resource maps to dereference to the resource map document, the resource map URI is implicitly the base URI for the document. This means you can use relative URIs in the ORE document for the aggregation and the resource map itself. In Turtle, the following would be a valid complete resource map document.

@prefix ore
ore:describes .
ore:aggregates ,
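
The base-URI point can be checked with a small sketch. It assumes a made-up resource map URI (http://example.org/rem) and a fragment named #aggregation, neither of which comes from the ORE spec, and uses Python's standard urljoin to show what the relative references resolve to.

```python
from urllib.parse import urljoin

# Hypothetical resource map URI, used only for illustration.
REM_URI = "http://example.org/rem"


def resolve(relative_ref: str, base: str = REM_URI) -> str:
    """Resolve a relative reference against the document's base URI."""
    return urljoin(base, relative_ref)


# The empty reference (written <> in Turtle) is the document itself,
# i.e. the resource map.
resource_map = resolve("")

# A bare fragment hangs off the resource map URI; "#aggregation" is an
# assumed name for the aggregation, not one mandated by ORE.
aggregation = resolve("#aggregation")

print(resource_map)
print(aggregation)
```

So the resource map never has to spell out its own URI; wherever it is served from, the relative references follow it.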


Warning! Here be RDF and ORE concepts. Some pre-reading is required. I’m referring to the alpha version of the specs in this – things may have changed without my knowledge – I’m hoping that someone from the technical committee will correct me if so. RDF prefixes are used as in the ORE specs.

In this post I think out loud about the ramifications of the implicit types in ORE…

An ORE resource map graph is stitched together with two main predicates: ore:describes and ore:aggregates. The resources at either end of these predicates have implicit RDF types; ore:ResourceMap for Resource Maps, ore:Aggregation for Aggregations, and ore:AggregatedResource for (you guessed it) AggregatedResources. So this: –

@prefix eg .
eg:myResourceMap ore:describes eg:myResourceMap#aggregation.
eg:myResourceMap#aggregation ore:aggregates eg:myResource1, eg:myResource2, eg:myResource3.

is implicitly equivalent to this: –

eg:myResourceMap ore:describes eg:myResourceMap#aggregation;
rdf:type ore:ResourceMap.
eg:myResourceMap#aggregation rdf:type ore:Aggregation;
ore:aggregates eg:myResource1, eg:myResource2, eg:myResource3.
eg:myResource1 rdf:type ore:AggregatedResource.
eg:myResource2 rdf:type ore:AggregatedResource.
eg:myResource3 rdf:type ore:AggregatedResource.
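
As a rough sketch of what “implicitly equivalent” means in practice, the snippet below models triples as plain Python tuples and adds the implied rdf:type statements. The function name add_implicit_types and the tuple representation are my own illustration, not part of ORE or of any RDF library, and the rules are just my reading of the alpha spec.

```python
RDF_TYPE = "rdf:type"


def add_implicit_types(triples):
    """Return the given triples plus the rdf:type statements they imply."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == "ore:describes":
            # A resource map describes an aggregation.
            inferred.add((subject, RDF_TYPE, "ore:ResourceMap"))
            inferred.add((obj, RDF_TYPE, "ore:Aggregation"))
        elif predicate == "ore:aggregates":
            # An aggregation aggregates aggregated resources.
            inferred.add((subject, RDF_TYPE, "ore:Aggregation"))
            inferred.add((obj, RDF_TYPE, "ore:AggregatedResource"))
    return inferred


# The short form of the example graph: no explicit types at all.
short_form = {
    ("eg:myResourceMap", "ore:describes", "eg:myResourceMap#aggregation"),
    ("eg:myResourceMap#aggregation", "ore:aggregates", "eg:myResource1"),
    ("eg:myResourceMap#aggregation", "ore:aggregates", "eg:myResource2"),
    ("eg:myResourceMap#aggregation", "ore:aggregates", "eg:myResource3"),
}

long_form = add_implicit_types(short_form)
for triple in sorted(long_form):
    print(triple)
```

This is the work an end point has to do (or a publisher has to do by hand) before the types can be relied on.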

So what? Well, in the basic case, you shouldn’t use those types for querying, since they might well not be present. So: –

?ar a ore:AggregatedResource;
dc:author .

Could work, but only if the types have been put in there manually, or if you know that the end point has reasoned over the RDF. In general, the query that will work more reliably would be: –

?a ore:aggregates ?ar.
?ar dc:author .
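
To make the difference concrete, here is a minimal sketch that runs both query shapes over a graph with no explicit types. Triples are plain tuples, the helper names by_type and by_predicate are invented for the illustration, and eg:someAuthor is a hypothetical stand-in for whatever author you are looking for.

```python
# A graph as published: the aggregation is stated, the types are not.
graph = {
    ("eg:myResourceMap#aggregation", "ore:aggregates", "eg:myResource1"),
    ("eg:myResource1", "dc:author", "eg:someAuthor"),  # hypothetical author
}


def by_type(graph, author):
    """Find resources typed ore:AggregatedResource with the given author."""
    typed = {
        s for s, p, o in graph
        if p == "rdf:type" and o == "ore:AggregatedResource"
    }
    return {
        s for s, p, o in graph
        if p == "dc:author" and o == author and s in typed
    }


def by_predicate(graph, author):
    """Find objects of ore:aggregates that have the given author."""
    aggregated = {o for s, p, o in graph if p == "ore:aggregates"}
    return {
        s for s, p, o in graph
        if p == "dc:author" and o == author and s in aggregated
    }


print(by_type(graph, "eg:someAuthor"))       # empty: no types in the graph
print(by_predicate(graph, "eg:someAuthor"))  # finds eg:myResource1
```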

I think this will also have an impact when extending ORE. Say I’m using ORE to describe a journal paper and its supplemental data. I could create a hasSupportingInformation property as a sub-property of ore:aggregates, but that will break queries like the one just above (unless the end-point has reasoned over the graph). So it looks like it’s safest to create a SupportingInformationResource subclass of ore:AggregatedResource instead; then clients that know about SupportingInformation can make specific queries, without breaking generic queries or requiring RDFS reasoning in the end-point.
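
A small sketch of the two extension styles, again with triples as plain tuples and no reasoning applied. The eg: names (eg:agg, eg:paper, eg:data1) and the helper functions are made up for the illustration; only the ore: terms come from the spec.

```python
def aggregated(graph):
    """Generic client: everything that is the object of ore:aggregates."""
    return {o for s, p, o in graph if p == "ore:aggregates"}


def supporting_information(graph):
    """Specific client: aggregated resources typed as supporting information."""
    typed = {
        s for s, p, o in graph
        if p == "rdf:type" and o == "eg:SupportingInformationResource"
    }
    return aggregated(graph) & typed


# Style 1: a new property in place of ore:aggregates for the data file.
with_subproperty = {
    ("eg:agg", "ore:aggregates", "eg:paper"),
    ("eg:agg", "eg:hasSupportingInformation", "eg:data1"),
}

# Style 2: plain ore:aggregates, plus an explicit, more specific type.
with_subclass = {
    ("eg:agg", "ore:aggregates", "eg:paper"),
    ("eg:agg", "ore:aggregates", "eg:data1"),
    ("eg:data1", "rdf:type", "eg:SupportingInformationResource"),
}

print(aggregated(with_subproperty))           # misses eg:data1
print(aggregated(with_subclass))              # finds both resources
print(supporting_information(with_subclass))  # finds eg:data1 only
```

The generic client only sees the data file in the second style, which is the argument for typing the resource and leaving the predicate alone.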

Background: An alpha version of the OAI-ORE specifications was released in December, and has prompted less public discussion than I’d hoped for, so I’m going to post some of the issues as I perceive them in an attempt to promote awareness. I’ll inevitably fail to be comprehensive, so I won’t try – I’ll stick to the ones that interest me.

ORE is a way of describing aggregations of web resources; complex objects in digital library / repository parlance. It’s based on semantic web principles and technology and is RESTful (unlike PMH, but that’s a story for another day), which is a Good Thing.

So what is it good for: –

i) Provides an alternative to content packaging. Content packaging standards and security are two of the biggest hurdles to repository interop. ORE could provide a route around one of them, and bring the repository world closer to the web in doing so.

ii) Takes forward named graphs for defining boundaries on the semantic web. The semantic web can be visualized as a big network of statements about things, that lacks a way of defining a chunk of the network (in order to make statements about it…). You perhaps have to be a bit of a semantic web geek to appreciate the importance of this at first blush.

The alpha of the standard itself stood out for a couple of things too. It seemed to have been written with a mindset of “what is the least we can specify whilst being useful?”. It’s also a well rounded spec; there are constraints to make it simple, but they’re not out of balance with the amount of specification and support provided.

ORE is likely to be important to the repository community; there is a lot of momentum behind it (on which more later), and it provides a piece of previously missing infrastructure. So it might well be worth your while to read the spec, join the discussion group and maybe even read some of the following posts…