Repository Listing: OAI-PMH vs Atom vs Sitemaps
June 19, 2007
A basic repository feature is providing a list of all the resources in a collection, along with a way to discover changes incrementally. The usual way for repos to enable this is OAI-PMH: the ListRecords or ListIdentifiers verb lists the collection, the ‘from’ argument makes incremental updates efficient, and the resumptionToken system lets the server control the load it takes on.
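As a rough sketch of that harvesting loop in Python: the `fetch` callable, the base URL and the `oai_dc` metadata prefix are my assumptions for illustration, not anything a particular repository mandates. It asks for everything changed since a given date, then keeps following resumptionTokens until the server stops handing them out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest_identifiers(fetch, base_url, since):
    """Yield (identifier, datestamp) for records changed since `since`.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    response body as XML text or bytes.
    """
    # First request: selective harvesting with the 'from' argument.
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc", "from": since}
    while params:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield (header.findtext(OAI + "identifier"),
                   header.findtext(OAI + "datestamp"))
        token = root.findtext(".//" + OAI + "resumptionToken")
        # resumptionToken is an exclusive argument: follow-up requests carry
        # only the verb and the token. An empty or absent token ends the list.
        params = {"verb": "ListIdentifiers", "resumptionToken": token} if token else None
```

For a live repository, `fetch` could be as simple as `lambda url: urllib.request.urlopen(url).read()`; the server decides how many headers go in each chunk.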
The way the rest of the world does it is with Atom or RSS. Unnecessary retrievals can be prevented using conditional GET. The server chooses the size of the feed documents, so it can control its own load. It’s even possible to avoid lost updates or list an entire collection using ‘first’, ‘last’, ‘next’ and ‘previous’ links (as in this tip). There’s no direct equivalent of PMH’s ‘from’, but as long as each entry in the feed carries a timestamp, the client knows when to stop retrieving more feed chunks.
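Here is a minimal sketch of the client side of that, in Python. It assumes an Atom feed whose entries are listed newest first and whose ‘next’ link leads to older entries; `fetch`, `parse_ts` and `new_entries` are names I have made up for illustration. Conditional GET would live inside `fetch` (sending If-None-Match or If-Modified-Since), so it is left out here.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def parse_ts(text):
    # Atom timestamps are RFC 3339; older Pythons reject a trailing 'Z'.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))

def new_entries(fetch, feed_url, last_seen):
    """Yield ids of entries updated after `last_seen`, newest first.

    Assumes entries are ordered newest first and that the rel="next"
    link points at the chunk holding older entries.
    """
    cutoff = parse_ts(last_seen)
    url = feed_url
    while url:
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            if parse_ts(entry.findtext(ATOM + "updated")) <= cutoff:
                return  # everything from here on has already been seen
            yield entry.findtext(ATOM + "id")
        # Follow the paging link, if the server offers one.
        url = next((link.get("href") for link in feed.findall(ATOM + "link")
                    if link.get("rel") == "next"), None)
```

The timestamp check plays the role of PMH’s ‘from’: the client stops paging as soon as it meets an entry it already knows about, so a frequent poller rarely goes past the first chunk.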
I’m currently reading the REST book, so I’m in a frenzy of resource-oriented fervour. OAI-PMH is, in the REST patois, a STREST interface (this theme was picked up in the discussion between Carl Lagoze and Andy Powell recently). The rich resource discovery possible with OAI-PMH is also overkill for what I’m after here.
I’m also unsure about syndication: I have a feeling that the resource representations in Atom / RSS feeds are unlikely to satisfy most repository clients’ needs. Isn’t it more resource-oriented to simply link to the resource and let the client negotiate with it for an appropriate representation? If so, Sitemaps fit the bill perfectly.
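To make the Sitemaps idea concrete, here is a small Python sketch under a few assumptions: the sitemap uses the sitemaps.org 0.9 namespace, ‘lastmod’ values start with a full YYYY-MM-DD date, and the function names are hypothetical. The sitemap only supplies links and modification dates; the representation is then requested from each resource with an Accept header.

```python
import urllib.request
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def changed_resources(sitemap_xml, since):
    """Return the <loc> URLs modified after `since` (a YYYY-MM-DD string)."""
    changed = []
    for url in ET.fromstring(sitemap_xml).findall(SM + "url"):
        lastmod = url.findtext(SM + "lastmod")
        # <lastmod> is optional; with no date we cannot rule out a change.
        # Comparing the date prefix as text works for full ISO dates only.
        if lastmod is None or lastmod.strip()[:10] > since:
            changed.append(url.findtext(SM + "loc").strip())
    return changed

def negotiated_request(url, media_type):
    """Build a GET request asking the resource itself for `media_type`."""
    return urllib.request.Request(url, headers={"Accept": media_type})
```

A client would pass each changed URL to `negotiated_request` with whatever media type it prefers, say `application/rdf+xml`, and open the result with `urllib.request.urlopen`.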
Well, maybe, but on balance I still think that Atom / RSS is a better choice; the RESTful repository will almost certainly have a feed around for human clients, and it’s better to adapt this for machine clients than adopt an additional mechanism.