A successful codebash (via) was held as part of the ICE RS project, and it got a load of useful vertical integration done. Hopefully we’ll be seeing something similar in the CRIG space soon!

A couple of weeks ago, after using BlogBridge for around 18 months, I packed up my OPML and went over to Google Reader. This post summarizes my experiences.

I chose BlogBridge in the first place because I wanted a reader capable of offline reading (which meant a desktop app at the time), and needed something that a) ran on both Linux and OS X and b) synced between installations. I switched to Google Reader for one reason: BlogBridge doesn’t sync which items I’ve read.

Google Reader Pros

Subscription suggestions. Google’s suggestions are OK – but mainly for finding the subscriptions you should have aggregated anyway rather than for finding interesting but offbeat blogs.

It remembers what I’ve read. In BlogBridge it was painful to switch to my laptop after a week away and see thousands of unread items I’d already spent the week reading and clearing down.

There’s something about the rendering that means I end up reading more articles in the feed reader, which is kind of the point. This is probably because it works in the browser window, so I give lots of screen real estate to the browser. With BlogBridge, the app and the browser had to share the real estate, especially on OS X, where I had to click on the app again to regain focus (Windows and OS X users probably don’t understand how annoying this is to some Linux users).

Reader + Gears is as good as a desktop app, which is the point, of course!

The subscription bookmarklet means I’m more likely to subscribe to things I find interesting. Which should be a good thing.

I use the article star to indicate “come back and read in more depth”, which works well.

Google Reader Cons

Authenticated feeds. Reader doesn’t have them, but frankly, if it did I wouldn’t use them (Google knows enough about me without me giving them my passwords). I’ve realized how important the few authenticated feeds were to me, so I’m going to be running BlogBridge again, just for them.

Prioritisation. I used guides in BlogBridge to tier my feeds – I’d work my way down the list of guides until I’d run out of blog reading time. I could have used the feed starring mechanism to do the same thing. Reader simply doesn’t give me the tools to prioritize 162 subscriptions.

Trends. When it comes to attention data, blog reading stats are solid gold. Reader’s Trends console is cute, but isn’t giving me a lot back for my attention data. Where’s the tool that automatically prioritizes my feeds in order of which I’m most likely to find interesting? Where’s the management tool that points out I haven’t read a certain feed in months so I could think about de-subscribing? Where’s the XML download that allows me to get my attention data back from Google?

Roundup 14th Dec P.S.

December 14, 2007

I omitted some important news: OAI-ORE released an alpha spec. I’d urge anyone with an interest in interoperability to read and comment – the definition of compound object boundaries on the semantic web isn’t done fantastically well at the moment, and the current idiom of pass-by-val between repositories (with content packages) means a bunch of headaches that pass-by-ref (a la ORE) avoids – so it’s important to get this right early.

Roundup 14th Dec

December 14, 2007

It’s been a tough week for (especially institutional) repositories, with some of the criticism specific (David Flanders, Dorothea Salo) and some a little more general (“centralization is a bug”). But it’s not all doom and gloom; Paul Walk clears up being taken out of context and Peter Murray-Rust announces the Microsoft e-Chemistry project, which I’m optimistic will make big advances in practical repository interop.

On a complete tangent, I discovered Mark Nottingham’s blog and read with delight about his HTTP caching extensions. My relationship with HTTP has been like one of those infernal teen movies where the in-crowd-but-with-a-conscience kid finally dispels the fog of peer pressure and preconception, and invariably finds the class dork to be the member-of-opposite-sex of dreams. Not that an in-crowd kid could ever utter the words “my relationship with HTTP”.

About a year ago, Peter Murray-Rust showed his research group a web interface that let you type SPARQL into a textarea and have it evaluated. I had a flashback to people being shown the same thing with SQL years ago. If SPARQL follows the same pattern, the textareas will disappear as developers hide the complexity of the query language and data model from users; then the developers will write enormous libraries (c.f. Object-Relational Mapping tools) so they don’t have to deal with the query language either.
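The ORM analogy can be made concrete with a toy sketch: a hypothetical query builder that hides SPARQL syntax behind chained method calls, the way ORM tools hide SQL. Nothing here is a real library; the class and method names are invented for illustration.

```python
# A minimal, hypothetical sketch of the ORM-style pattern described above:
# a tiny query builder that hides SPARQL syntax behind Python method calls.
# None of these names come from a real tool.

class TripleQuery:
    """Builds a simple SPARQL SELECT query without the user writing SPARQL."""

    def __init__(self):
        self._patterns = []

    def where(self, subject, predicate, obj):
        """Add a triple pattern; strings starting with '?' are variables."""
        self._patterns.append((subject, predicate, obj))
        return self  # allow chaining, as ORM-style builders usually do

    def to_sparql(self):
        def term(t):
            # Variables pass through; everything else is wrapped as a URI ref.
            return t if t.startswith("?") else f"<{t}>"
        body = " . ".join(
            f"{term(s)} {term(p)} {term(o)}" for s, p, o in self._patterns
        )
        return f"SELECT * WHERE {{ {body} }}"

q = TripleQuery().where(
    "?item", "http://purl.org/dc/terms/isPartOf", "http://example.org/coll/1"
)
print(q.to_sparql())
# SELECT * WHERE { ?item <http://purl.org/dc/terms/isPartOf> <http://example.org/coll/1> }
```

The user states "items in this collection" in their own terms and never sees the query language – which is exactly the pattern that played out with SQL.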

Ben O’Steen recently posted on Linking resources [in Fedora] using RDF, and one part particularly jumped out at me:

The garden variety query is of the following form:

“Give me the nodes that have some property linking it to a particular node” – i.e. return all the objects in a given collection, find me all the objects that are part of this other object, etc.

I think the common-or-garden query is “I’m interested in uri:foo, show me what you’ve got”, which is the same, but doesn’t require you to know the data model before you make the query. Wouldn’t it be cool to have a tech that gave you the “interesting” sub-graph for any URI? Maybe the developer would have to describe “interestingness” in a class-based way, or it could be as specific as templates (I suspect Fresnel could be useful here, but I looked twice and still didn’t really get it). Whatever the solution looks like, I doubt that a query language as general and flexible as SPARQL will be the best basis for it, for the reasons Andy Newman gives – what’s needed is a query language where the result is another graph.
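As a toy sketch of what “show me what you’ve got about uri:foo” might return, here is a minimal function that walks outward from a URI to a fixed depth and hands back the sub-graph as a set of triples. The hop-distance cutoff is just a stand-in for the class- or template-based notion of “interestingness” discussed above, and the data is invented.

```python
# A toy sketch of returning the "interesting" sub-graph around a URI.
# "Interestingness" here is simply hop distance from the starting node -
# a placeholder for smarter, class- or template-based policies.

def interesting_subgraph(triples, uri, depth=2):
    """Return the triples reachable from `uri` as subject, up to `depth` hops."""
    result = set()
    frontier = {uri}
    for _ in range(depth):
        next_frontier = set()
        for s, p, o in triples:
            if s in frontier and (s, p, o) not in result:
                result.add((s, p, o))
                next_frontier.add(o)  # follow the object on the next hop
        frontier = next_frontier
    return result

data = [
    ("uri:foo", "dc:title", "An article"),
    ("uri:foo", "ore:aggregates", "uri:foo/pdf"),
    ("uri:foo/pdf", "dc:format", "application/pdf"),
    ("uri:bar", "dc:title", "Unrelated"),
]
for triple in sorted(interesting_subgraph(data, "uri:foo")):
    print(triple)
```

Note that the result is itself a graph, not a table of variable bindings – which is the property the post argues the eventual query language needs.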

One thing and another meant I was unable to blog final thoughts and summaries about the CRIG unconference that I attended last week, so this is rather long, being a combination of the post I would have written Friday afternoon and a couple of consequent thoughts.

Firstly, on unconferencing, or at least the way it was implemented for CRIG; I like it a lot. Since we were partly a talking shop and partly a workshop to refine interoperability scenarios and challenges, the main session worked essentially like a lightweight breakout session system – topics were assigned to whiteboards, and people chose topics, migrated, discussed and withdrew as they wished. It was leagues more interesting and more productive than being assigned a breakout group with a topic. Successive rounds of dotmocracy helped to sort out the zeitgeist from the soapboxes. I could see this format working extremely well for e.g. the Cambridge BarCamp, or the e-Science All Hands meeting.

This was the first face to face meeting of the CRIG members as CRIG members, and really helped to frame the agenda for CRIG. I realized that there are some big issues underlying repositories that only become really important when discussing interoperability. For example, I can see OAI-ORE creating the same kind of fun and games around pass-by-ref, pass-by-val that the community currently enjoys when discussing identifiers, and just like identifiers, it touches just about every scenario.

One message came out pretty strongly: the emphasis on repositories isn’t useful in itself. One of the topics for discussion that passed dotmocracy (i.e. was voted as something people wanted to talk about) was “Are repositories an evolutionary dead end?”, a theme picked up by David Flanders. Well, I personally don’t think so, but then I’ve probably got a more malleable definition of “the R word” (as Andy Powell puts it) than most. If I’ve read the mood correctly, people are beginning to regard centralized, data-storing, single-software, build-it-and-they-will-come IRs as a solution looking for a problem. Some regard repositories as a complete diversion; others think we should act on our improved understanding of the problems in academic content management and dissemination by acknowledging failed experiments and moving on quickly. Nobody gave me the impression that they thought the current approaches would work given a couple more years, more effort or more funding.

This has all been said before; when the conference was over, I reminded myself of Cliff Lynch’s 2003 definition of institutional repositories, which describes them in terms of services, collaboration, distributed operation and commitment. If you haven’t read it, or haven’t read it in a while, go back and take a look – it’s in the 5th para.

Whilst it’s only a view of how things should be, I think it’s a good view, and it neatly sums up what’s important about repository interoperability – it’s about the interaction between systems needed to achieve a repository.

Discussions this morning:

What are we GETting? How to answer questions like “where’s the license for this resource”, “where’s the thumbnail of this large image”. There was talk of content negotiation services (e.g. http://openurl.myrepo.com/open?resourceid=567&action=license). The alternative (which I strongly favour) is to use descriptive links (e.g. link rel).
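For illustration, the descriptive-link alternative could be carried as typed links in an HTTP Link header (or <link rel=…> in HTML). The sketch below builds such a header and picks a link back out of it; the URLs are invented, not any repository’s real API.

```python
# A minimal sketch of the descriptive-link alternative: the resource's own
# response carries typed links to related resources (license, thumbnail, ...)
# instead of a separate content negotiation service. URLs are made up.

def build_link_header(links):
    """Serialise a {rel: url} mapping as an HTTP Link header value."""
    return ", ".join(f'<{url}>; rel="{rel}"' for rel, url in links.items())

def find_link(header_value, rel):
    """Pick the URL for a given rel out of a Link header value."""
    for part in header_value.split(", "):
        url, _, params = part.partition("; ")
        if f'rel="{rel}"' in params:
            return url.strip("<>")
    return None

header = build_link_header({
    "license": "http://example.org/obj/567/license",
    "thumbnail": "http://example.org/obj/567/thumb.png",
})
print(find_link(header, "license"))
# http://example.org/obj/567/license
```

The appeal is that a client answers “where’s the license?” by following a labelled link from the resource itself, rather than knowing the query conventions of a separate service.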

Problems and opportunities in utility computing (using EC2 / S3 etc.). The problems are most often extremely prosaic – persuading the institution to provide a credit card with an unknown spend. Probably the best idea that came out was to use utility computing for a personal repository – your institution covers the costs and adds its branding while you work with them, and you can take your personal repo with you easily when you move between institutions.

Multiple submission (e.g. to IR + subject repo + RAE tracker etc.). As users, we’d like a single submission system for all these systems, e.g. put a presentation in ‘the system’ and have it propagated to SlideShare + IR. As an observation, there are huge issues in pass-by-val (c.f. packaging) / pass-by-ref (c.f. ORE) that are not going to be resolved soon (probably not at all).
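The pass-by-val / pass-by-ref tension can be sketched with a toy example: deposited by value, every target system holds its own copy of the content, and the copies drift apart as soon as one is corrected; deposited by reference, every target holds a pointer to one authoritative copy. All the names and data here are hypothetical.

```python
# A toy illustration of pass-by-val vs pass-by-ref in multiple submission.
# By value: each target stores its own copy (copies can drift apart).
# By reference: each target stores a pointer to one authoritative copy.

def deposit_by_value(targets, content):
    """Each repository stores its own copy of the bytes."""
    for repo in targets:
        repo["items"].append({"content": content})

def deposit_by_reference(targets, uri):
    """Each repository stores only a reference to the authoritative copy."""
    for repo in targets:
        repo["items"].append({"ref": uri})

ir = {"name": "IR", "items": []}
subject_repo = {"name": "subject", "items": []}

deposit_by_value([ir, subject_repo], b"slides-v1")
# A later correction only reaches the copy we remember to update:
ir["items"][0]["content"] = b"slides-v2"
print(subject_repo["items"][0]["content"])  # still b"slides-v1" - copies drift

store = {"uri:pres/1": b"slides-v1"}  # one authoritative store
ir_ref = {"name": "IR", "items": []}
deposit_by_reference([ir_ref], "uri:pres/1")
store["uri:pres/1"] = b"slides-v2"  # one update; every reference sees it
print(store[ir_ref["items"][0]["ref"]])
```

Of course, pass-by-ref buys this consistency at the price of every consumer depending on the authoritative copy staying resolvable – which is where the identifier fun and games come back in.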

P.S. can you guess the theme of the post titles?

[In previous episodes]: I’m at a JISC Common Repositories Interfaces working Group unconference event. This is novel to most of the participants, so we’re learning about the format as well as discussing the issues. The first day consisted of introductions and some brainstorming-type exercises designed to bring out issues to take forward into the unconference.

Last night’s networking event was essentially a continuation of the discussion during the day. The change of location was useful, though; a lot of the early conversation was of the form “I’m amazed we didn’t talk about …”. The great thing about the unconference format is that we can fix those problems easily, rather than going home from the conference thinking that it was all very interesting, but didn’t really tackle burning issues.

One of the random thoughts that came up last night: communication between people involves semantic loss. To put it another way, you have a set of meanings you associate with the word “communication”, and so do I. They’re unlikely to be the same, but here we are, you reading (probably wondering why at the moment) and me writing. This isn’t a problem, because we naturally know that this is happening, and have ways of avoiding (rather than preventing) problems – like redundancy (e.g. “To put it another way…”). Perhaps the starting point for any interoperability should be about “good enough” and redundancy should be encouraged?

The first session of the unconference turned out to be a kind of brainstorm to extract pertinent issues from the mindmaps generated through the preparatory chats.

The next step is a round of ‘dotmocracy’, which is a way of getting a bit of consensus on which of these issues people are interested in.

The last chat I was part of brought up the question of why we should bother with digital preservation. The argument against it usually goes that if people find resources useful they will preserve them anyway. I personally think that a kind of public interest theory applies, because the current value of a resource is often lower than its future value – intervention is needed to protect that future value.

On reflection, though, it’s not the issue we should be discussing at an interoperability meeting. What we should be thinking about is “If someone wanted to preserve the resources in a repository, what interfaces / services would they need to be provided with?”

(I’m blogging this now because I don’t expect preservation to make the cut after dotmocracy).

CRIG Live un-blogging

December 6, 2007

I’m at the JISC CRIG (Common Repositories Interfaces working Group) two-day Unconference today and tomorrow. We’re about to start the unconference proper.

This will all make sense very soon. Just follow me blindly for now.

David Flanders.

Well, here goes…