Repository architectures, leaky abstractions and Paul’s principles

July 7, 2008

Last Thursday I attended a JISC workshop on repository architectures. It was a thought-provoking day, and I learnt a lot. Firstly, I learnt that I need to pay more attention to context when quoting people on Twitter (sorry, @paulwalk).

Paul Walk kicked off the day by presenting his work on a next generation architecture for repositories. His presentation opened with a number of starting principles and moved on to some diagrams illustrating a specific architecture based on them. As Paul mentions in his blog post, his diagrams and the principles behind them were “robustly challenged”. As far as I remember, the diagrams were challenged more robustly than the principles.

To cut a longish story short, the discussion and the workshop exercises brought up some interesting ideas, but did relatively little either to validate Paul’s architecture diagrams or to provide a working alternative. Chatting about it over lunch and later over a pint, I was persuaded that we were looking for an abstraction that doesn’t exist, and that the desire for a single generic repository architecture might have led us down the garden path.

Software engineering, being a field that values pragmatic epistemology, has a couple of empirically derived laws that might help to explain why. Firstly, Larry Tesler’s law of the Conservation of Complexity states (in a nutshell) that complexity can’t be removed, just moved around. A natural way to manage this complexity is to find abstractions that hide some of it. This, fundamentally, is what the repositories architecture is trying to do – reduce the multiplicity of interests, politics and data-borne activities of HE into a single abstract architecture.

A second empirical law, the Law of Leaky Abstractions, states that all non-trivial abstractions leak: some of the complexity cannot be hidden behind the abstraction and leaks through. It feels to me that this is what’s happening with repositories at the moment. Our abstraction (centralization, services provided at point of storage, etc.) fails to cope with real, current complexities. The problem itself is extremely complex, and if anyone really has their head around it, they’ve still got the hard task of communicating it to the whole community so a good shared abstraction can be developed.

I found myself going back to Paul’s starting principles, and concluded that they were a much more constructive framework for thinking about repository issues than the concrete architectures in the diagrams. Paraphrasing the principles:

  • Move necessary activity to the point of incentive
  • [Terms of reference for IRs]
  • Pass by reference, not by copy
  • Move complexity towards the point of specialisation
  • Expect and accept increasing complexity on the local side of the repository with more sophisticated workflow integration.

With the exception of the point on IRs, they are all forms of guidance on complexity, either where to move it (“Move [metadata] complexity towards the point of specialisation”), or which trade-offs to make (“Pass by reference, not by copy” => “Prefer to deal with the complexities of references rather than the complexities of duplication”). The reason I like this approach is that different disciplines, institutions and activities (e.g. REF, publication, administration) all have different complexities and different drivers. Perhaps we need a number of different architecture abstractions based on constraints and drivers. Perhaps the idea of an architecture abstraction is premature in this community and we should focus on local solutions (in the sense of ‘minima’ rather than geography). This needn’t end in technical balkanization; the repositories architecture is driven by business models, and focusing on interoperability and the web architecture allows more of the technical discussion to happen in parallel.
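To make the “pass by reference” trade-off a little more concrete, here is a minimal sketch in Python. It is an illustration only, not anything proposed at the workshop: the names ReferenceResolver and Record, the identifier "item/42" and the example.org URLs are all hypothetical. The idea is that records hold a stable identifier rather than a copy, and a small indirection layer maps that identifier to wherever the content currently lives.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReferenceResolver:
    """Hypothetical indirection layer: stable identifier -> current location."""

    _locations: Dict[str, str] = field(default_factory=dict)

    def register(self, stable_id: str, location: str) -> None:
        self._locations[stable_id] = location

    def move(self, stable_id: str, new_location: str) -> None:
        # The content has moved; only this one mapping has to change.
        if stable_id not in self._locations:
            raise KeyError(f"unknown identifier: {stable_id}")
        self._locations[stable_id] = new_location

    def resolve(self, stable_id: str) -> str:
        return self._locations[stable_id]


@dataclass
class Record:
    """A repository record that holds a reference rather than a copy."""

    title: str
    content_ref: str  # stable identifier, not the content itself


if __name__ == "__main__":
    resolver = ReferenceResolver()
    resolver.register("item/42", "https://old.example.org/files/42.pdf")

    # Two systems describe the same item; neither stores its own copy.
    institutional = Record("Thesis", "item/42")
    departmental = Record("Thesis (departmental list)", "item/42")

    # The file is rehoused; both records still resolve to the new location.
    resolver.move("item/42", "https://new.example.org/objects/42.pdf")
    for record in (institutional, departmental):
        print(record.title, "->", resolver.resolve(record.content_ref))
```

The complexity has not gone away: someone now has to run and maintain the resolver. That is the trade-off the principle asks us to prefer over keeping several copies in sync.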

To get the ball rolling, I’d like to add a caveat to Paul’s “Move [metadata] complexity towards the point of specialisation”: “… unless it’s there already and it’s harder to recreate than maintain”. Any more?


Andrew McGregor has posted extensive minutes and notes from the meeting.

4 Responses to “Repository architectures, leaky abstractions and Paul’s principles”

  1. Where I get stuck on “pass by reference” is at the point of ingest. Are we expecting SlideShare, Flickr, etc. to pass stuff to us by reference, or are we going to go out and get it?

  2. ojd20 Says:

    1: Flickr et al. already do pass us references, insofar as they use nicely structured URLs and (presumably) understand how much it would upset people if they changed them.

    If I’ve understood your question, a question in reply is “Are Flickr’s URLs trustworthy enough for your application?”. If not, you might want to copy the content to somewhere that’s going to give you a more trustworthy reference, and/or use an indirect reference (e.g. a PURL) so you can move it later without breaking the reference.

    I think that principle could be rewritten as “Don’t create duplicates without good cause or considering the problems created”.

  3. Paul Walk Says:

    a useful commentary – thanks. I think I reached the same conclusion – that architecture diagrams at this stage are a distraction from the guiding principles.

    I’ll be revising the work based on comments such as these and will include your suggestions for new principles.


  4. Daniel Craig Says:

    Hey, I was looking around for a while searching for data architecture and I happened upon this site and your post regarding Repository architectures, leaky abstractions and Paul’s principles. I will definitely add this to my data architecture bookmarks!
