Charles W. Bailey Jr. has blogged some statistics on repository software adoption, based on the OpenDOAR directory.

1) Before drawing conclusions about the comparative uptake of the software packages, it’s worth considering what’s being counted here – some repository packages support, and tend to be used for, multiple collections, while others tend not to be used in this way.

2) Some of the packages seem to be regionally focussed (although again you need to be careful, since the normalization is by country, e.g. 14% of NL != 14% of US). This suggests that community is a crucial factor in software adoption for digital repositories.

I don’t usually geek out over hardware, and I try to resist becoming a sysadmin myself, but I found this interesting: Everything you know about disks is wrong. It has some implications for how preservation systems are run, and is in part a fillip for the LOCKSS approach.

I’ve been mulling for a while over what programming languages are going to be like a few years from now. A current angle I’m taking is to look at the trends in hardware development and conclude that a good programming language for the future will make writing concurrent code as easy as possible. When I mentioned this, Andrew Walkingshaw said I should take a look at Erlang.

I’m impressed so far – a simple example of a two-thread program with message passing fits comfortably into 30 lines, and you can locate either thread on a local node or a remote node with minor code changes. And I’ve realized something else – if concurrency does end up driving language choice in the way I think it might, then functional languages are going to get a lot more limelight than they have previously.
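To give a flavour of the message-passing style without reproducing the Erlang itself, here is a minimal sketch in Python. It is only an analogy: Python threads and `queue.Queue` stand in for Erlang processes and mailboxes, and the names `ponger` and `run_ping_pong` are made up for illustration. The two threads share nothing except the queues they exchange messages on.

```python
import queue
import threading


def ponger(inbox, outbox):
    """Reply 'pong' to each incoming message until 'stop' arrives."""
    while True:
        message = inbox.get()  # blocks until a message is available
        if message == "stop":
            break
        outbox.put("pong")


def run_ping_pong(rounds):
    """Send `rounds` pings to a second thread and collect its replies."""
    to_ponger = queue.Queue()  # mailbox of the worker thread
    to_pinger = queue.Queue()  # mailbox of the calling thread
    worker = threading.Thread(target=ponger, args=(to_ponger, to_pinger))
    worker.start()
    replies = []
    for _ in range(rounds):
        to_ponger.put("ping")
        replies.append(to_pinger.get())  # wait for the matching reply
    to_ponger.put("stop")
    worker.join()
    return replies
```

Calling `run_ping_pong(3)` exchanges three ping/pong pairs and then shuts the worker down. What this sketch cannot show is the part that impressed me most, namely moving one side to a remote node with minor changes.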

Erlang comes from the telecoms world, and many of the current production applications reflect that – multithreaded mail, web and messaging servers. But it’s also easy to see Erlang being used in an eScience setting, both to write programs that take advantage of SMP nodes and to orchestrate batch jobs across clusters (and so on).

Andrew has also pointed me at Fortress, which is even higher level than Erlang with respect to threading (some operations like “for” loops are implicitly multi-threaded). Another one to check out!

The truth is out where?

February 21, 2007

I would be fairly surprised if any of the readers of this blog didn’t also read Jon Udell, but just in case: Jon Udell has been blogging about the two fundamental approaches to maintaining metadata about files.

One way is to use metadata embedded in the file (or potentially in a sidecar file). The other is to “stand-off” the metadata in a database. This debate will sound familiar to many, and especially to anyone who followed the early DSpace 2 discussions in 2005, where this was one of the most contentious issues.

Now to my mind, if you’re just talking about tagging photos then the correct approach is to keep the data in the file – most OSes have mechanisms to observe file changes, so it seems sensible to let all the applications that index tags just observe changes.

Isn’t this the case for a repository application like DSpace? It would make for really great decoupling if we could just let multiple update components independently write to a filesystem and have reading components observe the changes. It would also probably scale well with respect to system complexity and load, since any work done by the observers wouldn’t block the update.
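As a rough sketch of what a reading component might do, here is a polling observer in Python. Note the assumption: polling modification times stands in for the OS-level change notification mentioned above, which a real observer would more likely use. The function names `scan` and `changed_since` are hypothetical, and deletions are not reported.

```python
import os


def scan(root):
    """Map each file path under `root` to its modification time in nanoseconds."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.stat(path).st_mtime_ns
    return snapshot


def changed_since(previous, current):
    """Return the sorted paths that are new or modified between two snapshots."""
    return sorted(
        path for path, mtime in current.items() if previous.get(path) != mtime
    )
```

An indexer would keep the last snapshot, call `scan` again on a timer, and re-read only the files returned by `changed_since`. The writers never wait for it, which is the decoupling I have in mind.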

The big wrinkle comes when you want to constrain the metadata you write. If you write a “part of” field referring to another resource in the repo, does that resource exist? Are the controlled vocab fields valid?* If you want to maintain consistent state like this you need a single point of access for storage updates.
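To make the constraint problem concrete, here is a minimal sketch in Python of a single point of access that checks both kinds of constraint before accepting a write. Everything in it is an assumption for illustration (the `Repository` class, the `part_of` field name, the in-memory storage); it is not how DSpace does it.

```python
import threading


class Repository:
    """Hypothetical store that validates metadata before accepting a write."""

    def __init__(self, vocabularies):
        self.vocabularies = vocabularies  # field name -> set of allowed values
        self.items = {}                   # item id -> metadata dict
        self._lock = threading.Lock()     # serializes all updates

    def put(self, item_id, metadata):
        """Store `metadata` for `item_id`, or raise ValueError if it is invalid."""
        with self._lock:
            parent = metadata.get("part_of")
            if parent is not None and parent not in self.items:
                raise ValueError(f"part_of refers to unknown item {parent!r}")
            for field, allowed in self.vocabularies.items():
                if field in metadata and metadata[field] not in allowed:
                    raise ValueError(
                        f"{metadata[field]!r} is not valid for {field!r}"
                    )
            self.items[item_id] = dict(metadata)
```

The lock is the price: the check and the write have to happen together, so every update queues behind every other update.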

Of course you don’t have to maintain consistent state. You can accept that state might be corrupted and make sure that anything dealing with relationship metadata is fault tolerant. The web has scaled well, because of (rather than despite) the ability to create broken links, and to break them by moving content.
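A fault-tolerant reader might look something like the following Python sketch. The `part_of` field name and both function names are assumptions for illustration: lookups treat a dangling reference as "no parent" rather than an error, and a separate pass lists the broken links so they can be cleaned up later.

```python
def resolve_parent(items, item_id):
    """Return the parent's metadata, or None if there is no parent or the link is broken."""
    parent_id = items[item_id].get("part_of")
    if parent_id is None:
        return None
    return items.get(parent_id)  # None when the referenced item is missing


def find_dangling(items):
    """Return sorted (item id, missing parent id) pairs for later cleanup."""
    return sorted(
        (item_id, metadata["part_of"])
        for item_id, metadata in items.items()
        if metadata.get("part_of") is not None
        and metadata["part_of"] not in items
    )
```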

I’m undecided about what this means for repository systems. Are the rigours of worrying about transactions, correctness and constraints more of a pain than cleaning up after (hopefully infrequent) conflicts occur?

* A couple of years ago I was chatting to Mick Bass (now head of the SIMILE project) and wondered how you could do this using RDF. “How closed-world of you” was the reply. Why is the answer always slightly zen when the question involves the semantic web?

Eloy Rodrigues has posted to dspace-dev announcing the release of new versions of Minho University’s Statistics and Request-Copy add-ons. These releases bring DSpace 1.4 compatibility and some new features:

Statistics Add-on:
– Detection, processing and distinction of access/downloads from the repository institution
– New 3:4 country flags (courtesy of Arthur Sale)

Request Copy Add-on:
– Change access to Open Access option at the end of the replying process.

Elliotte Harold on Java 7

February 1, 2007

Whilst I don’t agree with all of the points he’s making, anyone who’s interested in the future of Java as a language or generally geeks out on programming language development should check out Elliotte Harold’s recent blog posts. Start here.

Elliotte is conservative in his approach to Java, arguing that additional syntaxes, and ones which make constructs implicit, are bad for learning a language and bad for understanding programs. They’re valid points. On the other hand, I can also see that there’s no point understanding programs that no-one runs any more because the programmers got bored and went off to program in .NET or Ruby, which is what Sun must be worried about.

In his latest post he takes apart a proposed syntax for cleaning up functor anonymous inner classes (implementations of interfaces with only one method). I’m not entirely convinced by it either, but I am convinced by the requirement – if Java is going to be useful in an increasingly distributed environment it needs better syntax for doing things like callbacks and threads.

Previously, when Elliotte was disparaging about the properties proposals (which are almost totally over-complex in my eyes) he had a better proposal waiting in the wings – hopefully that’s the case this time!