Jumbo 5.4 released

September 25, 2007

The final release of Jumbo 5.4 is available from the SourceForge downloads page, or from the WWMM Maven repository at http://wwmm.ch.cam.ac.uk/maven2/ with g:cml a:jumbo v:5.4.

The code formerly known as CIFDOM is now in its new home at https://cml.svn.sourceforge.net/svnroot/cml/cifxml/trunk . The package for the library is now org.xmlcml.cif (rather than uk.co.demon.ursus.cif).

Maven users can obtain the new package from the WWMM repo at http://wwmm.ch.cam.ac.uk/maven2/ with groupId:cml artifactId:cifxml version:1.2-SNAPSHOT.

The latest CIFDOM code is still available from sourceforge or the WWMM repo g:cml a:cifdom v:1.1

Just picked this up on the digital preservation list – the national library and national archives in the Netherlands have announced Dioscuri, a digital preservation emulator. Dioscuri “is capable of emulating an Intel 8086-based computer platform with support for VGA-graphics, screen, keyboard, and storage devices like a virtual floppy drive and hard drive.” As such it can emulate 16-bit operating systems.

This is interesting in itself – there are some great arguments for emulation as an approach to preservation and it’s good to see more progress in the area. But what’s really exciting about Dioscuri is that it has taken only 2 man-years to develop. The expense and difficulty in writing emulators has often been used as an argument against emulation, as has the perceived need for the efforts to be centralized. It seems Dioscuri changes the game!

In an interesting post, Karen Coombs shares some of her issues in relating their library web site redevelopment to the need to provide web services to the rest of the university: –

If faculty could do their searches without coming to the library site would they? I think the answer is yes.

Long term I’d like a site which has a series of web services that can be exploited by my developers but also by the university web developers and who knows who else. Focusing on content rather than look and feel will allow us to provide these different types of services. It will also allow different types of users to potentially selectively access content.

I don’t think I’ve read anything like this outside a REST advocacy presentation!

Ultimately, I feel like it is these kinds of services that will make or break a library’s virtual presence, not the library website. And with a limited staff, this means I like to choose carefully how much time I have my small staff spend on the traditional site. Otherwise, we could spend all our time caught up in look and not enough time working to make the library meet users where they are and be a seamless part of their work processes.

This doesn’t have to be a choice. Because Karen is concentrating on content, she is in a superb position to deliver the services she describes through the website, using good semantic markup, linked resources and well tempered feeds or sitemaps, using The REST book as a manual. This is an advantage of REST I hadn’t fully grokked; it’s cheaper. If you already have a website and need to provide service to your users, it’s quicker and easier to develop the website further RESTfully than to start an entirely separate service delivery development.

My congratulations to Peter, Peter, Colin, Richard and all involved in making Project Prospect such a success.

Why InChIKey?

September 12, 2007

Egon Willighagen has posted on the release of the latest InChI software. Egon (and others) are concerned about the implementation, especially that InChIKeys aren’t guaranteed unique. At a more basic level, I’m wondering whether people agree with the stated needs for InChIKey.

Facilitate web searching
Even though Google are coping with InChI very well, having a representation of InChI that didn’t break standard tokenization routines, and that could be attractively included in prose would be handy.
Allow development of a web based lookup service
Not really sure what’s meant here. As Egon pointed out in the comments to his post, he already has one of these, and it didn’t require InChIKey!
Permit an InChI representation to be stored in fixed length fields, making chemical structure database indexing easier
Because RDBMSs have such a hard time indexing VARCHAR? Really?
Allow verification of InChI strings after network transmission
This is not a problem that needs solving again – using MD5SUMs would do the same job.
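As a rough illustration of that last point, here is a minimal sketch of checking an InChI string after transmission with a plain MD5 checksum from the Java standard library. The class name, the method names and the sample InChI are assumptions made up for this example; nothing here comes from the InChI software itself.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
class InchiChecksum {

    // Returns the MD5 digest of the text as 32 lowercase hex characters.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            // Treat the 16 digest bytes as an unsigned number, zero-padded to 32 hex digits.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // True if the received string hashes to the checksum the sender supplied.
    static boolean arrivedIntact(String received, String expectedMd5) {
        return md5Hex(received).equalsIgnoreCase(expectedMd5);
    }

    public static void main(String[] args) {
        String sent = "InChI=1/CH4/h1H4";      // sample identifier (methane), assumed for the demo
        String checksum = md5Hex(sent);        // the sender transmits this alongside the InChI
        String truncated = "InChI=1/CH4/h1H";  // simulated damage in transit

        System.out.println(arrivedIntact(sent, checksum));
        System.out.println(arrivedIntact(truncated, checksum));
    }
}
```

The checksum travels alongside the InChI rather than replacing it, which is all that verification needs.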

I make that one out of four and would argue that the only problem with InChI is the length of the identifiers and the issues caused by the characters used. This could be solved by having a centralized service that assigned short HTTP URLs for InChIs, ensuring a one to one relationship between InChIs and shorthand URLs.
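To make the idea concrete, here is a minimal in-memory sketch of such a registry. Everything in it is hypothetical: the class name, the `http://example.org/inchi/` base URL and the base-36 counter are assumptions for illustration, and a real service would also need persistence and agreement on which form of the InChI is treated as canonical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: assigns each distinct InChI exactly one short URL.
class InchiShortUrlRegistry {
    private final String baseUrl;
    private final Map<String, String> urlByInchi = new HashMap<>();
    private final Map<String, String> inchiByUrl = new HashMap<>();
    private long nextId = 1;

    InchiShortUrlRegistry(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Returns the existing short URL for this InChI, or assigns a new one.
    synchronized String shorten(String inchi) {
        String existing = urlByInchi.get(inchi);
        if (existing != null) {
            return existing;
        }
        // A counter, unlike a hash, can never hand the same URL to two InChIs.
        String url = baseUrl + Long.toString(nextId++, 36);
        urlByInchi.put(inchi, url);
        inchiByUrl.put(url, inchi);
        return url;
    }

    // Returns the InChI behind a short URL, or null if the URL was never assigned.
    synchronized String resolve(String url) {
        return inchiByUrl.get(url);
    }

    public static void main(String[] args) {
        InchiShortUrlRegistry registry = new InchiShortUrlRegistry("http://example.org/inchi/");
        String url = registry.shorten("InChI=1/CH4/h1H4");
        System.out.println(url + " -> " + registry.resolve(url));
    }
}
```

Because identifiers are handed out centrally rather than derived by hashing, uniqueness holds by construction. The cost is that the service has to be consulted to mint or resolve one.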

Java is the new Bash?

September 7, 2007

Shock! Dilbert programs in Java!

Since that link probably won’t survive the test of time: today’s Dilbert cartoon has Dilbert coding an incompetent co-worker’s job in Java. This is (hopefully) a reference to the original line: “I will replace you with a shell script. A very short shell script.” Amusing how this translates into Java in Dilbert: –

  • Java is the language of choice for replacing incompetent people
  • Even an incompetent person’s competent functions can’t be coded in a short app

Damnation by slight praise, then?

Foo-Oriented Software

September 5, 2007

Erlang is oh-so-hot at the moment (which must be a novel experience for such an old, mature language), but Tim Bray isn’t convinced: –

… I think that the human mind naturally thinks of solving problems along the lines “First you do this, then you do that” and thinks that Variables are naturally, you know, variable, and has grown comfortable with living in a world of classes and objects and methods.

This got me thinking. My reaction to Erlang variables was the same: “they’re not variables, why don’t you call them something else?”. I think the answer is basically that it’s easier to understand the concept by remembering them as variables that aren’t, rather than having to build up a whole new concept. The other thing about variables is that they’re not all that instinctive when you’re learning to program – they don’t work like in maths, so an expression looks like an equation, but doesn’t (in imperative languages) work like one. In Erlang, it does (or it fails ;-)).
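A small Java sketch of the contrast, since Java is the imperative language closest to hand here. The class and method names are made up for illustration, and `final` is only a rough analogy for Erlang’s single assignment: Java rejects the reassignment at compile time, whereas Erlang fails to match at run time.

```java
// Hypothetical example: assignment that merely looks like an equation.
class AssignmentVsEquation {

    // Imperative style: "x = x + 1" overwrites x. Read as an equation it has no solution.
    static int incrementTwice(int start) {
        int x = start;
        x = x + 1;
        x = x + 1;
        return x;
    }

    // Single-assignment style: each name is bound once, so every line reads as a definition.
    static int singleAssignment(int start) {
        final int x = start;
        // x = x + 1;  // would not compile: a final local cannot be reassigned
        final int y = x + 1;
        return y;
    }

    public static void main(String[] args) {
        System.out.println(incrementTwice(1));
        System.out.println(singleAssignment(1));
    }
}
```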

It is practically impossible to teach good programming style to students that have had prior exposure to Basic; as potential programmers they are mentally mutilated beyond hope of regeneration. — Edsger Dijkstra

Most people start OO code by writing huge long main methods and writing most other code in static methods, with some objects only if they need data structures, i.e. the most natural way to solve a problem is to start somewhere and perform a sequence of actions. So why use OO at all? To abstract the solution to solve a whole family of problems, to decompose the problem to make it easier to solve and to make the solution easier to understand etc. The thing I like best about Java is that it’s fairly easy to create code that can be easily understood by others. I suspect that this quality is at the root of Java’s adoption, and that the tools are as important as the language features (or lack thereof…); Javadoc’s contribution shouldn’t be underestimated.

More investigation is required to see how well Erlang does in creating comprehensible, reusable code!