SPECTRa released

November 28, 2007

Now that a number of niggling bugs have been ironed out, we’ve released a stable version of the SPECTRa tools.

There are prebuilt binaries for spectra-filetool (a command-line tool for batch validation, metadata extraction and conversion of JCAMP-DX, MDL Mol and CIF files) and spectra-sub (a flexible web application for depositing chemistry data in repositories). The source code is available from the spectra-chem package, or from Subversion. All of these are available from the spectra-chem SourceForge site.

Mavenites can obtain the libraries (and source code) from the SPECTRa maven repo at http://spectra-chem.sourceforge.net/maven2/. The groupId is uk.ac.cam.spectra – browse around for artifact ids and versions.
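If you're browsing by hand, it helps to know that a Maven 2 repository stores files under a fixed layout: the groupId's dots become directory separators, followed by artifactId/version/artifactId-version.jar. The sketch below builds those URLs from the repository address and groupId given above. The artifact id and version in the usage line are hypothetical placeholders, not real SPECTRa artifacts, so check the directory listing for the actual names.

```python
# A minimal sketch of the standard Maven 2 repository layout, used to work
# out which URL to browse for the SPECTRa artifacts.
REPO = "http://spectra-chem.sourceforge.net/maven2/"
GROUP_ID = "uk.ac.cam.spectra"


def group_url(repo: str, group_id: str) -> str:
    """Directory that lists every artifact id published under a groupId."""
    return repo.rstrip("/") + "/" + group_id.replace(".", "/") + "/"


def artifact_url(repo: str, group_id: str, artifact_id: str,
                 version: str, ext: str = "jar") -> str:
    """URL of one artifact file: <group path>/<artifact>/<version>/<artifact>-<version>.<ext>."""
    return (group_url(repo, group_id)
            + f"{artifact_id}/{version}/{artifact_id}-{version}.{ext}")


if __name__ == "__main__":
    # Start here and browse for the real artifact ids and versions.
    print(group_url(REPO, GROUP_ID))
    # "example-artifact" and "1.0" are hypothetical placeholders.
    print(artifact_url(REPO, GROUP_ID, "example-artifact", "1.0"))
```

The first URL printed is the directory to browse. Once you've found a real artifact id and version there, the second function gives the path of the jar itself.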


The JISC-funded eCrystals project began a fortnight ago, and “will establish a solid foundation of crystallography data repositories across an international group of partner sites”. It’s being led by Simon Coles at the UK National Crystallographic Service, and we’re core partners along with UKOLN and the DCC. The project wiki is now available, so stick it in your aggregator to keep up to date.

eCrystals is an exciting opportunity for us to see the outcomes of SPECTRa and CrystalEye put into wider use. I’m looking forward to it!

Miscellaneous links

July 11, 2007

Chemistry Central (and PMR) link to the SPECTRa final report.

Nico Adams blogs about the Polymer Builder I’m now responsible for developing.

A couple of frameworks I’ve been playing with (for a pet project that got scooped by PMR at the end of a recent post).

Les Carr joins the blogosphere.

After a persistent shoulder problem was blamed on my work posture, I’ve been using break-reminder software to prompt me to take “straighten up” breaks. I can recommend both Workrave (Win, Linux) and AntiRSI (OSX); both are far less irritating than their predecessors of a couple of years ago.

I finished working for the SPECTRa-T project last Friday. As of today, I’m employed by the Unilever Centre. I’ll still be providing technical advice and software development management for the SPECTRa-T project, and I’ll stay involved in various JISC groups (especially the SWORD project, which, incidentally, is already garnering approval!).

The role at the Unilever Centre basically splits into two parts. I’ll be working as software development co-ordinator and integrator (pumpkin holder) on the polymer informatics project, and I’ll be expanding the work I’ve been doing in my ‘spare time’ providing support for software engineering in the centre, with a much greater emphasis on providing training and best practice advice.

The SPECTRa-T project has appointed a developer, who will arrive (and be introduced!) in a couple of weeks. In the meantime, if you’re interested in the project, some initial documentation is appearing on the spectra-chem SourceForge site.

Peter Sefton has blogged about his progress integrating chemistry into ICE. I’m excited about this, because chemical molecules are a good example of content that needs different representations depending on the output context: CML for machines, CML plus the Jmol applet for hypermedia documents, and PNG for static documents. This also has potential for us as part of SPECTRa-T, since it could provide a rich view of chemical metadata annotations.

The project web sites for the SPECTRa tools are now online, and I’ll be moving the help and documentation over in due course. The main point of the SPECTRa tools is to make building repository-ready packages of X-ray crystallography, NMR spectroscopy and Gaussian input files as easy as possible.

Once prepared, the packages can be saved to local storage for manual deposition or deposited into a DSpace repository (although this requires some customization of DSpace). Hopefully I’ll have time to write a SWORD client for it too, and I’ve been thinking about writing an S3 client for fun.

In other news:

  • Peter Suber blogged the SPECTRa report
  • The WWMM server is finally back on decent hardware with all its data. Hopefully there won’t need to be any more outages for a while.

Earlier this week I attended JISC’s Dealing with the Data Deluge conference, part of their digital repositories programme. The presentations were good and, more importantly, some very interesting thoughts were flying around in coffee rooms, dinner halls and pubs.

One of the standout presentations for me was John MacColl’s talk on the findings of the StORe project, which investigated issues around data repositories and the linking of research publication repositories to data repositories. Two points in particular caught my attention.

Firstly, StORe found that whilst academia treats PhD students very differently to postdoctoral researchers, their data management, curation and deposition requirements are the same. This is interesting from my point of view on the SPECTRa-T project: it’s reassurance that SPECTRa-T will be relevant to the wider problem of chemistry publications, even though our focus is on theses.

It’s also encouraging for anyone who wants the state of the art in data repositories to move forward. Progress will almost inevitably require changes in researchers’ behaviour, and PhD candidates tend to be more open to change.

The second thing that particularly caught my attention was StORe’s conclusion that data curation is a difficult task with which we cannot, and should not, burden researchers. Moreover, it’s so specialised that the expertise probably can’t be provided at an institutional level, but could be successfully provided by a number of (perhaps peripatetic) specialist data librarians (e.g. funded by JISC).

This strikes a chord. From my early experiences with chemistry data on the DSpace@Cambridge project, building the WWMM collection there, it was clear to me that a centralized institutional repository service could not hope to preserve specialist scientific data effectively. It seemed to me that preservation could only be achieved through collaboration between people with curation expertise (librarians) and people with domain expertise in data formats and trends. Thinking on it more, I’ve decided that this applies not just to “specialist scientific data” but to any data that isn’t in the usual run of office and web formats. John’s findings are a wider-ranging statement of this, applying to all of curation and not just to preservation. It’ll be interesting to see whether, and how, JISC or other funding bodies take this idea up.

As John pointed out (and Chris Rusbridge subsequently agreed), this all makes the AHRC’s strange decision to cease funding for the AHDS particularly disappointing, especially since the AHDS provides a service that’s pretty close to John’s vision. Let’s hope this petition has some positive impact.

SPECTRa on Sourceforge

May 24, 2007

I’ve just finished moving the SPECTRa code to the spectra-chem sourceforge.net site. The technical web sites will be moving over in due course, and I’ll be switching issue tracking to the SourceForge trackers too.

SPECTRaT is go!

March 16, 2007

JISC has approved our funding application for SPECTRa-T (Theses). SPECTRa-T will kick off at the start of April. It will look at extracting data from chemistry theses and depositing both the theses and the data in digital repositories, and it will collaborate with researchers on the SciBorg project on natural language processing.

Congratulations to all the SPECTRa team!

SPECTRa in the news

March 8, 2007

“Novel search engine matches molecules in a flash” is an article about a Proceedings of the Royal Society A paper (DOI: 10.1098/rspa.2007.1823) describing a new method of similarity searching. Interesting stuff, and Prof. Rzepa mentions SPECTRa near the end.