Tiny Clojure snippet

December 2, 2009

At the Cambridge Clojure meetup last night I decided to have a go at something work-related that I don’t usually get time for. The OSCAR3 chemistry entity extraction and text-mining tool uses n-grams to create feature vectors for classifying words. It turned out to be pretty short work with loop...recur, but then Nick Day pointed out that I’d just reimplemented partition, which turned the function into:

(defn ngrams [word n]
  (let [pad (repeat (dec n) \#)]
    (partition n 1 (concat pad word pad))))
(ngrams "foobar" 3)
-> ((\# \# \f) (\# \f \o) (\f \o \o) (\o \o \b) (\o \b \a) (\b \a \r) (\a \r \#) (\r \# \#))
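
For anyone reading along from the Java side, here is a rough sketch of the same idea: pad the word with n-1 '#' characters at each end, then slide a window of width n along it one character at a time. The class and method names are my own invention for illustration, not anything from OSCAR3, and it assumes n is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; a sketch of the padding-and-sliding-window idea only.
final class NGramSketch {

    // Returns every window of n consecutive characters from the padded word.
    // Assumes n >= 1.
    static List<String> ngrams(String word, int n) {
        // Build the padding: n-1 '#' characters, as in the Clojure version.
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            pad.append('#');
        }
        String padded = pad + word + pad;

        // Slide a window of width n along the padded string, one step at a time.
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }
}
```

Calling NGramSketch.ngrams("foobar", 3) gives the same eight windows as the Clojure version, as strings rather than character sequences.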

In case they’re of wider interest.

  1. Get used to "literal".equals(object). Get used to it. Yes, yes, blah blah blah. Just get used to it!
  2. Useful classes 101: there’s a generic BidiMap in the Commons-collections library, version 4.0. Hasn’t hit maven2 repo yet, but it’s released (javadocs).
  3. Useful classes 101: org.apache.commons.io.output.NullWriter – discards everything written to it, like /dev/null
  4. Boolean unpleasantness. Don’t test myBoolean == true, just use myBoolean. Also remember that this: –

    if (something) {
      return true;
    } else {
      return false;
    }

    is a really long way of saying return something.

  5. No Magic Numbers
  6. Yes, Tuples are lovely. If you really don’t want to design your way around their absence in Java, use something like the generic Pair class.
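
A few of these are easier to see than to describe, so here is a minimal sketch covering items 1, 4, 5 and 6. All the names in it (IdiomSketch, isAdmin, isLongWord, LONG_WORD_LENGTH and this particular Pair) are made up for illustration; the Pair is a bare-bones stand-in, not the generic Pair class mentioned above.

```java
import java.util.Objects;

// Hypothetical class; every name here is illustrative only.
final class IdiomSketch {

    // Item 5: give the number a name instead of scattering a bare 8 around.
    static final int LONG_WORD_LENGTH = 8;

    // Item 1: the literal is never null, so this cannot throw a
    // NullPointerException even when role is null.
    static boolean isAdmin(String role) {
        return "admin".equals(role);
    }

    // Item 4: return the boolean expression directly, no if/else needed.
    static boolean isLongWord(String word) {
        return word != null && word.length() > LONG_WORD_LENGTH;
    }

    // Item 6: a minimal immutable generic pair.
    static final class Pair<A, B> {
        final A first;
        final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Pair)) {
                return false;
            }
            Pair<?, ?> other = (Pair<?, ?>) o;
            return Objects.equals(first, other.first)
                && Objects.equals(second, other.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }
}
```

With this, IdiomSketch.isAdmin(null) simply returns false, where role.equals("admin") would have thrown.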

A couple of months ago, we started using the Murray-Rust group meetings to perform code review. It’s been interesting – sometimes we look at fairly low level idioms, sometimes at more fundamental refactoring and design issues. I’m particularly pleased that the tone has remained constructive, even when PM-R took centre stage last week.

Occasionally we stumble across an issue for which there is no established or widely accepted best practice. One of the things that came up recently was what you should do about the “Unchecked Type Conversion” warnings that appear when you try to shoehorn a 1.4 library into 1.5 code that’s expecting generics. I should probably have posted it myself, but Joe has done me the service of recording the results of our discussion and research.
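
To make the problem concrete, here is a sketch of one possible way to handle it: copy the raw collection into a typed one, casting each element as you go. This is not necessarily what we settled on in the discussion, just an illustration of the warning and one way around it. LegacyBridge and legacyNames are hypothetical stand-ins for a real pre-generics library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; legacyNames() stands in for a 1.4-style API.
final class LegacyBridge {

    // A pre-generics method: it returns a raw List with no element type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List legacyNames() {
        List names = new ArrayList();
        names.add("benzene");
        names.add("toluene");
        return names;
    }

    // Writing List<String> names = legacyNames(); compiles, but with an
    // unchecked conversion warning, and a wrong element type only surfaces
    // later, wherever the element is first used as a String.
    // Copying with an explicit cast per element avoids the warning and
    // fails immediately with a ClassCastException if the library lied.
    static List<String> typedNames() {
        List<?> raw = legacyNames();
        List<String> typed = new ArrayList<String>(raw.size());
        for (Object o : raw) {
            typed.add((String) o);
        }
        return typed;
    }
}
```

The trade-off is an extra copy of the list. The alternative is a narrowly scoped @SuppressWarnings("unchecked") on the one assignment, which is cheaper but defers any type error to the point of use.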

I’ll be going to the weekly Cambridge Clojure Meetup next Tuesday, 10th November, 2009.

The time and place are the usual: 8pm at the Kingston Arms.

Update: I’ve set up an upcoming entry for next week’s meeting.

The paper by Sefton, Barnes, Ward and Downing that I presented at the last International Digital Curation Conference in Edinburgh has been published in the International Journal of Digital Curation: http://www.ijdc.net/index.php/ijdc/article/view/121.

A couple of colleagues from the Unilever Centre and I will be at the Cambridge Clojure meetup this evening.

Time: Tuesday 6th October, 2009. 8pm
Location: The Kingston Arms. Has food, wifi and usually excellent and varied ale. 10 min walk from the train station.

The JISC-sponsored ICE-TheOREM project’s final report is available from the IE repository. The project was a technical investigation into applying ORE to scholarly workflows; along with Peter Sefton’s team at the Australian Digital Futures Institute at the University of Southern Queensland, we developed a demonstrator that passed ORE-described ETDs between systems to recreate a thesis submission process.

Au revoir, Nico!

October 1, 2009

I’d like to publicly mark the occasion of Nico Adams leaving the Murray-Rust group. Nico has been a hugely influential member of the group, and a great colleague to work with, always ready with a rigorous position in a vigorous discussion!

I’m hoping we’ll be able to collaborate in future. Enjoy your break, Nico!

There is a diminishing number of people in the world I haven’t cornered and harangued with this argument: the dominant languages of the future will be the ones that prevent the worst 95% of programmers from hanging themselves when writing concurrent programs, rather than the ones that are efficient in the hands of the other 5%. This is why I’m convinced functional languages will happen this time around. It seems Tim Bray agrees, at least in principle:

… Once you get past Doug Lea, Brian Goetz, and a few people who write operating systems and database kernels for a living, it gets very hard to find humans who can actually reason about threads well enough to be usefully productive.

Tim Bray

The Cambridge (UK) Clojure Meetups have become an informal but regular occurrence, weekly on Tuesday evenings at 8pm in the Kingston Arms. I’ll be there next Tuesday.

There’s also a camclj Google group set up for discussion.