Lensfield IEEE paper

December 11, 2009

I’ve uploaded the slides (DSpace@Cambridge, Slideshare) from my presentation on Lensfield at the IEEE eScience conference today. Thanks to everyone who made it through a long week and still chose to come to listen – I hope it was worth it.

Tiny Clojure snippet

December 2, 2009

At the Cambridge Clojure meetup last night I decided to have a go at something work-related that I don’t usually get time for. The OSCAR3 chemistry entity extraction and text-mining tool uses n-grams to build the feature vectors with which it classifies words. Generating the n-grams turned out to be pretty short work with loop...recur, but then Nick Day pointed out that I’d just reimplemented partition, which turned the function into:


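;; Pad the word with (n - 1) '#' characters at each end, then slide a
;; window of size n along it one character at a time.
(defn ngrams [word n]
  (let [pad (repeat (dec n) \#)]
    (partition n 1 (concat pad word pad))))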
(ngrams "foobar" 3)
-> ((\# \# \f) (\# \f \o) (\f \o \o) (\o \o \b) (\o \b \a) (\b \a \r) (\a \r \#) (\r \# \#))
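
For comparison, here is a rough sketch of what an explicit loop...recur version could look like. This is a reconstruction for illustration only, not the code written at the meetup; the name ngrams-loop and the use of a vector with subvec are assumptions.

```clojure
;; Hypothetical loop/recur equivalent of the partition-based ngrams.
;; The function name and the vector/subvec approach are illustrative
;; assumptions, not the original meetup code.
(defn ngrams-loop [word n]
  (let [pad    (repeat (dec n) \#)
        ;; A vector gives cheap indexed slicing via subvec.
        padded (vec (concat pad word pad))]
    (loop [i 0, acc []]
      (if (> (+ i n) (count padded))
        acc                                   ; no full window left
        (recur (inc i)
               (conj acc (seq (subvec padded i (+ i n)))))))))

;; Compare against the sliding-window behaviour of partition directly.
(= (ngrams-loop "foobar" 3)
   (partition 3 1 (concat [\# \#] "foobar" [\# \#])))
```

The sketch returns a vector of n-grams rather than a lazy sequence, but the elements compare equal to what partition produces, which is the point: (partition n 1 coll) already does all the index bookkeeping.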