Sunday, June 29, 2014

Exercism: "Nucleotide count in Clojure"

This is my solution to the Nucleotide count problem in Clojure:

(ns dna
  (:refer-clojure :exclude [count])) ; this namespace defines its own count below
(def nucleotides #{\A, \T, \C, \G, \U})
(defn nucleotide-counts [strand]
  (let
    [count ; local helper: increments the tally for one nucleotide
     (fn [counted-nucleotides nucleotide]
       (assoc
         counted-nucleotides
         nucleotide
         (+ (get counted-nucleotides nucleotide) 1)))]
    (reduce count {\A 0, \T 0, \C 0, \G 0} strand)))
(defn count [nucleotide strand]
  (if (contains? nucleotides nucleotide)
    (get (nucleotide-counts strand) nucleotide 0)
    (throw (Exception. "invalid nucleotide"))))

The code is very similar to the one for the Word count exercise. The main difference is that here I used a set to check that the given nucleotide is valid.
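
The set-based check can be tried on its own at the REPL. This is only a minimal sketch: valid-nucleotide? is a hypothetical helper name used for illustration, not part of the solution.

```clojure
;; Same set of characters as in the solution.
(def nucleotides #{\A \T \C \G \U})

;; valid-nucleotide? is a hypothetical helper, used only for illustration.
(defn valid-nucleotide? [nucleotide]
  (contains? nucleotides nucleotide))

(valid-nucleotide? \A) ; => true
(valid-nucleotide? \X) ; => false
```

A set can also be called as a function, so (nucleotides \A) returns \A and (nucleotides \X) returns nil, but contains? reads more explicitly here.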

I eliminated the duplication between the set and the map contents in this second version:

(ns dna
  (:refer-clojure :exclude [count])) ; this namespace defines its own count below
(def dna-nucleotides #{\A, \T, \C, \G})
(def nucleotides (conj dna-nucleotides \U))
(defn nucleotide-counts [strand]
  (let
    [counted-nucleotides
     ;; clojure.core/count is qualified because count is excluded above
     (zipmap dna-nucleotides (repeat (clojure.core/count dna-nucleotides) 0))
     count ; local helper: increments the tally for one nucleotide
     (fn [counted-nucleotides nucleotide]
       (assoc
         counted-nucleotides
         nucleotide
         (+ (get counted-nucleotides nucleotide) 1)))]
    (reduce count counted-nucleotides strand)))
(defn count [nucleotide strand]
  (if (contains? nucleotides nucleotide)
    (get (nucleotide-counts strand) nucleotide 0)
    (throw (Exception. "invalid nucleotide"))))

Here I defined a set containing only the DNA nucleotides, dna-nucleotides, and used conj to build the nucleotides set from it. The dna-nucleotides set also served to generate the counted-nucleotides map using the zipmap function.
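
To see what conj and zipmap do here, the two steps can be evaluated separately. This is a sketch meant for a fresh namespace such as user, where count is still clojure.core/count; initial-counts is a hypothetical name introduced only for this example.

```clojure
(def dna-nucleotides #{\A \T \C \G})

;; conj returns a new set; dna-nucleotides itself is left unchanged.
(def nucleotides (conj dna-nucleotides \U))

;; zipmap pairs each key with the value at the same position
;; in the second sequence, here a run of zeros.
;; initial-counts is a hypothetical name for this example.
(def initial-counts
  (zipmap dna-nucleotides (repeat (count dna-nucleotides) 0)))

(contains? nucleotides \U)                  ; => true
(contains? dna-nucleotides \U)              ; => false
(= initial-counts {\A 0, \T 0, \C 0, \G 0}) ; => true
```

Since zipmap stops as soon as either sequence runs out, (zipmap dna-nucleotides (repeat 0)) would build the same map without counting the set first.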

While trying to avoid duplication, I discovered several new things about Clojure.

You can nitpick my solution here or see all the exercises I've done so far in this repository.

--------------------------------

Update:

After learning some new stuff, I've been able to simplify the code a bit more:

(ns dna
  (:refer-clojure :exclude [count])) ; this namespace defines its own count below
(def ^:private dna-nucleotides #{\A, \T, \C, \G})
(def ^:private nucleotides (conj dna-nucleotides \U))
(defn nucleotide-counts [strand]
  ;; the zero-filled map supplies counts for nucleotides absent from the strand
  (merge {\A 0, \T 0, \C 0, \G 0} (frequencies strand)))
(defn count [nucleotide strand]
  (if (contains? nucleotides nucleotide)
    (get (nucleotide-counts strand) nucleotide 0)
    (throw (Exception. "invalid nucleotide"))))

It turned out that the frequencies function already does the counting out of the box. I just needed to merge its result with the counts for an empty strand so that the output conforms to what the tests expect.
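
The reason for the merge is easier to see by evaluating the pieces at the REPL. In this small sketch, empty-counts is a hypothetical name for the zero-filled map, and the printed key order may differ since maps are unordered.

```clojure
;; empty-counts is a hypothetical name for the result expected for an empty strand.
(def empty-counts {\A 0, \T 0, \C 0, \G 0})

;; frequencies only reports the characters that actually occur.
(frequencies "AAT")                      ; => {\A 2, \T 1}

;; merge lets the entries of the right-hand map override the zero defaults.
(merge empty-counts (frequencies "AAT")) ; => {\A 2, \T 1, \C 0, \G 0}
(merge empty-counts (frequencies ""))    ; => {\A 0, \T 0, \C 0, \G 0}
```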

I also made the dna-nucleotides and nucleotides sets private.

You can nitpick this new version here.
