July 27, 2014
Hot Topics:
RSS RSS feed Download our iPhone app

Clojure: Immutability at the Language Level

  • April 2, 2010
  • By Michael Fogus, Chris Houser
  • Send Email »
  • More Articles »

Map Varieties and How to Use Them

Compared to the other composite data types Clojure provides, it seems that its map should be unambiguous in its use cases. Although this is certainly true in many cases, there are some points to be made concerning the differing flavors of map provided. In addition, we'll make a small point about using maps in many instances where you might be tempted to use other structures. But in general a simple rule applies: Maps should be used in any case where your data structure requires a logical mapping of keys to values.

It's difficult to write a program of any significant size without a map of some sort. The use of maps is ubiquitous in writing software because frankly it's difficult to imagine a more robust data structure. But we as programmers tend to view maps as a special-case structure outside the normal exaltations bestowed on classes. That is, the object-oriented school of thought has relegated the map as a supporting player in favor of the class. We're not going to talk about the merits, or lack thereof, for this relegation here, but throughout our book we discuss moving away from thinking in classes and instead thinking in sequence abstractions, maps, sets, and types. Having said all of that, it need hardly be mentioned that maps should be used to store named values. But there are times when you may wish to use positional elements within a sequence to represent some structure. This approach tends to be brittle and can often be better served by using maps instead because they're self-organizing through the use of named values. Clojure's maps support an arbitrarily complex nesting of persistent types for both keys and values, so if you want a map that has vectors of maps as its keys, that's perfectly okay. Let's talk about each form of map that Clojure provides and the tradeoffs surrounding each.

Hash Maps

Arguably, the most ubiquitous form of map found in Clojure programs is the hash map, which provides an unsorted key/value associative structure (Although with the pervasiveness of the map literal, the ubiquity may instead fall to the array map.). In addition to the literal syntax touched on in chapter 1 of the book, hash maps can be created using the hash-map function, which likewise takes alternating key/value pairs, with or without commas:

(hash-map :a 1, :b 2, :c 3, :d 4, :e 5)


The difference in results between the literal form and the construction function is that the latter will always provide a hash map, whereas the former could at times provide an array map. The reason for this incongruity isn't important at the moment, but we'll discuss it in the upcoming section about array maps.

Clojure hash maps support heterogeneous keys, meaning that they can be of any type and each key can be of a differing type.

(let [m {:a 1, 1 :b, [1 2 3] "4 5 6"}]
  [(get m :a) (get m [1 2 3])])

;=> [1 "4 5 6"]


Many of Clojure's composite types can be used as functions, and in the case of maps they're functions of their keys. Using maps in this way will act the same as using the get function in the previous code sample.

(let [m {:a 1, 1 :b, [1 2 3] "4 5 6"}]
  [(m :a) (m [1 2 3])])

;=> [1 "4 5 6"]


Providing a map to the seq function will return a sequence of map entries:

(seq {:a 1, :b 2})
;=> ([:a 1] [:b 2])


This sequence appears to be composed of the sets of key/value pairs contained in vectors and for all practical purposes should be treated as such. A new hash map can be created idiomatically using this precise structure:

(into {} (list [:a 1] [:b 2]))
;=> {:a 1, :b 2}


Even if your embedded pairs are not vectors, they can be made to be for building a new map:

(into {} (map vec '((:a 1) (:b 2))))
;=> {:a 1, :b 2}


Your pairs don't have to be explicitly grouped, because you can use apply to create a hash map given that the key/value pairs are laid out in a sequence consecutively.

(apply hash-map [:a 1 :b 2])
;=> {:a 1, :b 2}


You can also use apply in this way with sorted-map and array-map. Another idiomatic way to build a map is to use zipmap to zip together two sequences, the first of which contains the desired keys and the second their corresponding values:

(zipmap [:a :b] '(1 2))
;=> {:b 2, :a 1}


Our use of zipmap illustrates nicely the final property of hash maps: Hash maps in Clojure have no order guarantees. If you do require ordering, then you should use sorted maps, discussed next.

Keeping Your Keys in Order with Sorted Maps

There's one potential downside to using maps in place of other common structures such as lists and vectors: order agnosticism. It's impossible to rely on a specific ordering of the key/value pairs for a standard Clojure map because there are no order guarantees at all. Should the ordering of a map's entries be important for your applications, then Clojure has two map implementations that provide order assurances. The first of these, which we'll talk about, is the sorted map, which can be constructed using one of a pair of functions: sorted-map and sorted-map-by.

By default, the function sorted-map will build a map sorted by the comparison of its keys:

(sorted-map :thx 1138 :r2d 2)
;=> {:r2d 2, :thx 1138}


But you may require an alternative key ordering or perhaps for keys not normally comparable, say for nontrivial keys. In these cases you must use sorted-map-by, which takes an additional comparison function:

(sorted-map "bac" 2 "abc" 9)
;=> {"abc" 9, "bac" 2}

(sorted-map-by #(compare (subs %1 1) (subs %2 1)) "bac" 2 "abc" 9)
;=> {"bac" 2, "abc" 9}


This means that sorted maps don't generally support heterogeneous keys the way hash maps do, although it depends on the comparison function provided; for example, the previous one assumes all keys are strings (The default sorted-map comparison function is a static member of Clojure's RT class. You can try it out directly like this: (.compare clojure.lang.RT/DEFAULT_COMPARATOR "abc" "bac"). Clearly, based on the ugliness of that call, this isn't something you're expected to do very often.). The default sorted-map comparison function supports maps whose keys are all numbers or are comparable with each other. Attempts to use keys that aren't supported by whichever comparison function you're using generally results in a class cast exception:

(sorted-map :a 1, "b" 2)
;=> java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to
    java.lang.String


One unique feature supported by sorted maps (and also sorted sets) is the ability to jump efficiently to a particular key and walk forward or backward from there through the collection. This is done with the subseq and rsubseq functions, for forward and backward, respectively. Even if you don't know the exact key you want, you can use these functions to round up the next closest key that exists.

Another way that sorted maps and hash maps differ is in their handling of numeric keys. A number of a given magnitude can be represented by many different types; for example, 42 can be a long, int, float, and so on. Hash maps would treat each of these different objects as different, whereas a sorted map would treat them as the same. You can see the contrast in this example, where the hash map keeps both keys and the sorted map keeps just one:

(assoc {1 :int} 1.0 :float)
;=> {1.0 :float, 1 :int}

(assoc (sorted-map 1 :int) 1.0 :float)
;=> {1 :float}


Sorted maps will otherwise work like hash maps and can be used interchangeably. You should use sorted maps if you need to specify or guarantee a specific key ordering. On the other hand, if you need to maintain insertion ordering, then array maps are required, as we'll see next.

Keeping Your Insertions in Order with Array Maps

When dealing with map literals in Clojure, you might be surprised by the following:

{:x 1 :x 2 :x 3}
;=> {:x 1, :x 2, :x 3}

(:x {:x 1 :x 2 :x 3})
;=> 1


It seems counter to logic that a map could allow duplicate keys. Although this is certainly true, this is a side effect of Clojure's current implementation of map literals. That is, in order to build map literals as quickly as possible, Clojure uses an array map to store its instances. An array map in turn stores its key/value pairs as alternating elements within a flat array, which can be populated quickly by ignoring the form of the key/value pairs and blindly copying them into place. This condition refers to the current implementation of Clojure literal maps as array maps, which could conceivably change at any moment. But there's an important lesson to learn from this regardless: Sometimes your best choice for a map isn't a map at all.

You've probably heard many times (including in our book) that when searching for elements, arrays and lists are O(n), or proportional to the number of total elements, and maps are O(1), or constant. Big-O notation refers to a relative measure of significant algorithmic complexity and ignores tangential costs such as hash calculation, index manipulation, and the like. But in practice it turns out that these tangential costs have an impact when dealing with sizes below a small threshold: For structures sized below a certain count, the cost associated with map lookup bridges the gap between a linear search through an equally sized array or list. That's not to say that the map will be slower; instead, it allows the map and linear implementations to be comparable. The advantage of maintaining an insertion order by using a linear structure such as an array map can at times outweigh the slight slowdown. Therefore, it's sometimes useful to design your programs to take into account the possibility of pointed tradeoff such as using lists or vectors in place of maps for sizes smaller than a predetermined threshold. You can work to ensure robustness in the face of these scenarios by not presupposing a map-versus-linear structure at all but instead working against the sequence abstraction whenever possible. Where have we heard that before?

(defn raw [n] (map keyword (map str (map char (range 97 (+ 97 n))))))
(defn mk-lin [n] (interleave (raw n) (range n)))
(defn mk-map [n] (apply hash-map (mk-lin n)))

(defn lhas [k s] (some #{k} s))
(defn mhas [k s] (s k))

(defn churn [lookup maker n]
  (let [ks (raw n)
         elems (maker n)]
   (dotimes [i 100000] 
     (doseq [k ks] (lookup k elems)))))


(time (churn lhas mk-lin 5))
; "Elapsed time: 998.997 msecs"


(time (churn mhas mk-map 5))
; "Elapsed time: 133.133 msecs"


As you can see here, for small sizes, the linear version is about 7.5 times slower than the version using maps, which isn't terribly bad. Not only does the linear map implementation provide comparable speeds at small sizes, but it has the added advantage of maintaining key insertion ordering, and that truly is the sweet spot of Clojure's array maps. But when we increase the number of elements for churn to process, the picture suddenly spoils:

(time (churn lhas mk-lin 10))
; "Elapsed time: 3282.077 msecs"


(time (churn mhas mk-map 10))
; "Elapsed time: 172.579 msecs"


Doubling the size of the structures now widens the divide in execution times between the linear and map implementations. Further increasing the sizes will eventually lead to pauses worthy of taking in a movie. It might be worth trying it yourself if for no other reason than it nicely shows that the size of your data structures will highlight the differences in orders of magnitude.





Page 4 of 5



Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Sitemap | Contact Us

Rocket Fuel