October 25, 2014
Hot Topics:
RSS RSS feed Download our iPhone app

Clojure: Immutability at the Language Level

  • April 2, 2010
  • By Michael Fogus, Chris Houser
  • Send Email »
  • More Articles »

Vectors: Creating and Using All Their Varieties

A vector stores zero or more values sequentially indexed by number, a bit like an array, but they're immutable and persistent. They're also versatile and make efficient use of memory and processor resources at both small and large sizes.

Vectors are probably the most frequently used collection type in Clojure code. They're used as literals for argument lists and let bindings, for holding large amounts of application data, as stacks, and as map entries. We'll examine all of these facets and finish with an examination of cases where vectors don't work well.

Building Vectors

Vectors can be built using the literal square-bracket syntax [1 2 3]. This syntax, compared to lists' round parentheses, is one reason you might choose to use a vector over a list. For example, the let form would work perfectly well, and with a nearly identical implementation, if it took a literal list of bindings instead of a literal vector. But the square brackets are visually different from the round parenthesis surrounding the let form itself as well as the likely function calls in the body of the let form, and this is quite useful for us humans. Using vectors to indicate bindings for let, with-open, fn, and such is idiomatic in Clojure and is a pattern you're encouraged to follow in any similar macros you write.

The most common way to create a vector is with the literal syntax described previously. But in many cases you want to create a vector out of the contents of some other kind of collection. For this there's the function vec:

(range 10)
;=> (0 1 2 3 4 5 6 7 8 9)

(vec (range 10))
;=> [0 1 2 3 4 5 6 7 8 9]


Note the round parens on the result from range, compared to the square brackets on the result from vec. This is our hint that range returned a seq, whereas vec returned a vector filled with the values from the seq.

If you already have a vector but want to concatenate several values to it, into is your friend:

(let [my-vector [:a :b :c]]
  (into my-vector (range 10)))

;=> [:a :b :c 0 1 2 3 4 5 6 7 8 9]


If you want it to return a vector, the first argument to into must be a vector, as we have above. The second arg, however, can be any sequence, such as what range returns, or anything else that works with seq function. The values from the seq are "poured into" the supplied vector.

Because vectors themselves work with seq, you can use into to concatenate two vectors:

(into [:a :b :c] [:x :y :z])
;=> [:a :b :c :x :y :z]


But be aware that this is treating the second vector as a seq, so it's a linear-time operation based on the size of the second vector (Vectors can't be concatenated any more efficiently than O(n), but later we'll cover a data structure that can. Finger trees, a possible future addition to Clojure, can be looked up by numerical index a bit like vectors and yet also be concatenated in logarithmic time.).

That's all you need to know about building vectors, though for completeness we'll show you the rarely used vector function, which builds a vector from the arguments you give it instead of from a seq:

(vector 1 2 3)
;=> [1 2 3]


There's hardly ever a good reason to use vector instead of a literal vector. This is truer of vectors than it is of lists, but that's getting ahead of ourselves.

Using vec and into, it's easy to build vectors much larger than are usually built using vector literals. Once you have a large vector like that, what are you doing to do with it? Let's look at that next.

Large Vectors

When collections are small, the performance differences between vectors and lists hardly matters. But as they get larger, each becomes dramatically slower at operations the other can still perform efficiently. Vectors are particularly efficient at three things relative to lists: adding or removing items from the right-hand end of the collection, accessing or changing items in the interior of the collection by numeric index, and walking in reverse order. Adding and removing from the end is done by treating the vector as a stack -- we'll cover that a bit later.

Any item in a vector can be accessed by its index number from 0 up to but not including (count my-vector) in essentially constant time. (Several operations on Clojure's persistent data structures are described in our book as "essentially constant time." In all cases these are O(log32 n).) This can be done using the function nth; the function get, treating the vector like a map; or by invoking the vector itself as a function. Let's look at each of these as applied to this example vector:

(def some-chars (vec (map char (range 65 75))))

some-chars
;=> [A B C D E F G H I J]


All three of these do the same work, and each returns E:

(nth some-chars 4)
(get some-chars 4)
(some-chars 4) ; vector as a function


Which to use is a judgment call, so here are some things you might consider when choosing:

 

nth

get

vector as function

Vector is nil

Returns nil

Returns nil

Throws exception

Index out of range

Throws exception

Returns nil

Throws exception

Supports "not found"

Yes

Yes

No

Because vectors are indexed, they can be efficiently walked in either direction, left to right or right to left. The seq and rseq functions return sequences that do exactly that:

(seq some-chars)
;=> (A B C D E F G H I J)

(rseq some-chars)
;=> (J I H G F E D C B A)


Any item in a vector can be "changed" using the assoc function. Because vectors are immutable, the item isn't actually changed -- assoc returns a new vector that's exactly like the old one except for the updated value. Clojure does this in essentially constant time using structural sharing between the old and new vectors, as described at the beginning of this article.

(assoc some-chars 4 "no longer E")
;=> [A B C D "no longer E" F G H I J]


Note that the value of some-chars hasn't changed at all. The REPL printed a new vector that is almost but not quite entirely like some-chars.

The assoc function for vectors works only on indexes that already exist in the vector or, as a special case, exactly one step past the end. In this latter case, the returned vector will be one item larger than the input vector. More frequently, however, vectors are "grown" using the conj function, as you'll see in the next section.

A few higher-powered functions are provided that use assoc internally. For example, the replace function works on both seqs and vectors, but when given a vector it uses assoc to fix up and return a new vector:

(replace {2 :a, 4 :b} [1 2 3 2 3 4])
;=> [1 :a 3 :a 3 :b]


The functions assoc-in and update-in are for working with nested structures of vectors and/or maps, like this one (Nested vectors are far from the most efficient way to store or process vectors, but they're convenient to manipulate in Clojure and so make a good example here. More efficient options include a single vector, arrays, or a library for matrix processing like Colt or Incanter.):

(def matrix
     [[1 2 3]
      [3 4 5]
      [5 6 7]])


Both assoc-in and update-in take a series of indexes to pick items from each more deeply nested level. For a vector arranged like the matrix example, this amounts to row and column coordinates:

(assoc-in matrix [1 2] 'x)
;=> [[1 2 3] [3 4 x] [5 6 7]]


The update-in function works the same way, but instead of taking a value to overwrite an existing value, it takes a function to apply to an existing value. It will replace the value at the given coordinates with the return value of the function you give:

(update-in matrix [1 2] * 10)
;=> [[1 2 3] [3 4 50] [5 6 7]]


The coordinates refer to the value 5, and the function given above is *, so the value 5 will be replaced by the return value of (* 5 10). The trailing 10 comes from the extra argument given to update-in -- any number of arguments may be given to be appended to the function applied.

So far we've not changed the size of any of the vectors we've looked at; we just looked things up and changed items in place. Next we'll look at growing and shrinking vectors -- treating them like stacks.

Vectors as Stacks

Classic mutable stacks have at least two operations, push and pop. These also work on vectors, though they're called conj and pop respectively, by adding and removing things from the right-hand end. Because vectors are immutable, pop returns a new vector with the rightmost item dropped; this is a bit different from the many mutable stack APIs, which generally return the dropped item. Consequently, peek becomes more important as the primary way to get an item from the top of the stack.

Enough talking -- let's do some doing.

(def my-stack [1 2 3])

(peek [1 2 3])
;=> 3

(pop [1 2 3])
;=> [1 2]

(conj my-stack 4)
;=> [1 2 3 4]

(-> my-stack (conj 4) (conj 5) pop (conj 6))
;=> [1 2 3 4 6]


Each of these operations completes in essentially constant time. Most of the time, a vector that's used as a stack is used that way throughout its life. It's helpful to future readers of your code to keep this is mind and use the stack operations consistently, even when other functions might work. For example, last on a vector returns the same thing as peek, but besides being slower it leads to unnecessary confusion about how the collection is being used. If the algorithm involves calls for a stack, use conj not assoc for growing the vector, peek not last, and pop not dissoc for shrinking it.

If conj and pop aren't fast enough for you, faster equivalents are available for vectors that support transient, which we cover in our book.

The functions conj, pop, and peek work on any object that implements clojure.lang.IPersistentStack (The conj function also works for all of Clojure's other persistent collection types, even if they don't implement clojure.lang.IPersistentStack.). Besides vectors, Clojure lists also implement this interface, but the functions operate on the left-hand side of lists instead of the right-hand side as with vectors.

Using Vectors Instead of Reverse

The ability of vectors to grow efficiently on the right-hand end and then be walked left to right produces a noteworthy emergent behavior: Idiomatic Clojure code rarely uses the reverse function. This is quite different from most Lisps and Schemes, so if you're familiar with one of those, we can take a short look at how this works out. If you don't care about Lisps other than Clojure (and why would you?), feel free to skip down to "Vectors as stacks."

When processing a list, it's common to want to produce a new list in the same order. But if all you have are classic Lisp lists, often the most natural algorithm (the most natural tail-recursive algorithm anyway.) leaves you with a backwards list that needs to be reversed. Here's an example of a function a bit like Clojure's map, but strict instead of lazy:

(defn strict-map1 [f coll]
  (loop [coll coll, acc nil]
    (if (empty? coll)
      (reverse acc)
      (recur (next coll) (cons (f (first coll)) acc)))))

(strict-map1 odd? (range 5))
;=> (false true false true false)


This is perfectly good, idiomatic Clojure code, except for that glaring reverse of the final return value. After the entire list has been walked once to produce the desired values, reverse walks it again to get them in the right order. This is both inefficient and non-idiomatic. One way to get rid of the reverse is to use a vector instead of a list as the accumulator:

(defn strict-map2 [f coll]
  (loop [coll coll, acc []]
    (if (empty? coll)
      acc
      (recur (next coll) (conj acc (f (first coll)))))))

(strict-map2 odd? (range 5))
;=> [false true false true false]


A small change, but the code is now a touch cleaner and a bit faster. It does return a vector instead of a list, but this is rarely a problem because any client code that wants to treat this as a seq can usually do so automatically. On occasion, though, it may be appropriate to return (seq acc) instead of just acc.

By the way, another way to get rid of a reverse is to build a lazy sequence instead of a strict collection. This is how Clojure's own map function is implemented. We cover that and much more about lazy seqs in chapter 5 of our book.

The term vector in Clojure is a colloquial description of any object that implements clojure.lang.IPersistentVector. A few concrete classes act as vectors: PersistentVector, SubVector, and MapEntry (Clojure 1.0 also had a LazilyPersistentVector for performance reasons, but this is no longer necessary because PersistentVector now supports transient.). So far, the examples we've seen have all been plain PersistentVectors, though they would have worked for any of these types. But let's turn now to the special features of these other vector types, starting with SubVector.

SubVectors

Although items can't be removed efficiently from a vector (except the rightmost item), SubVectors provide a fast way to take a slice of an existing vector based on start and end indexes. They're created using the subvec function.

Let's use the some-chars vector we defined earlier:

some-chars
;=> [A B C D E F G H I J]


The subvec takes a start and end index and returns a new vector:

(subvec some-chars 3 6)
;=> [D E F]


The first index given to subvec is inclusive (starts at index 3), but the second is exclusive (ends before index 6).

But the interesting thing is that this new vector internally hangs onto the entire original some-chars vector. Each lookup we do on the new vector causes the SubVector to do a little offset math and then look up in the original vector. This makes creating a SubVector fast. If either the SubVector or its underlying vector was mutable, you could detect this fact by making a change to one and observing the effect in the other. But because both are immutable, SubVector can be treated exactly the same as any other vector.

You can use subvec on any kind of vector and it'll work fine. But it has special logic for taking a subvec of a subvec, in which case the newest SubVector keeps a reference to the original vector, not the intermediate SubVector. This prevents SubVectors of SubVectors from stacking up needlessly and keeps both the creation and use of the sub-subvecs fast.

Vectors as MapEntries

Clojure's hash map, like hash tables or dictionaries in many other languages, has a mechanism to iterate through the entire collection. Clojure's solution for this iterator is, unsurprisingly, a seq. Each item of this seq needs to include both the key and the value, so they're wrapped in a MapEntry. When printed, each entry looks like a vector:

(first {:width 10, :height 20, :depth 15})
;=> [:width 10]


But not only does a MapEntry look like a vector, it is one:

(vector? (first {:width 10, :height 20, :depth 15}))
;=> true


This means we can use all the regular vector functions on it: conj, get, and so on. It even supports destructuring, which can be handy. For example, the locals dimension and amount in the following example will take on the value of each key/value pair in turn:

(doseq [[dimension amount] {:width 10, :height 20, :depth 15}]
  (println (str (name dimension) ":") amount "inches"))
; width: 10 inches
; height: 20 inches
; depth: 15 inches
;=> nil


Two-element vectors, whether they're MapEntrys or not, can also be used to add values to a hash map or sorted map, if you prefer to use conj instead of assoc:

(conj {} [:a 1] [:b 2])
;=> {:b 2, :a 1}


But a MapEntry is its own type and supports a couple extra functions for retrieving its contents: key and val, which do exactly the same thing as (nth my-map 0) and (nth my-map 1), respectively. These are sometimes useful for the clarity they can bring to your code, but quite frequently destructuring is used instead because it's so darned handy.

Now you know what vectors are, what specific kinds of vectors are included in Clojure, and some of the things that they're good at doing. To round out our understanding of vectors, let's conclude with a brief look at things that vectors are bad at doing.

What Vectors Are Not

Vectors are quite versatile, but there are some commonly desired patterns where they might seem like a good solution but are not. Although we prefer to focus on the positive, hopefully a few negative examples will help prevent you from using the wrong tool for the job.

Vectors Are Not Sparse

If you have a vector of length n, the only position where you can insert a value is at index n, that is, appending to the far right-hand end. You can't skip some indexes and insert at a higher index number. If you want a collection indexed by nonsequential numbers, consider a hash map or sorted map.

Although you can replace values within a vector, you can't insert or delete items such that indexes for the subsequent items would have to be adjusted. Clojure doesn't currently have a native persistent collection that supports this kind of operation, but finger trees may help for some use cases.

If you'll always be wanting to prepend things at the left-hand end of a sequential collection, instead of the right-hand end the way vectors do, you might have found a good opportunity to use a PersistentList.

Vectors Are Not Queues

Some people have tried to use vectors as queues. One approach would be to push onto the right-hand end of the vector using conj and then to pop items off the left using rest or next. The problem with this is that rest and next return seqs, not vectors, so subsequent conj operations wouldn't behave as desired. Using into to convert the seq back into a vector is O(n) which is less than ideal for every pop.

Another approach is to use subvec as a pop, leaving off the leftmost item. Because subvec does return a vector, subsequent conj operations will push onto the right-hand end as desired. But as we described previously, subseq maintains a reference to the entire underlying vector, so none of the items being popped this way will ever be garbage collected -- also less than ideal.

So what would be the ideal way to do queue operations on a persistent collection? Use a PersistentQueue, of course, which Clojure provides.

Vectors Can't Contain Primitives

Java primitives like int and byte take less memory and are often much faster to process than their boxed counterparts Integer and Byte. Unfortunately, none of Clojure's persistent collections support primitives yet. Any attempt to put a primitive into a vector will result in autoboxing of the item, and any value retrieved from a vector will similarly be an object. Although there are plans for Clojure vectors to eventually support primitives, that's way off yet. If you need the memory or performance benefits of primitives in a collection, for now your only choice is to use a Java array.

Vectors Are Not Sets

If you want to find out whether a vector contains a particular value, you might be tempted to use the contains? function, but you'd be disappointed by the results. contains? is for asking whether a particular key, not value, is in a collection, which is rarely useful for a vector. Vector objects do have a contains method that you can call via Java interop that will give the results you want, but Clojure offers something much better -- a real set collection.

Well there, that was certainly enough detail about vectors. We saw how to create them using literal syntax or by building them up programmatically. We looked at how to push them, pop them, slice them, and stuff them into maps. We also looked at some of the things vectors can't do well. One of these was adding and removing items from the left-hand side; though vectors can't do this, lists can.





Page 3 of 5



Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 


Sitemap | Contact Us

Rocket Fuel