Clojure: Immutability at the Language Level
Vectors: Creating and Using All Their Varieties
A vector stores zero or more values sequentially indexed by number, a bit like an array, but they're immutable and persistent. They're also versatile and make efficient use of memory and processor resources at both small and large sizes.
Vectors are probably the most frequently used collection type in Clojure code. They're used as literals for argument lists and let
bindings, for holding large amounts of application data, as stacks, and as map entries. We'll examine all of these facets and finish with an examination of cases where vectors don't work well.
Building Vectors
Vectors can be built using the literal square-bracket syntax [1 2 3]
. This syntax, compared to lists' round parentheses, is one reason you might choose to use a vector over a list. For example, the let
form would work perfectly well, and with a nearly identical implementation, if it took a literal list of bindings instead of a literal vector. But the square brackets are visually different from the round parenthesis surrounding the let
form itself as well as the likely function calls in the body of the let
form, and this is quite useful for us humans. Using vectors to indicate bindings for let
, with-open
, fn
, and such is idiomatic in Clojure and is a pattern you're encouraged to follow in any similar macros you write.
The most common way to create a vector is with the literal syntax described previously. But in many cases you want to create a vector out of the contents of some other kind of collection. For this there's the function vec
:
(range 10)
;=> (0 1 2 3 4 5 6 7 8 9)
(vec (range 10))
;=> [0 1 2 3 4 5 6 7 8 9]
Note the round parens on the result from range
, compared to the square brackets on the result from vec
. This is our hint that range
returned a seq
, whereas vec
returned a vector filled with the values from the seq
.
If you already have a vector but want to concatenate several values to it, into
is your friend:
(let [my-vector [:a :b :c]]
(into my-vector (range 10)))
;=> [:a :b :c 0 1 2 3 4 5 6 7 8 9]
If you want it to return a vector, the first argument to into
must be a vector, as we have above. The second arg, however, can be any sequence, such as what range
returns, or anything else that works with seq
function. The values from the seq
are "poured into" the supplied vector.
Because vectors themselves work with seq
, you can use into
to concatenate two vectors:
(into [:a :b :c] [:x :y :z])
;=> [:a :b :c :x :y :z]
But be aware that this is treating the second vector as a seq
, so it's a linear-time operation based on the size of the second vector (Vectors can't be concatenated any more efficiently than O(n), but later we'll cover a data structure that can. Finger trees, a possible future addition to Clojure, can be looked up by numerical index a bit like vectors and yet also be concatenated in logarithmic time.).
That's all you need to know about building vectors, though for completeness we'll show you the rarely used vector
function, which builds a vector from the arguments you give it instead of from a seq
:
(vector 1 2 3)
;=> [1 2 3]
There's hardly ever a good reason to use vector
instead of a literal vector. This is truer of vectors than it is of lists, but that's getting ahead of ourselves.
Using vec
and into
, it's easy to build vectors much larger than are usually built using vector literals. Once you have a large vector like that, what are you doing to do with it? Let's look at that next.
Large Vectors
When collections are small, the performance differences between vectors and lists hardly matters. But as they get larger, each becomes dramatically slower at operations the other can still perform efficiently. Vectors are particularly efficient at three things relative to lists: adding or removing items from the right-hand end of the collection, accessing or changing items in the interior of the collection by numeric index, and walking in reverse order. Adding and removing from the end is done by treating the vector as a stack -- we'll cover that a bit later.Any item in a vector can be accessed by its index number from 0 up to but not including (count my-vector)
in essentially constant time. (Several operations on Clojure's persistent data structures are described in our book as "essentially constant time." In all cases these are O(log32 n).) This can be done using the function nth
; the function get
, treating the vector like a map; or by invoking the vector itself as a function. Let's look at each of these as applied to this example vector:
(def some-chars (vec (map char (range 65 75))))
some-chars
;=> [A B C D E F G H I J]
All three of these do the same work, and each returns E
:
(nth some-chars 4)
(get some-chars 4)
(some-chars 4) ; vector as a function
Which to use is a judgment call, so here are some things you might consider when choosing:
|
nth |
get |
vector as function |
Vector is nil |
Returns nil |
Returns nil |
Throws exception |
Index out of range |
Throws exception |
Returns nil |
Throws exception |
Supports "not found" |
Yes |
Yes |
No |
Because vectors are indexed, they can be efficiently walked in either direction, left to right or right to left. The seq
and rseq
functions return sequences that do exactly that:
(seq some-chars)
;=> (A B C D E F G H I J)
(rseq some-chars)
;=> (J I H G F E D C B A)
Any item in a vector can be "changed" using the assoc
function. Because vectors are immutable, the item isn't actually changed -- assoc
returns a new vector that's exactly like the old one except for the updated value. Clojure does this in essentially constant time using structural sharing between the old and new vectors, as described at the beginning of this article.
(assoc some-chars 4 "no longer E")
;=> [A B C D "no longer E" F G H I J]
Note that the value of some-chars
hasn't changed at all. The REPL printed a new vector that is almost but not quite entirely like some-chars
.
The assoc
function for vectors works only on indexes that already exist in the vector or, as a special case, exactly one step past the end. In this latter case, the returned vector will be one item larger than the input vector. More frequently, however, vectors are "grown" using the conj
function, as you'll see in the next section.
A few higher-powered functions are provided that use assoc
internally. For example, the replace
function works on both seq
s and vectors, but when given a vector it uses assoc
to fix up and return a new vector:
(replace {2 :a, 4 :b} [1 2 3 2 3 4])
;=> [1 :a 3 :a 3 :b]
The functions assoc-in
and update-in
are for working with nested structures of vectors and/or maps, like this one (Nested vectors are far from the most efficient way to store or process vectors, but they're convenient to manipulate in Clojure and so make a good example here. More efficient options include a single vector, arrays, or a library for matrix processing like Colt or Incanter.):
(def matrix
[[1 2 3]
[3 4 5]
[5 6 7]])
Both assoc-in
and update-in
take a series of indexes to pick items from each more deeply nested level. For a vector arranged like the matrix
example, this amounts to row and column coordinates:
(assoc-in matrix [1 2] 'x)
;=> [[1 2 3] [3 4 x] [5 6 7]]
The update-in
function works the same way, but instead of taking a value to overwrite an existing value, it takes a function to apply to an existing value. It will replace the value at the given coordinates with the return value of the function you give:
(update-in matrix [1 2] * 10)
;=> [[1 2 3] [3 4 50] [5 6 7]]
The coordinates refer to the value 5
, and the function given above is *
, so the value 5
will be replaced by the return value of (* 5 10)
. The trailing 10
comes from the extra argument given to update-in
-- any number of arguments may be given to be appended to the function applied.
So far we've not changed the size of any of the vectors we've looked at; we just looked things up and changed items in place. Next we'll look at growing and shrinking vectors -- treating them like stacks.
Vectors as Stacks
Classic mutable stacks have at least two operations, push and pop. These also work on vectors, though they're called conj
and pop
respectively, by adding and removing things from the right-hand end. Because vectors are immutable, pop
returns a new vector with the rightmost item dropped; this is a bit different from the many mutable stack APIs, which generally return the dropped item. Consequently, peek
becomes more important as the primary way to get an item from the top of the stack.
Enough talking -- let's do some doing.
(def my-stack [1 2 3])
(peek [1 2 3])
;=> 3
(pop [1 2 3])
;=> [1 2]
(conj my-stack 4)
;=> [1 2 3 4]
(-> my-stack (conj 4) (conj 5) pop (conj 6))
;=> [1 2 3 4 6]
Each of these operations completes in essentially constant time. Most of the time, a vector that's used as a stack is used that way throughout its life. It's helpful to future readers of your code to keep this is mind and use the stack operations consistently, even when other functions might work. For example, last
on a vector returns the same thing as peek
, but besides being slower it leads to unnecessary confusion about how the collection is being used. If the algorithm involves calls for a stack, use conj
not assoc
for growing the vector, peek
not last
, and pop
not dissoc
for shrinking it.
If conj
and pop
aren't fast enough for you, faster equivalents are available for vectors that support transient
, which we cover in our book.
The functions conj
, pop
, and peek
work on any object that implements clojure.lang.IPersistentStack
(The conj
function also works for all of Clojure's other persistent collection types, even if they don't implement clojure.lang.IPersistentStack
.). Besides vectors, Clojure lists also implement this interface, but the functions operate on the left-hand side of lists instead of the right-hand side as with vectors.
Using Vectors Instead of Reverse
The ability of vectors to grow efficiently on the right-hand end and then be walked left to right produces a noteworthy emergent behavior: Idiomatic Clojure code rarely uses the reverse
function. This is quite different from most Lisps and Schemes, so if you're familiar with one of those, we can take a short look at how this works out. If you don't care about Lisps other than Clojure (and why would you?), feel free to skip down to "Vectors as stacks."
When processing a list, it's common to want to produce a new list in the same order. But if all you have are classic Lisp lists, often the most natural algorithm (the most natural tail-recursive algorithm anyway.) leaves you with a backwards list that needs to be reversed. Here's an example of a function a bit like Clojure's map
, but strict instead of lazy:
(defn strict-map1 [f coll]
(loop [coll coll, acc nil]
(if (empty? coll)
(reverse acc)
(recur (next coll) (cons (f (first coll)) acc)))))
(strict-map1 odd? (range 5))
;=> (false true false true false)
This is perfectly good, idiomatic Clojure code, except for that glaring reverse
of the final return value. After the entire list has been walked once to produce the desired values, reverse
walks it again to get them in the right order. This is both inefficient and non-idiomatic. One way to get rid of the reverse
is to use a vector instead of a list as the accumulator:
(defn strict-map2 [f coll]
(loop [coll coll, acc []]
(if (empty? coll)
acc
(recur (next coll) (conj acc (f (first coll)))))))
(strict-map2 odd? (range 5))
;=> [false true false true false]
A small change, but the code is now a touch cleaner and a bit faster. It does return a vector instead of a list, but this is rarely a problem because any client code that wants to treat this as a seq
can usually do so automatically. On occasion, though, it may be appropriate to return (seq acc)
instead of just acc
.
By the way, another way to get rid of a reverse
is to build a lazy sequence instead of a strict collection. This is how Clojure's own map
function is implemented. We cover that and much more about lazy seq
s in chapter 5 of our book.
The term vector in Clojure is a colloquial description of any object that implements clojure.lang.IPersistentVector
. A few concrete classes act as vectors: PersistentVector
, SubVector
, and MapEntry
(Clojure 1.0 also had a LazilyPersistentVector
for performance reasons, but this is no longer necessary because PersistentVector now supports transient.). So far, the examples we've seen have all been plain PersistentVector
s, though they would have worked for any of these types. But let's turn now to the special features of these other vector types, starting with SubVector
.
SubVectors
Although items can't be removed efficiently from a vector (except the rightmost item), SubVector
s provide a fast way to take a slice of an existing vector based on start and end indexes. They're created using the subvec
function.
Let's use the some-chars
vector we defined earlier:
some-chars
;=> [A B C D E F G H I J]
The subvec
takes a start and end index and returns a new vector:
(subvec some-chars 3 6)
;=> [D E F]
The first index given to subvec
is inclusive (starts at index 3), but the second is exclusive (ends before index 6).
But the interesting thing is that this new vector internally hangs onto the entire original some-chars
vector. Each lookup we do on the new vector causes the SubVector
to do a little offset math and then look up in the original vector. This makes creating a SubVector
fast. If either the SubVector
or its underlying vector was mutable, you could detect this fact by making a change to one and observing the effect in the other. But because both are immutable, SubVector
can be treated exactly the same as any other vector.
You can use subvec
on any kind of vector and it'll work fine. But it has special logic for taking a subvec
of a subvec
, in which case the newest SubVector
keeps a reference to the original vector, not the intermediate SubVector
. This prevents SubVector
s of SubVector
s from stacking up needlessly and keeps both the creation and use of the sub-subvec
s fast.
Vectors as MapEntries
Clojure's hash map, like hash tables or dictionaries in many other languages, has a mechanism to iterate through the entire collection. Clojure's solution for this iterator is, unsurprisingly, a seq
. Each item of this seq
needs to include both the key and the value, so they're wrapped in a MapEntry
. When printed, each entry looks like a vector:
(first {:width 10, :height 20, :depth 15})
;=> [:width 10]
But not only does a MapEntry
look like a vector, it is one:
(vector? (first {:width 10, :height 20, :depth 15}))
;=> true
This means we can use all the regular vector functions on it: conj
, get
, and so on. It even supports destructuring, which can be handy. For example, the locals dimension
and amount
in the following example will take on the value of each key/value pair in turn:
(doseq [[dimension amount] {:width 10, :height 20, :depth 15}]
(println (str (name dimension) ":") amount "inches"))
; width: 10 inches
; height: 20 inches
; depth: 15 inches
;=> nil
Two-element vectors, whether they're MapEntry
s or not, can also be used to add values to a hash map or sorted map, if you prefer to use conj
instead of assoc
:
(conj {} [:a 1] [:b 2])
;=> {:b 2, :a 1}
But a MapEntry
is its own type and supports a couple extra functions for retrieving its contents: key
and val
, which do exactly the same thing as (nth my-map 0)
and (nth my-map 1)
, respectively. These are sometimes useful for the clarity they can bring to your code, but quite frequently destructuring is used instead because it's so darned handy.
Now you know what vectors are, what specific kinds of vectors are included in Clojure, and some of the things that they're good at doing. To round out our understanding of vectors, let's conclude with a brief look at things that vectors are bad at doing.
What Vectors Are Not
Vectors are quite versatile, but there are some commonly desired patterns where they might seem like a good solution but are not. Although we prefer to focus on the positive, hopefully a few negative examples will help prevent you from using the wrong tool for the job.
Vectors Are Not Sparse
If you have a vector of length n, the only position where you can insert a value is at index n, that is, appending to the far right-hand end. You can't skip some indexes and insert at a higher index number. If you want a collection indexed by nonsequential numbers, consider a hash map or sorted map.
Although you can replace values within a vector, you can't insert or delete items such that indexes for the subsequent items would have to be adjusted. Clojure doesn't currently have a native persistent collection that supports this kind of operation, but finger trees may help for some use cases.
If you'll always be wanting to prepend things at the left-hand end of a sequential collection, instead of the right-hand end the way vectors do, you might have found a good opportunity to use a PersistentList
.
Vectors Are Not Queues
Some people have tried to use vectors as queues. One approach would be to push onto the right-hand end of the vector using conj
and then to pop items off the left using rest
or next
. The problem with this is that rest
and next
return seq
s, not vectors, so subsequent conj
operations wouldn't behave as desired. Using into
to convert the seq
back into a vector is O(n) which is less than ideal for every pop.
Another approach is to use subvec
as a pop, leaving off the leftmost item. Because subvec
does return a vector, subsequent conj
operations will push onto the right-hand end as desired. But as we described previously, subseq
maintains a reference to the entire underlying vector, so none of the items being popped this way will ever be garbage collected -- also less than ideal.
So what would be the ideal way to do queue operations on a persistent collection? Use a PersistentQueue
, of course, which Clojure provides.
Vectors Can't Contain Primitives
Java primitives like int
and byte
take less memory and are often much faster to process than their boxed counterparts Integer
and Byte
. Unfortunately, none of Clojure's persistent collections support primitives yet. Any attempt to put a primitive into a vector will result in autoboxing of the item, and any value retrieved from a vector will similarly be an object. Although there are plans for Clojure vectors to eventually support primitives, that's way off yet. If you need the memory or performance benefits of primitives in a collection, for now your only choice is to use a Java array.
Vectors Are Not Sets
If you want to find out whether a vector contains a particular value, you might be tempted to use the contains?
function, but you'd be disappointed by the results. contains?
is for asking whether a particular key, not value, is in a collection, which is rarely useful for a vector. Vector objects do have a contains
method that you can call via Java interop that will give the results you want, but Clojure offers something much better -- a real set collection.
Well there, that was certainly enough detail about vectors. We saw how to create them using literal syntax or by building them up programmatically. We looked at how to push them, pop them, slice them, and stuff them into maps. We also looked at some of the things vectors can't do well. One of these was adding and removing items from the left-hand side; though vectors can't do this, lists can.
Page 3 of 5
This article was originally published on April 3, 2010