 September 24, 2020
Hot Topics:

# Clojure: Immutability at the Language Level

### Vectors: Creating and Using All Their Varieties

A vector stores zero or more values sequentially indexed by number, a bit like an array, but they're immutable and persistent. They're also versatile and make efficient use of memory and processor resources at both small and large sizes.

Vectors are probably the most frequently used collection type in Clojure code. They're used as literals for argument lists and `let` bindings, for holding large amounts of application data, as stacks, and as map entries. We'll examine all of these facets and finish with an examination of cases where vectors don't work well.

#### Building Vectors

Vectors can be built using the literal square-bracket syntax `[1 2 3]`. This syntax, compared to lists' round parentheses, is one reason you might choose to use a vector over a list. For example, the `let` form would work perfectly well, and with a nearly identical implementation, if it took a literal list of bindings instead of a literal vector. But the square brackets are visually different from the round parenthesis surrounding the `let` form itself as well as the likely function calls in the body of the `let` form, and this is quite useful for us humans. Using vectors to indicate bindings for `let`, `with-open`, `fn`, and such is idiomatic in Clojure and is a pattern you're encouraged to follow in any similar macros you write.

The most common way to create a vector is with the literal syntax described previously. But in many cases you want to create a vector out of the contents of some other kind of collection. For this there's the function `vec`:

``````(range 10)
;=> (0 1 2 3 4 5 6 7 8 9)

(vec (range 10))
;=> [0 1 2 3 4 5 6 7 8 9]``````

Note the round parens on the result from `range`, compared to the square brackets on the result from `vec`. This is our hint that `range` returned a `seq`, whereas `vec` returned a vector filled with the values from the `seq`.

If you already have a vector but want to concatenate several values to it, `into` is your friend:

``````(let [my-vector [:a :b :c]]
(into my-vector (range 10)))

;=> [:a :b :c 0 1 2 3 4 5 6 7 8 9]``````

If you want it to return a vector, the first argument to `into` must be a vector, as we have above. The second arg, however, can be any sequence, such as what `range` returns, or anything else that works with `seq` function. The values from the `seq` are "poured into" the supplied vector.

Because vectors themselves work with `seq`, you can use `into` to concatenate two vectors:

``````(into [:a :b :c] [:x :y :z])
;=> [:a :b :c :x :y :z]``````

But be aware that this is treating the second vector as a `seq`, so it's a linear-time operation based on the size of the second vector (Vectors can't be concatenated any more efficiently than O(n), but later we'll cover a data structure that can. Finger trees, a possible future addition to Clojure, can be looked up by numerical index a bit like vectors and yet also be concatenated in logarithmic time.).

That's all you need to know about building vectors, though for completeness we'll show you the rarely used `vector` function, which builds a vector from the arguments you give it instead of from a `seq`:

``````(vector 1 2 3)
;=> [1 2 3]``````

There's hardly ever a good reason to use `vector` instead of a literal vector. This is truer of vectors than it is of lists, but that's getting ahead of ourselves.

Using `vec` and `into`, it's easy to build vectors much larger than are usually built using vector literals. Once you have a large vector like that, what are you doing to do with it? Let's look at that next.

#### Large Vectors

When collections are small, the performance differences between vectors and lists hardly matters. But as they get larger, each becomes dramatically slower at operations the other can still perform efficiently. Vectors are particularly efficient at three things relative to lists: adding or removing items from the right-hand end of the collection, accessing or changing items in the interior of the collection by numeric index, and walking in reverse order. Adding and removing from the end is done by treating the vector as a stack -- we'll cover that a bit later.

Any item in a vector can be accessed by its index number from 0 up to but not including `(count my-vector)` in essentially constant time. (Several operations on Clojure's persistent data structures are described in our book as "essentially constant time." In all cases these are O(log32 n).) This can be done using the function `nth`; the function `get`, treating the vector like a map; or by invoking the vector itself as a function. Let's look at each of these as applied to this example vector:

``````(def some-chars (vec (map char (range 65 75))))

some-chars
;=> [A B C D E F G H I J]``````

All three of these do the same work, and each returns `E`:

``````(nth some-chars 4)
(get some-chars 4)
(some-chars 4) ; vector as a function``````

Which to use is a judgment call, so here are some things you might consider when choosing:

 nth get vector as function Vector is nil Returns nil Returns nil Throws exception Index out of range Throws exception Returns nil Throws exception Supports "not found" Yes Yes No

Because vectors are indexed, they can be efficiently walked in either direction, left to right or right to left. The `seq` and `rseq` functions return sequences that do exactly that:

``````(seq some-chars)
;=> (A B C D E F G H I J)

(rseq some-chars)
;=> (J I H G F E D C B A)``````

Any item in a vector can be "changed" using the `assoc` function. Because vectors are immutable, the item isn't actually changed -- `assoc` returns a new vector that's exactly like the old one except for the updated value. Clojure does this in essentially constant time using structural sharing between the old and new vectors, as described at the beginning of this article.

``````(assoc some-chars 4 "no longer E")
;=> [A B C D "no longer E" F G H I J]``````

Note that the value of `some-chars` hasn't changed at all. The REPL printed a new vector that is almost but not quite entirely like `some-chars`.

The `assoc` function for vectors works only on indexes that already exist in the vector or, as a special case, exactly one step past the end. In this latter case, the returned vector will be one item larger than the input vector. More frequently, however, vectors are "grown" using the `conj` function, as you'll see in the next section.

A few higher-powered functions are provided that use `assoc` internally. For example, the `replace` function works on both `seq`s and vectors, but when given a vector it uses `assoc` to fix up and return a new vector:

``````(replace {2 :a, 4 :b} [1 2 3 2 3 4])
;=> [1 :a 3 :a 3 :b]``````

The functions `assoc-in` and `update-in` are for working with nested structures of vectors and/or maps, like this one (Nested vectors are far from the most efficient way to store or process vectors, but they're convenient to manipulate in Clojure and so make a good example here. More efficient options include a single vector, arrays, or a library for matrix processing like Colt or Incanter.):

``````(def matrix
[[1 2 3]
[3 4 5]
[5 6 7]])``````

Both `assoc-in` and `update-in` take a series of indexes to pick items from each more deeply nested level. For a vector arranged like the `matrix` example, this amounts to row and column coordinates:

``````(assoc-in matrix [1 2] 'x)
;=> [[1 2 3] [3 4 x] [5 6 7]]``````

The `update-in` function works the same way, but instead of taking a value to overwrite an existing value, it takes a function to apply to an existing value. It will replace the value at the given coordinates with the return value of the function you give:

``````(update-in matrix [1 2] * 10)
;=> [[1 2 3] [3 4 50] [5 6 7]]``````

The coordinates refer to the value `5`, and the function given above is `*`, so the value `5` will be replaced by the return value of `(* 5 10)`. The trailing `10` comes from the extra argument given to `update-in` -- any number of arguments may be given to be appended to the function applied.

So far we've not changed the size of any of the vectors we've looked at; we just looked things up and changed items in place. Next we'll look at growing and shrinking vectors -- treating them like stacks.

#### Vectors as Stacks

Classic mutable stacks have at least two operations, push and pop. These also work on vectors, though they're called `conj` and `pop` respectively, by adding and removing things from the right-hand end. Because vectors are immutable, `pop` returns a new vector with the rightmost item dropped; this is a bit different from the many mutable stack APIs, which generally return the dropped item. Consequently, `peek` becomes more important as the primary way to get an item from the top of the stack.

Enough talking -- let's do some doing.

``````(def my-stack [1 2 3])

(peek [1 2 3])
;=> 3

(pop [1 2 3])
;=> [1 2]

(conj my-stack 4)
;=> [1 2 3 4]

(-> my-stack (conj 4) (conj 5) pop (conj 6))
;=> [1 2 3 4 6]``````

Each of these operations completes in essentially constant time. Most of the time, a vector that's used as a stack is used that way throughout its life. It's helpful to future readers of your code to keep this is mind and use the stack operations consistently, even when other functions might work. For example, `last` on a vector returns the same thing as `peek`, but besides being slower it leads to unnecessary confusion about how the collection is being used. If the algorithm involves calls for a stack, use `conj` not `assoc` for growing the vector, `peek` not `last`, and `pop` not `dissoc` for shrinking it.

If `conj` and `pop` aren't fast enough for you, faster equivalents are available for vectors that support `transient`, which we cover in our book.

The functions `conj`, `pop`, and `peek` work on any object that implements `clojure.lang.IPersistentStack` (The `conj` function also works for all of Clojure's other persistent collection types, even if they don't implement `clojure.lang.IPersistentStack`.). Besides vectors, Clojure lists also implement this interface, but the functions operate on the left-hand side of lists instead of the right-hand side as with vectors.

#### Using Vectors Instead of Reverse

The ability of vectors to grow efficiently on the right-hand end and then be walked left to right produces a noteworthy emergent behavior: Idiomatic Clojure code rarely uses the `reverse` function. This is quite different from most Lisps and Schemes, so if you're familiar with one of those, we can take a short look at how this works out. If you don't care about Lisps other than Clojure (and why would you?), feel free to skip down to "Vectors as stacks."

When processing a list, it's common to want to produce a new list in the same order. But if all you have are classic Lisp lists, often the most natural algorithm (the most natural tail-recursive algorithm anyway.) leaves you with a backwards list that needs to be reversed. Here's an example of a function a bit like Clojure's `map`, but strict instead of lazy:

``````(defn strict-map1 [f coll]
(loop [coll coll, acc nil]
(if (empty? coll)
(reverse acc)
(recur (next coll) (cons (f (first coll)) acc)))))

(strict-map1 odd? (range 5))
;=> (false true false true false)``````

This is perfectly good, idiomatic Clojure code, except for that glaring `reverse` of the final return value. After the entire list has been walked once to produce the desired values, `reverse` walks it again to get them in the right order. This is both inefficient and non-idiomatic. One way to get rid of the `reverse` is to use a vector instead of a list as the accumulator:

``````(defn strict-map2 [f coll]
(loop [coll coll, acc []]
(if (empty? coll)
acc
(recur (next coll) (conj acc (f (first coll)))))))

(strict-map2 odd? (range 5))
;=> [false true false true false]``````

A small change, but the code is now a touch cleaner and a bit faster. It does return a vector instead of a list, but this is rarely a problem because any client code that wants to treat this as a `seq` can usually do so automatically. On occasion, though, it may be appropriate to return `(seq acc)` instead of just `acc`.

By the way, another way to get rid of a `reverse` is to build a lazy sequence instead of a strict collection. This is how Clojure's own `map` function is implemented. We cover that and much more about lazy `seq`s in chapter 5 of our book.

The term vector in Clojure is a colloquial description of any object that implements `clojure.lang.IPersistentVector`. A few concrete classes act as vectors: `PersistentVector`, `SubVector`, and `MapEntry` (Clojure 1.0 also had a `LazilyPersistentVector` for performance reasons, but this is no longer necessary because PersistentVector now supports transient.). So far, the examples we've seen have all been plain `PersistentVector`s, though they would have worked for any of these types. But let's turn now to the special features of these other vector types, starting with `SubVector`.

#### SubVectors

Although items can't be removed efficiently from a vector (except the rightmost item), `SubVector`s provide a fast way to take a slice of an existing vector based on start and end indexes. They're created using the `subvec` function.

Let's use the `some-chars` vector we defined earlier:

``````some-chars
;=> [A B C D E F G H I J]``````

The `subvec` takes a start and end index and returns a new vector:

``````(subvec some-chars 3 6)
;=> [D E F]``````

The first index given to `subvec` is inclusive (starts at index 3), but the second is exclusive (ends before index 6).

But the interesting thing is that this new vector internally hangs onto the entire original `some-chars` vector. Each lookup we do on the new vector causes the `SubVector` to do a little offset math and then look up in the original vector. This makes creating a `SubVector` fast. If either the `SubVector` or its underlying vector was mutable, you could detect this fact by making a change to one and observing the effect in the other. But because both are immutable, `SubVector` can be treated exactly the same as any other vector.

You can use `subvec` on any kind of vector and it'll work fine. But it has special logic for taking a `subvec` of a `subvec`, in which case the newest `SubVector` keeps a reference to the original vector, not the intermediate `SubVector`. This prevents `SubVector`s of `SubVector`s from stacking up needlessly and keeps both the creation and use of the sub-`subvec`s fast.

#### Vectors as MapEntries

Clojure's hash map, like hash tables or dictionaries in many other languages, has a mechanism to iterate through the entire collection. Clojure's solution for this iterator is, unsurprisingly, a `seq`. Each item of this `seq` needs to include both the key and the value, so they're wrapped in a `MapEntry`. When printed, each entry looks like a vector:

``````(first {:width 10, :height 20, :depth 15})
;=> [:width 10]``````

But not only does a `MapEntry` look like a vector, it is one:

``````(vector? (first {:width 10, :height 20, :depth 15}))
;=> true``````

This means we can use all the regular vector functions on it: `conj`, `get`, and so on. It even supports destructuring, which can be handy. For example, the locals `dimension` and `amount` in the following example will take on the value of each key/value pair in turn:

``````(doseq [[dimension amount] {:width 10, :height 20, :depth 15}]
(println (str (name dimension) ":") amount "inches"))
; width: 10 inches
; height: 20 inches
; depth: 15 inches
;=> nil``````

Two-element vectors, whether they're `MapEntry`s or not, can also be used to add values to a hash map or sorted map, if you prefer to use `conj` instead of `assoc`:

``````(conj {} [:a 1] [:b 2])
;=> {:b 2, :a 1}``````

But a `MapEntry` is its own type and supports a couple extra functions for retrieving its contents: `key` and `val`, which do exactly the same thing as `(nth my-map 0)` and `(nth my-map 1)`, respectively. These are sometimes useful for the clarity they can bring to your code, but quite frequently destructuring is used instead because it's so darned handy.

Now you know what vectors are, what specific kinds of vectors are included in Clojure, and some of the things that they're good at doing. To round out our understanding of vectors, let's conclude with a brief look at things that vectors are bad at doing.

### What Vectors Are Not

Vectors are quite versatile, but there are some commonly desired patterns where they might seem like a good solution but are not. Although we prefer to focus on the positive, hopefully a few negative examples will help prevent you from using the wrong tool for the job.

#### Vectors Are Not Sparse

If you have a vector of length n, the only position where you can insert a value is at index n, that is, appending to the far right-hand end. You can't skip some indexes and insert at a higher index number. If you want a collection indexed by nonsequential numbers, consider a hash map or sorted map.

Although you can replace values within a vector, you can't insert or delete items such that indexes for the subsequent items would have to be adjusted. Clojure doesn't currently have a native persistent collection that supports this kind of operation, but finger trees may help for some use cases.

If you'll always be wanting to prepend things at the left-hand end of a sequential collection, instead of the right-hand end the way vectors do, you might have found a good opportunity to use a `PersistentList`.

#### Vectors Are Not Queues

Some people have tried to use vectors as queues. One approach would be to push onto the right-hand end of the vector using `conj` and then to pop items off the left using `rest` or `next`. The problem with this is that `rest` and `next` return `seq`s, not vectors, so subsequent `conj` operations wouldn't behave as desired. Using `into` to convert the `seq` back into a vector is O(n) which is less than ideal for every pop.

Another approach is to use `subvec` as a pop, leaving off the leftmost item. Because `subvec` does return a vector, subsequent `conj` operations will push onto the right-hand end as desired. But as we described previously, `subseq` maintains a reference to the entire underlying vector, so none of the items being popped this way will ever be garbage collected -- also less than ideal.

So what would be the ideal way to do queue operations on a persistent collection? Use a `PersistentQueue`, of course, which Clojure provides.

#### Vectors Can't Contain Primitives

Java primitives like `int` and `byte` take less memory and are often much faster to process than their boxed counterparts `Integer` and `Byte`. Unfortunately, none of Clojure's persistent collections support primitives yet. Any attempt to put a primitive into a vector will result in autoboxing of the item, and any value retrieved from a vector will similarly be an object. Although there are plans for Clojure vectors to eventually support primitives, that's way off yet. If you need the memory or performance benefits of primitives in a collection, for now your only choice is to use a Java array.

#### Vectors Are Not Sets

If you want to find out whether a vector contains a particular value, you might be tempted to use the `contains?` function, but you'd be disappointed by the results. `contains?` is for asking whether a particular key, not value, is in a collection, which is rarely useful for a vector. Vector objects do have a `contains` method that you can call via Java interop that will give the results you want, but Clojure offers something much better -- a real set collection.

Well there, that was certainly enough detail about vectors. We saw how to create them using literal syntax or by building them up programmatically. We looked at how to push them, pop them, slice them, and stuff them into maps. We also looked at some of the things vectors can't do well. One of these was adding and removing items from the left-hand side; though vectors can't do this, lists can.

Page 3 of 5