A Heap for Proletarians Home Twitter schemer.in

2017-June-22

Hash tables and arrays typically support efficient access to arbitrary elements.
What if we need efficient access to elements based on some priority? Say for example, we need to fetch
the *minimum* element in O(1) time. The data type that provide this kind of access is known as a priority queue.
Priority queues are typically implemented as
*heap-ordered trees, *
in which the element at each node has higher "priority" than the elements at its children. Under this invariant,
the element with the top priority will always be at the root of the tree. The priority of each node will be determined by a predicate,
like one of the comparison operators.

In this post we take a detailed look at the implementation of a
purely functional
priority queue using a variant of the heap data structure known as the
*leftist* heap.
We will do the implementation in two steps. First, a core set of Clojure functions that manipulate a leftist heap is implemented.
Then we will wrap these functions into an object that can be recognized by the built-in sequence functions –
*first, rest* and *conj*.

In a leftist heap, each node has an associated *s-value* or *rank *which is the node's distance to the nearest* leaf*
(an empty-node)*. *The leftist heap also maintains the invariant that the right subtree of each node has the lowest rank.
In other words, the left subtree of a node will be at least as large as the right subtree. It is usually taller than the right subtree,
giving the heap its name. As a consequence, the right subtree will always have the shortest path to an empty node.

The right subtree of a leftist heap is stored in sorted order. This means two leftist heaps can be merged by simply merging their right subtrees and then swapping the nodes along this path as necessary to restore the leftist property.

Merging leftist heaps has worst-case O(log n) complexity and this is what makes this data structure interesting. The merge operation we just described can be expressed in Clojure as,

```
(defn heap-merge
[h1 h2]
(cond
(empty-heap? h1) h2
(empty-heap? h2) h1
:else (let [[_ x a1 b1] h1
[_ y a2 b2] h2]
(if (<= x y)
(make-node x a1 (heap-merge b1 h2))
(make-node y a2 (heap-merge h1 b2))))))
```

`Heap-merge`

makes use of a few helper functions. The `make-node`

function encodes a
single node as the list `[rank, node-value, left-subtree, right-subtree]`

. It also calculates the rank of
the node and swaps the left and right subtrees as needed.

```
(defn make-node
[obj subtree-a subtree-b]
(let [ra (node-rank subtree-a)
rb (node-rank subtree-b)]
(if (>= ra rb)
[(inc rb) obj subtree-a subtree-b]
[(inc ra) obj subtree-b subtree-a])))
```

The `node-rank`

function returns the rank associated with a node:

```
(defn node-rank
[h]
(if (empty-heap? h)
0
(first h)))
```

An empty heap is represented by the keyword `:E`

.

```
(def empty-heap :E)
(defn empty-heap?
[h]
(= h empty-heap))
```

With the efficient merge function in place, we may proceed to implement the interface of the priority queue.
This consists of three functions – `pq-insert`

, `pq-find-min`

and `pq-delete-min`

.

`Pq-insert`

creates a single-node tree with the new element and merges it with an existing heap:

```
(defn pq-insert
[obj h]
(heap-merge [1 obj empty-heap empty-heap] h))
```

`Pq-find-min`

return the root of the heap:

```
(defn pq-find-min
[h]
(when-not (empty-heap? h)
(let [[_ obj _ _] h] obj)))
```

`Pq-delete-min`

will discard the root and return a new heap constructed by merging the two subtrees:

```
(defn pq-delete-min
[h]
(if (empty-heap? h)
h
(let [[_ obj left right] h]
(heap-merge left right))))
```

As `pq-insert`

and `pq-delete-min`

basically does a merge, both of them run in O(log n) time.
`Pq-find-min`

runs in O(1) time.

With these core functions in place, we are ready to use the leftist-heap priority queue!

```
clojure> (def pq (pq-insert 231
(pq-insert 154
(pq-insert 900 empty-heap))))
clojure> (pq-find-min pq)
;; 154
clojure> (def pq (pq-delete-min pq))
clojure> (pq-find-min pq)
;; 231
clojure> (def pq (pq-delete-min pq))
clojure> (pq-find-min pq)
;; 900
clojure> (def pq (pq-delete-min pq))
clojure> (pq-find-min pq)
;; nil
```

**Exercise 1**. `Heap-merge`

uses the `<=`

numeric comparison function for prioritizing the nodes.
Make this function more generic by making the priority-operation configurable by the user.

Clojure defines many useful operations in terms of sequences. A sequence is immutable and persistent. As a result, they are thread-safe and can share their state freely.

The priority queue we implemented in the previous section is also immutable and persistent. So it would be a good candidate for being a
sequence. Let us make the priority queue respond to the *first*, *rest* and *conj* functions. These functions
in turn can be used to describe more sophisticated sequence processing operations. This way our custom data structure is given
a chance to blend nicely with the rest of Clojure.

To be treated as a sequence, a type has to implement the
clojure.lang.ISeq
interface. We will use the deftype macro to
define a *LeftistHeapPriorityQueue* that implements the ISeq interface.

While iterating over a sequence, it is idiomatic in Clojure to call the *seq* function to check for termination.
A priority queue may sometimes be involved in an iteration, as part of a debugging session or for fetching all values sorted by priority.
So let us also make our new type respond properly to the *seq* function call. This function is declared in the
clojure.lang.Seqable
interface.

All these requirements leads to the following final definition of *LeftistHeapPriorityQueue*:

```
(deftype LeftistHeapPriorityQueue [pq]
clojure.lang.ISeq
;; return a seq of the rest of the elements,
;; return nil if no elements remain.
(next [_] (let [pq-rest (pq-delete-min pq)]
(when-not (empty-heap? pq-rest)
(LeftistHeapPriorityQueue. pq-rest))))
;; return the first element.
(first [_] (pq-find-min pq))
;; return the sequence with the first item removed,
;; or an empty sequence if no items remain.
(more [this] (or (next this) '()))`
;; return a new sequence with `obj` prefixed,
;; invoked via the global `conj`.
(cons [_ obj] (LeftistHeapPriorityQueue. (pq-insert obj pq)))
clojure.lang.Seqable
(seq [this] (when (.next this) this)))
```

The helper function `pq-insert-all`

used by *cons* is,

```
(defn pq-insert-all
"Insert all elements from the sequence `objs` into the
priority queue `pq`."
[objs pq]
(loop [xs objs, pq pq]
(if (seq xs)
(recur (rest xs) (pq-insert (first xs) pq))
pq)))
```

**Exercise 2.** The *pq-insert-all* function runs in O(n log n) time. Is a more efficient
(as in O(n)) alternative implementation possible?

To make it convenient to create priority queues from a sequence of values,
let us define a "constructor" of *LeftistHeapPriorityQueue*s:

```
(defn priority-queue
([objs]
(LeftistHeapPriorityQueue. (pq-insert-all objs empty-heap)))
([]
(priority-queue [])))
```

Finally, by implementing the *print-method*
multimethod, we can let the REPL know how to display a *LeftistHeapPriorityQueue* literal:

```
(defmethod print-method LeftistHeapPriorityQueue
[tree ^java.io.Writer w]
(.write w (str "#lhpq<" (.first tree) " ...>")))
```

Before we say goodbye, here is a REPL session with our new sequence type:

```
clojure> (def a (priority-queue [1 123 12 45]))
clojure> (seq a)
;; #lhpq<1 ...>
clojure> (seq? a)
;; true
clojure> (first a)
;; 1
clojure> (def a (rest a))
clojure> (first a)
;; 12
clojure> (def a (conj a 50))
clojure> (first (rest a))
;; 45
clojure> (first (rest (rest a)))
;; 50
```

The full source code for this post.

- The original implementation of the leftist-heap priority queue can be found in Chapter 3 of Chris Okasaki's Purely Functional Data Structures.
- Inside Clojure/Sequences.