Heaps are a data structure for supporting the priority queue operations insert and find-minimum. They work by maintaining a partial order on the set of elements that is weaker than the sorted order (so it can be efficient to maintain), yet stronger than random order (so the minimum element can be quickly identified).

Heap-Labeled Tree

A heap-labeled tree is defined to be a binary tree such that the key of each node dominates the keys of its children.

  • In a min-heap, a node dominates its children by having a smaller key than they do
  • In a max-heap, a node dominates its children by having a bigger key than they do

The most natural implementation of this binary tree would store each key in a node with pointers to its two children. However, similar to binary search trees, the memory used by the pointers can easily outweigh the size of keys, which is the data we’re mostly interested in.

Sometimes, “heap” is used to refer to a heap-labeled tree.

Heap Data Structure

The heap is a slick data structure that enables us to represent binary trees without using any pointers. We store data as as an array of keys, and use the position of the keys to implicitly play the role of pointers. We assume that the array starts with index 1 for simplicity.

We store the root of the tree at the first position of the array, and its left and right children in the second and third positions. In general, we store the keys of the th level of a complete binary tree from left to right in positions to , as shown above.

  • Level 1 (1492): key, stored at position
  • Level 2 (1783, 1776): keys, stored between positions and
  • Level 3 (1804, 1865, 1945, 1963): keys, stored between positions and .

What is especially nice about this representation is that the positions of the parent and children of the key at position are readily determined. Thus, we can move around the tree without any pointers.

  • The left child of sits in position and the right child in
  • The parent of k holds court in position  .
typedef struct { 
	item_type q[PQ_SIZE+1]; /* body of queue */ 
	int n;                  /* number of queue elements */ 
} priority_queue;
typedef struct { 
	item_type q[PQ_SIZE+1]; /* body of queue */ 
	int n;                  /* number of queue elements */ 
} priority_queue;
 
int pq_parent(int n) {
	if (n == 1) { 
		return(-1); 
	} 
	return((int) n/2); /* implicitly take floor(n/2) */ 
}
 
int pq_young_child(int n) { 
	return(2 * n); 
}

This approach means that we can store any binary tree in an array without pointers. What is the catch? Suppose our height tree was sparse, meaning that the number of nodes . All of the missing internal nodes still take up space in our structure, since we must represent a full binary tree to maintain the positional mapping between parents and children.

To avoid these holes and ensure space efficiency, we make each level be packed as much as it can be, such that:

  • Only the last level may be incomplete.
  • The elements of the last level as far left as possible.

Thus, we can represent an -key tree using the first elements of the array. If we did not enforce these structural constraints, we might need an array of size to store elements.

With heaps, all but the last level are filled, so the height of an element heap is logarithmic because:

Benefits/Drawbacks

This implicit representation of binary trees saves memory, but is less flexible than using pointers.

  • We cannot store arbitrary tree topologies without wasting large amounts of space.
  • We cannot move subtrees around by just changing a single pointer, only by explicitly moving all the elements in the subtree.

This loss of flexibility explains why we cannot use this idea to represent binary search trees, but it works just fine for heaps.

How can we efficiently search for a particular key in a heap? We can’t. Binary search does not work because a heap is not a binary search tree, and the array form of the heap is not sorted. We know almost nothing about the relative order of the leaf elements in a heap that would let us avoid doing linear search through them.

Heap Construction

Heaps can be constructed incrementally, by inserting each new element into the left-most open spot in the array (st position of a previously -element heap). This ensures the desired balanced shape of the heap-labeled tree, but does not maintain the dominance ordering of the keys. The new key might be less than its parent in a min-heap, or greater than its parent in a max-heap, which we don’t want.

The solution to this is to swap any dissatisfied element with its parent:

  • The old parent is now happy, as it is properly dominated
  • The other child of the old parent is happy, because it is now dominated by an element even more extreme than before
  • The new element is now happier, but may still dominate its new parent. So we recur at a higher level, bubbling up the new key to its proper position in the hierarchy.
    • Since we replace the root of a subtree by a larger one at each step of bubbling up, we preserve the heap order elsewhere.
void pq_insert(priority_queue *q, item_type x) {
 
	if (q->n >= PQ_SIZE) { // q->n dereferences pointer and accesses member of the struct it points to. Equivalent to (*q).n
		printf("Warning: priority queue overflow! \n");
	}
	else {
		q->n = (q->n) + 1; 
		q->q[q->n] = x;
		bubble_up(q, q->n);
	}
}
 
void bubble_up(priority_queue *q, int p) {
 
	if (pq_parent(p) == -1){
		return;
	}
 
	if (q->q[pq_parent(p)] > q->q[p]) {
		pq_swap(q, p, pq_parent(p));
		bubble_up(q, pq_parent(p));
	}
 
}

This swap process takes constant time at each level. Since the height of an -element heap is , each insertion takes at most time. A heap of elements can thus be constructed in time through such insertions:

void pq_init(priority_queue *q) {
	q->n = 0
}
 
void make_heap(priority_queue *q, item_type s[], int n) {
	int i;
	pq_init(q);
 
	for (i = 0; i < n; i++){
		pq_insert(q, s[i]);
	}
}