What data representation is appropriate for implementing a Dictionary? What are the running times for all the dictionary operations (search, insert, delete, successor, predecessor, minimum, maximum) when the data structure is implemented as:
- Unsorted array
- Sorted array
- Singly-linked unsorted list
- Doubly-linked unsorted list
- Singly-linked sorted list
- Doubly-linked sorted list
This reveals some of the inherent trade-offs of data structure design. A given data representation may permit efficient implementation of certain operations at the cost that other operations are expensive.
Arrays
Given that is the number of elements currently in the array, we have:
Operation | Unsorted Array | Sorted Array |
---|---|---|
Search(A, k) | ||
Insert(A, x) | ||
Delete(A, x) | ||
Successor(A, x) | ||
Predecessor(A,x) | ||
Maximum(A) | ||
Minimum(A) |
- Search – For an unsorted array, this is implemented by testing the search key
k
against each element of an unsorted array. Thus, it takes linear time in the worst case (keyk
is not found inA
). For sorted arrays, this can be done via binary search, which reduces to logarithmic time. - Insertion – For an unsorted array this is implemented by incrementing
n
and then copying itemx
to the th cell in the array,A[n]
. The bulk of the array is untouched, so this operation takes constant time. This becomes more expensive in the sorted case, since we need to find make room for the new item and shuffle everything to the right. - Deletion – This is defined such that we are given a pointer to
x
, so we don’t need to spend time searching. However, removing the th element from the array leaves a hole.- For a sorted array, we need to shift all elements to the left, which is .
- For an unsorted array, we can just overwrite
A[x]
withA[n]
, and decrementn
. This takes constant time.
- Predecessor/Successor – This require us to find the biggest element smaller than
A[x]
and smallest element bigger thanA[x]
respectively. For an unsorted array, we need to sweep through all elements, so this is linear time. For a sorted array, we can just retrieveA[x-1]
orA[x+1]
, which is constant time. - Minimum/Maximum – Similar to predecessor/successor above.
Linked Lists
Operation | Single/Unsorted | Single/Sorted | Double/Unsorted | Double/Sorted |
---|---|---|---|---|
Search(A, k) | ||||
Insert(A, x) | ||||
Delete(A, x) | ||||
Successor(A, x) | ||||
Predecessor(A,x) | ||||
Maximum(A) | ||||
Minimum(A) |
- Search – Sorting doesn’t provide benefit for linked lists than it did for arrays. Binary search is no longer possible, because we can’t access the median element without traversing all the elements before it. What sorted lists do provide is quick termination of unsuccessful searches, for if we have not found “A” by the time we hit “C” we can deduce that it doesn’t exist in the list. This is still linear time in the worst case.
- Insertion/Deletion – For a singly-linked list, this is hard to do because the definition of the Delete operation states we are given a pointer
x
to the item to be deleted; but what we need is a pointer to the node pointing tox
in the list, because that’s the node that needs to be changed. We can’t do anything without this predecessor pointer, so we need to spend linear time searching. Doubly-linked lists solve this problem, since we can just retrieve the predecessor ofx
; this is a benefit of doubly-linked sorted lists over sorted arrays. - Predecessor/Successor – The predecessor pointer problem again complicates implementing
Predecessor
with singly linked lists, requiring a constant time search. The logical successor is equivalent to the node successor for both types of sorted lists, and hence can be implemented in constant time. - Minimum/Maximum – The minimum element sits at the head of a sorted list, and so is easily retrieved. The maximum element is at the tail of the list, which normally requires linear time to reach in either singly or doubly linked lists.
- We can maintain a pointer to the list tail but we need to pay maintenance costs on every insertion and deletion. This tail pointer can be updated in constant time for doubly-linked lists; on insertion, check whether
last->next == NULL
, and on deletion, setlast
to point to the list predecessor oflast
if the last element is deleted. - We have no efficient way to find this predecessor for singly linked lists. So why can we implement
Maximum
in ? The trick is to add the cost to each deletion, which already took linear time. Adding an extra linear sweep to update the pointer does not harm the asymptotic complexity ofDelete
, while gaining usMaximum
andMinimum
in constant time as a reward.
- We can maintain a pointer to the list tail but we need to pay maintenance costs on every insertion and deletion. This tail pointer can be updated in constant time for doubly-linked lists; on insertion, check whether