Merge Sort

Problem Solved

Merge Sort solves the Sorting problem.

Design and Strategy

Merge Sort uses the divide-and-conquer approach. It proceeds as follows:

  1. If the list has at most one item, it’s already sorted, so do nothing.
  2. Otherwise, divide the list into two halves—a first half and a second half.
  3. Sort each half independently, using Merge Sort, of course.
  4. Take the two sorted halves and merge them by always picking the smaller front item from either half. Once one half is empty, copy the remaining elements of the other half to the end of the array. An auxiliary array is needed for this step.
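To make step 4 concrete before the formal description, here is a minimal sketch of merging two already-sorted lists (`merge_two` is an illustrative helper, not the index-based merge used by the implementations below):

```python
def merge_two(a, b):
    """Merge two sorted lists by repeatedly taking the smaller front item."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:          # <= keeps equal items in their original order
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # one half is exhausted; copy the rest of the other half
    out.extend(a[i:])
    out.extend(b[j:])
    return out

print(merge_two([1, 3, 5], [2, 4]))   # [1, 2, 3, 4, 5]
```

Each loop iteration copies one element, so merging lists of total length \(n\) takes \(\Theta(n)\) time.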

Before we can give a more formal description of the algorithm, we must discuss how to accomplish the merge step. If you are unfamiliar with how this is done, please read the Merge section before continuing.

To sort an array \(A\) of length \(n\), we let \(left=0\) and \(right=n-1\) and proceed as follows. (Note that both \(left\) and \(right\) are inclusive endpoints.)

  1. Base case: When your subarray has zero or one element, it is by definition already sorted, so do nothing.
  2. Split the problem in half: For a subarray that starts at index \(\mathrm{left}\) and ends at index \(\mathrm{right}\), compute \[ \mathrm{mid} \;=\; \left\lfloor \frac{\mathrm{left} + \mathrm{right}}{2} \right\rfloor. \] Now you have two smaller tasks: sort the left half of \(A\) \(\bigl(\mathrm{left}\dots\mathrm{mid}\bigr)\) and sort the right half of \(A\) \(\bigl(\mathrm{mid}+1\dots\mathrm{right}\bigr)\).
  3. Sort each half recursively: Call Merge Sort on the left half and then again on the right half.
  4. Merge the sorted halves: Once both halves are individually sorted, Merge them back into a single sorted array.

Implementation in Java, C++, Python

The implementations below are initially called as mergeSort(A, 0, n-1). Note that they compute the midpoint as left + (right - left) / 2; this equals \(\lfloor(\mathrm{left}+\mathrm{right})/2\rfloor\) but avoids integer overflow when left + right exceeds the maximum int value in Java and C++. An alternative Python version is also given that takes advantage of some shortcuts (slicing and building new lists), but it is not as efficient as the first version.

Java:
public static void mergeSort(int[] A, int left, int right) {
  if (left < right) {
    int mid = left + (right - left) / 2;
    mergeSort(A, left, mid);
    mergeSort(A, mid + 1, right);
    merge(A, left, mid, right);
  }
}

public static void merge(int[] A, int left, int mid, int right) {
  int n1 = mid - left + 1;
  int n2 = right - mid;
  int[] L = new int[n1];
  int[] R = new int[n2];
  // copy into temporary lists
  for (int i = 0; i < n1; i++) 
      L[i] = A[left + i];
  for (int j = 0; j < n2; j++) 
      R[j] = A[mid + 1 + j];
  int i = 0, j = 0, k = left;
  // merge back into A
  while (i < n1 && j < n2) {
    if (L[i] <= R[j])
        A[k++] = L[i++];
    else
        A[k++] = R[j++];
  }
  // copy any remaining elements of L (if R is exhausted)
  while (i < n1) 
      A[k++] = L[i++];
  // copy any remaining elements of R (if L is exhausted)
  while (j < n2) 
      A[k++] = R[j++];
}

C++:

void mergeSort(int A[], int left, int right) {
  if (left < right) {
    int mid = left + (right - left) / 2;
    mergeSort(A, left, mid);
    mergeSort(A, mid + 1, right);
    merge(A, left, mid, right);
  }
}

void merge(int A[], int left, int mid, int right) {
  int n1 = mid - left + 1;
  int n2 = right - mid;
  int* L = new int[n1];
  int* R = new int[n2];
  // copy into temporary lists
  for (int i = 0; i < n1; i++) 
      L[i] = A[left + i];
  for (int j = 0; j < n2; j++) 
      R[j] = A[mid + 1 + j];
  int i = 0, j = 0, k = left;
  // merge back into A
  while (i < n1 && j < n2) {
    if (L[i] <= R[j])
        A[k++] = L[i++];
    else
        A[k++] = R[j++];
  }
  // copy any remaining elements of L (if R is exhausted)
  while (i < n1) 
      A[k++] = L[i++];
  // copy any remaining elements of R (if L is exhausted)
  while (j < n2) 
      A[k++] = R[j++];
  // delete dynamically allocated memory
  delete[] L;
  delete[] R;
}

Python:

def merge_sort(A, left, right):
    if left < right:
        mid = left + (right - left) // 2
        merge_sort(A, left, mid)
        merge_sort(A, mid + 1, right)
        merge(A, left, mid, right)

def merge(A, left, mid, right):
    n1 = mid - left + 1
    n2 = right - mid

    # copy into temporary lists
    L = [A[left + i] for i in range(n1)]
    R = [A[mid + 1 + j] for j in range(n2)]

    i = j = 0       # current indices in L and R
    k = left        # current index in A where we write

    # merge back into A
    while i < n1 and j < n2:
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
        k += 1

    # copy any remaining elements of L (if R is exhausted)
    while i < n1:
        A[k] = L[i]
        i += 1
        k += 1

    # copy any remaining elements of R (if L is exhausted)
    while j < n2:
        A[k] = R[j]
        j += 1
        k += 1

Alternative Python version:

def merge_sort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    left = merge_sort(A[:mid])
    right = merge_sort(A[mid:])
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
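Either Python version can be sanity-checked against the built-in sorted. The snippet below repeats the short functional version from the listing above and tests it on random inputs:

```python
import random

def merge_sort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

random.seed(42)                       # fixed seed for reproducibility
for _ in range(100):
    data = [random.randint(0, 99) for _ in range(random.randrange(30))]
    assert merge_sort(data) == sorted(data)
print("all random tests passed")
```

Comparing against a trusted implementation on many random inputs is a quick way to catch off-by-one errors in the index arithmetic.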

Time/Space Analysis

Time Complexity: Let \(T(n)\) be the time (number of basic operations) required by Merge Sort to sort an array of length \(n\). We can break each call into three parts:

  1. Divide: Compute the midpoint and split the array into two halves of size roughly \(n/2\). This takes constant time, say \(\alpha\).
  2. Conquer (recursive calls): Recursively call Merge Sort on each half. Each half has size \(\lfloor n/2\rfloor\) or \(\lceil n/2\rceil\), but for simplicity we write it as two calls of size \(n/2\). Since \(T(n)\) is the time to sort an array of size \(n\), it takes \(T(n/2)\) time to sort an array of size \(n/2\). Thus, the two recursive calls take \(2\,T(n/2)\) time.
  3. Combine (merge): Merge the two sorted halves back together with a single left-to-right sweep, comparing elements and copying them into a result. This takes \(\beta\,n\) time (for some constant \(\beta\)).

Since sorting an array of size 1 takes constant time, we get

\[ T(n) \;=\; 2\,T\!\bigl(\tfrac n2\bigr)\;+\;\alpha\;+\;\beta\,n, \quad T(1) = 1. \]

Assuming we only want a bound on the complexity, we can simplify this to

\[ T(n) \;=\; 2\,T\!\bigl(\tfrac n2\bigr)\;+\;n, \quad T(1) = 1. \]

There are various ways of solving this recurrence relation (backward substitution, forward substitution, Master Theorem) that all give us \[T(n)=\Theta(n\log n).\]
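The simplified recurrence can also be checked numerically: for \(n\) a power of two it has the exact closed form \(T(n) = n\log_2 n + n\), which is \(\Theta(n\log n)\).

```python
from math import log2

def T(n):
    """Evaluate T(n) = 2 T(n/2) + n with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for k in range(11):
    n = 2 ** k
    assert T(n) == n * log2(n) + n    # exact closed form for powers of two
print("closed form verified up to n = 1024")
```

A numerical check like this does not replace a proof, but it is a useful way to catch an incorrectly solved recurrence.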

Notice that the complexity does not depend on the initial order of elements. Thus, Merge Sort takes a predictable amount of time to sort any array of a given size.

Space Complexity: Merge Sort requires two types of extra space:

  1. The recursion stack: the recursion goes \(\Theta(\log n)\) levels deep, and each active call uses a constant amount of space.
  2. The auxiliary arrays used by the merge step, which hold \(\Theta(n)\) elements in total.

Thus, overall the space complexity is \(\Theta(\log n\;+\;n)\;=\;\Theta(n)\).
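The \(\Theta(\log n)\) recursion depth can be seen with a small helper that mirrors the index arithmetic of mergeSort without doing any sorting (an illustrative sketch, not part of the implementations above):

```python
def recursion_depth(left, right, depth=1):
    """Depth of Merge Sort's recursion tree for the subarray A[left..right]."""
    if left >= right:                 # base case: zero or one element
        return depth
    mid = left + (right - left) // 2
    return max(recursion_depth(left, mid, depth + 1),
               recursion_depth(mid + 1, right, depth + 1))

# the depth is about log2(n) + 1 levels
for n in (2, 16, 1024):
    print(n, recursion_depth(0, n - 1))   # 2, 5, 11 levels respectively
```

For \(n = 1024\) this reports 11 levels, which is \(\log_2 1024 + 1\), consistent with the \(\Theta(\log n)\) stack-space bound.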

Variations/Improvements

There are several well-known variations and optimizations:

  1. Bottom-up (iterative) Merge Sort, which merges runs of size 1, 2, 4, and so on, avoiding recursion entirely.
  2. Cutoff to Insertion Sort: switch to Insertion Sort for small subarrays, which is faster in practice.
  3. Natural Merge Sort, which exploits runs that are already sorted in the input.
  4. Merge Sort on linked lists, which can merge by relinking nodes instead of copying into an auxiliary array.
  5. Parallel Merge Sort, which sorts the two halves concurrently.

Reading Comprehension Questions

  1. What is the worst-, average-, and best-case time complexity of Merge Sort?
  2. Which recurrence relation models Merge Sort, and how is it solved?
  3. Why is Merge Sort called a divide-and-conquer algorithm?
  4. Is Merge Sort stable? Explain your answer.
  5. What extra space does Merge Sort need and why?
  6. Does Merge Sort do the work on the way down the recursive calls or on the way up?

In-Class Activities

  1. Draw the recursion tree for a size-8 array and annotate work per level.
  2. Trace Merge Sort on [5,3,8,1,2,7,4,6] by hand.
  3. Trace Insertion Sort on the data from the previous question and discuss the number of comparisons used for each algorithm.
  4. Implement bottom-up (iterative) Merge Sort and discuss pros/cons.
  5. Experiment with stability using arrays containing duplicates. Can you make versions of Merge Sort that are both stable and unstable? Discuss.

Problems

  1. Implement bottom-up (iterative) Merge Sort in your favorite language.
  2. Design an in-place merge and analyze its complexity.
  3. Count the number of comparisons and moves Merge Sort uses on [8,7,6,5,4,3,2,1] and [1,2,3,4,5,6,7,8].
  4. Implement Merge Sort for a singly-linked list and compare its performance and space usage to the array-based version.
  5. Design and implement a parallel Merge Sort (using threads or processes); measure its speedup on large inputs and analyze the overheads that limit scalability.