Heap Sort & Analysis of Algorithms
Design and Analysis of Algorithms
This lecture delves into Heap Sort, an efficient comparison-based sorting algorithm. We will revisit the definition of a binary heap, understand the core operations, analyze the algorithm’s complexity, and discuss important properties related to maintaining the heap structure.
Heap: Definition Recap
A Binary Heap is a specialized tree-based data structure that satisfies the following conditions:
- Shape Property: It must be an almost complete binary tree. This means the tree is completely filled on all levels, except possibly the lowest level, which is filled from left to right. This structure allows efficient representation using an array.
- Heap Property: The nodes must satisfy a specific ordering relationship. There are two main types:
- Max-Heap: The value of any node is greater than or equal to the values of its children. Consequently, the largest element in the heap resides at the root.
- Min-Heap: The value of any node is less than or equal to the values of its children. Consequently, the smallest element in the heap resides at the root.
(Heap Sort typically uses a Max-Heap to sort in ascending order or a Min-Heap to sort in descending order. We will focus on Max-Heap for ascending sort.)
Every node must follow the heap property. Leaf nodes trivially satisfy the condition as they have no children.
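As a quick illustration of the heap property on the array representation (discussed further below), here is a small Python check. The function name is_max_heap is only for illustration and assumes 0-based indexing:

```python
def is_max_heap(arr):
    """Return True if arr (0-based array representation) satisfies the max-heap property."""
    n = len(arr)
    for i in range(n // 2):            # only the first n//2 positions are non-leaf nodes
        l, r = 2 * i + 1, 2 * i + 2    # children of node i in the array representation
        if l < n and arr[l] > arr[i]:
            return False
        if r < n and arr[r] > arr[i]:
            return False
    return True

print(is_max_heap([9, 5, 8, 1, 3]))    # True: every parent >= its children
print(is_max_heap([1, 9, 8]))          # False: the root is smaller than a child
```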
Heap Sort Algorithm
Heap Sort leverages the properties of a max-heap to sort an array A. The core idea is to first arrange the array elements into a max-heap structure and then repeatedly extract the maximum element (which is always at the root of the heap) and place it at the end of the sorted portion of the array.
The algorithm consists of two main phases:
Phase 1: Build Max Heap
Transform the input array A (of length n) into a max-heap.
- BUILD-MAX-HEAP(A) pseudocode:
  - heap-size[A] ← length[A] (initialize the heap size; often this is implicitly the array length n)
  - for i ← floor(length[A]/2) downto 1 (iterate from the last non-leaf node up to the root; 1-based indexing is used here for pseudocode consistency)
  - do MAX-HEAPIFY(A, i) (call MAX-HEAPIFY on each non-leaf node to establish the heap property)
  - (Note: with 0-based indexing as in Python, the loop runs from n//2 - 1 down to 0.)
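For example, on the array [4, 1, 3, 2, 16, 9, 10, 14, 8, 7] used in the Python example below, BUILD-MAX-HEAP calls MAX-HEAPIFY on the non-leaf indices 4, 3, 2, 1, 0 (0-based) and produces the max-heap [16, 14, 10, 8, 7, 9, 3, 2, 4, 1], with the largest element, 16, at the root.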
Phase 2: Sort by Extraction
Repeatedly extract the maximum element and rebuild the heap.
- HEAPSORT(A) pseudocode:
  - BUILD-MAX-HEAP(A) (first, build the initial max-heap)
  - for i ← length[A] downto 2 (iterate from the last element down to the second element; 1-based indexing)
  - do exchange A[1] ↔ A[i] (swap the root, the maximum element, with the last element of the current heap)
  - heap-size[A] ← heap-size[A] - 1 (reduce the effective heap size by one, excluding the now-sorted element at the end)
  - MAX-HEAPIFY(A, 1) (restore the max-heap property on the reduced heap, starting from the root)
  - (Note: with 0-based indexing, the loop runs from n-1 down to 1, the swap is A[0] ↔ A[i], and MAX-HEAPIFY is called on the heap of size i starting at index 0.)
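For example, starting from the max-heap [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] (0-based), the first iteration swaps 16 with the last element, shrinks the heap to size 9, and MAX-HEAPIFY restores the remaining heap to [14, 8, 10, 4, 7, 9, 3, 2, 1], leaving 16 fixed in its final position at the end of the array.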
The MAX-HEAPIFY Procedure
This is a crucial subroutine used both in building the heap and during the sorting phase. It ensures that the subtree rooted at index i satisfies the max-heap property, assuming its children's subtrees already satisfy it.
MAX-HEAPIFY(A, i) pseudocode (1-based index):
- l ← LEFT(i) (index of the left child, i.e., 2*i)
- r ← RIGHT(i) (index of the right child, i.e., 2*i + 1)
- if l ≤ heap-size[A] and A[l] > A[i] then largest ← l else largest ← i
- if r ≤ heap-size[A] and A[r] > A[largest] then largest ← r
- if largest ≠ i then exchange A[i] ↔ A[largest] and call MAX-HEAPIFY(A, largest) (recursively heapify the subtree where the swap occurred)
(The heap-size[A] checks are essential to ensure the child indices are within the bounds of the current heap being considered.)
Python Implementation (Revisiting from Lecture 4):
```python
def max_heapify(arr, n, i):
    """Ensure the subtree rooted at index i is a max heap. n is the size of the heap."""
    largest = i          # Initialize largest as root (using 0-based index)
    l = 2 * i + 1        # left child index
    r = 2 * i + 2        # right child index
    # Check if left child exists and is greater than root
    if l < n and arr[l] > arr[largest]:
        largest = l
    # Check if right child exists and is greater than the largest found so far
    if r < n and arr[r] > arr[largest]:
        largest = r
    # If root is not the largest, swap with largest and heapify down
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]  # swap
        # Recursively heapify the affected sub-tree
        max_heapify(arr, n, largest)

def build_max_heap(arr):
    """Builds a max heap from an array."""
    n = len(arr)
    # Start from the last non-leaf node (n//2 - 1) and move upwards to index 0
    start_index = n // 2 - 1
    for i in range(start_index, -1, -1):
        max_heapify(arr, n, i)  # Pass n (total size) as the heap size initially

def heap_sort(arr):
    """Sorts an array in place using Heap Sort."""
    n = len(arr)
    # 1. Build a max heap.
    build_max_heap(arr)
    # print(f"Heap built: {arr}")  # Optional: view the heap structure
    # 2. One by one extract elements.
    # Iterate from the end of the array (n-1) down to the second element (1)
    for i in range(n - 1, 0, -1):
        # Move current root (max element) to the end (index i)
        arr[i], arr[0] = arr[0], arr[i]
        # print(f"Swapped {arr[i]} to end, Array: {arr}, Heap Size: {i}")  # Optional
        # Call max_heapify on the reduced heap (size i) starting at the root (0)
        max_heapify(arr, i, 0)  # Pass i as the new heap size
        # print(f"Heap restored: {arr[:i]}")  # Optional

# Example usage
my_list = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
print(f"Original list: {my_list}")
heap_sort(my_list)
print(f"Sorted list (Heap Sort): {my_list}")
```
Heap Sort Analysis
Let’s analyze the time complexity of the different components:
- MAX-HEAPIFY:
  - The work done at each node involves a constant number of comparisons and potential swaps.
  - The recursion depth is limited by the height of the tree, h. For a complete or almost complete binary tree with n nodes, the height is h = Θ(log n).
  - Therefore, the time complexity of MAX-HEAPIFY(A, i) is O(log n).
- BUILD-MAX-HEAP:
  - Simpler analysis: the procedure calls MAX-HEAPIFY for roughly n/2 nodes (the non-leaf nodes). Since each call takes O(log n), a straightforward upper bound is (n/2) * O(log n) = O(n log n).
  - Tighter analysis (O(n)): a more precise analysis observes that MAX-HEAPIFY's runtime depends on the height of the node it is called on, not on the total height of the heap, and most nodes sit near the bottom of the heap with small heights.
    - The number of nodes at height h in a complete binary tree is at most ⌈n / 2^(h+1)⌉.
    - The work done by MAX-HEAPIFY on a node at height h is O(h).
    - Total time = Σ (from h = 0 to log n) ⌈n / 2^(h+1)⌉ * O(h) ≈ Σ (n / 2^(h+1)) * c*h = (cn/2) * Σ (h / 2^h).
    - The summation Σ (h / 2^h) converges to a constant (it equals 2, obtained by differentiating the geometric series; see the derivation after this list).
    - Therefore, the total time is O(n). Building the heap is surprisingly linear!
- HEAPSORT:
  - Step 1 (BUILD-MAX-HEAP): takes O(n) time (using the tighter analysis).
  - Step 2 (extraction loop): runs n-1 times. Inside the loop:
    - Swap: O(1).
    - MAX-HEAPIFY(A, 0): takes O(log k), where k is the current heap size (i in the Python code). In the worst case, this is O(log n).
    - Total time for the loop: (n-1) * O(log n) = O(n log n).
  - Overall performance: the total time is dominated by the extraction loop, so T(n) = O(n) + O(n log n) = O(n log n).
- Summary:
  - Time complexity: O(n log n) for the best, average, and worst cases.
  - Space complexity: O(1) (assuming the sort is done in place within the original array).
  - Stability: not stable.
  - Swaps: roughly n-1 swaps in the extraction phase, plus the swaps performed during BUILD-MAX-HEAP and MAX-HEAPIFY.
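For completeness, here is a short derivation of that constant; it is a standard identity rather than something specific to these notes. Differentiating the geometric series and evaluating at x = 1/2:

\[
\sum_{h=0}^{\infty} x^h = \frac{1}{1-x}
\;\Longrightarrow\;
\sum_{h=0}^{\infty} h\,x^{h} = \frac{x}{(1-x)^2}
\;\Longrightarrow\;
\sum_{h=0}^{\infty} \frac{h}{2^h} = \frac{1/2}{(1/2)^2} = 2 .
\]

So the total work for BUILD-MAX-HEAP is at most roughly (cn/2) * 2 = cn, i.e., O(n).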
Some Things to Know About Heaps
Heaps are dynamic structures, and their properties must be maintained when elements are added, deleted, or modified.
- Restoring Heap Property:
  - If a node's value increases (potentially violating the max-heap property with its parent), the property can be restored by repeatedly swapping the node with its parent until it reaches a position where it is less than or equal to its parent, or it becomes the root. This upward movement is called Percolation Up or Sifting Up (see the sketch after this list).
  - If a node's value decreases (potentially violating the max-heap property with its children), the property can be restored by repeatedly swapping the node with its larger child until it reaches a position where it is greater than or equal to both its children, or it becomes a leaf. This downward movement is essentially what MAX-HEAPIFY does and is called Percolation Down or Sifting Down.
- Array Storage: The complete binary tree structure allows efficient mapping to an array (using 1-based indexing for clarity here; adjust for 0-based):
  - Root: A[1]
  - Parent(i): A[floor(i/2)]
  - Left Child(i): A[2i]
  - Right Child(i): A[2i + 1]
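As an illustration of sifting up, here is a minimal Python sketch. The name sift_up is not from the lecture; it assumes the 0-based mapping used in the code above, where the parent of index i is (i - 1) // 2:

```python
def sift_up(arr, i):
    """Restore the max-heap property by moving arr[i] up after its value increased."""
    while i > 0:
        parent = (i - 1) // 2          # 0-based analogue of floor(i/2) in 1-based indexing
        if arr[i] <= arr[parent]:
            break                      # parent already dominates the node: heap property holds
        arr[i], arr[parent] = arr[parent], arr[i]
        i = parent

# Example: increase the last element of a max-heap, then sift it up
heap = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
heap[9] = 20                           # 20 now violates the heap property with its ancestors
sift_up(heap, 9)
print(heap[0])                         # 20 has percolated to the root
```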
Home Task Examples
- Maintain Minimum Heap / Sort Descending:
  - Given array: [1, 6, 9, 2, 7, 5, 2, 7, 4, 10]
  - To sort in descending order, build a Min-Heap.
  - Follow the HEAPSORT algorithm steps, but use BUILD-MIN-HEAP and MIN-HEAPIFY. In the extraction phase, swapping the root (the minimum element) to the end of the current heap range places the smallest elements at the end of the array, resulting in descending order (a sketch of this variant follows after these tasks).
- Sort Descending using Heap Sort:
  - Given array: [24, 21, 23, 22, 36, 29, 30, 34, 28, 27]
  - Again, build a Min-Heap and follow the Heap Sort extraction process to get descending order.
- Sort Descending using Heap Sort:
  - Given array: [5, 3, 1, 9, 8, 2, 4, 7]
  - Build a Min-Heap and proceed with extraction for the descending sort.
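For the first task, a minimal sketch of the descending variant, assuming the same 0-based conventions as the Python code above (the names min_heapify and heap_sort_descending are illustrative, not from the lecture):

```python
def min_heapify(arr, n, i):
    """Sift down within a min-heap of size n rooted at index i (0-based)."""
    smallest = i
    l, r = 2 * i + 1, 2 * i + 2
    if l < n and arr[l] < arr[smallest]:
        smallest = l
    if r < n and arr[r] < arr[smallest]:
        smallest = r
    if smallest != i:
        arr[i], arr[smallest] = arr[smallest], arr[i]
        min_heapify(arr, n, smallest)

def heap_sort_descending(arr):
    """Sort arr in descending order by repeatedly extracting the minimum."""
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):   # build the min-heap
        min_heapify(arr, n, i)
    for i in range(n - 1, 0, -1):
        arr[0], arr[i] = arr[i], arr[0]   # move the current minimum to the end
        min_heapify(arr, i, 0)            # restore the min-heap on the reduced range

data = [1, 6, 9, 2, 7, 5, 2, 7, 4, 10]
heap_sort_descending(data)
print(data)  # [10, 9, 7, 7, 6, 5, 4, 2, 2, 1]
```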
This detailed analysis covers the Heap Sort algorithm, its implementation logic, complexity analysis (including the tighter bound for building the heap), and key heap maintenance operations.