Hash Tables, Binary Trees, Graphs, Balanced Trees, Heaps & DP
Chapter 6: Hash Tables
What is a Hash Table?
A hash table is a data structure used to store key-value pairs efficiently. It uses a hashing function to compute an index (or hash) from the key, where the value is stored in an array-like structure.
1. Collisions
Collisions occur when two keys hash to the same index in the hash table. Hash tables must handle collisions to work correctly. Common methods for collision resolution:
- Chaining
- Each slot in the hash table holds a linked list of key-value pairs.
- When collisions occur, new pairs are added to the linked list at that index.
- Example:
Index 0 -> (key1, value1) -> (key2, value2)
- Open Addressing
- Collisions are resolved by finding another empty slot using a probing technique:
- Linear Probing: Check the next slot sequentially.
- Quadratic Probing: Use a quadratic function to calculate the next slot.
- Double Hashing: Use a secondary hash function to calculate the next slot.
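To make the probing idea concrete, here is a minimal sketch of insertion with linear probing; the fixed table size and the _EMPTY sentinel are illustrative choices for this sketch, not a standard API.

_EMPTY = object()  # sentinel marking an unused slot

def probe_insert(table, key, value):
    """Insert (key, value) into a fixed-size list using linear probing."""
    size = len(table)
    index = hash(key) % size
    for step in range(size):
        slot = (index + step) % size  # linear probing: try the next slot
        if table[slot] is _EMPTY or table[slot][0] == key:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("hash table is full")

table = [_EMPTY] * 8
probe_insert(table, "apple", 1)
probe_insert(table, "pear", 2)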
2. How to Calculate Key-Value Pairs
- Hash Function
- The hash function maps a key to an index in the table:
Index = hash(key) % Table Size
- A good hash function minimizes collisions by uniformly distributing keys.
- Insertion
- Apply the hash function to the key.
- Store the value at the calculated index.
- Handle collisions if necessary.
- Lookup
- Apply the hash function to the key.
- Retrieve the value stored at the calculated index.
- If collisions occur, search through the linked list (in chaining) or follow the probing sequence.
- Deletion
- Locate the key using the hash function.
- Remove the key-value pair while preserving the structure (e.g., in open addressing, mark the slot as deleted with a tombstone or rehash the affected elements).
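The insertion, lookup, and deletion steps above can be sketched with a small chaining-based table; the class and method names (ChainedHashTable, insert, lookup, delete) are illustrative, not a standard library API.

class ChainedHashTable:
    """A minimal hash table using chaining; each bucket is a list of pairs."""

    def __init__(self, size=16):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)  # Index = hash(key) % Table Size

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # empty bucket or collision: append

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)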
Key Characteristics of Hash Tables
- Fast Operations: On average, insertion, deletion, and lookup take O(1); the worst case degrades to O(n) when many keys collide.
- Space Usage: Needs additional memory for chaining or handling empty slots in open addressing.
- Applications:
- Caching (e.g., LRU cache).
- Implementing sets.
- Database indexing.
Chapter 7: Binary Trees
What is a Binary Tree?
A binary tree is a hierarchical data structure in which each node has at most two children, referred to as the left child and right child.
- Root: The topmost node of the tree.
- Leaf Nodes: Nodes with no children.
- Height: The number of edges on the longest path from the root to a leaf.
1. Binary Search
Binary search is a technique used to search for an element in a binary search tree (BST), which is a special type of binary tree where:
- The left subtree contains values less than the parent node.
- The right subtree contains values greater than the parent node.
Binary Search Algorithm
- Start at the root node.
- Compare the target value with the current node’s value:
- If the target value equals the current node, the search is successful.
- If the target value is smaller, move to the left subtree.
- If the target value is larger, move to the right subtree.
- Repeat until the target is found or the subtree becomes null (not found).
Time Complexity:
- Best case: O(1) (when the target is at the root).
- Average case: O(log n) for a balanced tree; worst case: O(n) for an unbalanced (degenerate) tree.
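A minimal sketch of this search, assuming a simple Node class with value, left, and right fields:

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_search(node, target):
    """Return the node holding target, or None if not found."""
    while node is not None:
        if target == node.value:
            return node
        # go left for smaller targets, right for larger ones
        node = node.left if target < node.value else node.right
    return None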
2. Insertion and Removal
Insertion
To insert a value into a binary search tree:
- Start at the root.
- Compare the new value with the current node:
- If smaller, move to the left subtree.
- If larger, move to the right subtree.
- Repeat until you find a null subtree (empty spot).
- Insert the new value as a child of the current node.
Example: Insert 8 into this BST:
    10
   /  \
  5    15
 /
3
- Compare 8 with 10 → go left.
- Compare 8 with 5 → go right.
- Insert 8 as the right child of 5.
Result:
    10
   /  \
  5    15
 / \
3   8
Removal
To remove a node from a binary search tree, there are three cases:
- Node has no children (leaf node)
  - Simply delete the node.
  - Example: removing node 3 from the tree above:
        10
       /  \
      5    15
       \
        8
- Node has one child
  - Replace the node with its child.
  - Example: removing node 5 (its only child, 8, takes its place):
        10
       /  \
      8    15
- Node has two children
  - Replace the node with its in-order successor (smallest value in the right subtree) or in-order predecessor (largest value in the left subtree).
  - Example: removing 10 (replaced by its in-order successor, 15):
        15
       /
      8
Time Complexity: O(log n) for balanced trees, O(n) for unbalanced trees.
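A sketch of insertion and the three removal cases, reusing the Node class from the search sketch above:

def bst_insert(root, value):
    """Insert value and return the (possibly new) root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = bst_insert(root.left, value)
    elif value > root.value:
        root.right = bst_insert(root.right, value)
    return root  # duplicates are ignored in this sketch

def bst_delete(root, value):
    """Delete value and return the (possibly new) root."""
    if root is None:
        return None
    if value < root.value:
        root.left = bst_delete(root.left, value)
    elif value > root.value:
        root.right = bst_delete(root.right, value)
    elif root.left is None:            # zero or one child
        return root.right
    elif root.right is None:
        return root.left
    else:                              # two children: use in-order successor
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.value = succ.value
        root.right = bst_delete(root.right, succ.value)
    return root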
Key Characteristics of Binary Trees
- Binary trees are used in searching, sorting, and many hierarchical applications like file systems.
- The efficiency depends on whether the tree is balanced or unbalanced.
Chapter 8: Graphs
What is a Graph?
A graph is a data structure consisting of:
- Vertices (Nodes): Represent entities.
- Edges (Links): Represent connections between vertices.
Graphs can be:
- Directed: Edges have a direction (e.g., A → B).
- Undirected: Edges are bidirectional (e.g., A ↔ B).
- Weighted: Edges have weights (e.g., distance, cost).
- Unweighted: Edges are uniform in value.
1. Graph Representations
Graphs can be represented in two common ways:
Adjacency Matrix
- A 2D array where:
  - Rows and columns both represent vertices.
  - The entry A[i][j] is 1 (or the weight of the edge) if there’s an edge between vertices i and j, otherwise 0.
Example (Undirected Graph):
Vertices: A, B, C
Edges: A-B, B-C
Adjacency Matrix:
    A  B  C
A | 0  1  0 |
B | 1  0  1 |
C | 0  1  0 |
- Pros: Simple, easy to implement.
- Cons: Uses O(V²) space, even for sparse graphs.
Adjacency List
- Each vertex has a list of its neighbors.
- Example (Undirected Graph):
Vertices: A, B, C
Edges: A-B, B-C
Adjacency List:
A: B
B: A, C
C: B
- Pros: Efficient for sparse graphs, uses less memory (O(V+E)).
- Cons: Slightly more complex to traverse.
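Both representations of the example graph, sketched as plain Python data (no graph library assumed):

# The same undirected graph (edges A-B, B-C) in both representations.
vertices = ["A", "B", "C"]

# Adjacency matrix: O(V²) space, O(1) edge lookup.
adj_matrix = [
    [0, 1, 0],  # A
    [1, 0, 1],  # B
    [0, 1, 0],  # C
]

# Adjacency list: O(V + E) space, efficient neighbor iteration.
adj_list = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
}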
2. Dijkstra’s Algorithm
A shortest-path algorithm for weighted graphs.
Steps:
- Initialize distances from the source to all vertices as infinity (∞) except the source, which is 0.
- Use a priority queue (or min-heap) to extract the vertex with the smallest distance.
- For the current vertex, update the distances to its neighbors if a shorter path is found.
- Repeat until all vertices are processed.
Time Complexity:
- Using a min-heap: O((V + E) log V).
Example:
Find the shortest path from A to D.
Vertices: A, B, C, D
Edges: A-B (1), A-C (4), B-C (2), B-D (6), C-D (3)
Shortest path: A → B → C → D (distance = 6).
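A minimal sketch of the algorithm using Python’s heapq as the min-heap; the dict-of-lists graph format below is an illustrative choice, shown on the example above.

import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps vertex -> [(neighbor, weight)]."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    pq = [(0, source)]  # min-heap of (distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale entry; a shorter path was already found
        for v, w in graph[u]:
            if d + w < dist[v]:        # relax the edge
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 2), ("D", 6)],
    "C": [("A", 4), ("B", 2), ("D", 3)],
    "D": [("B", 6), ("C", 3)],
}
print(dijkstra(graph, "A")["D"])  # 6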
3. BFS (Breadth-First Search)
- Explores all vertices at the current depth level before moving deeper.
- Uses a queue.
- Common applications:
- Finding the shortest path in an unweighted graph.
- Graph traversal.
Steps:
- Start at a vertex, mark it as visited, and enqueue it.
- Dequeue a vertex, process it, and enqueue its unvisited neighbors.
- Repeat until the queue is empty.
Time Complexity: O(V + E).
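A minimal BFS sketch, assuming the graph is an adjacency dict mapping each vertex to a list of neighbors:

from collections import deque

def bfs(graph, start):
    """Visit vertices in breadth-first order; returns the visit order."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            if v not in visited:   # enqueue each unvisited neighbor once
                visited.add(v)
                queue.append(v)
    return order

graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(bfs(graph, "A"))  # ['A', 'B', 'C']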
4. DFS (Depth-First Search)
- Explores as far as possible along a branch before backtracking.
- Uses a stack (or recursion).
- Common applications:
- Detecting cycles.
- Topological sorting.
Steps:
- Start at a vertex, mark it as visited.
- Move to an unvisited neighbor and repeat.
- Backtrack when no unvisited neighbors remain.
Time Complexity: O(V + E).
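A minimal recursive DFS sketch using the same adjacency-dict format as the BFS example:

def dfs(graph, start, visited=None):
    """Recursive depth-first traversal; returns vertices in visit order."""
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for v in graph[start]:
        if v not in visited:
            order.extend(dfs(graph, v, visited))  # recurse, then backtrack
    return order

graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(dfs(graph, "A"))  # ['A', 'B', 'C']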
5. Minimum Spanning Tree (MST)
An MST connects all vertices in a graph with the minimum total edge weight, without forming a cycle.
Algorithms for MST:
- Kruskal’s Algorithm:
  - Sort edges by weight.
  - Add edges to the MST, avoiding cycles, until all vertices are connected.
  - Time Complexity: O(E log E).
- Prim’s Algorithm:
  - Start at an arbitrary vertex.
  - Grow the MST by adding the smallest edge connecting a vertex in the MST to a vertex outside the MST.
  - Time Complexity: O((V + E) log V).
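A sketch of Kruskal’s algorithm with a small union-find for cycle detection; the (weight, u, v) edge format is an illustrative choice, shown on the edges from the Dijkstra example.

def kruskal(vertices, edges):
    """Return MST edges; edges is a list of (weight, u, v) tuples."""
    parent = {v: v for v in vertices}

    def find(v):                         # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    for weight, u, v in sorted(edges):   # 1. sort edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:                     # 2. add edge unless it forms a cycle
            parent[ru] = rv
            mst.append((u, v, weight))
    return mst

vertices = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"), (6, "B", "D"), (3, "C", "D")]
print(kruskal(vertices, edges))  # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 3)]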
Chapter 9: Balanced Trees
What are Balanced Trees?
A balanced tree ensures that the height of the tree is kept as small as possible to optimize operations like insertion, deletion, and search. Balanced trees avoid skewed structures (unbalanced trees), where performance can degrade to O(n).
Common types of balanced trees:
- AVL Trees
- Red-Black Trees
1. AVL Trees
Definition
An AVL tree is a self-balancing binary search tree (BST) where the difference in height (balance factor) between the left and right subtrees of any node is at most 1.
- Balance Factor:
  Balance Factor = Height(left subtree) - Height(right subtree)
  Allowed values: -1, 0, +1.
Rotations in AVL Trees
Rotations are used to rebalance the tree when the balance factor becomes -2 or +2. There are four types of rotations:
- Right Rotation (Single Rotation)
  - Used for a left-heavy imbalance (left-left case).
  - Rotate the unbalanced node’s left child upward.
- Left Rotation (Single Rotation)
  - Used for a right-heavy imbalance (right-right case).
  - Rotate the unbalanced node’s right child upward.
- Left-Right Rotation (Double Rotation)
  - Used for a left-right imbalance.
  - First, perform a left rotation on the left child.
  - Then, perform a right rotation on the unbalanced node.
- Right-Left Rotation (Double Rotation)
  - Used for a right-left imbalance.
  - First, perform a right rotation on the right child.
  - Then, perform a left rotation on the unbalanced node.
Operations in AVL Trees
- Insertion
  - Insert as in a BST.
  - After insertion, update the balance factors and perform rotations if necessary.
  - Time Complexity: O(log n).
- Deletion
  - Delete as in a BST.
  - After deletion, update the balance factors and perform rotations if necessary.
  - Time Complexity: O(log n).
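A sketch of the two single rotations; the AVLNode fields and helper names are illustrative, and nodes are assumed to cache their height. The two double rotations simply compose these.

class AVLNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    """Fix a left-left imbalance: lift y's left child above y."""
    x = y.left
    y.left = x.right
    x.right = y
    update_height(y)   # y is now lower, so update it first
    update_height(x)
    return x           # new subtree root

def rotate_left(x):
    """Fix a right-right imbalance: lift x's right child above x."""
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y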
2. Red-Black Trees
Definition
A red-black tree is a self-balancing BST with the following properties:
- Each node is either red or black.
- The root is always black.
- Red nodes cannot have red children (no two consecutive red nodes).
- Every path from a node to its leaf must have the same number of black nodes (black height).
Rotations and Recoloring
- Insertion
  - Insert as in a BST.
  - Fix violations of the red-black properties by performing rotations and recoloring.
  - Example: If two consecutive red nodes occur, a rotation is needed to restore balance.
- Deletion
  - Delete as in a BST.
  - Fix violations by adjusting colors or performing rotations.
Advantages of Red-Black Trees
- Guarantees O(log n) time complexity for insertion, deletion, and search.
- Often used in system libraries (e.g., TreeMap in Java, std::map in C++).
Comparison: AVL vs. Red-Black Trees
| Feature            | AVL Tree                  | Red-Black Tree             |
|--------------------|---------------------------|----------------------------|
| Balancing          | Strictly balanced         | Less strict balancing      |
| Rotations          | More rotations            | Fewer rotations            |
| Search Speed       | Faster                    | Slightly slower            |
| Insertion/Deletion | Slower (due to rotations) | Faster (fewer adjustments) |
Chapter 10: Heap
What is a Heap?
A heap is a specialized binary tree-based data structure that satisfies the heap property:
- In a Min-Heap, the value of each node is less than or equal to the values of its children. The root has the smallest value.
- In a Max-Heap, the value of each node is greater than or equal to the values of its children. The root has the largest value.
1. Key Properties of a Heap
- Complete Binary Tree
  - All levels, except possibly the last, are completely filled.
  - The last level is filled from left to right.
- Heap Property
  - For a Min-Heap: Parent ≤ Children.
  - For a Max-Heap: Parent ≥ Children.
- Applications of Heaps
  - Priority Queues.
  - Heap Sort.
  - Scheduling (CPU or memory).
2. Array Representation of a Heap
A heap is typically represented as an array to save space and avoid pointers.
- For a node at index i:
- Parent: ⌊(i – 1) / 2⌋.
- Left Child: 2i + 1.
- Right Child: 2i + 2.
Example (Min-Heap):
     10
    /  \
  20    30
 /  \
40    50
Array Representation: [10, 20, 30, 40, 50].
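The index formulas in code, checked against the example array (helper names are illustrative):

def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

heap = [10, 20, 30, 40, 50]
assert heap[parent(3)] == 20   # parent of 40 is 20
assert heap[left(0)] == 20     # left child of the root is 20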
3. Operations on a Heap
a. Insertion
- Insert the new value at the end of the heap (last available position).
- “Heapify Up”: Compare the new value with its parent, and swap if the heap property is violated. Repeat until the heap property is restored.
- Time Complexity: O(log n).
b. Deletion (Removing the Root)
- Remove the root (smallest in Min-Heap, largest in Max-Heap).
- Replace the root with the last element of the heap.
- “Heapify Down”: Compare the new root with its children, and swap with the smallest (Min-Heap) or largest (Max-Heap) child if the heap property is violated. Repeat until the heap property is restored.
- Time Complexity: O(log n).
c. Build a Heap
- Convert an unordered array into a heap.
- Start heapifying from the last non-leaf node up to the root.
- Time Complexity: O(n).
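A minimal Min-Heap sketch implementing insertion (“Heapify Up”) and root removal (“Heapify Down”); in practice Python’s heapq module provides the same operations, so this class is purely illustrative.

class MinHeap:
    def __init__(self):
        self.data = []

    def insert(self, value):
        self.data.append(value)              # 1. append at the last position
        i = len(self.data) - 1
        while i > 0:                         # 2. heapify up
            p = (i - 1) // 2
            if self.data[i] >= self.data[p]:
                break
            self.data[i], self.data[p] = self.data[p], self.data[i]
            i = p

    def pop_min(self):
        """Remove and return the root; assumes a non-empty heap."""
        root = self.data[0]
        self.data[0] = self.data[-1]         # move the last element to the root
        self.data.pop()
        i, n = 0, len(self.data)
        while True:                          # heapify down
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.data[c] < self.data[smallest]:
                    smallest = c
            if smallest == i:
                break
            self.data[i], self.data[smallest] = self.data[smallest], self.data[i]
            i = smallest
        return root

h = MinHeap()
for v in [40, 20, 50, 10, 30]:
    h.insert(v)
print(h.pop_min())  # 10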
4. Min/Max Heap Properties
Min-Heap
- Root contains the smallest value.
- Efficient for finding the minimum element in O(1) time.
Max-Heap
- Root contains the largest value.
- Efficient for finding the maximum element in O(1) time.
Heap Sort
- Build a Max-Heap from the array.
- Swap the root (largest value) with the last element.
- Reduce the heap size by 1 and “Heapify Down” the root.
- Repeat until the heap size is 1.
- Time Complexity: O(n log n).
- Space Complexity: O(1) (in-place sorting).
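An in-place heap sort sketch following these steps; sift_down is an illustrative helper name.

def heap_sort(arr):
    """Sort arr ascending in place using a max-heap."""
    n = len(arr)

    def sift_down(i, size):
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < size and arr[c] > arr[largest]:
                    largest = c
            if largest == i:
                return
            arr[i], arr[largest] = arr[largest], arr[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # 1. build a max-heap in O(n)
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # 2.-4. move the max to the back
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end)
    return arr

print(heap_sort([40, 10, 30, 50, 20]))  # [10, 20, 30, 40, 50]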
Chapter 11: Dynamic Programming Algorithms
What is Dynamic Programming (DP)?
Dynamic Programming (DP) is a method for solving complex problems by breaking them down into simpler subproblems and storing the results of already solved subproblems to avoid redundant computation.
Key principles:
- Optimal Substructure: A problem can be solved optimally by combining optimal solutions to its subproblems.
- Overlapping Subproblems: The problem can be broken down into subproblems that are reused multiple times.
1. Steps to Solve a DP Problem
- Define the Problem
  - Clearly state what the function or array will represent (e.g., dp[i] represents the minimum cost to reach step i).
- Identify the Recurrence Relation
  - Define how the solution to a larger problem is derived from solutions to smaller subproblems.
- Base Case(s)
  - Specify the initial conditions or base values.
- Iterative or Recursive Approach
  - Choose between bottom-up (iterative) or top-down (recursive with memoization) implementation.
- Solve for the Final Result
  - Combine subproblem results to get the final answer.
2. Techniques in DP
a. Memoization (Top-Down)
- Solve the problem recursively and store results of subproblems in a table (cache) to avoid recomputation.
- Commonly implemented using a dictionary or array.
Example: Fibonacci Numbers
def fib(n, memo={}):
    # Note: the mutable default dict persists across calls, acting as a shared cache.
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]
b. Tabulation (Bottom-Up)
- Solve the problem iteratively, building solutions from smaller subproblems and storing them in a table.
Example: Fibonacci Numbers
def fib(n):
    dp = [0, 1]  # base cases: fib(0) = 0, fib(1) = 1
    for i in range(2, n + 1):
        dp.append(dp[i - 1] + dp[i - 2])
    return dp[n]
3. Common Dynamic Programming Problems
a. Longest Common Subsequence (LCS)
Given two strings, find the length of the longest subsequence that appears in both (the longest common subsequence).
Recurrence Relation:
dp[i][j] = dp[i-1][j-1] + 1,                if str1[i] == str2[j]
dp[i][j] = max(dp[i-1][j], dp[i][j-1]),     otherwise
Time Complexity: O(m⋅n), where m and n are the lengths of the two strings.
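A bottom-up sketch of this recurrence; lcs_length is an illustrative name, and dp[i][j] holds the LCS length of the prefixes str1[:i] and str2[:j].

def lcs_length(str1, str2):
    """Length of the longest common subsequence, tabulated bottom-up."""
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:     # characters match
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:                              # skip one character
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("ABCBDAB", "BDCAB"))  # 4 (e.g., "BCAB")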
b. 0/1 Knapsack Problem
Given a set of items with weights and values, determine the maximum value that can be obtained by selecting items such that the total weight doesn’t exceed a given capacity.
Recurrence Relation:
dp[i][w] = dp[i-1][w],                              if w_i > w
dp[i][w] = max(dp[i-1][w], dp[i-1][w-w_i] + v_i),   otherwise
Time Complexity: O(n⋅W), where n is the number of items and W is the capacity of the knapsack.
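A bottom-up sketch of this recurrence; dp[i][w] is the best value using the first i items with capacity w, and the sample items are made up for illustration.

def knapsack(weights, values, capacity):
    """Maximum value for the 0/1 knapsack, tabulated bottom-up."""
    n = len(weights)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]               # skip item i
            if weights[i - 1] <= w:               # or take it, if it fits
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - weights[i - 1]] + values[i - 1])
    return dp[n][capacity]

print(knapsack([1, 3, 4], [15, 20, 30], 4))  # 35: take the weight-1 and weight-3 items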
c. Longest Increasing Subsequence (LIS)
Find the length of the longest subsequence where elements are in increasing order.
Recurrence Relation:
For each i, find the maximum length of subsequence ending at i:
dp[i] = max(dp[j] + 1) for j < i and arr[j] < arr[i]
Time Complexity: O(n²), or O(n log n) using binary search.
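A sketch of the O(n²) version of this recurrence; the O(n log n) variant replaces the inner loop with a binary search over tail values.

def lis_length(arr):
    """Length of the longest increasing subsequence, O(n^2) DP."""
    if not arr:
        return 0
    dp = [1] * len(arr)  # dp[i] = length of the longest increasing subsequence ending at i
    for i in range(1, len(arr)):
        for j in range(i):
            if arr[j] < arr[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)

print(lis_length([10, 9, 2, 5, 3, 7, 101, 18]))  # 4 (e.g., 2, 3, 7, 18)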
d. Matrix Chain Multiplication
Given matrices A1, A2, …, An, find the minimum number of scalar multiplications needed to compute their product.
Recurrence Relation:
dp[i][j] = min over i ≤ k < j of (dp[i][k] + dp[k+1][j] + cost(i, k, j))
where cost(i, k, j) = d_{i-1} · d_k · d_j is the cost of multiplying the two partial products, with matrix A_i having dimensions d_{i-1} × d_i.
Time Complexity: O(n³).
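A sketch of this recurrence; dims encodes the matrix dimensions (matrix i has shape dims[i] × dims[i+1]), and the sample chain is made up for illustration.

def matrix_chain_cost(dims):
    """Minimum scalar multiplications to multiply the whole chain."""
    n = len(dims) - 1                       # number of matrices
    INF = float("inf")
    dp = [[0] * n for _ in range(n)]        # dp[i][j] = cost of A_i..A_j
    for length in range(2, n + 1):          # chain length
        for i in range(n - length + 1):
            j = i + length - 1
            dp[i][j] = INF
            for k in range(i, j):           # try every split point
                cost = dp[i][k] + dp[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                dp[i][j] = min(dp[i][j], cost)
    return dp[0][n - 1]

print(matrix_chain_cost([10, 30, 5, 60]))  # 4500: multiply (A1 x A2) first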
4. DP vs. Greedy Algorithms
- Dynamic Programming: Examines all possible solutions to ensure an optimal result.
- Greedy Algorithms: Makes a locally optimal choice at each step, which may not guarantee the global optimum.