Algorithms: Analysis, Design, and Optimization Techniques

Reduction

When encountering a new problem A, instead of creating a new algorithm, you may transform it into another problem B that already has a known solution (useful when A resembles B, allowing you to leverage an existing solver). Here’s how it works:

  1. Transform an instance of problem A into an instance of problem B.
  2. Transform the solution for B back into a solution for A.
  3. Prove correctness: If the solution for B is correct, the solution for A must also be correct.
  4. Compute the total running time: the cost of the two transformations plus the cost of solving B.
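
As an illustration (not from the original notes), here is a minimal Python sketch of a reduction: deciding whether a list contains a duplicate (problem A) by reducing it to sorting (problem B), for which a trusted solver already exists.

  # Sketch only; assumes sorted() is the existing solver for problem B (sorting).
  def has_duplicate(items):
      ordered = sorted(items)                      # step 1: transform the A-instance and solve B
      for left, right in zip(ordered, ordered[1:]):
          if left == right:                        # step 2: read A's answer off B's solution
              return True
      return False
  # Step 3 (correctness): equal elements become adjacent after sorting.
  # Step 4 (running time): O(n*log(n)) for sorting + O(n) for the scan.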

Asymptotic Analysis

Big-O (Upper bound/worst-case growth): If f(n) ∈ O(g(n)), there exist constants c and n₀ such that: f(n) ≤ c*g(n) for all n ≥ n₀. Example: 2n ∈ O(n²) because 2n ≤ c⋅n² for all n ≥ n₀ with, say, c = 2 and n₀ = 1.

Big-Omega (Ω(g)) (Lower bound/best-case growth): f(n) ∈ Ω(g(n)) means g(n) ∈ O(f(n)) (the roles of f and g are swapped relative to Big-O).

Theta (Θ(g)) (Tight bound/same growth rate): f(n) ∈ Θ(g(n)) when f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)).

Best and worst cases are properties of an algorithm (they depend on which inputs it is given), not of the O/Ω/Θ notation itself.

Little-o (o(g)): f(n) grows strictly slower than g(n). Example: 2n ∈ o(n²) since 2n grows slower than n².

Little-Omega (ω(g)): f(n) grows strictly faster than g(n); f(n) ∈ ω(g(n)) exactly when g(n) ∈ o(f(n)).

Divide and Conquer

Three main steps:

  1. Divide the input into smaller subproblems.
  2. Solve each subproblem recursively.
  3. Combine the subproblem solutions to solve the original problem.

Algorithm examples:

  • MergeSort: Divides the array, sorts subarrays, merges results.
  • QuickSort: Chooses a pivot, partitions, sorts subarrays recursively.
  • Binary Search: Only one subproblem is explored at each step.

The running time T(n) of a recursive function is often expressed using a recurrence relation: {T(n) = cost of recursive calls + work done outside recursion}. Example: T(n) = T(n/2) + 1 (splitting the array in half, constant work per step).
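
As a hedged illustration, a recursive binary-search sketch whose running time follows exactly this recurrence (the function and argument names are ours, not from the notes):

  def binary_search(sorted_list, target, lo=0, hi=None):
      # T(n) = T(n/2) + 1: one recursive call on half the range, constant work outside it.
      if hi is None:
          hi = len(sorted_list) - 1
      if lo > hi:
          return -1                                # base case: empty range, target absent
      mid = (lo + hi) // 2
      if sorted_list[mid] == target:
          return mid
      if sorted_list[mid] < target:
          return binary_search(sorted_list, target, mid + 1, hi)
      return binary_search(sorted_list, target, lo, mid - 1)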

The Master Theorem

Used to solve recurrence relations of the form: T(n) = aT(n/b) + f(n), where a = # of recursive subproblems, b = factor by which the problem size is reduced, and f(n) = extra work done outside recursion.

Three cases of the Master Theorem:

  1. If f(n) grows polynomially slower than n^(log_b(a)): Work is dominated by the recursion. Runtime: T(n) = Θ(n^(log_b(a))).
  2. If f(n) grows at the same rate as the recursive work, i.e., f(n) = Θ(n^(log_b(a)) * log(n)^k) for some k ≥ 0: Runtime: T(n) = Θ(n^(log_b(a)) * log(n)^(k+1)).
  3. If f(n) grows polynomially faster than the recursive work and meets the regularity condition: Work is dominated by f(n). Runtime: T(n) = Θ(f(n)).
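
Worked example (using the MergeSort recurrence that appears later in these notes): T(n) = 2T(n/2) + n has a = 2 and b = 2, so n^(log_b(a)) = n^(log₂(2)) = n. Since f(n) = n grows at the same rate (case 2 with k = 0), T(n) = Θ(n*log(n)).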

Greedy Algorithms

An approach to solving optimization problems by making a sequence of locally optimal choices, hoping to achieve a globally optimal solution.

Optimization Problems

  • There are multiple valid solutions.
  • An objective function determines the quality of a solution.
  • The goal is to maximize or minimize the objective function.

Examples:

  1. Interval Scheduling:
    • Find the largest set of non-overlapping intervals.
    • Greedy choice: Pick the interval with the earliest finish time.
  2. Minimum Spanning Tree (MST):
    • Find a spanning tree with the smallest total edge weight.
    • Greedy choice: Always pick the smallest-weight edge that doesn’t create a cycle (Kruskal’s Algorithm), or always add the cheapest edge connecting the tree built so far to a new node (Prim’s Algorithm).

How Greedy Algorithms Work

A greedy algorithm follows these steps:

  1. Make a local choice based on a simple criterion.
  2. Solve the resulting subproblem (recursively or iteratively).
  3. Combine choices to form the final solution.

Greedy algorithms do not always find the correct solution!

Proving a Greedy Algorithm is Correct

  1. Greedy Stays Ahead (Induction Proof):
    • Compare the greedy solution step-by-step with an optimal solution.
    • Show that at each step, the greedy choice is at least as good as the optimal choice.
    • Example: Interval Scheduling
  2. Exchange Argument:
    • Suppose O is an optimal solution and G is the greedy solution.
    • Show that O can be modified to be more like G, while still being optimal.
    • Continue modifying O until it matches G, proving that G is also optimal.
    • Example: Job Scheduling by Weight

Worked Examples of Greedy Proofs

  1. Greedy Stays Ahead Example (TA Carrying Exam Booklets):
    • Greedy Algorithm: Each TA carries as many booklets as possible before the next TA gets assigned.
    • Proof Outline:
      • Lemma: The first i TAs in the greedy solution carry at least as many booklets as in any optimal solution.
      • Inductive proof to show greedy uses no more TAs than optimal.
      • Conclusion: Greedy is optimal.
  2. Exchange Argument Example (Weighted Scheduling):
    • Greedy Algorithm: Schedule jobs in decreasing order of weight.
    • Proof Outline:
      • Define a difference (e.g., an inversion where two jobs appear in a different order in the greedy vs. optimal solution).
      • Show that swapping two jobs does not decrease weighted satisfaction.
      • Repeated swaps transform O into G without reducing quality → Greedy is optimal.

Stable Matching

Problem definition: We have n students and n employers, where:

  • Each employer ranks all students in preference order.
  • Each student ranks all employers in preference order.

Desired output: A perfect (one-to-one) matching in which no employer and student prefer each other over their assigned partners, i.e., a stable matching.

An instability example: employer e_a prefers student s_d over their assigned student s_b, and student s_d prefers employer e_a over their assigned employer e_c.

The Gale-Shapley Algorithm

  • Key idea: Employers propose to students in order of preference; students temporarily accept the best offer they’ve received and reject less-preferred employers; if a better offer arrives, they switch to the more preferred employer. This process guarantees a stable matching.
  • Steps:
  1. Initialize all employers and students as free.
  2. While some employer is free and has not yet proposed to all students:
    • The employer proposes to their most preferred student who hasn’t rejected them.
    • If the student is free, they accept the offer.
    • If the student is already matched:
      • If the new employer is preferred, the student switches (the previous employer is now free).
      • Otherwise, the student rejects the new offer.
  3. Repeat until every student has exactly one assigned employer.

Time Complexity: O(n²), since each employer proposes to each student at most once.
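
A minimal Gale-Shapley sketch (the data layout is an assumption: employer_prefs[e] lists students from most to least preferred, student_prefs[s] does the same for employers, and everyone is numbered 0..n-1):

  def gale_shapley(employer_prefs, student_prefs):
      n = len(employer_prefs)
      # student_rank[s][e] = position of employer e in student s's list (lower = more preferred)
      student_rank = [{e: r for r, e in enumerate(prefs)} for prefs in student_prefs]
      next_proposal = [0] * n            # index of the next student each employer will propose to
      match_of_student = [None] * n      # current employer matched to each student
      free_employers = list(range(n))
      while free_employers:
          e = free_employers.pop()
          s = employer_prefs[e][next_proposal[e]]    # most preferred student not yet proposed to
          next_proposal[e] += 1
          current = match_of_student[s]
          if current is None:
              match_of_student[s] = e                # student is free: accept temporarily
          elif student_rank[s][e] < student_rank[s][current]:
              match_of_student[s] = e                # student switches to the preferred employer
              free_employers.append(current)         # previous employer becomes free again
          else:
              free_employers.append(e)               # rejected: e will propose to someone else
      return match_of_student                        # match_of_student[s] = employer assigned to s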

KT Chapter 2

2.1: A Refined Definition of Efficiency

An algorithm is efficient if it avoids brute-force search and improves worst-case performance. Polynomial-time algorithms (O(nᵏ), where k is a constant) are considered tractable. Exponential-time algorithms (O(2ⁿ), O(n!)) are intractable for large n.

2.4: Growth Rate

O(1) < O(log(n)) < O(n) < O(n*log(n)) < O(n²) < O(n³) < O(2ⁿ) < O(n!)

2.5: Priority Queue Operations

  • Insert: Add an element with a priority.
  • Extract-min: Retrieve the smallest element.
  • Decrease-key: Update the priority of an element.

Heap Implementation

  • A heap supports priority queues efficiently:
    • Insert: O(log(n))
    • Extract-min: O(log(n))
  • Consequence: heap-based sorting (HeapSort):
  1. Insert all elements into a heap (O(n*log(n))).
  2. Extract elements one by one (O(n*log(n))).
  • Total Complexity: O(n*log(n)), which is optimal for comparison-based sorting.
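
A short sketch of this heap-based sort using Python's heapq module (a binary min-heap):

  import heapq

  def heap_sort(items):
      heap = []
      for x in items:
          heapq.heappush(heap, x)                    # n inserts, O(log(n)) each
      return [heapq.heappop(heap) for _ in range(len(heap))]   # n extract-mins, O(log(n)) each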

KT Chapter 3 (3.1-3.6)

3.1: Graphs

A graph G = (V, E) consists of:

  • NODES (V): Also called vertices, representing entities.
  • EDGES (E): Connections between nodes, representing relationships.

Types of Graphs:

  • Undirected graphs: Edges have no direction.
  • Directed graphs (digraphs): Edges have direction.

Graph Representation

  1. Adjacency List: Each node stores a list of its neighbors; efficient for sparse graphs (O(|V| + |E|) storage).
  2. Adjacency Matrix: A |V|×|V| matrix where A[i][j] = 1 if there’s an edge from i to j.
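
A small sketch of both representations for the same undirected graph (the example edges are arbitrary):

  from collections import defaultdict

  edges = [(0, 1), (0, 2), (1, 2), (2, 3)]           # undirected graph on nodes 0..3

  # Adjacency list: O(|V| + |E|) storage, good for sparse graphs
  adj_list = defaultdict(list)
  for u, v in edges:
      adj_list[u].append(v)
      adj_list[v].append(u)

  # Adjacency matrix: |V| x |V| storage, O(1) edge lookup
  n = 4
  adj_matrix = [[0] * n for _ in range(n)]
  for u, v in edges:
      adj_matrix[u][v] = adj_matrix[v][u] = 1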

Tree Property: A tree with n nodes has exactly n-1 edges.

Graph Connectivity: A graph is connected if a path exists between any two nodes.

3.2: Breadth-First Search (BFS) and Depth-First Search (DFS)

  • Breadth-First Search (BFS): Explores a graph layer by layer, uses a queue (FIFO structure), useful for finding the shortest path in an unweighted graph. Complexity: O(|V| + |E|).
  • Depth-First Search (DFS): Explores a graph as deeply as possible before backtracking, uses a stack (LIFO structure) or recursion, useful for cycle detection and topological sorting. Complexity: O(|V| + |E|).

BFS correctly finds the shortest path from a source s to any node v. BFS expands in layers; all nodes in layer i are at distance i from s. Since BFS visits each node only once, the first time it reaches v, it must be via the shortest path. Thus, BFS guarantees an optimal solution.
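
A minimal BFS sketch that records the layer (distance) of every node reachable from s, assuming the adjacency-list dictionary sketched above:

  from collections import deque

  def bfs_distances(adj_list, s):
      dist = {s: 0}
      queue = deque([s])                             # FIFO queue drives layer-by-layer expansion
      while queue:
          u = queue.popleft()
          for v in adj_list[u]:
              if v not in dist:                      # first visit = shortest path (unweighted graph)
                  dist[v] = dist[u] + 1
                  queue.append(v)
      return dist                                    # dist[v] = number of edges on a shortest s-v path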

3.3: Graph Storage Structures

  • Adjacency List (Recommended for sparse graphs).
  • Adjacency Matrix (Efficient for dense graphs but space-intensive).

3.4: Bipartite Graphs

A bipartite graph is one where nodes can be divided into two sets X and Y, such that every edge connects a node in X to a node in Y.

3.5: Strongly Connected Components

A strongly connected component (SCC) is a maximal subgraph where every node is reachable from every other node.

3.6: Topological Ordering

A topological order of a DAG is a linear ordering of its vertices such that for every directed edge (u, v), vertex u appears before v.

A Directed Acyclic Graph (DAG) is a directed graph with no cycles.
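
A hedged sketch of one standard way to compute a topological order (Kahn's algorithm: repeatedly remove a node with no incoming edges); the notes do not prescribe this particular method:

  from collections import deque

  def topological_order(nodes, edges):
      indegree = {u: 0 for u in nodes}
      out = {u: [] for u in nodes}
      for u, v in edges:                             # directed edge (u, v)
          out[u].append(v)
          indegree[v] += 1
      queue = deque(u for u in nodes if indegree[u] == 0)
      order = []
      while queue:
          u = queue.popleft()
          order.append(u)                            # u has no remaining incoming edges
          for v in out[u]:
              indegree[v] -= 1
              if indegree[v] == 0:
                  queue.append(v)
      return order                                   # a valid topological order iff the graph is a DAG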

KT 5.1-5.4

5.1: MergeSort

MergeSort algorithm:

  • Divide: Split the array into two equal halves.
  • Conquer: Recursively sort each half.
  • Combine: Merge the two sorted halves in linear time.

The worst-case running time follows the recurrence: T(n) = 2T(n/2) + O(n), where 2T(n/2) accounts for sorting the two halves and O(n) accounts for merging them.
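
A compact MergeSort sketch matching this recurrence (returns a new sorted list rather than sorting in place):

  def merge_sort(a):
      if len(a) <= 1:
          return a                                   # base case
      mid = len(a) // 2
      left = merge_sort(a[:mid])                     # first T(n/2) call
      right = merge_sort(a[mid:])                    # second T(n/2) call
      merged, i, j = [], 0, 0
      while i < len(left) and j < len(right):        # O(n) merge of two sorted halves
          if left[i] <= right[j]:
              merged.append(left[i])
              i += 1
          else:
              merged.append(right[j])
              j += 1
      return merged + left[i:] + right[j:]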

Proof:

  1. Unrolling the recurrence:
    • Level 0: T(n) = O(n) + 2T(n/2)
    • Level 1: 2(O(n/2) + 2T(n/4))
    • Level 2: 4(O(n/4) + 2T(n/8))
    • Generalizing: At level j, there are 2ʲ subproblems of size n/2ʲ, each taking O(n/2ʲ) time.
  2. Summing over all levels:
    • The recursion continues until the subproblem size reaches 1 (base case).
    • There are log(n) levels, and the total work per level is O(n).
  • Total running time: O(n*log(n)).
  • Conclusion: MergeSort runs in O(n*log(n)) time.

5.2: Recursive Calls

For an algorithm that recursively calls q subproblems of size n/2 each and combines them in O(n) time, the recurrence is: T(n) = q*T(n/2) + O(n), where q can be greater than or less than 2.

Master Theorem: For recurrences of the form: T(n) = a*T(n/b) + f(n), we compare f(n) with n^(log_b(a)).

The three cases:

  1. If f(n) = O(nᶜ) with c < log_b(a), then T(n) = Θ(n^(log_b(a))).
  2. If f(n) = Θ(n^(log_b(a))), then T(n) = Θ(n^(log_b(a)) * log(n)).
  3. If f(n) = Ω(nᶜ) with c > log_b(a) (and f satisfies the regularity condition), then T(n) = Θ(f(n)).

5.3: Counting Inversions

Given an array, count the number of inversions: pairs (i, j) with i < j but a[i] > a[j], i.e., pairs whose elements are out of order. Example: Input: [2, 4, 1, 3, 5]; inversions: (2,1), (4,1), (4,3) → Total: 3 inversions.
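
A divide-and-conquer sketch that counts inversions while merging, in O(n*log(n)); it mirrors the MergeSort structure above:

  def count_inversions(a):
      # Returns (sorted copy of a, number of inversions in a).
      if len(a) <= 1:
          return a, 0
      mid = len(a) // 2
      left, inv_left = count_inversions(a[:mid])
      right, inv_right = count_inversions(a[mid:])
      merged, i, j, cross = [], 0, 0, 0
      while i < len(left) and j < len(right):
          if left[i] <= right[j]:
              merged.append(left[i])
              i += 1
          else:
              merged.append(right[j])
              j += 1
              cross += len(left) - i                 # every remaining left element is larger
      merged += left[i:] + right[j:]
      return merged, inv_left + inv_right + cross

  # count_inversions([2, 4, 1, 3, 5])[1] == 3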

5.4: Closest Pair of Points

Finding the Closest Pair of Points: Given n points in the 2D plane, find the pair of points that are closest together. A divide-and-conquer algorithm solves this in O(n*log(n)) time.

KT 4.1-4.2

4.1: Interval Scheduling

Interval Scheduling ("The Greedy Algorithm Stays Ahead") involves a set of requests, each corresponding to an interval of time [s(i), f(i)]. The goal is to select the maximum number of non-overlapping requests (a compatible subset). Greedy Algorithm Approach: A greedy algorithm selects requests using a simple rule:

  1. Sort the intervals by their finishing time in ascending order.
  2. Iterate through the sorted list, selecting the first available request and removing any conflicting requests.
  3. Repeat until no more requests can be selected.
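
A minimal earliest-finish-time sketch (intervals represented as (start, finish) pairs, which is an assumption about the input format):

  def interval_schedule(intervals):
      chosen = []
      last_finish = float("-inf")
      for start, finish in sorted(intervals, key=lambda iv: iv[1]):   # sort by finish time
          if start >= last_finish:                   # compatible with everything chosen so far
              chosen.append((start, finish))
              last_finish = finish
      return chosen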

Proof of Optimality: Staying Ahead: To prove that this greedy algorithm produces an optimal solution, the proof shows that:

  1. For every request in the greedy algorithm’s selection, there is a corresponding request in the optimal solution that finishes no earlier. This is proven using induction, ensuring that each greedy choice keeps the solution “ahead” of any potential optimal solution.
  2. Contradiction Proof: Suppose the greedy algorithm selects fewer requests than the optimal solution. Because the greedy algorithm’s last selected request finishes no later than the corresponding request in the optimal solution, the optimal solution’s next request would still be compatible and available to greedy, contradicting the stopping condition of the greedy algorithm.

Time complexity: Sorting the intervals takes O(n*log(n)), and selecting the intervals in one pass takes O(n), leading to an overall complexity of O(n*log(n)).

4.2: Scheduling to Minimize Lateness: An Exchange Argument

Each job i has a deadline dᵢ and a processing time tᵢ. The goal is to schedule all jobs in non-overlapping intervals such that the maximum lateness L = maxᵢ(f(i) – dᵢ) is minimized, where f(i) is the finishing time of job i.

Greedy Algorithm: Earliest Deadline First (EDF): sort the jobs in increasing order of deadline and schedule them back-to-back with no idle time.

Proof of Optimality: Exchange Argument: Inversions: If a job with a later deadline is scheduled before one with an earlier deadline, we say there is an inversion. Key Idea: Any optimal schedule can be converted into one with no inversions without increasing lateness.

Time complexity: Sorting the jobs by deadline takes O(n*log(n)), and building the schedule in one pass takes O(n), leading to an overall complexity of O(n*log(n)).
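
A hedged EDF sketch that builds the schedule from time 0 with no idle time and reports the maximum lateness (jobs represented as (deadline, processing_time) pairs, which is an assumption about the input format):

  def edf_schedule(jobs):
      schedule, t, max_lateness = [], 0, 0
      for deadline, duration in sorted(jobs, key=lambda job: job[0]):  # earliest deadline first
          t += duration                              # this job finishes at time t
          schedule.append((t - duration, t))         # (start, finish) of the job
          max_lateness = max(max_lateness, t - deadline)
      return schedule, max_lateness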

4.4: Recursive Solution to Weighted Interval Scheduling

In the standard Interval Scheduling Problem (where all intervals have equal weight), it can be solved greedily. However, when intervals have different values, greedy methods fail because choosing the interval that ends the earliest is not necessarily optimal.

FLIP

This section discusses finding the shortest paths in a graph using a greedy algorithm, specifically Dijkstra’s Algorithm. Given a directed graph G = (V, E) with a source node s, the goal is to compute the shortest path from s to all other nodes. Each edge e has a non-negative weight w(e), representing cost, time, or distance. If the graph is undirected, each edge is represented as two directed edges with the same weight.

Dijkstra’s Algorithm

Maintains a set S of explored nodes for which the shortest distance from s is known. Initialization: S = {s}, with d(s) = 0.

Iteration: Repeatedly selects the node v ∉ S that has the shortest known distance from s, adds v to S, then updates the distances of its neighbors.

Implementation using priority queues: Uses a min-priority queue where each node v is associated with its shortest known distance d(v). The algorithm extracts the minimum distance node and updates the queue with new shortest distances for adjacent nodes.

Time complexity: O(m*log(n)) if implemented with a binary heap; O(n²) with a simple array implementation. The algorithm guarantees correctness because it processes nodes in increasing order of distance from s. This ensures that when a node is added to S, its shortest path distance is finalized.
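
A standard heap-based Dijkstra sketch (graph represented as a dict mapping each node to a list of (neighbor, weight) pairs; stale heap entries are skipped instead of using decrease-key):

  import heapq

  def dijkstra(graph, s):
      dist = {s: 0}
      pq = [(0, s)]                                  # (known distance, node)
      explored = set()                               # the set S of finalized nodes
      while pq:
          d, u = heapq.heappop(pq)                   # closest node not yet finalized
          if u in explored:
              continue                               # stale queue entry
          explored.add(u)
          for v, w in graph.get(u, []):
              if v not in dist or d + w < dist[v]:
                  dist[v] = d + w
                  heapq.heappush(pq, (dist[v], v))
      return dist                                    # dist[v] = shortest-path distance from s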

4.5: The Minimum Spanning Tree Problem

Given a connected, undirected graph G = (V, E) with edge weights w(e) > 0, the goal is to find a subset of edges T ⊆ E such that:

  • T connects all nodes (i.e., forms a tree).
  • The total edge weight Σ_{e∈T} w(e) is minimized.

A minimum-cost solution must be a tree: if it contained a cycle, any edge of the cycle could be removed, keeping all nodes connected while reducing the total cost.

Greedy Algorithms for MST

  1. Kruskal’s Algorithm:
    • Sorts edges by increasing weight.
    • Adds the smallest edge to the MST unless it forms a cycle.
    • Uses Union-Find to efficiently detect cycles.
    • Time Complexity: O(m*log(m)).
  2. Prim’s Algorithm:
    • Similar to Dijkstra’s Algorithm but grows an MST.
    • Starts from an arbitrary node and repeatedly adds the smallest-weight edge connecting the tree to an unexplored node.
    • Uses a priority queue for efficiency.
    • Time Complexity: O(m*log(n)) using a binary heap.
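
A compact Kruskal sketch with a union-find structure for cycle detection (edge format (weight, u, v) and 0-indexed nodes are assumptions):

  def kruskal(num_nodes, edges):
      parent = list(range(num_nodes))                # union-find forest

      def find(x):                                   # find with path compression
          while parent[x] != x:
              parent[x] = parent[parent[x]]
              x = parent[x]
          return x

      mst = []
      for w, u, v in sorted(edges):                  # O(m*log(m)) sort by weight
          root_u, root_v = find(u), find(v)
          if root_u != root_v:                       # adding (u, v) does not create a cycle
              parent[root_u] = root_v
              mst.append((u, v, w))
      return mst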

Proof of Correctness:

  • Cut Property: For any partition of the nodes into two non-empty sets, the minimum-weight edge crossing the partition belongs to every minimum spanning tree (assuming distinct edge weights).
  • Exchange Argument: If any MST differs from the one found by Kruskal’s or Prim’s, edges can be swapped to create a better (or equally optimal) tree.