Data Structures, Algorithms, and Their Applications
Data Structure and Iteration
Explanation: We use [Data Structure] because it maintains [monotonicity/ordering/property] and supports the needed [operations] (sort/pop/push/get/set/remove/slice, etc.), while tracking an invariant (max/min/last valid position, etc.).
- Process:
- Initialization: Set up Data Structure (size/pointer start index). Initialize trackers.
- As we traverse the elements, for each element we perform one of the following operations:
- Case 1: Perform [operation], update [tracking info], maintain [invariant].
- Case 2: …
- Move to the next element.
- After processing all elements, the final answer is obtained from (Data Structure/variables).
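A minimal sketch of this pattern in Python, using a monotonic stack for a "next greater element" style problem; the problem, the function name next_greater, and the trackers are illustrative assumptions, not fixed by these notes:

```python
def next_greater(nums):
    """For each nums[i], find the first element to its right that is larger.

    Invariant: the stack holds indices of elements whose "next greater"
    has not been found yet, and their values are strictly decreasing.
    """
    answer = [-1] * len(nums)   # tracker: -1 means "no greater element exists"
    stack = []                  # monotonic (decreasing) stack of indices

    for i, x in enumerate(nums):
        # Case 1: x resolves the elements currently on top of the stack.
        while stack and nums[stack[-1]] < x:
            answer[stack.pop()] = x   # update tracking info
        # Case 2: push the current index, preserving the invariant.
        stack.append(i)

    # After processing all elements, the answer is read off the tracker.
    return answer
```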
Correctness Proof:
- [Data Structure] maintains and keeps track of…
- For each element being processed:
- When [case 1/2/3], we perform operations. At this point, [Data Structure] has [some property].
- The algorithm terminates when… This is correct because progress is made with each operation (the index moves forward / the stack is updated), and each element is processed exactly once.
- Completeness: No elements are lost in the process. For any arbitrary element k, there are m cases:
- Case 1: If [some condition holds].
- Case 2…
Our algorithm handles all of these cases, so it is complete and correct. (Note edge cases and verify that the invariant is maintained, if applicable.)
Time Complexity Analysis
O(log N)
Consider only one branch + combine in O(1) (e.g., binary search). Note, for case 2 of the master theorem: if f(n) = Θ(n^(log_b a) · log^k n) for k ≥ 0, then T(n) = Θ(n^(log_b a) · log^(k+1) n). This is the formal way to express it. When using the master theorem there is no need to find ε, and no need to account for the powers/classification/additions/multiplications in the returned result; we only focus on time!
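For the "one branch + O(1) combine" case, a standard binary search sketch (the function name and inputs are illustrative):

```python
def binary_search(sorted_nums, target):
    """T(n) = T(n/2) + O(1)  =>  O(log n): only one branch is explored."""
    lo, hi = 0, len(sorted_nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_nums[mid] == target:
            return mid               # combine step is O(1)
        elif sorted_nums[mid] < target:
            lo = mid + 1             # keep only the right branch
        else:
            hi = mid - 1             # keep only the left branch
    return -1                        # target not present
```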
O(N)
It is possible to traverse all branches, but then the combine must be O(1) (e.g., T(n) = 2T(n/2) + O(1)). Otherwise, only one branch may be traversed, and the combine can be O(N) (e.g., T(n) = T(n/2) + O(n)). Note: if f(n) is given only as big-O, you can only conclude a big-O bound on T(n) (not Θ).
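As an illustration of "all branches with an O(1) combine", a divide-and-conquer maximum; the function name and indexing convention are illustrative:

```python
def dc_max(nums, lo, hi):
    """T(n) = 2T(n/2) + O(1)  =>  O(n): both branches, constant-time combine."""
    if lo == hi:                          # base case: one element
        return nums[lo]
    mid = (lo + hi) // 2
    left = dc_max(nums, lo, mid)          # branch 1
    right = dc_max(nums, mid + 1, hi)     # branch 2
    return max(left, right)               # O(1) combine
```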
O(N log N)
It is possible to traverse all branches, and then the combine must be O(N) (e.g., T(n) = 2T(n/2) + O(n)). Note that the height of the recursion tree is log N: divide, then combine in linear time (e.g., Merge Sort). When dividing into halves/subsets, pay special attention to the situation in the middle (elements that straddle the split).
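A merge sort sketch of this pattern (both halves recursed on, linear combine); this is one standard formulation, not the only one:

```python
def merge_sort(nums):
    """T(n) = 2T(n/2) + O(n)  =>  O(n log n); recursion tree height is log n."""
    if len(nums) <= 1:                    # base case
        return nums
    mid = len(nums) // 2
    left = merge_sort(nums[:mid])         # divide: left half
    right = merge_sort(nums[mid:])        # divide: right half

    # Combine: linear-time merge of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```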
O(N log² N)
After the division, one of the two parts must be sorted as part of the combine, so each combine step costs O(N log N) (i.e., T(n) = 2T(n/2) + O(n log n)). This case is usually associated with “the k-th smallest/largest”. Most sorting algorithms are O(N log N) (e.g., merge sort, heap sort, and quick sort in expectation).
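As a worked recurrence for this case, using the extended case-2 form noted under O(log N) with k = 1 (a derivation one might write, not from the notes):

```latex
T(n) = 2\,T(n/2) + \Theta(n \log n),
\quad f(n) = \Theta\!\left(n^{\log_2 2}\,\log^{1} n\right)
\;\Longrightarrow\;
T(n) = \Theta\!\left(n \log^{2} n\right).
```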
Example: Design
If a subarray to be merged crosses the midpoint, it consists of a left part ending at mid plus a right part starting at mid+1. The function finds the maximum sum of a subarray that crosses the midpoint: it adds elements from the midpoint leftward, keeping track of the maximum sum, thus finding the optimal left portion ending at the midpoint; similarly, it accumulates elements rightward from the position after the midpoint, tracking the maximum sum there. The function then combines these maximum sums from both sides to determine the largest possible sum of any subarray crossing the midpoint.
Proof: We find the maximum sum ending at mid on the left side and the maximum sum starting at mid+1 on the right side. Any subarray crossing the midpoint splits into exactly such a left and right part, and omitting any element in between would break contiguity, so the sum of these two maxima is the maximum crossing sum.
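A sketch of the crossing-sum function described above; the name max_crossing_sum and the inclusive [lo, hi] indexing convention are illustrative assumptions:

```python
def max_crossing_sum(nums, lo, mid, hi):
    """Maximum sum of a subarray of nums[lo..hi] that crosses the midpoint.

    Such a subarray must consist of a left part ending exactly at mid and a
    right part starting exactly at mid + 1.
    """
    left_best, running = float("-inf"), 0
    for i in range(mid, lo - 1, -1):      # scan leftward from mid
        running += nums[i]
        left_best = max(left_best, running)

    right_best, running = float("-inf"), 0
    for j in range(mid + 1, hi + 1):      # scan rightward from mid + 1
        running += nums[j]
        right_best = max(right_best, running)

    return left_best + right_best         # combine the two optimal halves
```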
Divide and Conquer
Design:
- Preprocessing: Set invariant/sort.
- Base case: n = … It is clear that…
- Divide: Divide the given Data Structure into 2/4 halves/parts.
- Conquer: Perform any operations needed (e.g., add helper elements (like the L-tile), choose a pivot (like quick sort), sort/organize first, pad/normalize the data (like a matrix)). Ensure each subproblem has the same properties as the original. We recurse on the halves separately until we reach the base case.
- Combine: Return the value/Data Structure from recursive steps, which is what we need (do operations (compare/add/sort) if needed).
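A minimal sketch of how this template assembles for the maximum-subarray example above, reusing the max_crossing_sum helper sketched earlier (names and formulation are illustrative):

```python
def max_subarray(nums, lo, hi):
    """Divide-and-conquer maximum subarray sum over nums[lo..hi] (inclusive)."""
    if lo == hi:                                        # base case: n = 1
        return nums[lo]
    mid = (lo + hi) // 2                                # divide into 2 halves
    best_left = max_subarray(nums, lo, mid)             # conquer: left half
    best_right = max_subarray(nums, mid + 1, hi)        # conquer: right half
    best_cross = max_crossing_sum(nums, lo, mid, hi)    # helper sketched above
    return max(best_left, best_right, best_cross)       # combine: best of three
```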
Correctness:
- Base case: (n = …) The algorithm correctly returns…
- Induction case: (n > …)
- Suppose the algorithm is correct for all Data Structures of size k, for every k ≤ n.
- Let T be a Data Structure of size n+1. If we divide it into 2 halves, the size of each half would be…
Greedy Algorithm
Design: Consider the following greedy algorithm. Let the people/samples be numbered 1, …, n, and let s_i, b_i, f_i denote …; first sort by … so that we have f_1 ≤ … ≤ f_n. We claim that this order optimizes … (what we want to return). If needed, let O = {o_1, o_2, …, o_k} denote an optimal solution, with X defined the same way for the greedy solution.
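One concrete instance of this template is interval scheduling (pick a maximum set of non-overlapping jobs), assuming s_i and f_i are start and finish times; the problem choice and names below are illustrative, not fixed by these notes:

```python
def interval_scheduling(jobs):
    """Greedily select a maximum set of pairwise non-overlapping jobs.

    jobs: list of (s_i, f_i) pairs, interpreted here as (start, finish) times.
    """
    # Sort by finish time so that f_1 <= ... <= f_n.
    ordered = sorted(jobs, key=lambda job: job[1])

    selected = []
    last_finish = float("-inf")
    for s, f in ordered:
        if s >= last_finish:          # compatible with everything chosen so far
            selected.append((s, f))   # greedy choice: earliest finishing job
            last_finish = f
    return selected
```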
Proof: We prove this by an exchange argument. Let X be the solution produced by the greedy algorithm, and let O be an optimal solution. If X and O are identical, then we are done. Suppose O ≠ X; then the optimal solution must contain (something different from X). Thus, we assume that X and O agree on the first r time steps but run different jobs on the (r+1)-th time step.
Do the swap: show it adds no cost and remains valid. (e.g., Call such a pair (i, j) an inversion. Consider the solution obtained by swapping the order of i and j; in this swapped solution, compare the completion times and show that the solution remains valid.)
Hence, our swapped Data Structure/result does not have a greater cost, and so it too is optimal. [Alternative: (if the swap results in a solution with lower cost) this contradicts the optimality of S*; thus, we conclude that S = S* and that the greedy solution is optimal.] Continuing in this way, we can eliminate all inversions without increasing the completion time. At the end of this process, we will have a … (what we return) produced by our algorithm whose cost is no greater than that of the original optimal solution we considered. Thus, the … (result) produced by our algorithm must also be optimal.
Runtime: Sorting the jobs (using e.g., mergesort) takes Θ(n log n) time. At each iteration, doing a comparison and adding to a stack takes Θ(1) time, and we do a total of n iterations. Outputting S takes Θ(n) time. Hence, the total running time is dominated by the sorting step and is Θ(n log n).