## Arrays - Two pointers - Two Sum, Two Diff

### Question1 - Two Diff : unsorted input

Two difference: find all pairs of indexes (i, j) whose values differ by a target, i.e. abs(A[i] - A[j]) == target, where target > 0, with/without duplicates. What if the array is unsorted?

The same as two sum: use a HashSet to store the visited values, converting it into a problem of searching for an exact value.

1. Go through the input array
2. We are trying to find A[i] - A[j] == target || A[j] - A[i] == target (A[j]: a visited element, A[i]: the current element)
3. so we look up A[j] == A[i] - target || A[j] == A[i] + target in the HashSet

Time: O(n), Space: O(n)
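A minimal Java sketch of the HashSet lookup described above (class and method names are my own):

```java
import java.util.HashSet;
import java.util.Set;

public class TwoDiffUnsorted {
    // Returns true if some pair (i, j) satisfies |A[i] - A[j]| == target, target > 0.
    public static boolean hasPair(int[] a, int target) {
        Set<Integer> seen = new HashSet<>();
        for (int x : a) {
            // A visited value y pairs with x iff y == x - target or y == x + target.
            if (seen.contains(x - target) || seen.contains(x + target)) {
                return true;
            }
            seen.add(x);
        }
        return false;
    }
}
```

Extending it to collect all index pairs only requires a HashMap from value to the list of indexes seen so far.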

### Follow-up: Sorted input

What if the array is sorted?

Also like two sum: use two pointers i and j, both starting at the beginning of the input, keeping j >= i. Because the array is sorted, the difference A[j] - A[i] changes monotonically as either pointer advances, and this monotonicity guarantees that we find all the answers.

1. A[j] - A[i] is the difference that we will search
2. If the diff is greater than target, then we decrease it by advancing i
3. Else the diff is smaller than target, then we increase it by advancing j
4. Go through the input list and stop when j reaches the end.

Time: O(n), Space: O(1)
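The same-direction two-pointer walk above, as a hedged Java sketch (names are my own; this version only reports existence):

```java
public class TwoDiffSorted {
    // Returns true if the sorted array contains two distinct indexes whose
    // values differ by exactly target (target > 0).
    public static boolean hasPair(int[] a, int target) {
        int i = 0, j = 1;
        while (j < a.length) {
            if (i == j) {
                j++;                      // keep the pair on two distinct indexes
            } else if (a[j] - a[i] == target) {
                return true;
            } else if (a[j] - a[i] > target) {
                i++;                      // diff too large: shrink it by advancing i
            } else {
                j++;                      // diff too small: grow it by advancing j
            }
        }
        return false;
    }
}
```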

### Question2 - Find all A + B + C = D

Given a sorted int array of all positive numbers, determine whether (or find all cases where) there are 4 elements in the array such that A + B + C = D.

D must be the largest of the four → three sum

We fix D, converting it to a 3-sum problem (A + B + C = constant?). We try every possible D from the 4th element to the last element.

Time: O(n^3), Space: O(1)

When we have multiple variables, we can try to fix some variables, so that we can convert it to a problem we know how to solve.

### Variant - Find all A + B = C + D

Given an int array, determine whether (or find all cases where) there are 4 elements in the array such that A + B = C + D.

Sorted array

We fix A and B, so that it is converted to a two sum problem: does C + D == target?

```
{1, 2, 3, 4}
  A  C  D  B
```

For a sorted array, take A and B as the smallest and largest of the four, with C and D in the middle; between them we can run a two-sum search for C + D == A + B.

Time: O(n^3), Space: O(1)

Why does it work? Proof: say C < D and A < B; naively there are 6 cases:

1. A < C < D < B
2. C < A < B < D
3. A < B < C < D
4. C < D < A < B
5. A < C < B < D
6. C < A < D < B

Cases 3 ~ 6 can easily be shown to force either A + B > C + D or A + B < C + D by the monotonicity of addition (e.g. A < C && B < D --> A + B < C + D), so equality is impossible there.

Cases 1 and 2 are really the same: they are one ordering of four numbers with the labels swapped. We simply choose the labeling A < C < D < B, because that is the way we traverse the array.

### Question3 - Two Diff from two arrays

Given two sorted arrays A and B, determine whether there exist two numbers, a from A and b from B, such that |a - b| = target.

Simplify the goal by case analysis. The key is to decompose the absolute value operation.

Two cases:

- Suppose A[i] > B[j] --> do a two diff on A[i] - B[j] == target
- Suppose A[i] < B[j] --> do a two diff on B[j] - A[i] == target

Two pointers walk through the two arrays, making two passes, one per case.

E.g.

```
A[]: 1 3 4 5 9 11
B[]: 2 3 4 6 7 8 10

merged order: 1 2 3 3 4 4 5 6 7 8 9 10 11
from array:   A B A B A B A B B B A B  A
```

Pointers i (into A) and j (into B) advance through this merged order.

Time: O(m + n), Space: O(1)

### Question4 - Find the minimum of |x-y| + |y-z| + |z-x|

Given 3 sorted arrays A, B, C, pick one element from each of the arrays: x from A, y from B, z from C. What is the minimum of |x-y| + |y-z| + |z-x|?

One way to think about |x-y| is the distance of x and y on an axis.

Find the minimum of |x-y| + |y-z| + |z -x| is to find the *closest three points on an axis* from A, B, C.

Basically the same as two diff.

1. Use three pointers, starting at the beginning of each array.
2. Each time, advance the pointer with the smallest value, updating the global minimum at the same time.

Time: O(n), Space: O(1)
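The three-pointer walk can be sketched as follows (a minimal version; names are my own). It uses the identity |x-y| + |y-z| + |z-x| == 2 * (max - min), which follows from viewing the three values as points on an axis:

```java
public class MinThreeDistance {
    // Minimum of |x-y| + |y-z| + |z-x| over x in a, y in b, z in c
    // (all three arrays sorted and non-empty).
    public static int minDistance(int[] a, int[] b, int[] c) {
        int i = 0, j = 0, k = 0, best = Integer.MAX_VALUE;
        while (i < a.length && j < b.length && k < c.length) {
            int max = Math.max(a[i], Math.max(b[j], c[k]));
            int min = Math.min(a[i], Math.min(b[j], c[k]));
            best = Math.min(best, 2 * (max - min));
            // Advance the pointer holding the smallest value: it is the only
            // move that can shrink the spread between min and max.
            if (a[i] == min) i++;
            else if (b[j] == min) j++;
            else k++;
        }
        return best;
    }
}
```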

### Question5 - Pick three elements in an array to form a triangle

Given an integer array with all positive values, can we pick three elements from the array as the lengths of edges to construct any triangle?

Triangle - three edges a, b, c, must apply:

a + b > c && b + c > a && c + a > b

What if a, b, c are sorted? → for a <= b <= c, only a + b > c needs to be checked.

So with a sorted array of positive integers, it suffices to check consecutive triples a, b, c for a + b > c: if any valid triangle exists, some three consecutive elements also form one.

Time: O(nlogn), Space: O(1)
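The sort-then-check-consecutive-triples idea above, as a hedged Java sketch (names are my own):

```java
import java.util.Arrays;

public class TriangleExists {
    // After sorting, if any triple can form a triangle, then some consecutive
    // triple can: for sorted a <= b <= c, only a + b > c needs checking.
    public static boolean canFormTriangle(int[] edges) {
        int[] a = edges.clone();
        Arrays.sort(a);
        for (int i = 0; i + 2 < a.length; i++) {
            if (a[i] + a[i + 1] > a[i + 2]) {
                return true;
            }
        }
        return false;
    }
}
```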

### Follow-up: How many triangles can we get from the array?

Convert it → how many pairs (a, b) of values in an integer array satisfy a + b > target?

Two sum problem

1. Sort the input array
2. For a + b > c, first fix c; try each c from A[2] to A[n - 1]
3. For each c = A[i], count how many pairs (a, b) satisfy a + b > c
4. Use two pointers l and r running towards each other: l starts at 0, r starts at i - 1 (two sum)
5. Two cases while running:
1. A[l] + A[r] > A[i]: every pair (A[l'], A[r]) with l <= l' < r is valid, so add r - l pairs; move r left to decrease the sum
2. A[l] + A[r] <= A[i]: invalid; move l right to increase the sum

Time: O(n^2), Space: O(1)
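The counting scheme above, as a hedged Java sketch (names are my own):

```java
import java.util.Arrays;

public class TriangleCount {
    // Counts triples a <= b <= c with a + b > c, by fixing the largest edge c
    // and running the two-pointer two-sum count inside the prefix.
    public static int countTriangles(int[] edges) {
        int[] a = edges.clone();
        Arrays.sort(a);
        int count = 0;
        for (int i = 2; i < a.length; i++) {       // fix c = a[i]
            int l = 0, r = i - 1;
            while (l < r) {
                if (a[l] + a[r] > a[i]) {
                    count += r - l;                // every l' in [l, r) pairs with r
                    r--;
                } else {
                    l++;
                }
            }
        }
        return count;
    }
}
```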

## Arrays - Precomputation sums

### Question1 - Subarray Sum, exists

Given an integer array, determine whether there exists a subarray whose sum equals target.

input: int[] A, int target.

Basically it's trading space for time: use a HashSet to store the precomputed prefix sums sum[0...j]; then any subarray sum can be computed as sum[0...i] - sum[0...j] = sum[j+1...i] =? target

1. Calculate each prefix sum[0...j] first (also include the empty prefix, 0)
2. Any subarray sum[j+1...i] can be calculated as sum[0...i] - sum[0...j]

Time: O(n), Space: O(n)
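The prefix-sum + HashSet idea above, sketched in Java (names are my own; prefix sums are kept as long to avoid overflow):

```java
import java.util.HashSet;
import java.util.Set;

public class SubarraySumExists {
    // sum(j+1..i) == target  <=>  prefix[i] - target == prefix[j].
    public static boolean exists(int[] a, int target) {
        Set<Long> prefixes = new HashSet<>();
        prefixes.add(0L);                 // empty prefix, so subarrays starting at 0 count
        long sum = 0;
        for (int x : a) {
            sum += x;
            if (prefixes.contains(sum - target)) {
                return true;
            }
            prefixes.add(sum);
        }
        return false;
    }
}
```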

### Follow-up 1: How many subarrays

How many subarrays sum to the target?

1. The question is converted to counting all pairs (i, j) in the precomputed sums[] with sums[i] - sums[j] = target
2. We still use a HashTable/Set to bring the lookup time down to O(1)
3. Compared to Q1, we just use a HashMap instead of a HashSet, using its value to store the count of each visited sum

Time: O(n), Space: O(n)
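Swapping the HashSet for a counting HashMap, as described above (a minimal sketch; names are my own):

```java
import java.util.HashMap;
import java.util.Map;

public class SubarraySumCount {
    // The HashMap value counts how many times each prefix sum has occurred,
    // so every matching (i, j) pair is counted, overlaps included.
    public static int count(int[] a, int target) {
        Map<Long, Integer> seen = new HashMap<>();
        seen.put(0L, 1);                  // empty prefix
        long sum = 0;
        int count = 0;
        for (int x : a) {
            sum += x;
            count += seen.getOrDefault(sum - target, 0);
            seen.merge(sum, 1, Integer::sum);
        }
        return count;
    }
}
```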

### Follow-up 2: Closest to a 0

Find the subarray whose sum is closest to 0.

input: int[] A, int target

• This time hashing won't work, because it can only look up exact (=) values; it can't do > or < searches.
• Subarray sum closest to 0 ---> sums[i] - sums[j] closest to 0 ---> sums[i] and sums[j] are closest.
• Whenever we see closest, <, >, distance etc., we should think about sorting or a sorted data structure, because it will make things easier.

1. Use an array int sums[] to store all the precomputed prefix sums sum[0...i].
2. The problem then becomes finding the smallest difference between two values in that array; after sorting, the closest pair must be adjacent:
1. Sort the sums array, keeping a global minimum difference
2. Go through the sorted sums[], compare adjacent values, compute each difference, and update the global minimum

Time: O(nlogn), Space: O(n)

### Question2 - Exist duplicate within k indices

Given an array, check if there are duplicate elements within k indices: return true if so, else return false.

Within k indices ---> maintain a k-size window, using a HashSet to store the values in the window

Time: O(n), Space: O(k)
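The sliding-window HashSet can be sketched as follows (names are my own; "within k indices" is read as |i - j| <= k):

```java
import java.util.HashSet;
import java.util.Set;

public class NearbyDuplicate {
    // HashSet window holding the values of the last k indexes.
    public static boolean containsNearbyDuplicate(int[] a, int k) {
        Set<Integer> window = new HashSet<>();
        for (int i = 0; i < a.length; i++) {
            if (i > k) {
                window.remove(a[i - k - 1]);   // keep only indexes i-k .. i-1
            }
            if (!window.add(a[i])) {           // add() returns false on a duplicate
                return true;
            }
        }
        return false;
    }
}
```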

### Follow-up - Exists a |difference| of two numbers < T within K indices

Check if there are two elements that their |difference| is less than T within K indices.

#### Ideas

• less than T ---> HashSet/Table won't work
• within K indices ---> sliding window
• abs(difference of two numbers) ---> the classic two-pointer way to solve two-diff won't work, because the window is sliding and not sorted.
• abs(difference of two numbers) ---> can be viewed as distance, which means the distance of two points within K indices which < T.

#### Method 1 --- BST Window

Because we are looking for abs(difference) < T, it's a range search for any difference < T basically.

We could do it much more easily if the window were sorted, but since it's a sliding window, it's hard to keep it sorted in place. Therefore, we maintain a sorted window of K elements using a BST.

Every time we insert a new element, we use BST to search the closest number to the new element in the window.

A BST has two relevant functions:
- ceiling(K key): the closest key equal to or larger than key
- floor(K key): the closest key equal to or smaller than key

Time: O(nlogk), Space: O(k)
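The BST window above, sketched with Java's TreeSet (names are my own; values are widened to long so the subtractions cannot overflow):

```java
import java.util.TreeSet;

public class NearbyAlmostDuplicate {
    // TreeSet window of the last k values; ceiling/floor find the closest
    // neighbors of each new element in O(log k).
    public static boolean exists(int[] a, int k, int t) {
        TreeSet<Long> window = new TreeSet<>();
        for (int i = 0; i < a.length; i++) {
            long x = a[i];
            Long ceil = window.ceiling(x);         // smallest value >= x
            if (ceil != null && ceil - x < t) return true;
            Long floor = window.floor(x);          // largest value <= x
            if (floor != null && x - floor < t) return true;
            window.add(x);
            if (i >= k) {
                window.remove((long) a[i - k]);    // slide the window
            }
        }
        return false;
    }
}
```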

#### Method 2 --- Hash Bucket Window

What is possible to optimize and what we want to optimize:

Time Complexity: O(n * logk) → optimize to O(n)

Space Complexity: O(k)

Optimize from O(n * logk) ---> O(n), in which case BST can't be used, and possible data structure should be Hash related.

We use T as the bucket size, hashing each value into its bucket (bucket index = value / T).

E.g. T = 3:

| Bucket (key) | 0 | 1 | 2 | ... |
|---|---|---|---|---|
| Value range | 0 ~ 2 | 3 ~ 5 | 6 ~ 8 | ... |

In this case, we only need to check the bucket that contains the new value and its two neighbor buckets.

1. Use a HashMap from bucket number to one of the values. There will never be two values in the same bucket: if there were, the condition abs(difference) < T would already have been satisfied and we would have returned.
2. Maintain such a window; whenever we insert a new element, check its bucket and its neighbor buckets, since they hold the only numbers that could have a difference < T with it.

We can use this kind of method (hash each value to a bucket, same range in the same bucket) whenever we see something like **abs(two numbers' difference) < T**.

## Priority Queue

### Basics

#### When to use?

1. kth smallest/largest, smallest/largest k
2. min/max, median
3. Best first search

#### Properties

• O(1) get min/max
• O(logn) insert
• O(logn) remove min/max
• O(n) search for a value; maintaining an auxiliary HashMap can make search(value) O(1)

Below are not in the default APIs of Java/C++:
• update(index)
• update(value)
• remove(index)
• remove(value)

#### Implementation (frequently asked in interviews)

• complete binary tree
• array

Key functions

• percolateUp
• percolateDown
• heapify
• poll
• offer
• peek

#### Heap sort

• Time: O(nlogn) (the initial heapify is O(n), but the n extractions cost O(nlogn)), Space: O(1)
• Not stable

#### Compare with Balanced Binary Search Tree

| function | Priority Queue | BST |
|---|---|---|
| peekMax | minHeap: O(n); maxHeap: O(1) | O(logn) |
| peekMin | minHeap: O(1); maxHeap: O(n) | O(logn) |
| insert | O(logn) | O(logn) |
| search | O(n) | O(logn) |
| update | by index: O(logn); by value: O(search) | O(logn) |
| delete min | minHeap: O(logn); maxHeap: O(n) | O(logn) |
| delete max | minHeap: O(n); maxHeap: O(logn) | O(logn) |
| delete | by index: O(logn); by value: O(search) | O(logn) |

There will be a lot of scenarios that using PriorityQueue and using BST is interchangeable.

##### The benefits of Priority Queue:
• getting the min (minHeap) or max (maxHeap) is O(1); the opposite extreme is O(n)
##### The benefits of BST:
• update / delete arbitrary value is O(logn)
• floor() / ceiling is amortized O(1) (given the current node)
• lookup is O(logn)

### Top K problems

• k-th smallest/largest element in unsorted array.
• k smallest/largest elements in unsorted array.
• the k points in 3-d space closest to the point (0, 0, 0)
• the k most frequently visited URLs

E.g. k-th smallest element in an unsorted array.

#### Method 1 - Heapify a minHeap + poll k times

• O(n + klogn), good for big n small k
• Need to read all the data in memory to do heapify, not suitable for large data

#### Method 2 - Maintain a maxHeap of size k + peek the max once

• O(nlogk); only needs O(k) memory, so it works even when n is huge
• It's a good external algorithm, only need O(k) memory
• Good for streaming data, easy to be distributed

#### Method 3 - Quick select

• Average O(n), Worst case O(n^2)
• Not a good external algorithm either: needs to read all data into memory

### Q1.Find the k-th smallest sum from k arrays

Given K sorted arrays, each of size n >= 1, pick one element from each of them to build a sum; find the kth smallest sum.

### Q2.Median of stream data flow at any time

Maintain two heaps: a maxHeap holding the smaller half of the elements and a minHeap holding the larger half.

x-th percentile of stream data flow

This is a more general form of median of a stream (the median is the 50th percentile).
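The two-heap scheme for a stream median can be sketched as follows (a minimal version; the class and method names are my own):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class StreamMedian {
    // maxHeap `small` keeps the smaller half, minHeap `large` the larger half;
    // sizes differ by at most one, so the median always sits on a heap top.
    private final PriorityQueue<Integer> small = new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Integer> large = new PriorityQueue<>();

    public void insert(int x) {
        small.offer(x);
        large.offer(small.poll());        // push our current max to the other side
        if (large.size() > small.size()) {
            small.offer(large.poll());    // rebalance so small is never smaller
        }
    }

    public double median() {
        if (small.size() > large.size()) {
            return small.peek();
        }
        return (small.peek() + large.peek()) / 2.0;
    }
}
```

For the x-th percentile, the same idea applies with the size ratio between the two heaps kept at x : (100 - x) instead of 1 : 1.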

### Q3.To implement some ID poll

Implement a pool of integers (IDs) ranging over [0, 10000], providing these APIs:
- int checkOutMinID(): retrieve the smallest available ID; the ID is removed from the pool.
- bool returnID(int): return an ID to the pool so that it becomes available again.
- bool checkOutID(int): check out a specific ID; return false if it is not available.

Solution: minHeap + HashMap or TreeSet (Balanced BST)

Heap + HashMap:

Maintain a minHeap; at initialization, put all available IDs into the minHeap.

• checkOutMinID() --> minHeap.poll(); Time: O(logn)
• returnID(int) --> minHeap.offer(int id); Time: O(logn)

and an int[] idMap (or HashMap) that maps each ID to the index of its element in the heap array.

• checkOutID(int id) --> minHeap.deleteAtIndex(idMap[id]), deleting from both the map and the heap. Time: O(logn)


The implementation with a TreeSet is much simpler.
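As a hedged illustration of that simpler TreeSet variant (class and field names are my own; this sketch assumes checkOutMinID is only called while the pool is non-empty):

```java
import java.util.TreeSet;

public class IdPool {
    private final TreeSet<Integer> available = new TreeSet<>();

    public IdPool(int maxId) {
        for (int id = 0; id <= maxId; id++) available.add(id);
    }

    public int checkOutMinID() {
        return available.pollFirst();     // smallest available ID, removed from the pool
    }

    public boolean returnID(int id) {
        return available.add(id);         // false if it was already in the pool
    }

    public boolean checkOutID(int id) {
        return available.remove(id);      // false if that ID is not available
    }
}
```

All three operations are O(log n), matching the heap + index-map design described above.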

### Q4.To implement a data structure that can insert(), median, delete()

• void insert(int) --> insert a new element
• int median() --> get the median of all the inserted elements
• void delete(int) --> delete one element from it

Heap (array) + HashMap --> O(logn); this gets much more complex once delete() is involved, since you need to implement delete() yourself to achieve O(logn).

TreeMap --> delete() --> O(logn), median --> O(logn).

Solution: Sliding window of median numbers

Only need one TreeSet

1. initialize the TreeSet with the first k elements
2. Find the initial median
3. Traverse the array, each step, add right element then remove left element; The tricky part:
• If the element added and the element removed are on the same side of the median, median won't change, remain the same.
• If the element added < median and the element removed > median --> the new median is the left neighbor node: TreeSet.lower(median).
• If the element added > median and the element removed < median --> the new median is the right neighbor node: TreeSet.higher(median).

#### Huffman Encoding

The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority:

1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
1. Remove the two nodes of highest priority (lowest probability) from the queue
2. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
3. Add the new node to the queue.
3. The remaining node is the root node and the tree is complete.

Since efficient priority queue data structures require O(log n) time per insertion, and a tree with n leaves has 2n−1 nodes, this algorithm operates in O(n log n) time, where n is the number of symbols.

| Symbol | Code |
|---|---|
| a1 | 0 |
| a2 | 10 |
| a3 | 110 |
| a4 | 111 |

## Single Number & Swap

### Q1.Single number exists once, all other numbers exist twice

#### Method 1 - HashSet, add & remove:

Using a HashSet: add the value if the HashSet does not contain it; if the HashSet already contains the value, remove it. The remaining element is the answer.

Time: O(n), Space: O(n)

#### Method 2 - bit manipulation, xor

XOR: since num ^ num = 0, traverse the array and XOR everything together; the result is the single number.

Time: O(n), Space O(1)

#### Method 3 - count 1s of every bit

Record the number of 1s at each bit position, then take each count modulo 2:
- if the bit count is odd --> that bit of the answer is 1
- if the bit count is even --> that bit of the answer is 0

Time: O(32n), Space: O(32)

### Follow-up 2, two numbers exist once, other numbers all exist twice

#### Better solution: variant of Method 2

Suppose the two single numbers are x and y, with x != y.

1. XOR all elements --> x ^ y
2. Find a bit k where x and y differ (any set bit of x ^ y)
3. Use that bit to partition the array into two lists:
   - partition 1: { all numbers whose kth bit is 0 }
   - partition 2: { all numbers whose kth bit is 1 }
4. x and y must lie in different lists
5. XOR each list; the two results are x and y.
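The partition-by-differing-bit trick above, as a hedged Java sketch (names are my own):

```java
public class TwoSingleNumbers {
    // Every value appears twice except two distinct values; returns both.
    public static int[] findTwoSingles(int[] a) {
        int xorAll = 0;
        for (int v : a) xorAll ^= v;      // == x ^ y
        int lowBit = xorAll & (-xorAll);  // lowest bit where x and y differ
        int x = 0, y = 0;
        for (int v : a) {                 // partition by that bit, xor each part
            if ((v & lowBit) != 0) x ^= v;
            else y ^= v;
        }
        return new int[]{x, y};
    }
}
```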

### Follow-up 3, one number exists once, all other numbers exist three times

#### Best solution: variant of Method 2,

Like Method 2 ---> but how do we make every three "1"s cancel back to 0?

Use two bits to record how many "1"s have been seen so far, counting modulo 3:

00 --> 01 --> 10 --> back to 00 (the state 11 is equivalent to 0 and immediately resets)

Use two integer variables, two and one, to record the high bit and the low bit of the count:

```
two one
 0   0
```

For variable one (one + next value, ignoring the carry):

| one | value | next one |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

==> next one = one ^ value

For variable two, it depends on one:
1. one == 1 && value == 1 --> two = 1
2. two == 1 --> two = 1
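The ones/twos bookkeeping can be written compactly as follows (a common formulation of the same counter, not necessarily the exact update rule the notes derive; names are my own):

```java
public class SingleNumberThree {
    // one/two together count, per bit position, how many times that bit has
    // been seen modulo 3; after the loop, `one` holds the single number.
    public static int findSingle(int[] a) {
        int one = 0, two = 0;
        for (int x : a) {
            one = (one ^ x) & ~two;   // bits now seen once (and not twice)
            two = (two ^ x) & ~one;   // bits now seen twice (third time clears)
        }
        return one;
    }
}
```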

### Q2.Given an array contains n - 1 integers, only one number is missing from [1, n].

Example: {3, 1, 5, 4}, n = 5, return 2

#### Method 2 - Swap - a[i] = i + 1

{3, 1, 5, 4} --> {1, 5, 3, 4}: swap each value toward index value - 1 whenever possible; afterwards, the index i with a[i] != i + 1 reveals the missing number i + 1 (here index 1 --> missing 2).

### Q3.Given a value array with its corresponding position array, swap back the value array.

Example:

value = {1, 2, 3}, pos = {2, 1, 0} --> {3, 2, 1}

Solution: swap

Traverse the array: while pos[i] != i, swap value[i] with value[pos[i]] and pos[i] with pos[pos[i]] at the same time (moving value[i] to its target position) and stay at index i; once pos[i] == i, continue to the next index.

## Byte Array/Streams

### Q1.Is valid UTF-8

Determine whether a byte array is a valid UTF-8 string.

Be careful corner cases:
1.the first byte is invalid
2.the rest of the bytes is short
3.one of the rest of the bytes is invalid

The difficult part is the last read

Solution --> Cases analysis:

1. readLen == 4096, k < 4096

### Follow-up 1 : Repeatedly called read()

Now the read() can be called repeatedly for the same file

Solution:

Case analysis:

1. k <= rest --> read k bytes from buff4k, shift buff4k left by k bytes, rest -= k
2. k > rest --> read all remaining rest bytes in buff4k; at the last read there are several cases to handle:
   1. when copying buff4k into the output buffer, the copy length = min(readSize, k - len)
   2. when the bytes still needed (k - len) are fewer than the bytes just read into buff4k[] (readSize), we also need to shift the unread bytes in buff4k to the beginning and update the rest counter
   3. readSize < 4096 means we have reached the end of the file

### Q3.Parsing message format

[ msg ]  [msg] ....  [msg]

A byte array holds a stream of messages. The format: each message starts with 4 bytes giving the length of the message body in bytes, followed by that many bytes of message content.

## Dynamic Programming

### What kind of problems to solve?

• min/max,
• sum,
• yes/no,
• all results,
• ....

The problem can be easily separated into small steps with the same problem description.

A big hint: if the problem can be resolved by recursion, each step in recursion depends on one or several previous smaller steps’ result.

Another big hint: if you can not find a reasonably better solution than brute force/recursion.

The essence

Dynamic programming is nothing more than the brute force solution: essentially it just memorizes partial solutions so that later we can reuse them instead of computing them again. The key is trading space complexity for time complexity.

Key Points:

State: the partial solution for the same problem with smaller size.

What does the state mean?

state[i] / state[i][j] / .....

How to get the state?

problem patterns, similar typical problems (common types that have already been summarized)

try out each of the most frequently used ones.

try to find the brute force solution first, and find which part is duplicated and can be optimized.

Function (Induction rule):

how to use the solution of the problem with smaller size to solve the problem with larger size.

Hint: the function is usually how the recursion is done.

The function is not always unique; it is usually determined along with how the state is defined. For the same problem we could have multiple possible state and function definitions, and one of them may be the most efficient.

Base case: what are the initial states to start?

Hint: in recursion, there should be a termination state.

How the answer is related to one/all of the states.

Order of filling the solution:

how the Function is defined, what are the cells state[i][j] depends on.

Optimization:

Time: usually depends on how your state is chosen.

Space: identify the duplicated spaces needed.

### Q1.Steal money on street problem

There are several rooms and in each of the rooms there is some amount of money (non-negative).

There is an alert system such that if we steal the money from two adjacent rooms, the alert will fire. What is the max amount of money we can safely have? (You can only steal non-adjacent rooms)

Solution

State: M[i]: the max amount of money we can steal from A[0...i]; A[i] itself is not necessarily stolen.

Base case: M[0] = A[0], M[1] = max(A[0], A[1])

Induction rule: M[i] = max(M[i - 1], M[i - 2] + A[i])

Optimal, Time: O(n), Space: O(1)
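The induction rule above needs only the last two states, giving the O(1)-space sketch below (names are my own):

```java
public class StealMoney {
    // M[i] = max(M[i-1], M[i-2] + A[i]), rolled into two variables.
    public static int maxMoney(int[] rooms) {
        int prev2 = 0, prev1 = 0;         // M[i-2], M[i-1]
        for (int money : rooms) {
            int cur = Math.max(prev1, prev2 + money);
            prev2 = prev1;
            prev1 = cur;
        }
        return prev1;
    }
}
```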

### Q2.Determine last bit of byte stream

Given a byte array, which is an encoding of characters. Here is the rule:

1. If the first bit of a byte is 0, that byte stands for a one-byte character
2. If the first bit of a byte is 1, that byte and its following byte together stand for a two-byte character

Now implement a function to decide if the last character is a one-byte character or a two-byte character

Solution:
1. If the first bit of the last byte is 1, the last character must be a two-byte character: return false.
2. Otherwise the first bit of A[n - 1] is 0, and we must decide whether it stands alone or is the second byte of a two-byte character.

State: m[i]: whether, decoding the suffix A[i ... n - 1], the last character is a one-byte character

Induction rule:

m[i] = firstBit(A[i]) == 1 ? m[i + 2] : m[i + 1]

Result: m[0]

Base case: m[n - 1] = true, m[n - 2] = (firstBit(A[n - 2]) == 0)

Time : O(n) ,Space: O(n)

Optimization: looking at the first bits of A[i] and A[i + 1], some patterns let us break early:

```
A[i] A[i+1]
 0    0     --> result = m[i + 1]  // break earlier
 1    0
 0    1
 1    1
```

### Q3.Longest Increasing Subsequence (Frequent)

Method - 1

Time: O(n^2), Space: O(n)

State: M[i] = the max length of an increasing subsequence of A[0...i] that includes A[i]

Base case: M[i] = 1 for every i

Induction Rule: M[i] = max(M[j]) + 1, for all j such that j < i && A[j] < A[i]

Method - 2

Optimal Time O(nlogn)

M[i]: the smallest end element among all increasing subsequences of length i (e.g. {1, 2, 4}). This array is itself increasing, so each new element can be placed with binary search.
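The O(nlogn) "smallest tails" idea above, sketched in Java (names are my own; Arrays.binarySearch returns -(insertionPoint) - 1 when the key is absent):

```java
import java.util.Arrays;

public class LongestIncreasingSubsequence {
    // tails[i] = smallest possible end element of an increasing subsequence
    // of length i + 1; each new value replaces its binary-search slot.
    public static int lengthOfLIS(int[] a) {
        int[] tails = new int[a.length];
        int size = 0;
        for (int x : a) {
            int pos = Arrays.binarySearch(tails, 0, size, x);
            if (pos < 0) pos = -(pos + 1);   // insertion point for x
            tails[pos] = x;
            if (pos == size) size++;         // x extended the longest subsequence
        }
        return size;
    }
}
```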

### Follow-up1.Find the number of increasing subsequences

Time: O(n^2), Space: O(n)

State: M[i] = the number of increasing subsequences of A[0...i] that end at A[i] (counting the single-element subsequence)

Induction Rule: M[i] = 1 + sum(M[j]), for all j such that j < i && A[j] < A[i]

### Follow-up2.Reconstruct the path

Time: O(n^2), Space: O(n)

Still the same M[i] as Method 1, but we need one more array:

backIndex[i]: the index of the previous element in the longest increasing subsequence ending at index i.

### Variant1.2D Space, lines

Given a set of points in 2D space, how do we find the largest subset of points such that every line formed by a pair of points in the subset has positive slope?

Positive slope --> (y2 - y1) / (x2 - x1) > 0 --> (y2 > y1 && x2 > x1 ) || (y2 < y1 && x2 < x1)

Solution 1. sort the points by x coordinate 2. find the longest increasing subsequence of y coordinate

Variant1.1 - Envelopes: (the exact same solution)

Set of envelopes, each one has (length, width), we can put e1 into e2 iff e2.length > e1.length && e2.width > e1.width. What is the maximum number of envelopes we can put in one stack.

### Variant2.3D Boxes

Set of boxes, each with (length, width, height); we want to stack them. When we put one box b2 on another box b1, we must guarantee that b2.length < b1.length && b2.width < b1.width.

What is the max height we can get?

Induction rule: M[i] = height[i] + max(M[j]), over all j on which box i can be stacked (after sorting the boxes).

### Variant3.Max sum of non-overlap Intervals

We have a list of records, each record has (start, end, weight). find a subset of the records, such that there is no overlap, and the sum of weight is maximized.

Induction rule: M[i] = max(M[k]) + weight[i], where k ranges over earlier records that do not overlap record i (sort the records first).

### Variant4.Minimum number of move operations needed to form a sorted array.

We have a permutation of number 1 - n, we can delete one number and insert it to any other position, if we want to do several such operations to make the permutation as sorted, what is the minimal number of operations we will need?

Max number of elements that do not need to move --> find the longest increasing subsequence; the minimal number of moves is n minus its length.

### Q4.Largest subarray sum

Optimal Time: O(n), Space: O(1)

State: M[i] = the largest subarray sum of A[0...i] that includes A[i].

Induction rule: M[i] = max(A[i], M[i-1] + A[i]);
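The induction rule above needs only the previous state, so it compresses into one running value (a minimal sketch; names are my own):

```java
public class LargestSubarraySum {
    // M[i] = max(A[i], M[i-1] + A[i]); `cur` plays the role of M[i].
    public static int maxSubarraySum(int[] a) {
        int cur = a[0], best = a[0];
        for (int i = 1; i < a.length; i++) {
            cur = Math.max(a[i], cur + a[i]);  // extend the run or restart at i
            best = Math.max(best, cur);
        }
        return best;
    }
}
```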

### Variant1.Largest submatrix sum

Flatten the matrix to convert it to a 1D problem: fix a pair of rows, collapse each column between them into a single sum, then run largest subarray sum on that 1D array.

Time: O(m*m*n), Space: O(n)

### Variant3.Flip bit segment to get max 1s

Given an array has only 1 and 0, you can choose a segment[l, r], and flip the segment, how do you do it to get the max number of 1s?

Find the segment where (# of 0s) - (# of 1s) is largest: view 0 as 1 and 1 as -1 --> largest subarray sum. The answer is (total # of 1s) + that maximum gain.

### Variant4.Diff of number of 0s and 1s

Given an array containing only 0 and 1s. Find the subarray, such that, the diff of number of 1, 0s are the largest.

Still map 0 => 1, 1 => -1; the answer is max(largest subarray sum, abs(smallest subarray sum)).

### Variant5.Equals number of 0s and 1s

Given 1, 0 array, longest subarray contains equal number of 0 and 1.

Still map 0 => 1, 1 => -1; find the longest subarray with sum 0, using precomputed prefix sums (store the first index where each prefix sum appears).

### Q5 Longest Increasing Subarray

State: M[i]: length of the longest increasing subarray ending at i, including A[i]

Base case: M[0] = 1

Induction rule: M[i] = A[i] > A[i-1] ? M[i-1] + 1 : 1

### Follow-up - Longest increasing subpath in binary tree

Path: from the root to any leaf.

```
        3
       / \
      1   6
     / \ / \
    2  5 4  1
   /
  3
```

Longest increasing sub-path output: 1 2 3

Time: O(n)

### Follow-up - Longest increasing path in 2d matrix

Input:

```
1 2 3
4 3 4
5 2 1
3 2 1
```

Output: 5 (the path 1 2 3 4 5)

It's more general than the tree problem above (a tree is a special graph). DFS + memoization.

Time: O(mn)

### Q6 Shortest list of integers, sum of square to the target number

For example, target = 14

14 = 1^2 + 2^2 + 3^2, (1,2,3)

14 = 1^2 + 1^2 + 2^2 + 2^2 + 2^2 (1,1,2,2,2)

(1, 2, 3) is the shortest.
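The notes don't spell out a solution; one natural DP under the conventions of this section (an assumption on my part, not stated in the original) is dp[t] = fewest squares summing to t:

```java
public class PerfectSquares {
    // dp[t] = 1 + min over squares s*s <= t of dp[t - s*s].
    public static int minSquares(int target) {
        int[] dp = new int[target + 1];   // dp[0] = 0
        for (int t = 1; t <= target; t++) {
            dp[t] = Integer.MAX_VALUE;
            for (int s = 1; s * s <= t; s++) {
                dp[t] = Math.min(dp[t], dp[t - s * s] + 1);
            }
        }
        return dp[target];
    }
}
```

Recording which square achieved each minimum would let us reconstruct the actual list, e.g. (1, 2, 3) for target 14.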

### Q7 Paint House

Minimize the cost of painting K houses in a row; each house has a different cost for each color, and two adjacent houses cannot be painted the same color.