This is a document for how I should approach any Competitive Programming Problem.
Read the problem carefully and draw out the sample case so that you fully understand it. Look at the constraints and note whether I might encounter integer overflow and which angles I can attack the problem from.
WRITE DOWN ANYTHING YOU NOTICE, EVEN IF IT'S OBVIOUS.
Before going anywhere near a solution, first try to visualize or reword the problem:
- What time complexities will work here? Does a cubic work? Does a quadratic work? This will give you a better idea of which algorithms to evaluate.
- Whenever in despair, look for bounds: some number that signifies something. What is the maximum number of moves? What is the maximum possible length of this string? This will give you more clues, whether to make it faster or maybe to use integer overflow to your advantage. Bounds are very important when starting a problem. Some bounds aren't shown as a number; for example, a problem might state that the only possible characters are A, B, or C. That hints that a cubic time might be possible (vague example).
- Can I solve the problem backwards? What does it mean to solve the problem backwards? Write down what that means.
- Can this be visualized as a graph? If so, draw it out. If it looks like a DP problem, see if the DP transitions can be visualized as a graph. Is this problem about cycles? Can I reword this problem into a shortest path problem or an MST problem? What are the edge weights? What if I sort the edges and do Minimax or Maximin with a DSU data structure or an MST? Can I reduce the sorted edges in the MST (when there are too many edges, e.g. a grid)? A cool trick is to run in square-root-of-edges time, because we know the indegree of each node is at most the square root of the number of edges. Do I have to merge adjacency lists? Remember you can always customize your data structure; for example, a DSU with path compression can maintain properties on components as they merge.
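As a sketch of the DSU idea above (maintaining a property on components as they merge), here is a minimal union-find with path compression and union by size; the `sz` field is one example of a per-component property, and all names here are my own:

```cpp
#include <vector>
#include <utility>

// Minimal DSU with path compression + union by size; `sz` is an example
// of a property maintained per component as components merge.
struct DSU {
    std::vector<int> parent, sz;
    DSU(int n) : parent(n), sz(n, 1) {
        for (int i = 0; i < n; i++) parent[i] = i;
    }
    int find(int v) {                      // path compression
        return parent[v] == v ? v : parent[v] = find(parent[v]);
    }
    bool unite(int a, int b) {             // union by size
        a = find(a); b = find(b);
        if (a == b) return false;
        if (sz[a] < sz[b]) std::swap(a, b);
        parent[b] = a;
        sz[a] += sz[b];
        return true;
    }
};
```

Any property that composes under union (size, min/max label, sum) can ride along in the same way `sz` does.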
- Are the bounds for the coordinates too high? Try using coordinate compression so you can insert them into some range data structure or run a sweep line on them. Before you run coordinate compression, make sure you understand the effects it has on the algorithm.
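A minimal coordinate-compression helper along the lines described above (naming is mine); it maps each value to its rank in sorted order so values can be used as indices into a range data structure:

```cpp
#include <vector>
#include <algorithm>

// Coordinate compression: replace each value by its rank among the
// distinct sorted values, so large coordinates fit as array indices.
std::vector<int> compress(const std::vector<int>& a) {
    std::vector<int> sorted = a;
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
    std::vector<int> res(a.size());
    for (size_t i = 0; i < a.size(); i++)
        res[i] = std::lower_bound(sorted.begin(), sorted.end(), a[i]) - sorted.begin();
    return res;
}
```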
- Can I sweep line? Try placing a pointer and shifting it from left to right, or right to left. Does this help me solve the problem? Can I find some relationship between the dividing line and each side? If it's a 2D problem, try sweeping from top to bottom, or left to right. Another way to solve plane sweep problems is keeping track of the y and x coordinates in data structures and remembering only the points critical to the problem. Identify each of these terms while solving. Can I reword this into something similar now? Can I make any inferences? Do I have to maintain two data structures, one for the left side and another for the right?
- Try dividing stuff into parts. Maybe I can group the input into separate categories? Once grouped, I can search one category and look up something in the other.
- Sort the input, sort stuff. Whenever you find yourself thinking "I need something bigger here," sort your input. You can maintain a pointer into the sorted array in a separate variable and shift it along while you do the real work. This is a very common technique. Remember you can always write complex comparators.
- Is this a dynamic programming problem? Try listing out states and transitions. Does it result in an MLE or TLE? If so, are there any states I can infer or drop? For example, I can infer the current index from the number of set bits in bitmask DP. Bitmask DP also often requires other forms of precomputation to remove factors. Write down the type of DP I am dealing with, and see if I have encountered this type of problem before. Is this bitmask DP, knapsack DP, grouping DP, or a DAG? Or something else? What solution space am I searching? Is it permutations, subsets, sequences, or something else? If it's a DAG problem, don't forget to use Topological Sort. Write down the recurrence, the states, and their definitions. Sometimes we might need a third array to store some cumulative data on the previous DP states; don't be limited to one DP array. Don't cross out DP state ideas, just write down a new one and note why it's better. Would bottom-up be faster, or should we do top-down? Remember that the LIS of a reversed array is its LDS. Whenever finding the longest x sequence, we can store the longest sequence ending with some last number. Remember that the minimum number of LISs needed to cover an array is the size of its LDS. Also, we can calculate the LIS of ranges by calculating the LIS of endpoints after sorting by start point. Can I add another factor to the time complexity by doing a sweep line on each state? When finding the number of ranges in a range, use the following formula:
$dp[i + 1][j] + dp[i][j - 1] - dp[i + 1][j - 1] + k$, where $k$ is the number of ranges starting at $i$ and ending at $j$. You can also use Range DP techniques when there is a fixed starting point. Also, if you are selecting once from either side, try a DP run where $i$ and $j$ are instead the distances traveled on either side; if that doesn't work, fall back to Range DP. When searching for a DP state, think: position, weight, elements left to traverse, last element or row or column, etc. When trying to fill up a grid with cells, also try using bitmasks if any of the dimensions are small enough. You can use the Open and Close Interval Trick instead of bitmasks on some problems when counting the number of ways to sum up subsets. If the bounds are very large, you might be able to use Digit DP. When working with DP and expected value, if a DP state has indegree larger than 1, then we have to multiply the EV added each time by the probability of reaching that state (see AtCoder DP J - Sushi). When you are having trouble with DP optimizations because of too many transitions, see if you can use prefix sums with a difference array or initialize a segment tree for the DP array. Usually, for these problems, implement a naive DP solution first, then make inferences by looking at the transitions. Another common technique for DP problems is isolating states by some parameter which has only one possibility per state but is required in the transitions for dependency resolution: instead of creating an extra dimension on the array, store a queue or stack for iterating over states. When evaluating the implementation, find where you would like to draw the line between the number of transitions and the number of states. This will help you weigh them against each other and find the optimal time complexity.
- Do I have to merge numbers? If so, Range DP might be reasonable, because we can represent a merged subarray with some value in a range DP, while a non-merged subarray is -1. Remember to do smaller subarrays first in Range DP; that's the easiest way to work around dependency conflicts. There are solutions that involve reverse loops, but they're less intuitive to understand (though the reverse loop is several times faster than the size-based loop). Instead of iterating from $0$ to $n - 1$ for the size, iterate from $n - 1$ down to $0$ for $i$ and from $i + 1$ to $n - 1$ for $j$, and the dependency conflicts will be resolved for Range DP. Range DP techniques don't need an explicit range on an array; they can also be used in the context of a 2D array. Whenever checking for the continued existence of something in an array, simply split the range into two with a gap of size 1 in between; this is a technique from Greedy Pie Eaters (USACO Platinum).
- Can I use a greedy algorithm? Try to find one. Can I solve this problem progressively, left to right? Are there some operations that are always the most optimal? Maybe Huffman Coding is related?
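The reverse loop order for Range DP described above can be sketched as follows; the concrete problem (merging adjacent piles, paying the sum of the merged range) is a stand-in example of mine, not one from these notes:

```cpp
#include <vector>
#include <algorithm>
#include <climits>

// Range DP with the reverse loop order: i runs from n-1 down to 0 and
// j from i+1 up to n-1, so dp[i+1][*] and dp[*][j-1] are always ready.
// Example: minimum cost to merge all adjacent piles into one, where
// merging a range costs the sum of that range.
long long mergeCost(const std::vector<long long>& a) {
    int n = a.size();
    std::vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
    std::vector<std::vector<long long>> dp(n, std::vector<long long>(n, 0));
    for (int i = n - 1; i >= 0; i--)
        for (int j = i + 1; j < n; j++) {
            dp[i][j] = LLONG_MAX;
            for (int k = i; k < j; k++)   // split point
                dp[i][j] = std::min(dp[i][j],
                                    dp[i][k] + dp[k + 1][j] + pre[j + 1] - pre[i]);
        }
    return dp[0][n - 1];
}
```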
- Can I BSTA (Binary Search the Answer)? Look for some form of monotonic function. If the function is unimodal, then I might be able to do a ternary search. The function might not be apparent, try attacking the problem from different angles.
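A generic BSTA skeleton, assuming a monotonic predicate that flips from false to true somewhere in the search range (the predicate itself is supplied by the caller):

```cpp
#include <functional>

// Binary Search the Answer: smallest x in [lo, hi] with ok(x) true,
// assuming ok is monotonic (false ... false true ... true) and
// ok(hi) is true.
long long firstTrue(long long lo, long long hi, std::function<bool(long long)> ok) {
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        if (ok(mid)) hi = mid;   // answer is mid or to the left
        else lo = mid + 1;       // answer is strictly to the right
    }
    return lo;
}
```

For a unimodal (rather than monotonic) function, swap this for a ternary search as the notes suggest.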
- Can I simply brute force this? I usually skip this step, but think up a basic brute-force attack on the problem. Even though it feels negligible, it will help you understand the problem much better.
- Can I reword relationships into a mathematical formula? Some relationships in problems can be interpreted with a mathematical formula that can be rearranged. Is this a number theory problem? Can I find some relationship between the problem and LCM, GCD, or primes? Can I find some relationship between the problem and modular arithmetic? Do I have to rearrange a modular congruence? Can I apply the Principle of Inclusion-Exclusion? Think about the Montmort Numbers problem. Although this is slightly rare, try searching OEIS (Online Encyclopedia of Integer Sequences) for a similar sequence. Also, try to rearrange equations into geometric sequences, arithmetic sequences, harmonic sequences, or some equation where static variables are separate so we can precompute them. Can we use combinatorics? If so, what are the limitations? Write down an algorithm to validate an option in the search space, and define the search space. Can we make an observation related to divisibility? If so, we might be able to reduce the asymptotic complexity by a lot. Remember the Expected Value rules from Errichto. Think something like:
$E(X) = P(\text{fail}) \cdot (E(X-1) + c_f) + P(\text{success}) \cdot c_s$, where $c_s$ and $c_f$ are the costs of success and failure respectively. Whenever solving number theory problems algebraically, the key is recognizing which parts of the equation MUST be an integer. When working with real numbers, you have to rearrange fractions to prevent loss of precision when calculating, so use identities like Pascal's Identity. Remember the contribution technique for combinatorics: consider edges, nodes, and elements, and see how each contributes to the sum. If a combinatorics problem asks for the count of something or the sum of all possibilities, first reduce the search space, then look for some way to construct the count in an optimized way.
- Does this problem have a sliding window or two pointers? Can I reduce the amortized complexity given this? Maybe I can combine two pointers/sliding window with prefix/suffix sums or some other data structure? Could I incorporate Binary Search on top somewhere here?
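A minimal sliding-window sketch of the two-pointer idea above; the concrete problem (longest subarray with sum at most a limit, assuming non-negative elements) is my own example:

```cpp
#include <vector>
#include <algorithm>

// Sliding window: longest subarray with sum <= limit, assuming
// non-negative elements so the window sum is monotone. Each index
// enters and leaves the window at most once, giving amortized O(n).
int longestAtMost(const std::vector<int>& a, long long limit) {
    long long sum = 0;
    int best = 0, left = 0;
    for (int right = 0; right < (int)a.size(); right++) {
        sum += a[right];
        while (sum > limit) sum -= a[left++];  // shrink from the left
        best = std::max(best, right - left + 1);
    }
    return best;
}
```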
- Is this a problem about offline queries or online queries? If this problem is offline, try to sort the queries by each of their parameters. Is there any way I can exploit each parameter? Maybe if one of the parameters is a node, then I can DFS and answer queries node by node. If it is some number with a reasonable size, I can answer queries for each number. If it is an online query problem, how can I reuse range query data structures? Can I use the range query data structure progressively, left to right, or up and down? Maybe I can answer half of the query at a time: go left to right first, then right to left? Have multiple range query data structures? Maybe we can use Range DP? For example, in the Gold problem Farmer John Solves 3SUM, we have to use Range DP to expand triplet counts into the ranges containing them; this works like a 2D prefix sum, so it is good to relate ranges to that. Remember Range DP problems sometimes require you to sweep line a range. Several range query data structure problems also involve inversions, so try to think of some usage of that where applicable. Sometimes Palindromic Range DP requires that you search through all k on the right such that
$A[i] = A[k]$. Search for such conditions when you are forming a pull Range DP solution.
- Is this a tree problem? In what way does finding the LCA help me? What about binary lifting? Is this some subtree queries problem? Then try the Euler Tour array; maybe that can help? Try a combination of all of these techniques. Is this a DP on trees problem? Then try to find a way to progressively change the state for each subtree. Maybe we can run two DFS searches, one for handling subtrees and another for outside the subtree (DP on Trees w/ Several Roots). Another technique for counting the # of some category in a subtree is to timer-search the tree, incrementing the timer both on entry and on exit while maintaining an array for each category. Then, you can simply use `lower_bound` searches to find the # of each category between the in and out timestamps. Some DP on Trees problems require you to start from the top, while others require you to start from the bottom; try out both, because one might have dependency conflicts or be too slow.
- Don't forget that small-to-large merging can make merging sets fast enough.
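The Euler Tour technique above can be sketched like this; `tin`/`tout` are my names for the entry and exit timestamps, and the subtree of `v` becomes the contiguous range `[tin[v], tout[v])`:

```cpp
#include <vector>

// Euler tour: flatten a rooted tree so each subtree is a contiguous
// range of timestamps; subtree queries then reduce to range queries.
struct EulerTour {
    std::vector<std::vector<int>> adj;
    std::vector<int> tin, tout;
    int timer = 0;
    EulerTour(int n) : adj(n), tin(n), tout(n) {}
    void addEdge(int u, int v) { adj[u].push_back(v); adj[v].push_back(u); }
    void dfs(int v, int parent) {
        tin[v] = timer++;
        for (int to : adj[v])
            if (to != parent) dfs(to, v);
        tout[v] = timer;          // exclusive end of v's subtree range
    }
    bool inSubtree(int u, int v) {   // is u inside v's subtree?
        return tin[v] <= tin[u] && tin[u] < tout[v];
    }
};
```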
- Sometimes the problem needs some obvious clever rewording. If we are searching the permutations of a string, think of the frequency table of that string. If we are searching for the average, think of the number that goes into every position of the array and keeps the same sum.
- Use observations and come up with some constructive solution. Try brute-forcing inputs, either with a separate program or just on paper. It could give you some more observations to work with. Also, sometimes the answer is simple, but you just have to use smart thinking based on your observations to speed up certain parts of the evaluation.
- Don't be afraid to write a separate program to evaluate the time complexity of an idea. It doesn't take that long usually.
- Is this a rectangle geometry problem? Draw it out. Recall the rectangle geometry formulas.
- Try visualizing the problem with geometry. Though this isn't common in USACO Gold, there are occasional problems that are best visualized geometrically. Take a look at CodeForces Diluc and Kaeya for an example.
- Try hashing the problem. If the problem asks for some form of repetition, we can use a polynomial hashing algorithm. Hashing algorithms don't need to be specifically for strings, we can also use them for other purposes such as paths. Another common usage of hashing is to hash a prime factorization in a number theory problem. Whenever you have some form of hashmap or inversion counting, think what if I just hashed this? Can I do that with low collision rates? What would the process be (speed/space)?
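A minimal polynomial hashing sketch along the lines above; the base and modulus are arbitrary choices of mine and can be swapped per the hashing advice later in these notes:

```cpp
#include <vector>
#include <string>

// Polynomial rolling hash: prefix hashes allow O(1) hash of any
// substring, so repetition checks become hash comparisons.
struct PolyHash {
    static constexpr long long MOD = 1'000'000'007LL, BASE = 131;
    std::vector<long long> pre, pw;
    PolyHash(const std::string& s) : pre(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (size_t i = 0; i < s.size(); i++) {
            pre[i + 1] = (pre[i] * BASE + s[i]) % MOD;
            pw[i + 1] = pw[i] * BASE % MOD;
        }
    }
    long long get(int l, int r) {   // hash of s[l..r)
        return ((pre[r] - pre[l] * pw[r - l]) % MOD + MOD) % MOD;
    }
};
```

The same scheme hashes any sequence, not just strings: replace `s[i]` with whatever element stream you are fingerprinting (e.g. edge labels along a path).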
- Remember that pairs as map keys are very slow. Instead, try to hash the pair into a single number; try something basic like $a \cdot 10^9 + b$. The chance of collision is very low. Remember that meet in the middle search problems have a very narrow acceptance without TLE, so try tricks like hashing pairs, using `LSOne` instead of iterating all bits, maintaining an array instead of a hash table, backwards looping, etc.
- Meet in the middle doesn't only apply to power-of-two time complexities. We can also apply it as a methodology for thinking about a problem: for example, to count the # of increasing triplets, we can iterate over each possible middle element of the triplet and get an $O(n\log_2{n})$ solution using data structures. This can be generalized for any k-tuple by finding the middle and iterating over all possible options on the left and on the right, simplifying the time complexity from $O(n^k)$ to $O(n^{\lceil k/2 \rceil})$. In general, meet in the middle as an algorithm applies whenever we can solve the problem more easily with half of $N$ and join the results efficiently.
- When storing some form of path, store it in a way that is easy to backtrack. For example, when solving LIS problems, don't store the indices, because otherwise the backtracking will intersect. Instead, iterate over the array backwards given the max value to end at from the DP array. There is no problem in iterating over all elements when retracing a path.
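The increasing-triplets example above, sketched with a Fenwick tree: for each candidate middle element, multiply the count of smaller elements to its left by greater elements to its right (values are assumed to lie in [1, maxV]):

```cpp
#include <vector>

// Small Fenwick tree for counting values seen so far.
struct Fenwick {
    std::vector<int> t;
    Fenwick(int n) : t(n + 1, 0) {}
    void add(int i) { for (; i < (int)t.size(); i += i & -i) t[i]++; }
    int query(int i) { int s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }
};

// Count triplets i < j < k with a[i] < a[j] < a[k] by iterating the
// middle element j, in O(n log maxV).
long long countTriplets(const std::vector<int>& a, int maxV) {
    int n = a.size();
    Fenwick left(maxV), right(maxV);
    std::vector<long long> greaterRight(n);
    for (int j = n - 1; j >= 0; j--) {
        greaterRight[j] = (n - 1 - j) - right.query(a[j]);  // strictly greater
        right.add(a[j]);
    }
    long long total = 0;
    for (int j = 0; j < n; j++) {
        long long smallerLeft = left.query(a[j] - 1);       // strictly smaller
        total += smallerLeft * greaterRight[j];
        left.add(a[j]);
    }
    return total;
}
```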
- Stack problems that require you to compute something for each point in the array in $O(n)$ time usually work by making observations about the relationship between elements of the array.
- When trying to derive a value, search for what we should be looking at that will affect that value. Does the range change based on the extreme values? Does it change based on the closest values? When we notice that there is a limited number of affecting values, we can develop a solution that decreases the asymptotic complexity by dropping factors.
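A classic instance of the stack observation above (problem choice is mine): the previous strictly greater element for each index in O(n); the stack keeps only the elements that can still matter for future comparisons.

```cpp
#include <vector>
#include <stack>

// Monotonic stack: res[i] = index of the previous strictly greater
// element, or -1. Each index is pushed and popped at most once, so
// the whole pass is O(n).
std::vector<int> prevGreater(const std::vector<int>& a) {
    std::vector<int> res(a.size());
    std::stack<int> st;   // indices whose values are strictly decreasing
    for (int i = 0; i < (int)a.size(); i++) {
        while (!st.empty() && a[st.top()] <= a[i]) st.pop();
        res[i] = st.empty() ? -1 : st.top();
        st.push(i);
    }
    return res;
}
```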
- When in despair, don't forget to look at the subtasks. In fact, this is one of the first places you should check out. It might involve splitting the problem into two separately solvable problems. You can also find some features of the problem that you weren't aware of, giving you better tactics to approach solving the problem.
- Write pattern scripts. Sometimes writing a quick brute-force script for, say, the first 1000 values of N can reveal an important pattern that will help you solve the problem.
- BFS -- Visit each node only ONCE
- DP -- Visit each state only ONCE
- Shortest path
  - Dijkstra -- $\mathcal{O}(V+E\log{E})$, positive weights generally
  - Floyd-Warshall -- $\mathcal{O}(V^3)$, all pairs shortest path
  - Bellman-Ford -- $\mathcal{O}(VE)$, negative weights
  - 0/1 BFS -- $\mathcal{O}(V+E)$, shortest path with 0/1 weights
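The Dijkstra entry above, sketched in its standard lazy-deletion min-heap form (the graph is an adjacency list of (neighbor, weight) pairs):

```cpp
#include <vector>
#include <queue>
#include <utility>
#include <functional>

// Dijkstra with a min-heap and lazy deletion: O((V+E) log E),
// positive edge weights only.
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int,int>>>& adj, int src) {
    const long long INF = 1e18;
    std::vector<long long> dist(adj.size(), INF);
    std::priority_queue<std::pair<long long,int>,
                        std::vector<std::pair<long long,int>>,
                        std::greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, v] = pq.top(); pq.pop();
        if (d > dist[v]) continue;            // stale heap entry, skip
        for (auto [to, w] : adj[v])
            if (dist[v] + w < dist[to]) {
                dist[to] = dist[v] + w;
                pq.push({dist[to], to});
            }
    }
    return dist;
}
```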
- TopoSort -- Edges ordering or DP on DAG or cycle detection
- DSU -- Maintaining disjoint sets and their props
- MST -- Cost efficient way to connect all nodes
- Stacks -- Maintain only required info for comparison
- Sliding Window -- Maintain window of size $K$
- PURS -- Dynamic data structure for range queries (Point Update, Range Sum)
- Euler Tour -- Subtree queries/updates
- DP on Trees -- Subproblem is a subtree
- String Hashing -- Hashing anything in general
- Hashmap -- Efficient key-value lookups
- Meet in the Middle -- Fix the middle, or solve each half separately and join the results
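A minimal PURS structure from the list above, here a Fenwick tree doing point updates and range sums in O(log n) each (1-indexed, naming is mine):

```cpp
#include <vector>

// Fenwick tree (Binary Indexed Tree): point update, range sum.
struct BIT {
    std::vector<long long> t;
    BIT(int n) : t(n + 1, 0) {}
    void update(int i, long long delta) {      // a[i] += delta
        for (; i < (int)t.size(); i += i & -i) t[i] += delta;
    }
    long long prefix(int i) {                  // sum of a[1..i]
        long long s = 0;
        for (; i > 0; i -= i & -i) s += t[i];
        return s;
    }
    long long query(int l, int r) {            // sum of a[l..r]
        return prefix(r) - prefix(l - 1);
    }
};
```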
Think of decomposition techniques:
- Sort - One property of a problem to look out for anywhere, from DP to Range Queries to Graph problems is whether the order of the input matters. If it doesn't, maybe there is something special you can do by sorting it. Sometimes you might even get the input sorted.
- Knapsack DP - If you don't find an optimal solution and have too many dimensions, you might be able to drop a dimension by using greedy techniques on a sorted array.
- DSU/Traversal Algorithm - Sometimes we need to sort the edges and use DSU on it.
- Range Queries - If we can sort the queries then we can do all sorts of things with the offline queries.
- Binary Search - Sometimes we are searching for the number of steps we have to take before x happens or we are simply searching for some occurrence of x. Always consider Binary Search when you run into such problems. Binary search is also useful for binary lifting and custom ordered statistics trees. Another example of using Binary Search is finding the right length for getting a string hash to match (several string hashing problems involve binary search).
- Math - Many problems require you to manipulate equations, or even make use of binary search or some other algorithm while doing the math. Always consider other algorithms when you hit roadblocks in math problems. Another common use of math in problems is modular arithmetic.
- Shortest Path - Some problems require you to precompute the shortest path or APSP of the graph before moving onto the next portion of the problem.
- Range Queries
- Euler Tour - Several problems require you to do some form of range query on the Euler Tour in its normal form or the LCA form.
- Sweep Line - Sweep Line is a very common combination with problems. This can be combined with Dynamic Programming for 2D and 1D ranges, or with Coordinate compression, and many more problem paradigms.
- Prefix sums - Some problems require you to manipulate a prefix sum array or difference array 1D or 2D along with other algorithms. Some examples are range update point query Fenwick Trees. You have to calculate the difference array first before you can use the Fenwick Tree.
- Greedy - Several problems involve combining a greedy technique with other algorithms. Many a time, you have to make some greedy inferences to optimize a DP problem or get a better approach to the problem as a whole.
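The difference-array idea from the prefix sums bullet above, sketched for plain range-add / point-query without a Fenwick tree (function name and update format are mine):

```cpp
#include <vector>
#include <array>

// Difference array: each range addition is O(1); a final prefix sum
// recovers the resulting values. Updates are (l, r, x) meaning
// "add x to a[l..r]", 0-indexed inclusive.
std::vector<long long> applyRangeAdds(
        int n, const std::vector<std::array<long long, 3>>& updates) {
    std::vector<long long> diff(n + 1, 0);
    for (auto& [l, r, x] : updates) {
        diff[l] += x;
        diff[r + 1] -= x;
    }
    std::vector<long long> res(n);
    long long run = 0;
    for (int i = 0; i < n; i++) { run += diff[i]; res[i] = run; }
    return res;
}
```

Replacing the plain array with a Fenwick tree over the difference array gives the range-update point-query structure the bullet mentions.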
Given your above idea of the problem and the observations you made, come up with an algorithm that could work, then implement it. Maybe one won't work and you need to switch to a different angle, but that is fine. Also, if you feel a program could be asymptotically slow but is fine amortized, try writing it anyway. You will at least get some half credit.
Try using a faster hash table (`gp_hash_table` uses ~1.5x more memory but is ~3x faster). Try dropping DP states. Try using a greedy solution instead. Try pruning the search space. Try casting in modular multiplication. Try changing the base or the modulus for string hashing. Try using a meet in the middle search. Or just completely change your approach. Use the multiplication trick for canceling modular inverses in geometric sequences. Sometimes try naive solutions even though they might not look fast enough; many a time there is complicated amortization involved that would be difficult to prove. Also, use bitsets when iterating over boolean arrays. Sometimes it is best to just use a `vector` as the key in a `map`; it might be considered inefficient, but it might be enough to make the cut. Add pragmas if you really need to.
- Integer Overflow
- Check for uncommented debug/asserts
- Check for negative numbers
- Check for dirty data
- Walk through each line and check for above problems
- Remember that the harmonic series makes the time complexity $O(n\ln{n})$
- Remember to initialize the answer variable with DP base cases
- Initialize all arrays with 0
- Check output format, don't add trailing spaces