Multiplication is possibly the most important of the 4 basic arithmetic operations in a big integer library: it is the one most likely to work on, and to generate, truly big numbers.
Development-wise, it is an interesting operation because of its time complexity.
The case to consider is the multiplication of 2 numbers of the same size $l$. In practice, operands are rarely of the exact same size, but the algorithms can easily cope with any difference, and the theoretical considerations can be resolved by padding the smaller number with leading zeroes.

With $l$ the size of the operands, the available algorithms are:

- Long multiplication:
This is how we learn to multiply in school: taking each pair of digits from both operands, multiplying them together, and propagating a carry. The computational complexity is $\Theta(l^2)$. The other operations (addition, subtraction, ...) having a $\text{O}(l)$ complexity, they are practically free in comparison. The demonstration that better algorithms exist has triggered a decades-long effort by mathematicians to improve performance further and further. This resulted in several algorithms, each better than the previous and suitable for increasingly big numbers.
- Karatsuba:
Each time $l$ doubles, the Karatsuba algorithm trades 1 multiplication for several additions/subtractions, thus performing 3 multiplications instead of 4. The idea is that when $l$ is big enough, the cost of the additional operations is less than the time saved by skipping the fourth multiplication. As a consequence, it is not suitable for multiplying (relatively) small numbers. The complexity is $\Theta(l^{\log_2(3)}) = \text{O}(l^{1.585})$.
- Toom-Cook algorithms:
The Toom-Cook algorithms form a family of divide-and-conquer algorithms similar to Karatsuba, except that they are not limited to splitting operands in halves. The smaller the parts, the more multiplications can be traded against simpler operations; in return, each successive Toom-Cook algorithm needs longer and longer numbers to make the trade worth it. The complexity for Toom-3 is $\Theta(l^{\log_3(5)}) = \text{O}(l^{1.465})$.
- Schönhage–Strassen:
This method is not implemented yet. It is yet another algorithm, more efficient than the ones above (its complexity is $\text{O}(l \log l \log \log l)$), but this time applicable to large numbers only, with millions of digits.
The multiplication function compares the size of the operands passed to it with hardcoded thresholds, which are the best guesses as to what values give the shortest calculation time:
- If $\text{size}(a) > \text{threshold}_\text{Schönhage–Strassen}$ → return the product calculated using the Schönhage–Strassen algorithm.
- If $\text{size}(a) > \text{threshold}_\text{Toom-3}$ → return the product calculated using the Toom-3 algorithm.
- If $\text{size}(a) > \text{threshold}_\text{Karatsuba}$ → return the product calculated using the Karatsuba algorithm.
- Otherwise → return the product calculated using the long multiplication algorithm.
This is the same approach as every other library that implements multiplication through several algorithms; the only differences are which algorithms were implemented, the small details inside them, and the thresholds that decide which one to run.
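As an illustration, the dispatch can be sketched in C as below. The function names and threshold values are hypothetical placeholders, not the library's actual API, and the per-algorithm routines are assumed to be defined elsewhere:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds, in limbs; real values come from benchmarking. */
#define THRESHOLD_KARATSUBA            32
#define THRESHOLD_TOOM3               256
#define THRESHOLD_SCHONHAGE_STRASSEN 65536

/* Per-algorithm routines, assumed to be defined elsewhere. */
void mul_long(uint32_t *p, const uint32_t *a, const uint32_t *b, size_t l);
void mul_karatsuba(uint32_t *p, const uint32_t *a, const uint32_t *b, size_t l);
void mul_toom3(uint32_t *p, const uint32_t *a, const uint32_t *b, size_t l);
void mul_schonhage_strassen(uint32_t *p, const uint32_t *a, const uint32_t *b, size_t l);

/* Dispatch on operand size, testing the largest threshold first. */
void mul_dispatch(uint32_t *p, const uint32_t *a, const uint32_t *b, size_t l)
{
    if (l > THRESHOLD_SCHONHAGE_STRASSEN)
        mul_schonhage_strassen(p, a, b, l);   /* not implemented yet */
    else if (l > THRESHOLD_TOOM3)
        mul_toom3(p, a, b, l);
    else if (l > THRESHOLD_KARATSUBA)
        mul_karatsuba(p, a, b, l);
    else
        mul_long(p, a, b, l);
}
```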
The long multiplication algorithm works on the principle that:

$$a \times b = \sum_{i=0}^{\text{size}(a)-1} \; \sum_{j=0}^{\text{size}(b)-1} a_i \, b_j \, 2^{32(i+j)}$$
The implementation is straightforward. Indexing the limbs of the operands starting at index 0:
- Initialize the variable $p$ to store the product, so that $2l = \text{size}(p) = \text{size}(a) + \text{size}(b)$, with all its limbs initialized to $0$.
- Start a for-loop: $i$ from $0$ to $\text{size}(a) - 1$.
  - Initialize a variable to store the carry modulo $2^{32}$: $c \leftarrow 0$.
  - Start a nested for-loop: $j$ from $0$ to $\text{size}(b) - 1$. Do $p_{i+j} \leftarrow \big(p_{i+j} + a_i \times b_j + c\big) \bmod 2^{32}$ and $c \leftarrow \big\lfloor \big(p_{i+j} + a_i \times b_j + c\big) / 2^{32} \big\rfloor$, where both right-hand sides use the previous value of $p_{i+j}$. In practice, these 2 operations are done by computing $\big(p_{i+j} + a_i \times b_j + c\big)$ into a 64-bit integer, then splitting that integer into $p_{i+j}$ and $c$.
  - At the end of each iteration of the outer loop, a carry may remain: do $p_{i+\text{size}(b)} \leftarrow c$ (for the last iteration, this writes $p_{2l-1}$).
- Trim $p$ if the final carry was 0.
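In C, these steps could look like the following sketch, assuming little-endian 32-bit limbs; the name `mul_long` and this more general signature (which allows operands of different sizes) are illustrative, not the library's actual code:

```c
#include <stdint.h>
#include <stddef.h>

/* Long multiplication of little-endian 32-bit limb arrays.
   p must have room for na + nb limbs, all initialized to 0. */
void mul_long(uint32_t *p, const uint32_t *a, size_t na,
              const uint32_t *b, size_t nb)
{
    for (size_t i = 0; i < na; i++) {
        uint64_t c = 0;  /* carry, always fits in 32 bits */
        for (size_t j = 0; j < nb; j++) {
            /* 64-bit intermediate: p[i+j] + a[i]*b[j] + c <= 2^64 - 1 */
            uint64_t t = (uint64_t)p[i + j] + (uint64_t)a[i] * b[j] + c;
            p[i + j] = (uint32_t)t;  /* low 32 bits back into the product */
            c = t >> 32;             /* high 32 bits become the next carry */
        }
        p[i + nb] = (uint32_t)c;     /* remaining carry of this row */
    }
    /* the caller may trim a leading zero limb of p afterwards */
}
```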
Karatsuba, named after its inventor, is a divide-and-conquer algorithm. Its principle is to split each operand in two halves at some limb index $k$ and run the multiplication as:

$$a \times b = \Bigg( \underbrace{\sum_{i=k}^{l-1} a_i\, 2^{32(i-k)}}_{\alpha_1} \times 2^{32k} + \underbrace{\sum_{i=0}^{k-1} a_i\, 2^{32i}}_{\alpha_0} \Bigg) \times \Bigg( \underbrace{\sum_{j=k}^{l-1} b_j\, 2^{32(j-k)}}_{\beta_1} \times 2^{32k} + \underbrace{\sum_{j=0}^{k-1} b_j\, 2^{32j}}_{\beta_0} \Bigg)$$

For convenience, the 4 sums, i.e. substrings of the integers' limbs, appearing in the above formula will be noted with Greek letters: $\alpha_1$ and $\alpha_0$ for the high and low parts of $a$, $\beta_1$ and $\beta_0$ for those of $b$. The product can then be rewritten as:

$$a \times b = \alpha_1 \beta_1\, 2^{32 \times 2k} + \Big( \big(\alpha_0 + \alpha_1\big) \times \big(\beta_0 + \beta_1\big) - \alpha_0 \beta_0 - \alpha_1 \beta_1 \Big)\, 2^{32k} + \alpha_0 \beta_0$$

Although it looks like this just made the formula longer and more complicated, hence longer to calculate, the key to the algorithm is being able to reuse $\alpha_0 \times \beta_0$ and $\alpha_1 \times \beta_1$: each of them appears twice, but only needs to be calculated once. By doing so, we allow the Karatsuba algorithm to execute in 3 multiplications what the long multiplication would do in 4.
This 1 operation being optimized away at every step is a big deal: if we were to assume that, at a given size $s$, the long multiplication costs 1 unit of time while Karatsuba, with all its extra additions and subtractions, costs 5 units, then each doubling of the size multiplies the cost of the former by 4 and the cost of the latter by 3:
| Size | Cost of long multiplication | Cost of Karatsuba | Ratio |
|---|---|---|---|
| 1s | 1 | 5 | 1:5 |
| 2s | 4 | 15 | 1:3.75 |
| 4s | 16 | 45 | 1:2.81 |
| 8s | 64 | 135 | 1:2.11 |
| 16s | 256 | 405 | 1:1.58 |
| 32s | 1,024 | 1,215 | 1:1.19 |
| 64s | 4,096 | 3,645 | 1.12:1 |
| 128s | 16,384 | 10,935 | 1.50:1 |
| 256s | 65,536 | 32,805 | 2.00:1 |
| 512s | 262,144 | 98,415 | 2.66:1 |
| 1024s | 1,048,576 | 295,245 | 3.55:1 |
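Under those assumed unit costs, the table can be regenerated with a few lines of C, showing the crossover between 32s and 64s:

```c
#include <stdio.h>

/* Cost model from the table: long multiplication costs 1 at size s and
   quadruples per doubling; Karatsuba costs 5 at size s and triples. */
int main(void)
{
    long lm = 1, ka = 5;
    for (int size = 1; size <= 1024; size *= 2) {
        printf("%5ds  %10ld  %10ld\n", size, lm, ka);
        lm *= 4;  /* Theta(l^2): 4 times the work per doubling */
        ka *= 3;  /* 3 recursive multiplications per doubling  */
    }
    return 0;
}
```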
The Karatsuba algorithm consists in:

- Initialize the variable $p$ to store the product, so that $2l = \text{size}(p) = \text{size}(a) + \text{size}(b)$, with all its limbs initialized to $0$.
- Split $a$ and $b$ at the limb indexed $k = \lceil l/2 \rceil$. This gives the subparts of $a$, $b$ and $p$ as:
  - $\alpha_0$ and $\alpha_1$,
  - $\beta_0$ and $\beta_1$,
  - $\psi_0$, $\psi_1$ and $\psi_2$.

  Note that $\psi_0$ and $\psi_2$ do not overlap in $p$. However, $\psi_1$ does overlap with the other two in memory.
- Do $\psi_0 \leftarrow \alpha_0 \times \beta_0$, by a recursive call to the multiplication algorithm.
- Do $\psi_2 \leftarrow \alpha_1 \times \beta_1$ the same way.
- Initialize a variable $p_\text{mid}$, of size $l$.
- Do $p_\text{mid} \leftarrow \big( \alpha_0 + \alpha_1 \big) \times \big(\beta_0 + \beta_1 \big) - \alpha_0 \times \beta_0 - \alpha_1 \times \beta_1$, reusing the values already stored in $\psi_0$ and $\psi_2$ for the last two terms.
- Finally, do $\psi_1 \leftarrow \psi_1 + p_\text{mid}$.
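As a quick sanity check of the identity, rather than of the limb-level routine, here is a minimal C sketch applying one Karatsuba step to the 32-bit halves of 64-bit integers; the result is only compared modulo $2^{64}$, which is what the native product also computes:

```c
#include <stdint.h>
#include <stdio.h>

/* One Karatsuba step on 32-bit halves, computing a*b modulo 2^64
   with 3 half-size multiplications instead of 4. */
static uint64_t karatsuba_step(uint64_t a, uint64_t b)
{
    uint64_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;    /* alpha_0, alpha_1 */
    uint64_t b0 = b & 0xFFFFFFFFu, b1 = b >> 32;    /* beta_0,  beta_1  */

    uint64_t lo  = a0 * b0;                         /* alpha_0 x beta_0 */
    uint64_t hi  = a1 * b1;                         /* alpha_1 x beta_1 */
    uint64_t mid = (a0 + a1) * (b0 + b1) - lo - hi; /* reuses lo and hi */

    /* recombine hi*2^64 + mid*2^32 + lo; the hi term vanishes mod 2^64 */
    return lo + (mid << 32);
}

int main(void)
{
    uint64_t a = 0x123456789ABCDEF0u, b = 0x0FEDCBA987654321u;
    printf("%016llx\n", (unsigned long long)(a * b));  /* native product */
    printf("%016llx\n", (unsigned long long)karatsuba_step(a, b));
    return 0;
}
```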
Toom-Cook is a divide-and-conquer method (rather, an infinite family of algorithms) named after its two discoverers: Andrei Toom, who constructed a method for multiplication with fewer sub-multiplications, and Stephen Cook, who turned it into an algorithm.
Just like Karatsuba, Toom-Cook swaps some of the required multiplications for a finite number of simpler operations, all of which are $\text{O}(l)$: additions, subtractions, and multiplications or divisions by small constants.
The more the operands are broken down (Toom-2 is equivalent to Karatsuba, Toom-3 breaks them down into 3 parts, Toom-4 into 4, etc.), the better the complexity. However, this also comes at the cost of more and more of the simpler operations, hence each successive algorithm requires increasingly big numbers to make the trade worth it.
The complexity of Toom-Cook, when breaking down operands of length $l$ into $k$ parts, is $\Theta(l^{\log_k(2k-1)})$, since each recursion level replaces $k^2$ multiplications by $2k-1$:

| Toom-$k$ | Complexity | Approximate |
|---|---|---|
| 2 | $\Theta(l^{\log_2(3)})$ | $\text{O}(l^{1.585})$ |
| 4 | $\Theta(l^{\log_4(7)})$ | $\text{O}(l^{1.404})$ |
| 5 | $\Theta(l^{\log_5(9)})$ | $\text{O}(l^{1.365})$ |
| 6 | $\Theta(l^{\log_6(11)})$ | $\text{O}(l^{1.338})$ |
| ... | ... | ... |
| 10 | $\Theta(l^{\log_{10}(19)})$ | $\text{O}(l^{1.279})$ |
| 20 | $\Theta(l^{\log_{20}(39)})$ | $\text{O}(l^{1.223})$ |
| 50 | $\Theta(l^{\log_{50}(99)})$ | $\text{O}(l^{1.175})$ |
| 100 | $\Theta(l^{\log_{100}(199)})$ | $\text{O}(l^{1.149})$ |
Because of the diminishing returns, Toom-Cook algorithms are not implemented for large values of $k$.
For the sake of comparison, when the size of the operands is multiplied by 4, it generates:
- for the long multiplication, 16 times as many multiplications;
- for Karatsuba, 9 times as many multiplications;
- for Toom-4, 7 times as many multiplications (not counting the multiplications by small constants, due to their $\text{O}(l)$ complexity as discussed earlier).
To understand Toom-Cook algorithms, we have to express big integers as the values of polynomials. Since big integers are encoded as limbs, it looks natural to express $a$ as the value, at the point $2^{32}$, of the polynomial whose coefficients are its limbs:

$$A(x) = \sum_{i=0}^{l-1} a_i\, x^i \qquad a = A(2^{32})$$

This is not a polynomial expression we can use here, though. Toom-$k$ splits operands in $k$ parts only, so, taking Toom-3 as an example, $a$ and $b$ are expressed as polynomials of degree 2 whose coefficients are substrings of their limbs, evaluated at $2^{32m}$ with $m = \lceil l/3 \rceil$:

$$A(x) = \alpha_2\, x^2 + \alpha_1\, x + \alpha_0 \qquad a = A(2^{32m})$$
$$B(x) = \beta_2\, x^2 + \beta_1\, x + \beta_0 \qquad b = B(2^{32m})$$

The product $a \times b$ is then the value, at the same point, of the polynomial $P = A \times B$, which has degree 4 and therefore 5 coefficients:

$$P(x) = p_4\, x^4 + p_3\, x^3 + p_2\, x^2 + p_1\, x + p_0 \qquad a \times b = P(2^{32m})$$
Obviously, we are not doing it to just calculate the direct product of $A$ and $B$, which would require the 9 coefficient-by-coefficient multiplications we are precisely trying to avoid. Thankfully, polynomial interpolation allows to calculate a polynomial of degree $n$ from its values at $n + 1$ distinct points: for Toom-3, knowing $P$ at 5 points is enough to recover its 5 coefficients.
We will do exactly that with easy-to-calculate points.
The chosen points are 0 and numbers in the form of $\pm 2^i$, because evaluating the polynomials there only costs shifts, additions and subtractions. For Toom-3, we need 5 points, that we actually calculate:

$$P(0) = A(0) \times B(0) = \alpha_0 \times \beta_0$$
$$P(1) = A(1) \times B(1) = (\alpha_0 + \alpha_1 + \alpha_2) \times (\beta_0 + \beta_1 + \beta_2)$$
$$P(-1) = A(-1) \times B(-1) = (\alpha_0 - \alpha_1 + \alpha_2) \times (\beta_0 - \beta_1 + \beta_2)$$
$$P(2) = A(2) \times B(2) = (\alpha_0 + 2\alpha_1 + 4\alpha_2) \times (\beta_0 + 2\beta_1 + 4\beta_2)$$
$$P(-2) = A(-2) \times B(-2) = (\alpha_0 - 2\alpha_1 + 4\alpha_2) \times (\beta_0 - 2\beta_1 + 4\beta_2)$$
We have now replaced the product of 2 operands of length $l$ by 5 products of operands of about a third of that length, plus a handful of $\text{O}(l)$ operations. Remembering that the coefficients $p_0$ to $p_4$ are what we are after, the 5 values of $P$ give us a system of 5 linear equations:

$$P(0) = p_0$$
$$P(1) = p_0 + p_1 + p_2 + p_3 + p_4$$
$$P(-1) = p_0 - p_1 + p_2 - p_3 + p_4$$
$$P(2) = p_0 + 2p_1 + 4p_2 + 8p_3 + 16p_4$$
$$P(-2) = p_0 - 2p_1 + 4p_2 - 8p_3 + 16p_4$$

We can solve it and deduce the values of $p_0$ to $p_4$. Is that all there is to it?
Well... no. And while it won't improve the complexity, we can slightly simplify the above equations.
The clue is to note that when $x$ grows towards infinity, the highest-degree term of a polynomial dwarfs all the others. The asymptotic behavior of $P$ is therefore:

$$P(x) \underset{x \to \infty}{\sim} p_4\, x^4 = \alpha_2 \beta_2\, x^4$$

This is much simpler than the above equations and gives us one coefficient, $p_4 = \alpha_2 \times \beta_2$, for a single small multiplication. By abuse of notation, this limit is written as the "point" $P(\infty)$.
Now, we can replace one of the points chosen above by $\infty$. Keeping $0$, $1$, $-1$, $-2$ and $\infty$, the 5 products become:

$$r_0 = P(0), \quad r_1 = P(1), \quad r_2 = P(-1), \quad r_3 = P(-2), \quad r_4 = P(\infty)$$

Which we solve into the following system, written in a way that makes it as convenient as possible for a computer (every operation is $\text{O}(l)$, and every division is exact):

$$p_0 = r_0 \qquad p_4 = r_4$$
$$t_1 \leftarrow (r_3 - r_1) / 3 \qquad t_2 \leftarrow (r_1 - r_2) / 2 \qquad t_3 \leftarrow r_2 - r_0$$
$$p_3 = (t_3 - t_1)/2 + 2\, r_4 \qquad p_2 = t_3 + t_2 - r_4 \qquad p_1 = t_2 - p_3$$

Note that if you were to finish solving the equation system (i.e. get rid of the intermediate variables), you would obtain a matrix product:

$$\begin{pmatrix} p_0 \\ p_1 \\ p_2 \\ p_3 \\ p_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 1/3 & -1 & 1/6 & -2 \\ -1 & 1/2 & 1/2 & 0 & -1 \\ -1/2 & 1/6 & 1/2 & -1/6 & 2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} r_0 \\ r_1 \\ r_2 \\ r_3 \\ r_4 \end{pmatrix}$$
This matrix product is how the solution to Toom-3 is presented on Wikipedia but it is not completely optimal in terms of number of operations to execute.
As big integers are coded with limbs in base $2^{32}$, the splits and the final recombination happen at limb boundaries and cost almost nothing. For clarity's sake, the illustration below is done with numbers under $10^9$, first in base $10^3$ (convenient for human beings), then in base $2^{10}$ (convenient for computers).
Since the operands are under $10^9$, each of them splits into 3 groups of 3 decimal digits, which become the coefficients $\alpha_0$, $\alpha_1$, $\alpha_2$ and $\beta_0$, $\beta_1$, $\beta_2$ in base $10^3$. We calculate the 5 points $P(0)$, $P(1)$, $P(-1)$, $P(-2)$ and $P(\infty)$. Applying the formulas we obtained from the linear equations, we deduce the coefficients $p_0$ to $p_4$, and summing them after shifting each by a multiple of 3 decimal digits gives the product.
Since the operands are also under $2^{30}$, the same can be done in base $2^{10}$: each of them splits into 3 groups of 10 bits. We calculate the 5 points the same way. Applying the formulas we obtained from the linear equations, we deduce the coefficients $p_0$ to $p_4$. The sum of the coefficients, after shifting each by a multiple of 10 bits, gives the product.
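To make the binary walk-through concrete, here is a minimal, self-contained C sketch with arbitrary example values (not the ones from the original illustration): it performs the 5 small multiplications, runs the interpolation sequence from above, and recombines the coefficients with 10-bit shifts, checking the result against the native product:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* arbitrary example values, both under 2^30 */
    int64_t a = 123456789, b = 987654321;

    /* split each operand into 3 coefficients of 10 bits (base 2^10) */
    int64_t a0 = a & 1023, a1 = (a >> 10) & 1023, a2 = a >> 20;
    int64_t b0 = b & 1023, b1 = (b >> 10) & 1023, b2 = b >> 20;

    /* evaluate A and B at 0, 1, -1, -2 and infinity, then multiply
       pointwise: the 5 small multiplications */
    int64_t r0 = a0 * b0;                                   /* P(0)   */
    int64_t r1 = (a0 + a1 + a2) * (b0 + b1 + b2);           /* P(1)   */
    int64_t r2 = (a0 - a1 + a2) * (b0 - b1 + b2);           /* P(-1)  */
    int64_t r3 = (a0 - 2*a1 + 4*a2) * (b0 - 2*b1 + 4*b2);   /* P(-2)  */
    int64_t r4 = a2 * b2;                                   /* P(inf) */

    /* interpolation, following the operation sequence given earlier;
       every division is exact */
    r3 = (r3 - r1) / 3;
    r1 = (r1 - r2) / 2;
    r2 = r2 - r0;
    r3 = (r2 - r3) / 2 + 2 * r4;   /* r3 is now p_3 */
    r2 = r2 + r1 - r4;             /* r2 is now p_2 */
    r1 = r1 - r3;                  /* r1 is now p_1; r0 = p_0, r4 = p_4 */

    /* recompose: shift each coefficient by a multiple of 10 bits */
    int64_t p = r0 + (r1 << 10) + (r2 << 20) + (r3 << 30) + (r4 << 40);

    printf("%lld\n%lld\n", (long long)(a * b), (long long)p);  /* equal */
    return 0;
}
```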