The only place where the fact that it is a power of 10 is actually used is that powers of 10 are stored in the precomputed cache. By allowing more general integers than $10^{\kappa}$, we may further reduce the chance of having to investigate the fractional part, thus boosting performance. It may also open up the possibility of eliminating the need to treat the shorter interval case specially.
An alternative would be to use $2^{q-p-2}$ as the multiplier, i.e., $128$ for binary32 and $1024$ for binary64. This has the advantage of simplifying the division and the divisibility check by the multiplier, but it shrinks the error margin for the approximate multiplication by $10^{k}$ to a very small value, which has a serious impact on compressed cache handling.
Currently, clang translates this division + divisibility check into imul + movzx + shr + cmp. If we use $2^{q-p-2}$, then this would become test + sete + shr.
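To make the comparison concrete, here is a minimal sketch of the two "divide and check divisibility" variants. It is not the library's actual code: the names `DivCheck`, `div_check_pow10`, `div_check_pow2` and `check_examples` are hypothetical, and the power-of-10 multiplier is assumed to be $100$ purely for illustration. The power-of-2 variant uses $1024$, the binary64 value mentioned above. The exact instructions a compiler emits for either function will depend on the compiler, target, and flags.

```cpp
#include <cassert>
#include <cstdint>

// Quotient together with a flag telling whether the division was exact.
struct DivCheck {
    std::uint32_t quotient;
    bool divisible;
};

// Power-of-10 multiplier (100 is an assumed value for illustration):
// needs a real division, or a multiply-by-magic-constant sequence.
constexpr DivCheck div_check_pow10(std::uint32_t n) {
    return DivCheck{n / 100u, n % 100u == 0u};
}

// Power-of-2 multiplier (1024 = 2^10): the quotient is a shift and the
// divisibility test is a mask of the low 10 bits.
constexpr DivCheck div_check_pow2(std::uint32_t n) {
    return DivCheck{n >> 10, (n & 1023u) == 0u};
}

// Small runtime sanity checks for both variants.
inline void check_examples() {
    assert(div_check_pow10(1200u).quotient == 12u);
    assert(div_check_pow10(1200u).divisible);
    assert(!div_check_pow10(1234u).divisible);

    assert(div_check_pow2(4096u).quotient == 4u);
    assert(div_check_pow2(4096u).divisible);
    assert(!div_check_pow2(4097u).divisible);
}
```

Both functions return the same kind of result, so the only difference is the cost of computing it.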
Another alternative would be to consider a mixed power of $2$ and $5$, but it doesn't sound worth the trouble of rewriting a lot of the paper/implementation.
On the other hand, the possible elimination of the shorter interval branch is worth investigating, but it seems like a completely independent issue.