proposal: math/big: base62 and/or arbitrary alphabet and base support #21558
Comments
While it would be relatively simple to change That said, arbitrary alphabet support for printing alone would be trivial (but for the API footprint increase). However, parsing (reading) would require at least some validation of the alphabet (to avoid parsing ambiguities). It gets complicated quickly. A relatively easy change would be to increase MaxBase and expand the digits string with additional characters (and perhaps export it). I see that's what you have effectively done with your customized version of math/big. The result may not use the desired alphabet, but it would take only a linear pass over the result string to substitute characters as desired. I think we might be able to do the latter. It's a trivial change, backward-compatible (ignoring the change of the MaxBase constant value, for which we could allow an exception), and gets you 90% of the way there.
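The linear substitution pass mentioned above could look roughly like the following. This is a minimal sketch, not part of math/big: `remap` and `stdDigits` are hypothetical names, and it assumes the custom alphabet is single-byte ASCII and that the input was produced with the digit order quoted in this thread (0123456789abc...xyz).

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// stdDigits is the digit order assumed for the input string (bases up to 36).
const stdDigits = "0123456789abcdefghijklmnopqrstuvwxyz"

// remap rewrites s into a custom alphabet in a single pass:
// custom[i] replaces stdDigits[i]. Hypothetical helper, not a math/big API.
func remap(s, custom string) string {
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); i++ {
		if j := strings.IndexByte(stdDigits, s[i]); j >= 0 && j < len(custom) {
			b.WriteByte(custom[j])
		} else {
			b.WriteByte(s[i]) // anything else, such as a leading '-', is copied unchanged
		}
	}
	return b.String()
}

func main() {
	x := big.NewInt(255)
	// An upper-case hex alphabet, chosen only for illustration.
	fmt.Println(remap(x.Text(16), "0123456789ABCDEF"))
}
```

Going the other way (custom alphabet back to a number) would need the inverse mapping applied before calling SetString, which is where the alphabet validation mentioned above comes in.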
That's a fair point; I didn't realize that would break the Go 1 API. Based on what you described I don't think the 2nd point would be a very desirable change unless others chime in with their needs and use cases where it would help. I wouldn't completely mind maintaining a separate customized version of math/big that does support my arbitrary alphabet and radix use case. I like the idea of the linear pass for character substitution. That said, I do still feel the 1st point regarding the base62 "bug"/collision should be addressed with a CL. As you mentioned, the diff from the linked issue is a trivial and non-breaking change assuming the change in MaxBase from 36 to 62 is deemed acceptable. I can provide the change for review if the above is approved.
Just to be clear: I am suggesting to simply increase the value of MaxBase to some higher value and internally expand the digits string. That's all. Regarding the extension of the digit string: I don't know that simply adding the upper-case letters is the correct approach. In general, upper- and lower-case letters in numbers are considered to have the same value, at least for bases up to 16 (e.g., 0xef == 0xEF). Thoughts?
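The case-insensitivity point can be checked directly with SetString. A small sketch; `sameValue` is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"fmt"
	"math/big"
)

// sameValue reports whether a and b both parse in the given base
// and denote the same integer. Hypothetical helper for illustration.
func sameValue(a, b string, base int) bool {
	x, ok1 := new(big.Int).SetString(a, base)
	y, ok2 := new(big.Int).SetString(b, base)
	return ok1 && ok2 && x.Cmp(y) == 0
}

func main() {
	// In base 16, lower- and upper-case digits denote the same values.
	fmt.Println(sameValue("ef", "EF", 16))
}
```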
There are two issues:
Circling back and re-reading this discussion, I feel it's still easiest to add 2 new APIs to
These could have been implemented separately but the current
For a default, it sounds like we should do 0123456789abc...xyzABC...XYZ, because today we use 0123456789abc...xyz. We already change unicode.Version to reflect changes in reality, so I think we should change MaxBase too. Perhaps we should also update the compat doc to point out that we can change these kinds of "documentation" constants. @griesemer says he will implement this. -rsc after discussion with @golang/proposal-review
I worry about setting global state with the original API suggested. If this is to go ahead, I agree the new methods should accept the alphabet to use.
@robpike Setting global state as initially proposed is a no-go. Instead, the suggested alternative solution here is to simply increase the value of MaxBase from 36 (= 10 + 26) to 62 (= 10 + 26 + 26), and add the upper-case letters to the (constant) alphabet used. If a different alphabet is required, it's a simple linear pass over the output to substitute characters.
@griesemer Understood.
@anitgandhi, @rsc It turns out this is more complicated than I originally thought: Providing support to convert from a number to a base-62 string representation is one thing, but we also need to provide the opposite, which is to convert a base-62 string back to a number. Extending the current alphabet with "A" to "Z" is problematic because "a" and "A" mean the same for all bases up to 36 at the moment. Changing the meaning of "a" and "A" for bases 37 and up seems at least odd. For instance, while the value 10 will be printed as "a" in base 36 and 37, reading "A" in base 36 currently means 10, while in base 37 it would have to have the value 36. This seems at least inconsistent. Waiting for more input before moving ahead with this.
I think it's OK to say that the alphabet is 0123456789abc..xyzABC..XYZ and that, as a convenience, for base <= 36, ABC..XYZ maps down to abc..xyz during SetString. I think 36 being a dividing line is entirely defensible. I see your point that "A"(35) = "A"(36) != "A"(37), but any such cross-base equality is already limited to single-digit inputs (or inputs with leading zeros). That is, even though "A"(35) = "A"(36), it's already the case that "AA"(35) != "AA"(36). The discontinuity that causes "A"(36) != "A"(37) seems OK.
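The rule described above can be sketched in a few lines. This is an illustration of the proposed digit mapping only, not math/big's code: `digitValue` and `parse` are hypothetical helpers, and `parse` uses a plain int with no overflow check, so it is suitable for short inputs only.

```go
package main

import "fmt"

// digitValue returns the value of ch in the given base under the proposed
// rule: upper case folds to lower case for base <= 36, and stands for
// 36..61 for larger bases. Hypothetical helper.
func digitValue(ch byte, base int) (int, bool) {
	var v int
	switch {
	case '0' <= ch && ch <= '9':
		v = int(ch - '0')
	case 'a' <= ch && ch <= 'z':
		v = int(ch-'a') + 10
	case 'A' <= ch && ch <= 'Z':
		if base <= 36 {
			v = int(ch-'A') + 10 // same value as the lower-case letter
		} else {
			v = int(ch-'A') + 36 // distinct digits above base 36
		}
	default:
		return 0, false
	}
	return v, v < base
}

// parse converts s to an int using digitValue (small inputs only).
func parse(s string, base int) (int, bool) {
	if s == "" {
		return 0, false
	}
	n := 0
	for i := 0; i < len(s); i++ {
		d, ok := digitValue(s[i], base)
		if !ok {
			return 0, false
		}
		n = n*base + d
	}
	return n, true
}

func main() {
	for _, base := range []int{35, 36, 37} {
		a, _ := parse("A", base)
		aa, _ := parse("AA", base)
		fmt.Println(base, a, aa)
	}
	// Prints:
	// 35 10 360
	// 36 10 370
	// 37 36 1368
}
```

The output shows both halves of the argument: "A" agrees between bases 35 and 36 but "AA" already does not, so the jump at base 37 breaks no equality that multi-digit inputs could have relied on.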
Change https://golang.org/cl/65970 mentions this issue:
Introduction
In the process of developing fpe, I used math/big for its useful big number equivalents of itoa and atoi with base/radix support. However, as of right now, math/big only supports a maximum base/radix of 36: https://github.com/golang/go/blob/master/src/math/big/natconv.go#L18-L24
I believe this can be expanded to allow at least base62 strings and possibly arbitrary alphabets and radices.
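As a small illustration of the itoa/atoi-style usage described above, the following sketch round-trips a decimal string through Text and SetString in a chosen base. `roundTrip` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"math/big"
)

// roundTrip parses dec as a decimal number, formats it in the given base
// with Text, parses that string back with SetString, and reports the
// formatted string and whether the value survived unchanged.
// Hypothetical helper for illustration.
func roundTrip(dec string, base int) (string, bool) {
	x, ok := new(big.Int).SetString(dec, 10)
	if !ok {
		return "", false
	}
	s := x.Text(base)
	y, ok := new(big.Int).SetString(s, base)
	return s, ok && x.Cmp(y) == 0
}

func main() {
	s, ok := roundTrip("123456789012345678901234567890", 36)
	fmt.Println(s, ok)
}
```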
Proposal
I'm proposing two separate but related changes such that math/big can add support for:
big.Int as base62 and base36 strings using Text() and SetString() results in the same string. In the context of FPE, that's effectively a collision in my opinion. See "Base62 support" (capitalone/fpe#1) for details, including a small 4-line diff that introduces base62 support.
I believe this may entail:
In natconv.go, change the digits and MaxBase consts to vars.
Add a SetDigits and/or SetMaxBase function to math/big.
Admittedly, I don't know how many use cases there are for the 2nd point other than what I'm doing on fpe; however, the flexibility of having arbitrary alphabets and bases could be useful in other cases.
Notes
CC: @griesemer