Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: database/sql: define a Decimal decompose interface for decimal packages #30870

Open
kardianos opened this issue Mar 15, 2019 · 87 comments

Comments

@kardianos
Copy link
Contributor

kardianos commented Mar 15, 2019

Background

Databases store many types of values. One type of very common value databases store, especially SQL databases, are decimal values. Some databases limit the size of the decimal, while others support arbitrary precision. But common to all is a base10 representation of the decimal number.

Historically handling decimals for database/sql and database drivers has been a pain. What types should a driver look for? If a driver looks for a given type and handles it, then it has to import it and depend on it. Or possibly go to the trouble of injecting the various types it handles with some type of type registry. The solution space at present for "library" packages that need to deal with decimals, but may be used by many other applications, is sub-optimal.

There is a history of proposals for including a decimal type into the standard library:
#19787 #12127 #12332 .

Lastly, there are a number of decimal packages. Each implementation has a similar type implementation:

Generally each decimal type has a big.Int coefficient and an int32 exponent. All of these types of decimals are arbitrary precision, not fixed size like decimal128.

Proposal

I propose that a common interface is defined for a decimal type that dumps the big.Int and int32 exponent. This interface could be defined in the standard library, or agreed upon by each package. This way a SQL driver would need to do a type assertion for the well known interface, dump the value and exponent, and marshal the value to the driver. When scanning a decimal value, it would try to load the value and exponent into the decimal type presented to it.

It may be necessary to also provide some test vectors along with the implementation so the value and exponent are interpreted the same.

The proposed interface:

// Decimal composes or decomposes a decimal value to and from individual parts.
// There are four separate parts: a boolean negative flag, a form byte with three possible states
// (finite=0, infinite=1, NaN=2),  a base-2 little-endian integer
// coefficient (also known as a significand) as a []byte, and an int32 exponent.
// These are composed into a final value as "decimal = (neg) (form=finite) coefficient * 10 ^ exponent".
// A zero length coefficient is a zero value.
// If the form is not finite the coefficient and scale should be ignored.
// The negative parameter may be set to true for any form, although implementations are not required
// to respect the negative parameter in the non-finite form.
//
// Implementations may choose to signal a negative zero or negative NaN, but implementations
// that do not support these may also ignore the negative zero or negative NaN without error.
// If an implementation does not support Infinity it may be converted into a NaN without error.
// If a value is set that is larger then what is supported by an implementation is attempted to
// be set, an error must be returned.
// Implementations must return an error if a NaN or Infinity is attempted to be set while neither
// are supported.
type Decimal interface {
    // Decompose returns the internal decimal state into parts.
    // If the provided buf has sufficient capacity, buf may be returned as the coefficient with
    // the value set and length set as appropriate.
    Decompose(buf []byte) (form byte, negative bool, coefficient []byte, exponent int32)

    // Compose sets the internal decimal value from parts. If the value cannot be
    // represented then an error should be returned.
    Compose(form byte, negative bool, coefficient []byte, exponent int32) error
}

This is the original proposed interface: out-dated, see this comment or see above.

// Form of the number. If this lived in "math/big" it could be typed as "type Form byte".
const (
	Finite byte = iota // Normal numbers
	NaN
	InfiniteNegative
        InfinitePositive
)

// DecimalValue is an interface decimal implementations may implement to
// allow easy conversion between various decimal packages. These interfaces
// may also used by packages that need to marshal or unmarshal a decimal value,
// but don't care about the rest of the decimal type.
//
// The decimal value in the finite form is defined as "decimal = coefficient * 10^exponent".
type DecimalValue interface {
    // ValueExponent returns the coefficient, exponent, and form of the decimal value.
    ValueExponent() (coefficient big.Int, exponent int32, form byte)

    // SetValueExponent sets the decimal number's value, exponent, and form.
    // If the decimal number does not support the value form (such as Infinite),
    // or if the value is larger then a fixed size representation can handle, an error should be returned.
    // However, if a decimal supports NaN but not Infinite forms, the Infinite forms
    // may be coerced into a NaN.
    SetValueExponent(coefficient big.Int, exponent int32, form byte) error
}

/cc @mjibson @victorquinn @ericlagergren

If this, or a similar, interface is acceptable to cockroachdb, shopspring, and Eric, I'd be willing to open up PRs to add the methods and some test vectors.


Edited proposed interface to include a "form".

@gopherbot gopherbot added this to the Proposal milestone Mar 15, 2019
@ericlagergren
Copy link
Contributor

ericlagergren commented Mar 15, 2019

I think decimal = mant * 10^exp should be a part of the API.

I really like this idea.

I’ll have more feedback later.

@maddyblue
Copy link
Contributor

The immediate problem I see with this is the non-numeric decimal representations. apd and ericlagergren's decimal type support positive and negative infinity, nan, and signaling nan. The shopspring decimal and many others do only mantissa + exponent. The special values are handled well with floats because they are defined as part of the bits. I think that if we want adoption of this interface it would need to support more than finite numbers.

@renthraysk
Copy link

Just to clarify...
On querying, driver.Next() has to pass back an implementation of DecimalValue, and then database/sql will attempt to convert/copy that into whatever type was passed into sql.Rows.Scan()?

@kardianos
Copy link
Contributor Author

@renthraysk Yes.

It would first attempt to match the type (like apd.Decimal -> apd.Decimal) but failing that, if both the src and dest satisfy the DecimalValue type, it could extract the bits from one and stuff it into the other.

@renthraysk
Copy link

Leaves NULL handing.
Currently drivers return NIL, and database/sql calls dest type's Scan() func for setting Valid field.
So something like

type NullDecimal struct {
Valid bool
Value DecimalValue
}
func (nd *NullDecimal) Scan(src []interface{}) error { ... }

?

@kardianos
Copy link
Contributor Author

@ericlagergren If you have time this week, additional feedback would be appreciated.

@mjibson What do you think of the edited interface? I modified the interface to take a "form", as well as the the set method returning a method. This way if a decimal implementation and the number was too large, or the implementation doesn't support NaN or infinity and the number was in that form, it could return an error.

@maddyblue
Copy link
Contributor

maddyblue commented Mar 19, 2019

I recommend removing pos and neg infinity and just having infinity. The sign of the infinity can be extracted from the sign of the big.Int value. In addition, I recommend adding sNaN, which is the signaling nan that is supported by most decimal implementations (see https://docs.python.org/2/library/decimal.html#decimal-objects for example).

Separating the sign from the form is important because in addition to +/- infinity, the spec also supports +/- NaN and +/- sNaN. See http://speleotrove.com/decimal/damodel.html, which discusses the requirements of finite and special values, and is linked to by the python decimal module.

Although there is a very good argument to be made that the forms you have as of now (pos inf, neg inf, nan, finite) are sufficient since the sign and type of nan almost never matters, and the java bigdecimal implementation only supports those forms.

apd supports 4 kinds of nan, but I suspect most users only use the normal nan. In our use of apd, we certainly only use the normal nan and coerce all other nans to it.

@kardianos
Copy link
Contributor Author

kardianos commented Mar 19, 2019

@mjibson Thanks for the feedback. I appreciate the links. I finished reading the speletrove link and skimmed the python doc link.

My point of reference is mostly line of business applications. The implementations I'm looking at are C#, Java, Oracle, PostgreSQL, and MS SQL Server.

  • C# uses fixed width decimal128, no NaN or Inf forms. ref
  • Java BigDecimal supports arbitrary precision, but no NaN or Inf forms from what I see. ref
  • PostgreSQL supports arbitrary precision and NaN form, but not Inf form. ref
  • Oracle (database) uses fixed width and supports NaN and Inf forms. ref
  • SQL Server uses fixed width and does not store NaN or Inf forms. ref

Is there a different set of use cases I'm not thinking of where transmitting the signed NaN, or various forms of NaN would be advantageous?

@maddyblue
Copy link
Contributor

Honestly no, I'm not aware of anyone caring about any kind of NaN other than just the normal NaN. I guess in that light I think your current proposal is fine. I almost think the previous one with posinf, neginf, nan, finite may be better just because it feels medium risky to tell people to inspect the sign of the big.Int. Unsure.

@ianlancetaylor
Copy link
Member

CC @griesemer

@kardianos
Copy link
Contributor Author

@mjibson When I wrote some pseudo code to unmarshal and marshal the infinity forms, it was clearer to keep Positive and Negative Infinity as separate forms. I reverted to having them separate.

@ericlagergren
Copy link
Contributor

ericlagergren commented Mar 20, 2019

Overall, I like the idea.

I'm kind of torn between including different types of NaN and infinity values, though.

I don't see much value in inserting qNaN into a database. So, I like the idea of having three simple form constants: Finite, Inf, and NaN.

That being said, if people do need those values it'll be difficult to add them on top of the aforementioned iota-based form constants.

So, a bitmask seems like a decent approach:

type Form byte
const (
    Finite Form = 0
    Inf    Form = 1 << 2 // ±Inf
    NaN    Form = 1 << 3 // qNaN or sNaN

    // sign can be used to create -infinity:
    //    NegInf = sign | Infinity
    // as well as signed NaN values
    //    NegNaN = sign | NaN
    // it can also be used to represent -0
    // if coefficient == 0 and Form == sign
    sign Form = 1 << 1
)

Something else not considered is -0. Though there are not a lot of practical uses, it is important. The current proposal doesn't allow for -0 because big.Int does not support it.

@ericlagergren
Copy link
Contributor

Additionally, I know decimals are well-defined, but I would take the extra step to add decimal = coefficient * 10^exponent to the API.

The reason for this is some libraries like my decimal.Big and like Java's BigDecimal expose the exponent as a "scale" value that, if positive, is equal to the number of digits to the right of the decimal point. In effect, scale = -exponent. (FWIW: internally, the libraries work on the exponent like normal.)

If the equation is defined in the DecimalValue API (much like it is in big.Float's API) then it's less likely to confuse folks.

@griesemer
Copy link
Contributor

math/big doesn't support NaN, so there's no point in having it in such an API.

More generally, conversion from/to decimals requires parsing/printing the decimal numbers; in other words these are string conversions. The conversions in each of these directions is a handful of lines of code. I am not convinced that these need to be added to the big.Float API. See here for an example implementation:

https://play.golang.org/p/iJpjp9J11yy

The code is easily adjusted to provide big.Ints if that is desired. The string ops may be not totally optimal but negligible (in terms of overhead) compared to the actual Float conversions (which are expensive, and there is no way around those).

If the decimals are in []byte rather than in string form, then one can use Un/MarshalText instead.

In short, it seems to me that it's better to write the few lines of code that are custom-fitted for the specific application in mind than adding more API that may not be quite right in the specific case.

@renthraysk
Copy link

This is about getting decimals (including NaNs) in and out of database/sql and drivers without having to depend on any specific implementation.

@griesemer
Copy link
Contributor

@renthraysk NaNs are not "decimals". There's no support for NaNs in math/big by design (and we're not going to add support for NaNs); so let's leave them out of the picture. If you need to represent NaNs, you can always have a special encoding outside of math/big.

I don't know what you mean by "without relying on a specific implementation". The title of this issue is pretty clear, the specific implementation is math/big. Please clarify.

@ericlagergren
Copy link
Contributor

ericlagergren commented Mar 25, 2019

NaNs are not "decimals"

I hope I'm not being too pedantic or misunderstanding your point, but NaN values are decimals in that they're special values explicitly covered by decimal floating point specs.

@kardianos
Copy link
Contributor Author

@griesemer Thank you for looking at this issue. I mostly focus on database/sql and database drivers. This issue pertains specifically to (base-10) decimal types. It is most important not to end applications, but intermediate libraries, notably database drivers.

I would be nice to have a base-10 decimal type in the standard library, but baring that, it would be nice to have an interface for marshaling and unmarshaling decimal types in and out of non-standard types. This interface could be defined in "math/big", or it could be implicitly defined in decimal packages as we mutually agree to it. So while this could involve "math/big", the interface could also be implicit.

Example 1:

  1. User wants to put a decimal value into a database. Say they are using cockroach/adb in the application.
  2. They pass this decimal value to db.QueryContext("insert into mymoney (?);", decimalValue).
  3. database/sql passes this into the database driver and it needs to handle that decimalValue. What should it do?
    a. Require all applications to pass in decimals as strings?
    b. Require a single decimal type to convert to before passing in?
    c. Require any decimal type to support a specific interface that can be efficiently read and handled?

Example 2:

  1. User wants to read a decimal value from a database. Say they are using shopspring/decimal in the application.
  2. They create a decimal type, then scan from row row.Scan(&decimal).
  3. database/sql looks at decimal and doesn't know what to do with it, so it asks the driver if it can handle it. The driver looks at the value and might do one of the following:
    a. Require all decimals to be pulled out of the database as strings?
    b. Require a single decimal type to convert from before scanning out?
    c. Require any decimal type to support a specific interface that can be efficiently set and handled.

Let's ignore NaN at the moment. Regardless, as a database driver, telling clients to always go through strings or a []byte interface isn't really an option I see as viable.

I hope this clears up what I'm trying to accomplish with this issue.

@griesemer
Copy link
Contributor

@ericlagergren Point taken, but again, they are not supported by math/big, by design (math/big is a software library, not a hardware standard, and as such has many other ways to express errors such as a division by zero w/o resorting to special "in-band" values such as NaNs).

@kardianos Thanks for the clarification. I suggest you change the title of this issue to something that better reflects the intent. From you response I gather that this issue is only marginally related to math/big.

@kardianos kardianos changed the title proposal: math/big: add Decimal decompose interface proposal: define a Decimal decompose interface for decimal packages Mar 25, 2019
@renthraysk
Copy link

renthraysk commented Mar 25, 2019

@griesemer
The implementations I was referring to was the decimal packages listed in the original post.

@griesemer
Copy link
Contributor

@kardianos Thanks for the title change, this makes is much clearer (to me, at least). I'm somewhat amazed that all the Decimal packages are using a *big.Int plus exponent representation. Wouldn't it be better to not be so dependent on that? For instance, borrowing freely from your suggestion:

type Decimal interface {
   // Value returns the sign, mantissa, and scale factor of the decimal value x, such that
   // x = sign * mant * 10^exp, where a sign < 0 indicates a negative value (otherwise it
   // is a positive value), mant contains the mantissa bits in little-endian binary format,
   // and scale is the scale factor by which the mantissa is adjusted. The value 0 is
   // represented by an empty mantissa (len(mant) == 0) and a scale of zero, while an
   // infinity uses an empty mantissa with any non-zero scale value. If a sufficiently
   // large buf is provided, its underlying array is used for mant.
   Value(buf []byte) (sign int, mant []byte, scale int)

   // ...
   SetValue(sign int, mant []byte, scale int) error
}

With an interface like this one wouldn't be so tied to math/big, yet it would still be reasonably efficient to convert to math/big.Int since big.Int's have methods to convert from/to bytes.

Some questions to answer:

  • should it be sign int or neg bool ?
  • is the mantissa normalized in any way? (This is also a question for your design.)

Also, how important is it to be able to represent NaNs?

@kardianos
Copy link
Contributor Author

@griesemer Thank you for the continued feedback. It would be nice to not be coupled to big.Int, though it does come at a cost of being a slightly more complicated definition. I would need to implement your proposed interface on an existing package or two before I have answers to your questions.

  • It is (unfortunately) somewhat important to support NaNs. Postgresql and Oracle both support storing a NaN decimal state.
  • One nit on the code above is I would explicitly state int32 for scale. Did you put in int scale on purpose?
  • In your suggestion above, you have []byte be a little-endian number. In big.Int API, Bytes returns a big-endian number. Any particular rational for that difference?
  • I don't think I understand what a normalized mantissa means in this context. But I don't think the mantissa and scale need to be defined so that for a given number they will always be represented the same way.
  • You used "mantissa", but other packages, people have used coefficient. Is there a technical difference I should be aware of between the two terms? Do you view one to be more appropriate?

@griesemer
Copy link
Contributor

griesemer commented Mar 25, 2019

@kardianos Please find my replies below, in the order of your questions:

  • If NaN is important, I'm sure it could be encoded somehow as well. I'd be somewhat hesitant to use an additional form argument as it looks like it is borrowed from math/big.Float's implementation, and it's really an implementation detail. My question here would be: How do DB's encode NaNs? What about Infs?

  • I used scale to differentiate from an exponent (which usually has a specific meaning with respect to a normalized mantissa). But I have no strong feelings here. exp is probably fine.
    [edited - I misread your question here] Yes, I chose int over int32 deliberately. Exposing int32 seems like an implementation detail that will make it harder for the API to interact with other code. We find that even the choice of uint vs int can make code more cumbersome to use (e.g., in retrospect, I'd not use uint in the math/big API, just int). Using int rather than the int32 may make a concrete implementation a bit harder but likely will simplify external code. I'd be curious as to your choice of int32.

  • big.Int.Bytes should have returned a little-endian representation; but that was a battle I lost when that API was designed. After all, the internal representation is little-endian. I don't think it matters much.

  • A mantissa is usually the fractional part of a floating-point number, and it's normalized, meaning that it is within a certain range. E.g., for IEEE floats, the mantissa is in the range [0.0, 1.0[ (with the 1.0 excluded), and the actual value scaled by the exponent is 1.mantissa (the leading 1 bit is implicit), unless the value is 0.0. I don't know that it matters that the mantissa is normalized here, but it seems a bit odd to me that one might get an arbitrary (mant, exp) pair for a given value. For instance, the value 123 might be represented as (mant, exp) pair (123, 0), or (1.23, 2), or (0.123, 3) (and others). In this case, the latter two might be considered normalized. I think one would want to have some guarantee that the mantissa is "minimal" (as in short). Perhaps not for the input (SetValue), but for the output. I suspect we don't want 123 to be represented as (1230000000000, -10)

  • Neither mantissa nor coefficient are good names here in my mind. But mantissa seems slightly better, especially if it is normalized. The word coefficient really doesn't imply anything (coefficient of what?). At least a mantissa has some meaning that's related.

@ericlagergren
Copy link
Contributor

@griesemer

I haven't fully digested your last two comments yet, but I'm curious what you meant by

I'm somewhat amazed that all the Decimal packages are using a *big.Int plus exponent representation.

@maddyblue
Copy link
Contributor

Decimals (or at least the decimal spec previously mentioned) have the ability to differentiate between representations like 123, 123.0, 123.00, so I think normalizing the mantissa is not desired.

@griesemer
Copy link
Contributor

@ericlagergren What I meant is literally that I am surprised that given six essentially independent packages (I assume that's the case) all chose a very similar implementation. I would have expected that some of them would use a custom, fixed-size representation.

@kardianos
Copy link
Contributor Author

kardianos commented Oct 27, 2020

When the decomposer interface was added (though not yet publicly declared), it was designed to gracefully grow into it. https://github.com/golang-sql/decomposer

I failed to take into account one scenario:

Should a decimal type directly implement a driver.Valuer interface (like shopspring) or should it be embedded into a decimal type that satisfies a Valuer interface (like sqlboiler does), the driver.DefaultParameterConverter will recognize the value as a valid value because IsValid was modified to recognize the decomposer value.

This combination breaks existing implementations when the sql driver doesn't recognize the decomposer interface.

I can either try to rollback the decomposer experiment, either partially or fully, or we can recognize the decomposer type as a fundamental type that may get passed to the driver. I would prefer to update database drivers to support the decomposer interface, because I see this as the best way forward to support decimals generally.


https://groups.google.com/g/golang-sql/c/sbu_GVIhwsY

@tgulacsi
Copy link
Contributor

I'd happily add support to the sql driver.
Maybe make it opt-in and more explicit, and only use the fast-path in IsValid when the driver implements a "Decimaler" interface?
Or has an "IsDecimaler() bool" method that returns true?

@ericlagergren
Copy link
Contributor

or we can recognize the decomposer type as a fundamental type that may get passed to the driver.

I vote for this option.

@dolmen
Copy link
Contributor

dolmen commented Jan 12, 2021

@kardianos I suspect there is a bug in database/sql/convert.go: 683ffe0#r45892984

@kardianos
Copy link
Contributor Author

@dolmen Can you look at this playground snip and tell me what I'm doing wrong?
https://play.golang.org/p/XrenPekngxd

@wpecker
Copy link

wpecker commented Jan 21, 2021

With this package, everything is fine
github.com/shopspring/decimal

tgulacsi added a commit to godror/godror that referenced this issue Jan 22, 2022
@Herokookkik55

This comment was marked as duplicate.

@Herokookkik55

This comment was marked as duplicate.

@dagss
Copy link

dagss commented Aug 12, 2022

It seems to me that this proposal has been implemented and support has been merged; yet the issue is Open and the proposal is put on hold given the issues discovered...

...does that mean that it can still be rolled back in a future version of go, and that code using the feature could suddenly break in a go upgrade? Or is it OK to build on this?

@ianlancetaylor
Copy link
Member

@dagss The proposed Decimal interface is not implemented in database/sql. I'm not sure which CL you are referring to.

@dagss
Copy link

dagss commented Aug 17, 2022

See databases/sql/convert.go in go1.18 standard library:

	case decimalDecompose:
		switch d := dest.(type) {
		case decimalCompose:
			return d.Compose(s.Decompose(nil))
		}

So it appears to be included, at least in my go1.18.1 install. It doesn't define a public interface, but it defines a a private interfaces decimalDecompose, decimalCompose and if you pass in a type that conforms to that private interface, it is passed through to the SQL driver.

The question is if it could be removed again...

@ianlancetaylor
Copy link
Member

We aren't going to roll back that change.

@kardianos Is there anything else to do here? Thanks.

@shueybubbles
Copy link

@ianlancetaylor @kardianos what approach should we take for go-mssqldb in our Microsoft fork to support decimals, based on everything discussed here? I want to make sure our sqlcmd implementation can handle decimals and money, so if it means taking an expedient path and just handling them as strings at the driver for now I'll probably do it.

@tgulacsi
Copy link
Contributor

Return strings for noe, but implement the decimal(De)Composer methods as a performance optimization route.

@shueybubbles
Copy link

Just to clarify the E2E desired behavior of a driver like go-mssqldb now:

Instead of reporting []byte] as the scan type of a SQL decimal or money it could instead report reflect.Typeof(Decimal) as long as we add the Compose and Decompose methods here: https://github.com/microsoft/go-mssqldb/blob/main/internal/decimal/decimal.go

For supporting input parameters of a decimal type, the driver will need to define its own local copies of decimalCompose and decimalDecompose using the same prototypes and require client apps to use those if they aren't passing []byte or string.

Does that sum it up?

@ncruces
Copy link
Contributor

ncruces commented Oct 19, 2023

I believe this comment seems erroneous:

// For now, continue to prefer the Valuer interface over the decimal decompose interface.
case decimalDecompose:
return vr, nil
}

A few lines before this function does:

if IsValue(v) {
return v, nil
}

And IsValue returns true for a decimalDecompose:

func IsValue(v any) bool {
if v == nil {
return true
}
switch v.(type) {
case []byte, bool, float64, int64, string, time.Time:
return true
case decimalDecompose:
return true
}
return false
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Hold
Development

No branches or pull requests