Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting Long numbers during desiralization #1267

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

artem-krutov
Copy link

Problem: every number is being deserialized as Double, even if it doesn't have floating point: https://github.com/google/gson/blob/master/gson/src/main/java/com/google/gson/internal/bind/ObjectTypeAdapter.java#L79

Solution: added supporting for Long numbers:

  • 1.0 and 1.2 will be deserialized as Doubles
  • 1 will be deserialized as Long.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers
  • Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
  • The email used to register you as an authorized contributor must also be attached to your GitHub account.

@artem-krutov
Copy link
Author

I signed it!

@googlebot
Copy link

CLAs look good, thanks!

@lyubomyr-shaydariv
Copy link
Contributor

Would this break backward compatibility?

@artem-krutov
Copy link
Author

artem-krutov commented Apr 12, 2018

@lyubomyr-shaydariv I don't think so. Currently every number in a json document is being deserialized as a Double. So, people have the code like this in their projects:

double price = ((MyClass) gson.fromJson(myData, myClass.class)).getPrice();

When we will start returning Longs, the same code will work. No problems in this code:

double a = 1L;

@lyubomyr-shaydariv
Copy link
Contributor

@artem-krutov Could you please give a more concrete example of the issue you're trying to fix?

@artem-krutov
Copy link
Author

@lyubomyr-shaydariv sure.

The issue reproduces when we have a POJO with a Map inside it. For example:

public class Person {
    public String name;
    public String lastName;
    public int age;
    public Map<String, Object> parameters; // The problem will be here

    // getters, setters
}

Imagine we have the following JSON object:

{
    "name": "Artem",
    "lastName": "Krutov",
    "age": 25,
    "parameters": {
        "childrenCount": 5,
        "numberOfFingersOnRightHand": 5
    }
}

What will we have once we deserialize?

Person person = new Gson().fromJson(jsonString, Person.class);

The age will be completely OK:

ing age = person.getAge(); // this is int

But what about parameters?

int childrenCount = person.getParameters().get("childrenCount"); // this line will fail

The line will fail, because Double will be returned. The same is for fingers. My PR fixes it – if we put double into a JSON, we will get double after deserializing. If we put Integer like in the example above, we will get integer after deserialization.

@lyubomyr-shaydariv
Copy link
Contributor

The issues I can see with the patch:

Issue: Introducing a new JSON token type: LONG_NUMBER

Gson follows the JSON standard that does not make a difference between integer and real numbers. That is why, I guess being not the inventor of Gson, JsonToken.NUMBER exists. Please note that the JsonToken enumeration covers all possible tokens that can occur in JSON documents. Adding a new specialized one would break a lot of code: any existing code that relies on JsonToken.NUMBER would just break in case of JsonToken.LONG_NUMBER; I usually use switches in the form of the following pattern:

switch ( jsonToken ) {
case NUMBER:
	// do something
case ...
	// do something
case ...
	// do something
default:
	throw new AssertionError(jsonToken); // This must never happen, an assertion, but all of my code will fail.
}

Issue: Possible misinterpreting of java.lang.Number and Gson LazilyParsedNumber

java.lang.Number allows mutual conversion between general numeric types. LazilyParsedNumber, that is a direct child of java.lang.Number, follows the contract of the latter as well. Holding a backing string that is the original value of a primitive numeric JSON value, it allows to convert the string into various numeric types depending on your particular needs. Please note that there is a low chance of losing precision since you are free to convert LazilyParsedNumber instances at call-site just by checking the value range.

For example, consider the following JSON to be deserialized:

{
	"foo": 1,
	"bar": 2.3
}

Now the consider the following code:

final Map<String, Number> fooBarMap = gson.fromJson(json, new TypeToken<Map<String, Number>>() {}.getType());
private static class FooBar {
	final Number foo = null;
	final Number bar = null;
}
...
final FooBar fooBar = gson.fromJson(json, FooBar.class);

All four following lines:

System.out.println(fooBarMap.get("foo").getClass());
System.out.println(fooBarMap.get("bar").getClass());
System.out.println(fooBar.foo.getClass());
System.out.println(fooBar.bar.getClass());

produces class com.google.gson.internal.LazilyParsedNumber that can be converted to any basic numeric value of your choice.

Issue: Deserializing a long-look-like value as java.lang.Long

This will also break backwards compatibility for cases like the below:

final Map<String, ?> index = gson.fromJson(json, new TypeToken<Map<String, ?>>() {}.getType());
...
final Double d = index.get("foo"); // May throw:
                                   // java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double

Also, double d = 1L; is not about a runtime-cast. In this case, javac converts a primitive long to a primitive double at the compile time. Check this out of javap output:

0: dconst_1
1: dstore_0

A similar output for long l = (long) 1D;

0: lconst_1
1: lstore_0

And this is what happens for final Double d = (Double) (Object) 1L; at runtime:

0: lconst_1
1: invokestatic  #2                  // Method java/lang/Long.valueOf:(J)Ljava/lang/Long;
4: checkcast     #3                  // class java/lang/Double
7: astore_0

Fix for your case

int childrenCount = person.getParameters().get("childrenCount");

This cannot compile, but you can get the childrenCount property at least in two ways:

  • ((Double) person.getParameters().get("childrenCount")).intValue()
  • (int) (double) person.getParameters().get("childrenCount") (this should work faster, as far as I understand)

And even making your mapping more convenient, you could just do (if possible for you case):

Map<String, Number> parameters;

And then just get a proper type you want: person.getParameters().get("childrenCount").intValue().


If I'd implemented such an implementation, I'd go with an option to disable the old functionality in GsonBuilder by implementing an internal type adapter factory that works for the java.lang.Number class hierarchy to work like you want it to be. But I wouldn't due to the issues I mentioned above, and I believe I missed some of them.

@artem-krutov
Copy link
Author

So, generally there are a lot of workarounds.

The following approach could work in some cases:

gson.fromJson(json, new TypeToken<Map<String, Number>>() {}.getType());

But it won't work in case when you don't know which class you need. For example:

String className = getClassName();
SomeBaseClass myObject = gson.fromJson(json, Class.forName(className));

I could agree it's not a super popular way to deserialize, and you usually know which exact class you need. But this flow is possible.

any existing code that relies on JsonToken.NUMBER would just break in case of JsonToken.LONG_NUMBER

How's that?

JsonToken token = JsonToken.NUMBER;
switch (token) {
    case JsonToken.NUMBER: {
        // this code will work;
    case JsonToken.LONG_NUMBER: {
        // we won't fall here
    }
}

...

JsonToken token = JsonToken.LONG_NUMBER;
switch (token) {
    case JsonToken.NUMBER: {
        // we won't fall here
    case JsonToken.LONG_NUMBER: {
        // this code will work
    }
}

We introduced LONG_NUMBER, and we process it in any place where we can see it.

Then, about Number and monster constructions like (int) (double) person.getParameters().get("childrenCount").

The first doesn't look like a good option. When you develop an application you want to be as straight as it possible. If you can have only integer amount of children, why you need Number? Hacking your code just to fit into the 3rd party library's limitations doesn't look good.

Converting doubles into integers looks better to me. We can live with it, but it would be better if we don't need to do any convertions.

I do understand your reasons not to merge this PR. You sound confident and you are indeed, your comments looks solid. Honestly, I didn't take into account that Gson follows the JSON specifications.
I'm basically OK with closing this PR, but just wanted to point out that JSON is pretty simple and stupid. It is very basic. On the other hand, our enterprise code much more complex :) So, if we can make difference between int and double – why not to?

@lyubomyr-shaydariv
Copy link
Contributor

How's that?

Ah, no-no, I meant not your code, but mine, for example: the code relies the finite and known-ahead set of JSON tokens, so introducing LONG_NUMBER would just tear it apart.

If you can have only integer amount of children, why you need Number?

Then there's probably something wrong with your mappings: you mentioned Map<String, Object> that is weaker than Map<String, Number> (consider it a virtually-all number), but you could probably need Map<String, Integer> or even class Parameters { int childrenCount; }?

Hacking your code just to fit into the 3rd party library's limitations doesn't look good.

Well, it depends on how your code is organized. It's nice to have your and for-a-library Data Transfer Object classes (say, Person value class and PersonGsonDto to work specially for Gson having these two mutually convertible). I typically use this approach as much as possible in order to have lose coupling between my code and the dependencies I use. Yeah, there is more code, but this is a price I pay in order to localize third-party library quirks as much as possible (by the way, this goes to Gson too). From time to time I must work with Jackson that I find much harder to work with, and I tend to create PersonJacksonDto just to make sure that hacking my code is concentrated in Jackson-related area like these ***JacksonDto.

, but just wanted to point out that JSON is pretty simple and stupid. It is very basic.

Yes, and this makes JSON be easier to implement. For example, if you'd have a delimitation between integers in reals in JSON, how would it affect JavaScript where JSON originates from? For example, not all long values can fit double, but JavaScript Number is just a double-precision float value. Let's go further: what if they'd add char as a single-character string specification? This would add another layer of complexity, and would make really not much sense for JavaScript bringing the complexity to JSON as well. And then we get byte and short. It's nice that JSON does not support unsigned values: just imagine how these would live in Java. And then we get BigDecimal with another layer of even bigger complexity... Well, you see, being stupid is sometimes a real virtue. :)

So, if we can make difference between int and double – why not to?

You can try to implement a TypeAdapterFactory that would work for java.lang.Number, and that could return a TypeAdapter<Number> instance that could try to detect numeric tokens and return the most appropriate java.lang.Number descendant you choose or just play with LazilyParsedNumber. But I tend to think that this will bring some complexity to your code at call-site.


By the way, a few words on my previous comment:

  • ((Double) person.getParameters().get("childrenCount")).intValue() -- this is better to be cast to a Number rather than Double -- all Numbers can be converted to int (by contract, of course), but not all Numbers are Doubles.
  • (int) (double) person.getParameters().get("childrenCount") -- despite it works faster, it requires the childrenCount object to be exactly java.lang.Double, and cannot work with other Numbers.

@artem-krutov
Copy link
Author

I meant not your code, but mine, for example: the code relies the finite and known-ahead set of JSON tokens, so introducing LONG_NUMBER would just tear it apart.

I'm afraid your code could start failing on each and every change of the underlying library if it's so tied with it. You are basically not supposed to rely on the library's innards, you are supposed to use it's API. Or, use adapters ;)

Yes, these barriers between 3rd party and your own code – they are really cool, this is true. It's a good idea to have them for some complex peaces of code. But the whole idea of any library is to fit into your needs. If you want to do simple things like getting data from JSON object, I believe it's good when you just do it. Or, otherwise, your business logic would easily transform into a bunch of adapters for each and every thing.

You can try to implement a TypeAdapterFactory

This is exactly how I workarond that double/int thing right now, thank you.

So, speaking generally, the biggest issue I can see with the current implementation – it is not consistent. You put integer and then you get double. We can debate for a long time and we both can propose reasonable solutions. Sometimes more reasonable and sometimes less :) But the fact is the fact – the behavior is not consistent.

Don't include JavaScript stuff here. Gson is a barrier (or, adapter) between JavaScript's JSON and Java code. And I really want to get object exactly of the same type as I put into the structure.

@lyubomyr-shaydariv
Copy link
Contributor

Well, JsonToken is a public API enum and it's widely used for parsing JSON streams. And it's well designed for public use covering all possible JSON tokens. Sorry, when I was referring JavaScript, I meant that JSON originates its idea from it and the compromise for numeric values, and nothing else.

You're missing the fact that Double is probably the best way to store numbers except the case when Long may exceed the limits of precision of Double and Double.longValue() may return incorrect results. Especially when JSON documents do not store numbers type info. All of its, Double, operations are superfast from the JVM perspective. As an alternate solution that can be 100% precise is LazilyParsedNumber that holds the original JSON string but parses it every time its requested for a number. That's an option you can have. I don't know the exact reason why Gson prefers Double over LazilyParsedNumber in certain cases and vice versa, but both of them are Numbers therefore giving you a chance to extract a value of the type you need. That must be either declared at declaration-site (field types, type tokens, etc) being as concrete as possible, or letting a user pick the best alternative at call-site. You're saying:

it is not consistent. You put integer and then you get double.

True, it is not 100% consistent, and it cannot be such: how can you distinguish between bytes, shorts, ints, longs then? If you put an Integer and instruct Gson it's put into an Object of any sort, it writes a number, but this does not let you read it back to Object having exact type unless you declare it somehow.

Take a look: if you read 1: is it a long, an int or a short? None! It was a byte while writing a single digit, but I didn't had any chance to specify that it is a byte and nothing else in JSON. That's it: this is the fundamentals of JSON where every number is a value of double precision, and there is no way to specify a real type of the value it originated from. The best I and you can do is declaring the type before reading: Byte or byte for this example, but not an Object. The best Gson can do is Double and LazilyParsedNumber, if more super types were used in declaration (Object or Number). Number allows to extract a byte in any case. The case you're addressing to is 100% supported with LazilyParsedNumber without breaking compatibility and being 100% precise.

Please don't get me wrong, but you should either have a proper mapping at declaration-site, or use Number public methods at call-site. The Gson internal parsers do the right job and don't intrude into heuristic of choosing for "what this number literal looks like the most" (1 can be a Float too! -- I haven't mentioned this type today), but they obey your mappings, and you totally control it. That's it.

And I really want to get object exactly of the same type as I put into the structure.

100% working method is storing type information along with numbers. Say [1, 1, 1, 1] might be represented in the following form:

[
{"type": "byte", "value": 1},
{"type": "short", "value": 1},
{"type": "long", "value": 1},
{"type": "double", "value": 1}
]

or purely hypothetical (and the highlighter detects it)

[1B, 1S, 1L, 1D]

Looks crazy? It does. But this resolves the issue of not having the original type information. And this is not a very non-trivial case: take a look at RuntimeTypeAdapterFactory designed for exactly the same problem when you can't instantiate an instance of an abstract class without knowing the type of the original object the JSON has been generated from (and even the worst case: structurally such objects may look the equivalent having the same properties, but having no type discriminator will never help you to get the original type).

@artem-krutov
Copy link
Author

@lyubomyr-shaydariv thank you for detailed answers.

I agree that structures like {"type": "byte", "value": 1} and [1B, 1S, 1L, 1D] looks really crazy :) We tried to make things simpler but ended up with much more complex code, right.

I see your position. Though, a couple of comments.

You're missing the fact that Double is probably the best way to store numbers except the case when Long may exceed the limits of precision of Double

I think this is handled by this code:

case PEEKED_LONG:
      return JsonToken.LONG_NUMBER;

(https://github.com/google/gson/pull/1267/files#diff-0776314ca8e083ca3943426740d1dd69R452)

GSON can understand when it peeks longs, there is fitsInLong flag: https://github.com/artem-krutov/gson/blob/3af5f69a97668ab6b72ed26c5e9c201a4eed0f45/gson/src/main/java/com/google/gson/stream/JsonReader.java#L732

So, there would not be (or should not be?) any mess with longs and doubles.

Take a look: if you read 1: is it a long, an int or a short? None! It was a byte while writing a single digit, but I didn't had any chance to specify that it is a byte and nothing else in JSON.

Yes, that's true. But all these data formats are stand for numbers without floating part. It's too narrow case because people do not usually use these types at all. It could be confusing, because we don't need even int for fields like fingersCount, but anyway. I think that the only difference is between floating numbers and numbers without floating point.

I also want you to get me right: I don't want to argue, all I want is work together to make the library a bit better. I agree there are a bunch of moments that are not obvious to me and could be really obvious to you. I just want your help in some points (and already got nice-detailed answers).

As far as I understood, you stand for 2 points:

  1. Leave the number-policy unified. We don't need unnecessary differences between number formats, because we can control everything on a caller side
  2. We must not break backward compatibility, which this PR might potentially do.

Am I right? Also a question about the point 1: do you think we don't need it at all, or this particular implementation needs to be revisited and improved?

@lyubomyr-shaydariv
Copy link
Contributor

lyubomyr-shaydariv commented Apr 15, 2018

Let me quote and comment myself:

That's it: this is the fundamentals of JSON where every number is a value of double precision, and there is no way to specify a real type of the value it originated from.

I was wrong. If we take a look at the grammar of JSON numbers specified at https://tools.ietf.org/html/rfc8259#section-6 , then it's clear that a number unlimited to its literal length. Even more, below the grammar, it's said:

   This specification allows implementations to set limits on the range
   and precision of numbers accepted.  Since software that implements
   IEEE 754 binary64 (double precision) numbers [IEEE754] is generally
   available and widely used, good interoperability can be achieved by
   implementations that expect no more precision or range than these
   provide, in the sense that implementations will approximate JSON
   numbers within the expected precision.  A JSON number such as 1E400
   or 3.141592653589793238462643383279 may indicate potential
   interoperability problems, since it suggests that the software that
   created it expects receiving software to have greater capabilities
   for numeric magnitude and precision than is widely available.

This is where Gson ObjectTypeAdapter works as it's warned in the RFC. TypeAdapters.NUMBER smoothes the things out:

final String json = "[3.141592653589793238462643383279, 1e400]";
System.out.println("As big decimals: " + arrayToString(gson.fromJson(json, BigDecimal[].class)));
System.out.println("     As doubles: " + arrayToString(gson.fromJson(json, Double[].class)));
System.out.println("     As numbers: " + arrayToString(gson.fromJson(json, Number[].class)));
System.out.println("     As objects: " + arrayToString(gson.fromJson(json, Object[].class)));
private static String arrayToString(final Object[] array) {
	final StringBuilder builder = new StringBuilder().append('[');
	for ( int i = 0; i < array.length; i++ ) {
		if ( i > 0 ) {
			builder.append(',');
		}
		final Object e = array[i];
		builder.append(String.valueOf(e));
		if ( e != null ) {
			builder.append('(').append(e.getClass().getSimpleName()).append(')');
		}
	}
	return builder.append(']').toString();
}

Output:

As big decimals: [3.141592653589793238462643383279(BigDecimal),1E+400(BigDecimal)]
     As doubles: [3.141592653589793(Double),Infinity(Double)]
     As numbers: [3.141592653589793238462643383279(LazilyParsedNumber),1e400(LazilyParsedNumber)]
     As objects: [3.141592653589793(Double),Infinity(Double)]

I think that the ObjectTypeAdapter has the only flaw here: it does not distinguish between very big 64-bit values that don't fit double-precision values losing such long values. I just can guess why it was designed like that: performance reasons simplify invoking jsonReader.nextDouble() rather than make some more expensive heuristics? Maybe yes, or maybe no. From other point of view, ObjectTypeAdapter is extremely simple and does not rely on any other kind of type adapters. Well, simplicity has the priority here and it makes sense, and it does not conflict with the RFC that says an implementation can set its own limits.

Leave the number-policy unified. We don't need unnecessary differences between number formats, because we can control everything on a caller side

In general, yes. The code above demonstrates this approach: you tell Gson what you (want to) expect there.

We must not break backward compatibility, which this PR might potentially do.

Yes. For example, you PR breaks the tests, for another example (as we have discussed on JsonToken already): 3af5f69#diff-e6741235a544250db9eddf9b0ccfe209R273 . Just consider hypothetical final Double number = (Double) gson.fromJson('1', Object.class) in someone else's code.

I think Gson can be improved without breaking backwards compatibility. I have an idea on that and I'll probably suggest a change tomorrow.

@Marcono1234
Copy link
Collaborator

Do you think this change is still needed or has this been resolved by #1290?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants