Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support JsonNumberHandling in JsonElement & JsonNode #29152

Open
a11e99z opened this issue Apr 3, 2019 · 27 comments
Open

Support JsonNumberHandling in JsonElement & JsonNode #29152

a11e99z opened this issue Apr 3, 2019 · 27 comments
Labels
area-System.Text.Json enhancement Product code improvement that does NOT require public API changes/additions json-functionality-doc Missing JSON specific functionality that needs documenting
Milestone

Comments

@a11e99z
Copy link

a11e99z commented Apr 3, 2019

many JSON looks like this:
{"num":"123","str":"hello"}
Number can be in quotes too

JsonElement.GetInt64() throws Exception that "we need Number but we have a String".
I am sure that in quotes we have a Number, so code (GetInt64 and others) should ignore quotes and try parse number. and only in case when cannot parse Number(invalid chars) throw some Exception

UPD
Oops! I used unclear title. better
..JsonElement should parse Numbers in qoutes

SUMMATION:

  1. GetInt64() and etc should parse Numbers(long,uint,decimal..) for String elements: ignore first and last quotes.
  2. add .AsSpan() method to JsonElement for cases when we want to do something with elements manually. we cannot access to element data and do something useful or work around with some problem (.GetString(), .ToString(), .GetRawString() - stress GC and they are not solution)
@a11e99z
Copy link
Author

a11e99z commented Apr 3, 2019

in case that u suppose solution long.Parse( elem.GetString() ) - well, we lose all the advantage of not allocating memory.
JsonDocument loose any advantages in case when JSON has many numbers(long,int,decimal) in quotes and better to use Json.NET.

@a11e99z
Copy link
Author

a11e99z commented Apr 3, 2019

  1. I can parse Number in quotes by myself but I cannot to get span of element or JSON-position (like in Utf8JsonReader.Position)
    so, add method to JsonElement.AsSpan() or something.
    It simple and let us to get around a lot of problems cause we can do something with the elements manually.

@danmoseley danmoseley changed the title System.Text.Json.JsonElement JsonElement should parse Numbers in quotes Apr 4, 2019
@ahsonkhan
Copy link
Member

cc @bartonjs

@bartonjs
Copy link
Member

bartonjs commented Apr 4, 2019

add method to JsonElement.AsSpan() or something.

That's something we're explicitly keeping out of the API. JsonDocument and JsonElement can apply over UTF-8 or UTF-16 data, exposing the span removes that abstraction.

Number can be in quotes too

The easy question is "why is the number in a string instead of just being a number?". Once strings start being parsed there become questions of acceptable formats, and culture sensitivity, et cetera. e.g. the number "123,456" is either bigger than 1000 (en-US) or between 100 and 1000 (en-UK).

If I understand the flow correctly, JValue's int-conversion only uses the "N" format, removing the culture and format problems.

I don't suppose you have any sort of data suggesting how popular it is to transmit numbers as JSON strings instead of JSON numbers?

in case that u suppose solution long.Parse( elem.GetString() ) - well, we lose all the advantage of not allocating memory.

It's still significantly lower allocation than the equivalent in JValue, since this would allocate the number-string into gen0, you'd parse it, you'd lose the reference, and it'll avoid getting promoted to gen1. In JsonTextReader to JValue the (sub)string got allocated during document traversal (probably getting promoted to gen1) along with lots of other short strings, and then much later is parsed (while still staying as a string held alive by the JValue).

@a11e99z
Copy link
Author

a11e99z commented Apr 4, 2019

sometimes you've got data from server where u cannot change anything (Bitcoin exchange Binance for example).
// timestamp: no quotes. price change: have quotes
[{"e":"24hrTicker","E":1554206291885,"s":"ETHBTC","p":"-0.00230300"... and many more numbers with quotes.
API description https://github.com/binance-exchange/binance-official-api-docs/blob/master/web-socket-streams.md

well, we have data and we should to do something with this.
when objects dont allow do something useful - such objects will not be used.
we live in the world where have many colors not just black and white. so, my request is not to do worse, is to do better with things that we have.
all job that needed is couple lines of code. (I am not sure for that coz many method invocations inside)

write now my codegenerator looks like (used Roslyn Scripting):
cod.AppendLine( $"if (nod.TryGetProperty( "{fld.Name}", out prop )) " );
if (ftyp == typeof( sbyte ) || ftyp == typeof( short ) || ftyp == typeof( int ) || ftyp == typeof( long ))
cod.AppendLine( $" res.{fld.Name} = ( System.{ftyp.Name} )(prop.Type == System.Text.Json.JsonValueType.String ? long.Parse( prop.GetString()) : prop.GetInt64());" );
// long.Parse( prop.GetString()) - "failed" use object.. better to use Json.NET and loose couple of milliseconds.
last line instead that I expected to work well (line where I dont expect Exception)
cod.AppendLine( $" res.{fld.Name} = ( System.{ftyp.Name} )prop.GetInt64());" );

with prop.AsSpan() I can (try to) parse span manually without any allocation.
with prop.GetInt64() that parses Numbers in quotes I have simple working code without any allocation.

well, GetXXX() that parses numbers in quotes solve my problem and dont need add AsSpan().
but AsSpan() will allow to use some hacks/optimization/access to JSON-element data in scenarios that we cant see right now.

my point is that we have numbers inside quotes in JSON and we can't change this.
so we need some methods that allow us to parse those numbers without any string allocations.
I dont know (dont care for sure) how to do this, maybe change current methods maybe add new ones
GetStringAsInt64(), xxxUInt64, xxxDouble, xxxDecimal will be enough, no need for float, byte, int, ushort..

@a11e99z
Copy link
Author

a11e99z commented Apr 4, 2019

OFFTOPIC:

well, I can try to use Utf8JsonReader for my task and (I didnt try yet but IMO) it will increase my code twice and harder code-logic maybe 5 times.

maybe later I will want my code even faster and rewrite it (or will hire u to rewrite it :) ) for Utf8JsonReader but now I want simple MVP. next level will be native codegeneration (Roslyn, LLVM..), next - FPGA & HFT..

in trading you dont need any allocations even in gen0 or Edens. u just cant stop (all) thread(s) for 10-50ms.

we can save 5ms in one type message from exchange vs Json.NET.. we have 10 message types, 100 instruments/securities, 10 another exchanges with 5 of them with numbers in quotes.. so u can save for 1second of real time >1second of CPU (many threads) of parse/GCcollect work that we can to spend to more useful work for real problem solution but not to fight with developer tasks..
cent saves the dollar.. millisecond saves the second

@Gnbrkm41
Copy link
Contributor

Gnbrkm41 commented Apr 5, 2019

I don't suppose you have any sort of data suggesting how popular it is to transmit numbers as JSON strings instead of JSON numbers?

+1, I'm no JSON expert, but from what I can see I'm not sure if it is a common scenario to serialise numbers as string type. https://stackoverflow.com/a/15368259/10484567 A little bit of search seem to suggest that it's not a number once it's been quoted.

I'm against adding supports for non-standard usages of common protocols. I think the benefits of implementing this in BCL are not worth the amount of efforts needed to do so.

@Gnbrkm41
Copy link
Contributor

Gnbrkm41 commented Apr 5, 2019

@bartonjs by "between 100 and 1000 (en-UK)", I believe you've meant 'en-GB' 😀. The UK also uses full stop as decimal points, so '123,456' would be "one hundred and twenty three thousand, four hundred and fifty six". The point still holds in majority of European countries though. (e.g. 'fr-FR', 'de-DE'...)

@Gnbrkm41
Copy link
Contributor

Gnbrkm41 commented Apr 5, 2019

dotnet/corefx#36639 Could this solve your problem (if implemented)?

@bartonjs
Copy link
Member

bartonjs commented Apr 5, 2019

by "between 100 and 1000 (en-UK)", I believe you've meant 'en-GB' grinning

Yep, oops 🤭. Though, looking at an old culture document I built it looks like en-UK doesn't (didn't?) exist, and en-GB used comma-and-period the same as en-US. de-DE, OTOH, reverses them (like I thought "en-UK" did). -- But the write time on that doc is 2009, so maybe things have changed in the last 10 years 😄.

@Wildenhaus
Copy link
Contributor

Wildenhaus commented May 16, 2019

... from what I can see I'm not sure if it is a common scenario to serialise numbers as string type ...

When working with financial data, it is an industry standard to stringify numeric types in order to avoid rounding errors. In fact, this is part of the reason that the Decimal type has been introduced.

It would make perfect sense for Utf8JsonReader to at the very least allow the parsing of strings when calling GetDecimal, if not for other floating point types as well.

@bartonjs
Copy link
Member

When working with financial data, it is an industry standard to stringify numeric types in order to avoid rounding errors

But JSON numbers are already stringified. The question is, does the financial industry send JSON like

{ "amount": "18.13" } or { "amount": 18.13 }. The former is a JsonValueType.String, the latter is a JsonValueType.Number, which is a still a string on the wire.

@Wildenhaus
Copy link
Contributor

Wildenhaus commented May 16, 2019

But JSON numbers are already stringified.

True, but that doesn't consider platform-specific type handling. For instance, the JavaScript spec defines floating point numbers as 64-bit IEEE 754. Developers will use libraries like decimal.js or BigInteger in order to avoid the precision limitations set by the spec.

Consider what would happen if a JavaScript client tried to parse the number 0.987765432109876543210 from JSON. If it were in a number format, it would be rounded to 0.98776543210987655, whereas if it were in string format, it could be handled differently and sent to a library supporting a larger precision. In the grand scheme of things, the way floating point numbers are handled is largely dependent on the platform consuming the serialized data.

Coinbase, which is the leading cryptocurrency exchange based in the United States, states that they have adopted the practice of stringifying floating point numbers for this exact reason:

Decimal numbers are returned as strings to preserve full precision across platforms. When making a request, it is recommended that you also convert your numbers to strings to avoid truncation and precision errors. Integer numbers (like trade id and sequence) are unquoted.

The question is, does the financial industry send JSON like { "amount": "18.13" } or { "amount": 18.13 }

Actually, I'm almost certain that the most prevalent technique is to just ditch floating point numbers altogether in favor of 64-bit integers with separate decimal place tracking, but that can cause issues if the number happens to exceed a 64-bit integer's range.

It looks like JSON Schema has decided on this feature, but hasn't implemented it yet. Might be worth keeping an eye on.

@Gnbrkm41
Copy link
Contributor

That looks fair to me.

@manigandham
Copy link

manigandham commented Jun 24, 2019

Javascript is limited to 53 bits of precision so all 64-bit integers will be truncated. We have to use strings in JSON so that these values are transmitted correctly. This is standard industry practice because there's no other option.

Having the ability to both parse and write 64-bit integers as strings is a must, especially for JSON-based APIs.

@NickCraver
Copy link
Member

I have have hit this as well, but didn't find this issue (I'm using the .Parse<T>() API). Raised here: https://github.com/dotnet/corefx/issues/39473

@manigandham
Copy link

Now possible using the new Json converters on 3.0 preview-7

    public class LongToStringSupport : JsonConverter<long>
    {
        public override long Read(ref Utf8JsonReader reader, Type type, JsonSerializerOptions options)
        {
            if (reader.TokenType == JsonTokenType.String && Int64.TryParse(reader.GetString(), out var number))
                return number;

            return reader.GetInt64();
        }

        public override void Write(Utf8JsonWriter writer, long value, JsonSerializerOptions options)
        {
            writer.WriteStringValue(value.ToString());
        }
    }
services.AddControllersWithViews()
   .AddJsonOptions(options =>
   {
        options.JsonSerializerOptions.Converters.Add(new LongToStringSupport());
   });

@zelyony
Copy link

zelyony commented Aug 3, 2019

if (reader.TokenType == JsonTokenType.String && Int64.TryParse(reader.GetString(), out var number))
                return number;

Point of this issue is NOT to use GetString() for JsonTokenType.String that is really decimal or long.

@manigandham
Copy link

manigandham commented Aug 4, 2019

@zelyony

Yea, it can be better. Here's an updated version to use the same Utf8Parser code:

public class LongToStringConverter : JsonConverter<long>
{
    public override long Read(ref Utf8JsonReader reader, Type type, JsonSerializerOptions options)
    {
        if (reader.TokenType == JsonTokenType.String)
        {
            ReadOnlySpan<byte> span = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
            if (Utf8Parser.TryParse(span, out long number, out int bytesConsumed) && span.Length == bytesConsumed)
                return number;
        }

        return reader.GetInt64();
    }

    public override void Write(Utf8JsonWriter writer, long value, JsonSerializerOptions options)
    {
        writer.WriteStringValue(value.ToString());
    }
}

@ahsonkhan
Copy link
Member

@manigandham, @zelyony - note that the latest converter sample that was shared won't work for numbers that are escaped since you are using the ValueSpan property to get the raw bytes (the previous sample using GetString() would work though). In case that scenario matters for your use case.

Say, the digit 2 was instead escaped as \\u0032.

const string json = "\"123456789010\\u0032\"";

var options = new JsonSerializerOptions();
options.Converters.Add(new LongToStringConverter());

// throws System.InvalidOperationException : Cannot get the value of a token type 'String' as a number
// since Utf8Parser.TryParse returned false
long val = JsonSerializer.Deserialize<long>(json, options);
Assert.Equal(1234567890102, val);

string jsonSerialized = JsonSerializer.Serialize(val, options);
Assert.Equal("\"1234567890102\"", jsonSerialized);

@manigandham
Copy link

@ahsonkhan Got it, thanks for the note. I guess both statements can be combined to handle that situation as well:

public class LongToStringConverter : JsonConverter<long>
{
    public override long Read(ref Utf8JsonReader reader, Type type, JsonSerializerOptions options)
    {
        if (reader.TokenType == JsonTokenType.String)
        {
            ReadOnlySpan<byte> span = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
            if (Utf8Parser.TryParse(span, out long number, out int bytesConsumed) && span.Length == bytesConsumed)
                return number;

            if (Int64.TryParse(reader.GetString(), out number))
                return number;
        }

        return reader.GetInt64();
    }

    public override void Write(Utf8JsonWriter writer, long value, JsonSerializerOptions options)
    {
        writer.WriteStringValue(value.ToString());
    }
}

@shaig4
Copy link

shaig4 commented Sep 29, 2019

I Have same issue with booleans, decimals and dates,
should be added a flag to enable quoted values.

@manigandham
Copy link

@shaig4 - There's a comment about that option in the other issue: https://github.com/dotnet/corefx/issues/39473#issuecomment-526245607

Maybe these issues should be merged to avoid duplicate discussions...

@steveharter
Copy link
Member

Linking https://github.com/dotnet/corefx/issues/40120 as supporting quoted numbers in the the deserializer for dictionaries is a feature ask.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@huoyaoyuan
Copy link
Member

Please do this at low level if possible - definitely we don't want to allocate temporary strings like naïve converters will do.

@layomia layomia modified the milestones: 5.0.0, Future Jul 7, 2020
@eiriktsarpalis eiriktsarpalis changed the title JsonElement should parse Numbers in quotes Support JsonNumberHandling in JsonElement & JsonNode Oct 20, 2021
@eiriktsarpalis
Copy link
Member

A few updates on this issue:

var options = new JsonSerializerOptions { NumberHandling = JsonNumberHandling.AllowReadingFromString };
var node = JsonSerializer.Deserialize<JsonNode>(@"{ ""num"" : ""1""}", options);
node["num"].GetValue<int>(); // System.InvalidOperationException: An element of type 'String' cannot be converted to a 'System.Int32'.

@steveharter is this something we should consider addressing in a future release?

@steveharter
Copy link
Member

steveharter commented Oct 21, 2021

I do think having pushing these simpler serializer-only features like quoted numbers and JsonIgnoreCondition down to node and element is goodness. Some may even make sense at the reader\writer level.

Since JsonNode is layered on JsonElement during deserialization, to support the various serializer-related features for node we would need to:

  • Add the appropriate knobs to the various option types:
    • JsonDocumentOptions.
    • JsonNodeOptions.
    • Optionally to JsonReaderOptions and JsonWriterOptions if we want to expose that there; document\element layers on the reader\writer. We have to make sure performance doesn't degrade.
  • Implement the handling of the various options in JsonElement, JsonNode and optionally in Utf8JsonReader\Utf8JsonWriter including adding new APIs to expose the functionality.
  • Address interop between the various options:
    • Ensure that JsonNodeOptions properly transfers settings to JsonDocumentOptions and if appropriate, JsonDocumentOptions to JsonReaderOptions and JsonWriterOptions.
    • Add helpers to obtain the type-specific options from an instance of JsonSerializerOptions. Currently we have internal methods already like ToNodeOptions().
    • Optionally, add additional overloads to document\node\reader\writer to take JsonSerializerOptions instead of the type-specific options. Internally, the jsonSerializerOptions.ToNodeOptions() etc would be called to get the appropriate options.
  • Perhaps remove appropriate serializer logic since reader\writer will be able to handle the functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Text.Json enhancement Product code improvement that does NOT require public API changes/additions json-functionality-doc Missing JSON specific functionality that needs documenting
Projects
None yet
Development

No branches or pull requests