Support JsonNumberHandling in JsonElement & JsonNode #29152
Comments
If the suggested solution is long.Parse(elem.GetString()), then we lose all the advantage of not allocating memory.
cc @bartonjs
That's something we're explicitly keeping out of the API. JsonDocument and JsonElement can apply over UTF-8 or UTF-16 data; exposing the span would remove that abstraction.
The easy question is "why is the number in a string instead of just being a number?". Once strings start being parsed there become questions of acceptable formats, and culture sensitivity, et cetera. e.g. the number "123,456" is either bigger than 1000 (en-US) or between 100 and 1000 (en-UK). If I understand the flow correctly, JValue's int-conversion only uses the "N" format, removing the culture and format problems. I don't suppose you have any sort of data suggesting how popular it is to transmit numbers as JSON strings instead of JSON numbers?
It's still significantly lower allocation than the equivalent in JValue, since this would allocate the number-string into gen0, you'd parse it, you'd lose the reference, and it'll avoid getting promoted to gen1. In JsonTextReader to JValue the (sub)string got allocated during document traversal (probably getting promoted to gen1) along with lots of other short strings, and then much later is parsed (while still staying as a string held alive by the JValue).
Sometimes you get data from a server where you cannot change anything (the Bitcoin exchange Binance, for example). We have the data and we have to do something with it. Right now my code generator looks like (used Roslyn Scripting): with prop.AsSpan() I can (try to) parse the span manually without any allocation. That said, a GetXXX() that parses numbers in quotes would solve my problem, and then there is no need to add AsSpan(). My point is that we have numbers inside quotes in JSON and we can't change this.
OFFTOPIC: I could try to use Utf8JsonReader for my task, and (I haven't tried yet, but IMO) it would double my code and make the logic maybe 5 times harder. Maybe later I will want my code even faster and will rewrite it (or will hire you to rewrite it :) ) for Utf8JsonReader, but for now I want a simple MVP. The next level would be native code generation (Roslyn, LLVM...), then FPGA and HFT. In trading you don't want any allocations, even in gen0 or Eden spaces: you just can't stop (all) thread(s) for 10-50 ms. We can save 5 ms on one message type from an exchange vs Json.NET. We have 10 message types, 100 instruments/securities, and 10 other exchanges, 5 of which send numbers in quotes. So for 1 second of real time you can save more than 1 second of CPU time (across many threads) of parsing/GC work, which we could spend on the real problem instead of fighting with developer tasks.
+1, I'm no JSON expert, but from what I can see I'm not sure it is a common scenario to serialise numbers as a string type. https://stackoverflow.com/a/15368259/10484567 A bit of searching seems to suggest that it's not a number once it's been quoted. I'm against adding support for non-standard usages of common protocols. I think the benefits of implementing this in the BCL are not worth the amount of effort needed to do so.
@bartonjs by "between 100 and 1000 (en-UK)", I believe you've meant 'en-GB' 😀. The UK also uses full stop as decimal points, so '123,456' would be "one hundred and twenty three thousand, four hundred and fifty six". The point still holds in the majority of European countries though. (e.g. 'fr-FR', 'de-DE'...)
dotnet/corefx#36639 Could this solve your problem (if implemented)?
Yep, oops 🤭. Though, looking at an old culture document I built it looks like en-UK doesn't (didn't?) exist, and en-GB used comma-and-period the same as en-US. de-DE, OTOH, reverses them (like I thought "en-UK" did). -- But the write time on that doc is 2009, so maybe things have changed in the last 10 years 😄.
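To make the culture point above concrete, here is a minimal sketch of how the same text parses differently under two cultures, and how an invariant, separator-free parse sidesteps the ambiguity. The class and method names (CultureParsingDemo, ParseWith, TryParseInvariant) are made up for illustration, and the sketch assumes culture data for en-US and de-DE is available on the machine (it may not be in globalization-invariant mode).

```csharp
using System;
using System.Globalization;

// Hypothetical helper type, for illustration only.
public static class CultureParsingDemo
{
    // Parses the text under the named culture. NumberStyles.Number permits
    // both a group separator and a decimal separator, so "123,456" is
    // accepted by en-US (comma = group) and by de-DE (comma = decimal).
    public static decimal ParseWith(string text, string cultureName) =>
        decimal.Parse(text, NumberStyles.Number, CultureInfo.GetCultureInfo(cultureName));

    // Culture-independent integer parse that allows only an optional leading
    // sign and digits, so any separator makes the parse fail.
    public static bool TryParseInvariant(string text, out long value) =>
        long.TryParse(text, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out value);
}
```

With these helpers, ParseWith("123,456", "en-US") should give 123456 while ParseWith("123,456", "de-DE") should give 123.456, and TryParseInvariant("123,456", out _) returns false. A parser for quoted JSON numbers would presumably want the last behavior.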
When working with financial data, it is an industry standard to stringify numeric types in order to avoid rounding errors. In fact, this is part of the reason that the It would make perfect sense for |
But JSON numbers are already stringified. The question is, does the financial industry send JSON like { "amount": "18.13" } or { "amount": 18.13 }. The former is a JsonValueType.String, the latter is a JsonValueType.Number, which is still a string on the wire.
True, but that doesn't consider platform-specific type handling. For instance, the JavaScript spec defines floating point numbers as 64-bit IEEE 754. Developers will use libraries like decimal.js or BigInteger in order to avoid the precision limitations set by the spec. Consider what would happen if a JavaScript client tried to parse the number Coinbase, which is the leading cryptocurrency exchange based in the United States, states that they have adopted the practice of stringifying floating point numbers for this exact reason:
Actually, I'm almost certain that the most prevalent technique is to just ditch floating point numbers altogether in favor of 64-bit integers with separate decimal place tracking, but that can cause issues if the number happens to exceed a 64-bit integer's range. It looks like JSON Schema has decided on this feature, but hasn't implemented it yet. Might be worth keeping an eye on.
That looks fair to me.
JavaScript numbers have only 53 bits of integer precision, so 64-bit integers with a magnitude above 2^53 can be silently rounded. We have to use strings in JSON so that these values are transmitted correctly. This is standard industry practice because there's no other option. Having the ability to both parse and write 64-bit integers as strings is a must, especially for JSON-based APIs.
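The precision limit can be reproduced in C# itself, since a JavaScript number is an IEEE 754 double. A minimal sketch (the type and method names are made up for illustration):

```csharp
// Hypothetical helper type, for illustration only.
public static class DoublePrecisionDemo
{
    // Simulates what happens to a 64-bit integer that passes through a
    // JavaScript-style double: values above 2^53 may not survive intact.
    public static long RoundTripThroughDouble(long value) => (long)(double)value;
}
```

For example, RoundTripThroughDouble(9007199254740993L), which is 2^53 + 1, comes back as 9007199254740992, while values up to 2^53 round-trip unchanged.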
I have hit this as well, but didn't find this issue (I'm using the
Now possible using the new JSON converters in 3.0 preview-7:
public class LongToStringSupport : JsonConverter<long>
{
public override long Read(ref Utf8JsonReader reader, Type type, JsonSerializerOptions options)
{
if (reader.TokenType == JsonTokenType.String && Int64.TryParse(reader.GetString(), out var number))
return number;
return reader.GetInt64();
}
public override void Write(Utf8JsonWriter writer, long value, JsonSerializerOptions options)
{
writer.WriteStringValue(value.ToString());
}
}
services.AddControllersWithViews()
.AddJsonOptions(options =>
{
options.JsonSerializerOptions.Converters.Add(new LongToStringSupport());
});
The point of this issue is NOT to use GetString() for a JsonTokenType.String that is really a decimal or long.
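Yea, it can be better. Here's an updated version that uses the same Utf8Parser code:
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
public class LongToStringConverter : JsonConverter<long>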
{
public override long Read(ref Utf8JsonReader reader, Type type, JsonSerializerOptions options)
{
if (reader.TokenType == JsonTokenType.String)
{
ReadOnlySpan<byte> span = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
if (Utf8Parser.TryParse(span, out long number, out int bytesConsumed) && span.Length == bytesConsumed)
return number;
}
return reader.GetInt64();
}
public override void Write(Utf8JsonWriter writer, long value, JsonSerializerOptions options)
{
writer.WriteStringValue(value.ToString());
}
}
@manigandham, @zelyony - note that the latest converter sample that was shared won't work for numbers that are escaped since you are using the Say, the digit 2 was instead escaped as
const string json = "\"123456789010\\u0032\"";
var options = new JsonSerializerOptions();
options.Converters.Add(new LongToStringConverter());
// throws System.InvalidOperationException : Cannot get the value of a token type 'String' as a number
// since Utf8Parser.TryParse returned false
long val = JsonSerializer.Deserialize<long>(json, options);
Assert.Equal(1234567890102, val);
string jsonSerialized = JsonSerializer.Serialize(val, options);
Assert.Equal("\"1234567890102\"", jsonSerialized);
@ahsonkhan Got it, thanks for the note. I guess both statements can be combined to handle that situation as well:
public class LongToStringConverter : JsonConverter<long>
{
public override long Read(ref Utf8JsonReader reader, Type type, JsonSerializerOptions options)
{
if (reader.TokenType == JsonTokenType.String)
{
ReadOnlySpan<byte> span = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
if (Utf8Parser.TryParse(span, out long number, out int bytesConsumed) && span.Length == bytesConsumed)
return number;
if (Int64.TryParse(reader.GetString(), out number))
return number;
}
return reader.GetInt64();
}
public override void Write(Utf8JsonWriter writer, long value, JsonSerializerOptions options)
{
writer.WriteStringValue(value.ToString());
}
}
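The combined converter above still allocates a string whenever the quoted number is escaped. A possible allocation-free variant is sketched below, assuming .NET 7 or later, where Utf8JsonReader.CopyString unescapes the token directly into a caller-supplied buffer. The names QuotedInt64Converter and MaxTokenLength are made up for this sketch, and the 128-byte cap is an arbitrary assumption (a valid Int64 needs at most 20 characters, or 120 bytes if every character is written as a \uXXXX escape).

```csharp
using System;
using System.Buffers.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical converter, for illustration only. Requires .NET 7+ for CopyString.
public sealed class QuotedInt64Converter : JsonConverter<long>
{
    // Assumed upper bound on the raw (still escaped) token length.
    private const int MaxTokenLength = 128;

    public override long Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // Plain JSON numbers take the normal path.
        if (reader.TokenType != JsonTokenType.String)
            return reader.GetInt64();

        // The unescaped text is never longer than the raw token, so bounding
        // the raw length guarantees that it fits in the stack buffer.
        long rawLength = reader.HasValueSequence ? reader.ValueSequence.Length : reader.ValueSpan.Length;
        if (rawLength > MaxTokenLength)
            throw new JsonException("Quoted number is too long.");

        Span<byte> buffer = stackalloc byte[MaxTokenLength];
        int written = reader.CopyString(buffer); // unescapes into the buffer
        ReadOnlySpan<byte> text = buffer.Slice(0, written);

        if (Utf8Parser.TryParse(text, out long value, out int consumed) && consumed == text.Length)
            return value;

        throw new JsonException("Quoted value is not a valid Int64.");
    }

    public override void Write(Utf8JsonWriter writer, long value, JsonSerializerOptions options)
    {
        // Format straight to UTF-8 instead of going through value.ToString().
        Span<byte> buffer = stackalloc byte[32];
        if (!Utf8Formatter.TryFormat(value, buffer, out int written))
            throw new JsonException("Could not format the Int64 value.");
        writer.WriteStringValue(buffer.Slice(0, written));
    }
}
```

Registered through options.Converters.Add(new QuotedInt64Converter()), this should accept both "\"1234567890102\"" and the escaped form "\"123456789010\\u0032\"" from the earlier test, and it throws JsonException rather than InvalidOperationException for strings that are not numbers. Whether that exception choice suits a given API is a design decision left open here.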
I have the same issue with booleans, decimals and dates.
@shaig4 - There's a comment about that option in the other issue: https://github.com/dotnet/corefx/issues/39473#issuecomment-526245607 Maybe these issues should be merged to avoid duplicate discussions...
Linking https://github.com/dotnet/corefx/issues/40120 as supporting quoted numbers in the deserializer for dictionaries is a feature ask.
Please do this at low level if possible - definitely we don't want to allocate temporary strings like naïve converters will do. |
A few updates on this issue:
var options = new JsonSerializerOptions { NumberHandling = JsonNumberHandling.AllowReadingFromString };
var node = JsonSerializer.Deserialize<JsonNode>(@"{ ""num"" : ""1""}", options);
node["num"].GetValue<int>(); // System.InvalidOperationException: An element of type 'String' cannot be converted to a 'System.Int32'.
@steveharter is this something we should consider addressing in a future release?
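Until something like this is supported on the node itself, one possible workaround is to hand the node back to the serializer, which does honor NumberHandling. This is only a sketch: QuotedNumberNodeDemo and ReadQuotedInt are made-up names, it relies on the JsonSerializer.Deserialize extension for JsonNode (.NET 6+), and it re-serializes the node internally, so it is a convenience and not a low-allocation path.

```csharp
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization;

// Hypothetical helper type, for illustration only.
public static class QuotedNumberNodeDemo
{
    private static readonly JsonSerializerOptions s_options = new JsonSerializerOptions
    {
        NumberHandling = JsonNumberHandling.AllowReadingFromString
    };

    // Reads an int property that may be written as 1 or as "1".
    public static int ReadQuotedInt(string json, string propertyName)
    {
        var root = JsonNode.Parse(json);
        var value = root[propertyName];
        // Routes the node through the serializer so NumberHandling applies.
        return value.Deserialize<int>(s_options);
    }
}
```

Under these assumptions, ReadQuotedInt(@"{ ""num"" : ""1""}", "num") is expected to return 1 where GetValue<int>() throws.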
I do think pushing these simpler serializer-only features like quoted numbers and JsonIgnoreCondition down to node and element is goodness. Some may even make sense at the reader/writer level. Since
A lot of JSON looks like this:
{"num":"123","str":"hello"}
A number can be in quotes too.
JsonElement.GetInt64() throws an exception saying, in effect, "we need a Number but we have a String".
I am sure that the quotes contain a number, so the code (GetInt64 and others) should ignore the quotes and try to parse the number, and throw an exception only when the number cannot be parsed (invalid characters).
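The requested behavior can be approximated today with an extension method. The sketch below uses made-up names (JsonElementQuotedNumberExtensions, TryGetInt64Lenient) and goes through GetString(), so it allocates a temporary string for quoted values; it only illustrates the semantics being asked for, not the allocation-free implementation the issue is really about.

```csharp
using System.Globalization;
using System.Text.Json;

// Hypothetical extension type, for illustration only.
public static class JsonElementQuotedNumberExtensions
{
    // Accepts both 123 and "123"; returns false for anything else.
    public static bool TryGetInt64Lenient(this JsonElement element, out long value)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Number:
                return element.TryGetInt64(out value);
            case JsonValueKind.String:
                // Invariant culture, sign and digits only: no separators.
                return long.TryParse(element.GetString(), NumberStyles.AllowLeadingSign,
                    CultureInfo.InvariantCulture, out value);
            default:
                value = 0;
                return false;
        }
    }
}
```

For the document {"num":"123","str":"hello"}, calling TryGetInt64Lenient on the "num" property yields true with 123, and on "str" it yields false.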
UPD
Oops! I used an unclear title. Better:
..JsonElement should parse Numbers in quotes
SUMMATION: