Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stable Dictionary sorting for System.Text.Json #76008

Open
krwq opened this issue Sep 22, 2022 · 16 comments
Open

Stable Dictionary sorting for System.Text.Json #76008

krwq opened this issue Sep 22, 2022 · 16 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Milestone

Comments

@krwq
Copy link
Member

krwq commented Sep 22, 2022

Serializing the same dictionary but with entries added in different order in STJ can result in different results every time you run your app. We should offer converter/option which can make this behavior stable.

Repro:

using System.Text.Json;

Test repro = new Test();

if (Environment.TickCount % 2 == 0)
{
    repro.Dict.Add("left", "foo");
    repro.Dict.Add("right", "bar");
}
else
{
    repro.Dict.Add("right", "bar");
    repro.Dict.Add("left", "foo");
}

Console.WriteLine(JsonSerializer.Serialize(repro));

public class Test
{
    public Dictionary<string, string> Dict { get; } = new();
}

Output (randomly one of the following two):

{"Dict":{"left":"foo","right":"bar"}}

or

{"Dict":{"right":"bar","left":"foo"}}

Expected output:

{"Dict":{"left":"foo","right":"bar"}}
@krwq krwq added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json labels Sep 22, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 22, 2022
@ghost
Copy link

ghost commented Sep 22, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Serializing the same dictionary but with entries added in different order in STJ can result in different results every time you run your app. We should offer converter/option which can make this behavior stable.

Repro:

using System.Text.Json;

Test repro = new Test();

if (Environment.TickCount % 2 == 0)
{
    repro.Dict.Add("left", "foo");
    repro.Dict.Add("right", "bar");
}
else
{
    repro.Dict.Add("right", "bar");
    repro.Dict.Add("left", "foo");
}

Console.WriteLine(JsonSerializer.Serialize(repro));

public class Test
{
    public Dictionary<string, string> Dict { get; } = new();
}

Output (randomly one of the following two):

{"Dict":{"left":"foo","right":"bar"}}

or

{"Dict":{"right":"bar","left":"foo"}}

Expected output:

{"Dict":{"left":"foo","right":"bar"}}
Author: krwq
Assignees: -
Labels:

api-suggestion, area-System.Text.Json

Milestone: -

@eiriktsarpalis
Copy link
Member

Enumeration order of dictionary entries is generally speaking nondeterministic, and that is reflected by how entries get serialized in JSON. I can see how certain users might expect deterministic ordering, but there needs to be a commonly agreed upon order: should we sort by string keys? What if the key type is not a string?

Is there prior art in this space? Does Json.NET offer similar functionality?

@Clockwork-Muse
Copy link
Contributor

Probably the biggest use of this isn't for reader-side enumeration.
It's for things like request caching, where you need stable results or you'd invalidate the cache.
Or if the results would be signed for some reason, you want a stable order so that the hash doesn't change.

@eiriktsarpalis
Copy link
Member

Should we be sorting by key?

What about supported dictionary keys that are not strings? NB the default order of a key type might not be monotonic w.r.t. its JSON serialization, for example 2 < 10 holds however at the same time "10" > "2".

@krwq
Copy link
Member Author

krwq commented Sep 22, 2022

yes, it's mostly about the serialized JSON being same for same entries, not about particular order. IMO unless there isn't anything builtin to give some defaults we should support at least string keys and maybe couple of numeric types. Possibly make sort func customizable through resolver but not necessarily

@eiriktsarpalis
Copy link
Member

FWIW the CBOR specification defines a conformance mode mandating that map/object encodings are sorted lowest-to-highest by their key encodings. It might be worth investigating whether such a mode exists in the JSON space and whether we might consider supporting that out of the box.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Sep 22, 2022
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Sep 22, 2022
@GSPP
Copy link

GSPP commented Sep 26, 2022

Maybe users should just use a dictionary type that provides stable order?

@krwq
Copy link
Member Author

krwq commented Sep 27, 2022

@GSPP it is a good workaround but in general consider following scenario:

  • you get telemetry from random devices but always the same set of labels (keys in the dictionary)
  • you start your app and wait until all devices have sent you at least one measurement
  • you want to periodically send current state over the wire using JSON
  1. I wouldn't expect to have to use SortedDictionary in such scenario, in my app I don't need it sorted
  2. Receiver of telemetry even though they get identical measurements they may get different JSON every time
  3. Receiver don't have to parse data if it's same
  4. Sender if they had stable JSON allows you to do various optimizations - i.e. not sending entire thing over the wire and just send heartbeat saying equivalent to "same as last time"

Another scenario:
There are some libraries like i.e. Assent which allow you to easily baseline payloads like JSON. The way they work is when they detect difference in output they will fail test and show up diff window and ask you if this new baseline is ok, if you approve it will become new baseline. With unstable output of JSON those libraries can't effectively be used.

@steveharter
Copy link
Member

I can see that having deterministic ordering including when round-tripping the JSON would sometimes be useful. However, if the code for the object being serialized can be changed, the corresponding dictionary type could be changed to SortedDictionary<,> easily enough.

Dictionary<,> is non-deterministic but does unofficially maintain insertion order at least until an item is removed.

FWIW JsonObject is deterministic even with removes and does this by maintaining a list internally. It also has a dictionary that helps with perf once there are a few items. New items are added to the end. Users expect this DOM class to be deterministic.

@GSPP
Copy link

GSPP commented Sep 29, 2022

Dictionary<,> is non-deterministic but does unofficially maintain insertion order at least until an item is removed.

I have worked for a company where this was being relied on in about 1000 places. Unfortunately, it doesn't fully work even when just adding. There are some conditions that I forgot. I believe starting at about 1000 items it breaks down. I think there also was a problem with rehashing in case of too many collisions (a security feature).

In any case, I do not believe that relying on this should be encouraged as an unofficial workaround (not that you would have suggested otherwise). There should be a "correct" official guidance on how to achieve this scenario.

I also believe that sorting keys is often inadequate. Yes, it does ensure determinism. But it also alters the natural order of items which could be meaningful. It can be semantically meaningful, but it also might have debugging value.

@Clockwork-Muse
Copy link
Contributor

I also believe that sorting keys is often inadequate. Yes, it does ensure determinism. But it also alters the natural order of items which could be meaningful. It can be semantically meaningful, but it also might have debugging value.

The problem is that usually you want a stable key-based sort specifically for those cases where items were inserted in different orders for whatever reason. Insertion order being irrelevant to the serialized document.

@gregsdennis
Copy link
Contributor

This has come up for me as well while implementing the $json operator for JSON-e (issue/discussion).

While the SortedDictionary approach is a good workaround, I think having a serializer option would be the best bet.

public class JsonSerializerOptions
{
    // ...

    public IComparer<string> KeyOrdering { get; set; }
}

Just as as idea. This way I can specify how I want to sort the keys, be it ordinal, or ignore case, or whatever.

@eiriktsarpalis
Copy link
Member

It sounds like a setting you'd want to specify on individual properties, not as global configuration.

@gregsdennis
Copy link
Contributor

Not sure what you mean, @eiriktsarpalis. I'd definitely want this on an entire single serialization.

@eiriktsarpalis
Copy link
Member

I'm not saying a global flag wouldn't be useful, but I can also see cases where such a blanket policy might be undesirable.

FWIW we support key serialization beyond just strings (admittedly that's not as common though). Shouldn't the setting account for those as well?

@gregsdennis
Copy link
Contributor

Sure. I'm just saying that I think an open-ended option would be a good solution for this. I'm not set on it being IComparer<string>. It could be something like the Encoder option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

6 participants