
duplicate keys are lost #34

Open
Lx opened this issue Apr 11, 2020 · 10 comments

@Lx

Lx commented Apr 11, 2020

The following JSON has two "1" keys, which is not great but is technically allowed:

{
    "1": 3,
    "2": 2,
    "1": 1,
    "4": 4
}

Sorting this JSON causes the "1": 3 key/value to be lost:

{
    "1": 1,
    "2": 2,
    "4": 4
}
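For reference, the collapse can be reproduced directly with JSON.parse before any sorting is involved (a minimal Node.js sketch):

```javascript
// JSON.parse keeps only the last value for each repeated key, so a
// sorter built on parse/stringify never even sees "1": 3.
const text = '{ "1": 3, "2": 2, "1": 1, "4": 4 }';
const obj = JSON.parse(text);
console.log(Object.keys(obj)); // [ '1', '2', '4' ]: three keys survive, not four
console.log(obj["1"]);         // 1 (last value wins)
```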

My preference would be for the duplicates to be retained in their original order:

{
    "1": 3,
    "1": 1,
    "2": 2,
    "4": 4
}
@richie5um
Owner

richie5um commented Apr 11, 2020 via email

@Lx
Author

Lx commented Apr 12, 2020

Tough break! It makes sense that a JS parse-then-stringify round trip discards prior key/value pairs with duplicate keys, though.

I realise that not having duplicate keys in the input JSON is the easiest solution, but I don't have control over the generation of the input JSON. In fact, I installed this extension because I knew I had duplicates, and I figured that at least sorting them so that they were together would better enable me to deal with them before doing any further processing on that JSON.

I suppose the ideal solution would be not to parse as a JavaScript object (which destroys duplicates without warning), but instead as an abstract syntax tree (which retains everything). The Babel parser might be a good choice.

With an AST you could easily get all of the keys and values (including duplicates) as an ordered list, sort the keys (retaining value order for duplicate keys), and then output the AST to code again via something like the Babel generator. You could probably totally ignore the subtrees generated for the values, merely passing them to the generator along with the sorted keys.
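A minimal sketch of that pipeline, assuming the pairs have already been extracted by an AST-style parser (the pairs, sortPairs, and serialize names here are illustrative, not part of the extension):

```javascript
// Sort key/value pairs while preserving duplicates. Array.prototype.sort
// is guaranteed stable since ES2019, so duplicate keys keep their
// original relative order, matching the preferred output above.
function sortPairs(pairs) {
  return [...pairs].sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Re-serialize the pairs directly, never materializing a JS object
// (which is where duplicates would be destroyed).
function serialize(pairs) {
  const body = pairs
    .map(([k, v]) => `    ${JSON.stringify(k)}: ${JSON.stringify(v)}`)
    .join(",\n");
  return `{\n${body}\n}`;
}

const pairs = [["1", 3], ["2", 2], ["1", 1], ["4", 4]];
console.log(serialize(sortPairs(pairs)));
// {
//     "1": 3,
//     "1": 1,
//     "2": 2,
//     "4": 4
// }
```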

At minimum, it would be great if the extension could at least detect duplicates and warn through some pop-up that they have been removed after the sort, which currently doesn't happen for understandable reasons. I suspect that ASTs would still be the only way around this, too.
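Once the pairs are available as a list, detecting duplicates is straightforward; a sketch with a hypothetical duplicateKeys helper:

```javascript
// Return the keys that appear more than once, so the extension could
// warn before (or instead of) silently dropping them.
function duplicateKeys(pairs) {
  const seen = new Set();
  const dups = new Set();
  for (const [key] of pairs) {
    (seen.has(key) ? dups : seen).add(key);
  }
  return [...dups];
}

console.log(duplicateKeys([["1", 3], ["2", 2], ["1", 1], ["4", 4]])); // [ '1' ]
```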

@Shoelace

If you go ahead with this, having an option to specify a secondary sort would be nice, i.e. sort by key, or by key then value.

@guyisra

guyisra commented Oct 5, 2020

Just found out about this issue... now I realize that data could have been lost due to this. I'm deleting this extension. It should just sort, not remove data!

@vanowm

vanowm commented Aug 26, 2023

Technically no data is lost, since only the last duplicate item will be available to the application that uses the JSON. From ECMA-262, the "ECMAScript® Language Specification":

In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.

In other words: the last value wins.

So when the extension sorts the JSON, it does so without the values that wouldn't have been accessible in the first place; hence, no data is lost...

@Lx
Author

Lx commented Aug 26, 2023

Technically no data is lost…

If I feed 10 key/value pairs into a system and only 8 pairs come out, then I'm sure we would both agree that data has been lost. I therefore believe your intended point might actually be: "technically some data deserves to be lost".

I agree with you that the original JSON data shouldn't be using duplicate keys. However, in the real world, we don't always get the luxury of perfect input data. My example data was contrived for the purpose of this bug report. My actual, real-world data consisted of thousands of lines, and I really needed to identify and specially handle the data with duplicate keys. Plus, I was working in a text editor; as such, I would expect a general-purpose JSON sorter inside a text editor not to reduce the size of any data set it processes.

Also, it's technically permissible to use duplicate keys per RFC 8259 (The JavaScript Object Notation (JSON) Data Interchange Format), which says:

The names within an object SHOULD be unique.

…and "SHOULD" is not "MUST" per RFC 2119 (Key words for use in RFCs to Indicate Requirement Levels). In other words, while not ideal, JSON with duplicate keys is still valid JSON.

May I ask what the intention of your observation is? Are you advocating for this matter to be closed without action?

@vanowm

vanowm commented Aug 27, 2023

Well, I do believe this is an invalid issue, because even though duplicate keys are technically allowed, unless there is a JSON parser that allows retrieving data from duplicate keys, I don't see any practical reason to support such an edge case.

Just try the VS Code built-in JSON: Sort Document command: it doesn't remove duplicate keys, but the parsed JSON will differ from the original because the command may change the order of duplicate keys (I'd rather consider that corruption):

{
    "1": 1,
    "1": 3,
    "2": 2,
    "4": 4
}
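With last-value-wins semantics, reordering the duplicates really does change what a consumer reads, which is why calling it corruption is fair; a quick sketch:

```javascript
// Same pairs, different order of the duplicate "1" entries: the value a
// consumer actually sees flips from 1 to 3.
const original  = '{ "1": 3, "2": 2, "1": 1, "4": 4 }';
const reordered = '{ "1": 1, "1": 3, "2": 2, "4": 4 }';
console.log(JSON.parse(original)["1"]);  // 1
console.log(JSON.parse(reordered)["1"]); // 3
```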

@Lx
Author

Lx commented Aug 27, 2023

The practical reasons in my view are:

  • This is a tool primarily intended for use by a programmer (or someone with skills beyond a general end user).
  • Programmers (and similar) are probably using this tool as part of problem diagnosis/correction. (I doubt people are often sorting JSON just for fun, or without the intention of further analysis.)
  • In other words, this tool is getting used in cases where data can be expected to be non-conformant.
  • In my case (and seemingly @guyisra's), this is exactly why we were trying to use this tool.

The built-in JSON: Sort Document command wasn't available when this issue was opened. Now that it exists though, perhaps this entire extension can be considered obsolete.

@richie5um
Owner

richie5um commented Aug 27, 2023

Hey all. Author here…

The original intention of this extension was for checked-in JSON files - we were using them for localisation. In their original form they had a ‘random’ order, which made doing PRs very time-consuming. Sorting the output made PRs super simple.

I now mostly use it for checked-in JSON (now increasingly containing JSON comments) and quickly making JSON output from APIs more readable for diagnostics.

In both scenarios, having unreachable/unpredictable JSON keys (due to duplicates and order) is not helpful, IMO.

I haven’t yet used the built in approach due to muscle memory.

The source for this extension is available if you want to fork and create one that handles duplicates - and I’ll happily answer questions if the code isn’t clear. That isn’t a passive-aggressive reply :), more just to be clear that I don’t (currently) have the need/time to look into this issue further.

HTH.

:-)

@vanowm

vanowm commented Aug 27, 2023

Now I'm just curious: how do you guys create a JSON file with duplicate keys in the first place? Is it a simple append-to-string method, with the string then saved as a file?
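One plausible answer (purely illustrative): JSON built by string concatenation, rather than by serializing an object, will happily emit whatever keys the source rows contain:

```javascript
// Naive append-to-string generation: nothing deduplicates the keys.
const rows = [["1", 3], ["2", 2], ["1", 1], ["4", 4]];
const text = "{ " + rows.map(([k, v]) => `"${k}": ${v}`).join(", ") + " }";
console.log(text); // { "1": 3, "2": 2, "1": 1, "4": 4 }
```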
