Fix invalid license yaml files by resolving duplicated keys #2776

Merged: 4 commits, Dec 3, 2021

Conversation

fangxlmr
Contributor

@fangxlmr fangxlmr commented Dec 2, 2021

Fixes #2655.

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely-named feature branch and have no merge conflicts 📁

@fangxlmr
Contributor Author

fangxlmr commented Dec 2, 2021

CC @pombredanne

Member

@pombredanne pombredanne left a comment

Thank you for this... but I do not understand what is the problem you are trying to fix. Furthermore, why are you removing notes that are likely valuable?

@KaitoHH

KaitoHH commented Dec 2, 2021

Thank you for this... but I do not understand what is the problem you are trying to fix. Furthermore, why are you removing notes that are likely valuable?

Duplicate keys are invalid according to the latest YAML spec. Parsing these YAML files with other libraries, for example go-yaml/yaml in Go, fails with invalid-YAML errors, which causes a lot of trouble.
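
A minimal sketch of how such duplicates could be spotted without any YAML parser, assuming the license files are flat mappings whose top-level keys start in column 0. The function name, the regular expression, and the sample data are illustrative assumptions, not part of scancode-toolkit or saneyaml, and the heuristic does not replace a real strict parser.

```python
import re
from collections import defaultdict

# Assumption: a top-level key starts in column 0 and is followed by a colon
# and then whitespace or end of line. Indented lines (nested or multi-line
# values) are ignored on purpose.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z0-9_][A-Za-z0-9_-]*):(?:\s|$)")


def find_duplicate_top_level_keys(text):
    """Return {key: [line numbers]} for top-level keys that appear more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = TOP_LEVEL_KEY.match(line)
        if match:
            seen[match.group(1)].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


# Made-up sample data with a repeated "notes" key.
sample = """\
key: example-license
notes: first note
name: Example License
notes: second note
"""

print(find_duplicate_top_level_keys(sample))  # {'notes': [2, 4]}
```

A parser that tolerates duplicates would silently keep only one of the two `notes` values here, while a strict parser would reject the file.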

@fangxlmr
Contributor Author

fangxlmr commented Dec 2, 2021

FYI. saneyaml allows duplicated keys. From experiment, the rule is "Last one wins".

@fangxlmr
Contributor Author

fangxlmr commented Dec 2, 2021

Furthermore, why are you removing notes that are likely valuable?

Most of the notes I removed are EXACT duplicates, as you can see from the context around them, except for those in ecos.yml and ekioh.yml.

And as the notes in ecos.yml say, it's replaced by ecos-exception-2.0. Do you want me to merge those two notes together? @pombredanne

Member

@pombredanne pombredanne left a comment

@fangxlmr @KaitoHH ah... I got it! great catch!
I have a few nitpicks for your consideration.

src/licensedcode/data/licenses/ekioh.yml
src/licensedcode/data/licenses/ecos.yml
@pombredanne
Member

@fangxlmr re:

FYI. saneyaml allows duplicated keys. From experiment, the rule is "Last one wins".

Yes, and it can also raise errors when there are duplicates... See
https://github.com/nexB/saneyaml/blob/f35baebf2da0fc161f397ff5a9d69d09fd3f4078/src/saneyaml.py#L50

IMHO we should also enforce this by using saneyaml.load(allow_duplicate_keys=False) here https://github.com/nexB/scancode-toolkit/blob/eee864090717611f79e178f3b4f642d108951e68/src/licensedcode/models.py#L261 and there https://github.com/nexB/scancode-toolkit/blob/eee864090717611f79e178f3b4f642d108951e68/src/licensedcode/models.py#L1231 otherwise there surely will be regressions

Just curious: how did you find out about these?
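
The enforcement idea above could be backed by a regression check along these lines. This is only a sketch: `files_failing_strict_load` and its parameters are hypothetical names, and the strict loader is passed in as a callable so that the check itself does not assume anything about saneyaml beyond the `saneyaml.load(allow_duplicate_keys=False)` call mentioned in the comment.

```python
from pathlib import Path


def files_failing_strict_load(data_dir, strict_load):
    """
    Return {path: error message} for every .yml file under data_dir
    that the given strict_load callable rejects by raising an exception.
    """
    failures = {}
    for path in sorted(Path(data_dir).rglob("*.yml")):
        text = path.read_text(encoding="utf-8")
        try:
            strict_load(text)
        except Exception as exc:  # any loader error counts as a failure
            failures[str(path)] = str(exc)
    return failures


# Possible usage, assuming saneyaml is installed and that its load()
# raises on duplicates when called as described in the comment above:
#
#   import saneyaml
#   strict = lambda text: saneyaml.load(text, allow_duplicate_keys=False)
#   assert not files_failing_strict_load("src/licensedcode/data", strict)
```

Passing the loader as an argument keeps the check usable with any parser that raises on duplicate keys.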

Fixes aboutcode-org#2655.

Signed-off-by: Fang Xiaoliang <fangxlmr@foxmail.com>
@fangxlmr
Contributor Author

fangxlmr commented Dec 3, 2021

@fangxlmr re:

FYI. saneyaml allows duplicated keys. From experiment, the rule is "Last one wins".

Yes, and it can also raise errors when there are duplicates... See https://github.com/nexB/saneyaml/blob/f35baebf2da0fc161f397ff5a9d69d09fd3f4078/src/saneyaml.py#L50

IMHO we should also enforce this by using saneyaml.load(allow_duplicate_keys=False) here

https://github.com/nexB/scancode-toolkit/blob/eee864090717611f79e178f3b4f642d108951e68/src/licensedcode/models.py#L261

and there
https://github.com/nexB/scancode-toolkit/blob/eee864090717611f79e178f3b4f642d108951e68/src/licensedcode/models.py#L1231

otherwise there surely will be regressions

Sure. While resolving the duplicate fields in the license YAML files, I found that the same mistake occurs in one rule YAML file as well. Not surprised, though.

Just curious: how did you find out about these?

Sharp eyes :P

@fangxlmr
Contributor Author

fangxlmr commented Dec 3, 2021

@pombredanne Done.

Member

@pombredanne pombredanne left a comment

This is clean and clear. Looking great. Thank you ++ 🙇
Merging

@pombredanne pombredanne merged commit 1f7fafd into aboutcode-org:develop Dec 3, 2021
Successfully merging this pull request may close these issues.

Some license files are not valid yaml due to duplicate fields
3 participants