Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance documentation of checksums easyconfig parameter #104

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Jan 20, 2023

Add more examples of which formats are possible.
Also contains not-yet implemented or broken features!

This is related to easybuilders/easybuild-framework#4142, easybuilders/easybuild-framework#4177, easybuilders/easybuild-framework#4150, easybuilders/easybuild-framework#4159, easybuilders/easybuild-framework#4164

We have https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/test/framework/type_checking.py#L184-L225

and https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/test/framework/type_checking.py#L709-L719

which tests various formats for checksums.

We have code in get_checksum_for which supports checksums being a dict instead of a list: https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/easybuild/framework/easyblock.py#L369

However this is not supported by the type-checking code and hence to_checksums will be called which would iterate over the dicts keys mistaking them for checksums!!! https://github.com/easybuilders/easybuild-framework/blob/e3681ae53628400096f3e29d58e787bf1173f27b/easybuild/framework/easyconfig/types.py#L475

Furthermore this is made more difficult by the YEB files which use a YAML format where tuples are not supported. E.g. a test file has this:

checksums: [[
    'be662daa971a640e40be5c804d9d7d10',  # default [MD5]
    '44332000aa33b99ad1e00cbd1a7da769220d74647060a10e807b916d73ea27bc',  # default (SHA256)
    ['adler32', '0x998410035'],
    ['crc32', '0x1553842328'],
    ['md5', 'be662daa971a640e40be5c804d9d7d10'],
    ['sha1', 'f618096c52244539d0e89867405f573fdb0b55b0'],
    ['size', 273],
]]

This is a checksum entry for a single file. Obviously the inner-most 2-element lists should be converted to tuples and the outer-most list should be a list. But it isn't clear what to do with the 2nd list: Are those alternative checksums where only one needs to match or additional checksums that all need to match? Both cases should be supported.

I would argue that an additional level can be used: checksums: [[['mainchecksum', 'altchecksum']]] The type conversion code can deduce that after the 2nd level of lists only tuples may be specified as a list (i.e. an AND) inside a list (already an AND) doesn't make sense, so the 2nd level list is an AND consisting only of a single element, which is redundant but ok.

And finally specifying None in a dict currently yields the same error as not specifying it, see easybuilders/easybuild-framework#4142

So things to decide:

  • Do we allow checksums to be a dict? This would make the base check (len(checksums) = len(srcs+patches)) impossible. So I'd keep it a list.
  • What do we allow as values of dicts? Should we support the logical AND/OR semantic possible or only strings/type-value-tuples?
  • I'd always disallow putting a dict anywhere inside a dict, that's what I did in Fix to_checksums with None values in dicts and recursion easybuild-framework#4159 as I see no reason why you'd want that. Or is there any?
  • How do we handle None? In a tuple it doesn't make sense, so I'd disallow it there. For a dict should we differ between having a key-None entry or not having any entry especially related to --enforce-checksums?

I see 2 ways here:

  1. dicts can only contain checksums/type-checksum-tuples or None
  2. allow full power, so that the value of a dict entry is handled the same as a "top-level entry" except it disallows more dicts.

As for the missing-key case: I'd handle it the same as not specifying it: Error when enforce_checksums is active else treat as matched. I'd treat it as an error such that typos can be detected as otherwise e.g. 'src_X86_64.tgz' might be used instead of 'src_x86_64.tgz' looking valid

Port of easybuilders/easybuild#853 to the new markdown format

docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
@Flamefire
Copy link
Contributor Author

@smoors Thanks for the review. Besides the stylistic issues I wanted to put the general handling up for discussion, see the 2nd part of the description (starting at "So things to decide:")

So those questions should be answered first, maybe in the next confcall (which I can't attend unfortunately)

docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
docs/writing-easyconfig-files.md Outdated Show resolved Hide resolved
@Flamefire
Copy link
Contributor Author

@boegel Thanks for the review, however I'd like a decision first on "things to decide:" in the OP.

Also are you sure about e.g. #104 (comment) ? 0123456789...abcdef isn't a valid checksum either and using 'oldchecksum', 'newchecksum' is IMO a good example on the reason/situation where you may want to use that format.

@boegel boegel changed the title Enhance documentation of checksums EC parameter enhance documentation of checksums easyconfig parameter Sep 13, 2023
@Flamefire Flamefire marked this pull request as ready for review November 16, 2023 08:27
@Flamefire
Copy link
Contributor Author

I updated this with the review comments applied especially as currently not being able to use alternative checksums in dict entries came up, so this should be resolved timely. Decisions I basically assumed and documented which I'd like to outline:

  • The checksums parameter cannot be a dict, but must be a list such that the number of checksums can be easily checked against the number of files
  • Dict entries are basically the same as specifying the value directly except of course they are selected based on the filename
  • So list, tuples and lists of tuples are allowed as entries in a dict, as is "None"
  • dicts in dicts are not allowed

Missing from documentation and possibly implementation:
Having None in a tuple or list should be an error as it is either superflous or makes the alternatives always pass

Flamefire and others added 4 commits July 12, 2024 14:13
Add more examples of which formats are possible.
Also contains not-yet implemented or broken features.
Co-authored-by: Sam Moors <smoors@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants