Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize NSVCA fields on YAML parser #3346

Merged
merged 3 commits into from
Dec 12, 2023

Conversation

pedro-psb
Copy link
Member

This PR adds support for reading the literal value (non-casted value) of an unquoted number from a YAML file in the parse_modular function.

This is a customization of the YAML spec, but is the expected behavior for this context (see #3285)

  • Change yaml.safe_load to yaml.load(..., Loader=yaml.BaseLoader)
  • Add unit test to parse_modular for the specific issue
  • Small refactors

@github-actions github-actions bot added the multi-commit Added when a PR consists of more than one commit label Dec 1, 2023
@pedro-psb pedro-psb changed the title Fix: Sanitize NSVCA fields on YAML parser Sanitize NSVCA fields on YAML parser Dec 1, 2023
@pedro-psb
Copy link
Member Author

@dralley The test is failing with the same permission issue I was having locally, so I guess I won't open an issue about that on oci-env.

I'm not sure how to approach this exactly, but I can do some exploration work next week.

usr/local/lib/python3.8/site-packages/pulp_rpm/app/modulemd.py:192: in parse_modular
    for module in split_modulemd_file(file):
usr/local/lib/python3.8/site-packages/pulp_rpm/app/modulemd.py:71: in split_modulemd_file
    with tempfile.TemporaryDirectory(dir=".") as tf:
usr/lib64/python3.8/tempfile.py:780: in __init__
    self.name = mkdtemp(suffix, prefix, dir)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

suffix = '', prefix = 'tmp', dir = '.'

    def mkdtemp(suffix=None, prefix=None, dir=None):
        """User-callable function to create and return a unique temporary
        directory.  The return value is the pathname of the directory.
    
        Arguments are as for mkstemp, except that the 'text' argument is
        not accepted.
    
        The directory is readable, writable, and searchable only by the
        creating user.
    
        Caller is responsible for deleting the directory when done with it.
        """
    
        prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
    
        names = _get_candidate_names()
        if output_type is bytes:
            names = map(_os.fsencode, names)
    
        for seq in range(TMP_MAX):
            name = next(names)
            file = _os.path.join(dir, prefix + name + suffix)
            _sys.audit("tempfile.mkdtemp", file)
            try:
>               _os.mkdir(file, 0o700)
E               PermissionError: [Errno 13] Permission denied: './tmpu6rk65ns'

@dralley
Copy link
Contributor

dralley commented Dec 4, 2023

Have you verified that the tests fail without this change?

@pedro-psb
Copy link
Member Author

Have you verified that the tests fail without this change?

Yes. The failure was the one we expected

@pedro-psb pedro-psb force-pushed the fix/sanitize-nsvca-fields branch 3 times, most recently from 5f0c091 to d34ad7e Compare December 4, 2023 16:12
@dralley
Copy link
Contributor

dralley commented Dec 4, 2023

Ok, so there is an exception:

Almost everything is text fields, but we do have a boolean or two, and some JSON that need to be stored as such. The JSON might be OK currently, but it looks like the booleans are not.

The CI errors are referring to this field: https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/models/modulemd.py#L30

So we need to find a way to make sure floats are parsed as strings, but not booleans and such.

@pedro-psb
Copy link
Member Author

So we need to find a way to make sure floats are parsed as strings, but not booleans and such.

I was thinking about doing a simple bool casting, but still, other fields inside those Json could end up messed up.

I'll go back to the more conservative approach of the custom Loader, which provides per-field control over not casting it. Then I'll add only the NSVCA fields as fields which should "bypass YAML" (thus, always be str), as we know there are no type problems with that. From our previous discussion on matrix, I believe this is safe (I can further look into that).

Wdyt?

"""Custom Loader that preserve unquoted float in specific fields"""

# Field to preserve (will bypass yaml casting)
PRESERVED_FIELDS = ("name", "stream", "version", "context", "arch")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this ought to handle the stream field on ModulemdDefaults and ModulemdObsoletes as well, but we ought to make sure it's tested

https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/models/modulemd.py#L101

We'll talk about this a bit more on a call

@pedro-psb pedro-psb force-pushed the fix/sanitize-nsvca-fields branch 3 times, most recently from 69cfd7b to d1f7fe6 Compare December 7, 2023 17:27
@@ -90,8 +89,7 @@ def test_parse_modular_preserves_literal_unquoted_values_3285(tmp_path):
with open(file_name, "w") as file:
file.write(sample_file_data)

mock_obj = SimpleNamespace(path=str(file_name))
all, defaults, obsoletes = parse_modular(mock_obj)
all, defaults, obsoletes = parse_modular(str(file_name))
Copy link
Contributor

@dralley dralley Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to get rid of the str() around file_name, I'm not sure why that was here before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The str is there because I was using a Path object before. Will remove it



class ModularYamlLoader(yaml.SafeLoader):
"""Custom Loader that preserve unquoted float in specific fields"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave a bit more detail, including the issue number where this was originally encountered, and perhaps point to the libmodulemd equivalent code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, where this code in particular came from

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

# parse_modular changes the structure and key names for obsoletes
assert modulemd_obsoletes["module_name"] == "perl"
assert modulemd_obsoletes["module_stream"] == "5.30"
assert modulemd_obsoletes["obsoleted_by_module_name"] == "5.40"
Copy link
Contributor

@dralley dralley Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be obsoleted_by_module_stream?

edit: oh look, a bug

the actual metadata is unaffected since we use raw yaml snippets. Our other tests are primarily interested in this, and so it wasn't caught. Only the metadata Pulp itself keeps for reference is wrong

This is a pretty minor bug, all things considered, because module obsoletes are basically unused and even less often would this metadata in question be referenced. However we should probably fix it anyways.

Copy link
Contributor

@dralley dralley Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be

assert modulemd_obsoletes["obsoleted_by_module_name"] == "perl"
assert modulemd_obsoletes["obsoleted_by_module_stream"] == "5.40"

Let's also fill these tests out with a few more fields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found "obsoleted_by_module_name" odd, but just thought, "it is what it is".
Will fix the constant name and test more fields

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Nice job. You can trust your instincts, though! Just because it works in a certain way currently, doesn't mean it's right :)

@pedro-psb pedro-psb force-pushed the fix/sanitize-nsvca-fields branch 2 times, most recently from 3caeb02 to a9d5d95 Compare December 11, 2023 13:40
@pedro-psb
Copy link
Member Author

When adding more asserts, I've found that modulemd_defaults does not use modified. I've left it as is because it does not seem useful to add new things if that's not obviously wrong (as the module_name thing).

@pedro-psb
Copy link
Member Author

pedro-psb commented Dec 11, 2023

The failure is due to the fix in obsoletes constant name, I'm taking a look into the test (will add those changes to another commit)

@dralley
Copy link
Contributor

dralley commented Dec 11, 2023

Most likely, the constant in the test is wrong. What I did was take the output from the initial implementation, check it by eye to ensure it was correct, and then commit that. But it's possible I missed something, in which case the constant would be incorrect.

@pedro-psb pedro-psb force-pushed the fix/sanitize-nsvca-fields branch 2 times, most recently from 85b925b to 3b40687 Compare December 11, 2023 18:16
- Replace yaml.safe_load by YAML loader that prevents unquoted numbers
  in NSVCA fields to loose trailing/heading zeros.
- Add unit test to `parse_modular` for the specific issue

closes pulp#3285
- Pass only necesary data to parse_modular (the path string)
- Improve some docstrings and simple types

[noissue]
modulemd_obsoletes["message"]
== "Module stream perl:5.30 is no longer supported. Please switch to perl:5.32"
)
assert modulemd_obsoletes["obsoleted_by_module_stream"] == "5.40"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add obsoleted_by_module_name also

Copy link
Contributor

@dralley dralley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! This is good to go with one last small addition to the test

OBSOLETE_BY_STREAM attr was named as "obsoleted_by_module_name" instead
of "obsoleted_by_name_stream". Fix tests accordingly.

[noissue]
@dralley dralley merged commit 37b11cb into pulp:main Dec 12, 2023
16 checks passed
@pedro-psb pedro-psb deleted the fix/sanitize-nsvca-fields branch December 12, 2023 00:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-commit Added when a PR consists of more than one commit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants