Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make allowed_deserialization_classes more intuitive #28829

Merged
merged 2 commits into from
Jan 10, 2023

Conversation

bolkedebruin
Copy link
Contributor

Regexps can be tough to get right. Typically someone would like to allow any classes below 'mymodule' to match. For example, 'mymodule.dataclasses' by setting allowed_deserialization_classes to 'mymodule.*'. However this matches everything starting with mymodule, so also mymodulemalicious. This change replaces bare '.' with '..' so it matches the literal '.' as well.

@kaxil


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Regexps can be tough to get right. Typically someone would like
to allow any classes below 'mymodule' to match. For example,
'mymodule.dataclasses' by setting allowed_deserialization_classes
to 'mymodule.*'. However this matches everything starting with
mymodule, so also mymodulemalicious. This change replaces
bare '.' with '\..' so it matches the literal '.' as well.
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@bolkedebruin bolkedebruin merged commit 12adb5e into apache:main Jan 10, 2023
@bolkedebruin bolkedebruin deleted the intuitive branch January 10, 2023 20:17
@kaxil kaxil added this to the Airflow 2.5.2 milestone Jan 15, 2023
@pierrejeambrun pierrejeambrun added the type:misc/internal Changelog: Misc changes that should appear in change log label Feb 27, 2023
@pierrejeambrun
Copy link
Member

Depends on #28829

@barrysteyn
Copy link

barrysteyn commented Sep 4, 2023

@kaxil @bolkedebruin

This change has totally broken things... for example, I have a class that needs to be on the allowed_deserialization_classes list... the error message I get is libs.dynamodb.models.ideas.IdeasModel is not on the allowed list.. I do understand that this is regex that is being matched against, however altering the regex is not (in my opinion) good. Firstly, people who are doing this are technical and should know that a . matches anything.

But things seem broken... For example, I have tested things by putting the following in theallowed_deserialization_classes (remember I am testing libs.dynamodb.models.ideas.IdeasModel for a match):

  1. libs.dynamodb.models.ideas.IdeasModel --> False (not expected)
  2. libs[.]dynamodb[.]models[.]ideas[.]IdeasModel --> True (weird)
  3. libs.dynamodb.* --> False
  4. libs[.]dynamodb.* --> True
  5. libs.* --> True (this mimic'd the unit test of this change).

It is obvious that the replace . with .. is the culprit. Instead, I suggest we undo the changes, and rather document in detail on how to use the regex.

@barrysteyn
Copy link

So, this is the regex replace for this PR:
p = re.sub(r"(\w)\.", r"\1\..", p)

I am no expert, but it seems to me that it replaces word. with word\.. - I think the idea is to match ., however all that does is ensure that there must be a . followed by any character... That is why things are failing...

I don't advocate for changing anything, but if you really want to match just ., then the regex should be p = re.sub(r"(\w)\.", r"\1[.]", p)

@barrysteyn
Copy link

At the very least, if I am using things incorrectly, then add more tests to explain how it should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:serialization type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants