Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surprising behaviour with compileall -s STRIPDIR parameter #105931

Closed
hetmankp opened this issue Jun 20, 2023 · 8 comments
Closed

Surprising behaviour with compileall -s STRIPDIR parameter #105931

hetmankp opened this issue Jun 20, 2023 · 8 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@hetmankp
Copy link
Contributor

hetmankp commented Jun 20, 2023

The '-s' stripdir parameter to compileall has very unusual semantics that could lead to unexpected and potentially confusing outcomes.

For example, when run like this: python -m compileall -s/path/to/build/dst -p /lib /path/to/build/src/file.py

The source file path written to the compiled file will be: /lib/src/file.py. In other words the source file path is partially stripped even though the strip prefix doesn't fully match the source file path. Since this was introduced in #16012 I'd like to get @frenzymadness to comment if this was intentional, because it seems that it could be a source of really confusing bugs for end users who accidentally misspell something in their invocation and get a partially stripped result.

It gets weirder than that however, when run like this: python -m compileall -s/path/to/another/src -p /lib /path/to/build/src/file.py

The source file path written to the compiled file will be: /lib/build/file.py. The matching directories are removed around the non-matching ones. Even if there was some reason for the first behaviour it seems this second one is absolutely not a good idea.

I'm happy to fix this once it is confirmed this behaviour is unintentional.

Linked PRs

@hetmankp hetmankp added the type-bug An unexpected behavior, bug, or error label Jun 20, 2023
@frenzymadness
Copy link
Contributor

Hello @hetmankp. Thanks for the report and sorry that it took me so long to get to it. You are right that the behavior is surprising. We never thought about it this way because we use compileall in the build system where all the building happens in the buildroot which is an absolute directory the same for all files and we just strip it and prepend / to make the path absolute after that.

I'm looking at the code right now and it seems that we can implement two possible fixes:

  1. Strip all matching parts left to right and stop when the first difference is found.
  2. Refuse to strip anything if the stripdir doesn't fully match the path.

The first one should be simple to implement. I think that an else branch and break in the for loop is all we need in this part of compileall:

    if stripdir is not None:
        fullname_parts = fullname.split(os.path.sep)
        stripdir_parts = stripdir.split(os.path.sep)
        ddir_parts = list(fullname_parts)

        for spart, opart in zip(stripdir_parts, fullname_parts):
            if spart == opart:
                ddir_parts.remove(spart)

        dfile = os.path.join(*ddir_parts)

Feel free to open a PR.

@frenzymadness
Copy link
Contributor

One more note, the behavior starts to be surprising when both paths have the same folders at the same positions.

This is fine:
fullname: /tmp/a/b/c/d/foo.py
stripdir: /tmp/a/d/
result: b/c/d/foo.py

But this is not:
fullname: /tmp/a/b/c/d/foo.py
stripdir: /tmp/a/z/x/d/
result: b/c/foo.py

Option 2 mentioned above have two sub-options:
a. Ignore stripdir arg if it doesn't match with path to the file.
b. Raise an error or a warning.

@hetmankp
Copy link
Contributor Author

hetmankp commented Aug 9, 2023

Hi @frenzymadness , I think I would personally lean towards solution #2. I still can't think of any situations where solution #1 would be useful (unless there's something in your build system that requires it) since any build system should take care of providing the correct paths anyway, but what it can do is introduce subtle hard to detect bugs. On the other hand, solution #2 will create a glaringly obviously wrong outcome which would be noticed much more quickly and so likely fixed.

Now as for 2a vs 2b. I've never worked with a build system that would exhibit behaviour 2b, it does feel more idiomatically pythonic but I wonder if it could cause breakages in existing build systems which could still limp along producing something useful even if it had weird paths embedded in it. I think 2a might be the safer option no to break existing systems too much and not cause end user complaints. Maybe 2b could be passable if we also added a command line option to suppress such errors so people who need a quick hack around it can have one, though I wonder if that's just adding too much cruft to the system.

I'm happy to do a PR any way we decide to go, let me know if you have any further thoughts once you've had a chance to sleep on it.

@frenzymadness
Copy link
Contributor

I've discussed this with my colleague and we think that the best course of action is to implement 2a together with 2b. That means that if fullname doesn't start with stripdir (full match from the beginning of the path), ignore stripdir for that file entirely and raise a warning about it.
compileall already has the command line option for suppressing output so this warning can be muted by -qq.

@hetmankp
Copy link
Contributor Author

hetmankp commented Aug 9, 2023

A warning sounds like a great middle ground, I like it. I'll try to put something together in the next few days time permitting.

@frenzymadness
Copy link
Contributor

@hetmankp will you have some time to tackle this issue? I can do it, if you want to. It'd be better sooner than later for me until I lose the context.

hetmankp added a commit to hetmankp/cpython that referenced this issue Aug 30, 2023
This resolves issue python#105931. Currently the '-s STRIPDIR' option on
compileall can lead to some surprising results as it only strips away
path components that match, but leaves alone the non-matching ones
interspersed in between.

For example, with: python -m compileall -s/path/to/another/src
/path/to/build/src/file.py

The resulting written path will be: build/file.py

This fix only strips directories that are a fully matching prefix of the
source path. If a stripdir is provided that is not a valid prefix, a
warning will be displayed (which can be silenced with '-qq').
@hetmankp
Copy link
Contributor Author

Hi @frenzymadness , sorry for the rather long delay (it was just bad timing for me).

I've submitted a pull request, please take a look and let me know if everything looks correct from what you can see.

Note that the new implementation also disallows mixing of absolute with non-absolute paths when it comes to the stripdir and the full path, which I believe is the correct behaviour.

Also let me know if the formatting for the warning message is what you had in mind. When compiling a whole directory, the warning will be issued for every single file so I figured including the actual paths in question in the warning message made sense, not sure if we want to do anything more fancy than that (it would require duplication of some of the stripdir logic for example).

encukou pushed a commit that referenced this issue Oct 23, 2023
Before, the '-s STRIPDIR' option on
compileall lead to some surprising results as it only strips away
path components that match, but leaves alone the non-matching ones
interspersed in between.

For example, with: python -m compileall -s/path/to/another/src
/path/to/build/src/file.py

The resulting written path will be: build/file.py

This fix only strips directories that are a fully matching prefix of the
source path. If a stripdir is provided that is not a valid prefix, a
warning will be displayed (which can be silenced with '-qq').
@encukou
Copy link
Member

encukou commented Oct 23, 2023

Thank you @hetmankp for the report & fix, and @frenzymadness for the review!

@encukou encukou closed this as completed Oct 23, 2023
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…GH-108671)

Before, the '-s STRIPDIR' option on
compileall lead to some surprising results as it only strips away
path components that match, but leaves alone the non-matching ones
interspersed in between.

For example, with: python -m compileall -s/path/to/another/src
/path/to/build/src/file.py

The resulting written path will be: build/file.py

This fix only strips directories that are a fully matching prefix of the
source path. If a stripdir is provided that is not a valid prefix, a
warning will be displayed (which can be silenced with '-qq').
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…GH-108671)

Before, the '-s STRIPDIR' option on
compileall lead to some surprising results as it only strips away
path components that match, but leaves alone the non-matching ones
interspersed in between.

For example, with: python -m compileall -s/path/to/another/src
/path/to/build/src/file.py

The resulting written path will be: build/file.py

This fix only strips directories that are a fully matching prefix of the
source path. If a stripdir is provided that is not a valid prefix, a
warning will be displayed (which can be silenced with '-qq').
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants