Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent matching bogus parent gitignores #1643

Merged
merged 5 commits into from
Nov 10, 2024

Conversation

jameshilliard
Copy link
Contributor

@jameshilliard jameshilliard commented Jul 26, 2024

If our project root directory is matched by an exclude pattern we should assume the pattern is invalid as our project root is likely in an ignored build directory of another project.

Should fix: #1641

This is essentially a port of the same technique I used to fix a similar poetry core bug.

@jameshilliard jameshilliard force-pushed the validate-exclude-patterns branch 2 times, most recently from 3e12180 to 891d20a Compare July 26, 2024 09:24
@jameshilliard
Copy link
Contributor Author

@ofek Think you could take a look at this?

@jameshilliard
Copy link
Contributor Author

FYI tests on master are currently broken same as here so I don't think this is introducing any regressions.

@mv1005
Copy link

mv1005 commented Sep 27, 2024

Hi there,

I just spent some time on trying to fix my project not to produce empty wheels until I stumbled over this issue.

Why I am writing here is because it was surprising me that such simple condition (at least as appearing to me) like "do not cross project root when scanning for certain files" really is a concern here.

My first thought was: Wouldn't it be enough to simply stop scanning upwards the file tree when we encounter a pyproject.toml or a hatch.toml? As I understand, those files mark the project root. My suggestion obviously is too strict if there is any valid use case for having (relevant) .gitignore-files outside the project root. Are there any?

Markus

@jameshilliard
Copy link
Contributor Author

Wouldn't it be enough to simply stop scanning upwards the file tree when we encounter a pyproject.toml or a hatch.toml?

Technically that would also prevent the issue I was seeing...however that would differ from git's normal gitignore behavior so it may not be desirable.

My suggestion obviously is too strict if there is any valid use case for having (relevant) .gitignore-files outside the project root. Are there any?

If the python project root is in a subdirectory of a git project root directory it's possible that someone may want to use the gitignore above the python project root subdirectory but within the git project root directory to ignore specific files, so there may be a use case here for wanting to try and replicate git's behavior more closely.

The logic of this check as I implemented it should only filter out the use of gitignore in cases where the python project root directory is itself entirely ignored(a case where it is effectively guaranteed that the python project is not tracked by git at all), while still allowing for files within the project root directory to be ignored if needed using gitignores above the project root.

@mv1005
Copy link

mv1005 commented Sep 27, 2024

I see that there are valid use cases for having a relevant .gitignore outside the project root.

In my case, the situation was a bit special, since the python project I was working on was completely unrelated to any VCS. It was just a random project sitting in my user's home.

Ages ago, I decided (and almost forgot about) to put some of the config files in my home under version control with git (literally making my whole home a git repo). Now, to not have random stuff under version control, I used kind of inverse git-ignore logic with a .gitignore like this:

*
!.gitignore
!.bashrc
!.profile
!.bash_aliases
!.bash_profile
!.gitconfig
!.gitpwd.sh
!.Xmodmap
!.vimrc
!.pinerc
!.Xdefaults
[...]

I absolutely admit that this is a very rare case, and I also know that there are better/other ways today to manage user configuration. I just wanted to point out that there might be use cases that involve .gitignore files that are completely unrelated to the python project in question. It is not obvious and at least to me really unexpected that hatch even looks at files outside the project root. That was the reason for me having hard times to figure out what's going on here, and still it feels somewhat strange that "some entity" like hatch crawls up the whole file system and considers any .gitignore it finds.

What if in some completely unrelated .gitignore is a pattern like this?

*.py

I assume this still would break things.

What would have helped me a lot to spot the issue in a straighter fashion:

  • Having a kind of output from hatch that shows me what files are being considered
  • Having an option to disable (maybe even by default) the consideration of files above the project root (depends on how regular this use case is)

@jameshilliard
Copy link
Contributor Author

In my case, the situation was a bit special, since the python project I was working on was completely unrelated to any VCS. It was just a random project sitting in my user's home.

This is designed to handle cases like that, it's designed for really any case where the project root is itself ignored. Does it not work for you?

I just wanted to point out that there might be use cases that involve .gitignore files that are completely unrelated to the python project in question.

I mean, that's effectively what this was designed for, in my case I have a fully ignored python project root build directory inside a meta build system.

It is not obvious and at least to me really unexpected that hatch even looks at files outside the project root.

Yeah, it's not very obvious, so much so that this sort of behavior of matching bogus parent gitignore files was even independently re-implemented in virtually every newer pep517 build backend in some form or another. I managed to fix most of them already but hatch and flit(but not flit-core so there's much less real world issues for downstream integrators) AFAIU are the only ones remaining with fixes pending for this type of bug.

What if in some completely unrelated .gitignore is a pattern like this?

*.py

I assume this still would break things.

If the project root is ignored then this change should disable the use of .gitignore files entirely so having additional more specific ignores like *.py shouldn't be an issue, if you have a parent gitignore which does not ignore the project root then it's assumed that the project is actually tracked by git(which should replicate git's behavior) and thus the ignores will be used. I think this results in the least surprising and expected behavior.

@mv1005
Copy link

mv1005 commented Sep 28, 2024

This is designed to handle cases like that, it's designed for really any case where the project root is itself ignored. Does it not work for you?

Up to know, I just assumed that it would work 🙂. But to confirm that, I ran a quick test by installing your version directly from git inside a virtualenv.

$ pip freeze | grep hatch
hatch @ git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6
hatchling==1.25.0
$ hatch --version
Hatch, version 1.12.1.dev18
$ hatch build -t wheel
dist/photo_importer-0.0.1-py3-none-any.whl
$ 7za l dist/photo_importer-0.0.1-py3-none-any.whl 
Listing archive: dist/photo_importer-0.0.1-py3-none-any.whl
[...]

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2020-02-02 00:00:00 .....         1559          593  photo_importer-0.0.1.dist-info/METADATA
2020-02-02 00:00:00 .....           87           86  photo_importer-0.0.1.dist-info/WHEEL
2020-02-02 00:00:00 .....           48           49  photo_importer-0.0.1.dist-info/entry_points.txt
2020-02-02 00:00:00 .....         1097          642  photo_importer-0.0.1.dist-info/licenses/LICENSE.txt
2020-02-02 00:00:00 .....          437          267  photo_importer-0.0.1.dist-info/RECORD
------------------- ----- ------------ ------------  ------------------------
2020-02-02 00:00:00               3228         1637  5 files

As you can see, there are no package files contained in the wheel (there should be a package named photo_importer).

If I either remove my .gitignore file from my home or set the option

[tool.hatch.build]
ignore-vcs = true

in pyproject.toml I get a wheel with the expected contents:

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2020-02-02 00:00:00 .....          127          121  photo_importer/__about__.py
2020-02-02 00:00:00 .....          105           99  photo_importer/__init__.py
2020-02-02 00:00:00 .....         8090         2592  photo_importer/phi.py
2020-02-02 00:00:00 .....         1559          593  photo_importer-0.0.1.dist-info/METADATA
2020-02-02 00:00:00 .....           87           86  photo_importer-0.0.1.dist-info/WHEEL
2020-02-02 00:00:00 .....           48           49  photo_importer-0.0.1.dist-info/entry_points.txt
2020-02-02 00:00:00 .....         1097          642  photo_importer-0.0.1.dist-info/licenses/LICENSE.txt
2020-02-02 00:00:00 .....          680          405  photo_importer-0.0.1.dist-info/RECORD
------------------- ----- ------------ ------------  ------------------------
2020-02-02 00:00:00              11793         4587  8 files

It looks like your fix indeed does not work for my case. Did I miss something?

@jameshilliard
Copy link
Contributor Author

It looks like your fix indeed does not work for my case. Did I miss something?

Hmm, try with print(patterns) above this line and see what it's showing.

@jameshilliard
Copy link
Contributor Author

$ pip freeze | grep hatch
hatch @ git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6
hatchling==1.25.0

I think this is your issue, you forgot to actually install my change from this PR, the change is part of the hatchling package not the hatch package, this git repo contains both projects but I only made changes to hatchling.

I tried replicating your environment and it seems that was the issue, also I'm not sure which hatchling package the hatch build -t wheel command uses for the build backend but it seems to not be picking up the one installed in the active virtualenv for some reason, I testing wheel builds with the standard build frontend which seems to work as expected. Presumably if the pypi hatchling version contains this change it would also work.

Currently doesn't work:

hatch build -t wheel

Works:

python -m build --no-isolation .

@mv1005
Copy link

mv1005 commented Sep 29, 2024

Thanks for testing with respect to my setup.

I fixed my test environment and now have:

$ pip freeze | grep hatch
-e git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6#egg=hatch
-e git+https://github.com/jameshilliard/hatch.git@10d23087d03c7ba51a5564402701d01de0f6f6f6#egg=hatchling&subdirectory=backend

I installed in editable mode from a local working copy in order to be able to quickly switch branches.

Hereby I can confirm:

  • Running python -m build --no-isolation works
  • Running hatch build does not

I verified that none of the both calls above works when running from master, so your fix is effective.

I investigated a bit more what code actually runs when using hatch build.

  • Uninstalling hatchling leads to import errors, so python seems to pick up the package from inside the virtualenv after all.
    $ hatch build
    ╭───────────────────────── Traceback (most recent call last) ─────────────────────────╮
    │ [...]/jameshilliard/hatch/src/hatch/cli/__init__.py:229                             │
    │ in main                                                                             │
    │                                                                                     │
    
    [...]
    
    │                                                                                     │
    │ [...]/jameshilliard/hatch/src/hatch/project/core.py:334                             │
    │ in metadata                                                                         │
    │                                                                                     │
    │   331 │   @property                                                                 │
    │   332 │   def metadata(self):                                                       │
    │   333 │   │   if self._metadata is None:                                            │
    │ ❱ 334 │   │   │   from hatchling.metadata.core import ProjectMetadata               │
    │   335 │   │   │                                                                     │
    │   336 │   │   │   self._metadata = ProjectMetadata(self.location, self.plugin_manag │
    │   337                                                                               │
    ╰─────────────────────────────────────────────────────────────────────────────────────╯
    ModuleNotFoundError: No module named 'hatchling'
    
  • However, the code path that is covered by your fix does not seem to be executed (no effect of print(patterns), in contrast to when running python -m build)

In my estimation, using hatch build is a very common use case, so we should make it work somehow too. I probably could dig deeper into the code next week to find out the differences between the actually executed code paths of the two build command flavors.

@mv1005
Copy link

mv1005 commented Oct 8, 2024

Ok, it took me a while, but here are my findings.

The reason why hatch build does not pick up local changes you eventually applied to the hatchling backend simply is because it installs an isolated fresh version of hatchling from upstream when setting up it's internal build environment.

I did not dig deeper into the code of hatch or the build config options if there is any way to override this behavior.

As a test, I forcibly installed your patched version of hatchling into the isolated build environment that hatch has created for my project. After doing so, hatch build generates a wheel containing all files as expected 🎉.

As a conclusion:

  • Testing changes in hatchling locally using hatch build is a bit tricky and not straight forward. The best way is to manually invoke python -m hatchling build ..., since this is what hatch build does internally in case the build backend is set to hatchling.build.
  • As soon as your fix regarding the git ignore patterns is released and upstream, it should be picked up by hatch build as well.

Update:

Just after writing this, I found the following way to make hatch automatically use your local version of the build backend.

Set the following in your pyproject.toml:

[build-system]                                                                                                                                                            
requires = ["hatchling @ file:///path/to/your/local/hatch/backend"]                                                                             
build-backend = "hatchling.build" 

@jameshilliard
Copy link
Contributor Author

rebased

@ofek
Copy link
Collaborator

ofek commented Nov 9, 2024

Do you have an example so that I could reproduce?

@jameshilliard
Copy link
Contributor Author

jameshilliard commented Nov 9, 2024

Do you have an example so that I could reproduce?

Just build any package using hatchling from an extracted sdist inside a gitignored directory, for example:

If our top level working git directory is:

/home/buildroot/buildroot/.git

With * in this gitignore file:

/home/buildroot/buildroot/output/.gitignore

Then when building a sdist python package from the following build directory which is extracted from the package sdist:

/home/buildroot/buildroot/output/build/packagename

We will end up with an empty wheel file as all paths underneath this are ignored and incorrectly excluded by hatchling:

/home/buildroot/buildroot/output

If our project root directory is matched by an exclude pattern we
should assume the pattern is invalid as our project root is likely
in an ignored build directory of another project.
@jameshilliard
Copy link
Contributor Author

rebased again

Copy link
Collaborator

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an edge case lurking here regarding matching. I feel like rather than checking if the absolute path matches the logic should check the relative path.

In any case I'm going to merge this, thanks!

@ofek ofek merged commit 27cf52a into pypa:master Nov 10, 2024
50 checks passed
@jameshilliard
Copy link
Contributor Author

jameshilliard commented Nov 10, 2024

I think there is an edge case lurking here regarding matching. I feel like rather than checking if the absolute path matches the logic should check the relative path.

Hmm, so this may be a pre-existing bug, in buildroot we are in fact currently relying on hatchling not parsing a parent .gitignore with an absolute path correctly in order to avoid triggering the bug this PR was intended to fix. Weirdly maturin appears to have completely independently implemented this exact same bug.

For example hatchling doesn't pick up a /output exclude path in:

/home/buildroot/buildroot/.gitignore

But does pick up a * exclude path in:

/home/buildroot/buildroot/output/.gitignore

This is despite to my understanding these having identical meaning by git itself.

@ofek
Copy link
Collaborator

ofek commented Nov 10, 2024

What do you think I should do?

@jameshilliard
Copy link
Contributor Author

What do you think I should do?

Probably you would initially want to go through and root cause the reason why hatchling's gitignore logic is not following normal git rules(i.e. not picking up paths like /output in the above example).

Maybe also try and make sure gitignore path handling is consistent in regards to relative/absolute paths everywhere in the codebase. The relative/absolute path handling in hatchling is at least for me somewhat tricky to reason about due to the complexity of the config.py code. Possibly this complexity is largely unavoidable but I'd assume there would be at least some opportunities there for refactoring in order to simplify things somewhat.

I think there is an edge case lurking here regarding matching.

BTW I'm quite confident the general approach I'm using to fixing the bogus parent matching bug in this PR is overall fairly reliable(since it's worked reliably for a while in other projects like poetry), although I'm likewise not entirely sure about all the edge case in regards to absolute/relative paths are being handled correctly here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Building sdist uses wrong gitignore
3 participants