Can not exclude package data file only for the wheel #3380

Open
sinoroc opened this issue Nov 19, 2020 · 15 comments
Labels
area/build-system Related to PEP 517 packaging (see poetry-core) area/ux Features and improvements related to the user experience kind/feature Feature requests/implementations

Comments

@sinoroc

sinoroc commented Nov 19, 2020

There seems to be an issue when trying to exclude package data files from the wheel only.

Example project here: https://github.com/sinoroc/poetry-gh-2015

Project file structure:

$ tree Thing
Thing
├── CHANGELOG.rst
├── LICENSE.txt
├── pyproject.toml
├── README.rst
├── test
│   └── test_unit.py
└── thing
    ├── data
    │   ├── file.all
    │   ├── file.bin
    │   ├── file.not
    │   └── file.src
    └── __init__.py

All files are in the git repository except *.bin data files (supposed to be a build artifact).

We want the *.src data files and CHANGELOG.rst in the sdist only. We want the *.bin data files in the wheel only. We want the *.all data files in both the sdist and the wheel. We do not want any of the *.not data files in either distribution.

We also want the test package, but only in the sdist.

.gitignore

/thing/data/*.bin

pyproject.toml:

packages = [
    { include = 'thing' },
    { include = 'test', format = 'sdist' }
]
include = [
    { path = 'CHANGELOG.rst', format = 'sdist' },
    { path = 'thing/data/*.bin', format = 'wheel' },
    { path = 'thing/data/*.src', format = 'sdist' },
]
exclude = [
    { path = 'CHANGELOG.rst', format = 'wheel' },
    { path = 'thing/data/*.src', format = 'wheel' },
    'thing/data/*.not',
]

Content of the sdist:

$ python3 -m tarfile -l dist/Thing-0.1.0.tar.gz 
Thing-0.1.0/CHANGELOG.rst 
Thing-0.1.0/LICENSE.txt 
Thing-0.1.0/README.rst 
Thing-0.1.0/pyproject.toml 
Thing-0.1.0/test/test_unit.py 
Thing-0.1.0/thing/__init__.py 
Thing-0.1.0/thing/data/file.all 
Thing-0.1.0/thing/data/file.src 
Thing-0.1.0/setup.py 
Thing-0.1.0/PKG-INFO 

Content of the wheel:

$ python3 -m zipfile -l dist/Thing-0.1.0-py3-none-any.whl 
File Name                                             Modified             Size
thing/__init__.py                              1980-01-01 00:00:00            0
thing/data/file.all                            1980-01-01 00:00:00            0
thing/data/file.bin                            1980-01-01 00:00:00            0
thing/data/file.src                            1980-01-01 00:00:00            0
thing-0.1.0.dist-info/LICENSE.txt              1980-01-01 00:00:00            8
thing-0.1.0.dist-info/WHEEL                    2016-01-01 00:00:00           83
thing-0.1.0.dist-info/METADATA                 2016-01-01 00:00:00         1515
thing-0.1.0.dist-info/RECORD                   2016-01-01 00:00:00          577

Somehow thing/data/file.src appears in the wheel, which is unexpected and not what we want.
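
The manual listing above can also be checked by a script. The following is a minimal sketch, not part of the original report: it reads the member names of a built wheel with the standard zipfile module and reports those that match glob patterns which should not be present. The function name unexpected_members and the patterns passed to it are illustrative assumptions.

```python
import fnmatch
import zipfile


def unexpected_members(archive, forbidden_patterns):
    """Return the archive members that match any forbidden glob pattern.

    archive may be a path to a .whl file or a binary file-like object.
    Note that fnmatch's '*' also matches '/', so patterns are not
    restricted to a single directory level.
    """
    with zipfile.ZipFile(archive) as wheel:
        names = wheel.namelist()
    return sorted(
        name
        for name in names
        if any(fnmatch.fnmatch(name, pattern) for pattern in forbidden_patterns)
    )
```

For the example project, one would call it as unexpected_members("dist/Thing-0.1.0-py3-none-any.whl", ["thing/data/*.src", "thing/data/*.not"]) and treat a non-empty result as a failed check.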

Example project: https://github.com/sinoroc/poetry-gh-2015

@sinoroc sinoroc added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Nov 19, 2020
@finswimmer
Member

exclude = [
    { path = 'CHANGELOG.rst', format = 'wheel' },
    { path = 'thing/data/*.src', format = 'wheel' },
]

At the moment this syntax isn't supported. exclude must be a list of paths (globs). We should add the ability for it to work just like include.

fin swimmer
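
To make the request concrete, here is a hedged sketch of how entries could be interpreted if exclude accepted the same two shapes as include. It is purely hypothetical and is not how Poetry works today (per the comment above, exclude only takes plain globs). The names normalize_entry and ALL_FORMATS are invented for illustration, and treating a missing format as "both formats" is an assumption.

```python
# Hypothetical: the two distribution formats discussed in this issue.
ALL_FORMATS = frozenset({"sdist", "wheel"})


def normalize_entry(entry):
    """Turn a bare glob or a {'path': ..., 'format': ...} table into (glob, formats)."""
    if isinstance(entry, str):
        # A plain glob string applies to every format (assumption).
        return entry, ALL_FORMATS
    formats = entry.get("format", ALL_FORMATS)
    if isinstance(formats, str):
        # Accept a single format name as well as a list of names.
        formats = [formats]
    return entry["path"], frozenset(formats)
```

With this shape, 'thing/data/*.not' would apply to both distributions, while { path = 'thing/data/*.src', format = 'wheel' } would apply to the wheel only.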

@finswimmer finswimmer added kind/feature Feature requests/implementations area/ux Features and improvements related to the user experience and removed kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Nov 20, 2020
@zoj613

zoj613 commented Nov 27, 2020

@sinoroc Have you found a workaround for this yet? I am in need of a similar strategy, but it seems that no files are excluded conditionally on the type of packaging.

@zoj613

zoj613 commented Nov 27, 2020

A workaround I came up with is to list only the necessary files in packages, then use include to add files depending on whether they should be in the sdist or the wheel.

@sinoroc
Author

sinoroc commented Nov 27, 2020

@zoj613 What would you change in my example to make it work with your workaround?

@zoj613

zoj613 commented Nov 27, 2020

@sinoroc Here is how I would write the pyproject.toml

name = 'thing'
...

packages = [
    { include = 'thing/*.py' },
    { include = 'thing/**/*.all' }
]
include = [
    { path = 'CHANGELOG.rst', format = 'sdist' },
    { path = 'thing/data/*.src', format = 'sdist' },
    { path = 'test', format = 'sdist' },

    { path = 'thing/data/*.bin', format = 'wheel' },
]
exclude = ['thing/**/*.not']

This approach worked for me. Basically, include everything that's required by both distributions in packages, then use include to selectively add files based on format. Then use exclude to ignore files in both distributions.

@sinoroc
Author

sinoroc commented Nov 28, 2020

At this point... What is the difference between packages and include?

@zoj613

zoj613 commented Nov 28, 2020

At this point... What is the difference between packages and include?

My guess is that packages determines the base files that get included regardless of your distribution format. Think of it as the opposite of exclude. include is then used to selectively add files that are not Python modules, and it also allows you to select them based on distribution format. I think if you just list non-Python files in packages, you will get a "not a python module" error or something like that.

@sinoroc
Author

sinoroc commented Nov 28, 2020

packages determines the base files that get included regardless of your distribution format

But packages also allows selecting target distribution formats. It's also OK if there are 2 ways to achieve the same thing. It's a bit confusing though... Because as far as I understood, packages and include achieve the same things, except that packages also accepts from.

@zoj613

zoj613 commented Nov 28, 2020

packages determines the base files that get included regardless of your distribution format

But packages also allows selecting target distribution formats. It's also OK if there are 2 ways to achieve the same thing. It's a bit confusing though... Because as far as I understood, packages and include achieve the same things, except that packages also accepts from.

Yeah I guess the maintainers will be better at explaining this than me. I only just worked from my own understanding based on trial and error. Maybe the functionality of these needs better documentation.

@mmerickel

The wheel should really only include python package files, and anything outside of there should be excluded by default. This is how setuptools works with MANIFEST.in and include_package_data=True. I shouldn't need to explicitly exclude files from the wheel that are not in the package.

I ran into this while trying to convert Pyramid's scaffolds to poetry. There is currently no way to get the sdist/wheel to match how they are built with setuptools, because the wheel includes lots of extraneous files like .coveragerc, development.ini, pytest.ini, etc. at the root of the repo (stuff that we do want in the sdist).

@sinoroc
Author

sinoroc commented Jan 8, 2021

Related: #2809

@zoj613

zoj613 commented Feb 28, 2021

Is there any way to make the exclude command work without any hacks? Doing exclude=["dir/*.pyx"] does nothing. It includes everything in the wheel. The previous workaround I suggested causes any shared libraries to be copied outside of the package's folder after install.

@MarcSeebold

MarcSeebold commented Aug 3, 2021

Is there any way to make the exclude command work without any hacks?

No. ./core/masonry/builders/builder.py calls get_vcs in ./core/vcs/__init__.py, which runs git rev-parse --show-toplevel.

There's a try around the git cmd though. Possible hacks:

  • Specifically include the file in pyproject.toml (overrides vcs_ignored_files)
  • Make the git cmd fail. E.g., pushd $(mktemp -d) && echo "#!/bin/bash" > git && export PATH=$(pwd):$PATH && popd
  • Modify poetry
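
The "try around the git cmd" pattern mentioned above can be pictured roughly as follows. This is a hedged sketch of the general idea, not poetry-core's actual code: the function name git_toplevel and its git parameter are hypothetical, and only the git rev-parse --show-toplevel command comes from the comment.

```python
import subprocess


def git_toplevel(folder, git="git"):
    """Return the root of the repository containing folder, or None if git fails."""
    try:
        result = subprocess.run(
            [git, "rev-parse", "--show-toplevel"],
            cwd=folder,
            check=True,
            capture_output=True,
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing or not runnable, or folder is not inside a repository.
        return None
    return result.stdout.strip()
```

A builder written this way would have no VCS information when the call returns None, and therefore no list of ignored files to apply. That would explain why making the git command fail is listed as a possible hack.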

@zoj613

zoj613 commented Aug 3, 2021

@MarcSeebold, the previous method I suggested works, and having the shared library installed outside the project root isn't a big deal since it still loads when importing the package.

@bbatliner

Running @sinoroc's example on poetry 1.1.11, I find that the *.bin files get included in the sdist despite having format = "wheel" in the pyproject.toml. It would seem that a file's presence in any include entry marks the file for inclusion in the package.

$ python3 -m tarfile -l dist/Thing-0.1.0.tar.gz 
Thing-0.1.0/CHANGELOG.rst 
Thing-0.1.0/LICENSE.txt 
Thing-0.1.0/README.rst 
Thing-0.1.0/pyproject.toml 
Thing-0.1.0/test/test_unit.py 
Thing-0.1.0/thing/__init__.py 
Thing-0.1.0/thing/data/file.all 
Thing-0.1.0/thing/data/file.bin *****
Thing-0.1.0/thing/data/file.src 
Thing-0.1.0/setup.py 
Thing-0.1.0/PKG-INFO

This is a different output than posted at the top of this issue, so I'm not sure what changed to produce this behavior.

This appears tied to some VCS/.gitignore functionality. Removing the *.bin from the include list:

include = [
    { path = 'CHANGELOG.rst', format = 'sdist' },
    #{ path = 'thing/data/*.bin', format = 'wheel' },
    { path = 'thing/data/*.src', format = 'sdist' },
]

appears to correctly use the .gitignore to exclude the *.bin file from the sdist:

$ python3 -m tarfile -l dist/Thing-0.1.0.tar.gz 
Thing-0.1.0/CHANGELOG.rst 
Thing-0.1.0/LICENSE.txt 
Thing-0.1.0/README.rst 
Thing-0.1.0/pyproject.toml 
Thing-0.1.0/test/test_unit.py 
Thing-0.1.0/thing/__init__.py 
Thing-0.1.0/thing/data/file.all 
Thing-0.1.0/thing/data/file.src 
Thing-0.1.0/setup.py 
Thing-0.1.0/PKG-INFO

But including *.bin in include, regardless of its format, seems to override the .gitignore, which seems like a bug.

The only workaround is to explicitly name in packages the files that you want included, then use include to selectively add what you want to each distribution, as @zoj613 suggests. However, this prevents poetry from embedding "package" data in the sdist and wheels, meaning pip naively extracts your package to site-packages. In my case, I have a src directory in version control, so my package ends up at site-packages/src/<package>, whereas if I use { include = "<package>", from = "src" }, pip is able to install my module at site-packages/<package> correctly.
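
For reference, the behaviour this thread expects can be written down as a small model. This is a sketch of one possible reading of the desired semantics, not Poetry's implementation: an include entry with a format should override the .gitignore only for that format, and should not pull the file into the other one. The function name files_for_format and the data layout (pairs of glob and format set) are assumptions made for illustration.

```python
import fnmatch


def files_for_format(candidates, vcs_ignored, includes, excludes, fmt):
    """Select the files for one distribution format ('sdist' or 'wheel').

    candidates:  every file under consideration (simplification)
    vcs_ignored: set of paths ignored by version control
    includes:    list of (glob, formats) pairs
    excludes:    list of globs applying to every format
    """
    selected = []
    for path in candidates:
        if any(fnmatch.fnmatch(path, glob) for glob in excludes):
            continue
        matching = [formats for glob, formats in includes if fnmatch.fnmatch(path, glob)]
        if matching:
            # A format-restricted include keeps the file only for the formats
            # it names, and overrides the VCS ignore list for those formats.
            if any(fmt in formats for formats in matching):
                selected.append(path)
            continue
        if path not in vcs_ignored:
            selected.append(path)
    return selected


# The example project from the top of this issue, minus the metadata files.
CANDIDATES = [
    "CHANGELOG.rst",
    "test/test_unit.py",
    "thing/__init__.py",
    "thing/data/file.all",
    "thing/data/file.bin",
    "thing/data/file.not",
    "thing/data/file.src",
]
VCS_IGNORED = {"thing/data/file.bin"}
INCLUDES = [
    ("CHANGELOG.rst", {"sdist"}),
    ("test/*", {"sdist"}),
    ("thing/data/*.bin", {"wheel"}),
    ("thing/data/*.src", {"sdist"}),
]
EXCLUDES = ["thing/data/*.not"]
```

Under this model, calling files_for_format(CANDIDATES, VCS_IGNORED, INCLUDES, EXCLUDES, "sdist") leaves out file.bin, and the same call with "wheel" leaves out file.src, CHANGELOG.rst and the test package, which is the outcome requested in the original report.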
