Changes to the User Guide's Data Files page #3335
- All code snippets were given for `setup.py`. Have added corresponding snippets for `setup.cfg` and `pyproject.toml`.
- To avoid incentivizing multiple top-level packages, have modified all the package trees and code snippets to include only a single package, `mypkg`. Have added a separate example to illustrate using the empty string `""` / the asterisk `*` to capture data files from multiple packages. Have also modified the `setup.py` code snippets and removed the `find_packages("src")` call, since there is only a single package in each case (except one); have opted to explicitly name the package instead.
- Have added a package tree example for the first `package_data` snippet. Have also added a package tree / code snippet example to show how `package_data` patterns should include subdirectories, separating it from the example showing the empty string `""` / asterisk `*` functionality.
- Tried to use consistent naming for all directories and data files in the package trees and code snippets. All directories have been named `mypkg` and data files have been named `data1.txt`, `data2.rst` etc.
- Have reformatted the package tree examples, replacing the indentation-only directory structure diagram with a line-based tree layout; I think this looks neater.
- Have added `.. note::` blocks for paragraphs that would be more appropriately phrased as a Note. Other minor changes to text content have been made.
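To illustrate the point above about `package_data` patterns needing to name subdirectories, here is a minimal Python sketch. It only approximates the matching with `glob`, relative to the package directory; that is an assumption made for illustration, not Setuptools' own implementation. The helper `match_patterns` and the temporary `src/mypkg` layout are hypothetical.

```python
import glob
import os
import tempfile


def match_patterns(package_dir, patterns):
    """Return the files under package_dir matched by the given glob patterns.

    Hypothetical helper that only approximates pattern matching relative to
    the package directory; it is not Setuptools' own implementation.
    """
    matched = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):
                rel = os.path.relpath(path, package_dir)
                matched.add(rel.replace(os.sep, "/"))
    return sorted(matched)


with tempfile.TemporaryDirectory() as project_root:
    # Build a throwaway tree: src/mypkg/{__init__.py, data1.txt, data/data2.rst}
    package_dir = os.path.join(project_root, "src", "mypkg")
    os.makedirs(os.path.join(package_dir, "data"))
    for rel in ("__init__.py", "data1.txt", "data/data2.rst"):
        open(os.path.join(package_dir, *rel.split("/")), "w").close()

    # Patterns without a directory part only see the package's top level.
    top_level_only = match_patterns(package_dir, ["*.txt", "*.rst"])
    # Naming the subdirectory in the pattern picks up the nested file too.
    with_subdirectory = match_patterns(package_dir, ["*.txt", "data/*.rst"])

print(top_level_only)
print(with_subdirectory)
```

Under these assumptions, `*.rst` alone does not reach `data/data2.rst`, while `data/*.rst` does.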
Thank you very much @codeandfire, the improvements are very appreciated 👏 🎉 I would like to suggest some changes in the light of the recent discussions #3323: I think we should encourage people to consider |
@abravalheri Sure, sounds interesting, will look forward to this PR since I'm considering implementing a similar layout in one of my own projects! May I ask, though, what ramifications this will have for the advice given here? Especially since the linked EDIT: I think this has been tried in the discussion #3323 that you have linked. In one of the comments in the thread, one of the users has tried using |
I did a quick test:

rm -rf /tmp/myproj
mkdir /tmp/myproj
cd /tmp/myproj
cat <<EOS > pyproject.toml
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools", "setuptools_scm[toml]"]
[project]
name = "myproj"
dynamic = ["version"]
[tool.setuptools_scm]
EOS
mkdir -p src/myproj/data
touch src/myproj/__init__.py
echo "hello world" > src/myproj/data/data1.txt
echo 42 > src/myproj/data/data2.json
tree
# .
# ├── pyproject.toml
# └── src
#     └── myproj
#         ├── __init__.py
#         └── data
#             ├── data1.txt
#             └── data2.json
git init .
git add .
git commit -m "Initial commit"
virtualenv .venv
.venv/bin/python -m pip install -U pip importlib-resources
.venv/bin/python -m pip install .
ls .venv/lib/python*/site-packages/myproj/data  # making sure data files are installed
.venv/bin/python
>>> import importlib_resources, myproj.data
>>> importlib_resources.files(myproj.data).joinpath("data1.txt").read_text()
'hello world\n'

Here I used

I also tried:

>>> import importlib_resources, myproj
>>> importlib_resources.files(myproj).joinpath("data/data1.txt").read_text()
'hello world\n'

Maybe |

I also did a second example:

rm -rf /tmp/myproj
mkdir /tmp/myproj
cd /tmp/myproj
cat <<EOS > pyproject.toml
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools", "setuptools_scm[toml]"]
[project]
name = "myproj"
dynamic = ["version"]
[tool.setuptools_scm]
EOS
mkdir -p src/myproj/data
echo "hello world" > src/myproj/data/data1.txt
echo 42 > src/myproj/data/data2.json
tree
# .
# ├── pyproject.toml
# └── src
#     └── myproj
#         └── data
#             ├── data1.txt
#             └── data2.json
git init .
git add .
git commit -m "Initial commit"
virtualenv .venv
.venv/bin/python -m pip install -U pip importlib-resources
.venv/bin/python -m pip install .
>>> import importlib_resources, myproj.data
>>> importlib_resources.files(myproj).joinpath("data/data1.txt").read_text()
'hello world\n'

Everything still seems to work fine even with a namespace-only package.
The support for namespace packages seems to have been added to recent versions of |
You're right, it's the docs that are outdated. I have raised a PR with them python/importlib_resources#251. |
Have tried to make the working of the `include_package_data` option as clear as possible.
- Added a package tree.
- Tried to clearly state that, when `include_package_data=True`, the data files must either be listed in `MANIFEST.in` or be tracked by a VCS in order to be included in the installation of the package.
- Added a `MANIFEST.in` snippet to make things clearer.
@abravalheri from this, should I take it to mean that we are encouraging a scheme in which we set |
My original idea is not exactly about using It is more about not writing:

package_data={"mypkg": ["*.txt", "data/*.rst"]}

but instead:

package_data={
    "mypkg": ["*.txt"],
    "mypkg.data": ["*.rst"],
}
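As a rough check of the suggestion above, the following Python sketch compares the two `package_data` forms on a throwaway tree. It approximates the matching with `glob` relative to each package's directory, which is an assumption for illustration and not Setuptools' own code. The names `SINGLE_KEY`, `PER_PACKAGE` and `selected_files` are hypothetical. In a real build, the second form presumably also needs `mypkg.data` to be discovered as a (namespace) package.

```python
import glob
import os
import tempfile

# The two spellings discussed above (dictionary names are hypothetical).
SINGLE_KEY = {"mypkg": ["*.txt", "data/*.rst"]}
PER_PACKAGE = {"mypkg": ["*.txt"], "mypkg.data": ["*.rst"]}


def selected_files(src_root, package_data):
    """Approximate which files each mapping selects, relative to src_root.

    Assumption: patterns are globbed relative to the directory obtained by
    turning the dotted package name into a path. Illustration only.
    """
    selected = set()
    for package, patterns in package_data.items():
        pkg_dir = os.path.join(src_root, *package.split("."))
        for pattern in patterns:
            for path in glob.glob(os.path.join(pkg_dir, pattern)):
                rel = os.path.relpath(path, src_root)
                selected.add(rel.replace(os.sep, "/"))
    return sorted(selected)


with tempfile.TemporaryDirectory() as src_root:
    # Throwaway tree: mypkg/{__init__.py, data1.txt, data/data2.rst}
    os.makedirs(os.path.join(src_root, "mypkg", "data"))
    for rel in ("mypkg/__init__.py", "mypkg/data1.txt", "mypkg/data/data2.rst"):
        open(os.path.join(src_root, *rel.split("/")), "w").close()

    single = selected_files(src_root, SINGLE_KEY)
    per_package = selected_files(src_root, PER_PACKAGE)

print(single)
print(per_package)
```

Both mappings select the same two files here; the per-package form simply keeps each pattern next to the package that owns the files.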
I see. I'll go ahead and amend my PR then, to include these changes, and maybe then you can review them and let me know of any further changes. |
Removed the statement within the parentheses, since the example which follows does not illustrate this specific case (having documentation files that you may not want to include in the installation). Besides, the `exclude_package_data` option covers this exact use case in a later example.
Modified code snippets for `package_data` example with `data` subdirectory to treat the `data` subdirectory as a namespace package. Also modified a paragraph below these snippets.
Made them consistent with the snippets given on the Package Discovery page.
- Instead of enumerating a list of all the packages in `packages`, using `find_packages` or `find:` instead. The `find_packages` call in `setup.py` contains a `where` argument. In `setup.cfg`, included the section `options.packages.find` with a `where` option.
- Instead of supplying the same `package_dir` for each package, using an empty string to indicate a `package_dir` for all packages.
- In `pyproject.toml`, using the `where` option instead of `package-dir`.
- Textual changes.
Tried to make it clearer why this option is useful.
Made them consistent with the snippets given on the Package Discovery page. The changes made here are similar to the changes made to the previous example.
In the end of the document, in the summary section, there is a line stating that the files matched by `package_data` do not require a corresponding `MANIFEST.in` or a revision control system plugin. Have included this note higher up in the document because I felt it may be of interest to users and they might miss this line so far down the document.
- Added an example package tree.
- Added a snippet showing how the `__file__` attribute would typically be used.
- Added a snippet showing usage of `importlib.resources` with the `files()` API.
- Added notes on the compatibility of this code with different Python versions, along with references.
- Added a snippet showing usage of the `importlib_resources` backport.
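For readers of this thread, a minimal sketch of the two access styles mentioned above, side by side. It assumes Python 3.9+ for `importlib.resources.files` and falls back to the `importlib_resources` backport otherwise. The package name `mypkg_demo` and the temporary directory standing in for `site-packages` are hypothetical, used only so the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

try:
    from importlib.resources import files  # standard library, Python 3.9+
except ImportError:
    from importlib_resources import files  # backport for older versions

with tempfile.TemporaryDirectory() as site_dir:
    # Hypothetical stand-in for an installed package with one data file.
    pkg_dir = os.path.join(site_dir, "mypkg_demo")
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "data1.txt"), "w", encoding="utf-8") as f:
        f.write("hello world\n")

    sys.path.insert(0, site_dir)
    importlib.invalidate_caches()
    try:
        import mypkg_demo

        # Style 1: build a filesystem path from __file__.
        data_path = os.path.join(os.path.dirname(mypkg_demo.__file__), "data1.txt")
        with open(data_path, encoding="utf-8") as f:
            via_file_attr = f.read()

        # Style 2: ask the import system for the resource via files().
        via_files_api = files(mypkg_demo).joinpath("data1.txt").read_text(encoding="utf-8")
    finally:
        # Undo the temporary import setup.
        sys.path.remove(site_dir)
        sys.modules.pop("mypkg_demo", None)

print(repr(via_file_attr))
print(repr(via_files_api))
```

Both styles return the same text for a package installed as plain files on disk; the `files()` style does not assume the package lives in a regular directory.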
I believe this footnote is outdated and no longer required in light of the added notes describing compatibility with different Python versions.
For consistency.
This footnote describes what Setuptools considers as a data file. This note is important and may be missed by the reader if it is kept as a footnote, hence I have copied its contents up ahead in the document, just after the `include_package_data` example.
- Added `include_package_data`, `package_data` and `exclude_package_data` sections to make clear the three options provided by Setuptools to manage data files.
- Added a separate section illustrating the use of a `data` subdirectory, after these three sections.
- Placed the summary of the three options under a Summary section.
- Changed the levels of the last two sections to match the level of the five sections added.
- Small changes. Changed the wording where appropriate to suit the new flow. Changed a paragraph on path separators in glob patterns to a Note.
Just to make it clear that we can use either one of `package_data` or `include_package_data` and not just the former.
@abravalheri sorry for the delay, but have amended my PR to incorporate the changes suggested by you along with a bunch of other changes. Let me know your feedback. Thanks! |
Thank you very much @codeandfire, those are really very good improvements.
(Sorry for the delay in providing a review/suggestions)
I added some small comments, but most of them refer mainly to existing content that was already present in the previous version. I am very happy to accept this PR as it is and do small adjustments later if necessary.
Thank you very much @codeandfire, I think I will go ahead and merge the PR, unless you would like to work in other changes. |
@abravalheri no other changes from my side, you can go ahead and merge. Thanks again for your review! |