Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Review 2 - complete) Packaging section - part 2 #55

Merged
merged 32 commits into from
Mar 22, 2023

Conversation

lwasser
Copy link
Member

@lwasser lwasser commented Feb 22, 2023

This PR builds on top of ALL of the feedback here:
#23

Im hoping that we have made some good progress.
a few. highlights:

✅ we are suggesting (but NOT requiring) src/ layout for beginners as it will remove some of the pain points associated with testing but we also review flat layouts as the most common layout in the ecosystem. i hope this is a win win.
✅ i've added poetry and updated the table of front end tools!
✅ i've tried to avoid being bogged down in details of each tool - users just want to know what to use and why. we can teach them more in tutorials as they are created.

@astrojuanlu
Copy link
Contributor

Thanks a lot @lwasser! Is there a way to see a rendered version of the site?

@lwasser
Copy link
Member Author

lwasser commented Feb 27, 2023

@astrojuanlu i know that would be great - right? we have a CI build in our other guide that i think i can try to port over here today or tomorrow.

@lwasser
Copy link
Member Author

lwasser commented Feb 27, 2023

@astrojuanlu can you see if this works for you - circle ci was setup it just wasnt running on branches / prs! https://app.circleci.com/pipelines/github/pyOpenSci/python-package-guide/131/workflows/1d0c1ca3-f7ff-4b61-a171-54d8409954b0/jobs/107/artifacts you should now be able to click on the circle CI build and go to artifacts to see the book as rendered.

@astrojuanlu
Copy link
Contributor

@lwasser
Copy link
Member Author

lwasser commented Feb 27, 2023

perfect! thanks for providing the specific link to the packaging section here and for the nudge to fix CI (i forgot i had added this and never got it working! )!

@lwasser
Copy link
Member Author

lwasser commented Feb 27, 2023

latest version of our packaging diagram - for reference (not in this PR as i'm still editing it)
Feedback is welcome on this too as you read through the content!

Screen Shot 2023-02-27 at 4 01 38 PM

@rgommers
Copy link

rgommers commented Mar 2, 2023

Nice diagram @lwasser. I like both the structure and the styling. There are a few issues with choices that I see:

  • In the "many C/C++ extensions" case, front-end choices:
    • build shouldn't be there. It's a universal front-end for building sdists and wheels only, not for development. So it doesn't really fit in with these development workflow frontends. If you want to mention it at all, say it's the tool to use to do releases (sdist, wheels) - and nothing else.
    • DevPy isn't quite ready yet, it has a number of rough edges and isn't even on PyPI yet. So I'd leave that out and only revisit if it's adopted by some projects and ready for general consumption.
    • Poetry should be there as the one alternative to PDM, since it can use arbitrary backends. I think it always can be put side-by-side with PDM.
  • For both cases with C/C++ extensions, back-end choices: hatchling should not be there, it is limited to pure Python projects only. It's already the default there for pure Python (Hatch includes hatchling as the build backend). So hatchling should not appear separately in the diagram.

@lwasser
Copy link
Member Author

lwasser commented Mar 3, 2023

Thank you @rgommers !! Ok if you don't mind i have a few points of clarification! Please know that i'm not pushing back rather still fighting a bit with conflicting information in my brain! I just want to get this right!

  • In the "many C/C++ extensions" case, front-end choices:

    • build shouldn't be there. It's a universal front-end for building sdists and wheels only, not for development. So it doesn't really fit in with these development workflow frontends. If you want to mention it at all, say it's the tool to use to do releases (sdist, wheels) - and nothing else.

OK got it. I played with Meson-Python and it seemed like I needed to use Build to build the SDIST and WHEELS. Example repo is here. which i think is what you're telling me.

i'm trying to figure out how to talk about Build. it's not a development workflow front end. But you need something when building a package's SDist and Wheel with a tool like meson-python. Would pulling Build out of the bubble and adding a text note about using it with tools like meson-python and setuptools perhaps work?

Id love any insight into how help people understand how build fits into this ecosystem. For me i've been confused about build for a while (until i started working on this guide actually) but i've used it for a while too! 🙃 so i want to somehow include it in a way that makes sense. thoughts?

  • DevPy isn't quite ready yet, it has a number of rough edges and isn't even on PyPI yet. So I'd leave that out and only revisit if it's adopted by some projects and ready for general consumption.

NO PROBLEM. I'll remove it!

  • Poetry should be there as the one alternative to PDM, since it can use arbitrary backends. I think it always can be put side-by-side with PDM.

OK question about Poetry. i actually did a bit of research into it by asking the devs. . It seems like Poetry DOES support non pure python builds but it's undocumented / unsupported and they are hesitant to document it at all. I also asked about supporting other build backends and it seems as if Poetry is not able to do that either.

I'm worried about suggesting a tool for non pure python builds that doesn't document its support of said builds.

  • For both cases with C/C++ extensions, back-end choices: hatchling should not be there, it is limited to pure Python projects only. It's already the default there for pure Python (Hatch includes hatchling as the build backend). So hatchling should not appear separately in the diagram.

i'm happy to remove hatchling. i'm still a bit confused by hatchling as it is my understanding that a dev can write plugins for it that would support more complex builds. But many (if not most?) don't want to do that. maybe my understanding is wrong. And as such, until hatch supports meson-python there is no real support for complex builds. Hatch does plan to support other back ends in the future but certainly doesn't now.

Many thanks, Ralf. i keep getting this wrong but hopefully i'm atleast moving in the right direction!!

@rgommers
Copy link

rgommers commented Mar 3, 2023

i'm trying to figure out how to talk about Build. it's not a development workflow front end. But you need something when building a package's SDist and Wheel with a tool like meson-python. Would pulling Build out of the bubble and adding a text note about using it with tools like meson-python and setuptools perhaps work?

Yes, that sounds like a good idea.

I'd love any insight into how help people understand how build fits into this ecosystem.

My mental model is that I clearly separate between "working on a package" and "releasing a package". Your whole diagram is about the former. build is about the latter (side note: it has about the least distinctive name possible, so I usually write pypa/build to refer to it).

It'd be useful to have a short separate section about releasing, which is where it should show up. I'd say something about best practice being to create all your release artifacts in a CI job. There, cibuildwheel will be using pypa/build under the hood to produce wheels, and you also need to produce an sdist with python -m build --sdist.

I'm worried about suggesting a tool for non pure python builds that doesn't document its support of said builds.

This makes sense. Seems like you did your homework about Poetry, so please ignore my comment. I remembered it came up elsewhere and the Poetry devs said "but we can do it, this is how ...". But if it's not documented, then +1 for treating it like it isn't supported. That seems like a good criterion for everything in the diagram, for a tool/feature to be included: is released on PyPI, has decent documentation, and is adopted successfully by at least a few projects.

i'm still a bit confused by hatchling as it is my understanding that a dev can write plugins for it that would support more complex builds.

This is a plan, it does not exist aside from a design sketch (see extensionlib docs). And imho it's not a very good plan at that, but that's besides the point and I could be wrong. Despite the large amount of talking about it, it's only an idea at this point which needs to be implemented and then proven to be useful in practice.

:widths: 20,5,50

Use Other Build Backends, ✅, When you setup PDM it allows you to select from Hatch; PDM-517 and PDM-core build tools. PDM also can work with Meson-Python which supports move complex python builds.
Dependency management & lock files ,✅,PDM and Poetry are currently the only tools that support creating dependency lock files. However their approach to locking files is different: Poetry uses an upper bound lock approach `^`. <!--Most users won't know what upper bound means--> PDM uses an open lock `>=` Lock files might be most useful to developers creating web apps where locking the environment is critical for consistent user experience.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reads as if poetry would only support dependencies with upper bounds and PDM would only support dependencies without upper bounds. But that's only the default when using the cli. You can specify upper bounds and open bounds in both tools. Further, the difference is not the "approach to locking files" (in the lock file there is only one specific version, at least per platform, python version, ...), but the default dependency constraint when using the cli to add a dependency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @radoering can you please help me clarify this so i get the text right? I haven't tried to modify the locking behavior in either tool. but i DID notice that each tool defaulted to poetry:upper bounder; PDM : open bounds

it was my understanding that this open bounds might be a future option in poetry but not currently? But the comment says that poetry "supports" other modifiers. but i didn't understand how. Does that mean a user needs to remove the upper bound specifier manually in the lock file or is there indeed a setting that will just turn it off?

Similarly i didn't see a way for PDM to support upper bound only open bounds. can you help me understand how it supports that?

Also are ou able to help me understand the different in the default dependency constraint? do you mean that one defaults to upper bound locking and the other to open bound locking?

Many thanks!! i think i can rewrite if i understand

  1. how to use poetry and avoid upper bound locking
  2. if the real only different is the default way of locking (upper bound vs not upper bound)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i understand now. i played with PDM. it has many options for constraint locking approaches which is great! i'm still searching for how to implement another locking approach with poetry!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may confuse defining dependency constraints (in the pyproject.toml file) and dependency locking (in a lock file, for poetry: poetry.lock). (Or I'm just confused by your usage of the word "locking". 😉)

Upper bounds and open bounds are only relevant for dependency constraints in the pyproject.toml file, but not for the lock file because there is only one locked version. (If I'm not wrong that should apply to both: poetry and PDM.)

Default: poetry add tomli will result in an upper bound. Explicit: poetry add "tomli>=2.0" will result in unbounded constraint. However, if there are no conflicts, both approaches (upper bound and unbounded) will result in the same locked version (2.0.1 as of today). And that's only the command line interface. You can edit (or add) constraints in the pyproject.toml file directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may confuse defining dependency constraints (in the pyproject.toml file) and dependency locking (in a lock file, for poetry: poetry.lock). (Or I'm just confused by your usage of the word "locking". 😉)

I think you are giving me too much credit here. 😆 i'm absolutely conflating the two things here @radoering . i fully see your point and will update text accordingly . I'm going to separate these out into two things. my conflating them means others will too. thank you for calling me out on this!! AND for clarifying how to add Dependencies with out the ^ in Poetry. that is huge to know about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok what i've done to fix this is made two lines in the chart - one for lock files, the other for adding deps


Dependency specifications |✅|PDM and Poetry but support managing dependencies. However, their default approach to adding dependencies is different. Poetry uses a default upper bound lock ^. Upper bounds lock means that Poetry will never bump a dependency to the next major version (ie from 1.2 to 2.0). However you can tell Poetry to use an open bounder approach (see below for more and we suggest that you do so). PDM uses an open lock >= approach by default but also allows you to customize how you want to add dependencies. Thus with PDM you can explicitly tell it to lock using upper bounds like Poetry. Or you can tell it to use other strategies. This makes PDM extremely flexible tools when managing dependencies.
Dependency environment lock files |✅|PDM and Poetry are currently the only tools that create dependency lock files. Lock files are often most useful to developers creating web apps where locking the environment is critical for consistent user experience. For community-used packages, you will likely never want to use a lock file.


i will now read through the entire page to ensure my notes on locking envt vs managing dependency constraints is clear and consistent!

Copy link
Member Author

@lwasser lwasser Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screen Shot 2023-03-20 at 12 38 55 PM

```{admonition} PDM vs. Poetry
The functionality of PDM is similar to Poetry. However, PDM also offers
additional, documented support for C extensions and version control based
versioning. If you are deciding between the two tools, the main difference between these two tools
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory that might be true. In practice, there are things that are difficult to mention in a guide (and may change over time) but are often more relevant. E.g. regarding dependency locking there will be special and edge cases that are only supported by one or the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. are you able to help me with something that is concrete that i could write with confidence? i think you are saying there ARE some potential nuances in how the dependency locking works between the two tools. Do you think someone who is newer to python packaging might need that type of support early on? or is this a more advanced feature to consider? this guide is for beginners. but again i do want to get this right if it's not too edge case focused! i'm happy to add a note that in some cases the way dependency locking happens differs between the two tools - but i think that could be confusing to users if they don't know what to do with that information. (if that makes sense?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, one example for dependency locking where poetry does better than pdm is multiple-constraints dependencies, in other words if you need a different version of a dependency based on environment markers.

Example:

Dependency definition in pyproject.toml for poetry:

[tool.poetry.dependencies]
python = ">=3.7"
flake8 = [
  {version = ">=6.0.0", markers = "python_full_version>='3.8.1'"},
  {version = "<6.0.0", markers = "python_full_version<'3.8.1'"},
]

and equivalent definition for pdm:

[project]
requires-python = ">=3.7"
dependencies = [
    "flake8>=6.0.0; python_full_version >= \"3.8.1\"",
    "flake8<6.0.0; python_full_version < \"3.8.1\"",
]

If you run poetry lock and pdm lock, you will notice that there are two flake8 versions (5.0.4 and 6.0.0) in the poetry.lock file, but only one version (the one satisfying the last constraint as it seems) in the pdm.lock file.

As already mentioned this is only an example. I do not have much experience with pdm but I'm sure you will find examples where pdm does better.

I'll leave it up to you to decide whether something like this is relevant for beginners or only for advanced users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radoering Looks like a bug in PDM that could be reported upstream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this explanation, @radoering . i think i understand. AND i'm totally conflating lock files with dependency additions in the text so i'm working on clarifying that as well. i didn't realize i was doing it until I saw your comment!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this breakout now reads:


The functionality of PDM is similar to Poetry. However, PDM also offers
additional, documented support for C extensions and version control based
versioning. As such PDM is preferred for those working on non pure-Python packages.

If you are deciding between the two tools, a smaller difference is the default way that dependencies are added to your pyproject.toml file.

  • Poetry by default follows strict semantic versioning adding dependencies to your pyproject.toml file using an upper bounds constraint (^). However, you can tell Poetry to not use upper bounds if you wish.
  • PDM defaults to open-bounds (>=) dependency additions. However, PDM also allows you to specify the way dependencies are added by default. As such, you can also specify upper-bounds (^) using PDM.

In either case, beware that strict adherence to semantic
versioning can be problematic in some cases (more below).

Finally there are some nuanced differences in how both tools create lock files which we will not go into detail about here.

package-structure-code/python-package-build-tools.md Outdated Show resolved Hide resolved
package-structure-code/python-package-build-tools.md Outdated Show resolved Hide resolved
package-structure-code/python-package-build-tools.md Outdated Show resolved Hide resolved
Copy link
Contributor

@NickleDave NickleDave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has come a long way, you're almost there! I mostly pointed out some minor nits and made some rephrasing suggestions, hope it helps

index.md Outdated Show resolved Hide resolved
package-structure-code/complex-python-package-builds.md Outdated Show resolved Hide resolved
package-structure-code/complex-python-package-builds.md Outdated Show resolved Hide resolved
package-structure-code/intro.md Outdated Show resolved Hide resolved
package-structure-code/intro.md Outdated Show resolved Hide resolved
package-structure-code/intro.md Show resolved Hide resolved
Comment on lines 69 to 71
* If `tests/` are outside of the **src/package-name** directory, they aren't by default
delivered to a user
installing your package. However, you can chose to add them to the package archive. When tests are not included in the package archive your package size will be slightly smaller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If `tests/` are outside of the **src/package-name** directory, they aren't by default
delivered to a user
installing your package. However, you can chose to add them to the package archive. When tests are not included in the package archive your package size will be slightly smaller.
* Using a `src/package-name` layout in combination with a separate `tests/` directory in the root of the project will usually result in a smaller source distribution or wheel, since most packaging tools won't include files outside of `src/` (with the exception of flit). This can make install times faster (especially if you have large amounts of data for tests in your tests folder) and minimize the costs you impose on the Python Package Index (PyPI). Most packaging tools have setting that allow you to explicitly include tests even if you have them in a separate directory in the root of your project. Some Python libraries choose to do this, so that downstream package managers can test whether their own builds from the source distribution are working. This situation arises because [one function of PyPI is to serve as a single source of truth for many downstream package managers, like those on operating systems](https://pypackaging-native.github.io/meta-topics/purposes_of_pypi/). If you are not developing a library that many different package managers build from your source distribution, then you may want to exclude your test suite to minimize install times and PyPI costs. If you want to provide users a way to test that installation worked, consider developing a lightweight alternative, e.g., calling the help or version commands of a command line interface.

Trying to rephrase this to emphasize the good part, it's a bit tricky cuz there's multiple interacting things here (src/ + tests/ + which packaging tool you use)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this but have a question and then a concern.

  1. is it really true that install times will be faster? i don't understand how having a few .py test files would actually make things faster. i am asking not because i think the addition is a bad one but because i want to really stick to facts. If the real reason is the data can be large (btw i've always included data in a separate folder outside of tests as i can use it in tutorials too for docs) maybe we have a section on that explicitly as an issue. i'm just not full sure about the install time being a significant reason to put tests outside of src. we need to stick to facts and ideally things that are less opinionated wherever we can!
  2. this is a lot of text so i'm worried about how much is there about this one topic!! we will also have a section devoted to tests and best practices there. so does all of this belong here now or can we consolidate the message to. be a bit more specific and elaborate more in the tests section?
    thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this -

If you are not developing a library that many different package managers build from your source distribution,

i'm not sure what this means- do you mean if someone publishes on pypi and then also cross publishes on conda-forge? i'm just curious what many different package managers refers to here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The many package managers would be the various linux distros, things like homebrew etc.

I'd suggest that tests should always be included in the sdist, unless the size of the test data is significant, and then I'd suggest storing such test data outside the package, and allow it to be downloaded (astropy does this, and has pytest tools which make this easy).

Copy link
Contributor

@NickleDave NickleDave Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it really true that install times will be faster?

I just mean "if you have a 1 GB source distrib it will take longer to download" (try installing torch or torchvision vs. requests)

this is a lot of text so i'm worried about how much is there about this one topic!! ... so does all of this belong here now or can we consolidate the message to. be a bit more specific and elaborate more in the tests section?
thoughts?

Yes I agree. I'm not sure if it makes sense to include this in a "pro of the src layout" because it's more like a side effect. I think we need a detailed explanation like this but it probably belongs in the tests section.

i'm just curious what many different package managers refers to here

what @aragilar said -- linux distros, homebrew, et al. As discussed in the link I added.

I'd suggest that tests should always be included in the sdist, unless the size of the test data is significant, and then I'd suggest storing such test data outside the package,

This does seem like a good blanket suggestion but it probably is better in the tests section not here in "pros of src/ layout".

Maybe the best thing to do is say something at the end of the src section, like "Note: People often use a src/ layout in combination with tests in a separate directory in the project root, for more detail see the tests section" and then say this stuff there

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tests/ are outside of the src/package-name directory, then most high-level packaging tools (e.g., poetry, PDM, hatch) will not by default include them in the distribution package delivered to a
user.

David i think your language is important. as i've already been asked this.

ok i have another really important question related to this thread. important to the scientific ecosystem. when you specify tests including in the distro using hatch, PDM or setuptools / flit is that for both the built wheel AND the sdist ? IE can these tools currently be setup to make tests available to run for the end user?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok y'all. i just want to confirm that for scientific packages, it is very important to highlight HOW to ship tests with your package. as such, it seems like i will need to add back the option of tests being here:

src/
package-name/
tests/

because it sounds like that is the ONLY way a maintainer can access the tests on other machines. (if there is another way to ship tests, please tell me). but i'm seeing that we may need to add a note about back about tests living in that package-name dir.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so essentially i think this language might be WRONG:


If tests/ are outside of the src/package-name directory, they aren't by default
delivered to a user
installing your package. However, you can chose to add them to the package archive (which is a good idea). When test files (.py files only) are not included in the package archive your package size will be slightly smaller.


is it indeed possible to ship tests to a user if the tests dir lives outside of src? it seems like it may not be but i welcome input here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it indeed possible to ship tests to a user if the tests dir lives outside of src? it seems like it may not be but i welcome input here.

Yes, with Hatchling either of:

Copy link
Contributor

@NickleDave NickleDave Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking strictly about including tests in a source distribution so that someone like a distro package manager could set up the tests themselves, to run them on a built package. In that case I think the language above is still correct.

Wnen you wrote this above you meant to indicate that package-name/ is within src/, right?
#55 (comment)
Like so

src/
    package-name/
tests/

I was unaware of the nuance of giving scientific users on a wide array of infra the ability to run tests themselves from a whelel, to make sure the package works on their systems. I get why our community in particular might need that, but I feel like it's maybe less of an issue for someone starting out just now writing a pure Python package? I.e, if this page in particular is aimed at beginners.

This starts to feel like it's very nuanced to be on this page and there should definitely be a link to "see detailed discussion on tests page".

index.md Outdated Show resolved Hide resolved
lwasser and others added 6 commits March 13, 2023 15:48
Fix: merge conflict fix

Fix: edits to package guide from review

rebase 7

Fix: remove chunk of commented text

Fix: edits to package guide

Fix: edits to package guide

Continued edits to pages

Update package-structure-code/python-package-build-tools.md

Co-authored-by: Randy Döring <30527984+radoering@users.noreply.github.com>
Fix: feedback on packaging pr

Fix: cleanup packaging page and add decision tree diagram

Update package-structure-code/python-package-distribution-files-sdist-wheel.md

Co-authored-by: David Nicholson <NickleDave@users.noreply.github.com>

Update package-structure-code/python-package-structure.md

Co-authored-by: Henry Schreiner <HenrySchreinerIII@gmail.com>
Update package-structure-code/python-package-structure.md

Co-authored-by: David Nicholson <NickleDave@users.noreply.github.com>

Phew! so many comments - more edits and cleanup pyproj page
Fix: a few minor edits

Add: CI to direct to circleci build

Fix: broken link

Update package-structure-code/python-package-build-tools.md
Co-authored-by: David Nicholson <NickleDave@users.noreply.github.com>
Co-authored-by: David Nicholson <NickleDave@users.noreply.github.com>
@henryiii
Copy link
Contributor

As far as I know, PDM-backend, Hatchling, and Poetry-core are all identical right now in their support for compiled extensions - basically you can sort of make it work in some common situations - but things like the wheel tags (when cross-compiling, targeting stable ABI, etc) are probably going to be wrong for all of them. PDM currently has a benefit over Hatch & Poetry in that it can use any backend - Hatch should pick this up too in the near future.

@ofek
Copy link
Contributor

ofek commented Mar 16, 2023

To elaborate for Hatchling, the usual mechanism for build hooks to define the wheel tag (other than setting the option explicitly) is to enable the infer_tag build time option which uses the tag that is most specific to the platform and architecture. As Henry mentioned this doesn't work for cross-compiling so for example I've had to add detection for certain environment variables and options to handle nuances of certain platforms and versions.

lwasser and others added 2 commits March 16, 2023 11:43
Co-authored-by: Ofek Lev <ofekmeister@gmail.com>
…-wheel.md

Co-authored-by: Ofek Lev <ofekmeister@gmail.com>
@lwasser
Copy link
Member Author

lwasser commented Mar 16, 2023

ok colleagues - i'm taking a needed break from working on this now but appreciate that comments ARE still trickling in! and feel free to continue to add them / respond to what i've changed, etc.

  1. apologies again - all of the comments that were in the text body are NOW removed from the content. MY MISTAKE. please ignore any comments that you did previously read as all are dated. what is important to focus on in this PR is the final text of our guide. i am confident we can merge this SOON.
  2. i just pushed a fresh batch of updates addressing many of the comments here. Knowing that other comments are still coming in. (many thanks for this continued feedback). We are really addressing small bugs and items that i conflated in the text that would be confusing to users - so i appreciate this level of feedback!

More to come soon either later today or in a day or two. i'm now shooting to merge this on Monday next week just to allow these last reviews to come in and be addressed :) thank you all as we are really working well together to get to a guide that i think the community will greatly appreciate. And this is really because of all of you here being willing to spend time working with us on this effort with the shared vision of making python packaging easier to navigate for beginners!!

@rgommers
Copy link

I try my absolute best to answer questions from users and document every feature, all in my free time. If you have concrete ideas on how I can be more "helpful" in my responses or in the documentation please let me know.

I'm sorry to be brief here, it's a busy week (this is my free time as well, as I'm sure is the case for most folks here). One suggestion that was already made by someone else on Discourse is to clearly state, up-front in the docs, that there is no built-in support for compiled code. Something like:

.. note::

    At the moment, Hatch only works with Hatchling, its own backend for pure Python code. 
    It does not currently work with other build backends like setuptools, meson-python or scikit-build-core.
    Expect support for this soon.

Another one I just made: if you really want folks to use your build hooks to do their own thing, a concrete example. There are lots of simple standalone packages in the meson-python test suite here, you maybe could reuse one so you already have the relevant .py/.c/.pyx source files.

because my assumption was that if one is knowledgeable enough to be building extensions with it then one necessarily would know how to execute a build of the extension modules in one's custom script.

That seems to exclude pretty much every user. In my experience people cargo-cult setup.py content (and same for other build system files), and it's not clear how to hook that up here - certainly to me it isn't.

These classifiers are used at build time to ensure one does not define an unknown classifier. I'm not sure what you find surprising about this, can you please explain?

I misread what that code was doing, apologies.

As far as I know, PDM-backend, Hatchling, and Poetry-core are all identical right now in their support for compiled extensions - basically you can sort of make it work in some common situations - but things like the wheel tags (when cross-compiling, targeting stable ABI, etc) are probably going to be wrong for all of them.

Exactly, this is going to be wrong in a number of circumstances (the ones you mentioned, plus lots more - not hard to see that by browsing issue trackers of build systems that folks actually use to write nontrivial packages with compiled code). I don't think any of that should be recommended to users at all, except in a last resort like "already using Hatchling in nontrivial ways, and maybe adding this one C file will work". Building packages with compiled code is hard, a "here's a generic hook, good luck" is not helpful. Best to use the right tool for the job.

PDM currently has a benefit over Hatch & Poetry in that it can use any backend - Hatch should pick this up too in the near future.
...
Right, in principle I totally agree with that but PDM has the same level of support (you write a custom script) and that is listed here as explicit support

The "can use any PEP 517 build backend" is why PDM is where it is in the diagram in this PR. Which is why I suggesting that Hatch can be added right next to PDM once it gains that capability. Your build hook really isn't okay to recommend to scientific users that know upfront that they're dealing with compiled code imho.

@lwasser
Copy link
Member Author

lwasser commented Mar 20, 2023

@rgommers question for you: should we suggest that EVERYONE creating scientific software ship tests with a package distribution so that it can be run locally on user's machines? I know from @stefanv that many scientific packages find it crucial to have tests shipped with a package (in the wheels) so they can be run on various installations.

However shipping tests even without data places a small but increasing over time package size burden on PyPI / warehouse's storage resources. so i'm trying to come up with a recommendation that doesn't tax PyPI but also satisfies the scientific ecosystems needs. any thoughts on this?

@rgommers
Copy link

should we suggest that EVERYONE creating scientific software ship tests with a package distribution so that it can be run locally on user's machines?

Maybe not necessary for the audience of this guide. For large projects it's important because we get a ton of bugs related to differences in CPU features, compilers and numerical libraries like BLAS, etc. (this is indeed different for numerical libraries vs. something like a web dev package). In practice those are very tricky to debug, and one of the best things to make a dent in that is ask users to run something like python -c "import numpy; numpy.test()". If the instructions are multi-step, downloading tests from a Git repo, finding the right branch etc., that won't work.

For small packages with a limited amount of compiled code this is all less relevant though. For such packages I'd say they should include tests in their sdists, but not necessarily install those tests.

@lwasser
Copy link
Member Author

lwasser commented Mar 20, 2023

For small packages with a limited amount of compiled code this is all less relevant though. For such packages I'd say they should include tests in their sdists, but not necessarily install those tests.

oh many thanks for this, Ralf!! very helpful. we've been having debates around this topic.


everyone - i've just pushed up a new batch of edits. i think i'm getting closer to getting the loose ends nailed down.
i've switched language about environment mgt focusing on each tool managing the environment for users but allowing users to select the environment type of their choice - and how that selection happens is different across tools. PDM finds conda envs right away, hatch needs a plugin, etc. !

i've also clarified some of the language around dependency management and upper bounds being a default for tools rather than the only way to do things. each tool has options. I've also clarified that lock files are completely separate. (they were conflated in the previous text)

feel free to give the text on the packaging tools page another round of review if you wish. the only item i have not formalized yet is the tests being shipped with the package by default or not but that will come in the next day. i think a new realistic goal is to merge by thursday morning mountain time in the USA (in ~2 days) - which leaves tonight tomorrow and all day wednesday for more thoughts to trickle in.

and i'll review the flowchart diagram once more as well as i think i have to clarify environment management there too.

we are SO CLOSE. thank you all for the input and feedback as it's clarified a lot!

@lwasser
Copy link
Member Author

lwasser commented Mar 21, 2023

This has come a long way, you're almost there! I mostly pointed out some minor nits and made some rephrasing suggestions, hope it helps

thank you @NickleDave i think we are indeed in the home stretch!! what a wonderful feeling after all of this time.

@lwasser
Copy link
Member Author

lwasser commented Mar 22, 2023

ok y'all. i think we're ready to merge today!! once merged - if you catch any other issues with the text, feel free to open issues about them! this is a living document that we can continue to refine and update as we learn more and get more eyes on the content.

@lwasser lwasser merged commit a6bb0d3 into pyOpenSci:main Mar 22, 2023
@lwasser lwasser deleted the package-build-p2 branch March 22, 2023 18:35
@ofek
Copy link
Contributor

ofek commented Mar 22, 2023

Great job!

@lwasser lwasser changed the title (Review 2) Packaging section - part 2 (Reviews Welcome) (Review 2 - complete) Packaging section - part 2 Mar 23, 2023
@lwasser
Copy link
Member Author

lwasser commented Mar 23, 2023

Thank you, @ofek . thank you again for all of the feedback on this pr!!

All - last ping from me here - I've had this zenodo file open for a while and just did a bit update with everyone's names who has participated in the reviews and contributed to our work.

Please review and add / edit if you'd like your information displayed differently:

#20

If i missed you (i'm very sorry if that happened but i hope it doesnt!) please either comment on the pr with your info or add yourself via an inline comment. i don't want to miss anyone.

I will push a new release of this guide with the updated zenodo file next Tuesday April 28th. you can still add your name or edit your details after that time but only as a pull request given the open one will be merged!! Many thanks!!

@lwasser
Copy link
Member Author

lwasser commented Mar 27, 2023

hi y'all - one last reminder - i will merge this pr with the zenodo doi information tomorrow (tuesday mar 28) and create a release of this repo at that time. feel free to update / comment on the pr that is open if any of your information if missing. many thanks y'all for the comments already there (i just committed them!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.