Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.1.38 meant to fix #1656 but there is no supporting gitdb release #1716

Closed
EliahKagan opened this issue Oct 18, 2023 · 11 comments
Closed

3.1.38 meant to fix #1656 but there is no supporting gitdb release #1716

EliahKagan opened this issue Oct 18, 2023 · 11 comments

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Oct 18, 2023

To support #1659 here in GitPython, gitdb received #98, and the GitPython repository's submodule reference to gitdb was updated in #1704 (and #1705). Thus #1656 might be1 fixed when developing against the cloned GitPython repository with its gitdb submodule.

With the intention of fixing #1656 in the far more common case of using GitPython via PyPI packages, GitPython 3.1.38 was released. But no new release of gitdb was ever made. Since the fix depends on changes to gitdb, it seems unlikely that 3.1.38 really fixes #1656.

I think it might be possible to make further refinements to __all__ and surrounding code, slightly enhancing #1659. However, since 3.1.38 was already released, it seems to me that such things can wait, and that it's reasonable to make a gitdb release (bumping the version in its setup.py and gitdb/__init__.py, or changing the current approach and bumping it in just one place), and to include a dependency bump requiring it the next GitPython release (which could be the same release that ships #1715, if the timing works out).

I'd be pleased to open pull requests to help with this issue, either following this plan or another one, if that would be helpful.


1 I had originally said it was likely fixed when developing against the cloned GitPython. However, that is probably not the case. First, most type checkers probably wouldn't understand how that version is added to sys.path, and it is probably good design of them not to (since they are static checkers). Second, as documented in #1717, it turns out this is never actually happening.

@Byron
Copy link
Member

Byron commented Oct 18, 2023

Thanks for bringing this to my attention!

I'd be pleased to open pull requests to help with this issue, either following this plan or another one, if that would be helpful.

Yes, I'd very much like your help with this, thank you 🙏.
I'd be glad if that PR could also remove the version from setup.py if that's non-breaking so the version has to be changed in only one place from then on, and if possible because it's non-breaking.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Oct 19, 2023

On further consideration, this might be not be the best time to fix the duplication.

This should not stop a patch version of gitdb from being released, so this is kind of a side topic. But my detailed thinking about why it may be better to fix this issue with a new patch version of gitdb without eliminating its metadata duplication follows.

Immediate options, and their drawbacks

Although the repetition is explained in terms of expected style, there is one benefit of it that may be worth preserving: keeping the smmap package from having to be installed just to build gitdb packages. If setup.py imports gitdb to get the metadata, it looks like the indirect imports through gitdb/__init__.py (both in _init_externals and in the * imports at the bottom of the file) require smmap to be available. I worry that some downstream builds may be depending on that not being required.

This could be fixed, without turning smmap or any future dependencies into build dependencies, by any scheme that determines the version without importing the code. This includes the very ad-hoc approach GitPython takes of substituting a version for a placeholder, which I'd be reluctant to introduce to gitdb and would prefer to find a way for GitPython to stop using too (see below), or a reverse ad-hoc approach where the version is hard-coded in packaged files and read by setup.py without importing them, which I would also be reluctant to introduce (also see below).

It can also be solved, I think better, with packages that specialize in automatically setting versions from source control tags, such as setuptools_scm or versioneer. However, if there are downstream build recipes that don't need smmap because they don't run the tests, they might also be done in way that doesn't have gitdb's .git directory, and so might fail to get the version.

Slightly longer-term idea

Unlike the situation in #1713--where a downstream build recipe was broken in a specific way that GitPython intended to guarantee would not break--I actually think breaking downstream builds for this sort of thing is totally defensible.

However, as a separate goal from de-duplicating version information, I think it would be best for GitPython, gitdb, and smmap to replace all their setup.py logic with a declarative approach, where everything would be defined in pyproject.toml, or if necessary setup.cfg; and where setup.py would either become a short stub script that just calls setup() with no arguments, or be removed altogether. They should also if possible all do it the same way, so that familiarity with one helps one understand the others, and so it is usually easy to know if a change made to one should be made to the others and, if so, how it should be made.

(Note that if the GitPython repository becomes a monorepo as discussed in gitpython-developers/smmap#53 (comment), the three projects would still be defined separately. So this would go along with or perhaps even support such a change, rather than clashing with or being an alternative to it.)

For this reason, I am reluctant to make any changes to any of these projects' setup.py files that would move further from that and make it harder or more complicated (and thus more likely to introduce bugs if and when done). That is, I would prefer to avoid making setup.py unnecessarily less declarative. In addition, it might turn out that fixing version duplication, so that the version is expressed in either one or (by getting it from tags automatically) zero places in the code, could naturally be done together with making the setup scripts fully declarative.

That sort of change could really be done at any time. But the current (temporary) skew, which this issue is about, between the current versions of GitPython and gitdb with respect to __all__, is something that would make me uneasy about doing it immediately or as part of the change that fixes this issue. Therefore, I suggest putting it off.

Should I still open PRs?

You might prefer I still open pull requests. You might:

  • ...disagree with my reasoning here and prefer another approach be taken. Although I suggest against this, I am nonetheless willing to make such a change, at least so long as it is not highly complex and time-consuming.
  • ...just want a pull request that bumps the version number in both places in gitdb and adds an entry to the changelog. This is something I think maintainers rather than other contributors have usually done and I imagine you might prefer to do it yourself, but I'd be pleased to do it if that would be helpful!
  • ...regard this as a good time for pull requests that change the submodule update check in dependabot.yml, in the GitPython and gitdb repositories, from monthly to weekly, per #1702 (comment). This could be done either with or without other pull requests (or as part of them).

@Byron
Copy link
Member

Byron commented Oct 20, 2023

However, as a separate goal from de-duplicating version information, I think it would be best for GitPython, gitdb, and smmap to replace all their setup.py logic with a declarative approach, where everything would be defined in pyproject.toml, or if necessary setup.cfg; and where setup.py would either become a short stub script that just calls setup() with no arguments, or be removed altogether. They should also if possible all do it the same way, so that familiarity with one helps one understand the others, and so it is usually easy to know if a change made to one should be made to the others and, if so, how it should be made.

Yes, please! Let's not do any changes to the way the packages are defined unless it makes them declarative.

Should I still open PRs?

If I understood correctly, I could create a new patch release of gitdb and everything should be in good order again
I will do that right away.

@Byron
Copy link
Member

Byron commented Oct 20, 2023

Alright, I managed and here is the new gitdb release: https://github.com/gitpython-developers/gitdb/releases/tag/4.0.11.

On another note, it seems like keeping the version duplicated in __init__ of the package itself isn't anything we'd want anyway. It's a breaking change to remove the field, but I think it would be alright to not update it anymore and maybe even make it emit a deprecation notice on access, if that's possible at all. Without that duplication, having a declarative package should be fine without adding any need for complications.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Oct 20, 2023

If I understood correctly, I could create a new patch release of gitdb and everything should be in good order again

Arguably GitPython's declared dependency on gitdb should have its minimum version raised to the new version. However, I am not really sure that should be done, and if it is to be done, it might be better to wait for feedback from @DeflateAwning (author of #1659 and gitdb#98) on whether things are working (or at least nothing new is broken) when using pyright with the latest versions of both packages.

I will do that right away.

Yes, whether or not anything further has to be changed in GitPython in connection with this, making that change in gitdb now should work fine.

@DeflateAwning
Copy link
Contributor

Everything is good! Main reported issues are fixed. Ideal siuation would be a release on gitdb, bump the submodule version, and release gitpython!

@DeflateAwning
Copy link
Contributor

Furthermore, nothing new is broken, as far as I can tell. Someone could double check, but it's probably fine

@Byron
Copy link
Member

Byron commented Oct 20, 2023

Everything is good! Main reported issues are fixed. Ideal siuation would be a release on gitdb, bump the submodule version, and release gitpython!

The gitdb release is done, and somehow I hope that I can avoid another near-noop GitPyhon release and let it wait until there is something substantial. Ideally, the submodule state won't matter at all for packaging GitPython, so maybe that's something to work on as well.

@EliahKagan
Copy link
Contributor Author

It seems to me that this issue can be considered fixed.

@EliahKagan
Copy link
Contributor Author

EliahKagan commented Oct 21, 2023

On another note, it seems like keeping the version duplicated in __init__ of the package itself isn't anything we'd want anyway. It's a breaking change to remove the field, but I think it would be alright to not update it anymore and maybe even make it emit a deprecation notice on access, if that's possible at all. Without that duplication, having a declarative package should be fine without adding any need for complications.

A DeprecationWarning could be be issued when the __version__ attribute is accessed. Here's an example of one way to do that.

However, whether or not that is done, I recommend against having a top-level __version__ attribute with a version that is intentionally older than the actual current version, because:

  • I believe there are significant disadvantages to __version__ existing with a wrong value, exacerbated by how DeprecationWarning is rarely actually seen except when a developer is deliberately looking for it.
  • There should be no need to do so. It was specifically while this issue was still open that I was reluctant to change the way the packages are defined. I've also looked into this further. Changing how they are defined so that they are defined declaratively, in addition to being valuable in and of itself, should allow the version metadata duplication problem to solved nicely. (Also, I had unduly emphasized getting the version from source control tags, thereby passing over some simpler options that, on further consideration, may be better.)

I think replacing all the non-stub setup.py logic with a declarative way is worthwhile in all three projects (GitPython, gitdb, and smmap), because even if they all end up moving into the GitPython repository in the future, it sounds like they'll remain separate projects that therefore need to be defined separately.

Disadvantages of __version__ existing but giving an incorrect version

  • It is very surprising. When people look at __version__, they're often debugging something else, and the possibility that __version__ exists but is not current is unlikely to be considered. Users may go on a wild goose chase trying to figure out why they don't seem to be able to get the latest version installed, may suspect a problem with sys.path, etc. Note that this does not make values like "git" harmful, just old or otherwise wrong version numbers.
  • DeprecationWarning and PendingDeprecationWarning are not shown by default except when issued directly from the running program's entry-point module. This does not make them useless, but it does make it so that people who examine __version__ when trying to figure out why something isn't working won't usually be helped by them.
  • One virtue of GitPython, gitdb, and smmap--as far as I have noticed--is that they follow the rules about dunders, never misusing existing dunder names or introducing new ones. What this means for __version__ is sort of unclear, since I believe its meaning is not directly documented anywhere; instead, its use is sort of officially smiled on by numerous official references to it, examples of it, and guidance about things to do when using it. However, having __version__ exist and give something unambiguously considered not to be correct version information probably, at minimum, constitutes a more severe misuse of a dunder than having __version__ exist and give a correct version.

The good news: we shouldn't need to do that to eliminate duplication

We can eliminate the duplication as part of the same change as defining the projects declaratively, and do so without turning a project's runtime dependencies into build dependencies. I suggest this be done at the same time, or around the same time, for GitPython, gitdb, and smmap, and in the same or similar ways for all of them.

I will assume that, at least in the immediate future, it is intended--for any of these projects in which such a change is made--that its version exist in exactly one place in the repository, and thus not be duplicated and also not be computed from tags. (Also, that the GitPython repository might become a monorepo is a reason to avoid getting versions from tags, which I hadn't thought of before.)

Three options, any of which I would advocate as reasonable, are:

  • If the version is given in pyproject.toml or setup.cfg, it is static metadata that can itself be inspected by tooling that can parse TOML or INI (or, if some brittleness is acceptable, as in check-version.sh, by grep). So it should not need to be repeated anywhere else. The __version__ attribute in __init__.py can then be populated--or computed on access--using importlib.metadata.version (which is what the example linked above does). To preserve support for Python 3,7, a dependency on the importlib_metadata backport can be added just for 3.7, which I think is no problem.
  • If the version is given in a VERSION file, this can still be used by a declaratively defined project. In pyproject.toml, the dynamic metadata feature with the file directive can be used to read a VERSION file. With setup.cfg, the file: directive can be used with the same effect. Then __init__.py can obtain the version with importlib.metadata the same as if it were written literally in the pyproject.toml or setup.cfg, as described above. Alternatively, if the VERSION file is also included in the distribution as package data, then there is the further option for __init__.py to use importlib.resources to read the VERSION file.
  • If the version is given in __init__.py, but the package is declaratively defined, then so long as the version is assigned literally, pyproject.toml or setup.cfg can cause it to be read without actually importing anything. In pyproject.toml, dynamic metadata with the attr directive will do this. With setup.cfg, the attr: directive would be used. Compare this to the file/file: directive described above. The attr/attr: directive first tries to read an assignment of an ast.literal_evalable expression to the name, and only imports the module if it can't find it that way--but it is simple to ensure that always succeeds. This works even if actually importing the module would fail, for example due to a missing dependency. (My understanding is that this feature is not available when defining a project in setup.py, but only when defining it declaratively in pyproject.toml or setup.cfg.)

However this is done, I think the main benefit is actually to GitPython rather than gitdb (or smmap), because all the custom version-stamping logic in setup.py could be eliminated. Since that logic was the main hurdle for converting GitPython to use a declarative package definition, this seems like a win. But of course it would also allow the version to be deduplicated in gitdb. I recommend the same approach be used for all three packages unless a strong reason arises not to, Which approach should be taken, I am not sure, which is one reason I'm writing this instead of opening pull requests.

If either of the first two ways are done, then accessing __version__ from __init__.py could either issue a DeprecationWarning or not. If the third way is done, then it should not (if accessing __version__ in the top-level module is deprecated, then that's not how building should access it either). If the idea of doing it the same way in all three projects is followed, then I think that includes issuing the DeprecationWarning either in all three projects or none of them. In that case, the __version__ attribute should only be deprecated if its documented role for GitPython bug reports is somehow superseded by some other recommendation.

A few caveats

  • All the above assumes we are keeping setuptools as the build backend. But since these packages have been using setuptools for some time, I think it may make sense to convert the packages to be defined declaratively within setuptools, and only then see if there are remaining disadvantages that justify switching to another build backend such as flit, hatch(ling), or poetry.
  • Sphinx uses the package version and I am not very familiar with the good ways for it to get it, so I'll have to look into that. Currently in GitPython conf.py reads the VERSION file, while in gitdb and in smmap conf.py seem to hard-code very old versions. (But probably the approach for managing the version in general should not be chosen based on Sphinx; instead, conf.py can be modified accordingly.)
  • There is the issue of priority: Although I don't think it's necessary to have native Windows CI set up before doing this--I would want to make sure building works locally, including on Windows, either way--setting that up before migrating the way we define these packages to be declarative (and eliminating version number duplication) does have the advantage that native Windows CI would be set up sooner. Having native Windows CI will probably help more prospective contributors get started than improving the way the packages are defined.

Especially relevant documentation

@Byron
Copy link
Member

Byron commented Oct 21, 2023

Thanks again for this fantastic analysis!

I was suggesting deprecation primarily due to my ignorance about the available options, and going with…

  • If the version is given in pyproject.toml or setup.cfg [..]

…seems like the way to go. I assume that the importlib dependency can only be imported when the version is actually accessed, so startup time shouldn't be affected by something that might rarely be used if at all.

Sticking with setuptools seems alright as long as it's the most supported one - with the switch to its declarative version I'd expect all ugliness around it to mostly go away.

Regarding priority, I see how native windows CI support and declarative setuptools can be independent, but I also see why native windows support will help with testing and validating any future change. As always, this choice lies solely with you :).

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 12, 2024
If people who want to run the tests didn't install the test extra,
they can still install that extra. This simplifies the instructions
accordingly. test-requirements.txt is still mentioned near the
beginning in case people want to look at it to see dependencies.
But the changed code is the only place where the instructions had
still said to do anything with those files.

A possible disadvantage of this change is that in the rare case
that someone following those instructions to run the tests locally
didn't do an editable installation, then installing with the extra
shouldn't be done editably either. But this doesn't seem like a
likely problem, and the new text has an example command with -e to
clarify the kind of command whose effect is augmented here.

Because of that subtlety, it is not obvious that this change is
really justified for purposes of clarity. However, this also helps
prepare for a time when test-requirements.txt won't exist anymore,
as may happen when the project definition is made fully declarative
(see discussion in comments in gitpython-developers#1716).
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 12, 2024
This is to make it so simple `tox` usage has the expected property
of leaving all source code files in the working tree unchanged.

As noted in c66257e, linting how sometimes performs auto-fixes
since gitpython-developers#1862, and the pre-commit command in tox.ini, which had also
run `black --check`, will do even more file editing with gitpython-developers#1865.

The bifurcation for black into separate mutating and non-mutating
hooks, introduced in 5d8ddd9 (gitpython-developers#1693), is not carried over into Ruff
autoformatting in gitpython-developers#1865. But also it:

- Was not necessarily a good approach, and likely should not be
  preserved in any form. It is an unusual and unintuitive use of
  pre-commit. (It can be brought back if no better approach is
  found, though.)

- Was done to avoid a situation where it was nontrivial to set up
  necessary dependencies for linting in the GitPython virtual
  environment itself, because flake8 and its various plugins would
  have to be installed.

  They were not listed in any existing or newly introduced extra
  (for example, they were not added to test-requirements.txt) in
  part in the hope that they would all be replaced by Ruff, which
  happened in gitpython-developers#1862.

- Already does not achieve its goal since gitpython-developers#1862, since it was
  (probably rightly) not extended to Ruff linting to use/omit --fix.

Now that Ruff is being used, people can run `pip install ruff` in a
virtual environment, then run the `ruff` command however they like.
This takes the place of multiple tools and plugins.

This commit *avoids* doing any of the following, even though it may
be useful to do them later:

- This does not give specific instructions in the readme for
  installing and running ruff (and c66257e before this also omits
  that). This can be added later and the best way to document it
  may depend on some other upcoming decisions (see below).

- This does not add ruff to the test extra or as any other kind of
  extra or optional dependency. Although the test extra currently
  contains some packages not used for running unit tests, such as
  pre-commit and mypy, adding Ruff will cause installation to take
  a long time and/or or fail on some platforms like Cygwin where
  Ruff has to be built from (Rust) source code. This can be solved
  properly by reorganizing the extras, but that is likely to wait
  until they are expressed completely inside pyproject.toml rather
  than one per requirements file (see discussion in comments
  in gitpython-developers#1716 for general information about possible forthcoming
  changes in how the project is defined).

- This does not update tox.ini to run ruff directly, which could be
  done by splitting the current lint tox environment into two or
  three environments for Python linting, Python autoformatting, and
  the miscellaneous other tasks performed by pre-commit hooks, only
  the latter of which would use the pre-commit command. Whether and
  how this should be done may depend on other forthcoming changes.

- This does not remove or update the Makefile "lint" target. That
  target should perhaps not have been added, and was always meant
  to be improved/replaced, since everything else in the top-level
  Makefile is for building and publishing releases. See 5d15063
  (gitpython-developers#1693). This is likewise not done now since it may depend on
  as-yet unmerged changes and tooling decisions not yet made. It
  should be feasible to do together when further updating tox.ini.

- This does not update tox.ini, Makefile, or the lint.yml GitHub
  Actions workflow to omit the manual hook-stage, which will be
  unused as of gitpython-developers#1865. This would complicate integration of changes,
  and once it is known that it won't be needed, it can removed.

The situation with the tox "lint" environment is thus now similar
to that of the tox "html" environment when it was added in e6ec6c8
(gitpython-developers#1667), until it was improved in f094909 (gitpython-developers#1693) to run with
proper isolation.
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 13, 2024
If people who want to run the tests didn't install the test extra,
they can still install that extra. This simplifies the instructions
accordingly. test-requirements.txt is still mentioned near the
beginning in case people want to look at it to see dependencies.
But the changed code is the only place where the instructions had
still said to do anything with those files.

A possible disadvantage of this change is that in the rare case
that someone following those instructions to run the tests locally
didn't do an editable installation, then installing with the extra
shouldn't be done editably either. But this doesn't seem like a
likely problem, and the new text has an example command with -e to
clarify the kind of command whose effect is augmented here.

Because of that subtlety, it is not obvious that this change is
really justified for purposes of clarity. However, this also helps
prepare for a time when test-requirements.txt won't exist anymore,
as may happen when the project definition is made fully declarative
(see discussion in comments in gitpython-developers#1716).
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 13, 2024
Although it seems likely that the requirements-dev.txt file will be
removed when the project definition is made declarative (discussed
in gitpython-developers#1716 comments), if not before, for now it exists and might be
in use, so this updates it with tools that are currently used but
not listed in any extras or other requirements files:

- ruff: This has replaced flake8 and its plugins (gitpython-developers#1862) as well as
  black (gitpython-developers#1865). Currently there is no separate extra for tooling
  that is not part of unit testing, with some such tools listed in
  test-requirements.txt. The `ruff` package belongs here rather
  than there for now because it should not be installed just to run
  unit tests, since Ruff has to be built from source on some rarer
  platforms like Cygwin, which would take a long time and/or fail.

- shellcheck: The PyPI package for this is a convenience for
  installing it in projects that are already using pip (shellcheck
  is neither written in Python nor a tool to scan Python code). It
  installs pre-built binaries, which are not available for all
  platforms.

These packages remain listed:

- pytest-icdiff: This seems not to have been promoted to be in the
  test-requirements.txt file and the `test` extra because it does
  not work as well for diffs from tests that the pytest runner runs
  but that are written for the unittest framework.

- pytest-profiling [commented out]: I am not sure what the status
  of this is, perhaps it has just not been experimented with
  enough to know if it would be useful for profiling in GitPython.

This requirements-dev.txt file has a few limitations that suggest
it should be removed altogether sometime soon:

- It is not updated regularly.

- It is not always clear why something is there. Originally I
  believe it was for tools where the desire to use the tool was
  established but the tool did not yet work or worked but performed
  checks for which code had to be fixed. That purpose has drifted.

- It uses a different naming convention from the
  test-requirements.txt file in active use.

- It cannot be readily used to create an extra in the current
  project definition in setup.py because the simple parsing done
  there will not recognize the `-r` lines and will not skip the
  comments, and neither enhancement should be done in setup.py
  since that would move things farther away from a declarative
  project definition.

- It will naturally go away when the project definition is made
  declarative, since it will then be feasible to define as many
  extras as desired without proliferating separate requirements
  files (while still allowing their contents to be statically
  available to tools).

Since it may go away soon and is not regularly updated, I have kept
the explanations for why particular packages are there out of it.
But as long as it exists it may as well list the tools that really
are being used yet are not explicitly listed as dependencies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants