Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make data provider dependencies optional #199

Open
reece opened this issue Jul 19, 2014 · 13 comments
Open

make data provider dependencies optional #199

reece opened this issue Jul 19, 2014 · 13 comments
Labels
keep alive exempt issue from staleness checks

Comments

@reece
Copy link
Member

reece commented Jul 19, 2014

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


Reviewer 3 suggested:

UTA looks incredibly useful, so this only a small point. Integration with it does increase the install requirements, specifically psycopg2 and a postgresql-client. It might be worth making this an optional dependency to improve installation for users on more minimal systems.


@reece
Copy link
Member Author

reece commented Jul 19, 2014

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


We have considered this several times, but believe that this is not the right time to remove this dependency.

The rationale was that mapping requires a data provider and that we expect mapping to be an oft-used feature. Thus, this decision is a balance between A) the cost of installing psycopg2 for people who don't use mapping or validation and B) the difficulty created for people who discover this dependency at runtime.

We are considering providing a REST interface to UTA, which would obviate psycopg2 for public UTA access.

Please put alternative views in comments. If the majority of people really are using only the parsing and formatting component -- that is, without mapping or validation -- then that would certainly sway this decision.

We will continue to consider this suggestion.

@reece
Copy link
Member Author

reece commented Jul 21, 2014

@reece
Copy link
Member Author

reece commented Feb 8, 2015

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Issue #211 was marked as a duplicate of this issue.

@reece
Copy link
Member Author

reece commented Jun 3, 2015

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Issue #229 was marked as a duplicate of this issue.

@reece
Copy link
Member Author

reece commented Jun 3, 2015

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Update: Requiring postgresql native libs for all clients/users is clearly non-ideal. On the other hand, a significant portion of the utility of the package is lost without having access to gene, transcript, and sequence data.

We will push to implement a REST-based data source for UTA soon, thus eliminating the libpq dependency entirely.

@reece
Copy link
Member Author

reece commented Jun 3, 2015

Original comment by Maximilian Haeussler (Bitbucket: maximilianh, GitHub: maximilianh):


For me at least, it would be good if you document the psql dependency. The
dependency itself is not a problem, it's normal that one needs some db
layer access.

@reece
Copy link
Member Author

reece commented Jun 4, 2015

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


I agree. Which platform are you on and what did you need to do to install PostgreSQL? I'll put that in the instructions.

Thanks for suggesting this.

@reece
Copy link
Member Author

reece commented Jun 4, 2015

Original comment by Maximilian Haeussler (Bitbucket: maximilianh, GitHub: maximilianh):


I am on OSX/Homebrew:

brew install postgresql

@reece
Copy link
Member Author

reece commented Jun 29, 2015

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Added Maximilian's instructions in 6b9445ac6b5c.
Keeping open until resolved.

@reece reece mentioned this issue Jul 21, 2020
DavidCain added a commit to DavidCain/hgvs that referenced this issue Sep 24, 2021
This closes biocommons#623

Per the authors of the `psycopg2` package, `psycopg2-binary` "is meant
for beginners to start playing with Python and PostgreSQL without the
need to meet the build requirements."

However, it's preferable to depend on `psycopg2` for production software.

Issues in flight which should make this change irrelevant
---------------------------------------------------------

- [make data provider dependencies optional][hgvs-optional-deps]
- [replace psycopg2 with asyncpg][hgvs-asyncpg]
- [use a REST interface, eliminating libpq dependency][hgvs-rest]

------------------------------------------------------------------------

Why move away from `psycopg2-binary`?
=====================================
The psycopg2 project makes two recommendations in the
[psycopg2 vs psycopg2-binary][psycopg2-vs-psycopg2-binary] docs:

> If you are the maintainer of a published package depending on `psycopg2` you
> shouldn’t use `psycopg2-binary` as a module dependency.

Secondarily:

> **For production use you are advised to use the source distribution.**

This means that any production software depending on `hgvs` will have to
install `psycopg2-binary`.

History of the two packages
===========================
Starting with Psycopg2 2.8, there are now two packages (explained in the
[2.7.4 release notes][psycopg2-274]):

- `psycopg2`: source distribution, advised for production.
- `psycopg2-binary`: "quickest way to install" (a "pre-compiled binary version")

There is substantial discussion in [an issue on the `psycopg2` project][gh-thread],
which I won't re-summarize here, but one takeaway is that many Python packages
now depend on `psycopg2-binary`, and others on `psycopg2`.

Ideally, all libraries just depend on `psycopg2`.

Alternate solutions to the split exist (like [`Provides-Dist` from PEP 345][provides-dist]).
However, the [`Provides-Dist` field is "rarely used"][provides-dist-rarely-used]
and thus ignored most of the time.

[`hgvs` switched to `psycopg2-binary`][hgvs-switch-to-binary] shortly after the 2.7.4 release.

[gh-thread]: psycopg/psycopg2#674
[hgvs-asyncpg]: biocommons#603
[hgvs-optional-deps]: biocommons#199
[hgvs-rest]: biocommons#199 (comment)
[hgvs-switch-to-binary]: biocommons@4c75a4e5
[provides-dist-rarely-used]: https://packaging.python.org/specifications/core-metadata/#provides-dist-multiple-use
[provides-dist]: https://www.python.org/dev/peps/pep-0345/#provides-dist-multiple-use
[psycopg2-274]: https://www.psycopg.org/articles/2018/02/08/psycopg-274-released/
[psycopg2-vs-psycopg2-binary]: https://www.psycopg.org/docs/install.html#psycopg-vs-psycopg-binary
reece pushed a commit that referenced this issue Sep 25, 2021
This closes #623

Per the authors of the `psycopg2` package, `psycopg2-binary` "is meant
for beginners to start playing with Python and PostgreSQL without the
need to meet the build requirements."

However, it's preferable to depend on `psycopg2` for production software.

Issues in flight which should make this change irrelevant
---------------------------------------------------------

- [make data provider dependencies optional][hgvs-optional-deps]
- [replace psycopg2 with asyncpg][hgvs-asyncpg]
- [use a REST interface, eliminating libpq dependency][hgvs-rest]

------------------------------------------------------------------------

Why move away from `psycopg2-binary`?
=====================================
The psycopg2 project makes two recommendations in the
[psycopg2 vs psycopg2-binary][psycopg2-vs-psycopg2-binary] docs:

> If you are the maintainer of a published package depending on `psycopg2` you
> shouldn’t use `psycopg2-binary` as a module dependency.

Secondarily:

> **For production use you are advised to use the source distribution.**

This means that any production software depending on `hgvs` will have to
install `psycopg2-binary`.

History of the two packages
===========================
Starting with Psycopg2 2.8, there are now two packages (explained in the
[2.7.4 release notes][psycopg2-274]):

- `psycopg2`: source distribution, advised for production.
- `psycopg2-binary`: "quickest way to install" (a "pre-compiled binary version")

There is substantial discussion in [an issue on the `psycopg2` project][gh-thread],
which I won't re-summarize here, but one takeaway is that many Python packages
now depend on `psycopg2-binary`, and others on `psycopg2`.

Ideally, all libraries just depend on `psycopg2`.

Alternate solutions to the split exist (like [`Provides-Dist` from PEP 345][provides-dist]).
However, the [`Provides-Dist` field is "rarely used"][provides-dist-rarely-used]
and thus ignored most of the time.

[`hgvs` switched to `psycopg2-binary`][hgvs-switch-to-binary] shortly after the 2.7.4 release.

[gh-thread]: psycopg/psycopg2#674
[hgvs-asyncpg]: #603
[hgvs-optional-deps]: #199
[hgvs-rest]: #199 (comment)
[hgvs-switch-to-binary]: 4c75a4e5
[provides-dist-rarely-used]: https://packaging.python.org/specifications/core-metadata/#provides-dist-multiple-use
[provides-dist]: https://www.python.org/dev/peps/pep-0345/#provides-dist-multiple-use
[psycopg2-274]: https://www.psycopg.org/articles/2018/02/08/psycopg-274-released/
[psycopg2-vs-psycopg2-binary]: https://www.psycopg.org/docs/install.html#psycopg-vs-psycopg-binary
Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Feb 29, 2024
Copy link

github-actions bot commented Mar 9, 2024

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 9, 2024
@mbiokyle29
Copy link

I realize this is stale, but I wanted to add another "voice" to the discussion. I have been working with the hgvs library for the better part of a year now, mostly for the ability to have a consistent framework and language to describe sequence variants. The use case I am working on is more or less completely distinct from using any datasource (at this time). We are using it to track variations in synthetic sequences and proteins. We've actually released an open source package to assist with this kind of thing. The idea is to give it 2 sequences, and get back the variants between them in HGVS spec, based on a global alignment. The package is here: https://github.com/mammothbio-os/palamedes. It is mostly just a translation layer between a biopython Alignment and the various SequenceVariant sub-types in existence. This could not have been made possible without the hard work of the hgsv folks, @reece especially. So I wanted to start with a huge thank you for all of the work.

Focusing in on this postgresql issue, we've already had a few folks (and the CI on our web app) mention that the postgres system dependency was a bit "big" for what our library provides directly (it depends on hgvs under the hood since it returns SequenceVariant objects). I know that myself and likely a few other folks would be able to commit some time to coming up with a path forward here, if that is something that the maintainers desire here. I played around with it a little bit, the psycopg2 dependencies are well isolated within the datasources repo so it seems like not a huge lift to make it an extra dependency and put something like this in datasources/__init__.py:

try:
    import psycopg2
except ModuleNotFoundError:
    print(
        "ModuleNotFoundError raised when trying to import psycopg2, if you need dataproviders support please "
        "install system dependencies and pip install with [dataproviders] extra!"
    )
    raise

Getting around this would require the user to install the system deps as well as pip install hgvs[datasources], assuming we make the following changes to the pypackage.toml:

dependencies=[
    "attrs >= 17.4.0",  # https://github.com/biocommons/hgvs/issues/473
    "biocommons.seqrepo >= 0.6.5",
    "bioutils >= 0.4.0,<1.0",
    "configparser >= 3.3.0",
    "ipython",
    "parsley",
    "six",
]

[project.optional-dependencies]
dataproviders = ['psycopg2']

This would obviously be a breaking change, which is a huge downside. The other option I can see would be a "core" package with the models and parser, which is installed into the hgvs package where the data sources and any other "extra" things are. These more esoteric use cases could install just the core package and not need the postgres dependencies.

tl;dr
Myself and likely a few other folks can contribute time to addressing this postgres issue, in whatever way the maintainers feel is appropriate, if they do feel it is at all appropriate.

Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Jun 27, 2024
@jsstevenson jsstevenson added keep alive exempt issue from staleness checks and removed stale Issue is stale and subject to automatic closing labels Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep alive exempt issue from staleness checks
Projects
None yet
Development

No branches or pull requests

3 participants