Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional pooch dependency #4666

Merged
merged 16 commits into from
May 11, 2020
Merged

Conversation

hmaarrfk
Copy link
Member

@hmaarrfk hmaarrfk commented May 9, 2020

Description

Makes pooch an optional dependency. This was at least in my original design plan. Not sure why it never got in, I'll sift through the comments again.

TODO: Pooch contribution guide.

When merged backport with:

@meeseeksdev backport to v0.17.x

Note that this does NOT address #4660 or #4664

Closes #4665

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@hmaarrfk
Copy link
Member Author

hmaarrfk commented May 9, 2020

@jni was there a reason why we decided to make pooch a required dependency?

8e08610

I really don't remember.

@pep8speaks
Copy link

pep8speaks commented May 9, 2020

Hello @hmaarrfk! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 143:5: E303 too many blank lines (2)

Comment last updated at 2020-05-10 22:58:31 UTC

@hmaarrfk hmaarrfk changed the title [WIP] Optional pooch Optional pooch dependency May 10, 2020
Copy link
Member

@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hmaarrfk I've suggested some typo fixes and some minor rephrasing. I also suggest keeping some of the tests I added recently, which still make sense even with pooch being optional. Otherwise this is good to go, approving so I don't hold this up for another round.

Comment on lines 211 to 215
"The requested file is part of the scikit-image distribution, "
"but requries the installation of an optional dependency, pooch, "
"to be used. To install pooch, use your preffered python "
"package manager. Follow installation instruction found at "
"https://scikit-image.org/docs/stable/install.html"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"The requested file is part of the scikit-image distribution, "
"but requries the installation of an optional dependency, pooch, "
"to be used. To install pooch, use your preffered python "
"package manager. Follow installation instruction found at "
"https://scikit-image.org/docs/stable/install.html"
"The requested file is part of the scikit-image distribution, "
"but requires the installation of an optional dependency, pooch. "
"To install pooch, use your preferred python package manager. "
"For example: `conda install -c conda-forge pooch`, or "
"`pip install -m pooch`."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not pointing to our instructions since they don't have any pooch-specific documentation. I would find this confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we think a little longer term about this? We can work on adding instructions, lets make sure that we are doing good by future generations of developers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two kinds of people:

  • Advanced users: will use shortcuts to get to google, and to search pooch python which will give them a package, that they will know to use conda/pip automatically. These don't need any instructions.
  • Novice users, that will be tempted to use sudo pip install pooch.

I am trying to help the novice users by pointing them to more fleshed out instructions than a one liner (that will likely not work and ensure their frustration).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading over this, I strongly disagree with the instructions provided.

For example, I strongly discourage

conda install -c conda-forge pooch

I really think there should be a single place where we provide install instructions, and not adhoc like an Error message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised my concerns in:
#4672



@skipif(OFFLINE, reason='No internet connection')
def test_download_all():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to remove this test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test really doen't play with parallel testing.... affecting the global filesystem state is not fun.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it also may induce a full download of all the files for the end users. not fun on our laptops when travelling with 4G connections.

def test_cells_3d():
"""Needs internet connection."""
path = fetch('data/cells.tif')
image = io.imread(path)
assert image.shape == (60, 256, 256)


def test_data_dir():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this test? I feel like it's important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added it back.

@jni
Copy link
Member

jni commented May 10, 2020

@hmaarrfk it looks like I made that decision unilaterally and explicitly labeled it as such in my PR #4483, but that comment got lost in the rebase/cherry-pick dance. So it was never discussed properly. I'm happy for this PR to go in. =) Thank you!

@jni
Copy link
Member

jni commented May 10, 2020

@hmaarrfk feel free to revert the last commit, I'm not super happy with it, but I think we should be able to make the initialisation lazy, and just fall back on our legacy dir in the meantime.

@hmaarrfk
Copy link
Member Author

I wanted to leave the laziness to an other PR for now. Seems hard to get right

@hmaarrfk
Copy link
Member Author

Super thanks for the rest!

@hmaarrfk hmaarrfk force-pushed the optional_pooch branch 2 times, most recently from 10ed769 to 26f5501 Compare May 10, 2020 15:21
Copy link
Member

@emmanuelle emmanuelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @hmaarrfk I left a tiny comment about one error message which was not completely clear to me. @jni you can merge in the Australian morning :-).

@emmanuelle
Copy link
Member

One question related to pooch: I don't remember why we are deprecating data.load()? Because for image data which don't have their own function, like the cells.tif file, using data.load() will be the only way to get this image? The deprecation message mentions using skimage.io.load but this function does not exist. Am I missing something?

Co-authored-by: Emmanuelle Gouillart <emma@plot.ly>
@hmaarrfk
Copy link
Member Author

I'm not too sure I remember why we deprecated load. I think the feeling was that it was a thin wrapper on imread and we preferred people not use it directly.

Having to use cell() will give you back the data, prelaoded, instead of asing you to read it again yourself.

For certain datasets, they are more complicated than "just a single image" so having a function that could return more flexible formats, like tuples was preferred.

@emmanuelle emmanuelle merged commit 14b521a into scikit-image:master May 11, 2020
@jni
Copy link
Member

jni commented May 11, 2020

@emmanuelle well I think the "new way" is supposed to be something like:

data.download_all()
cells = io.imread(os.path.join(data.data_dir, 'data/cells.tif'))

which is nice because it looks more like normal IO than data.function(). Having said that, we should provide a public way to fetch individual images.

@emmanuelle
Copy link
Member

@meeseeksdev backport to v0.17.x

@lumberbot-app
Copy link

lumberbot-app bot commented May 11, 2020

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout v0.17.x
$ git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 14b521af8172ef3fad8dec289a796de004695f14
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #4666: Optional pooch dependency'
  1. Push to a named branch :
git push YOURFORK v0.17.x:auto-backport-of-pr-4666-on-v0.17.x
  1. Create a PR against branch v0.17.x, I would have named this PR:

"Backport PR #4666 on branch v0.17.x"

And apply the correct labels and milestones.

Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instruction are inaccurate, feel free to suggest an improvement.

@lumberbot-app lumberbot-app bot added the Still Needs Manual Backport MrMeeseeks-managed label label May 11, 2020
@emmanuelle
Copy link
Member

@jni thanks for the explanations. Yes, I agree that we should also have a way to download and open individual files. Probably not high priority since we indeed have download_all (which at the moment is still reasonable).

@hmaarrfk
Copy link
Member Author

i guess the easy way didn't work.

emmanuelle added a commit that referenced this pull request May 12, 2020
* Revert "Forget legacy data dir (#4662)"

This reverts commit 137f144.

* Make pooch an optional requirement

* Make pooch an optional dependency

* Add Pooch to documentation requirements

* Fixup add documentation to the docstring

* Add cell.png to the legacy registry

* Update datadir order

* Try to avoid circular stuff

* Split up the init pooch function

* Refine error message when pooch not found

* Remove unused imports

* Add the os directory test again

* Add a test for download all

* Add an error for installing Pooch.

* Crosslink to our installation instructions

* Update skimage/data/__init__.py

Co-authored-by: Emmanuelle Gouillart <emma@plot.ly>

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Emmanuelle Gouillart <emma@plot.ly>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Still Needs Manual Backport MrMeeseeks-managed label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pooch is a hard requirement
4 participants