Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ObsPy: Software Submission for Review #16

Closed
13 of 22 tasks
megies opened this issue Dec 12, 2019 · 37 comments
Closed
13 of 22 tasks

ObsPy: Software Submission for Review #16

megies opened this issue Dec 12, 2019 · 37 comments
Labels
0/pre-review-checks 3/reviewers-assigned on-hold A tag to represent packages on review hold until we figure out a bigger issue associate with review

Comments

@megies
Copy link

megies commented Dec 12, 2019

Submitting Author: @megies
Package Name: ObsPy
One-Line Description of Package: A Python Toolbox for seismology/seismological observatories.
Repository Link: https://github.com/obspy/obspy
Version submitted: 1.1.1
Editor: @lwasser
Reviewer 1: @canyon289
Reviewer 2: @raoulcollenteur
Archive: TBD
Version accepted: TBD


Description

ObsPy is an open-source project dedicated to provide a Python framework for processing seismological data. It provides parsers for common file formats, clients to access data centers and seismological signal processing routines which allow the manipulation of seismological time series (see Beyreuther et al. 2010, Megies et al. 2011, Krischer et al. 2015).

The goal of the ObsPy project is to facilitate rapid application development for seismology.

ObsPy is licensed under the GNU Lesser General Public License (LGPL) v3.0.

Scope

  • Please indicate which category or categories this package falls under:
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization* <-- didn't tick this, because the benefit of a 'pre-submission inquiry' didn't become obvious for me, the issue form seems to be largely the same

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

  • Explain how the and why the package falls under these categories (briefly, 1-2 sentences):

It falls under these categories, because we retrieve and munge data. Also loads of the functionality is concerning geospatial aspects (can't be avoided in seismology). Being as high-level as we are in functionality, I'd also consider it to be educational, since you can illustrate how to do many processing steps (and what pitfalls there are) in a few lines of code.
If this is more about the core aspects of the software, I guess you could also argue for only retrieval and munging, and the other are rather side aspects.. although it's unclear to me in which bullet item signal processing is included (munging I guess?).

  • Who is the target audience and what are scientific applications of this package?

Mainly scientists/researchers, but we know that obspy is also used in industry (no idea about numbers, though). Scientific applications span most fields of seismology that involve real or synthetic data, basically only excluding purely theoretical, pen and paper seismology.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

There are various packages with I/O of up to a handful file formats at a time, usually concerned only with 1-3 file formats used by the authors. I am not aware of other packages that offer access clients for all important protocols like we do (FDSN web services, SeedLink, Earthworm, Arclink, NRL, syngine, ...). Most important point would be the widespread use in projects built on top of obspy, most likely.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

no pre-submission enquiry filed.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
    • this feels a bit unclear, but in any case I'm not aware of any ToSs we would violate
  • has an OSI approved license
    • LGPLv3
  • contains a README with instructions for installing the development version.
    • these instructions are in the wiki which is linked in the README to keep it more concise
  • includes documentation with examples for all functions.
    • docs.obspy.org built from source with sphinx
  • contains a vignette with examples of its essential functions and uses.
    • submodule API landing pages (i.e. __init__.pys) hold the submodules' most important functions with explanations and usage examples,
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
    • Travis for Linux + Mac tests
    • Appveyor for Windows tests
    • Circle for quick linting and Flake8

Publication options

We have citable papers out, so no need for this I think. If anything we might do a "history of obspy" short story?

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

We're using Contributor Covenant Code of Conduct v1.4

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

@megies
Copy link
Author

megies commented Dec 12, 2019

Code of conduct was added after last stable release, it's in master and will be part of soon-to-be-released 1.2.0.

@lwasser
Copy link
Member

lwasser commented Dec 12, 2019

awesome! thanks @megies for this submission. we will get back to you soon!! likely after agu :) in the meantime are there people who you could suggest as reviewers? i can also dig into our spreadsheets but if you know of folks, please feel free to suggest here. now that i'm writing this i'm realizing we might consider adding this question to our template as well.

Editor's Template

Editor checks:

  • Fit: The package meets criteria for fit and overlap.
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has an OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

I asked the submitter below if they were interested in JOSS!


Reviewers: Listed above with a due date of feb 24
Due date:

note: i think it might be good to add the editor checks somewhere in the issue ?? it seems redundant to list reviewers twice and a due date at the bottom.

@megies
Copy link
Author

megies commented Dec 12, 2019

are there people who you could suggest as reviewers?

That'd be somebody working in seismonology, good in Python and not using obspy? Is that even possible?

👆 just kidding 😜 but yeah, most people I could come up with would be biased 😉

@lwasser
Copy link
Member

lwasser commented Jan 15, 2020

hey there @megies !! i'm just catching up on things with the holidays, and beginning of the semester here... so my apologies for the delay. Our next meeting is this thursday so i'll be sure to get this review going.

I'll put a few calls out to try to find some reviewers. it seems to me that someone who uses obspy could be ok?! but if they are closely related to the development already that would not be ok. let's see if we can use a combo of twitter and our forum to find a few reviewers. more in a bit. i'm hoping to play catch up over the next week and this is a priority for me as it's been sitting for a month. thank you for your patience!!

@megies
Copy link
Author

megies commented Jan 17, 2020

Like I said, it's all good, we're in no hurry about this at all. 👍

@canyon289
Copy link

Hi, I saw the call for reviewers on twitter, all for open source science, would like to be helpful. I'm not in seismology and don't use ObsPy, but do help maintain ArviZ and PyMC3 which I hope qualifies me enough. Let me know if I meet the bar.

@raoulcollenteur
Copy link

I could help with a review. Not specifically known in seismology but I am a hydrogeologist and (co-) develop some python packages. Let me know if help is needed :)

@lwasser
Copy link
Member

lwasser commented Jan 30, 2020

hi @canyon289 and @raoulcollenteur if you have experience developing packages, and are on the geosciences end of things, I think that could work well for this review. @megies please let me know what you think about two people reviewing who are not from the seismology community explicitly? i suspect this might be challenging to find someone without a conflict of interest and such. I know both are technically qualified based upon the comments above!! Thank you all for your responses.

@megies
Copy link
Author

megies commented Jan 31, 2020

@megies please let me know what you think about two people reviewing who are not from the seismology community explicitly?

I don't see a problem with that at all. 👍

@lwasser
Copy link
Member

lwasser commented Jan 31, 2020

awesome @raoulcollenteur and @canyon289 if you are still willing, I am going to assign you as reviewers for obspy!! Please plan to have your feedback to @megies by Monday Feb 24th.

You can use our review guide as a basis for your review. https://www.pyopensci.org/dev_guide/peer_review/reviewer_guide.html We actually have some reviews that you can look out now (our guide is dated already!!) which include:

@megies did check they are ok with issues being submitted directly to the obspy repo! so if you'd like you can organize that review with issues providing commentary here and then linking to issues that you open in obspy. how you prefer to work is up to you!

If you have any questions please get in touch here!! Thank you again for volunteering your time to review this package! It is a very widely used package in this domain looking at the package download stats on conda so we are excited to see it submitted here! Have a wonderful weekend!!

@lwasser
Copy link
Member

lwasser commented Jan 31, 2020

@megies please excse me if i already asked you this? are you interested in a JOSS citation? it's not a significant amount of extra work. you write the "paper" which can be quite short. earthpy has one in it's repo, and then joss automatically accepts the review that happens here so it's fast once the paper is complete. You then get a more formal citation through a journal which is really nice and linked to your orcid ID (and any other contributors that youd like to add). this is an option so just let me know if you have any interest in it! i'm excited to get this review going!

@lwasser
Copy link
Member

lwasser commented Jan 31, 2020

one other note -- @raoulcollenteur and @canyon289 the review template is here in case you didn't see it above! https://www.pyopensci.org/dev_guide/appendices/templates.html

@canyon289
Copy link

Great. I'll get this done by the 24th. Also interested in whether this is getting submitted to JOSS as there seem to be extra reviewer items for me to go through if that's the case.

@megies
Copy link
Author

megies commented Feb 1, 2020

please excse me if i already asked you this? are you interested in a JOSS citation? it's not a significant amount of extra work

No problem. I'm not sure if it would do much though, since we have three citable papers for obspy but if you have a good idea how to theme it or if you think it's still adding something, let me know. :-)

@megies
Copy link
Author

megies commented Feb 1, 2020

Also interested in whether this is getting submitted to JOSS as there seem to be extra reviewer items for me to go through if that's the case

You can start on those items not affected by JOSS submission :-)

@lwasser
Copy link
Member

lwasser commented Feb 3, 2020

awesome. thanks all!! sure. i'll actually reach out to @arfon on this. Arfon if you happen to see this - does JOSS provide anything new in terms of citation and visibility if a tool has already been published in a few other journals?

@arfon
Copy link

arfon commented Feb 3, 2020

I'm not sure if it would do much though, since we have three citable papers for obspy but if you have a good idea how to theme it or if you think it's still adding something, let me know.

👋 If you already have a paper (or multiple papers!) that you like people to cite for this software then I don't think it makes much sense to publish a paper with JOSS.

@lwasser
Copy link
Member

lwasser commented Feb 4, 2020

Thank you @arfon !! that was fast. ok great.. i just emailed you about this given i have a similar question for another review underway!

@canyon289
Copy link

Hi,
I've reviewed the package and have two comments

For the author

  1. Tests are failing in CI and some fail locally. http://tests.obspy.org/109013/
  2. I wasn't able to quickly find a usage tutorial of the "intro to obspy" sort in the README. The docs are quite extensive so I'm sure one exists somewhere but I wasn't able to find it. The README shows an output image, perhaps the code used to generate that can be shown.

For the editors

  1. There aren't examples for every user facing function, but there are for most and API for this library is so expansive I can sympathize with not having examples for all functionality. Nonetheless if go by the wording it states all functions need user facing documentation. How should the review proceed here?

In depth comments and examples are in this google doc
https://docs.google.com/document/d/1kmH3b9_X_KZok1wXAwVUhgjQTtDQg5xwIcjB5oA_WOQ/edit?usp=sharing

@raoulcollenteur
Copy link

raoulcollenteur commented Feb 23, 2020

Sorry for the slightly late response, but here it is! I think the authors did a great job on the package . I have a couple of comments which I think would improve the introduction of the package for new users. Please find those at the end of the review.

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need is provided.
  • Installation instructions: Good documentation for different forms of installation is provided. I tried an install on MacOS 10.15.1 and anaconda with Python 3.7 and that worked fine. I think support for P3.7 could be added?
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions. API documentation is provided.
  • Examples for all user-facing functions
  • Community guidelines a contributors guide is available, and looking at the increasing number of external contributors it is working.
  • Metadata A general email is provided, but everything is traceable through GitHub anyway.

Readme requirements

  • Package has a README.md file in the root directory.
  • The package name is included (nice logo:))
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review has not been included yet.
  • Short description of goals of package with links is provided.
  • Installation instructions
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • Citation information

Functionality

  • Installation: Installation succeeds as documented. Running the tests gave me two minor errors, and ObsPy gives the handy functionality to automatically update the package maintainers on this. Well done.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: I could not find real claims on performance, so this is fine.
  • **Automated tests:**Running the tests gave me two minor errors, and ObsPy gives the handy functionality to automatically update the package maintainers on this.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

I am not a seismologist by training so I will limit to software technical/UI review. ObsPy is a Python package targeted at seismologists and seems to have a large and active user base. I think this is better proof for the need and usability of this package than anything else. I congratulate the authors on what seems to me a well accepted Python package. The Wiki and documentation are extensive. Some of the information feels a little dispersed, maybe these two sources could be merge to one website in the future to improve easy access to all these resources. The gallery gives a nice quick overview of the possibilities, including the necessary code. In addition to all these resources there are multiple articles introducing the software that provide a quick introduction. I also found a couple of useful repositories in the organisation account (e.g., https://github.com/obspy/docs) that include notebooks, these could be promoted from the main repository as well.

Major comments

  • One thing that I think should be addressed before publishing the package is the succes of CI tests. The CI is failing for both MacOS/Linus and Windows (on 2020/02/23).
  • I would suggest to create a single Jupyter Notebook that shows the most important functionality and runs on both local machines and Binder as an easy introduction to the package. I found it a little difficult to find a notebook to test the most important functionality (I now copied code from https://docs.obspy.org/tutorial/ into a notebook..)

Minor comments

  • Add explicit link to documentation website (http://docs.obspy.org) on the readme so it's easier to find. Possibly replacing the "read more" link as it is not clear this link brings you.
  • The wiki has a short three-line section describing how to get started. I suggest adding a short section "Getting started to the readme. It provides some good pointers on how to quickly check out what ObsPy can do (e.g., the gallery). This could replace the "usage demonstration" which I do not think makes sense to add to the readme for this package.
  • Possibly start supporting python 3.7? I tested on 3.7 and it all seems to work well.
  • I see that some commits are now pushed directly to the master branch, so that the master branch is not necessarily the stable version anymore (e.g., on 2020/02/23 the build is failing). It could be an idea to create an additional development branch where direct pushed are possible, and keep the master branch as the stable version (tagged by a GitHub release).

@megies
Copy link
Author

megies commented Feb 25, 2020

Thanks for the feedback, I'll try to have a detailed look as soon as I can.. we're pushing for a major release right now, so might not be able to reply immediately.

@megies megies mentioned this issue Feb 26, 2020
20 tasks
@megies
Copy link
Author

megies commented Feb 26, 2020

Here's my reply to @canyon289, thanks for the helpful comments. I also left some remarks in the google docs, mainly about those failing tests in CI.

1

We quite often have tests failing which most times is due to the tests being so rigorous. I can make you a log of numpy floating point operation result differences at the 15th digit and their changes across platforms, architectures and numpy versions. Fun aside, all of the fails you linked are image test fails, you can actually check out the images uploaded to imgur from the traceback message. Usually it's ticks moving around a bit.

2

Usage examples could be found following the read more link to the wiki and move to Getting started section referring to the Tutorial in the docs. I guess it was not perfectly visible, so I made the read more stand out a bit more and added a small Getting Started section as well.
The plot you were referring to is not really an ObsPy plot, but rather plotting our usage stats, so it's not appropriate to show the code for it, but I added a mini code example with a picture as a teaser in the README. Thanks for the suggestion to make usage teaser stand out a bit more in the README.

from obspy import read
st = read()  # load example seismogram
st.filter(type='highpass', freq=3.0)
st = st.select(component='Z')
st.plot()

plot

@megies
Copy link
Author

megies commented Feb 26, 2020

@raoulcollenteur, thanks for your review, here's my comments/replies:

  • @canyon289 also commented about the missing usage example in README, I added it to the README and also improved the "Getting started" in the wiki. I also added mention of www.seismo-live.org which has static preview notebooks as well as binder for interactive use of all the ObsPy training notebooks (and much more).
  • I don't really understand the Vignette(s) demonstrating major functionality that runs successfully locally item actually @lwasser. In any case all the important submodule API landing pages have extensive examples (in addition to examples in specific important method/function API pages), so I'm pretty confident we're good there, e.g. http://docs.obspy.org/master/packages/obspy.clients.fdsn.html
  • again, about the tests, same comments like to @canyon289 apply. many test fails are because tests are a) so rigorous, b) new matplotlib version decides to place ticks 2 pixels further left, c) network/server side issues we don't have control over. I also hate not see green all the time, but we tend to accept to have some more fails sometimes but be sure that we can actually see it when something really breaks
  • notebooks with the most basic usage are in place, also mentioned in the top of the "Tutorial" docs page. All of that is also on binder via seismo-live, like stated above I added better mention and links to that in README and wiki "Getting started".
  • Possibly start supporting python 3.7? We do actually, and have 3.7 and 3.8 also in CI, I guess you are referring to what is stated on PyPi? Those metadata will get updated by uploading the next release, its already fixed in master.
  • I see that some commits are now pushed directly to the master branch [...] It's an interesting idea but it might confuse some of our less git/github-savvy contributors even more. It already is confusing for some to have a master branch and then a maintenance branch for fixes only.

@megies
Copy link
Author

megies commented Feb 26, 2020

@lwasser responded to both reviews, let me know if you need more from my side

@lwasser
Copy link
Member

lwasser commented Feb 28, 2020

thank you @megies !! i'll look at this a bit more closely today!
one note - when we talk about vignettes we like to see them in the api docs but also something like this is always nice:

It's becoming clear to me that we need to more carefully consider levels of documentation for good, better and best . The issue of API documentation with examples for instance has come up multiple times. I'm curious what your take is on this. We have an discussion on it in discourse started here. @canyon289 and @raoulcollenteur i'm very interested in your thoughts as well if you have any input! All - thank you for supporting / going through review process and for your patience as we develop our policies and guidelines!

@lwasser
Copy link
Member

lwasser commented Feb 28, 2020

also @canyon289 and @raoulcollenteur can you kindly have a look at the comments above and let me know if you are satisfied with @megies responses? i see that you guys did go back and forth in the google doc! i will copy that text over to this issue just so we have a record of it now (in case the doc gets lost in the future!).

@lwasser
Copy link
Member

lwasser commented Feb 28, 2020

THIS IS THE GOOGLE DOC REVIEW TEMPLATE - from @canyon289 copying it here for records and safe keeping!!

Review Template

Checklist Legend
X - Complete and meets criteria
? - Not sure if criteria is met
Not Met - Criteria is not met
Not applicable - Not required for this review

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

No conflict of interest!

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README

Short description is available on README. Link to a very comprehensive presentation is linked on README to youtube.

But there are a number that are very well documented
http://docs.obspy.org/packages/obspy.core.html#module-obspy.core
How strict is the word all here?

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
    Code of Conduct is available and is very much in line with inclusive community standard

  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py file or elsewhere.
    Contributors list avaiable in repo and gitter is available for questions. I believe this enompasses the metadata question
    https://github.com/obspy/obspy/blob/master/obspy/CONTRIBUTORS.txt

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.

Badge for review has not been included yet but everything else has

  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.

  • Installation instructions

  • Any additional setup required (authentication tokens, etc)

  • [Not Met] Brief demonstration usage
    A brief demonstration of usage is not included at this time

  • Direction to more detailed documentation (e.g. your documentation files or website).

  • [?] If applicable, how the package compares to other similar packages and/or how it relates to other packages
    Unaware of what similar packages would be in this space

  • Citation information

Functionality

  • Installation: Installation succeeds as documented.
    Installation from conda forge was well documented and painless

  • Functionality: Any functional claims of the software been confirmed.
    I’m not an expert enough with seismology to deeply evaluate features. From documentation it looks like what is promised does work

  • [Not Applicable] Performance: Any performance claims of the software been confirmed.
    I assume it works. I’m not qualified enough in seismology to deeply verify this one

  • [Fail] Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.

Tests do not pass on my local machine with fresh installation from master (commit: 34c70be39b71c6ec224a38045bb2a89a75f56a76)

When pyimgur is installed tests then fail because they’re hitting a web api that is sending back 400.

http://tests.obspy.org/109013/

This test failure service you’ve implemented though is really cool.

  • [?] Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

CI integration exists for appveyor and Travis but Travis seems to be failing at the moment

For packages co-submitting to JOSS

  • [Not Applicable] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • [Not Applicable] A short summary describing the high-level functionality of the software
  • [Not Applicable] Authors: A list of authors with their affiliations
  • [Not Applicable] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [Not Applicable] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

Review Comments

Fix: Develop installation

Developer installation isn’t straightforward. For editable mode. It seems that numpy is needed prior to using pip, but this is not listed in the instructions
https://github.com/obspy/obspy/wiki/Developer-Installation

Suggestion: Intro for dummies

A brief abstract about how this package is useful for seismologists would be great for non seismologists like me. There is an hour youtube video available but that doesn’t work for someone scrolling through

@canyon289
Copy link

also @canyon289 and @raoulcollenteur can you kindly have a look at the comments above and let me know if you are satisfied with @megies responses? i see that you guys did go back and forth in the google doc! i will copy that text over to this issue just so we have a record of it now (in case the doc gets lost in the future!).

There's a couple of things that are still open.

  1. Regarding the note below I still think developer installation can be easier. Instead of requiring a user to understand an error message to fix their install, would it be easier for newcomers if the instructions just included a note to install numpy?
    image

  2. The failing tests when locally cloned are confusing. I get that in CI things might be flaky, but for a developer who's trying to help out it's difficult to tell if failing tests are a problem with your local environment or an external service. I would advise adding pytest markers that skip tests that could fail locally so folks cloning the library can quickly tell if things are working before start contributing.

All of the above is especially important if code is being pushed straight to master. I think the fixes are fairly easy which is why I'm mentioning this here. These two things will help potential new contributors from experiencing frustration!

Hope this helps!

@raoulcollenteur
Copy link

Thanks for the comments on the review @megies, I think all the questions were answered well. Just the last thing I am not really sure about is the failing of CI tests (Travis fails: https://travis-ci.org/obspy/obspy). I agree with @canyon289 that this is rather confusing and could be fixed by allowing some tests to fail. I think many people use this to quickly evaluate a package for a) practical use / quality and b) ensure functioning during development of new features. In general, I would like to see the CI tests work for accepted packages in pyOpenSci.

  • @canyon289 also commented about the missing usage example in README, I added it to the README and also improved the "Getting started" in the wiki. I also added mention of www.seismo-live.org which has static preview notebooks as well as binder for interactive use of all the ObsPy training notebooks (and much more).

The links to seismic live don't work for me, you might want to check those. Readme looks good now.

  • I don't really understand the Vignette(s) demonstrating major functionality that runs successfully locally item actually @lwasser. In any case all the important submodule API landing pages have extensive examples (in addition to examples in specific important method/function API pages), so I'm pretty confident we're good there, e.g. http://docs.obspy.org/master/packages/obspy.clients.fdsn.html

I agree with @megies, I think this is fine as is. @lwasser I feel PyOpenSci should leave the exact implementation to the developers and not mandate that these vignettes should be in the API. It is nice, but for some applications it is not feasible in my opinion.

  • Possibly start supporting python 3.7? We do actually, and have 3.7 and 3.8 also in CI, I guess you are referring to what is stated on PyPi? Those metadata will get updated by uploading the next release, its already fixed in master.

I was indeed talking about the Pypi badges, great!

@megies
Copy link
Author

megies commented Mar 4, 2020

@canyon289

  • Regarding the note below I still think developer installation can be easier. Instead of requiring a user to understand an error message to fix their install, would it be easier for newcomers if the instructions just included a note to install numpy?

I did not see the problem with the developer installation instructions in the wiki, to be honest. It was advertising using Anaconda as the most straightforward solution, in which case the mentioned error is not encountered as numpy will be there. For people not following the recommendation, I would expect them to be experienced and I would deem "No module named numpy. Please install numpy first, it is needed before installing ObsPy." as sufficiently clear what is the problem.
Or is there a misunderstanding and you were you not looking at the "Developer Installation" page in the wiki but some other place I can not think of right now?

In any case, it never hurts to further improve so I made some more changes, which should be crystal clear now.

  • The failing tests when locally cloned are confusing. I get that in CI things might be flaky, but for a developer who's trying to help out it's difficult to tell if failing tests are a problem with your local environment or an external service. I would advise adding pytest markers that skip tests that could fail locally so folks cloning the library can quickly tell if things are working before start contributing.

I am sorry, but I don't see this being feasible right now in a manageable amount of time or without being disruptive to the whole test setup. Tests that can fail locally in maybe even 50% of the test cases, depending on installed numpy, scipy, matplotlib, ... version, depending on operating system in some cases, depending on platform and depending on network availability, etc etc. To be honest, it is even easier to keep false positive fails under control in CI as opposed to user installations (esp. because of volatile dependencies). I doubt it will ever be possible to a-priori identify those at all. Also, pytest is not (yet) our main testing backend but rather plain unittest with our test runner on top, steering things.
So, I'm sorry but I do not think I can facilitate that request short term.

In any case, thanks again for the review comments!

@megies
Copy link
Author

megies commented Mar 4, 2020

@raoulcollenteur

The links to seismic live don't work for me, you might want to check those. Readme looks good now.

You mean http://seismo-live.org/ is not working for you? Maybe a random network fail. The link should work

I agree with @megies, I think this is fine as is. @lwasser I feel PyOpenSci should leave the exact implementation to the developers and not mandate that these vignettes should be in the API. It is nice, but for some applications it is not feasible in my opinion.

Looking at the earthpy vignette gallery @lwasser linked, actually this is pretty much what our gallery is? It also has a mosaic of plots that link to corresponding parts of the API pages.


I can see that there is some argument about whether green badges are more important or being able to tell when things really go wrong, and when they started to do so (from the database of test results), and many people are speaking out for skipping tests or removing them -- I don't like either. I think this can not be resolved short term for this review, if anything, I'd rather remove those badges from the README, to be honest. We have a lot of tests relying on network/server feedback, image tests and numerical tests, all of which can be hard to keep passing with volatile dependencies, across platforms and external upstream server and network issues. I really don't see the point of a green badge when it mostly tells you "we removed all failing tests from the code base or just don't run em anymore because they just were a pain".
Feel free to keep the review on hold if it doesn't fit the guidelines, I just don't see a short term solution not taking heaps of work to do for now.

@lwasser
Copy link
Member

lwasser commented Mar 4, 2020

hey there @megies !! this is great insight - thank you for it.
Let me tell you first and foremost, things have been slow because i have had some medical and family issues and have pretty much been offline for 2 weeks and am now behind. MY APOLOGIES for that! i am trying to catch up now.

I don't want these issues to STOP this review process! what has been happening is that every review we learn something and potentially change, enhance update our docs. We are new and want to ensure our review process is

  1. robust
  2. not too onerous for reviewers AND
  3. not too onerous for developers

So what we are trying to do is find that middle ground where the python scientific community says "cool - we think pyopensci guidelines are ones we want to follow because they are reasonable yet still robust".

We (including @choldgraf ) had a great discussion about tests and expectations for pyopensci in our last meeting. I absolutely agree that we won't resolve this - in this review. We have some other ongoing discussions about how much documentation is "good enough". I'd like to find a middle ground for this review and then add what we learn to our docs. i will followup on language in our dev guide that might address this and would really appreciate your input on this @megies because as a developer of a larger package, i want to ensure things aren't too onerous for you!

We are running into the same issue with docs.

in the most perfect ideal world we'd have docs as follows:

  1. small application vignettes in the doc strings that are visible calling help(function/class etc) and
  2. tutorials or example workflows.

i see in our docs we ask for a get started in the readme, vignettes and doc strings which seems like a lot. we are trying to find a "good enough" middle ground. This is what we came to in our last meeting and i'm curious what you all think about it! i plan to add a bit more to our discourse discussion as well but let me know what you all think about below:

**good enough (required)**: all user facing functions (inputs and outputs) are documented both in the code and in the user-facing documentation.
**better**: all user facing functions are documented with examples (in tutorials or doc strings)
**best**: all user facing functions are documented and there is are tutorials (vignettes) that demonstrate example user workflows 

i don't have any language for testing yet but would love some help editing our dev_guide language to ensure it's reasonable.

I agree that just removing failing tests is not the answer if they do serve some purpose. i'm just not sure what language is best in this case.

thank you all for your patience. developing our review guidelines has been challenging but also a lot of fun as we are learning so much and getting a lot of input from people like you guys!

@canyon289
Copy link

canyon289 commented Mar 4, 2020

@megies
Thank you for updating the developer instructions! I feel that that point has been sufficiently addressed and I appreciate you sharing your thoughts as well.

@megies @lwasser and @choldgraf
About tests and independent of this review, I had similar experiences with flaky tests on ArviZ and the PyMC projects and ended up making some big changes to our testing pipeline to get over the hurdle. To @megies point its often not trivial, and not a feature for end users.

For what its worth I wrote a blog post on the thought process and our solution here. Not suggesting this is what PyOpenSci or obspy should be use as a standard. Just some more background on what my/our open source projects ended up doing, if it is at all helpful.
http://canyon289.github.io/DockerforDS.html#DockerforDS

@megies
Copy link
Author

megies commented Mar 5, 2020

@lwasser no worries and I hope everything got sorted out 👍

**good enough (required)**: all user facing functions (inputs and outputs) are documented both in the code and in the user-facing documentation.
**better**: all user facing functions are documented with examples (in tutorials or doc strings) 
**best**: all user facing functions are documented and there is are tutorials (vignettes) that demonstrate example user workflows 

@lwasser This looks like a good starting point for a classification. From my experience it might be enough to say (for e.g. "better") "all user facing functions are documented in docstrings, most important or likely-to-be-used-often functions include examples in doc strings and/or tutorials".

But then I have to be self-critical here, obspy has grown from small to very big and some policies for better coding practice were picked up one by one. So at some point we set a policy that all functions/methods/classes deemed private are prefixed by one underscore, and subject to change of API or complete removal without warning or deprecation at devs discretion. For these I would also not demand full documentation or examples in most cases. But, we have lots of such functionality that is not marked private by naming, but we would still consider it more internal helpers that do not need full effort in docstrings or even examples. So I guess in our case you could say that it is not straight forward to tell what is really supposed to be deemed "user-facing". These are historical artifacts and may be similar (or not..) for other projects that were not laid out on the drafting table but rather have grown steadily.

Also, I think many smaller, yet user-facing routines might not not full blown tutorials. As an example, this helper function is for users, but I would argue the docstring says it all: http://docs.obspy.org/master/packages/autogen/obspy.geodetics.base.gps2dist_azimuth.html

@lwasser
Copy link
Member

lwasser commented Mar 16, 2020

ok team. My apologies for being absent for the past few weeks. So many things have been happening. getting back to this and reading through everything again, it seems like the reviewers are generally happy with obspy now. I see that @megies did make the install a bit more clear! thank you. it looks great and i see numpy mentioned as a requirement.

So here we are left with two bigger issues --

  1. How much documentation is "good enough" for pyopensci
  2. Tests - how much do we ask of dev's in terms of testing frameworks when some tests fail locally. options
    • pytest markers which can work well but you are essentially skipping running some tests
    • leave it as is
    • Fix the tests (which may be quite onerous in this case to ensure they run across all user machines

@canyon289 and @raoulcollenteur have i represented things well above? are there other outstanding issues in this particular review?

@canyon289
Copy link

@lwasser Apologies on my late reply

That list looks good. The one other thing I would add is whether the master branch of CI should be failing or not. Basically should there be a green badge for CI on the README or not.

@lwasser
Copy link
Member

lwasser commented Sep 1, 2022

Good morning, @megies et al. I am following up on this review. I realize this is a long stale issue but I wanted to revisit to dot i's and cross t's as necessary. First, thank you all for your time in submitting and reviewing this package. If i recall, the issue with proceeding with this review was the fact that many of the tests (please correct me if i'm wrong) were failing due to understandable issues. But i think we, as a community need to talk more about what we look for in test suites. What is difficult from a contribution standpoint is that if tests regularly fail, it might be hard for contributors to understand the points of failure and contribute meaningfully. I'm happy to have a more in depth discussion about this @megies if you'd like to chat further as it could help us as we define standards for review in pyOpenSci. Please let me know. For now, I am going to close this issue. However, if anyone responds I am happy to reopen it and if we need to have a broader discussion, we can move it over to discourse for that purpose.

@lwasser lwasser closed this as completed Sep 1, 2022
@lwasser lwasser added the on-hold A tag to represent packages on review hold until we figure out a bigger issue associate with review label Aug 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0/pre-review-checks 3/reviewers-assigned on-hold A tag to represent packages on review hold until we figure out a bigger issue associate with review
Projects
None yet
Development

No branches or pull requests

5 participants