-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python-graphblas: high-performance sparse linear algebra for scalable graph analytics #81
Comments
Hi @eriknw! We're very glad to see you all have gone ahead with a full submission, after discussion in pyOpenSci/python-package-guide#21 (comment) as you linked above. I just want to welcome you and let you know we are working on this. I will finish the initial editor checks by the end of this week. @lwasser is traveling and we will need to co-ordinate about editors, but I expect to get back to you about that by the middle of next week at the latest. Thank you for providing all the detail in the submission. The context you've provided will be helpful for the review (and will definitely help me with editor checks!). It sounds like you've anticipated some points @lwasser brought up when we discussed in Slack. Looking forward to helping you improve the docs and giving you some blog post material! 😁 |
Editor in Chief checksThese are the basic checks that the package needs to pass to begin review.
Editor commentsThis package passes all checks; we can begin review. It's obvious the developers and maintainers have done a ton of work. Along those lines, some notes for the review (none of these need to be addressed before we start):
|
@eriknw @jim22k, @SultanOrazbayev the tl;dr is that python-graphblas passed editor checks 🙂 🎉 Like I said above, I'll need to co-ordinate with @lwasser who is traveling about an editor for this review, but would expect us to reply back here by middle of next week at the latest |
Hooray! 🎉 Thanks @NickleDave. We appreciate the attention y'all are giving us, and thanks for telling us what (and when) to expect next. We're in no particular rush--it's more important to give the right people the right amount of time to do things right :) |
🙌 finding the right people now! 🙂 |
Hi again @eriknw, @jim22k, @SultanOrazbayev -- just letting you know that I did have a chance to talk with @lwasser now that she has returned, and we are in the process of finding an editor When you have a chance could you please (all) fill out the pre-review survey?
(I know it's easy to miss in the template) |
Hi @eriknw, @jim22k, @SultanOrazbayev, brief update: |
Hi @eriknw, @jim22k, @SultanOrazbayev, and thanks to @NickleDave for the introduction. I've reached out to potential reviewers, and incidentally look forward to taking a closer look at |
👋 Hi @sneakers-the-rat and @szhorvat! Thank you for volunteering to review The following resources will help you complete your review:
Please get in touch with any questions or concerns! Your review is due in three weeks: 29th March 2023 |
Hello everyone 👋 I'm excited to do this review and learn more about the GraphBLAS approach in general. I plan do the review gradually, and through continuous communication with the authors. I will make it clear when I consider the review to be completed. Feel free to respond to anything I might bring up before then. The same applies to the review checklist: I will post it below today, and will check off boxes gradually. Any issues I open for python-graphblas will have a title prefixed with Expect comments mostly on mathematical aspects, correctness, docs, and usability from me. Hopefully other reviewers will cover the more technical aspects of Python. I will aim to complete the review by April 2nd. Since the authors will have the opportunity to address concerns before then, I hope this little delay over the 3 week deadline will be fine. I will not be available during the week of the 20th. Let me know if you'd like any changes to this arrangement—I can be flexible. For transparency, I should note that am involved with the igraph project (https://igraph.org/). igraph is not a competitor to python-graphblas, but it does have similar aims to NetworkX and in extension to graphblas-algorithms. I think this review will be a good opportunity for us to learn form each other. |
Review is now complete. Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
UsabilityReviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
For packages also submitting to JOSS
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a
Final approval (post-review)
Review Comments |
@tomalrussell Could you please clarify if the authors are expected to follow all the above checklist items to the letter? (E.g., all mentioned badges mandatory, links to all "vignettes" from the README, all functions have examples of use, etc.) @eriknw :
Indeed, the documentation of most Python projects won't give usage examples for every single function, and doing so would definitely be a lot of work. But just to show that it is possible, and often tremendously useful to users, I wanted to point to Mathematica's documentation where each function has not one, but many examples. See e.g. LinearSolve. I tried to follow the same with my IGraph/M package for Mathematica, but it's still a work in progress. R packages also often have at least one example for each function. |
From what I recall we had a conversation at some point about allowing the author to define what is intended as the public interface of the package and what isn't? but ya for packages that wrap another library it seems like a lot of extra work if, eg. there are examples from the main library that are trivially different (ie. could be inferred) from the wrapper's API. |
I'll also be doing this JOSS-style, leaving this here and editing/raising issues as I go. I like @szhorvat 's idea of using an issue tag, so i'll also prefix mine with I'm happy to focus on more of the python implementation side of things, glad to have someone who's more adept with the math :). I don't have any conflicts to declare, except that I'm going to be writing some triplet store code soon, but that doesn't really relate or create a conflict imo. I don't have an expected completion date, but i have this in my calendar as a daily todo item and welcome being relentessly pinged if i am the one holding us up :) Review status
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
UsabilityReviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
In trying to summarize what the functional claims of the software were, I think it basically boils down to
For packages also submitting to JOSSN/A Final approval (post-review)
Estimated hours spent reviewing: ReviewGeneral Comments
My outstanding recommendations in the remaining open issues are all future suggestions that the authors can take or leave, none are mission-critical. I want to compliment the authors on this excellent work, I'm glad to have had a reason to have read it. I would be happy to respond to any questions the authors have about this review and otherwise continue to engage on open issues. Code QualityLibrary wrappers have their own sets of challenges and idioms, and
The At the time of reviewing,
This structure makes for a nice user-facing interface, being able to refer to classes and methods in a natural way (eg. The tradeoff is the significant amount of indirection and implicitness in the code which presents a relatively high barrier to entry for new contributors. The mappings between the suitesparse GraphBLAS implementation and Python are programmed in several places, eg. within a method and then again in the tests, which the authors describe as being useful as a consistency check. Having some parts of the library dependent on import time side-effects is less than optimal and makes the code more difficult to reason about, but is certainly not fatal to the usability of the package. Again, the nature of a wrapping package requires making decisions about abstraction, so the only concern I have for the current code structure is the impact on maintainability. The current maintainers seem to have no trouble reasoning about the package, and I believe they are aware of these challenges and are actively working on them (they refactored a formerly massive The other question I had was about the relationship between versioning in the API implementation in Python and in the underlying C library. GraphBLAS is a relatively mature API and seems to be only rarely changed, with care for backwards compatibility, so this is less of a concern than for a wrapper around a more actively evolving API. The authors have chosen to formally support a specific version of GraphBLAS ( python-graphblas/python-graphblas#441 ) rather than build version-compatibility infrastructure within the package, which seems like an appropriate decision to me, given their other comments on how refactoring their backend system is on their development roadmap. I want to also emphasize several of the "nice to have" features in the package, including a very impressive Recorder class that can keep a record of all the GraphBLAS calls made for debugging and reproducibility, automatic compilation of user-supplied functions including lambdas using numba, and the excellent i/o module. These indicate the authors are actively invested in user experience above and beyond the tidy API they expose. Aside from the above comments, the fundamentals of the package design are strong: modern packaging with setuptools, automated code quality checks, CI, and coverage reports. The decisions made in code structure seem principled, responsive to the constraints of the problem, and result in a very usable user interface - my compliments to the Authors. DocsThe package is well documented from an introduction to the problem that GraphBLAS attempts to solve through package design decisions and practical examples. It is suitable for a general audience with some exposure to graph operations and programming in python, which is impressive given the highly specialized nature of the library. Future suggestions for the authors include embedding their example notebooks in the documentation, and improving the API documentation. Currently, I assume due to some of the abstraction decisions made in the rest of the package, there is not comprehensive documentation of every operation available to the user that might otherwise be accomplished with calling Altogether the docs are excellent with several clear points of improvement, but far above average in the landscape of scientific python packages. TestsWho among us can claim 100% test coverage? https://coveralls.io/github/python-graphblas/python-graphblas The tests are well organized and comprehensive, and I was able to find a corresponding test for every package feature I looked at easily. In more security-sensitive contexts one would want to do more adversarial input fuzzing, but I don't think that's all that relevant here since I've never seen graph data analysis libraries used as a malware vector. I have no notes on the tests, this is good work. Issues Opened:
|
Thanks both, great start 💯 - I'll aim to check in occasionally or as needed.
I would comment on anything you notice and let the authors respond, we can always exercise judgment if "to the letter" seems unhelpful. |
hey y'all. We normally prefer that reviews happen "all at once" in the sense that we're prefer the text of the review to NOT change once submitted and the conversation to happen after. I have reached out to JOSS about how the implement their reviews but I don't want to change our policy on how reviews happen (submitted all at once) until i've spoken to JOSS. @sneakers-the-rat @szhorvat it's fine if you want to leave the review text and check things off and open issues as you go but i prefer that the text of the review that you add to be added all at once to avoid any confusion regarding when you review is complete and what the maintainer of the package should focus on. Many thanks for understanding. I will update once i hear back from JOSS but i don't want to modify our process on the fly until we've thought things through more completely. many thanks for your time y'all! |
one other note - i think it's great to open issues as you go but one other element that is important is documentation of what changed and why so there is a full record of the review so please if you open issues be sure to reference them in the text of the review in the context of why you opened them. that will allow the editor (and us as an organization) to keep track of the review in one place. again many thanks! we are learning as an organization in this process |
fair enough @lwasser :) So - for this review, will only post text of review when checklist finished, is that what you had in mind? |
Thanks for the above, @sneakers-the-rat and @szhorvat ! @eriknw @jim22k @SultanOrazbayev - recognising that you've been engaged with the review already, and that some things may be pushed to a longer timeframe, are you happy to respond and make any priority changes by the end of the week (30th June)?
|
I applaud the reviewers 👏 . I never expected such thorough and honest reviews. Thank you for positive feedback and the criticisms. I agree with all of it--I think you nailed both the strengths and the weaknesses. The reviews are valuable and I suspect will help shape our vision and effort for the next couple of years. In particular, there are two main areas we need to give more attention:
Heh, this is probably true for many projects. My main focus in the near and medium term will be maintainability. Now if I may ramble on for a bit... It's interesting that reviews occur at a specific moment in time. I know the history of Anyway. I want to highlight this comment:
Absolutely! I agree 100%. We aspire to this. It will take time. From a product perspective, I/we wanted to get the syntax and functionality stable enough to begin writing Oh, and if you're curious why our test coverage is so high, it's' for multiple reasons:
Wrapping up... I think we have replied to all open issues from the review. They will keep us busy, that's for sure. @jim22k @SultanOrazbayev want to say anything else? I don't think it's necessary for you to comment here. Thanks again all, hope to see you around ❤️ ! |
Thanks @eriknw (also for the lovely tone and for the background and bit of history)! @szhorvat and @sneakers-the-rat for complete clarity, can you confirm you're happy with responses? |
Yes. The suggestions I made are mostly for the long term. |
Same, full approval from me :) |
Thanks all, time and effort very much appreciated. The review process is done! All that's left is to wrap up, publish the version of record and acknowledge all your contributions. 🎉 Author and Reviewer Wrap-Up TasksThere are just a few things left to do to wrap up this submission, @eriknw, @jim22k, @SultanOrazbayev:
Both reviewers and maintainers (@sneakers-the-rat, @szhorvat too):
Editor Final ChecksPlease complete the final steps to wrap up this review. @tomalrussell, please do the following:
If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide. |
@eriknw and team, I'd also like to invite you to write a blog post on your package for us to promote your work. If you are interested - here are a few examples of other blog posts:
This can be a really high-level motivation for the package, for a slightly-scientific-Python-user-audience, or could draw on your introductory tutorial material to get straight to what the package does.. This is totally optional and not a requirement, but if you have time, we'd love to spread the word about |
did the post-review survey and submitted contributors.yml patch :) |
Friends - i Believe this issue can be closed!! if it should be opened please just reopen or let me know! congratulations on a successful review and thank you everyone for participating in our pyOpenSci review process! I am so appreciative of you all!! ✨ |
w00t! Also, we'd be happy to write a short blog post :) |
This adds the package details for python-graphblas. This to-do was marked as completed in the main thread here: pyOpenSci/software-submission#81, but I didn't see the update in yaml. Did I miss it? If so, kindly disregard. @eriknw
Submitting Author: Erik Welch (@eriknw)
All current maintainers: (@eriknw, @jim22k, @SultanOrazbayev)
Package Name: Python-graphblas
One-Line Description of Package: Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics
Repository Link: https://github.com/python-graphblas/python-graphblas
Version submitted: 2023.1.0
Editor: @tomalrussell
Reviewer 1: @sneakers-the-rat
Reviewer 2: @szhorvat
Archive:
JOSS DOI: N/A
Version accepted: 2023.7.0
Date accepted (month/day/year): 07/14/2023
Description
Python-graphblas is like a faster, more capable
scipy.sparse
that can implement NetworkX. It is a Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics. Python-graphblas mimics the math notation, making it the most natural way to learn, use, and think about GraphBLAS. In contrast to other high level GraphBLAS bindings, Python-graphblas can fully and cleanly support any implementation of the GraphBLAS C API specification thereby allowing us to be vendor-agnostic.Scope
For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
Audience: anybody who works with sparse data or graphs. We are also implementing a backend to NetworkX (which supports dispatching in version 3.0) written in Python-graphblas called
graphblas-algorithms
, so we are quite literally targeting NetworkX users!Python-graphblas provides a faster, easier, more flexible, and more scalable way to operate on sparse data, including for graph algorithms. There are too many scientific applications to list ranging from neuroscience, genomics, biology, etc. It may be useful wherever
scipy.sparse
or NetworkX are used. Although GraphBLAS was designed to build graph algorithms, it is flexible enough to be used in other applications. Anecdotally, most of our current users that I know about are from research groups in universities and laboratories.We are also targeting applications that need very large distributed graphs. We have experimented with Dask-ifying
python-graphblas
here, and we get regular interest from people who want e.g. distributed PageRank or connected components.pygraphblas
, which hasn't been updated in more than 16 months. There are many differences in syntax, functionality, philosophy, architecture, and (I would argue) robustness and maturity.python-graphblas
syntax targets the math syntax, whereaspygraphblas
is much closer to C.python-graphblas
handles dtypes much more robustly, has efficient conversions to/from numpy and other formats, is architected to handle additional GraphBLAS implementations (more are on the way!), has exceptional error messages, has many more tests and functionality, supports Windows, and much, much more. We have also been growing our team, because sustainability is very important to us.Although we have/had irreconcilable differences (which is why we decided to create
python-graphblas
), the authors have always been cordial. We all believe strongly in the ethos of open source, and I would describe our relationship as having "radical generosity". For example, we have an outstanding agreement that each library is welcome to "borrow" from the other (with credit). We may "borrow" some of their documentation :)We also worked together to create and maintain the C binding to SuiteSparse:GraphBLAS:
https://github.com/GraphBLAS/python-suitesparse-graphblas/
We could use help automatically generating wheels for this library on major platforms via cibuildwheel.
@tag
the editor you contacted:Limited prior discussion in this issue: pyOpenSci/python-package-guide#21 (comment)
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Note: Do not submit your package separately to JOSS
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
Other comments (manually added)
Given a product mindset, we believe that Python-graphblas is a great product, but I think our go-to-market strategy has been lacking. We have been very engineering-heavy, and even our goal of targeting NetworkX users is engineering-heavy via creating
graphblas-algorithms
. I hope this peer-review process can help us prioritize our efforts (such as a plan to improve documentation) as well as a place to write a blog post or two.Please fill out our survey
submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.
The text was updated successfully, but these errors were encountered: