-
-
Notifications
You must be signed in to change notification settings - Fork 10.2k
NumPy Roadmap
Note that this page is old and at most only has some historical relevance. Please see https://numpy.org/neps/roadmap.html for the up-to-date NumPy roadmap.
This document lives at https://github.com/numpy/numpy/wiki/NumPy-Roadmap It was created at the NumPy sprint held at BIDS, 24 May 2018. Present during discussion: Charles Harris, Stephan Hoyer, Matti Picus, Jarrod Millman, Stéfan van der Walt, Nathaniel J Smith, Matthew Rocklin
Since then it has been added to by various authors. There was a discussion on the mailing list. We will revisit this at the upcoming SciPy conference at a BOF meeting.
- triaging
- document both python and C-API (i.e. borrowed or new refs)
- can we document reference borrowing close to C code (in the spirit of doctests for Python code)
- code coverage in tests
- policy for merging pull requests
- bugfixes - does it fix the bug
- releases have their own rules
- at least one reviewer
- documentation typos can be merged by the committer
- enhancements should go through the mailing list
- actually deprecate things that are deprecated
- make internals private
- Extensible dtypes
- Overloads for duck arrays (Stephan / Nathaniel's notes)
- allow interop with dask, pytorch, cupy, xarray
- see Matthew Rocklin's draft blog post (dead link) and Stephan Hoyer's response
- also see the NEP that grew out of this
- Random number generation refactor (Robert's notes) and allow backend switching
-
Typing stubs for mypy
- Basic support in NumPy
- Standard way to write/check annotations for N-dimensional arrays in Python (this will need a PEP)
- Internal refactorings of almost-external things like MaskedArray, f2py, numpy.distutils, financial
- How to handle the Matrix class; we want to avoid new usage, reduce external dependency
- Better configuration reporting (use it when reporting bugs)
- Allow switching numpy.linalg backends
We may not be able to change any of these Right Now.
- Currently strategy is: either pass to ufunc or convert to ndarray. Makes it hard to use NumPy as high-level API, because it is unknown (without reading the code) whether other classes will survive various operations. See discussion above.
- 0-dim arrays get converted into NumPy scalars
- multibuild wheel builder (Linux/OSX/Windows)
- Airspeed Speed Velocity for benchmarks (https://pv.github.io/numpy-bench/) or codespeed (speed.python.org)
- Maintain the websites scipy.org, numpy.org
- numpydoc
- pytest
- To what extent should we use pytest APIs?
- pytest is a little magical (e.g., fixtures)
- Should define a philosophy. Discussion led to a direction of simple use of pytest, using as little magic as feasable.
- OpenBLAS
- In memory, N-dimensional single pointer + strided arrays on CPUs
- Not specialized hardware
- Basic functionality for scientific computing
- Could potentially be split out into separate packages with coordinated releases:
- random numbers
- masked arrays
- polynomials
- financial?
- Overlapping functionality with SciPy:
- linear algebra
- FFT
- Windowing functions for ffts.
- where is the cutoff? Simple things that everyone needs should be in NumPy, specialized and complicated functions in SciPy. How do we decide which is which?
- Infrastructrure for data science packages (distutils, f2py, testing)
- Could potentially be split out into separate packages with coordinated releases:
- Higher level APIs for N-dimensional arrays
- NumPy is a de facto standard for array APIs in Python
- Protocols like
__array_ufunc__
.
These ideas may be used to solicit further funding.
- Missing values
- Labeled arrays
- In or out?
- To some extent solved by third-party libraries like xarray and pandas
- Typing is also a potential solution
- Speed (probably out of scope, given current resources)
- How important is this? To what extent should we compete with libraries like TensorFlow, etc.
- Use intrinsics (AVX, SSE, ...)?
- Establish a known benchmark suite which would serve to quantify the discussion. Any change would be measured against its effect across the entire suite.
- JIT options
- Goal: grow number of core maintainers
- Example (above) of documenting C code more carefully to lower barrier to entry
- Challenge: retention; if we spend time training new contributors, how do we get them to stay 5 years instead of 1 or 2 (grad school, e.g.)
- Engage with universities or industry (who could e.g. sponsor developer time)
- Can we have a standard mechanisms whereby industry can engage
- Sponsorship of: developer time / specific feature / ... ?
- Could take on self-contained tasks, such as type stubs
- Companies may benefit from smaller, better scoped packages; they may not want to take on "NumPy", but could be willing to engage with a smaller project that won't become a time sink. Hypothetical example: company like UK Met Office -> package like Masked Arrays
- Can we have a standard mechanisms whereby industry can engage
- Goal: more diverse and inclusive contributor community
- Office hours for interested participants?
- Sprints for beginners?
- Consideration: if we cannot move forward with features, external packages will work around us somehow. I.e., other packages in the ecosystem progress at faster rates than NumPy, if NumPy is unable to respond in a timely fashion to external needs, then ad-hoc infrastructure develops outside of the project, missing out on the benefits of NumPy's central place within the community.
- It is also quite hard to get balanced input on this specific issue: members of community who want to move forward more quickly may not be around any more.
- Examples:
- Pandas, which used to rely on / couple more closely with NumPy.
- https://github.com/dgasmith/opt_einsum
- Intel / MKL
- Tensorflow
- CuPy / Chainer
- Consumers: Autograd, Tangent, TensorLy, Optimized Einsum
https://gist.github.com/stefanv/353ded6353435bd84fa5e0ab4443e7d5
Merge ratios for the past year:
1.00 Charles Harris
0.16 Eric Wieser
0.11 Allan Haldane
0.08 Marten van Kerkwijk
0.02 Julian Taylor
0.02 Sebastian Berg
0.02 Ralf Gommers
0.01 Jaime Fernandez
- Higher level API which became a NEP