Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: the recommended ruff rule for migration to numpy2 is missing many detections and fixes #26800

Open
ppashakhanloo opened this issue Jun 26, 2024 · 7 comments
Assignees

Comments

@ppashakhanloo
Copy link

Issue with current documentation:

Hello, Numpy folks!
There is a ruff rule (NPY201) that is recommended for migration from numpy to numpy2. I did a thorough comparison between Python API removals and deprecations with what the NPY201 rule provides. There are many removals and deprecations that are in the documentation but the ruff rule does not consider.
In many of the cases, a fix could be performed automatically.

Here is a summary of what is missing from the rule NPY201:

Changes that are not detected at all

  • np.geterrobj, np.seterrobj and the related ufunc keyword argument extobj= have been removed. The preferred replacement for all of these is using the context manager with np.errstate():.
  • removed ERR_*, SHIFT_*, np.kernel_version, np.numarray, np.oldnumeric and np.set_numeric_ops.
  • removed namespaces: np.FLOATING_POINT_SUPPORT, np.FPE_*, np.CLIP, np.WRAP, np.RAISE, np.BUFSIZE, np.UFUNC_BUFSIZE_DEFAULT, np.UFUNC_PYVALS_NAME, np.ALLOW_THREADS, np.MAXDIMS, np.MAY_SHARE_EXACT, np.MAY_SHARE_BOUNDS.
  • np.issctype, np.maximum_sctype, np.obj2sctype, np.sctype2char, np.sctypes were all removed from the main namespace without replacement, as they where niche members.
  • np.compare_chararrays has been removed from the main namespace. Use np.char.compare_chararrays instead.
  • The charrarray in the main namespace has been deprecated. It can be imported without a deprecation warning from np.char.chararray for now, but we are planning to fully deprecate and remove chararray in the future.
  • np.format_parser has been removed from the main namespace. Use np.rec.format_parser instead.
  • The experimental numpy.array_api submodule has been removed. Use the main numpy namespace for regular usage instead, or the separate array-api-strict package for the compliance testing use case for which numpy.array_api was mostly used.
  • Support for seven data type string aliases has been removed from np.dtype: int0, uint0, void0, object0, str0, bytes0 and bool8.
  • np.trapz has been deprecated. Use np.trapezoid or a scipy.integrate function instead.
  • np.in1d has been deprecated. Use np.isin instead.
  • Arrays of 2-dimensional vectors for np.cross have been deprecated. Use arrays of 3-dimensional vectors instead.
  • np.dtype("a") alias for np.dtype(np.bytes_) was deprecated. Use np.dtype("S") alias instead.
  • Use of keyword arguments x and y with functions assert_array_equal and assert_array_almost_equal has been deprecated. Pass the first two arguments as positional arguments instead.

Changes that are detected but no automatic fix is provided

  • np.cast has been removed. The literal replacement for np.cast[dtype](arg) is np.asarray(arg, dtype=dtype).
  • np.set_string_function has been removed. Use np.set_printoptions instead with a formatter for custom printing of NumPy objects.
  • np.recfromcsv and recfromtxt are now only available from np.lib.npyio.
  • np.asfarray has been removed. Use np.asarray with a proper dtype instead.
  • np.find_common_type has been removed. Use numpy.promote_types or numpy.result_type instead. To achieve semantics for the scalar_types argument, use numpy.result_type and pass 0, 0.0, or 0j as a Python scalar instead.
  • np.nbytes has been removed. Use np.dtype(<dtype>).itemsize instead.

Idea or request for content:

My suggestion is to add these changes to the ruff rule. Otherwise, it would be really helpful if the provided and not-provided changes were documented (either in the corresponding ruff rule page or the migration guide) so the users would know what they were getting (and not getting) by applying the rule.

@ngoldbaum
Copy link
Member

Ping @mtsokol in case any of this was done on purpose.

@mtsokol
Copy link
Member

mtsokol commented Jun 26, 2024

Thank you for writing a thorough comparison!

Changes that are not detected at all

I think that most of these members were niche, and GitHub search showed none or small number of usages (like https://github.com/search?q=np.FLOATING_POINT_SUPPORT&type=code).

Also the migration guide mentions all of them: https://numpy.org/doc/stable/numpy_2_0_migration_guide.html (I think np.format_parser and data type string aliases are missing, and two np.char related ones).

Generally I think that most of them could be still added to the Ruff rule.

Other comments: For np.cross we have a deprecation warning, I don't think it can be detected by a Ruff rule. assert_array_equal also has a deprecation warning. np.array_api was experimental, therefore I don't think we need a check for it.

Changes that are detected but no automatic fix is provided

For these points I think it's just a matter of figuring out how to achieve it in Rust.
I think that current migration messages are clear and I expect that downstream libraries would use these API members in a couple of places at most, therefore a manual intervention is still feasible offhand.
The rule definitely excels for e.g. replacing np.float_ with np.float64 that can occur in hundred of places in the codebase (like it did in SciPy).
One side note, np.recfromcsv and np.recfromtxt aren't exported by npyio (uh, the release note is outdated), instead genfromtxt should be used (as the migration guide and Ruff rule explain).

@mtsokol mtsokol self-assigned this Jun 26, 2024
@mtsokol
Copy link
Member

mtsokol commented Jun 27, 2024

np.issctype, np.maximum_sctype, np.obj2sctype, np.sctype2char, np.sctypes were all removed from the main namespace without replacement, as they where niche members.

I think ruff Rule detects all of them.

@mtsokol
Copy link
Member

mtsokol commented Jun 27, 2024

Will be partially fixed in astral-sh/ruff#12065.

charliermarsh pushed a commit to astral-sh/ruff that referenced this issue Jun 27, 2024
Hi!

This PR updates `NPY201` rule to address
#12034 and partially
numpy/numpy#26800.
jasper-tms added a commit to jasper-tms/nptyping that referenced this issue Jul 1, 2024
This commit includes the results of running
`ruff check --select NPY201 --fix` on the code in this repo,
plus the removal of `np.{dtype}0` aliases that currently aren't
caught by the NPY201 ruff rule (see
numpy/numpy#26800 where this is mentioned)
plus removing the `numpy<2.0.0` pin from `requirements.txt`.
After this commit, `import nptyping` succeeds in an environment with
`numpy==2.0.0`. No additional testing has been done at this time,
and additional changes could very well be necessary before nptyping
is fully compatible with numpy2.0.0. But this should be a start!
@brendan-m-murphy
Copy link

brendan-m-murphy commented Aug 7, 2024

PyTensor uses np.MAXDIMS. Information on how to replace this could be useful. (Although it seems that using NPY_RAVEL_AXIS should work.)

@ngoldbaum
Copy link
Member

I'm a little confused - are you using the python np.MAXDIMS symbol or the C NPY_MAXDIMS macro? For the latter, the discussion about NPY_RAVEL_AXIS is what you likely need in the migration guide. For the former, can you share what you were doing with that symbol in python so we can suggest what you should do?

@brendan-m-murphy
Copy link

brendan-m-murphy commented Aug 7, 2024

The code was using both. For np.MAXDIMS, this was usually to store an int that would be saved in a "params" dict that could be used by python objects, but could also be translated to C code. The NPY_MAXDIMS macro was used in some C code for some of the operations in Pytensor. Where NPY_MAXDIMS was used in (python functions that create) C code, I switched to NPY_RAVEL_AXIS. For places where np.MAXDIMS needed to be stored as an int in python code, I set the value to either 32 or the smallest int64, depending on the version on numpy.

The code base is pretty uneven. (Pytensor is a fork of a fork of Theano... they're starting to move away from the C code but they still need to support it. Parts of the code essentially compile algebraic operations into CPython code, then compile it, and import the resulting python module.)

This commit has the np.MAXDIMS work around: brendan-m-murphy/pytensor@9a248eb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants