ENH: Print Option: Always show an array's dtype #25787

Open
etienneschalk opened this issue Feb 7, 2024 · 1 comment

etienneschalk commented Feb 7, 2024

Proposed new feature or change:

Motive

The default dtype of an array is platform-dependent (#9464).

When tests run in continuous integration on multiple platforms (Windows, macOS, Linux), the fact that the default dtype of an array can vary must be taken into account.

The issue arises in tests that rely on NumPy's array representations. The default dtype of an array is not displayed in its representation, so the expected output of such a test depends on the platform, and writing OS-specific tests becomes unavoidable.

What I would like is to be able to write platform-independent, repeatable outputs that can be used for automated testing.

Example

Actual

On my machine, the default dtype for integer arrays is int64. Here are some examples of array creations and their representations:

In [3]: import numpy as np

In [4]: np.array([1, 2, 3])
Out[4]: array([1, 2, 3])

In [5]: np.array([1, 2, 3], dtype=np.int64)
Out[5]: array([1, 2, 3])

In [6]: np.array([1, 2, 3], dtype=np.int32)
Out[6]: array([1, 2, 3], dtype=int32)

We can see that:

  • When an array is created with no dtype kwarg, the default dtype is used. The array representation alone is not enough to know the actual dtype.
  • When an array is created with a dtype kwarg matching the platform's default integer dtype, the resulting representation is the same, and the dtype is again implicit.
  • The last case is the most explicit: the user provides the expected dtype, and the representation reflects it. This only works for non-default dtypes.
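
The three observations above can be checked programmatically. The following is a minimal sketch; the helper name `suffix_shown` is made up for illustration, and the check relies only on `repr` and on comparing against whatever integer dtype the current platform picks by default.

```python
import numpy as np

# Whatever integer dtype this platform picks when none is given.
default_int = np.array([1, 2, 3]).dtype


def suffix_shown(arr):
    """Return True if repr(arr) contains an explicit dtype suffix."""
    return "dtype=" in repr(arr)


for dt in (np.int32, np.int64):
    arr = np.array([1, 2, 3], dtype=dt)
    kind = "default" if arr.dtype == default_int else "non-default"
    print(f"{arr.dtype} ({kind}): suffix shown = {suffix_shown(arr)}")
```

Which of the two dtypes is reported as "default" depends on the platform and NumPy version, which is exactly the problem described here.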

Desired

In [3]: import numpy as np

In [4]: np.array([1, 2, 3])
Out[4]: array([1, 2, 3], dtype=int64)

In [5]: np.array([1, 2, 3], dtype=np.int64)
Out[5]: array([1, 2, 3], dtype=int64)

In [6]: np.array([1, 2, 3], dtype=np.int32)
Out[6]: array([1, 2, 3], dtype=int32)

The dtype is always printed, so the default dtype no longer influences the representation. Today the default dtype depends on the platform, and the representation depends on whether the dtype is the default one. Always printing the dtype breaks that chain: the representation of an array with a given dtype no longer depends on the platform. Writing platform-independent tests that rely on the representation becomes easier.
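
Until such an option exists, a test suite could approximate the desired output with a small helper. This is only a sketch of a possible workaround, not part of NumPy: the name `repr_with_dtype` is hypothetical, and it handles line wrapping and unusual dtypes less carefully than NumPy's own `repr`.

```python
import numpy as np


def repr_with_dtype(arr):
    """Return a repr-like string that always ends with the dtype suffix."""
    arr = np.asarray(arr)
    # array2string formats only the bracketed body; `prefix` tells it how
    # much indentation to use when the body wraps onto several lines.
    body = np.array2string(arr, separator=", ", prefix="array(")
    return f"array({body}, dtype={arr.dtype})"


print(repr_with_dtype(np.array([1, 2, 3], dtype=np.int64)))
print(repr_with_dtype(np.array([1, 2, 3], dtype=np.int32)))
```

A test that pins the dtype explicitly and compares against this string does not need to know the platform's default integer dtype.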

from
platform <- default dtype <- repr => platform <- repr
to
platform <- default dtype </- repr => platform </- repr

Existing solutions I looked for

np.set_printoptions

I first looked into https://numpy.org/doc/stable/reference/generated/numpy.set_printoptions.html
I experimented with the kwargs legacy='1.13' and legacy='1.21', without success. Even if that had worked, I would have disliked relying on a kwarg named legacy, which strongly implies it should not be used in new code.
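
The experiment can be reproduced with the `np.printoptions` context manager. The sketch below uses a float64 array because float64 is the default float dtype on every platform, so the result should not depend on where it runs; it only checks whether a dtype suffix appears and makes no claim about the rest of the legacy formatting.

```python
import numpy as np

arr = np.array([1.0, 2.0])  # float64, the default float dtype everywhere

# Temporarily switch to the legacy printing mode, then restore the defaults.
with np.printoptions(legacy="1.13"):
    legacy_repr = repr(arr)

print(legacy_repr)
print("dtype suffix present:", "dtype=" in legacy_repr)
```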

Proposed solution

Adding a new dtype printing option

import numpy as np

np.set_printoptions(dtype="default")  # current behaviour
np.set_printoptions(dtype="always")   # always print dtype
np.set_printoptions(dtype="never")    # never print dtype

Technical Analysis

The function _array_repr_implementation implements the array representation logic. It decides internally whether to add the dtype suffix, and there is currently no way to force the dtype to be printed or to force it to be omitted.

Allowing this parameter to be overridden could be helpful:

def _array_repr_implementation(
        arr, max_line_width=None, precision=None, suppress_small=None,
+       skipdtype: bool | None = None,       
        array2string=array2string):
        ...
- skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0
+ if skipdtype is None:
+     skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0

Role of the proposed new skipdtype three-valued kwarg:

  • None: current behaviour, platform-dependent
  • False: always print the , dtype=... suffix
  • True: never print the , dtype=... suffix
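
The three-valued logic can be sketched outside NumPy as follows. This is an illustration under stated assumptions, not the proposed patch: `array_repr_sketch` is a hypothetical name, and the "implied dtype" test is an approximation (it compares against the dtype NumPy would infer from the plain Python values) rather than NumPy's private `dtype_is_implied`.

```python
import numpy as np


def array_repr_sketch(arr, skipdtype=None):
    """Build a repr-like string with a three-valued dtype-suffix switch."""
    arr = np.asarray(arr)
    if skipdtype is None:
        # Approximation of the current behaviour: the dtype is "implied"
        # when NumPy would infer the same dtype from the bare values.
        implied = arr.dtype == np.array(arr.tolist()).dtype
        skipdtype = implied and arr.size > 0
    body = np.array2string(arr, separator=", ", prefix="array(")
    if skipdtype:
        return f"array({body})"
    return f"array({body}, dtype={arr.dtype})"


a = np.array([1, 2, 3], dtype=np.int16)
print(array_repr_sketch(a))                   # None: current behaviour
print(array_repr_sketch(a, skipdtype=False))  # always print the suffix
print(array_repr_sketch(a, skipdtype=True))   # never print the suffix
```

Only the `None` branch consults the platform default; the other two values give the same output everywhere for a given dtype.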

Additional links

seberg (Member) commented Feb 8, 2024

FWIW, I changed things so that in NumPy 2.0 the default on Windows is also 64-bit. It is still 32-bit on 32-bit platforms, though, so it doesn't fully remove the platform issue. Just, hopefully, the worst caveat.

I don't have an opinion on always printing it. But since we hide it, having an option in the printoptions for it seems very reasonable to me. (Not sure I think there is much reason to always hide it.)
