-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a simple nbytes
representation in DataArrays and Dataset repr
#8702
Add a simple nbytes
representation in DataArrays and Dataset repr
#8702
Conversation
Thanks @etienneschalk . Very nice! However I don't think we need yet another global option for the repr. This takes up very little extra space so we can just show it always. |
Hello, |
c1dab61
to
859daea
Compare
Procedure used:
pytest --doctest-modules xarray/backends/api.py --accept
pytest --doctest-modules xarray/backends/common.py --accept
pytest --doctest-modules xarray/backends/file_manager.py --accept
pytest --doctest-modules xarray/backends/h5netcdf_.py --accept
pytest --doctest-modules xarray/backends/locks.py --accept
pytest --doctest-modules xarray/backends/lru_cache.py --accept
pytest --doctest-modules xarray/backends/memory.py --accept
pytest --doctest-modules xarray/backends/netcdf3.py --accept
pytest --doctest-modules xarray/backends/netCDF4_.py --accept
pytest --doctest-modules xarray/backends/plugins.py --accept
pytest --doctest-modules xarray/backends/pydap_.py --accept
pytest --doctest-modules xarray/backends/pynio_.py --accept
pytest --doctest-modules xarray/backends/scipy_.py --accept
pytest --doctest-modules xarray/backends/store.py --accept
pytest --doctest-modules xarray/backends/zarr.py --accept
pytest --doctest-modules xarray/coding/calendar_ops.py --accept
pytest --doctest-modules xarray/coding/cftimeindex.py --accept
pytest --doctest-modules xarray/coding/cftime_offsets.py --accept
pytest --doctest-modules xarray/coding/frequencies.py --accept
pytest --doctest-modules xarray/coding/strings.py --accept
pytest --doctest-modules xarray/coding/times.py --accept
pytest --doctest-modules xarray/coding/variables.py --accept
pytest --doctest-modules xarray/conventions.py --accept
pytest --doctest-modules xarray/convert.py --accept
pytest --doctest-modules xarray/core/accessor_dt.py --accept
pytest --doctest-modules xarray/core/accessor_str.py --accept
pytest --doctest-modules xarray/core/_aggregations.py --accept
pytest --doctest-modules xarray/core/alignment.py --accept
pytest --doctest-modules xarray/core/arithmetic.py --accept
pytest --doctest-modules xarray/core/combine.py --accept
pytest --doctest-modules xarray/core/common.py --accept
pytest --doctest-modules xarray/core/computation.py --accept
pytest --doctest-modules xarray/core/concat.py --accept
pytest --doctest-modules xarray/core/coordinates.py --accept
pytest --doctest-modules xarray/core/dask_array_ops.py --accept
pytest --doctest-modules xarray/core/daskmanager.py --accept
pytest --doctest-modules xarray/core/dataarray.py --accept
pytest --doctest-modules xarray/core/dataset.py --accept
pytest --doctest-modules xarray/core/dtypes.py --accept
pytest --doctest-modules xarray/core/duck_array_ops.py --accept
pytest --doctest-modules xarray/core/extensions.py --accept
pytest --doctest-modules xarray/core/formatting_html.py --accept
pytest --doctest-modules xarray/core/formatting.py --accept
pytest --doctest-modules xarray/core/groupby.py --accept
pytest --doctest-modules xarray/core/indexes.py --accept
pytest --doctest-modules xarray/core/indexing.py --accept
pytest --doctest-modules xarray/core/merge.py --accept
pytest --doctest-modules xarray/core/missing.py --accept
pytest --doctest-modules xarray/core/nanops.py --accept
pytest --doctest-modules xarray/core/npcompat.py --accept
pytest --doctest-modules xarray/core/nputils.py --accept
pytest --doctest-modules xarray/core/ops.py --accept
pytest --doctest-modules xarray/core/options.py --accept
pytest --doctest-modules xarray/core/parallelcompat.py --accept
pytest --doctest-modules xarray/core/parallel.py --accept
pytest --doctest-modules xarray/core/pdcompat.py --accept
pytest --doctest-modules xarray/core/pycompat.py --accept
pytest --doctest-modules xarray/core/resample_cftime.py --accept
pytest --doctest-modules xarray/core/resample.py --accept
pytest --doctest-modules xarray/core/rolling_exp.py --accept
pytest --doctest-modules xarray/core/rolling.py --accept
pytest --doctest-modules xarray/core/_typed_ops.py --accept
pytest --doctest-modules xarray/core/types.py --accept
pytest --doctest-modules xarray/core/utils.py --accept
pytest --doctest-modules xarray/core/variable.py --accept
pytest --doctest-modules xarray/core/weighted.py --accept
pytest --doctest-modules xarray/datatree_/conftest.py --accept
pytest --doctest-modules xarray/datatree_/datatree/common.py --accept
pytest --doctest-modules xarray/datatree_/datatree/datatree.py --accept
pytest --doctest-modules xarray/datatree_/datatree/extensions.py --accept
pytest --doctest-modules xarray/datatree_/datatree/formatting_html.py --accept
pytest --doctest-modules xarray/datatree_/datatree/formatting.py --accept
pytest --doctest-modules xarray/datatree_/datatree/io.py --accept
pytest --doctest-modules xarray/datatree_/datatree/iterators.py --accept
pytest --doctest-modules xarray/datatree_/datatree/mapping.py --accept
pytest --doctest-modules xarray/datatree_/datatree/ops.py --accept
pytest --doctest-modules xarray/datatree_/datatree/render.py --accept
pytest --doctest-modules xarray/datatree_/datatree/testing.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/conftest.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_dataset_api.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_datatree.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_extensions.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_formatting_html.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_formatting.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_io.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_mapping.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_treenode.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_version.py --accept
pytest --doctest-modules xarray/datatree_/datatree/treenode.py --accept
pytest --doctest-modules xarray/datatree_/docs/source/conf.py --accept
pytest --doctest-modules xarray/namedarray/_aggregations.py --accept
pytest --doctest-modules xarray/namedarray/_array_api.py --accept
pytest --doctest-modules xarray/namedarray/core.py --accept
pytest --doctest-modules xarray/namedarray/dtypes.py --accept
pytest --doctest-modules xarray/namedarray/_typing.py --accept
pytest --doctest-modules xarray/namedarray/utils.py --accept
pytest --doctest-modules xarray/plot/accessor.py --accept
pytest --doctest-modules xarray/plot/dataarray_plot.py --accept
pytest --doctest-modules xarray/plot/dataset_plot.py --accept
pytest --doctest-modules xarray/plot/facetgrid.py --accept
pytest --doctest-modules xarray/plot/utils.py --accept
pytest --doctest-modules xarray/testing/assertions.py --accept
pytest --doctest-modules xarray/testing/strategies.py --accept
pytest --doctest-modules xarray/tutorial.py --accept
pytest --doctest-modules xarray/util/deprecation_helpers.py --accept
pytest --doctest-modules xarray/util/generate_aggregations.py --accept
pytest --doctest-modules xarray/util/generate_ops.py --accept
pytest --doctest-modules xarray/util/print_versions.py --accept Finally the backslash-removing commit is reverted |
Hi @etienneschalk — this looks really good — thank you! The only issue I see is that the Windows tests seem to have different values for the size in the
Is anyone familiar with what's going on here? I guess we could exclude those in windows if there isn't an easily reconcilable approach... |
different default dtypes (e.g. As for the formatting, is there a reason why the size is on the same line? Now that it is prefixed with |
In general I'm a fan of more concise reprs, but no strong view. I agree the current version doesn't have perfect aesthetics.
I figured that having it at the dataset level meant that it referred to the whole dataset. But again, no strong view! |
Hello, About windows failing tests
Regarding the failing tests on windows, I had a similar issue recently, and a workaround is to use a "non-default" dtype that triggers the rendering of the dtype on the numpy array representation. import numpy
np.set_printoptions(dtype="always") About conciseness of the
|
I think that's reasonable for the moment. Tbc, it looks like it's not just the rendering — the actual dtype looks to be different, such that the size is correctly reported differently. So we need to make the dtypes be the same on windows. Does that make sense? The tests are failing and should pass if this is done correctly. |
About Windows
It seems Ubuntu + macOS defaults to 64 bits while Windows defaults on 32 bits. This default depends on the OS, I don't know if it's a "max" (eg if we could not use 64 bits on the Windows machines used by the CI because they have 32-bit OSes). I think it's worth trying to set the dtype to 64 bits at first, and see if it still fails. Indeed I looked for Another option is to use the
This is definitely the least effort, but excluding 9 tests for this PR seems overkill This is the function used for numpy array representations. We can see the logic where it adds the suffix, there is no way to force print the dtype, or force not printing it. What would solve this repeatable output issue would be to allow overriding this param:
About the basicness of the current size printing algorithmAlso, the current algorithm to print a human readable size is basic, and never shows decimal numbers. Maybe a better usage of the space could be made, eg:
Indeed, the current size is not reliable and is just an estimation that should not replace the integer value returned by |
I don't really understand why the tests fail. It seems that on windows, 3 int64 values take up 48 bytes of space?? https://github.com/pydata/xarray/actions/runs/7792432615/job/21250418347?pr=8702#step:9:419 self = <xarray.tests.test_dataarray.TestDataArray object at 0x00000234F4DB19D0>
def test_repr(self) -> None:
v = Variable(["time", "x"], [[1, 2, 3], [4, 5, 6]], {"foo": "bar"})
coords = {"x": np.arange(3, dtype=np.int64), "other": np.int64(0)}
data_array = DataArray(v, coords, name="my_variable")
expected = dedent(
"""\
<xarray.DataArray 'my_variable' (time: 2, x: 3)> Size: 48B
array([[1, 2, 3],
[4, 5, 6]])
Coordinates:
* x (x) int64 24B 0 1 2
other int64 8B 0
Dimensions without coordinates: time
Attributes:
foo: bar"""
)
> assert expected == repr(data_array)
E AssertionError: assert '<xarray.Data...foo: bar' == '<xarray.Data...foo: bar'
E
E Skipping 45 identical leading characters in diff, use -v to show
E Skipping 166 identical trailing characters in diff, use -v to show
E - 3)> Size: 24B
E ? -
E + 3)> Size: 48B
E ? +
E array([ Unless anyone has a better idea, I think skipping on Windows is OK.
I would keep it simple, and not conditionally change precision, at least for the moment. |
For big tests, I used a It adds many Crazy how a simple "add size to repr" issue turned out to "platform-dependant shenanigans" ! Links I consulted:
OK! |
|
Ah, very good point. In the case above it's that this array:
is being cast to the default size. So we can instead cast it with |
Tests pass with a differenciation between windows and non-windows env. I tagged Hopefully this is acceptable! The current state of the PR don't integrate recent discussions about total size. The solution implemented is the one described from #8690 (comment) . Maybe further discussions should happen on the original issue rather than this PR |
I would be +1 on merging. I think the windows issues could be better handled by setting the dtype (i.e. #8702 (comment), building on @keewis 's observation), but we can also do that in another PR, and this has a large enough blast radius that it would be better to merge sooner. Would someone else agree? (Or feel free to just hit the button...) |
By setting the dtype, I think that only a part of the issue would be solved as numpy would print out So the repr would still be different and the need to differentiate between Windows and non-Windows environments still remains 🤔 This comment on the numpy repo is interesting: numpy/numpy#9464 (comment) |
What's an xarray object that is explicitly typed that would show different reprs in linux/mac vs windows? |
@pydata/xarray could someone second the approval here? |
@etienneschalk great work! Thanks also for the issues into pytest-accept. I wanted to merge asap so we didn't get merge conflicts. If you're up for simplifying the formatting tests — assuming I'm not mistaken above — that would be a very nice 2nd PR... |
Thanks @max-sixty!
I detailed the issue in a previous comment #8702 (comment) If we use Also, even when providing explitly a dtype, if the dtype is a default, it won't show up in the repr. Here is an example on my machine (Linux): numpy-only In [3]: import numpy as np
In [4]: np.array([1,2,3])
Out[4]: array([1, 2, 3])
In [5]: np.array([1,2,3], dtype=np.int64)
Out[5]: array([1, 2, 3])
In [6]: np.array([1,2,3], dtype=np.int32)
Out[6]: array([1, 2, 3], dtype=int32) xarray (delegating repr to numpy) In [14]: import xarray as xr
In [15]: xr.DataArray(np.array([1,2,3]))
Out[15]:
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0
In [16]: xr.DataArray(np.array([1,2,3], dtype=np.int64))
Out[16]:
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0
In [17]: xr.DataArray(np.array([1,2,3], dtype=np.int32))
Out[17]:
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3], dtype=int32)
Dimensions without coordinates: dim_0 I expected the opposite on the Windows CI ;
I still need to confirm all I said above by more experimentation. Actually I can add such "repr testing" in a next PR yes, to try to make a catalog of all these problematic reprs |
@etienneschalk sorry, you're completely correct. I was thinking about the Dataset repr. But you're correct that the DataArray repr just inherits from numpy. So I don't have any strong views about better ways of doing this — the existing way seems OK. Another approach would be to just |
Is there a reason the title of this PR can't be updated to reflect what the description was edited/updated to say: this is not opt-in. |
nbytes
representation in DataArrays and Dataset repr
(opt-in)nbytes
representation in DataArrays and Dataset repr
* Update the formating tests PR (#8702) added nbytes representation in DataArrays and Dataset repr, this adds it to the datatree tests. * Migrate treenode module Moves treenode.py and test_treenode.py. Updates some typing. Updates imports from treenode. * Update NotFoundInTreeError description. * Reformat some comments Add test tree structure for easier understanding. * Updates whats-new.rst * mypy typing. (terrible?) There must be a better way, but I don't know it. particularly the list comprehension casts. * Adds __repr__ to NamedNode and updates test This test was broken becuase only the root node was being tested and none of the previous nodes were represented in the __str__. * Adds quotes to NamedNode __str__ representation. * swaps " for ' in NamedNode __str__ representation. * Adding Tom in so he gets blamed properly. * resolve conflict whats-new.rst Question is I did update below the released line to give Tom some credit. I hope that's is allowable. * Moves test_treenode.py to xarray/tests. Integrated tests. * refactors backend tests for datatree IO * Add explicit engine back in test_to_zarr * Removes OrderedDict from treenode * Renames tests/test_io.py -> tests/test_backends_datatree.py * typo * Add types * Pass mypy for 3.9
Edit: in contrary to what the title suggest, this is not an opt-in feature, it is enabled by default
nbytes
to repr? #8690whats-new.rst
[ ] New functions/methods are listed inapi.rst