Add a simple `nbytes` representation in DataArrays and Dataset `repr` #8702

etienneschalk · 2024-02-04T16:37:41Z

Edit: contrary to what the title suggests, this is not an opt-in feature; it is enabled by default.

Closes Add nbytes to repr? #8690
- (or at least is a proposal)
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
~~[ ] New functions/methods are listed in api.rst~~

xarray/tests/test_formatting.py

dcherian · 2024-02-04T21:09:22Z

Thanks @etienneschalk . Very nice! However I don't think we need yet another global option for the repr. This takes up very little extra space so we can just show it always.

etienneschalk · 2024-02-04T22:09:40Z

> Thanks @etienneschalk . Very nice! However I don't think we need yet another global option for the repr. This takes up very little extra space so we can just show it always.

Hello,
Thanks for the review!
I originally added the feature flag so that I would not have to change existing tests, and so that the feature would be opt-in. After enabling it, I noticed that many tests and doctests fail due to the change in repr (searching for the string <xarray. in the code yields 800+ results).
Before making all these changes, maybe the new representation should be rediscussed, consolidated, and validated by others (so that these changes do not have to be made twice).

etienneschalk · 2024-02-05T22:50:50Z

Procedure used:

As mentioned in the issue "Issue with docstrings containing backslashes at the end of line" (max-sixty/pytest-accept#146), backslashes are not supported, so they are removed.
Then the doctests are run on multiple files:

pytest --doctest-modules xarray/backends/api.py --accept
pytest --doctest-modules xarray/backends/common.py --accept
pytest --doctest-modules xarray/backends/file_manager.py --accept
pytest --doctest-modules xarray/backends/h5netcdf_.py --accept
pytest --doctest-modules xarray/backends/locks.py --accept
pytest --doctest-modules xarray/backends/lru_cache.py --accept
pytest --doctest-modules xarray/backends/memory.py --accept
pytest --doctest-modules xarray/backends/netcdf3.py --accept
pytest --doctest-modules xarray/backends/netCDF4_.py --accept
pytest --doctest-modules xarray/backends/plugins.py --accept
pytest --doctest-modules xarray/backends/pydap_.py --accept
pytest --doctest-modules xarray/backends/pynio_.py --accept
pytest --doctest-modules xarray/backends/scipy_.py --accept
pytest --doctest-modules xarray/backends/store.py --accept
pytest --doctest-modules xarray/backends/zarr.py --accept
pytest --doctest-modules xarray/coding/calendar_ops.py --accept
pytest --doctest-modules xarray/coding/cftimeindex.py --accept
pytest --doctest-modules xarray/coding/cftime_offsets.py --accept
pytest --doctest-modules xarray/coding/frequencies.py --accept
pytest --doctest-modules xarray/coding/strings.py --accept
pytest --doctest-modules xarray/coding/times.py --accept
pytest --doctest-modules xarray/coding/variables.py --accept
pytest --doctest-modules xarray/conventions.py --accept
pytest --doctest-modules xarray/convert.py --accept
pytest --doctest-modules xarray/core/accessor_dt.py --accept
pytest --doctest-modules xarray/core/accessor_str.py --accept
pytest --doctest-modules xarray/core/_aggregations.py --accept
pytest --doctest-modules xarray/core/alignment.py --accept
pytest --doctest-modules xarray/core/arithmetic.py --accept
pytest --doctest-modules xarray/core/combine.py --accept
pytest --doctest-modules xarray/core/common.py --accept
pytest --doctest-modules xarray/core/computation.py --accept
pytest --doctest-modules xarray/core/concat.py --accept
pytest --doctest-modules xarray/core/coordinates.py --accept
pytest --doctest-modules xarray/core/dask_array_ops.py --accept
pytest --doctest-modules xarray/core/daskmanager.py --accept
pytest --doctest-modules xarray/core/dataarray.py --accept
pytest --doctest-modules xarray/core/dataset.py --accept
pytest --doctest-modules xarray/core/dtypes.py --accept
pytest --doctest-modules xarray/core/duck_array_ops.py --accept
pytest --doctest-modules xarray/core/extensions.py --accept
pytest --doctest-modules xarray/core/formatting_html.py --accept
pytest --doctest-modules xarray/core/formatting.py --accept
pytest --doctest-modules xarray/core/groupby.py --accept
pytest --doctest-modules xarray/core/indexes.py --accept
pytest --doctest-modules xarray/core/indexing.py --accept
pytest --doctest-modules xarray/core/merge.py --accept
pytest --doctest-modules xarray/core/missing.py --accept
pytest --doctest-modules xarray/core/nanops.py --accept
pytest --doctest-modules xarray/core/npcompat.py --accept
pytest --doctest-modules xarray/core/nputils.py --accept
pytest --doctest-modules xarray/core/ops.py --accept
pytest --doctest-modules xarray/core/options.py --accept
pytest --doctest-modules xarray/core/parallelcompat.py --accept
pytest --doctest-modules xarray/core/parallel.py --accept
pytest --doctest-modules xarray/core/pdcompat.py --accept
pytest --doctest-modules xarray/core/pycompat.py --accept
pytest --doctest-modules xarray/core/resample_cftime.py --accept
pytest --doctest-modules xarray/core/resample.py --accept
pytest --doctest-modules xarray/core/rolling_exp.py --accept
pytest --doctest-modules xarray/core/rolling.py --accept
pytest --doctest-modules xarray/core/_typed_ops.py --accept
pytest --doctest-modules xarray/core/types.py --accept
pytest --doctest-modules xarray/core/utils.py --accept
pytest --doctest-modules xarray/core/variable.py --accept
pytest --doctest-modules xarray/core/weighted.py --accept
pytest --doctest-modules xarray/datatree_/conftest.py --accept
pytest --doctest-modules xarray/datatree_/datatree/common.py --accept
pytest --doctest-modules xarray/datatree_/datatree/datatree.py --accept
pytest --doctest-modules xarray/datatree_/datatree/extensions.py --accept
pytest --doctest-modules xarray/datatree_/datatree/formatting_html.py --accept
pytest --doctest-modules xarray/datatree_/datatree/formatting.py --accept
pytest --doctest-modules xarray/datatree_/datatree/io.py --accept
pytest --doctest-modules xarray/datatree_/datatree/iterators.py --accept
pytest --doctest-modules xarray/datatree_/datatree/mapping.py --accept
pytest --doctest-modules xarray/datatree_/datatree/ops.py --accept
pytest --doctest-modules xarray/datatree_/datatree/render.py --accept
pytest --doctest-modules xarray/datatree_/datatree/testing.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/conftest.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_dataset_api.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_datatree.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_extensions.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_formatting_html.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_formatting.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_io.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_mapping.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_treenode.py --accept
pytest --doctest-modules xarray/datatree_/datatree/tests/test_version.py --accept
pytest --doctest-modules xarray/datatree_/datatree/treenode.py --accept
pytest --doctest-modules xarray/datatree_/docs/source/conf.py --accept
pytest --doctest-modules xarray/namedarray/_aggregations.py --accept
pytest --doctest-modules xarray/namedarray/_array_api.py --accept
pytest --doctest-modules xarray/namedarray/core.py --accept
pytest --doctest-modules xarray/namedarray/dtypes.py --accept
pytest --doctest-modules xarray/namedarray/_typing.py --accept
pytest --doctest-modules xarray/namedarray/utils.py --accept
pytest --doctest-modules xarray/plot/accessor.py --accept
pytest --doctest-modules xarray/plot/dataarray_plot.py --accept
pytest --doctest-modules xarray/plot/dataset_plot.py --accept
pytest --doctest-modules xarray/plot/facetgrid.py --accept
pytest --doctest-modules xarray/plot/utils.py --accept
pytest --doctest-modules xarray/testing/assertions.py --accept
pytest --doctest-modules xarray/testing/strategies.py --accept
pytest --doctest-modules xarray/tutorial.py --accept
pytest --doctest-modules xarray/util/deprecation_helpers.py --accept
pytest --doctest-modules xarray/util/generate_aggregations.py --accept
pytest --doctest-modules xarray/util/generate_ops.py --accept
pytest --doctest-modules xarray/util/print_versions.py --accept

Finally, the backslash-removing commit is reverted.
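The per-file command list above could be generated rather than typed by hand. The following is a minimal sketch, assuming the files are simply every `.py` file under a root directory; the helper name `doctest_accept_commands` is hypothetical, and the sketch does not reproduce the exact file selection used in the list above.

```python
from pathlib import Path


def doctest_accept_commands(root: str = "xarray") -> list[str]:
    """Build one `pytest --doctest-modules <file> --accept` command per Python file.

    Hypothetical helper: it walks `root` recursively and does not filter out
    any files, so the result may differ from the hand-written list.
    """
    files = sorted(Path(root).rglob("*.py"))
    return [
        f"pytest --doctest-modules {path.as_posix()} --accept" for path in files
    ]
```

Printing each returned string gives a list that can be pasted into a shell, one command per file.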

max-sixty · 2024-02-06T16:36:29Z

Hi @etienneschalk — this looks really good — thank you!

The only issue I see is that the Windows tests seem to have different values for the size in the repr tests, for example:

  - <xarray.Dataset> Size: 1kB
  ?                        ^^
  + <xarray.Dataset> Size: 640B
  ?                        ^^^

Is anyone familiar with what's going on here?

I guess we could exclude those in windows if there isn't an easily reconcilable approach...

keewis · 2024-02-06T16:47:39Z

different default dtypes (e.g. float32 instead of float64), I'd assume. The easiest fix would be to hard-code the dtypes, although that makes the code example a bit more verbose.

As for the formatting, is there a reason why the size is on the same line? Now that it is prefixed with Size:, we could just as well move it on a new line. Additionally, since this is the combined size (the sum of coordinates and data variables for datasets), maybe we should call it Total size or something similar? (to be clear, I'm hoping for a discussion, not requesting an immediate change)

max-sixty · 2024-02-06T17:06:51Z

> As for the formatting, is there a reason why the size is on the same line? Now that it is prefixed with Size:, we could just as well move it on a new line.

In general I'm a fan of more concise reprs, but no strong view. I agree the current version doesn't have perfect aesthetics.

> Additionally, since this is the combined size (the sum of coordinates and data variables for datasets), maybe we should call it Total size or something similar? (to be clear, I'm hoping for a discussion, not requesting an immediate change)

I figured that having it at the dataset level meant that it referred to the whole dataset. But again, no strong view!

etienneschalk · 2024-02-06T18:54:24Z

Hello,

About windows failing tests

> The only issue I see is that the Windows tests seem to have different values for the size in the repr tests, for example:

Regarding the failing tests on windows, I had a similar issue recently, and a workaround is to use a "non-default" dtype that triggers the rendering of the dtype on the numpy array representation.
However, to avoid this kind of tricky workaround, the best option would be to force numpy to show the dtype in array reprs, at least in a testing context, to produce repeatable outputs. I did a quick search but could not find a way. Here is what I would like to do with numpy, forcing the dtype to be printed in the representation, something like:

import numpy as np

# Wished-for option: numpy does not currently accept this keyword
np.set_printoptions(dtype="always")

About conciseness of the `repr`

> In general I'm a fan of more concise reprs, but no strong view. I agree the current version doesn't have perfect aesthetics.

Regarding aesthetics: I agree that a new line would look much cleaner and would declutter the header along the way.

<xarray.Dataset>
Total size: 640B
...

However, it would come at the cost of a newline. The question is whether this newline is acceptable, as it is an "incompressible cost" (the repr changes so far only added info to existing lines). Personally, I would prefer the newline too, just for the aesthetics.

Here is what it would look like:

(A)

<xarray.Dataset>
Dimensions:  (x: 1)
Total size: 16B
Coordinates:
	y        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>
Dimensions without coordinates: x
Data variables:
	a        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>

(total size after dimensions, consistent with the inline repr)

or

(B)

<xarray.Dataset>
Total size: 16B
Dimensions:  (x: 1)
Coordinates:
	y        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>
Dimensions without coordinates: x
Data variables:
	a        (x) int64 8B dask.array<chunksize=(1,), meta=np.ndarray>

(total size before dimensions, not consistent with the inline repr, but keep dims and coords grouped)

About the definition of the size and total size of a DataArray

⚠️ This may be out of scope of the required change, as in both Dataset and DataArray cases, the nbytes attribute is used. But this may seem surprising from a user perspective. (maybe for a future issue)

Here is an example of the output of a DataArray then Dataset repr with the proposed change:

In [10]: xda = xr.DataArray([[1,2,3],[4,5,6]], coords = {"y": [40, 60], "x": [700, 800, 900]})

In [11]: xda
Out[11]: 
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[1, 2, 3],
       [4, 5, 6]])
Coordinates:
  * y        (y) int64 16B 40 60
  * x        (x) int64 24B 700 800 900

In [12]: xr.Dataset({"var": xda})
Out[12]: 
<xarray.Dataset> Size: 88B
Dimensions:  (y: 2, x: 3)
Coordinates:
  * y        (y) int64 16B 40 60
  * x        (x) int64 24B 700 800 900
Data variables:
    var      (y, x) int64 48B 1 2 3 4 5 6

48 // 6 == 8: this corresponds to the size of the data for the DataArray (6 elements of 8 bytes each). However, the coordinates y and x also bring extra weight that is not represented here. For the Dataset containing the same DataArray, however, nbytes also takes the coordinates into account: 16 + 24 + 48 == 88.

The doc indicates:

xarray.DataArray.nbytes
Total bytes consumed by the elements of this DataArray’s data.

xarray.Dataset.nbytes
Total bytes consumed by the data arrays of all variables in this dataset.

The distinction is not perfectly clear to me: a DataArray can also group other DataArrays (the coordinates).

In [14]: xda.x
Out[14]: 
<xarray.DataArray 'x' (x: 3)> Size: 24B
array([700, 800, 900])
Coordinates:
  * x        (x) int64 24B 700 800 900

max-sixty · 2024-02-06T18:57:54Z

> Regarding the failing tests on windows, I had a similar issue recently, and a workaround is to use a "non-default" dtype that triggers the rendering of the dtype on the numpy array representation.

I think that's reasonable for the moment.

Tbc, it looks like it's not just the rendering — the actual dtype looks to be different, such that the size is correctly reported differently. So we need to make the dtypes be the same on windows. Does that make sense?

The tests are failing and should pass if this is done correctly.

etienneschalk · 2024-02-06T19:28:51Z

About Windows

> Tbc, it looks like it's not just the rendering — the actual dtype looks to be different, such that the size is correctly reported differently. So we need to make the dtypes be the same on windows. Does that make sense?

It seems Ubuntu and macOS default to 64 bits while Windows defaults to 32 bits. This default depends on the OS; I don't know if it's a "max" (e.g. whether we could not use 64 bits on the Windows machines used by the CI because they have 32-bit OSes). I think it's worth trying to set the dtype to 64 bits first, and seeing if it still fails.

Indeed, I looked for dtype=np.int64 in the tests, and the occurrences are in tests that don't have any "skipif windows" decorator. However, the issue would move from the actual size to the , dtype=int64 suffix in the numpy array repr.

Another option is to use the ON_WINDOWS constant to make the 9 failing tests dependent on the OS. It is not the cleanest, for sure.

> I guess we could exclude those in windows if there isn't an easily reconcilable approach...

This is definitely the least effort, but excluding 9 tests for this PR seems overkill

This is the function used for numpy array representations. We can see the logic where it adds the suffix; there is no way to force the dtype to be printed, or to force it not to be.

https://github.com/numpy/numpy/blob/d35cd07ea997f033b2d89d349734c61f5de54b0d/numpy/core/arrayprint.py#L1487

What would solve this repeatable output issue would be to allow overriding this param:

def _array_repr_implementation(
        arr, max_line_width=None, precision=None, suppress_small=None,
+       skipdtype: bool | None = None,       
        array2string=array2string):
        ...
- skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0
+ if skipdtype is None:
+     skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0

About the basicness of the current size printing algorithm

Also, the current algorithm to print a human readable size is basic, and never shows decimal numbers. Maybe a better usage of the space could be made, eg:

Max used space: 5 letters
999kB 

What could be improved is:

9kB 
vvvvv
9.9kB  (use 5 letter space)

Indeed, the current size is not reliable and is just an estimation that should not replace the integer value returned by nbytes 🤔
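The difference between the integer-only rendering and a one-decimal rendering can be sketched as follows. This is not xarray's implementation: `format_nbytes` is a hypothetical helper, and the choice of powers of 1000 and of truncation for the integer case are assumptions made for illustration.

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]


def format_nbytes(nbytes: int, decimals: int = 0) -> str:
    """Render a byte count with a unit suffix (hypothetical helper).

    With decimals=0 the value is truncated to an integer; with decimals>0
    the scaled value is rounded to that many decimal places.
    """
    value = float(nbytes)
    unit_index = 0
    # Scale down by 1000 until the value fits in three digits.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    if unit_index == 0 or decimals == 0:
        return f"{int(value)}{UNITS[unit_index]}"
    return f"{value:.{decimals}f}{UNITS[unit_index]}"


print(format_nbytes(640))          # 640B
print(format_nbytes(9900))         # 9kB
print(format_nbytes(9900, 1))      # 9.9kB
```

Either way the string is a lossy summary; the exact value remains available through `nbytes`.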

max-sixty · 2024-02-06T19:57:43Z

I don't really understand why the tests fail. It seems that on windows, 3 int64 values take up 48 bytes of space??

https://github.com/pydata/xarray/actions/runs/7792432615/job/21250418347?pr=8702#step:9:419

 self = <xarray.tests.test_dataarray.TestDataArray object at 0x00000234F4DB19D0>

    def test_repr(self) -> None:
        v = Variable(["time", "x"], [[1, 2, 3], [4, 5, 6]], {"foo": "bar"})
        coords = {"x": np.arange(3, dtype=np.int64), "other": np.int64(0)}
        data_array = DataArray(v, coords, name="my_variable")
        expected = dedent(
            """\
            <xarray.DataArray 'my_variable' (time: 2, x: 3)> Size: 48B
            array([[1, 2, 3],
                   [4, 5, 6]])
            Coordinates:
              * x        (x) int64 24B 0 1 2
                other    int64 8B 0
            Dimensions without coordinates: time
            Attributes:
                foo:      bar"""
        )
>       assert expected == repr(data_array)
E       AssertionError: assert '<xarray.Data...foo:      bar' == '<xarray.Data...foo:      bar'
E         
E         Skipping 45 identical leading characters in diff, use -v to show
E         Skipping 166 identical trailing characters in diff, use -v to show
E         - 3)> Size: 24B
E         ?           -
E         + 3)> Size: 48B
E         ?            +
E           array([

Unless anyone has a better idea, I think skipping on Windows is OK.

> Also, the current algorithm to print a human readable size is basic, and never shows decimal numbers. Maybe a better usage of the space could be made, eg:

I would keep it simple, and not conditionally change precision, at least for the moment.

etienneschalk · 2024-02-06T20:15:13Z

> Unless anyone has a better idea, I think skipping on Windows is OK.

https://github.com/pydata/xarray/pull/8702/files/e98a97d3085e7dc3b1bcb11ac8af012fc1acc1c4..e2db82a6d2a322e6ed18ebb0cb2bd696458b540c

For big tests, I used a skipif approach to test both Windows and non-Windows. For smaller ones, I added the condition in-test.

It adds many ON_WINDOWS constants, but this is unavoidable, as the representation including size is OS-dependent in the current CI. (It seems win32 is not a reliable way to determine whether we are on a 32-bit OS, as all Windows versions would return win32.)
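An alternative to branching on the OS, sketched here as an assumption and not as what this PR does, is to key the expected output on the width of the platform's default integer. The dictionary name below is hypothetical.

```python
import numpy as np

# No dtype given: numpy picks the platform's default integer type.
data = np.array([[1, 2, 3], [4, 5, 6]])

# Hypothetical lookup: expected header size for each default integer width.
EXPECTED_SIZE_BY_ITEMSIZE = {8: "Size: 48B", 4: "Size: 24B"}

expected = EXPECTED_SIZE_BY_ITEMSIZE[data.dtype.itemsize]
actual = f"Size: {data.nbytes}B"
print(expected == actual)
```

This avoids guessing the integer width from `sys.platform`, at the price of still maintaining two expected strings.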

Crazy how a simple "add size to repr" issue turned into "platform-dependent shenanigans"!

Links I consulted:

> I would keep it simple, and not conditionally change precision, at least for the moment.

OK!

keewis · 2024-02-06T20:32:47Z

> I don't really understand why the tests fail. It seems that on windows, 3 int64 values take up 48 bytes of space??

int64 has 8 bytes per element, so I agree, 3 values should be 24 bytes. However, if you look at the data it's actually a 2×3 array, and with 6 values 48 bytes makes sense. And looking at the traceback, the issue is that on windows the data only has a size of 24 bytes, which means that it is using int32 as a dtype, with 4 bytes per element. Which tells me that once again the default size is the issue.
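The arithmetic in this explanation can be reproduced with numpy alone. A short sketch, with variable names chosen here for illustration:

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6]]  # the 2x3 data from the failing test

as_int64 = np.array(values, dtype=np.int64)
as_int32 = np.array(values, dtype=np.int32)

print(as_int64.nbytes)  # 6 elements * 8 bytes = 48
print(as_int32.nbytes)  # 6 elements * 4 bytes = 24

# Without an explicit dtype, the item size is whatever the platform default is.
default_itemsize = np.array(values).dtype.itemsize
print(default_itemsize)
```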

max-sixty · 2024-02-06T20:50:22Z

> int64 has 8 bytes per element, so I agree, 3 values should be 24 bytes. However, if you look at the data it's actually a 2×3 array, and with 6 values 48 bytes makes sense. And looking at the traceback, the issue is that on windows the data only has a size of 24 bytes, which means that it is using int32 as a dtype, with 4 bytes per element. Which tells me that once again the default size is the issue.

Ah, very good point. In the case above it's that this array:

        v = Variable(["time", "x"], [[1, 2, 3], [4, 5, 6]], {"foo": "bar"})

is being cast to the default size. So we can instead cast it with .astype(int64) or similar (or create it from np.arange, whichever is easier).
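Both suggestions can be sketched on the Variable from the failing test. This is an illustration of the idea, not the change that was eventually made to the test suite:

```python
import numpy as np
from xarray import Variable

# Option 1: build the data with an explicit dtype.
data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
v = Variable(["time", "x"], data, {"foo": "bar"})

# Option 2: cast after construction.
v_cast = Variable(["time", "x"], [[1, 2, 3], [4, 5, 6]], {"foo": "bar"}).astype(
    np.int64
)

print(v.dtype, v.nbytes)
print(v_cast.dtype, v_cast.nbytes)
```

With the dtype pinned, the reported size no longer depends on the platform default.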

etienneschalk · 2024-02-06T21:27:49Z

Tests pass with a differentiation between Windows and non-Windows environments.

I tagged test_dask_roundtrip as flaky (xfail) as it frequently failed in my previous CI runs.

Hopefully this is acceptable!

The current state of the PR doesn't integrate the recent discussions about total size. The solution implemented is the one described in #8690 (comment). Maybe further discussion should happen on the original issue rather than in this PR.

max-sixty · 2024-02-06T22:41:32Z

I would be +1 on merging.

I think the windows issues could be better handled by setting the dtype (i.e. #8702 (comment), building on @keewis 's observation), but we can also do that in another PR, and this has a large enough blast radius that it would be better to merge sooner.

Would someone else agree? (Or feel free to just hit the button...)

etienneschalk · 2024-02-07T20:08:22Z

@max-sixty

By setting the dtype, I think that only part of the issue would be solved, as numpy would print out , dtype=np.int64, since numpy always prints out non-default dtypes. Unfortunately, this behaviour does not seem to be something that can be turned off.

So the repr would still be different and the need to differentiate between Windows and non-Windows environments still remains 🤔

This comment on the numpy repo is interesting: numpy/numpy#9464 (comment)

max-sixty · 2024-02-07T20:36:37Z

> By setting the dtype, I think that only part of the issue would be solved, as numpy would print out , dtype=np.int64, since numpy always prints out non-default dtypes. Unfortunately, this behaviour does not seem to be something that can be turned off.

What's an xarray object that is explicitly typed that would show different reprs in linux/mac vs windows?

max-sixty · 2024-02-07T20:42:17Z

@pydata/xarray could someone second the approval here?

welcome · 2024-02-07T20:47:39Z

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

max-sixty · 2024-02-07T20:49:39Z

@etienneschalk great work!

Thanks also for the issues into pytest-accept.

I wanted to merge asap so we didn't get merge conflicts. If you're up for simplifying the formatting tests — assuming I'm not mistaken above — that would be a very nice 2nd PR...

etienneschalk · 2024-02-07T21:07:33Z

Thanks @max-sixty!

> By setting the dtype, I think that only part of the issue would be solved, as numpy would print out , dtype=np.int64, since numpy always prints out non-default dtypes. Unfortunately, this behaviour does not seem to be something that can be turned off.

> What's an xarray object that is explicitly typed that would show different reprs in linux/mac vs windows?

I detailed the issue in a previous comment #8702 (comment)

If we use dtype=np.int32, macOS and Linux will add a , dtype=int32 suffix in the array repr, and if we use dtype=np.int64, then Windows will add a , dtype=int64 suffix in the array repr.

Also, even when a dtype is provided explicitly, it won't show up in the repr if it is the default. Here is an example on my machine (Linux):

numpy-only

In [3]: import numpy as np

In [4]: np.array([1,2,3])
Out[4]: array([1, 2, 3])

In [5]: np.array([1,2,3], dtype=np.int64)
Out[5]: array([1, 2, 3])

In [6]: np.array([1,2,3], dtype=np.int32)
Out[6]: array([1, 2, 3], dtype=int32)

xarray (delegating repr to numpy)

In [14]: import xarray as xr

In [15]: xr.DataArray(np.array([1,2,3]))
Out[15]: 
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0

In [16]: xr.DataArray(np.array([1,2,3], dtype=np.int64))
Out[16]: 
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0

In [17]: xr.DataArray(np.array([1,2,3], dtype=np.int32))
Out[17]: 
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3], dtype=int32)
Dimensions without coordinates: dim_0

I expect the opposite on the Windows CI: , dtype=int64 would be shown. So the only way to always get an explicit dtype in the repr would be to:

- either force its display (which is not possible, see https://github.com/numpy/numpy/blob/d35cd07ea997f033b2d89d349734c61f5de54b0d/numpy/core/arrayprint.py#L1487 ; skipdtype is not overridable with a kwarg, and an experiment with changing the printoptions to legacy was not conclusive)
- or find a dtype that is non-default on all 3 CI platforms (e.g. int16)
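The second option can be checked with a short sketch: the platform default integer never appears in the repr, even when passed explicitly, while int16 always does.

```python
import numpy as np

default_int = np.array([1, 2, 3])  # platform default integer
explicit_default = np.array([1, 2, 3], dtype=default_int.dtype)
as_int16 = np.array([1, 2, 3], dtype=np.int16)

print(repr(default_int))       # no dtype suffix
print(repr(explicit_default))  # still no dtype suffix
print(repr(as_int16))          # array([1, 2, 3], dtype=int16)
```

Because int16 is not the default integer on any of the platforms discussed here, its repr should be the same everywhere.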

> I wanted to merge asap so we didn't get merge conflicts. If you're up for simplifying the formatting tests — assuming I'm not mistaken above — that would be a very nice 2nd PR...

I still need to confirm everything I said above with more experimentation. I can indeed add such "repr testing" in a follow-up PR, to try to build a catalog of all these problematic reprs.

max-sixty · 2024-02-07T22:16:13Z

@etienneschalk sorry, you're completely correct. I was thinking about the Dataset repr. But you're correct that the DataArray repr just inherits from numpy.

So I don't have any strong views about better ways of doing this — the existing way seems OK. Another approach would be to just re.sub Size: \d+B in all the reprs...
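The `re.sub` idea could look like the sketch below. The helper name `normalize_sizes` and the placeholder are hypothetical, and the pattern is widened slightly from `Size: \d+B` so that it also matches prefixed units such as kB; per-variable sizes (e.g. `int64 24B`) are deliberately left alone.

```python
import re


def normalize_sizes(text: str) -> str:
    """Replace every 'Size: <number><unit>' with a fixed placeholder."""
    return re.sub(r"Size: \d+[kMGTP]?B", "Size: <n>B", text)


linux_header = "<xarray.Dataset> Size: 1kB"
windows_header = "<xarray.Dataset> Size: 640B"

print(normalize_sizes(linux_header))
print(normalize_sizes(windows_header))
```

Comparing normalized reprs would make the header comparison independent of the platform, at the cost of no longer testing the size value itself.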

djhoese · 2024-02-19T19:55:36Z

Is there a reason the title of this PR can't be updated to reflect what the description was edited/updated to say: this is not opt-in.

* Update the formating tests PR (#8702) added nbytes representation in DataArrays and Dataset repr, this adds it to the datatree tests.
* Migrate treenode module Moves treenode.py and test_treenode.py. Updates some typing. Updates imports from treenode.
* Update NotFoundInTreeError description.
* Reformat some comments Add test tree structure for easier understanding.
* Updates whats-new.rst
* mypy typing. (terrible?) There must be a better way, but I don't know it. particularly the list comprehension casts.
* Adds __repr__ to NamedNode and updates test This test was broken becuase only the root node was being tested and none of the previous nodes were represented in the __str__.
* Adds quotes to NamedNode __str__ representation.
* swaps " for ' in NamedNode __str__ representation.
* Adding Tom in so he gets blamed properly.
* resolve conflict whats-new.rst Question is I did update below the released line to give Tom some credit. I hope that's is allowable.
* Moves test_treenode.py to xarray/tests. Integrated tests.
* refactors backend tests for datatree IO
* Add explicit engine back in test_to_zarr
* Removes OrderedDict from treenode
* Renames tests/test_io.py -> tests/test_backends_datatree.py
* typo
* Add types
* Pass mypy for 3.9

etienneschalk added 3 commits February 4, 2024 17:30

Add nbytes to repr

79b54af

Disable by default

9279563

What's new

8611840

etienneschalk marked this pull request as ready for review February 4, 2024 16:39

etienneschalk added 2 commits February 4, 2024 20:22

Specify dtype explictly as Windows seems to default on int32

560a35d

Use int16

aafacd4

dcherian reviewed Feb 4, 2024

xarray/tests/test_formatting.py

PR comments: Removed global option + size format in repr header

12bd33d

etienneschalk added 4 commits February 5, 2024 21:19

Remove width justified

be042cf

Remove problematic backslashes

85f6ee4

pytest-accept

655f0af

Revert "Remove problematic backslashes"

859daea

This reverts commit 85f6ee4.

etienneschalk force-pushed the feature/eschalk/issue-8690-nbytes-repr branch from c1dab61 to 859daea Compare February 5, 2024 22:48

etienneschalk added 4 commits February 5, 2024 23:53

Fix missing backslash escape

fe1bced

Fix ellipsis in excess leading to excess last line

7c7214d

Update reprs in tests

2049c9e

Fix tests

e98a97d

Try windows specific expected outputs

e724825

Conditional Windows testing

e2db82a

etienneschalk and others added 2 commits February 6, 2024 21:17

Fix indent

0f93f9f

Merge branch 'main' into feature/eschalk/issue-8690-nbytes-repr

7d200e2

Flaky test + fix windows

9ae49dc

max-sixty approved these changes Feb 6, 2024

Merge branch 'main' into feature/eschalk/issue-8690-nbytes-repr

d4a816d

dcherian approved these changes Feb 7, 2024

max-sixty merged commit db680b0 into pydata:main Feb 7, 2024
29 checks passed

etienneschalk deleted the feature/eschalk/issue-8690-nbytes-repr branch February 7, 2024 20:51

etienneschalk mentioned this pull request Feb 7, 2024

Test formatting platform #8719

Merged

1 task

etienneschalk mentioned this pull request Feb 7, 2024

ENH: Print Option: Always show an array's dtype numpy/numpy#25787

Open

etienneschalk mentioned this pull request Feb 17, 2024

Setting node name keeps tree linkage xarray-contrib/datatree#310

Closed

5 tasks

dcherian changed the title ~~Add a simple nbytes representation in DataArrays and Dataset repr (opt-in)~~ Add a simple nbytes representation in DataArrays and Dataset repr Feb 19, 2024

aaronspring mentioned this pull request Mar 17, 2024

Bump softprops/action-gh-release from 1 to 2 pangeo-data/climpred#852

Merged

TimothyCera-NOAA mentioned this pull request Jun 27, 2024

nbytes not available for lazy loaded array and so can't print(ds) #9185

Closed

5 tasks
