add dataset->stacked dataarray/dataframe converters #25

OriolAbril · 2024-10-15T23:04:44Z

Closes #15. I am really not happy with these names, so please suggest alternatives.
The functions are usable and documented, but tests are still missing.

📚 Documentation preview 📚: https://arviz-base--25.org.readthedocs.build/en/25/

amaloney

LGTM. Naming is tough, but I think that, with the methods named as they currently are, a user will understand what is happening when they call each one.

amaloney · 2024-10-16T03:30:57Z

src/arviz_base/converters.py

+__all__ = [
+    "convert_to_datatree",
+    "convert_to_dataset",
+    "extract",
+    "to_labelled_stacked_da",
+    "to_labelled_stacked_df",
+]


My opinion for naming is typically to use modules (or classes) and methods therein to write a sentence (or command) that someone can read as a sentence. For instance, the difference between

azb.converter.convert_to_datatree(...)

and

azb.convert.to_datatree(...)

reads (as sentences in my head) as

ArviZ base converter convert to datatree (...).

ArviZ base convert to datatree (...).

Semantically they both mean the same thing, but the second sentence makes more sense grammatically. Things like da, ds, df and now dt are shortened names for objects people are familiar with, so to_labelled_stacked_df does make sense to me, and I like it.

If we are discussing an architecture for converting a dataset, I (as a user and reader of code) would find azb.convert.to_df(ds=ds, sample_dims=None, stacked=True, labeller=...)
easier to read than azb.converters.to_labelled_stacked_df(ds=ds, sample_dims=None, stacked=False, labeller=...), since the former is closer to what I would read as this sentence:

ArviZ base convert to dataframe (using this dataset, with these sample dimensions, and ensure it is stacked, using this labeller).

This is purely an opinion though, and I think the way you have written it makes perfect sense.
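A minimal sketch of the sentence-style naming idea, assuming a thin namespace module that simply re-exports the existing functions under shorter names. The convert object and the stub bodies below are hypothetical placeholders, not arviz_base code; in a real package the same effect would more likely come from a convert.py module containing lines such as "from .converters import to_labelled_stacked_df as to_df".

```python
import types


# Hypothetical stand-ins for two of the converters listed in __all__;
# the bodies are placeholders, only the names matter for this sketch.
def convert_to_datatree(obj, **kwargs):
    return ("datatree", obj)


def to_labelled_stacked_df(ds, sample_dims=None, labeller=None):
    return ("dataframe", ds)


# A thin namespace module whose attribute names complete the sentence
# "convert to datatree" / "convert to df" at the call site.
convert = types.ModuleType("convert", "Sentence-style aliases for the converters.")
convert.to_datatree = convert_to_datatree
convert.to_df = to_labelled_stacked_df

print(convert.to_df("ds")[0])  # dataframe
```

Because the aliases are the very same function objects, there is no wrapper layer to maintain and the documentation can still point at a single implementation.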

The only thing I would suggest is to make things consistent, perhaps by choosing how you want the methods to read: for example, using abbreviations everywhere or nowhere (e.g. dataset vs ds, or df vs dataframe).

original                 abbreviated   long version
convert_to_datatree      to_dt         to_datatree
convert_to_dataset       to_ds         to_dataset
to_labelled_stacked_da   to_da         to_dataarray
to_labelled_stacked_df   to_df         to_dataframe

> My opinion for naming is typically to use modules (or classes) and methods therein to write a sentence (or command) that someone can read as a sentence.

I like this idea! In general, arviz module names are ignored, at least for now. Maybe we could do a bit more of that for new things. You'll see that functions are available and documented as arviz_base.xyz; so far we have not expected any user to call arviz.stats.ess or arviz.plot.plot_xyz, because modules were only internal file organization.

We should probably think about these functions and which ones we really care about. As of now, they differ not only in output type but also in the input types they accept. The convert_to_xyz functions are (or attempt to be) catch-all functions: whatever you give them, if arviz knows how to handle it, you'll get an xyz at the end. extract only takes inferencedata/datatree and returns a dataset or dataarray. The new ones I added are even more restrictive: the only valid input is a dataset, and each has a single output type.
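To make the catch-all contract concrete, here is a minimal sketch of a converter that dispatches on input type and always returns a dataset. The function name catch_all_to_dataset is hypothetical, and the assumptions that dict values carry leading (chain, draw) axes and that extra axes get names like theta_dim_0 are choices made for this illustration, not the actual convert_to_dataset logic.

```python
import numpy as np
import xarray as xr


def catch_all_to_dataset(obj):
    """Hypothetical catch-all converter: accept several input types, always return a Dataset."""
    if isinstance(obj, xr.Dataset):
        # Already the target type: nothing to do
        return obj
    if isinstance(obj, xr.DataArray):
        # Wrap a single array as a one-variable Dataset
        return obj.to_dataset(name=obj.name or "x")
    if isinstance(obj, dict):
        data_vars = {}
        for name, values in obj.items():
            arr = np.asarray(values)
            # Assumption: leading (chain, draw) axes; any extra axes get generic names
            dims = ["chain", "draw"] + [f"{name}_dim_{i}" for i in range(arr.ndim - 2)]
            data_vars[name] = (dims, arr)
        return xr.Dataset(data_vars)
    raise TypeError(f"Don't know how to convert {type(obj).__name__} to a Dataset")


ds = catch_all_to_dataset({"mu": np.zeros((2, 3)), "theta": np.zeros((2, 3, 4))})
print(ds["theta"].dims)  # ('chain', 'draw', 'theta_dim_0')
```

The new functions sit at the opposite end: they would skip every branch above and reject anything that is not already a dataset.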

I also thought about using to_xyz directly for these new functions, but I feared that would be quite confusing because xarray datasets already have the methods .to_dataframe, .to_dataarray and .to_stacked_array. The new functions are not those methods, nor do they try to be. They are somewhat similar but heavily overloaded with ArviZ conventions and labelling: they won't work on datasets without sample dimensions (that is, a dimension present in every variable of the dataset), and they ignore the fact that different variables in a dataset can have different dtypes... Maybe the fact that they are arviz.to_xyz is already enough to convey that and it won't be confusing?
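For comparison, this sketch shows xarray's own Dataset.to_stacked_array on a toy eight-schools-like dataset (the variable names and values are made up for illustration). It shares one requirement with the new functions: every variable must contain the sample dimensions, otherwise xarray raises a ValueError. Unlike the ArviZ versions, it produces a MultiIndex label dimension rather than labeller-formatted names.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "mu": (("chain", "draw"), rng.normal(size=(2, 3))),
        "theta": (("chain", "draw", "school"), rng.normal(size=(2, 3, 2))),
    },
    coords={"chain": [0, 1], "draw": [0, 1, 2], "school": ["Choate", "Deerfield"]},
)

# xarray's built-in method: stack every non-sample dimension into one "label" dimension
stacked = ds.to_stacked_array("label", sample_dims=["chain", "draw"])
print(stacked.sizes["label"])  # 3: one entry for mu, two for theta

# A variable lacking the sample dimensions cannot be stacked
ds_bad = ds.assign(sigma=xr.DataArray(np.ones(2), dims="school"))
raised = False
try:
    ds_bad.to_stacked_array("label", sample_dims=["chain", "draw"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```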

> Maybe the fact that they are arviz.to_xyz is already enough to convey that and it won't be confusing?

I would agree with this.

amaloney · 2024-10-16T03:32:24Z

src/arviz_base/converters.py

+        xr.set_options(display_expand_data=False)
+
+        idata = load_arviz_data("centered_eight")
+        to_labelled_stacked_da(idata.posterior.ds)


love the example
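A self-contained analogue of what the docstring example produces, written with plain xarray and numpy so it runs without arviz_base or the example datasets. The helper labelled_stack_sketch is hypothetical and only approximates the idea of a labelled stacked DataArray: one column per scalar element, named with ArviZ-style labels such as theta[Choate]. It does not reproduce the real function's labeller handling or coordinates.

```python
import numpy as np
import xarray as xr


def labelled_stack_sketch(ds, sample_dims=("chain", "draw")):
    """Hypothetical helper: one column per scalar element, labelled like "theta[Choate]"."""
    columns, labels = [], []
    for var_name, da in ds.data_vars.items():
        missing = set(sample_dims) - set(da.dims)
        if missing:
            raise ValueError(f"{var_name!r} is missing sample dims {sorted(missing)}")
        other_dims = [dim for dim in da.dims if dim not in sample_dims]
        # Put sample dims first so every column below has the sample dims' shape
        values = da.transpose(*sample_dims, *other_dims).values
        if not other_dims:
            columns.append(values)
            labels.append(var_name)
            continue
        # One column per combination of the remaining (non-sample) indices
        for idx in np.ndindex(*(da.sizes[dim] for dim in other_dims)):
            coord_values = [str(da[dim].values[i]) for dim, i in zip(other_dims, idx)]
            labels.append(f"{var_name}[{', '.join(coord_values)}]")
            columns.append(values[(..., *idx)])
    return xr.DataArray(
        np.stack(columns, axis=-1),
        dims=(*sample_dims, "label"),
        coords={**{dim: ds[dim].values for dim in sample_dims}, "label": labels},
    )


ds = xr.Dataset(
    {
        "mu": (("chain", "draw"), np.arange(6.0).reshape(2, 3)),
        "theta": (("chain", "draw", "school"), np.arange(12.0).reshape(2, 3, 2)),
    },
    coords={"chain": [0, 1], "draw": [0, 1, 2], "school": ["Choate", "Deerfield"]},
)
out = labelled_stack_sketch(ds)
print(out["label"].values.tolist())  # ['mu', 'theta[Choate]', 'theta[Deerfield]']
```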


amaloney · 2024-10-16T03:41:06Z

src/arviz_base/converters.py

+            for idx_name in da.xindexes
+            if sample_dim in da[idx_name].dims
+        }
+        columns = pd.MultiIndex.from_arrays(list(idx_dict.values()), names=list(idx_dict.keys()))
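The last line of the hunk builds pandas columns from the coordinates that run along the sample dimension. A minimal sketch of that step follows, assuming a DataArray with dims ("label", "sample") and chain/draw stored as plain 1D coordinates on "sample". The helper name sample_multiindex is hypothetical, and for simplicity it iterates over all coordinates rather than da.xindexes as the diff does.

```python
import numpy as np
import pandas as pd
import xarray as xr


def sample_multiindex(da, sample_dim="sample"):
    """Hypothetical helper: MultiIndex built from every 1D coordinate along ``sample_dim``."""
    idx_dict = {
        name: coord.values
        for name, coord in da.coords.items()
        if coord.dims == (sample_dim,) and name != sample_dim
    }
    # Same construction as in the diff: one MultiIndex level per coordinate
    return pd.MultiIndex.from_arrays(list(idx_dict.values()), names=list(idx_dict.keys()))


# Two labelled variables (rows) by six samples (columns)
da = xr.DataArray(
    np.arange(12.0).reshape(2, 6),
    dims=("label", "sample"),
    coords={
        "label": ["mu", "tau"],
        "chain": ("sample", [0, 0, 0, 1, 1, 1]),
        "draw": ("sample", [0, 1, 2, 0, 1, 2]),
    },
)
df = pd.DataFrame(da.values, index=da["label"].values, columns=sample_multiindex(da))
print(df.shape)  # (2, 6)
```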


Co-authored-by: Andy Maloney <amaloney@mailbox.org>

Co-authored-by: Sandra Yojana Meneses <sandrayojana@gmail.com>

amaloney approved these changes Oct 16, 2024


add dataset->stacked dataarray/dataframe converters

345933f

OriolAbril force-pushed the to_labelled_da_df branch from 710f1a4 to 345933f October 16, 2024 19:23

OriolAbril and others added 5 commits October 16, 2024 22:16

Add end of sentence dot

2ef96b6

Co-authored-by: Andy Maloney <amaloney@mailbox.org>

reorganize functions and rename

ed7a456

update docs

432a0a4

pin xarray-datatree

0fcebe5

Add some tests

28f97fc

Co-authored-by: Sandra Yojana Meneses <sandrayojana@gmail.com>

OriolAbril marked this pull request as ready for review October 30, 2024 18:01
