WIP: tutorial on merging datasets #3131

rabernat · 2019-07-15T01:28:25Z

Closes Adding Example/Tutorial of importing data to Xarray (Merge/concat/etc) #1391
Fully documented, including whats-new.rst for all changes and api.rst for new API

This is a start on a tutorial about merging / combining datasets.

codecov · 2019-07-15T01:32:30Z

Codecov Report

Merging #3131 into scipy19-docs will decrease coverage by 0.16%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           scipy19-docs    #3131      +/-   ##
================================================
- Coverage         96.18%   96.02%   -0.17%     
================================================
  Files                66       63       -3     
  Lines             13858    12799    -1059     
================================================
- Hits              13330    12290    -1040     
+ Misses              528      509      -19

TomNicholas

This is awesome, and sorely-needed, thanks for doing this!

I have a couple of comments, but I'm not sure it's a good idea to comment directly on the source code of an IPython notebook, so I'll just write them here.

I assume the plan for the next bit is to go on to use combine_nested to accomplish the same thing as combine_by_coords? Then to start worrying about dirty data?
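For reference, here is a minimal sketch of what "accomplishing the same thing" with both functions could look like, assuming a recent xarray release. The dataset, the variable `temp` and the dimension `x` are made up for illustration and are not taken from the notebook. `combine_nested` takes the order from list position, while `combine_by_coords` reads it from the `x` coordinate:

```python
import numpy as np
import xarray as xr

# Made-up example data: one variable with an "x" dimension coordinate.
full = xr.Dataset(
    {"temp": ("x", np.arange(6.0))},
    coords={"x": np.arange(6)},
)

# Split it into two pieces along "x", as if they came from separate files.
left = full.isel(x=slice(0, 3))
right = full.isel(x=slice(3, 6))

# combine_nested: we supply the order (list position) and the dimension name.
nested = xr.combine_nested([left, right], concat_dim="x")

# combine_by_coords: xarray infers the order from the "x" coordinate values.
by_coords = xr.combine_by_coords([left, right])

print(nested.equals(full), by_coords.equals(full))  # True True
```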

The structure is really nice, but you could also separate the data-creation section from the data-loading sections with an explicit subtitle, so it's clear where the meat of the tutorial starts.

I also really like the graphs to show what happens to your data if you concatenate in the wrong order.
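A small sketch of the wrong-order pitfall (made-up data again, not the notebook's plots): if the pieces arrive in the wrong order, `combine_nested` concatenates them in exactly that order, whereas `combine_by_coords` sorts them by their coordinate values. The variable and dimension names are hypothetical.

```python
import numpy as np
import xarray as xr

# Made-up data split into two pieces along "x".
full = xr.Dataset({"temp": ("x", np.arange(6.0))}, coords={"x": np.arange(6)})
pieces = [full.isel(x=slice(0, 3)), full.isel(x=slice(3, 6))]

# Simulate files being listed in the wrong order (e.g. an unsorted file list).
shuffled = pieces[::-1]

# combine_nested trusts the list order, so "x" ends up non-monotonic.
wrong = xr.combine_nested(shuffled, concat_dim="x")
print(wrong["x"].values)  # [3 4 5 0 1 2]

# combine_by_coords orders the pieces by their "x" values instead.
fixed = xr.combine_by_coords(shuffled)
print(fixed["x"].values)  # [0 1 2 3 4 5]
```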

I don't know if you've read the recent discussion on the mailing list, but yesterday there was a nice example of a real-world problem: the user had a set of datasets, each with a different length along one dimension, and wanted to pad them with NaNs. Something similar might be good to include here? Maybe instead of padding we could do trimming using the preprocess argument?
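One way both options might look, as a sketch under assumptions: the datasets, the variable `v`, the dimensions `time`/`y`, and the `trim` helper with its fixed `MIN_LEN` are all hypothetical, and everything stays in memory rather than reading files. Padding relies on an outer join along `y`; trimming uses a function of the kind that could be handed to `open_mfdataset` via `preprocess`.

```python
import numpy as np
import xarray as xr

# Hypothetical inputs: one dataset per time step, but the "y" dimension has a
# different length in each (3, 4 and 5 points).
datasets = [
    xr.Dataset(
        {"v": (("time", "y"), np.ones((1, n)))},
        coords={"time": [t], "y": np.arange(n)},
    )
    for t, n in enumerate([3, 4, 5])
]

# Option 1, padding: an outer join on "y" fills the missing points with NaN.
padded = xr.concat(datasets, dim="time", join="outer")

# Option 2, trimming: cut every dataset down to the shortest "y" length.
# The same function could be passed as preprocess=trim to xr.open_mfdataset
# so that it runs on each file as it is opened.
MIN_LEN = 3  # hypothetical: assumed to be known in advance for this sketch


def trim(ds):
    """Keep only the first MIN_LEN points along "y" (hypothetical helper)."""
    return ds.isel(y=slice(0, MIN_LEN))


trimmed = xr.combine_nested([trim(ds) for ds in datasets], concat_dim="time")
```

Padding keeps every value at the cost of NaNs; trimming keeps the result dense but silently drops data, so which one a tutorial should show probably depends on the use case.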

Another thing: here you say the future default behaviour of open_mfdataset will be to use combine='by_coords', but I was under the impression it was going to be combine='nested'. I don't think this ambiguity is a problem, because the error messages don't state what the default will be; they just tell people to be explicit in order to be future-compatible, which is fine. But we should be consistent about what the future default will be (@shoyer?).

shoyer · 2019-07-17T15:05:47Z

I think "by_coords" is probably the most user friendly default for open_mfdataset? But I'm not entirely sure...

TomNicholas · 2019-07-17T16:22:26Z

Nor am I. I originally thought it should be 'nested' because the concat-in-order behaviour is more similar to the original auto_combine, but I don't know. It seems to me that it depends on the quality of users' data: if they have high-quality datasets which already have sensible coordinates for each dimension, then 'by_coords' is best, but if their data is more primitive, without coordinates (like mine happens to be), then 'nested' is the most natural default. So do we have a sense of what the largest number of users would find most natural? Or is this not a good way to think about it?
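To make the "primitive data" case concrete, a sketch assuming a recent xarray release, with invented datasets: when the concatenation dimension has no coordinate, `combine_nested` still works because the order comes from the list, while `combine_by_coords` has nothing to sort on.

```python
import numpy as np
import xarray as xr

# Invented "primitive" data: two pieces along "x" with no coordinates at all.
p1 = xr.Dataset({"temp": ("x", np.array([0.0, 1.0, 2.0]))})
p2 = xr.Dataset({"temp": ("x", np.array([3.0, 4.0, 5.0]))})

# combine_nested only needs the list order and the dimension name.
nested = xr.combine_nested([p1, p2], concat_dim="x")

# combine_by_coords needs dimension coordinates to infer the order;
# recent xarray releases raise a ValueError when there are none.
try:
    xr.combine_by_coords([p1, p2])
    by_coords_worked = True
except ValueError:
    by_coords_worked = False
```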


shoyer · 2019-07-17T16:28:31Z

It is possible that we do actually need a third combine mode that works like the old auto_combine.

rabernat · 2020-05-18T14:10:34Z

I am still hoping to finish this one day. Any reason it needs to be closed?

keewis · 2020-05-18T14:13:06Z

err, sorry, no. That happened because I deleted the branch you tried to merge into. Let me try to fix that.

keewis · 2020-05-18T14:42:08Z

we should rebase this and #3111 onto master so we don't depend on the old scipy19-docs branch. If we want to continue having a separate development branch for documentation, I think we should use one that is kept in sync with current master.

keewis · 2020-06-02T13:33:31Z

@rabernat, I did the rebase for this and #3111, so when you eventually pick this up again, a simple merge should bring this up to date with master.

dcherian · 2020-06-02T14:13:44Z

Thanks @keewis !

andersy005 · 2021-07-06T22:57:29Z

@rabernat, the gentlest of bumps on this :)... How much work (content) is left to bring this to completion? I'm asking because I'd be happy to help if there's still more work and/or follow-up PR needed.

TomNicholas self-assigned this Jul 16, 2019

TomNicholas reviewed Jul 17, 2019


rabernat mentioned this pull request Sep 10, 2019: 0.13.0 release #3257 (Closed)

dcherian force-pushed the scipy19-docs branch from 4dc2e56 to 1faf67c November 21, 2019 21:26

rabernat mentioned this pull request Nov 22, 2019: DOC: from examples to tutorials #3564 (Open)

TomNicholas added the topic-documentation label Apr 6, 2020

keewis closed this May 18, 2020

keewis reopened this May 18, 2020

keewis changed the base branch from scipy19-docs to master May 18, 2020 14:24

keewis changed the base branch from master to scipy19-docs May 18, 2020 14:24

Commit 211a2b3: wip: tutorial on merging datasets

keewis force-pushed the tutorial-on-merging branch from 8e84a4a to 211a2b3 June 2, 2020 13:24

keewis changed the base branch from scipy19-docs to master June 2, 2020 13:24

TomNicholas mentioned this pull request Jul 11, 2022: Link to or absorb external tutorial material xarray-contrib/xarray-tutorial#104 (Open, 5 tasks)
