
Why do we want to CMORize all observational datasets? #1120

Open
bascrezee opened this issue Jun 4, 2019 · 5 comments

Comments

@bascrezee
Contributor

Maybe we should instead define interfaces to existing packages that take care of reading datasets into common Python data structures. For example, intake is particularly well suited to reading a very diverse set of data in different formats. Another interesting project, focused on satellite datasets, is open data cube.
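
As a rough illustration of that idea, here is a minimal sketch using intake (assuming the intake-xarray driver and a hypothetical catalog file `obs_catalog.yaml` with a hypothetical entry `some_obs_dataset`): a recipe could read observations straight from a catalog into a common data structure, with no separate CMORization step.

```python
import intake

# Hypothetical catalog describing where each observational dataset lives
# and which driver reads it (e.g. intake-xarray for NetCDF files).
catalog = intake.open_catalog("obs_catalog.yaml")

# Lazily load one dataset into a common data structure (an xarray.Dataset);
# "some_obs_dataset" is a hypothetical catalog entry name.
ds = catalog["some_obs_dataset"].to_dask()
print(ds)
```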

@zklaus

zklaus commented Jun 4, 2019

@hb326 thoughts? Comments?

@bascrezee bascrezee changed the title Do we really need to CMORize all observational datasets? Why do we want to CMORize all observational datasets? Jun 4, 2019
@bascrezee
Contributor Author

bascrezee commented Jun 4, 2019

@mattiarighi just answered this question for me offline; his answer is:

because we want to have a pool of observational data

@bouweandela
Member

That does not really answer the question, because you can also have a pool of observational data without reformatting it.

I think the real answer is probably a perceived run-time advantage: reformatting takes some time, so if it had to be done every time a recipe runs, recipes could be noticeably slower.
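
A minimal sketch of that trade-off (all names here are hypothetical, not ESMValTool API): reformat on the first run and cache the result, so subsequent recipe runs skip the cost.

```python
from pathlib import Path

def get_cmorized(raw_path: str, cache_dir: str = "cmorized") -> Path:
    """Return a CMORized copy of raw_path, reformatting only on a cache miss."""
    out = Path(cache_dir) / (Path(raw_path).stem + "_cmor.nc")
    if not out.exists():
        out.parent.mkdir(parents=True, exist_ok=True)
        cmorize(raw_path, out)  # hypothetical one-off reformatting step
    return out
```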

@valeriupredoi
Contributor

I'll take a step back and point to a few things:

  • reformatting needs to be done according to a number of standards: the CF and CMOR conventions most importantly, but also ESMValTool-specific conventions that are not forcibly imposed but make life easier (i.e. preferred time units, preferred metadata items, etc.), so it's much better if reformatting is done once and the reformatted data is shoved into a box from where it can be used right out of the box (see the sketch after this list);
  • as @bouweandela points out, reformatting can be done on the fly, but it can be time- and CPU-consuming depending on how much data needs to be converted;
  • one social aspect of this matter is the user's comfort in knowing that there is a database of nicely formatted data where they can just point the tool and all goes smoothly (i.e. the risk of the tool failing is smaller, since it needs to perform fewer actions) - the same comfort the ESGF nodes provide the user: a nice place where nice data lives (nice my arse, given how many problems ESGF data has, but that's a different fish altogether);
  • lastly, the question of LARGE datasets comes to mind - the ones that are so large they can't be stored in one place and need to be reformatted on the fly. Those should probably stay on the fly, but apart from that I reckon if we can store the data, then we should run the reformatting as few times as possible;
  • 🍺
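
As a rough illustration of the first point, here is a minimal sketch of the kind of one-off fix a reformatting step applies (file and variable names are hypothetical, and this uses plain xarray rather than the actual ESMValTool CMORizer machinery): rename a raw variable to its CMOR short name, attach CF metadata, and write the time axis with preferred units.

```python
import xarray as xr

# Hypothetical raw observational file with a variable named "t2m".
ds = xr.open_dataset("raw_obs.nc")

# Rename to the CMOR short name and attach CF-compliant metadata.
ds = ds.rename({"t2m": "tas"})
ds["tas"].attrs["standard_name"] = "air_temperature"
ds["tas"].attrs["units"] = "K"

# Write out once, encoding time with preferred reference units, so every
# later recipe run can read this file directly.
ds.to_netcdf(
    "OBS_dataset_Amon_tas.nc",
    encoding={"time": {"units": "days since 1950-01-01"}},
)
```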

@mattiarighi
Contributor

mattiarighi commented Sep 29, 2020

Speaking of intake, today's tech talk by DKRZ on this subject might be of interest:
https://www.dkrz.de/up/de-news-and-events/de-tech-talks/de-dkrz-tech-talk-intake-taking-the-pain-out-of-data-access

It should be available on their YouTube channel soon.
