Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory issue and additional day from make_hourly_temperature #1938

Closed
2 tasks done
saschahofmann opened this issue Oct 3, 2024 · 5 comments · Fixed by #1939
Closed
2 tasks done

Memory issue and additional day from make_hourly_temperature #1938

saschahofmann opened this issue Oct 3, 2024 · 5 comments · Fixed by #1939
Labels
bug Something isn't working

Comments

@saschahofmann
Copy link
Contributor

saschahofmann commented Oct 3, 2024

Setup Information

  • Xclim version:
  • Python version:
  • Operating System:

Description

From this comment
Hi, I've been using the Utah chill unit model for my research but my code is a mess and time-consuming so I'm happy to see it implemented in xclim (thanks @saschahofmann for that)! I did some tests with my data today and wanted to inform you about two things I've noticed.

Memory issue
First, it's more a warning than an issue. Because chill_unit uses hourly data, some memory errors can occur using an important dataset while the computation of other indices or indicators with the same dataset (but daily resolution) is fine. This can be handled easily by iterating over the years and it's what I've done. However, I think it could be relevant to add a warning message when using make_hourly_temperature. What do you think about this?

Output bug
The second is make_hourly_temperature adds one more day to compute the hourly temperature but does not remove it after the computation, thus returning an array with a shape different to the inputs. See below for data corresponding to a complete year with a shape (time: 365, lat: 56, lon: 54) as input:

>>> data_yr['tasmin'].shape, data_yr['tasmax'].shape
((365, 56, 54), (365, 56, 54))
>>> make_hourly_temperature(data_yr['tasmin'], data_yr['tasmax']).resample(time='D').mean().shape
(366, 56, 54)

This is particularly annoying if the input data ends on the last day of the year and a resampling YS is done, creating a year at the end considering only one day.

>>> hourly_temp = make_hourly_temperature(data_yr['tasmin'], data_yr['tasmax'])
>>> cu = chill_units(hourly_temp, freq='YS')
>>> print(cu)
<xarray.DataArray (time: 2, lat: 56, lon: 54)> Size: 48kB
dask.array<stack, shape=(2, 56, 54), dtype=float64, chunksize=(1, 56, 54), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 448B -47.38 -47.12 -46.88 ... -34.12 -33.88 -33.62
  * lon      (lon) float64 432B 166.1 166.4 166.6 166.9 ... 178.9 179.1 179.4
  * time     (time) datetime64[ns] 16B 2100-01-01 2101-01-01
Attributes:
    units:    1

I hope my explanations are clear enough! Thanks !

Steps To Reproduce

No response

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to address this bug.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@saschahofmann saschahofmann added the bug Something isn't working label Oct 3, 2024
@saschahofmann
Copy link
Contributor Author

Hi @baptistehamon ,
Thanks for testing this implementation. In an earlier version, the make_hourly_temperature function was setting the chunksize to daily chunks (in combination with chill_portions, yearly chunks would probably make more sense). But we decided to leave chunking to user discretion. Maybe we could raise a warning if chunksize becomes really large but I am not sure when that'd be? Can you try changing the chunk size after make_hourly_temperature to maybe 365*24 and see if you still have to loop?

Will investigate the bug now, in theory the function adds a new timestamp to do the hourly resampling but then removes the last time step again.

@saschahofmann
Copy link
Contributor Author

@baptistehamon could you send me the dataset you are using. I can reproduce the issue with my test dataset but I suspect it has maybe to do with the hour of your daily data. I have been testing it with data that's set to midnight.

@saschahofmann
Copy link
Contributor Author

Yes. Ok I can see it now if I move the hour to 12pm. I guess the easiest for me would be to ignore the hour of day and just set it to midnight?

@aulemahal
Copy link
Collaborator

I think ignoring the time is correct! In many cases, having daily data at noon means that the data is an aggregation over the day (in opposition to an instant measurement). But in the context of make_hourly_temperature we always assume the former, the time information is of no use.

As for the memory issue, @baptistehamon, the idea was indeed that the user should expect the data to inflate by a factor of 24, and so choose an appropriate chunking.

@baptistehamon
Copy link

Sorry for the late news! Thanks a lot for your help. I still have to loop over the year even with a chunk 365*24 but it's easily manageable so it's ok!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants