Merge competing Dataset approaches #78
Comments
We should probably have a wiki page on this or something.
Personally, I would like to see something fully built around xarray.DataArray and xarray.Dataset (it is unfortunate that our class is also called Dataset). I wasn't aware of xarray when I first started the typhon.datasets package. The typhon.spareice.datasets approach uses the ArrayGroup class, which appears very similar to xarray.Dataset, and the Array class, which appears very similar to xarray.DataArray, yet neither is built around xarray.
If I understand
Thanks @gerritholl for opening this issue.
xarray support
I started to build spareice.datasets on xarray but I ran into some trouble, hence I wrote my own Array and ArrayGroup implementations. Maybe you have some suggestions for how to solve these problems. Here are my thoughts about xarray:
Nevertheless, xarray has great dask support, which makes it preferable for big-data applications, and it seems to have a bright future. So I totally agree with you that future specific dataset implementations should support xarray objects. I am therefore trying to make ArrayGroup compatible with xarray objects. But in general, I think the actual Dataset base class should be independent of its file content, so it should not care about xarray objects, ArrayGroups or anything else. This makes it more powerful and also usable for datasets of other data types (e.g. text-based data or images).
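To make the idea of a content-independent base class concrete, here is a minimal sketch. It is not the typhon.spareice.datasets implementation: the class name FileCollection, its methods and the read_text helper are all hypothetical, chosen only for illustration. The class only locates files and hands each path to a reader supplied by the caller, so it never needs to know what the reader returns.

```python
from pathlib import Path
import tempfile


class FileCollection:
    """Hypothetical stand-in for a content-agnostic Dataset base class.

    It only finds files; all reading is delegated to `reader`, which may
    return text, an image, an xarray object or anything else.
    """

    def __init__(self, directory, pattern, reader):
        self.directory = Path(directory)
        self.pattern = pattern
        self.reader = reader

    def find(self):
        # Sorted so that the order is reproducible across platforms.
        return sorted(self.directory.glob(self.pattern))

    def read_all(self):
        # Yield (path, content) pairs; the content type is up to the reader.
        for path in self.find():
            yield path, self.reader(path)


def read_text(path):
    # Hypothetical reader for text-based files.
    return path.read_text()


# Usage with two throwaway text files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("first")
    (Path(tmp) / "b.txt").write_text("second")
    texts = FileCollection(tmp, "*.txt", reader=read_text)
    contents = {path.name: data for path, data in texts.read_all()}
```

Under the same assumptions, a reader for netCDF files could simply be a function that returns an xarray.Dataset, without any change to the collection class.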
RegEx support
You are right. So far
I like your idea of stating the limitations and features of the packages.
typhon.spareice.datasets
Limitations
Features
I moved this discussion into a new project (https://github.com/atmtools/typhon/projects/1?). Are you okay with that?
I have no experience with "projects" in GitHub but I suppose it is a good idea. As a regex example, an example of a HIRS filename is 'NSS.HIRX.NJ.D99127.S0632.E0820.B2241718.WI.gz'. I describe that with the regex
stored_name = ("FIDUCEO_FCDR_L1C_HIRS{version:d}_{satname:s}_"
               "{year:04d}{month:02d}{day:02d}{hour:02d}{minute:02d}{second:02d}_"
               "{year_end:04d}{month_end:02d}{day_end:02d}{hour_end:02d}{minute_end:02d}{second_end:02d}_"
               "{fcdr_type:s}_v{data_version:s}_fv{format_version:s}.nc")
write_subdir = "{fcdr_type:s}/{satname:s}/{year:04d}/{month:02d}/{day:02d}"
stored_re = (r"FIDUCEO_FCDR_L1C_HIRS(?P<version>[2-4])_"
             r"(?P<satname>.{6})_"
             r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
             r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})_"
             r"(?P<year_end>\d{4})(?P<month_end>\d{2})(?P<day_end>\d{2})"
             r"(?P<hour_end>\d{2})(?P<minute_end>\d{2})(?P<second_end>\d{2})_"
             r"(?P<fcdr_type>[a-zA-Z]*)_"
             r"v(?P<data_version>.+)_"
             r"fv(?P<format_version>.+)\.nc")
My file-finder uses the regular expression, but the writing part uses the template. This is a duplication; ideally one should need only one of the two.
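One possible way to remove that duplication is to derive the regular expression from the template. The following is only a sketch under stated assumptions, not something either package does: the helpers field_pattern and template_to_regex are hypothetical, only the `d` and `s` format codes are handled, a zero-padded width such as `04d` is assumed to mean exactly that many digits, and the field values in the usage part are made up. The derived pattern is also looser than the hand-written stored_re (for example, it does not restrict version to 2-4 or satname to six characters).

```python
import re
from string import Formatter


def field_pattern(spec):
    """Map a format spec such as '04d', 'd' or 's' to a regex fragment."""
    m = re.fullmatch(r"0?(\d*)([ds]?)", spec)
    if m is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    width, kind = m.groups()
    if kind == "d":
        # Assumption: a given width means exactly that many digits.
        return rf"\d{{{width}}}" if width else r"\d+"
    # Strings (and untyped fields): match lazily up to the next literal.
    return r".+?"


def template_to_regex(template):
    """Build a compiled regex with one named group per template field."""
    parts = []
    for literal, name, spec, _conversion in Formatter().parse(template):
        parts.append(re.escape(literal))
        if name is not None:
            parts.append(f"(?P<{name}>{field_pattern(spec)})")
    return re.compile("".join(parts))


# The template from the comment above; the field values are illustrative.
stored_name = ("FIDUCEO_FCDR_L1C_HIRS{version:d}_{satname:s}_"
               "{year:04d}{month:02d}{day:02d}{hour:02d}{minute:02d}{second:02d}_"
               "{year_end:04d}{month_end:02d}{day_end:02d}"
               "{hour_end:02d}{minute_end:02d}{second_end:02d}_"
               "{fcdr_type:s}_v{data_version:s}_fv{format_version:s}.nc")

filename = stored_name.format(
    version=2, satname="noaa15",
    year=1999, month=5, day=7, hour=6, minute=32, second=0,
    year_end=1999, month_end=5, day_end=7,
    hour_end=8, minute_end=20, second_end=0,
    fcdr_type="easy", data_version="0.8", format_version="1.1")

derived_re = template_to_regex(stored_name)
fields = derived_re.fullmatch(filename).groupdict()
```

With this approach the template is the single source of truth. The trade-off is that constraints which only a regex can express would have to be stated some other way, for instance as an optional per-field override. Each field name must also appear only once in the template, because Python's re module rejects duplicate group names.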
Dear @gerritholl, I had a deeper look into your modules (especially tovs.py and models.py) and I think it might be surprisingly easy to use them with my Dataset class. Theoretically, it should already work if the reader classes (HIRS, HIRSPOD, IASIEPS, IASISub, HIASI, HIRSHIRS, and ERAInterim) inherit from
I suppose so! I would love to try it out but right now I really don't have the time.
Typhon currently has two competing Dataset approaches. One lives in the typhon.datasets package and its subpackages, the other in typhon.spareice.datasets. The two have overlapping aims but diverge in their implementation and approach. This duplicates maintenance effort, and in the long term it is desirable to merge them, which will require a major effort.
To achieve this goal, we would need to make a full inventory of the features of each approach, noting where they overlap and where they differ. I suspect the typhon.datasets package has many features not currently available in typhon.spareice.datasets. A merge would affect readers implemented with either approach.
Limitations of the typhon.datasets package:
I don't expect to be able to do much on this any time soon, nor do I expect others to do so, but I'm keeping this issue here so we keep it in mind long term.