api.stations() not returning all stations #40

Closed
anthony-meza opened this issue Aug 19, 2024 · 6 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@anthony-meza

Hi,

Great tool! I'm hoping to make a wrapper for this to easily put all this data into NetCDF files. I noticed that api.stations() lists only 146 stations, not the full 894. Why is that?

It may also be worth adding an "instrument" column, so the user can filter for buoys, drifting buoys, or seabirds. I think this information is contained in the "station_table.txt" file.
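
To illustrate the kind of filtering an "instrument" column would allow, here is a minimal sketch. It assumes station_table.txt is pipe-delimited with a `#`-prefixed header line; the column names, station IDs, and rows below are hypothetical placeholders, and this is not how ndbc-api itself parses the file.

```python
import csv
import io

# Hypothetical excerpt; the real station_table.txt layout and values may differ.
SAMPLE = """\
# STATION_ID | OWNER | TTYPE | NAME
#
X0001|N|3-meter discus buoy|Example Offshore Site
X0002|N|Drifting buoy|Example Drifter
X0003|NS|Water level station|Example Harbor
"""


def parse_station_table(text):
    """Return one dict per station from pipe-delimited text with a '#' header."""
    lines = text.splitlines()
    # The first comment line carries the column names.
    header = [name.strip().lower() for name in lines[0].lstrip("#").split("|")]
    # Skip blank lines and the remaining comment lines.
    data = [ln for ln in lines[1:] if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(data)), delimiter="|")
    return [dict(zip(header, (cell.strip() for cell in row))) for row in reader]


def filter_by_instrument(stations, keyword):
    """Keep stations whose 'ttype' field contains keyword (case-insensitive)."""
    return [s for s in stations if keyword.lower() in s["ttype"].lower()]


stations = parse_station_table(SAMPLE)
buoys = filter_by_instrument(stations, "buoy")
```

Here `buoys` holds the two rows whose type mentions "buoy"; the same idea would carry over to a DataFrame column if the package exposed one.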

@CDJellen
Owner

Hello @anthony-meza ; thank you for the suggestion and for opening this issue.

When api.stations() was initially implemented, it was scoped to cover only the stations owned and maintained by the NDBC. Those stations generally support the standard data modalities (cwind, stdmet, ...) and are active for at least some part of the year.

With that said, I've checked some of the other stations among the 894, and many of them also seem to have good temporal coverage and offer supported data formats.

I'll update the stations handler to provide the full list of stations, with your suggested "instrument" column as well as the buoy operator (NDBC, CBIBS, UW, ...). I should be able to make this change over the next few days, and will create a new release before I mark the issue completed.

First-class NetCDF support would also be valuable. The xarray package allows robust conversion of a pd.DataFrame to an xarray.Dataset, which can be serialized as NetCDF4-compatible files. If you have suggestions on the requirements for NetCDF support in the get_data method, I can look into implementing this, as I was also planning to store bulk data in that format outside the scope of the package.
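
As a rough sketch of the DataFrame-to-Dataset conversion mentioned above: the frame below is hypothetical sample data standing in for whatever get_data returns, and the `station_id` / `timestamp` index names and the `frame_to_dataset` helper are assumptions made for illustration, not the package's actual schema or API.

```python
import pandas as pd
import xarray as xr


def frame_to_dataset(df: pd.DataFrame) -> xr.Dataset:
    """Convert an indexed frame to an xarray.Dataset."""
    # Each index level becomes a dimension; each column becomes a data variable.
    return xr.Dataset.from_dataframe(df)


# Hypothetical sample: two made-up stations, three daily WTMP readings each.
index = pd.MultiIndex.from_product(
    [["B0001", "B0002"], pd.date_range("2024-01-01", periods=3, freq="D")],
    names=["station_id", "timestamp"],
)
df = pd.DataFrame({"WTMP": [20.1, 20.3, 20.2, 18.7, 18.9, 18.8]}, index=index)

ds = frame_to_dataset(df)
# Writing to disk needs a NetCDF backend (for example netCDF4 or scipy):
# ds.to_netcdf("wtmp.nc")
```

The write step is left commented out because it depends on which NetCDF backend is installed.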

Thanks again and have an excellent rest of your day;

@CDJellen CDJellen self-assigned this Aug 20, 2024
@CDJellen CDJellen added bug Something isn't working enhancement New feature or request labels Aug 20, 2024
@anthony-meza
Author

I can share some code! I used ndbc-api yesterday to make some observation-model comparisons. I stored everything in netcdf format in one notebook. I'd be happy to see some of it make it into this package.

I didn't write things as nicely as you have, though. I'll clean it all up and then link you to a repository.

@anthony-meza
Author

anthony-meza commented Aug 20, 2024

Here are a bunch of functions I made to put the NDBC data into NetCDF files using ndbc-api. The code is pretty narrowly focused, so I'm sure there are bugs I haven't hit yet. My use case is extracting water temperatures ("WTMP") from the historical record.

https://github.com/anthony-meza/xBuoy/blob/main/src.py

Here's a notebook using these functions:

https://github.com/anthony-meza/xBuoy/blob/main/save_NDBC_subset.ipynb

@anthony-meza
Author

One suggestion I have for the get_data function, which I implemented in my own code, is querying data in parallel. I found that the buoy data is pretty sparse (many outages and missing periods), so a lot of time is lost by querying the data one request at a time.
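
A minimal sketch of the parallel-query idea, using only the standard library. `fetch_station` is a hypothetical stand-in for a single network request, and the station IDs are placeholders; this is not the code from either repository.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_station(station_id):
    """Hypothetical stand-in for one network request for a station's data."""
    if station_id == "OFFLINE":
        raise ConnectionError(f"no data for {station_id}")
    return {"station_id": station_id, "rows": len(station_id)}


def fetch_many(station_ids, max_workers=8):
    """Fetch stations concurrently; collect failures instead of aborting."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_station, sid): sid for sid in station_ids}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                results[sid] = future.result()
            except Exception as exc:
                # An outage on one station should not stop the others.
                errors[sid] = exc
    return results, errors


results, errors = fetch_many(["A1", "B22", "OFFLINE"])
```

Threads suit this workload because the time is spent waiting on the network rather than on computation, and recording errors per station keeps sparse or missing records from aborting the whole batch.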

@CDJellen
Owner

Hello @anthony-meza; thank you again for opening this issue. Release v2024.08.28.1 changes api.stations() to return all active stations as suggested. There is a new api.historical_stations() which provides the historical deployments from the NDBC's metadata/stationmetadata.xml file.

The next release (unless there is a small bug fix release in the coming days) will support direct netcdf data export. This will likely be an option similar to as_df in the get_data method, but given the time it takes to query the text files across long time ranges and large station lists, I'm looking at other, faster ways of making the data available through the API.

In response to your suggestion in this comment, we do indeed query data in parallel. This is a recent change from #37, and there may be further optimizations we can make here.

I'll close this issue out and track netcdf support as a feature.

Please feel free to open another issue with suggestions or submissions. One other interesting feature that might be worth implementing is a set of plotting utilities, such as those in the code you linked, to better visualize where stations are located or which date ranges have data available for a given station.
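
As one possible starting point for the date-range idea, the sketch below collapses a station's observation dates into contiguous coverage runs that a plotting utility could then draw. The `coverage_ranges` helper and the sample dates are hypothetical and are not part of the package.

```python
from datetime import date, timedelta


def coverage_ranges(days, max_gap=timedelta(days=1)):
    """Collapse observation dates into (start, end) runs with gaps <= max_gap."""
    ranges = []
    for day in sorted(set(days)):
        if ranges and day - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], day)  # extend the current run
        else:
            ranges.append((day, day))  # a larger gap starts a new run
    return ranges


# Hypothetical observation dates for a single station.
observed = [
    date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 3),
    date(2024, 1, 10),
    date(2024, 1, 11),
]
ranges = coverage_ranges(observed)
```

Each resulting pair marks one uninterrupted stretch of data, so outages show up as the gaps between consecutive pairs.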

@CDJellen
Owner

CDJellen commented Sep 1, 2024

Hello @anthony-meza; I've added direct support for netcdf4 files through the THREDDS data service in the latest commit (#43).

This functionality is still somewhat experimental, but an example is added near the bottom of this notebook. Thanks again for the suggestions and have an excellent rest of your day!
