
Add USGS storm based time series source #14

Closed
saeed-moghimi-noaa opened this issue Mar 16, 2022 · 21 comments
Labels
discussion enhancement New feature or request


@saeed-moghimi-noaa

saeed-moghimi-noaa commented Mar 16, 2022

@zacharyburnettNOAA @SorooshMani-NOAA

Adding USGS timeseries from https://stn.wim.usgs.gov/FEV/#FlorenceSep2018

This is code I got from @flackdl. See also this repo developed by Danny: https://github.com/flackdl/cwwed .

import os
import re
import sys
import requests

EVENT_ID_MATTHEW = 135  # default

# capture an event_id from the command line, defaulting to Matthew
EVENT_ID = int(sys.argv[1]) if len(sys.argv) > 1 else EVENT_ID_MATTHEW

# file type "data"
# https://stn.wim.usgs.gov/STNServices/FileTypes.json
FILE_TYPE_DATA = 2

# deployment types
# https://stn.wim.usgs.gov/STNServices/DeploymentTypes.json
DEPLOYMENT_TYPE_WATER_LEVEL = 1
DEPLOYMENT_TYPE_WAVE_HEIGHT = 2
DEPLOYMENT_TYPE_BAROMETRIC = 3
DEPLOYMENT_TYPE_TEMPERATURE = 4
DEPLOYMENT_TYPE_WIND_SPEED = 5
DEPLOYMENT_TYPE_HUMIDITY = 6
DEPLOYMENT_TYPE_AIR_TEMPERATURE = 7
DEPLOYMENT_TYPE_WATER_TEMPERATURE = 8
DEPLOYMENT_TYPE_RAPID_DEPLOYMENT = 9

# create output directory (no error if it already exists)
output_directory = 'output'
os.makedirs(output_directory, exist_ok=True)

# fetch event data files
files_req = requests.get('https://stn.wim.usgs.gov/STNServices/Events/{}/Files.json'.format(EVENT_ID))
files_req.raise_for_status()
files_json = files_req.json()

# fetch event sensors
sensors_req = requests.get('https://stn.wim.usgs.gov/STNServices/Events/{}/Instruments.json'.format(EVENT_ID))
sensors_req.raise_for_status()
sensors_json = sensors_req.json()

# collect the instrument ids of the barometric sensors
barometric_sensor_ids = {
    sensor['instrument_id']
    for sensor in sensors_json
    if sensor.get('deployment_type_id') == DEPLOYMENT_TYPE_BAROMETRIC
}

# download (and print the urls of) data files for barometric sensors for this event
for file in files_json:
    if file['filetype_id'] == FILE_TYPE_DATA and file['instrument_id'] in barometric_sensor_ids:

        file_url = 'https://stn.wim.usgs.gov/STNServices/Files/{}/item'.format(file['file_id'])

        # fetch the actual file
        file_req = requests.get(file_url, stream=True)
        file_req.raise_for_status()

        # capture the filename from the headers so we can save it appropriately
        match = re.match('.*filename="(?P<filename>.*)"', file_req.headers.get('Content-Disposition', ''))
        if match:
            filename = match.group('filename')
        else:
            filename = '{}.unknown'.format(file['file_id'])
            print('COULD NOT FIND "filename" in header, saving as {}'.format(filename))

        print('{}\t\t({})'.format(filename, file_url))

        with open(os.path.join(output_directory, filename), 'wb') as f:
            for chunk in file_req.iter_content(chunk_size=1024):
                f.write(chunk)
@ghost

ghost commented Mar 16, 2022

thanks!

@saeed-moghimi-noaa
Author

Thanks to @flackdl for sharing the location of the latest file:
https://github.com/flackdl/cwwed/blob/ad39f0e9bea6a0a3bdbc937fea41994f4ed359ba/scripts/usgs.py

@ghost

ghost commented Mar 16, 2022

@saeed-moghimi-noaa
Author

Thanks @zacharyburnettNOAA . See the email I just sent to Danny.

@ghost ghost moved this to Todo in observational data retrieval Mar 30, 2022
@ghost ghost added the enhancement New feature or request label Mar 30, 2022
@brey
Contributor

brey commented Oct 5, 2022

@SorooshMani-NOAA provided some input via email. I repost here for completeness:

Today I noticed this package on GitHub: https://github.com/USGS-python/dataretrieval

I was wondering if this retrieves the same data that you were interested in or if there's another USGS database that you'd like to query?

This one seems to have the following data available for retrieval:
instantaneous values (iv)
daily values (dv)
statistics (stat)
site info (site)
discharge peaks (peaks)
discharge measurements (measurements)
water quality samples (qwdata)

which seems to be what the water services REST API provides:
https://waterservices.usgs.gov/rest/

George, if this is the same database that Jack is interested in, does it make sense to add a "normalization" wrapper on top of the dataretrieval package, or should searvey use the REST API directly?
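For reference, a minimal sketch of what calling the water services REST API directly could look like, based on the instantaneous-values (iv) service documented at https://waterservices.usgs.gov/rest/. The site number, parameter code, and dates are illustrative placeholders, not values agreed on in this thread:

```python
# Sketch of querying the NWIS instantaneous-values (iv) REST service directly.
# Site/parameter values used by callers are placeholders for illustration.
import requests

NWIS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"


def build_iv_params(sites, parameter_cd, start_dt, end_dt):
    """Build query parameters for an instantaneous-values request."""
    return {
        "format": "json",
        "sites": ",".join(sites),
        "parameterCd": parameter_cd,
        "startDT": start_dt,
        "endDT": end_dt,
    }


def fetch_iv_series(sites, parameter_cd, start_dt, end_dt):
    """Return the list of time series objects from the JSON response."""
    response = requests.get(
        NWIS_IV_URL,
        params=build_iv_params(sites, parameter_cd, start_dt, end_dt),
    )
    response.raise_for_status()
    # each requested site/parameter combination appears under value -> timeSeries
    return response.json()["value"]["timeSeries"]
```

A wrapper like this would give searvey full control over which metadata it keeps, at the cost of maintaining the request/parsing logic itself.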

@brey
Contributor

brey commented Oct 5, 2022

I looked a bit into dataretrieval and it looks good. It already has users, they are considering a conda package
(see issue 44 therein), and the lead developer works for USGS, which is beneficial for updates and access.

If it exposes all the data, then we can write a wrapper and use it as an upstream dependency.

We can also invite Timothy Hodson to a meeting and discuss it.

@SorooshMani-NOAA
Contributor

SorooshMani-NOAA commented Oct 7, 2022

Documenting relevant email between me and @Rjialceky (slightly modified):

[...] CSDL [...] is interested in the following observations in support of the coastal application teams modeling work for NOAA products and services:

  • Surface water level
  • Water level datums, relative and geodetic observations
  • Water temperature
  • Water salinity
  • Water currents

I am primarily interested in datum points in support of navigation products and services; and, where unavailable, interested in the surface water levels to formulate new datums. The challenge of course is to have searvey assemble available observations sourced from NOAA, IOC, USGS, etc. into the normalized categories above. In the case of USGS, the number of [potentially] available parameters to sort out from their observation sites looks especially large—so any software API / wrapper that makes that easier, maintainable, etc. should be leveraged:

https://waterdata.usgs.gov/nwis/uv?referred_module=sw&search_criteria=multiple_site_no&submitted_form=introduction

@brey @pmav99 I can't find the other ticket where we discussed normalization and/or standardization of the outputs. Given the quoted email above, how would you approach adding getter functions? Do we have a template to follow?
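One way to make the normalization question above concrete is an explicit mapping from NWIS parameter codes to searvey-level variable categories. The sketch below is hypothetical, not an agreed convention; codes 00035, 00060, and 00065 appear elsewhere in this thread, while 00010 is an assumption to be checked against the NWIS parameter code list:

```python
# Hypothetical sketch of normalizing NWIS parameter codes into the variable
# categories listed in the email above. The mapping and enum names are
# illustrative, not an existing searvey convention.
from enum import Enum


class NormalizedVariable(Enum):
    SURFACE_WATER_LEVEL = "surface_water_level"
    WATER_TEMPERATURE = "water_temperature"
    WIND_SPEED = "wind_speed"
    DISCHARGE = "discharge"


PARAMETER_CODE_MAP = {
    "00065": NormalizedVariable.SURFACE_WATER_LEVEL,  # gage height
    "00010": NormalizedVariable.WATER_TEMPERATURE,    # temperature, water (assumed code)
    "00035": NormalizedVariable.WIND_SPEED,           # wind speed (seen later in this thread)
    "00060": NormalizedVariable.DISCHARGE,            # streamflow
}


def normalize_parameter(code):
    """Return the normalized variable for an NWIS code, or None if unmapped."""
    return PARAMETER_CODE_MAP.get(code)
```

A table like this would have to grow considerably given how many parameters USGS sites report, which is part of the maintenance cost being weighed here.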

@pmav99
Member

pmav99 commented Oct 8, 2022

We don't have a "template". I added some thoughts on how the API could/should look in the wiki: https://github.com/oceanmodeling/searvey/wiki/API-design
but feel free to open a new ticket to discuss this further.

@SorooshMani-NOAA
Contributor

So does that mean that if we want to add USGS data we (for now) just need to return the raw output we get from their API? In that case, is it really meaningful to have a wrapper around the USGS dataretrieval package, given that they already return a dataframe?

@SorooshMani-NOAA
Contributor

SorooshMani-NOAA commented Dec 6, 2022

Today I was exploring the dataretrieval package for obtaining USGS datasets. It seems that dataretrieval removes a lot of metadata from the NWIS response when creating its data tables. For example, when getting the "instantaneous value" record for a station, we might get something like the following as the response from the web API:

{
    "name": "USGS:0148472405:00035:00000",
    "sourceInfo": {
        "geoLocation": {
            "geogLocation": {
                "latitude": 38.1389722,
                "longitude": -75.18363889,
                "srs": "EPSG:4326"
            },
            "localSiteXY": []
        },
        "note": [],
        "siteCode": [
            {
                "agencyCode": "USGS",
                "network": "NWIS",
                "value": "0148472405"
            }
        ],
        "siteName": "BUNTINGS GUT NEAR CEDARTOWN, MD",
        "siteProperty": [
            {
                "name": "siteTypeCd",
                "value": "ST-TS"
            },
            {
                "name": "hucCd",
                "value": "02040303"
            },
            {
                "name": "stateCd",
                "value": "24"
            },
            {
                "name": "countyCd",
                "value": "24047"
            }
        ],
        "siteType": [],
        "timeZoneInfo": {
            "daylightSavingsTimeZone": {
                "zoneAbbreviation": "EDT",
                "zoneOffset": "-04:00"
            },
            "defaultTimeZone": {
                "zoneAbbreviation": "EST",
                "zoneOffset": "-05:00"
            },
            "siteUsesDaylightSavingsTime": true
        }
    },
    "values": [
        {
            "censorCode": [],
            "method": [
                {
                    "methodDescription": "",
                    "methodID": 234506
                }
            ],
            "offset": [],
            "qualifier": [
                {
                    "network": "NWIS",
                    "qualifierCode": "P",
                    "qualifierDescription": "Provisional data subject to revision.",
                    "qualifierID": 0,
                    "vocabulary": "uv_rmk_cd"
                }
            ],
            "qualityControlLevel": [],
            "sample": [],
            "source": [],
            "value": [
                {
                    "dateTime": "2022-12-06T12:00:00.000-05:00",
                    "qualifiers": [
                        "P"
                    ],
                    "value": "1.2"
                }
            ]
        }
    ],
    "variable": {
        "noDataValue": -999999.0,
        "note": [],
        "oid": "45807109",
        "options": {
            "option": [
                {
                    "name": "Statistic",
                    "optionCode": "00000"
                }
            ]
        },
        "unit": {
            "unitCode": "mph"
        },
        "valueType": "Derived Value",
        "variableCode": [
            {
                "default": true,
                "network": "NWIS",
                "value": "00035",
                "variableID": 45807109,
                "vocabulary": "NWIS:UnitValues"
            }
        ],
        "variableDescription": "Wind speed, miles per hour",
        "variableName": "Wind speed, mph",
        "variableProperty": []
    }
}

But the resulting data set only returns (examples not from the same station!):

                           00060 00060_cd     site_no  00065 00065_cd
datetime
2022-12-06 08:45:00-05:00   4.48        P  0148471320   3.72        P

Does it make sense, then, to use the web API directly instead (going back to the original question!)? Since in any case we need to create tables of constants such as parameter codes, quality codes, etc., it may be that dataretrieval doesn't really take much heavy lifting off of searvey development in the end.

There's also the delay in fixing issues in dataretrieval and waiting for releases to reach conda for searvey to depend on them. Right now, for example, there are some issues when retrieving data from stations in different time zones that result in an exception.
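If searvey were to call the web API directly, keeping the metadata that dataretrieval drops could look roughly like this. This is a sketch against the JSON structure shown above (field names taken from that response), flattening one `timeSeries` object into a pandas DataFrame:

```python
# Sketch of flattening one NWIS "timeSeries" JSON object (structure as in the
# response shown above) into a pandas DataFrame, keeping site, variable, and
# unit metadata as columns instead of discarding them.
import pandas as pd


def iv_series_to_frame(series):
    """Flatten a single NWIS timeSeries dict into a DataFrame."""
    site = series["sourceInfo"]
    variable = series["variable"]
    records = []
    for values_block in series["values"]:
        for point in values_block["value"]:
            records.append(
                {
                    "datetime": pd.Timestamp(point["dateTime"]),
                    "value": float(point["value"]),
                    "qualifiers": ",".join(point.get("qualifiers", [])),
                    "site_no": site["siteCode"][0]["value"],
                    "site_name": site["siteName"],
                    "variable_code": variable["variableCode"][0]["value"],
                    "variable_name": variable["variableName"],
                    "unit": variable["unit"]["unitCode"],
                    "lon": site["geoLocation"]["geogLocation"]["longitude"],
                    "lat": site["geoLocation"]["geogLocation"]["latitude"],
                }
            )
    return pd.DataFrame.from_records(records)
```

The exact set of columns to keep (qualifiers, time zone info, no-data value, etc.) is precisely the normalization decision being discussed.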

@SorooshMani-NOAA
Contributor

After discussing the comment above with @pmav99 during the data retrieval meeting, we decided it makes more sense to start by calling the NWIS API directly and use our own mapping of responses to data frames.

@brey
Contributor

brey commented Dec 11, 2022

I understand the point, but I wonder if we should first bring this to Timothy's attention (with an issue on dataretrieval) and see what he has to say. Having said that, I leave it up to you guys.

@SorooshMani-NOAA
Contributor

I think it would be better to do what you suggest. I have already created an issue: DOI-USGS/dataretrieval-python#59. In the last meeting only two of us were present, so I just wanted to relay what was discussed. I haven't implemented anything for USGS yet.

@mroberge

There are a variety of Python packages that use the USGS API. I set up a discussion among the authors here: mroberge/hydrofunctions#79

  • Taher Chegini @cheginit just added some elegant code to his HyRiver package that deals with timezone information from the USGS metadata.
  • my hydrofunctions requests data, stores the original response, and formats it into dataframes upon request. My plan is to offer more ways to organize the dataframe in the future: a 'tidy' format, wide, and multiindex.

@SorooshMani-NOAA
Contributor

Thank you @mroberge this information is very helpful.

@SorooshMani-NOAA
Contributor

I just realized that the get_iv metadata item in the returned tuple can include information about the parameter code or site. I thought the metadata only included header or URL information, but if the right arguments are passed, more information is extracted and included. I think the main question now is: how much do we want to keep the data from the REST API untouched?

For IOC and COOPS stations we pretty much return whatever is provided by the web services, but for USGS NWIS we have to do some transformation either way. Can we then just take the output of dataretrieval (or even one of the other packages from #14 (comment)) as the main source of data and return it with minimal changes to fit searvey API conventions?
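Those "minimal changes" could amount to a thin tidying pass over the wide DataFrame dataretrieval returns (columns like `00060`, `00060_cd`, `site_no`, as shown earlier in this thread). The sketch below assumes that column layout; the readable-name mapping is hypothetical:

```python
# Sketch of a thin post-processing step over a dataretrieval-style wide
# DataFrame, renaming parameter-code columns to readable names. The
# READABLE_NAMES mapping is a hypothetical example, not a searvey convention.
import pandas as pd

READABLE_NAMES = {
    "00060": "discharge_cfs",    # streamflow, cubic feet per second
    "00065": "gage_height_ft",   # gage height, feet
}


def tidy_iv_frame(df):
    """Rename parameter-code columns, including their '_cd' qualifier columns."""
    renames = {}
    for code, name in READABLE_NAMES.items():
        if code in df.columns:
            renames[code] = name
            renames[code + "_cd"] = name + "_qualifier"
    return df.rename(columns=renames)
```

Keeping the transformation this shallow would preserve dataretrieval's output as the de facto "original" data, which is the trade-off under discussion.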

@cheginit

@mroberge, Thanks for mentioning HyRiver. As Martin said, PyGeoHydro includes a class called NWIS that provides access to several NWIS endpoints (you can check out this example notebook). Also, I developed robust and performant engines for working with web services (AsyncRetriever and PyGeoOGC), so feel free to explore them and let me know if you need any help.

@SorooshMani-NOAA
Contributor

@cheginit I learned about your toolset a couple of weeks ago when working on a different project. Your software stack is very impressive and useful; however, since searvey is focused on giving access to the original data from the source at the lowest level, it makes more sense to use minimal packages like dataretrieval. That being said, I'm looking forward to using your software stack in other projects.

@SorooshMani-NOAA
Contributor

@brey, @pmav99, @saeed-moghimi-noaa, if you haven't already, I highly recommend reading this summary by @mroberge: mroberge/hydrofunctions#79. (mentioned in #14 (comment))

After that I'd like us to re-evaluate why we want to add USGS support within searvey. My take is:

  • searvey is a one-stop shop for [original] measurement data used for validating coastal ocean models
  • dataretrieval returns the data in a form very close to original source (NWIS REST API)
  • We don't want to reinvent the wheel

I'm just thinking out loud, but given the above (as opposed to what I said to @pmav99 the other day), maybe it makes more sense to follow the original plan of using the dataretrieval package and just treat its return values as the original data from the source.

What do you think?

@saeed-moghimi-noaa
Author

@SorooshMani-NOAA

What you suggested makes sense. I am fine with that. However, I will let @brey and @pmav99, as the lead developers of searvey, have the final say.

Thanks,

@brey
Contributor

brey commented Jan 6, 2023

After the discussion with @SorooshMani-NOAA a few days back, and seeing his progress (!) using dataretrieval, let's go with that. Thanks Soroosh.

I will close this issue and we can open more specific ones if needed during the implementation.
