First crack at changing the url for one function #152
Yup, that seems reasonable
dataretrieval/wqp.py
Outdated
else:
    url = wqx3_url('Result')
    if 'dataProfile' not in kwargs:
Should check whether dataProfile is among a list of valid profiles and throw an error if not. Examples in nwis.
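A minimal sketch of the kind of check suggested above, not the code that was merged. The helper name `check_data_profile` and the two tuple constants are hypothetical; the profile names are the ones listed in the docstring under review. It raises `ValueError`, whereas the diff later in this thread uses `TypeError`.

```python
# Hypothetical constants; the profile names are taken from the docstring in this PR.
WQX3_RESULT_PROFILES = ("fullPhysChem", "narrow", "basicPhysChem")
LEGACY_RESULT_PROFILES = ("resultPhysChem", "biological", "narrowResult")


def check_data_profile(profile, legacy=True):
    """Return `profile` unchanged if it is valid, otherwise raise ValueError."""
    valid = LEGACY_RESULT_PROFILES if legacy else WQX3_RESULT_PROFILES
    if profile not in valid:
        # List the accepted values so the user does not have to look them up.
        options = ", ".join(repr(p) for p in valid)
        service = "legacy" if legacy else "WQX3"
        raise ValueError(
            f"dataProfile {profile!r} is not valid for {service} services. "
            f"Please choose from: {options}."
        )
    return profile
```

Calling `check_data_profile("narrow", legacy=True)` would raise, because `narrow` only appears in the WQX3 list.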
Well ok, this might be worse before it gets better. I don't really want to introduce a new function input just for dealing with legacy services. I think as long as the dataProfiles remain distinct between legacy and WQX3, users can specify which URL they want to grab from (and technically they should be adapting their workflows to the new format). However, I would like to create a nice table in the documentation to show how users can grab different legacy and WQX3 profiles. I'll also look into a column name crosswalk... I think Laura might have one. Thoughts on all this? Eventually, we'd hopefully be able to simply remove all of the legacy kwarg options, but keep the functions essentially the same.
Might be unnecessary? I might try something like:
I like the idea of having I'm trying to avoid adding a new input Regarding the column names, I'm not sure I follow. I'm thinking about users who have to change their coding workflow and want to see the new column names associated with WQX3, and how they map onto the legacy names. Showing them how to leverage this csv might be sufficient: https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv |
OK, after talking to @lstanish-usgs, I have added a new |
@jzemmels Thanks for reviewing! A few things to check in your review:
Let me know if you have additional questions. Bullet #2 is the most important one to review thoroughly.
Sounds good. Recommend setting |
I'm satisfied with the documentation updates to indicate the new legacy parameter. When a user incorrectly specifies an argument like dataProfile that must be from a small list of options, it'd be great to print the list the user must choose from in the error message. For example, Also, I'm not sure if I missed this, but a short definition of each dataProfile would be helpful in the package's documentation. The definitions are a bit tricky to find in the waterqualitydata.us documentation. |
I think the message could start from the second sentence. The new Perhaps the warning message could also provide an alternative for users who want new data. E.g., "To fetch new data, set Other than those two things, I think the warning message is good. (Edit) Oh, one other thing: is there some way to automatically wrap the message so that it doesn't appear on one long line? Maybe this is just a VSCode thing... |
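On the wrapping question, one possible approach is to wrap the text with the standard library before emitting it. This is only a sketch: `warn_wrapped` is a hypothetical helper, and `LEGACY_MESSAGE` is placeholder wording rather than the message used in the package.

```python
import textwrap
import warnings

# Placeholder wording for illustration, not the package's actual message.
LEGACY_MESSAGE = (
    "This function is querying legacy WQP services. "
    "Set legacy=False to request the latest WQX3 profiles, if available."
)


def warn_wrapped(message, width=70):
    """Emit a UserWarning whose text is wrapped to at most `width` columns."""
    wrapped = textwrap.fill(message, width=width)
    # stacklevel=2 attributes the warning to the caller of this helper.
    warnings.warn(wrapped, UserWarning, stacklevel=2)
    return wrapped
```

How the wrapped text is displayed still depends on the terminal or notebook that shows the warning.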
I've checked a couple of requests against what the WQP website returns and they work. I'm running into an issue where some requests that run quickly on the WQP website take a really long time to run in Python. Not sure what's causing it. I'm not done checking different combinations of kwargs, but I can check more once I return on Sept. 9.
@jzemmels, near the top of the page, you should see a big green button labeled "Add your review". I recommend you use that process to create your review and suggest changes, rather than adding comments in the issue. Also,
Keep the function documentation concise. The doc should note that a full description of the profiles can be found at https://www.waterqualitydata.us/beta/portal_userguide/#data-profiles. The doc should also provide a list of valid profiles, either by writing them out or by referring the user to a convenience function. ...but let's continue this discussion using the normal review process. |
Thanks for these suggestions. I think I'm going to keep the warning as-is for the legacy services (because they're not specific to the Result service), but I added another sentence about setting legacy=False to get the latest profiles, if available. |
I've now tested out each function with multiple argument combinations and am pleased that nothing threw unexpected errors. The comments I've left are general usability/documentation comments that can either be addressed here or filed under future considerations.
dataretrieval/wqp.py
Outdated
if legacy is True:
    if 'dataProfile' in kwargs:
        if kwargs['dataProfile'] not in result_profiles_legacy:
            raise TypeError(f"""
Nice addition!
dataretrieval/wqp.py
Outdated
zip: string
    Parameter to stream compressed data, if 'yes', or uncompressed data
    if 'no'. Default is 'no'.
    be 'geojson'
GeoJSON is not yet supported, so this documentation should reflect that mimeType will automatically be changed to csv.
On a related note, is there a reason why the other possible WQP file outputs (tsv, xlsx) aren't supported here? I tested out passing mimeType="tsv" and "xlsx" to the query function and it returned the expected output. Seems like it would be a simple addition to expand the functionality.
That's a good point. Legacy supports tsv and xlsx, but it looks like beta does not yet support anything other than csv. I might add this as an issue to follow up on, but leave as-is for this PR. #162
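For the follow-up issue, here is a rough sketch of how the mimeType handling described above could look if the extra legacy formats were allowed. Everything here is hypothetical: `resolve_mime_type` and the two tuples are illustrative names, and the supported formats are only what this thread reports (csv, tsv and xlsx for legacy, csv only for beta).

```python
import warnings

# Formats as reported in this review thread; treat them as assumptions.
LEGACY_MIME_TYPES = ("csv", "tsv", "xlsx")
WQX3_MIME_TYPES = ("csv",)


def resolve_mime_type(kwargs, legacy=True):
    """Return a copy of `kwargs` whose mimeType is supported by the service."""
    supported = LEGACY_MIME_TYPES if legacy else WQX3_MIME_TYPES
    resolved = dict(kwargs)  # leave the caller's dictionary untouched
    requested = resolved.get("mimeType", "csv")
    if requested not in supported:
        warnings.warn(
            f"mimeType {requested!r} is not supported here; using 'csv' instead.",
            UserWarning,
            stacklevel=2,
        )
        requested = "csv"
    resolved["mimeType"] = requested
    return resolved
```

Accepting tsv or xlsx would also mean reading the response with a matching parser, which this sketch does not cover.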
dataretrieval/wqp.py
Outdated
Please choose from "fullPhysChem", "narrow", or "basicPhysChem"
""")
else:
    kwargs['dataProfile'] = 'fullPhysChem'
Genuine question: is this a reasonable default for dataProfile -- would the "typical" user want this to be set for them or should the function throw an error if the user doesn't set the dataProfile argument?
Follow whatever @ldecicco-USGS did
R dataRetrieval uses the fullPhysChem profile. That is generally based on what the EPA TADA package uses and what provides the most comprehensive amount of USGS metadata.
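The defaulting behaviour discussed here can be sketched in a couple of lines. `apply_default_profile` is a hypothetical helper, not a function in the package; it fills in `fullPhysChem` only when the caller has not chosen a profile.

```python
def apply_default_profile(kwargs, default="fullPhysChem"):
    """Return a copy of `kwargs` with dataProfile set if it was missing."""
    resolved = dict(kwargs)  # copy so the caller's dictionary is not mutated
    resolved.setdefault("dataProfile", default)
    return resolved
```

The alternative raised in the question, raising an error when `dataProfile` is missing, would replace the `setdefault` call with an explicit check.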
dataretrieval/wqp.py
Outdated
if legacy is True:
    url = wqp_url('Organization')
else:
    print('No WQX3.0 profile currently available, returning legacy profile.')
I think this should throw an error if the user requests WQX3.0 data that aren't available rather than defaulting to the legacy profile (same point applies to the other endpoints that aren't available in WQX3.0).
At the very least, these warning messages could be more specific to the endpoint - for example, "The Organization data profile is not currently available in WQX3.0."
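A sketch of the stricter behaviour proposed above, with an endpoint-specific message. The helper `require_wqx3` and the `WQX3_SERVICES` set are hypothetical, and which services actually have a WQX3.0 equivalent is an assumption here, not something confirmed in this thread.

```python
# Assumed set of services with a WQX3.0 equivalent; adjust to match the real API.
WQX3_SERVICES = {"Result", "Station", "Activity"}


def require_wqx3(service, legacy):
    """Raise if WQX3.0 data are requested from a service that lacks them."""
    if not legacy and service not in WQX3_SERVICES:
        raise NotImplementedError(
            f"The {service} data profile is not currently available in WQX3.0. "
            "Set legacy=True to query the legacy service."
        )
```

With this in place, `require_wqx3('Organization', legacy=False)` fails loudly instead of silently returning the legacy profile.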
dataretrieval/wqp.py
Outdated
    Describes the type of columns to return with the result dataset.
    Most recent WQX3 profiles include 'fullPhysChem', 'narrow', and
    'basicPhysChem'. Legacy profiles include 'resultPhysChem',
    'biological', and 'narrowResult'.
siteid: string
Could these parameters be added to the documentation of the other WQP functions? Seems like it would be annoying to look at the get_results documentation to know how to use the what_* functions. Plus, the claim in the other functions that they "accept the same parameters" as get_results is not true; for example, none of them need the dataProfile argument. Copying + pasting the relevant parameters to the other functions' doc strings is easy.
Good eye. I'm not sure of the convention for kwargs documentation, but I can look around (see bottom of this Sphinx doc page: https://pythonhosted.org/an_example_pypi_project/sphinx.html#full-code-exampl). @thodson-usgs, what do you think? I appreciate linking to the webservices guide, which means we don't need to be checking our documentation against the WQP's, but for the uninitiated, wading through the webservices guide (which covers building urls but isn't exactly specific to using a python function) might be a little confusing. Again, I think I'm going to add this as an issue and let this PR only mess with the wqx3 machinery stuff.
    'ResultDetectionQuantitationLimit',
    'BiologicalMetric']


def get_results(ssl_check=True, legacy=True, **kwargs):
    """Query the WQP for results.
I think the documentation here and in the other what_* functions could be clearer. For example, what are the "results" that one would expect to obtain? Or what does it mean to search for sites/projects/organizations/activities/etc. "within a region with specific data?" For what purpose would someone use these functions? A simple description of these WQP endpoints would be helpful.
I agree, the definitions of the functions are a bit murky. We can use dataRetrieval as a good example: https://doi-usgs.github.io/dataRetrieval/reference/wqpSpecials.html, though also quite light on specifics.
When attempting to use the new profile, I got a looooong delay then an error message. Let's table the other discussions until the basic functionality is working.
Interesting, the WQX 3.0 profiles seemed to be working for me earlier but aren't now. I just tried to perform the same query on the WQP beta webpage and also ran into issues. It might be a timeout problem?
Yeah, seems to be an issue on the service side. Let's add another warning for "legacy=False"; otherwise, I'm fine with this as a first crack PR.
@jzemmels can you share the tests you ran that gave you errors? I want to share them with the services development teams.
Sure, this call threw an error. And this call returned an empty data frame after about a minute. The equivalent URL couldn't complete the download, so I assume it's somehow related to the empty data frame.
I approve, though I added some suggestions in a PR to your fork, which you might want to merge first.
Nice work.
Docstrings and linting
Fix mimetype test
Created a new wqx3_url that is used if legacy=False, which is the default. I also added dataProfile as a kwarg (for this function at least; it may not be needed for others). Thoughts on this approach? R dataRetrieval has a service argument where the user specifies by name whether they want the old or the new profiles: https://github.com/DOI-USGS/dataRetrieval/blob/main/R/readWQPdata.R, but the functions in Python do not let the user pick the service, and are instead specific to them (e.g. Results, Activity, Station, etc.).
Also removed zip since it's not a part of the new WQX3 url, but perhaps we should leave it in for the legacy profiles and ensure it is switched to 'no'.
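To make the approach concrete, here is a minimal sketch of switching between the two URL builders on the `legacy` flag. The function names `wqp_url` and `wqx3_url` appear in the diff, but their bodies below, the base URLs, and the `result_url` helper are assumptions for illustration and may not match the package source.

```python
# Assumed base URLs, for illustration only; check the package source for real values.
LEGACY_BASE = "https://www.waterqualitydata.us/data/"
WQX3_BASE = "https://www.waterqualitydata.us/wqx3/"


def wqp_url(service):
    """Build the (assumed) legacy search URL for a service such as 'Result'."""
    return f"{LEGACY_BASE}{service}/Search?"


def wqx3_url(service):
    """Build the (assumed) WQX3 search URL for a service such as 'Result'."""
    return f"{WQX3_BASE}{service}/search?"


def result_url(legacy=True):
    """Pick the Result URL based on the legacy flag (hypothetical helper)."""
    return wqp_url("Result") if legacy else wqx3_url("Result")
```

Keeping the choice inside each service-specific function avoids adding a separate service argument like the one in the R package.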