-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Testdata #142
Testdata #142
Conversation
added importFrom(rlang,":=") to depth profile function. Added rlang as import in description file
added test as well for changing "meters" to "m"
checking for non-standard things in the check directory ... NOTE
possibly lines 409-445 |
@jbousquin Here is what I have so far, it runs and serves expected results now, but throws a new warning when I run TADAProfile1 in the vignette (see: #142). I don't see this warning when I run it on your test data though. I plan to try and troubleshoot a bit more & then we can merge after. I also addressed the “meters” and “m” issue as part of the autoclean function which runs within TADAdataRetrieval. Not sure if that is the best place, but for now I deal with a few specific USGS/EPA data compatibility issues there within autoclean. FYI - I provided you with write access to the TADA repo. You can tag people for review now & do other things like merge a pull request in after reviewer approval (e.g. if you wanted you could merge my pull request in after review since I tagged you as a reviewer). Since I’ve mostly been working on the repo alone the last few months, I’ve been bypassing this review requirement for myself. |
Note: the original code removed the original unit information & conversion factor columns when transform = TRUE. I commented out that code for now so the fields will show up in the final dataset. Not sure what users would prefer - any thoughts? Note: I added ActivityEndTime.TimeZoneCode to cols checked In this example: mock data frameActivityDepthHeightMeasure.MeasureValue <- c(2.0, 1) These 3 columns would now be added: |
R/ResultFlagsDependent.R
Outdated
dplyr::relocate("ActivityDepthHeightMeasure.MeasureUnitCode", | ||
.after = "ActivityDepthHeightMeasure.MeasureValue" | ||
) | ||
# uncoment below to delete ActDepth.Conversion.Unit column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to uncomment to remove ActDepth.Conversion.Unit from clean dataset
On Note: the original code removed the original unit information & conversion factor columns when transform = TRUE. Three pieces of information (1) Original Units (2) Updated Units, and (3) Conversion Factor I see three options (with some combinations thereof):
What I did was to keep the original, and then intermediate columns (equivalent to conversion factor) had an optional argument where the default was to delete it but the user could change that to keep the column (mainly for debugging). At the end of my workflow I moved the original unit column to a secondary table anyway, so I figured keeping it small at this stage didn't matter (endpoint for TADA has all columns so this might be different). |
autoclean sounds a lot like the pre-processing of units I did (e.g., '%' -> 'percent', 'deg C' -> 'degC') so that the units package would recognize them correctly. Looks like the change is specific, i.e., avoids 'centimeter' -> 'centim'. Will it catch 'Meters'? Only caution I might add is do you want them characteristic specific (e.g., does 'meters' ever not mean 'm' in some other context). I know there are some units libraries in R but I'm not familiar with them, might be able to leverage some of those somewhere between autoclean and conversions in the table for more basic unit recognition issues like this. Good catch on the rlang warning (importFrom(rlang,":=")), totally open to something else if there is another way to pass the column as a string to dplyr |
check_autoclean_meters_works <- TADAdataRetrieval(statecode = "UT", | ||
characteristicName = c("Ammonia", "Nitrate", "Nitrogen"), | ||
startDate = "01-01-2021") | ||
expect_equal(check_autoclean_meters_works$ActivityDepthHeightMeasure.MeasureUnitCode[975], "m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this right you're running the function on the data retrieved by TADAdataRetrieval function. If data is ever added that will be included in this query result could the results at index 975 change? One way around that is to have the current result as a static file in the test data and run the function on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized because this is testing autoclean and that is part of the retrieval process its a bit more complicated... so may have to just keep an eye out if this test fails that it may be an index change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. USGS is also planning to actually change their data in the future (~summer 2023) or consistency with WQX, with includes things like using "m" instead of "meters". When that happen, it would would render this specific piece of code within autoclean and this test unnecessary --so in the future this can probably be deleted.
This is all helpful to keep in mind so I'll try to track this discussion somewhere, maybe in a new issue. It might end up being helpful to track if an index changes on retrieved data in certain cases too, because that would mean the backend data profile or data itself is changing for some reason (e.g. could be a sign of updates to WQX 3.0 profiles or USGS actually changing their data in the future, etc.).
Are you still seeing the warning referenced here when running the vignette? (I tried to recreate but didn't see the warning) |
@jbousquin I spent some time troubleshooting the warning this afternoon & was able to fix it. The current pull request is now up to date with my commits from today. Everything should be running smoothly now with no warnings or errors. The changes I made actually removed the rlang,":=" dependency too, so I removed that dependency from the package for now. I think the only thing left to do is decide which columns to keep when transform = TRUE vs. FALSE. What do you think? |
Sounds good to me - that way users have access to the intermediates by setting transform = False, but the default is what we anticipate most will want (fewer columns). |
No description provided.