Testdata #142

cristinamullin · 2022-10-18T15:59:55Z

Added importFrom(rlang,":=") to the depth profile function and added rlang to Imports in the DESCRIPTION file.

Also added a test for changing "meters" to "m".

cristinamullin · 2022-10-18T23:11:30Z

checking for non-standard things in the check directory ... NOTE
Found the following files/directories:
'cli' 'commonmark' 'curl' 'data.table' 'digest' 'jsonlite' 'purrr'
'vctrs' 'yaml'

TADAProfileClean1 <- DepthProfileData(TADAProfile, unit = "m", transform = TRUE)
Warning messages:
1: In `[<-.data.frame`(`*tmp*`, targetUnit, value = list(WQX.Depth.TargetUnit.x = c(NA, :
provided 2 variables to replace 1 variables
2: In `[<-.data.frame`(`*tmp*`, targetUnit, value = list(WQX.Depth.TargetUnit.x = c(NA, :
provided 2 variables to replace 1 variables

The cause is possibly in lines 409-445.
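The warning points at an assignment of the form `df[targetUnit] <- list(...)` whose right-hand side holds two columns (`WQX.Depth.TargetUnit.x` and, presumably, a `.y` counterpart). One plausible cause, offered as an assumption rather than a diagnosis of lines 409-445, is a `merge()` between two tables that both already contain the target-unit column, which makes base R add `.x`/`.y` suffixes. The sketch below reproduces the same warning with hypothetical toy data (`depths`, `lookup`) and shows one way to avoid it.

```r
# Hypothetical toy data: both tables already carry the target-unit column.
depths <- data.frame(id = 1:2, WQX.Depth.TargetUnit = c(NA, NA))
lookup <- data.frame(id = 1:2, WQX.Depth.TargetUnit = c("m", "m"),
                     stringsAsFactors = FALSE)

# merge() keeps both copies of a shared non-key column, adding .x/.y suffixes:
# the names are id, WQX.Depth.TargetUnit.x, WQX.Depth.TargetUnit.y.
merged <- merge(depths, lookup, by = "id")
print(names(merged))

# Assigning a two-element list into a single target column makes
# `[<-.data.frame` warn; only the first element is actually used.
targetUnit <- "WQX.Depth.TargetUnit"
out <- depths
captured <- NULL
withCallingHandlers(
  out[targetUnit] <- list(
    WQX.Depth.TargetUnit.x = merged$WQX.Depth.TargetUnit.x,
    WQX.Depth.TargetUnit.y = merged$WQX.Depth.TargetUnit.y
  ),
  warning = function(w) {
    captured <<- conditionMessage(w)  # keep the message for inspection
    invokeRestart("muffleWarning")    # do not print it
  }
)
print(captured)

# One possible fix: drop the duplicate column from one side before merging,
# so the result has a single, unsuffixed target-unit column.
fixed <- merge(depths[setdiff(names(depths), targetUnit)], lookup, by = "id")
print(names(fixed))
```

If the real code merges a WQX reference table onto data that already has `WQX.Depth.TargetUnit`, dropping or renaming that column first would keep the assignment to a single variable.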

cristinamullin · 2022-10-19T14:07:39Z

@jbousquin Here is what I have so far. It runs and returns the expected results now, but it throws a new warning when I run TADAProfile1 in the vignette (see #142). I don't see this warning when I run it on your test data, though. I plan to troubleshoot a bit more, and then we can merge.

I also addressed the "meters" vs. "m" issue in the autoclean function, which runs within TADAdataRetrieval. I'm not sure that is the best place, but for now autoclean is where I handle a few specific USGS/EPA data compatibility issues.

FYI: I gave you write access to the TADA repo. You can now tag people for review and do other things like merging a pull request after reviewer approval (e.g., since I tagged you as a reviewer, you could merge my pull request after your review). Since I've mostly been working on the repo alone for the last few months, I've been bypassing this review requirement for myself.

cristinamullin · 2022-10-19T14:10:15Z

Note: the original code removed the original unit and conversion factor columns when transform = TRUE. I commented out that code for now so those fields show up in the final dataset. I'm not sure what users would prefer; any thoughts? I also added ActivityEndTime.TimeZoneCode to the columns that are checked.

In this example:

# mock data frame
ActivityDepthHeightMeasure.MeasureValue <- c(2.0, 1)
ActivityDepthHeightMeasure.MeasureUnitCode <- c("m", "ft")
ActivityTopDepthHeightMeasure.MeasureValue <- c(NaN, NaN)
ActivityTopDepthHeightMeasure.MeasureUnitCode <- c(NaN, NaN)
ActivityBottomDepthHeightMeasure.MeasureValue <- c(NaN, NaN)
ActivityBottomDepthHeightMeasure.MeasureUnitCode <- c(NaN, NaN)
ResultDepthHeightMeasure.MeasureValue <- c(NaN, NaN)
ResultDepthHeightMeasure.MeasureUnitCode <- c(NaN, NaN)
ActivityEndTime.TimeZoneCode <- c(NaN, NaN)
TADAProfile <- data.frame(ActivityDepthHeightMeasure.MeasureValue,
                          ActivityDepthHeightMeasure.MeasureUnitCode,
                          ActivityTopDepthHeightMeasure.MeasureValue,
                          ActivityTopDepthHeightMeasure.MeasureUnitCode,
                          ActivityBottomDepthHeightMeasure.MeasureValue,
                          ActivityBottomDepthHeightMeasure.MeasureUnitCode,
                          ResultDepthHeightMeasure.MeasureValue,
                          ResultDepthHeightMeasure.MeasureUnitCode,
                          ActivityEndTime.TimeZoneCode)
x <- DepthProfileData(TADAProfile)

With this change, these three columns are now added to the output:
"WQX.ActDepth.ConversionFactor"
"ActivityDepthHeightMeasure.MeasureUnitCode.Original"
"WQX.Depth.TargetUnit"
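To make the three new columns concrete, here is a minimal base-R sketch of how a depth conversion could populate them. It assumes a small hypothetical conversion table (`conv_table`, with meter factors for "m" and "ft" only) and a hypothetical helper `convert_act_depth()`; it is not the DepthProfileData implementation, it only mirrors the column names listed above.

```r
# Hypothetical conversion table: factor that converts each unit to meters.
conv_table <- data.frame(unit = c("m", "ft"), to_m = c(1, 0.3048),
                         stringsAsFactors = FALSE)

# Hypothetical helper (not TADA code): converts the activity depth to meters
# and records the original unit, conversion factor, and target unit.
convert_act_depth <- function(df) {
  val_col  <- "ActivityDepthHeightMeasure.MeasureValue"
  unit_col <- "ActivityDepthHeightMeasure.MeasureUnitCode"
  factor <- conv_table$to_m[match(df[[unit_col]], conv_table$unit)]
  df$ActivityDepthHeightMeasure.MeasureUnitCode.Original <- df[[unit_col]]
  df$WQX.ActDepth.ConversionFactor <- factor
  df$WQX.Depth.TargetUnit <- "m"
  # Only convert rows whose unit has a known factor; leave others untouched.
  known <- !is.na(factor)
  df[[val_col]][known] <- df[[val_col]][known] * factor[known]
  df[[unit_col]][known] <- "m"
  df
}

profile <- data.frame(
  ActivityDepthHeightMeasure.MeasureValue = c(2.0, 1),
  ActivityDepthHeightMeasure.MeasureUnitCode = c("m", "ft"),
  stringsAsFactors = FALSE
)
converted <- convert_act_depth(profile)
print(converted)
```

Rows with an unrecognized unit keep their original value and unit, with an NA conversion factor, so nothing is silently dropped.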

cristinamullin · 2022-10-19T14:15:17Z

R/ResultFlagsDependent.R

+          dplyr::relocate("ActivityDepthHeightMeasure.MeasureUnitCode",
+                          .after = "ActivityDepthHeightMeasure.MeasureValue"
+          )
+        # uncoment below to delete ActDepth.Conversion.Unit column


We may want to uncomment this to remove ActDepth.Conversion.Unit from the clean dataset.

jbousquin · 2022-10-19T19:03:55Z

Regarding the note that the original code removed the original unit information and conversion factor columns when transform = TRUE:

There are three pieces of information: (1) Original Units, (2) Updated Units, and (3) Conversion Factor.

I see three options (with some combinations thereof):

Replace the Original Units and delete Conversion Factor (results in fewest columns)
Keep everything (best from a data management perspective)
Give the user the option of what to keep (flexible, but we don't want too many args)

What I did was keep the original units, while the intermediate columns (equivalent to the conversion factor) were controlled by an optional argument: the default was to delete them, but the user could change it to keep the columns (mainly for debugging). At the end of my workflow I moved the original unit column to a secondary table anyway, so I figured keeping the table small at this stage didn't matter (the endpoint for TADA has all columns, so this might be different).

jbousquin · 2022-10-19T19:19:11Z

autoclean sounds a lot like the pre-processing of units I did (e.g., '%' -> 'percent', 'deg C' -> 'degC') so that the units package would recognize them correctly. It looks like the change is specific, i.e., it avoids turning 'centimeter' into 'centim'. Will it catch 'Meters'? The only caution I might add is whether you want the replacements to be characteristic-specific (e.g., does 'meters' ever not mean 'm' in some other context?). I know there are some units libraries in R, but I'm not familiar with them; we might be able to leverage some of them somewhere between autoclean and the conversions in the table for more basic unit recognition issues like this.
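The whole-string matching and case question above can be illustrated with a small lookup table. This is a sketch under assumptions: `unit_synonyms` and `harmonize_units()` are hypothetical names, and the entries are just the examples mentioned in this comment, not autoclean's actual list.

```r
# Hypothetical synonym table (an assumption, not autoclean's actual list).
# Names are lower-case variants; values are the codes downstream steps expect.
unit_synonyms <- c(
  "meters" = "m",
  "%"      = "percent",
  "deg c"  = "degC"
)

# Replace a unit only when the whole trimmed, lower-cased string matches a
# known variant, so "centimeters" is never partially rewritten.
harmonize_units <- function(units) {
  key <- tolower(trimws(units))
  hit <- !is.na(key) & key %in% names(unit_synonyms)
  units[hit] <- unname(unit_synonyms[key[hit]])
  units
}

harmonized <- harmonize_units(c("meters", "Meters", "centimeters", "m", NA, "deg C"))
print(harmonized)
# returns c("m", "m", "centimeters", "m", NA, "degC")
```

Lower-casing the key is what makes 'Meters' match. Making the table characteristic-specific, as cautioned above, would mean keying on the characteristic name as well, which this sketch does not do.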

Good catch on the rlang warning (importFrom(rlang,":=")). I'm totally open to something else if there is another way to pass the column name as a string to dplyr.
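For reference, one way to avoid ":=" entirely is to do the column-by-name step in base R, where `[[` takes the column name as an ordinary string. The helper `scale_column()` below is hypothetical; inside dplyr verbs, the `.data[[col]]` pronoun is another documented way to read a column whose name is held in a string.

```r
# Hypothetical helper: multiply the column named by the string `col` by
# `factor` and write the result to `new_col` (also a string).
scale_column <- function(df, col, factor, new_col = col) {
  stopifnot(is.character(col), col %in% names(df))
  df[[new_col]] <- df[[col]] * factor
  df
}

depth <- data.frame(ActivityDepthHeightMeasure.MeasureValue = c(2, 10))
scaled <- scale_column(depth, "ActivityDepthHeightMeasure.MeasureValue", 0.3048,
                       new_col = "Depth.Converted")
print(scaled)
```

Because the column names are plain character values, no non-standard evaluation (and no rlang import) is involved.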

jbousquin · 2022-10-19T19:26:36Z

tests/testthat/test-DataDiscoveryRetrieval.R

+  check_autoclean_meters_works <- TADAdataRetrieval(statecode = "UT",
+                                    characteristicName = c("Ammonia", "Nitrate", "Nitrogen"),
+                                    startDate = "01-01-2021")
+  expect_equal(check_autoclean_meters_works$ActivityDepthHeightMeasure.MeasureUnitCode[975], "m"


If I understand this right, you're running the function on data retrieved by the TADAdataRetrieval function. If data matching this query is ever added, could the result at index 975 change? One way around that is to save the current result as a static file in the test data and run the function on that.

I realized that because this is testing autoclean, which is part of the retrieval process, it's a bit more complicated. We may just have to keep in mind that if this test fails, it could be due to an index change.
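One way to make the check less sensitive to row order is to assert on the whole unit column instead of row 975. The sketch below uses a small static data frame with hypothetical values as a stand-in for `check_autoclean_meters_works`, and the allowed set `c("m", "ft")` is an assumption; in testthat the two checks would be `expect_true()` calls.

```r
# Static stand-in (hypothetical values) for the retrieved, autocleaned data.
retrieved <- data.frame(
  ActivityDepthHeightMeasure.MeasureUnitCode = c("m", NA, "m", "ft"),
  stringsAsFactors = FALSE
)
units <- retrieved$ActivityDepthHeightMeasure.MeasureUnitCode

# Index-independent checks: no spelled-out "meters" survives anywhere,
# and every non-missing unit is in an expected set (assumed here).
no_meters <- !any(tolower(units) == "meters", na.rm = TRUE)
allowed <- c("m", "ft")
only_allowed <- all(units[!is.na(units)] %in% allowed)

stopifnot(no_meters, only_allowed)
```

A failure of either check would then point at a real change in the data or in autoclean rather than at a shifted row index.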

Good points. USGS is also planning to change their data in the future (~summer 2023) for consistency with WQX, which includes things like using "m" instead of "meters". When that happens, this specific piece of code within autoclean, and this test, will be unnecessary, so in the future they can probably be deleted.

This is all helpful to keep in mind, so I'll try to track this discussion somewhere, maybe in a new issue. It might also be helpful to track whether an index changes on retrieved data in certain cases, because that would mean the backend data profile or the data itself is changing for some reason (e.g., updates to WQX 3.0 profiles, or USGS actually changing their data in the future).

jbousquin · 2022-10-19T20:43:24Z

Are you still seeing the warning you mentioned when running the vignette? (I tried to recreate it but didn't see the warning.)

cristinamullin · 2022-10-19T20:54:04Z

@jbousquin I spent some time troubleshooting the warning this afternoon and was able to fix it. The pull request is now up to date with my commits from today, and everything should run without warnings or errors. The changes also removed the rlang ":=" dependency, so I removed rlang from the package for now.

I think the only thing left to do is decide which columns to keep when transform = TRUE vs. FALSE.
I'd like to adjust it so that:
A. When transform = FALSE, we keep everything (i.e., the three pieces of information: (1) Original Units, (2) Updated Units, and (3) Conversion Factor), so a user can review it.
B. When transform = TRUE, we replace the Original Units and delete the Conversion Factor (resulting in the fewest columns).

What do you think?

jbousquin · 2022-10-20T17:57:07Z

Sounds good to me: that way users can access the intermediates by setting transform = FALSE, while the default is what we anticipate most will want (fewer columns).
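The agreed behavior can be sketched as follows. This is a hypothetical helper (`finalize_depth_columns()`), not the DepthProfileData code; it assumes the conversion-factor and target-unit columns named earlier in this thread are already present. With `transform = FALSE` all three pieces of information are kept for review; with the default `transform = TRUE` the converted values and units replace the originals and the intermediate columns are dropped. Dropping the target-unit column as well is an extra assumption, since it duplicates the replaced unit column.

```r
# Hypothetical helper illustrating the transform = TRUE/FALSE column policy.
finalize_depth_columns <- function(df, transform = TRUE) {
  val_col    <- "ActivityDepthHeightMeasure.MeasureValue"
  unit_col   <- "ActivityDepthHeightMeasure.MeasureUnitCode"
  factor_col <- "WQX.ActDepth.ConversionFactor"
  target_col <- "WQX.Depth.TargetUnit"
  if (!transform) {
    # Keep original values and units plus the factor and target unit.
    return(df)
  }
  # Apply the conversion only where a factor is known.
  known <- !is.na(df[[factor_col]])
  df[[val_col]][known] <- df[[val_col]][known] * df[[factor_col]][known]
  df[[unit_col]][known] <- df[[target_col]][known]
  # Drop the intermediates so the cleaned table has the fewest columns.
  df[setdiff(names(df), c(factor_col, target_col))]
}

profile <- data.frame(
  ActivityDepthHeightMeasure.MeasureValue = c(2, 1),
  ActivityDepthHeightMeasure.MeasureUnitCode = c("m", "ft"),
  WQX.ActDepth.ConversionFactor = c(1, 0.3048),
  WQX.Depth.TargetUnit = c("m", "m"),
  stringsAsFactors = FALSE
)
reviewed <- finalize_depth_columns(profile, transform = FALSE)
cleaned  <- finalize_depth_columns(profile)
print(cleaned)
```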

cristinamullin and others added 6 commits October 18, 2022 11:11

update github pages

84c04d2

fixed note

dc662f9

added importFrom(rlang,":=") to depth profile function. Added rlang as import in description file

updated internal WQX ref files

76beba5

depth profile bug fix

9fef48c

added test as well for changing "meters" to "m"

cut the suggestions

7bf416d

updates

6912271

cristinamullin requested a review from jbousquin October 18, 2022 23:18

cristinamullin commented Oct 19, 2022

big fix depth conv

b21d777

jbousquin reviewed Oct 19, 2022

cristinamullin added 3 commits October 19, 2022 15:32

update pages

8ace3ee

removes rlang req

f3d31c2

importFrom(rlang,":=")

update pages

d041c8f

cristinamullin merged commit 946f6d3 into develop Oct 20, 2022

cristinamullin deleted the testdata branch October 20, 2022 18:28

cristinamullin restored the testdata branch October 20, 2022 18:33

cristinamullin deleted the testdata branch October 20, 2022 18:33
