-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Censored data 2 eh #224
Censored data 2 eh #224
Conversation
- pulled id censored data out of simplecensoredmethods and created own function that can be used within/on own like autoclean - created more warnings for detection limit metadata that may be missing from WQX domain table - found example. - created tests to ensure data are not dropped during censored data handling.
@@ -61,7 +61,7 @@ ggplot2 from GitHub. | |||
|
|||
# remotes::install_github("hadley/ggplot2", dependencies=TRUE) | |||
|
|||
remotes::install_github("USEPA/TADA", dependencies=TRUE) | |||
remotes::install_github("USEPA/TADA", ref = "censored_data_2_eh",dependencies=TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the ref included/needed in the install?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so pesky! When I add new functions, GitHub will throw an error when trying to use the develop branch to build the vignette, so I have to temporarily tell it to use my branch. Feel free to change it back to develop once those new functions are included in the develop branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, good to know! I can do that
@@ -669,7 +669,7 @@ that we want to remove the "Quality Control Sample-Field Replicate" and | |||
field. | |||
|
|||
```{r} | |||
TADAProfileClean14 <- dplyr::filter(TADAProfileClean13, !(ActivityTypeCode %in% c("Quality Control Sample-Field Replicate", "Quality Control Sample-Field Blank", "Quality Control Sample-Lab Duplicate", "Quality Control Sample-Equipment Blank"))) | |||
TADAProfileClean14 <- dplyr::filter(TADAProfileClean13, !(ActivityTypeCode %in% ActivityTypeCode[grepl("Quality",ActivityTypeCode)])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add line for users # this removes rows where any value in the ActivityTypeCode includes the string "quality" (not case sensitive?), These quality control samples are not included in the analysis. Are we now including this set as an automated step in autoclean, or just including it here? If the latter, we will need to remember to do this step separately in the shiny app as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't happen in autoclean. Would it be better to do it there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wait for now. It may be better as a flag that can then be used to inform use of that data in the future.
@@ -682,7 +682,7 @@ ResultStatusIdentifier field. | |||
FilterFieldReview("ActivityMediaSubdivisionName", TADAProfileClean14) | |||
``` | |||
|
|||
The ActivityMediaSubdivisionName field has two unique values, "Surface | |||
[Not true of current test dataset] The ActivityMediaSubdivisionName field has two unique values, "Surface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to remove the specifics here (and elsewhere) and be more general so that it does not need updating each time we change the example data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great. It's hard to keep this current when we occasionally change the test dataset!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we can work on that in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehinman Approved, but added a few questions and suggestions
This branch separates the pre-existing censored data function into two: one that identifies censored data and flags it, and another that performs simple handling rules for non-detects and over detects. It also contains a new summary function that gives the sample count, % censored, and a suggested statistical analysis for censored data handling for user-specified groups. It also contains three new tests for the censored data functions.