Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Censored data 2 eh #224

Merged
merged 4 commits into from
Mar 9, 2023
Merged

Censored data 2 eh #224

merged 4 commits into from
Mar 9, 2023

Conversation

ehinman
Copy link
Contributor

@ehinman ehinman commented Mar 9, 2023

This branch separates the pre-existing censored data function into two: one that identifies censored data and flags it, and another that performs simple handling rules for non-detects and over detects. It also contains a new summary function that gives the sample count, % censored, and a suggested statistical analysis for censored data handling for user-specified groups. It also contains three new tests for the censored data functions.

- pulled id censored data out of simplecensoredmethods and created own function that can be used within/on own like autoclean
- created more warnings for detection limit metadata that may be missing from WQX domain table - found example.
- created tests to ensure data are not dropped during censored data handling.
@@ -61,7 +61,7 @@ ggplot2 from GitHub.

# remotes::install_github("hadley/ggplot2", dependencies=TRUE)

remotes::install_github("USEPA/TADA", dependencies=TRUE)
remotes::install_github("USEPA/TADA", ref = "censored_data_2_eh",dependencies=TRUE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the ref included/needed in the install?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so pesky! When I add new functions, GitHub will throw an error when trying to use the develop branch to build the vignette, so I have to temporarily tell it to use my branch. Feel free to change it back to develop once those new functions are included in the develop branch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, good to know! I can do that

@@ -669,7 +669,7 @@ that we want to remove the "Quality Control Sample-Field Replicate" and
field.

```{r}
TADAProfileClean14 <- dplyr::filter(TADAProfileClean13, !(ActivityTypeCode %in% c("Quality Control Sample-Field Replicate", "Quality Control Sample-Field Blank", "Quality Control Sample-Lab Duplicate", "Quality Control Sample-Equipment Blank")))
TADAProfileClean14 <- dplyr::filter(TADAProfileClean13, !(ActivityTypeCode %in% ActivityTypeCode[grepl("Quality",ActivityTypeCode)]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add line for users # this removes rows where any value in the ActivityTypeCode includes the string "quality" (not case sensitive?), These quality control samples are not included in the analysis. Are we now including this set as an automated step in autoclean, or just including it here? If the latter, we will need to remember to do this step separately in the shiny app as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't happen in autoclean. Would it be better to do it there?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait for now. It may be better as a flag that can then be used to inform use of that data in the future.

@@ -682,7 +682,7 @@ ResultStatusIdentifier field.
FilterFieldReview("ActivityMediaSubdivisionName", TADAProfileClean14)
```

The ActivityMediaSubdivisionName field has two unique values, "Surface
[Not true of current test dataset] The ActivityMediaSubdivisionName field has two unique values, "Surface
Copy link
Collaborator

@cristinamullin cristinamullin Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to remove the specifics here (and elsewhere) and be more general so that it does not need updating each time we change the example data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great. It's hard to keep this current when we occasionally change the test dataset!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we can work on that in the future

Copy link
Collaborator

@cristinamullin cristinamullin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehinman Approved, but added a few questions and suggestions

@cristinamullin cristinamullin merged commit 0f3178d into develop Mar 9, 2023
@cristinamullin cristinamullin deleted the censored_data_2_eh branch March 9, 2023 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants