Skip to content

Commit

Permalink
Merge pull request #351 from tidymodels/one-sample-prop-z
Browse files Browse the repository at this point in the history
Implement the standardized proportion z statistic for one categorical variable
  • Loading branch information
echasnovski authored Dec 21, 2020
2 parents 426bff7 + 7278842 commit 18c723b
Show file tree
Hide file tree
Showing 5 changed files with 62 additions and 13 deletions.
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
more closely resembles `dplyr::slice_sample()` (the function that supersedes)
`dplyr::sample_n()` (#325)
- Added a `success` argument to `prop_test()` (#343)
- Implemented the standardized proportion $z$ statistic for one categorical variable

# infer 0.5.3

Expand Down
22 changes: 11 additions & 11 deletions R/calculate.R
Original file line number Diff line number Diff line change
Expand Up @@ -92,8 +92,10 @@ calculate <- function(x,
"implemented) for `stat` = \"{stat}\". Are you missing ",
"a `generate()` step?"
)
} else if (!(stat %in% c("Chisq", "prop", "count")) &
!(stat == "t" & (attr(x, "theory_type") == "One sample t"))) {
} else if (
!(stat %in% c("Chisq", "prop", "count")) &
!(stat %in% c("t", "z")
& (attr(x, "theory_type") %in% c("One sample t", "One sample prop z")))) {
# From `hypothesize()` to `calculate()`
# Catch-all if generate was not called
# warning_glue(
Expand Down Expand Up @@ -484,16 +486,14 @@ calc_impl.z <- function(type, x, order, ...) {

# When `hypothesize()` has been called
success <- attr(x, "success")

p0 <- attr(x, "params")[1]
num_rows <- nrow(x) / length(unique(x$replicate))

col <- attr(x, "response")
# if (is.null(success)) {
# success <- quo(get_par_levels(x)[1])
# }
# Error given instead

p0 <- unname(attr(x, "params")[1])
if (!is_generated(x)) {
num_rows <- nrow(x)
} else {
num_rows <- nrow(x) / length(unique(x$replicate))
}

df_out <- x %>%
dplyr::summarize(
stat = (
Expand Down
3 changes: 2 additions & 1 deletion R/hypothesize.R
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,8 @@ hypothesize <- function(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL)
}
}
)
append_infer_class(tibble::as_tibble(x))
res <- append_infer_class(tibble::as_tibble(x))
copy_attrs(res, x, "params")
}

is_hypothesized <- function(x){
Expand Down
14 changes: 14 additions & 0 deletions tests/testthat/test-calculate.R
Original file line number Diff line number Diff line change
Expand Up @@ -568,3 +568,17 @@ test_that("calc_impl.ratio_of_props works", {
tolerance = eps
)
})

test_that("calc_impl.z works for one sample proportions", {
infer_obs_stat <- gss %>%
specify(response = sex, success = "female") %>%
hypothesize(null = "point", p = .5) %>%
calculate(stat = "z") %>%
dplyr::pull()

base_obs_stat <-
(mean(gss$sex == "female") - .5) /
sqrt(.5^2 / nrow(gss))

expect_equal(infer_obs_stat, base_obs_stat, tolerance = eps)
})
35 changes: 34 additions & 1 deletion vignettes/observed_stat_examples.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -229,7 +229,40 @@ null_distn <- gss %>%

### One categorical variable (standardized proportion $z$)

While the standardized proportion $z$ statistic has not yet been implemented in the randomization-based framework, the package supplies a wrapper around `prop.test` to allow for tests of a single proportion on tidy data.
Calculating the observed statistic,

```{r}
p_hat <- gss %>%
specify(response = sex, success = "female") %>%
hypothesize(null = "point", p = .5) %>%
calculate(stat = "z")
```

Then, generating the null distribution,

```{r}
null_distn <- gss %>%
specify(response = sex, success = "female") %>%
hypothesize(null = "point", p = .5) %>%
generate(reps = 1000, type = "simulate") %>%
calculate(stat = "z")
```

Visualizing the observed statistic alongside the null distribution,

```{r}
visualize(null_distn) +
shade_p_value(obs_stat = p_hat, direction = "two-sided")
```

Calculating the p-value from the null distribution and observed statistic,

```{r}
null_distn %>%
get_p_value(obs_stat = p_hat, direction = "two-sided")
```

The package also supplies a wrapper around `prop.test` for tests of a single proportion on tidy data.

```{r prop_test_1_grp}
prop_test(gss,
Expand Down

0 comments on commit 18c723b

Please sign in to comment.