diff --git a/06_StatisticalInference/homework/hw1.Rmd b/06_StatisticalInference/homework/hw1.Rmd index 9307c269..5deb6ab6 100644 --- a/06_StatisticalInference/homework/hw1.Rmd +++ b/06_StatisticalInference/homework/hw1.Rmd @@ -1,6 +1,6 @@ --- title : Homework 1 for Stat Inference -subtitle : Extra problems for Stat Inference +subtitle : (Use arrow keys to navigate) author : Brian Caffo job : Johns Hopkins Bloomberg School of Public Health framework : io2012 diff --git a/06_StatisticalInference/homework/hw1.html b/06_StatisticalInference/homework/hw1.html index c4f273c1..868bc0c7 100644 --- a/06_StatisticalInference/homework/hw1.html +++ b/06_StatisticalInference/homework/hw1.html @@ -34,7 +34,7 @@

Homework 1 for Stat Inference

-

Extra problems for Stat Inference

+

(Use arrow keys to navigate)

Brian Caffo
Johns Hopkins Bloomberg School of Public Health

@@ -169,7 +169,7 @@

About these slides

-

A random variable takes the value -4 with probabability .2 and 1 with proabability .8. What +

A random variable takes the value -4 with probability .2 and 1 with probability .8. What is the variance of this random variable?

    @@ -333,7 +333,7 @@

    About these slides

    -

    The variance is \(E[X^2] - (E[X])^2\)

    +

    The variance is \(E[X^2] - (E[X])^2\)
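As a quick check of this formula, here is a minimal R sketch using the values from the question (-4 with probability .2, 1 with probability .8). Because \(E[X] = 0\) for this variable, the variance equals \(E[X^2] = 4\).

```r
# Support and probabilities taken from the hw1 question
x <- c(-4, 1)
p <- c(0.2, 0.8)

ex  <- sum(x * p)     # E[X]   = -0.8 + 0.8 = 0
ex2 <- sum(x^2 * p)   # E[X^2] = 3.2 + 0.8 = 4
v   <- ex2 - ex^2     # Var(X) = E[X^2] - (E[X])^2
v
```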

    @@ -447,4 +447,4 @@

    About these slides

    - + \ No newline at end of file diff --git a/06_StatisticalInference/homework/hw1.md b/06_StatisticalInference/homework/hw1.md index a7712c14..0955f1f6 100644 --- a/06_StatisticalInference/homework/hw1.md +++ b/06_StatisticalInference/homework/hw1.md @@ -1,6 +1,6 @@ --- title : Homework 1 for Stat Inference -subtitle : Extra problems for Stat Inference +subtitle : (Use arrow keys to navigate) author : Brian Caffo job : Johns Hopkins Bloomberg School of Public Health framework : io2012 diff --git a/06_StatisticalInference/homework/hw2.Rmd b/06_StatisticalInference/homework/hw2.Rmd index f6342810..429640d0 100644 --- a/06_StatisticalInference/homework/hw2.Rmd +++ b/06_StatisticalInference/homework/hw2.Rmd @@ -1,6 +1,6 @@ --- title : Homework 2 for Stat Inference -subtitle : Extra problems for Stat Inference +subtitle : (Use the arrow keys to navigate) author : Brian Caffo job : Johns Hopkins Bloomberg School of Public Health framework : io2012 diff --git a/06_StatisticalInference/homework/hw2.html b/06_StatisticalInference/homework/hw2.html index 25107d92..4ec6f697 100644 --- a/06_StatisticalInference/homework/hw2.html +++ b/06_StatisticalInference/homework/hw2.html @@ -34,7 +34,7 @@

    Homework 2 for Stat Inference

    -

    Extra problems for Stat Inference

    +

    (Use the arrow keys to navigate)

    Brian Caffo
    Johns Hopkins Bloomberg School of Public Health

    diff --git a/06_StatisticalInference/homework/hw2.md b/06_StatisticalInference/homework/hw2.md index 0a658978..8e99067a 100644 --- a/06_StatisticalInference/homework/hw2.md +++ b/06_StatisticalInference/homework/hw2.md @@ -1,6 +1,6 @@ --- title : Homework 2 for Stat Inference -subtitle : Extra problems for Stat Inference +subtitle : (Use the arrow keys to navigate) author : Brian Caffo job : Johns Hopkins Bloomberg School of Public Health framework : io2012 diff --git a/06_StatisticalInference/homework/hw3.Rmd b/06_StatisticalInference/homework/hw3.Rmd index a205e5f3..05713442 100644 --- a/06_StatisticalInference/homework/hw3.Rmd +++ b/06_StatisticalInference/homework/hw3.Rmd @@ -1,6 +1,6 @@ --- title : Homework 3 for Stat Inference -subtitle : Extra problems for Stat Inference +subtitle : (Use the arrow keys to navigate) author : Brian Caffo job : Johns Hopkins Bloomberg School of Public Health framework : io2012 @@ -61,7 +61,7 @@ round(t.test(mtcars$mpg)$conf.int) `r round(max(t.test(mtcars$mpg)$conf.int))` --- &multitext -Suppose that data of 9 paired differences has a standard error of $1$, what value would the average difference have to be to have the lower endpoint of a 95% +Suppose that standard deviation of 9 paired differences is $1$, what value would the average difference have to be so that the lower endpoint of a 95% students t confidence interval touch zero? 1. Give the number here to two decimal places @@ -79,7 +79,7 @@ round(qt(.975, df = 8) * 1 / 3, 2) --- &radio -An independent group Student's T interval is used over +An independent group Student's T interval is used instead of a paired T interval when: 1. The observations are paired between the groups. @@ -155,7 +155,7 @@ The interval was conducted subtracting 4 - 6 and was entirely above zero. --- &multitext Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. 
Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences was 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four week period appear to differ between the treated and placebo groups. -What is the pooled variance estimate? (to 2 decimal places) +1. What is the pooled variance estimate? (to 2 decimal places) *** .hint diff --git a/06_StatisticalInference/homework/hw3.html b/06_StatisticalInference/homework/hw3.html index 6e54ea85..c3c22b78 100644 --- a/06_StatisticalInference/homework/hw3.html +++ b/06_StatisticalInference/homework/hw3.html @@ -1,476 +1,478 @@ - - - - Homework 3 for Stat Inference - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    -

    Homework 3 for Stat Inference

    -

    Extra problems for Stat Inference

    -

    Brian Caffo
    Johns Hopkins Bloomberg School of Public Health

    -
    -
    -
    - - - - -
    -

    About these slides

    -
    -
    -
      -
    • These are some practice problems for Statistical Inference Quiz 3
    • -
    • They were created using slidify interactive which you will learn in -Creating Data Products
    • -
    • Please help improve this with pull requests here -(https://github.com/bcaffo/courses)
    • -
    - -
    - -
    - - -
    - -
    -

    Load the data set mtcars in the datasets R package. Calculate a -95% confidence interval to the nearest MPG.

    - -
      -
    1. What is the lower endpoint of the interval?
    2. -
    3. What is the upper endpoint of the interval?
    4. -
    - - - - - - -
    -

    Do library(datasets) and then data(mtcars) to get the data. -Consider t.test for calculations. You may have to install -the datasets package.

    - -
    -
    -
    library(datasets); data(mtcars)
    -round(t.test(mtcars$mpg)$conf.int)
    -
    - -
    [1] 18 22
    -attr(,"conf.level")
    -[1] 0.95
    -
    - -

    18 -22

    - -
    -
    -
    - -
    - - -
    - -
    -

    Suppose that data of 9 paired differences has a standard error of \(1\), what value would the average difference have to be to have the lower endpoint of a 95% -students t confidence interval touch zero?

    - -
      -
    1. Give the number here to two decimal places
    2. -
    - - - - - - -
    -

    The t interval is \(\bar x t_{.95, 8}\pm s /sqrt{n}\)

    - -
    -
    -

    0.62

    - -

    We want \(\bar x = t_{.95} s / sqrt{n}\)

    - -
    round(qt(.95, df = 8) * 1 / 3, 2)
    -
    - -
    [1] 0.62
    -
    - -
    -
    -
    - -
    - - -
    - -
    -

    An independent group Student's T interval is used over -a paired T interval when:

    - -
      -
    1. The observations are paired between the groups.
    2. -
    3. The observations between the groups are natually assumed to be statistically independent
    4. -
    5. As long as you do it correctly, either is fine.
    6. -
    7. More details are needed to answer this question
    8. -
    - - - - - - -
    -

    A paired interval is for paired observations.

    - -
    -
    -

    We can't pair them if the groups are independent of each other as well as independent within themselves.

    - -
    -
    -
    - -
    - - -
    - -
    -

    Consider the mtcars dataset. Construct a 95% T interval for MPG comparing -4 to 6 cylinder cars (subtracting in the order of 4 - 6) -assume a constant variance.

    - -
      -
    1. What is the lower endpoint of the interval to 1 decimal place?
    2. -
    3. What is the upper endpoint of the interval to 1 decimal place?
    4. -
    - - - - - - -
    -

    Use t.test with var.equal=TRUE

    - -
    -
    -
    m4 <- mtcars$mpg[mtcars$cyl == 4]
    -m6 <- mtcars$mpg[mtcars$cyl == 6]
    -#this does 4 - 6
    -confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int)
    -
    - -

    3.2 -10.7

    - -
    -
    -
    - -
    - - -
    - -
    -

    If someone put a gun to your head and said "Your confidence interval -must contain what it's estimating or I'll pull the trigger", what would -be the smart thing to do?

    - -
      -
    1. Make your interval as wide as possible
    2. -
    3. Make your interval as small as possible
    4. -
    5. Call the authorities
    6. -
    - - - - - - -
    -

    C'mon. You don't need a hint

    - -
    -
    -

    This is just an example of what happens to confidence intervals as you -increase the confidence level. You want to be quite sure in your interval (i.e. -have a large confidence level) and so you would increase the interval's width

    - -
    -
    -
    - -
    - - -
    - -
    -

    Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude?

    - -
      -
    1. The interval is above zero, suggesting 6 is better than 4 in the terms of MPG
    2. -
    3. The interval is above zero, suggesting 4 is better than 6 in the terms of MPG
    4. -
    5. The interval does not tell you anything about the hypothesis test; you have to do the test.
    6. -
    7. The interval contains 0 suggesting no difference.
    8. -
    - - - - - - -
    -

    Refer back to the problem, consider the implications of the interval being -larger than 0, double check the order in which things were subtracted and -make sure the results make sense in the context of the problem.

    - -
    -
    -

    The interval was conducted subtracting 4 - 6 and was entirely above zero.

    - -
    -
    -
    - -
    - - -
    - -
    -

    Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences was 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four week period appear to differ between the treated and placebo groups.

    - -

    What is the pooled variance estimate? (to 2 decimal places)

    - - - - - - -
    -

    The sample sizes are equal, so the pooled variance is the average of the -individual variances

    - -
    -
    -
    n1 <- n2 <- 9
    -x1 <- -3  ##treated
    -x2 <- 1  ##placebo
    -s1 <- 1.5  ##treated
    -s2 <- 1.8  ##placebo
    -spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
    -
    - -

    2.75

    - -
    -
    -
    - -
    - - -
    - -
    -

    For Binomial data the maximum likelihood estimate for the probability of -a success is

    - -
      -
    1. The proportion of successes
    2. -
    3. The proportion of failures
    4. -
    5. A shrunken version of the proportion of successes
    6. -
    7. A shrunken version of the proportion of failures
    8. -
    - - - - - - -
    -

    Look back at the notes about likelihood.

    - -
    -
    -

    The MLE for binomial data is always the proportion of successes.

    - -
    -
    -
    - -
    - - -
    - -
    -

    Bayesian inference requires

    - -
      -
    1. A type I error rate
    2. -
    3. Setting your confidence level
    4. -
    5. Assigning a prior probability distribution
    6. -
    7. Evaluating frequency error rates
    8. -
    - - - - - - -
    -

    All of the other answers discuss frequentist concepts. All Bayesian analyses requiring setting a prior.

    - -
    -
    -
    - -
    - - -
    - - - - - - - - - - - - - - - - - - - - + + + + Homework 3 for Stat Inference + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Homework 3 for Stat Inference

    +

    (Use the arrow keys to navigate)

    +

    Brian Caffo
    Johns Hopkins Bloomberg School of Public Health

    +
    +
    +
    + + + + +
    +

    About these slides

    +
    +
    +
      +
    • These are some practice problems for Statistical Inference Quiz 3
    • +
    • They were created using slidify interactive which you will learn in +Creating Data Products
    • +
    • Please help improve this with pull requests here +(https://github.com/bcaffo/courses)
    • +
    + +
    + +
    + + +
    + +
    +

    Load the data set mtcars in the datasets R package. Calculate a +95% confidence interval to the nearest MPG.

    + +
      +
    1. What is the lower endpoint of the interval?
    2. +
    3. What is the upper endpoint of the interval?
    4. +
    + + + + + + +
    +

    Do library(datasets) and then data(mtcars) to get the data. +Consider t.test for calculations. You may have to install +the datasets package.

    + +
    +
    +
    library(datasets); data(mtcars)
    +round(t.test(mtcars$mpg)$conf.int)
    +
    + +
    [1] 18 22
    +attr(,"conf.level")
    +[1] 0.95
    +
    + +

    18 +22
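To see where these endpoints come from, the sketch below rebuilds the interval by hand as \(\bar x \pm t_{.975, n-1} s / \sqrt{n}\) and compares it with `t.test`. The variable names are illustrative only.

```r
library(datasets)
data(mtcars)

x  <- mtcars$mpg
n  <- length(x)
se <- sd(x) / sqrt(n)  # standard error of the mean

# 95% t interval: mean +/- t quantile * standard error
manual  <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se
builtin <- as.vector(t.test(x)$conf.int)
round(manual)
```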

    + +
    +
    +
    + +
    + + +
    + +
    +

    Suppose that the standard deviation of 9 paired differences is \(1\). What value would the average difference have to be so that the lower endpoint of a 95% +Student's t confidence interval touches zero?

    + +
      +
    1. Give the number here to two decimal places
    2. +
    + + + + + + +
    +

    The t interval is \(\bar x \pm t_{.975, 8} * s /\sqrt{n}\)

    + +
    +
    +

    0.77

    + +

    We want \(\bar x = t_{.975,8} * s / \sqrt{n}\)

    + +
    round(qt(.975, df = 8) * 1 / 3, 2)
    +
    + +
    [1] 0.77
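A small sketch confirming the answer: setting \(\bar x = t_{.975, 8}\, s / \sqrt{n}\) with \(s = 1\) and \(n = 9\) places the lower endpoint of the two-sided 95% interval exactly at zero. The names `tq`, `xbar` and `lower` are illustrative.

```r
n <- 9
s <- 1
tq <- qt(0.975, df = n - 1)       # two-sided 95% uses the .975 quantile

xbar  <- tq * s / sqrt(n)         # mean that puts the lower endpoint at 0
lower <- xbar - tq * s / sqrt(n)  # lower endpoint of the resulting interval
round(xbar, 2)
```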
    +
    + +
    +
    +
    + +
    + + +
    + +
    +

    An independent group Student's T interval is used instead of +a paired T interval when:

    + +
      +
    1. The observations are paired between the groups.
    2. +
    3. The observations between the groups are naturally assumed to be statistically independent
    4. +
    5. As long as you do it correctly, either is fine.
    6. +
    7. More details are needed to answer this question
    8. +
    + + + + + + +
    +

    A paired interval is for paired observations.

    + +
    +
    +

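    Pairing requires each observation in one group to be matched with a specific observation in the other. When the groups are independent of each other (and observations are independent within each group), there is nothing to pair, so the independent group interval is used.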
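To illustrate what pairing actually does, here is a sketch using R's built-in `sleep` data, where the same 10 subjects (column `ID`) are measured under two conditions. The paired interval is just the one-sample interval on the within-subject differences. For these data, the independent group interval comes out wider because it ignores the within-subject matching. The dataset choice is ours, not part of the homework.

```r
library(datasets)
data(sleep)

# Rows are ordered by ID within each group, so g1[i] and g2[i] are the same subject
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

paired <- as.vector(t.test(g2, g1, paired = TRUE)$conf.int)
diffs  <- as.vector(t.test(g2 - g1)$conf.int)            # one-sample interval on differences
indep  <- as.vector(t.test(g2, g1, var.equal = TRUE)$conf.int)
```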

    + +
    +
    +
    + +
    + + +
    + +
    +

    Consider the mtcars dataset. Construct a 95% T interval for MPG comparing +4 to 6 cylinder cars (subtracting in the order 4 - 6), +assuming a constant variance.

    + +
      +
    1. What is the lower endpoint of the interval to 1 decimal place?
    2. +
    3. What is the upper endpoint of the interval to 1 decimal place?
    4. +
    + + + + + + +
    +

    Use t.test with var.equal=TRUE

    + +
    +
    +
    m4 <- mtcars$mpg[mtcars$cyl == 4]
    +m6 <- mtcars$mpg[mtcars$cyl == 6]
    +# t.test(x, y) estimates mean(x) - mean(y), so this is 4 - 6
    +ci <- round(as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int), 1)
    +
    + +

    3.2 +10.7
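The same interval can be built by hand from the pooled variance, which makes the constant-variance assumption explicit. This is a sketch under that assumption; `sp2` and `se` are illustrative names.

```r
library(datasets)
data(mtcars)

m4 <- mtcars$mpg[mtcars$cyl == 4]
m6 <- mtcars$mpg[mtcars$cyl == 6]
n4 <- length(m4)
n6 <- length(m6)

# Pooled variance: group variances weighted by their degrees of freedom
sp2 <- ((n4 - 1) * var(m4) + (n6 - 1) * var(m6)) / (n4 + n6 - 2)
se  <- sqrt(sp2 * (1 / n4 + 1 / n6))

ci <- (mean(m4) - mean(m6)) + c(-1, 1) * qt(0.975, df = n4 + n6 - 2) * se
round(ci, 1)
```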

    + +
    +
    +
    + +
    + + +
    + +
    +

    If someone put a gun to your head and said "Your confidence interval +must contain what it's estimating or I'll pull the trigger", what would +be the smart thing to do?

    + +
      +
    1. Make your interval as wide as possible
    2. +
    3. Make your interval as small as possible
    4. +
    5. Call the authorities
    6. +
    + + + + + + +
    +

    C'mon. You don't need a hint

    + +
    +
    +

    This is just an example of what happens to confidence intervals as you +increase the confidence level. You want to be very sure that your interval contains the parameter (i.e. +have a large confidence level), so you would increase the interval's width.
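A quick way to see this relationship is to compute the width of the one-sample t interval for `mtcars$mpg` at several confidence levels. This is a sketch; the particular levels are arbitrary choices.

```r
library(datasets)
data(mtcars)

x <- mtcars$mpg
conf_levels <- c(0.80, 0.90, 0.95, 0.99)

# Width = upper endpoint - lower endpoint at each confidence level
widths <- sapply(conf_levels, function(cl) diff(t.test(x, conf.level = cl)$conf.int))
names(widths) <- conf_levels
widths
```

The widths grow with the confidence level, and a 100% interval would have to cover every possible value.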

    + +
    +
    +
    + +
    + + +
    + +
    +

    Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude?

    + +
      +
    1. The interval is above zero, suggesting 6 is better than 4 in terms of MPG
    2. +
    3. The interval is above zero, suggesting 4 is better than 6 in terms of MPG
    4. +
    5. The interval does not tell you anything about the hypothesis test; you have to do the test.
    6. +
    7. The interval contains 0 suggesting no difference.
    8. +
    + + + + + + +
    +

    Refer back to the problem: consider what it means for the interval to be +larger than 0, double check the order in which things were subtracted, and +make sure the results make sense in the context of the problem.

    + +
    +
    +

    The interval was computed as 4 - 6 and lies entirely above zero, so 4 cylinder cars have a higher mean MPG than 6 cylinder cars.
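The order of subtraction matters for interpretation. In the sketch below, swapping the arguments of `t.test` negates the interval and reverses its endpoints, so an interval that lies above zero for 4 - 6 lies below zero for 6 - 4.

```r
library(datasets)
data(mtcars)

m4 <- mtcars$mpg[mtcars$cyl == 4]
m6 <- mtcars$mpg[mtcars$cyl == 6]

ci_46 <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int)  # mean(m4) - mean(m6)
ci_64 <- as.vector(t.test(m6, m4, var.equal = TRUE)$conf.int)  # mean(m6) - mean(m4)

ci_46[1] > 0  # lower endpoint above zero: 4 cylinder cars have higher mean MPG
```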

    + +
    +
    +
    + +
    + + +
    + +
    +

    Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four-week period appears to differ between the treated and placebo groups.

    + +
      +
    1. What is the pooled variance estimate? (to 2 decimal places)
    2. +
    + + + + + + +
    +

    The sample sizes are equal, so the pooled variance is the average of the +individual variances.

    + +
    +
    +
    n1 <- n2 <- 9
    +x1 <- 3  ##treated
    +x2 <- 1  ##placebo
    +s1 <- 1.5  ##treated
    +s2 <- 1.8  ##placebo
    +spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
    +
    + +

    2.75
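The hint's shortcut can be checked directly. In the sketch below, with \(n_1 = n_2\) the degrees-of-freedom weights are equal, so the pooled variance reduces to \((s_1^2 + s_2^2)/2 = (2.25 + 3.24)/2 = 2.745\), which is reported as 2.75.

```r
n1 <- n2 <- 9
s1 <- 1.5  # SD of BMI changes, treated group
s2 <- 1.8  # SD of BMI changes, placebo group

# General pooled variance formula
spsq <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)

# Equal sample sizes: pooled variance is the plain average of the variances
avg <- (s1^2 + s2^2) / 2
spsq
```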

    + +
    +
    +
    + +
    + + +
    + +
    +

    For Binomial data the maximum likelihood estimate for the probability of +a success is

    + +
      +
    1. The proportion of successes
    2. +
    3. The proportion of failures
    4. +
    5. A shrunken version of the proportion of successes
    6. +
    7. A shrunken version of the proportion of failures
    8. +
    + + + + + + +
    +

    Look back at the notes about likelihood.

    + +
    +
    +

    The MLE for binomial data is always the proportion of successes.
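A numerical sanity check, assuming hypothetical data of 7 successes in 10 trials: maximizing the binomial log-likelihood with `optimize` lands, up to its tolerance, on the sample proportion 7/10.

```r
# Hypothetical data: 7 successes out of 10 trials
x <- 7
n <- 10

# Binomial log-likelihood as a function of the success probability p
loglik <- function(p) dbinom(x, n, p, log = TRUE)

fit <- optimize(loglik, interval = c(0, 1), maximum = TRUE)
fit$maximum  # numerically close to x / n
```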

    + +
    +
    +
    + +
    + + +
    + +
    +

    Bayesian inference requires

    + +
      +
    1. A type I error rate
    2. +
    3. Setting your confidence level
    4. +
    5. Assigning a prior probability distribution
    6. +
    7. Evaluating frequency error rates
    8. +
    + + + + + + +
    +

    All of the other answers discuss frequentist concepts. All Bayesian analyses require setting a prior.
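To make the role of the prior concrete, here is a sketch of the conjugate Beta-binomial update, assuming hypothetical data of 7 successes in 10 trials. A Beta(a, b) prior gives a Beta(a + x, b + n - x) posterior. Different priors give different posteriors, and a stronger prior pulls the estimate further toward its own mean of 0.5 (a "shrunken" proportion).

```r
# Hypothetical data: 7 successes out of 10 trials
x <- 7
n <- 10

# Posterior mean under a Beta(a, b) prior: (a + x) / (a + b + n)
posterior_mean <- function(a, b) (a + x) / (a + b + n)

flat   <- posterior_mean(1, 1)    # Beta(1, 1) prior -> Beta(8, 4) posterior
strong <- posterior_mean(10, 10)  # Beta(10, 10) prior -> Beta(17, 13) posterior

# 95% equal-tail credible interval under the flat prior
qbeta(c(0.025, 0.975), 1 + x, 1 + n - x)
```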

    + +
    +
    +
    + +
    + + +
    + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/06_StatisticalInference/homework/hw3.md b/06_StatisticalInference/homework/hw3.md index 93859ed5..ba004b46 100644 --- a/06_StatisticalInference/homework/hw3.md +++ b/06_StatisticalInference/homework/hw3.md @@ -1,210 +1,210 @@ ---- -title : Homework 3 for Stat Inference -subtitle : Extra problems for Stat Inference -author : Brian Caffo -job : Johns Hopkins Bloomberg School of Public Health -framework : io2012 -highlighter : highlight.js -hitheme : tomorrow -#url: -# lib: ../../librariesNew #Remove new if using old slidify -# assets: ../../assets -widgets : [mathjax, quiz, bootstrap] -mode : selfcontained # {standalone, draft} ---- - - - -## About these slides -- These are some practice problems for Statistical Inference Quiz 3 -- They were created using slidify interactive which you will learn in -Creating Data Products -- Please help improve this with pull requests here -(https://github.com/bcaffo/courses) - - - ---- &multitext -Load the data set `mtcars` in the `datasets` R package. Calculate a -95% confidence interval to the nearest MPG. - -1. What is the lower endpoint of the interval? -2. What is the upper endpoint of the interval? - -*** .hint -Do `library(datasets)` and then `data(mtcars)` to get the data. -Consider `t.test` for calculations. You may have to install -the datasets package. - - -*** .explanation - -```r -library(datasets); data(mtcars) -round(t.test(mtcars$mpg)$conf.int) -``` - -``` -[1] 18 22 -attr(,"conf.level") -[1] 0.95 -``` - - -18 -22 - ---- &multitext -Suppose that data of 9 paired differences has a standard error of $1$, what value would the average difference have to be to have the lower endpoint of a 95% -students t confidence interval touch zero? - -1. 
Give the number here to two decimal places - -*** .hint -The t interval is $\bar x t_{.95, 8}\pm s /sqrt{n}$ - -*** .explanation -0.62 - -We want $\bar x = t_{.95} s / sqrt{n}$ - -```r -round(qt(.95, df = 8) * 1 / 3, 2) -``` - -``` -[1] 0.62 -``` - - - ---- &radio -An independent group Student's T interval is used over -a paired T interval when: - -1. The observations are paired between the groups. -2. _The observations between the groups are natually assumed to be statistically independent_ -3. As long as you do it correctly, either is fine. -4. More details are needed to answer this question - -*** .hint -A paired interval is for paired observations. - -*** .explanation -We can't pair them if the groups are independent of each other as well as independent within themselves. - - ---- &multitext -Consider the `mtcars` dataset. Construct a 95% T interval for MPG comparing -4 to 6 cylinder cars (subtracting in the order of 4 - 6) -assume a constant variance. - -1. What is the lower endpoint of the interval to 1 decimal place? -2. What is the upper endpoint of the interval to 1 decimal place? - -*** .hint -Use `t.test` with `var.equal=TRUE` - -*** .explanation - - -```r -m4 <- mtcars$mpg[mtcars$cyl == 4] -m6 <- mtcars$mpg[mtcars$cyl == 6] -#this does 4 - 6 -confint <- as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int) -``` - - -3.2 -10.7 - - ---- &radio -If someone put a gun to your head and said "Your confidence interval -must contain what it's estimating or I'll pull the trigger", what would -be the smart thing to do? - -1. _Make your interval as wide as possible_ -2. Make your interval as small as possible -3. Call the authorities - -*** .hint -C'mon. You don't need a hint - -*** .explanation -This is just an example of what happens to confidence intervals as you -increase the confidence level. You want to be quite sure in your interval (i.e. 
-have a large confidence level) and so you would increase the interval's width - ---- &radio - -Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude? - -1. The interval is above zero, suggesting 6 is better than 4 in the terms of MPG -2. _The interval is above zero, suggesting 4 is better than 6 in the terms of MPG_ -3. The interval does not tell you anything about the hypothesis test; you have to do the test. -4. The interval contains 0 suggesting no difference. - -*** .hint -Refer back to the problem, consider the implications of the interval being -larger than 0, double check the order in which things were subtracted and -make sure the results make sense in the context of the problem. - -*** .explanation -The interval was conducted subtracting 4 - 6 and was entirely above zero. - ---- &multitext -Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences was 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four week period appear to differ between the treated and placebo groups. - -What is the pooled variance estimate? (to 2 decimal places) - - -*** .hint -The sample sizes are equal, so the pooled variance is the average of the -individual variances - - -*** .explanation - -```r -n1 <- n2 <- 9 -x1 <- -3 ##treated -x2 <- 1 ##placebo -s1 <- 1.5 ##treated -s2 <- 1.8 ##placebo -spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2) -``` - -2.75 - - ---- &radio - -For Binomial data the maximum likelihood estimate for the probability of -a success is - -1. _The proportion of successes_ -2. 
The proportion of failures -3. A shrunken version of the proportion of successes -4. A shrunken version of the proportion of failures - -*** .hint -Look back at the notes about likelihood. - -*** .explanation -The MLE for binomial data is always the proportion of successes. - ---- &radio - -Bayesian inference requires - -1. A type I error rate -2. Setting your confidence level -3. _Assigning a prior probability distribution_ -4. Evaluating frequency error rates - -*** .explanation -All of the other answers discuss frequentist concepts. All Bayesian analyses requiring setting a prior. - - +--- +title : Homework 3 for Stat Inference +subtitle : (Use the arrow keys to navigate) +author : Brian Caffo +job : Johns Hopkins Bloomberg School of Public Health +framework : io2012 +highlighter : highlight.js +hitheme : tomorrow +#url: +# lib: ../../librariesNew #Remove new if using old slidify +# assets: ../../assets +widgets : [mathjax, quiz, bootstrap] +mode : selfcontained # {standalone, draft} +--- + + + +## About these slides +- These are some practice problems for Statistical Inference Quiz 3 +- They were created using slidify interactive which you will learn in +Creating Data Products +- Please help improve this with pull requests here +(https://github.com/bcaffo/courses) + + + +--- &multitext +Load the data set `mtcars` in the `datasets` R package. Calculate a +95% confidence interval to the nearest MPG. + +1. What is the lower endpoint of the interval? +2. What is the upper endpoint of the interval? + +*** .hint +Do `library(datasets)` and then `data(mtcars)` to get the data. +Consider `t.test` for calculations. You may have to install +the datasets package. 
+ + +*** .explanation + +```r +library(datasets); data(mtcars) +round(t.test(mtcars$mpg)$conf.int) +``` + +``` +[1] 18 22 +attr(,"conf.level") +[1] 0.95 +``` + + +18 +22 + +--- &multitext +Suppose that the standard deviation of 9 paired differences is $1$. What value would the average difference have to be so that the lower endpoint of a 95% +Student's t confidence interval touches zero? + +1. Give the number here to two decimal places + +*** .hint +The t interval is $\bar x \pm t_{.975, 8} * s /\sqrt{n}$ + +*** .explanation +0.77 + +We want $\bar x = t_{.975,8} * s / \sqrt{n}$ + +```r +round(qt(.975, df = 8) * 1 / 3, 2) +``` + +``` +[1] 0.77 +``` + + + +--- &radio +An independent group Student's T interval is used instead of +a paired T interval when: + +1. The observations are paired between the groups. +2. _The observations between the groups are naturally assumed to be statistically independent_ +3. As long as you do it correctly, either is fine. +4. More details are needed to answer this question + +*** .hint +A paired interval is for paired observations. + +*** .explanation +Pairing requires each observation in one group to be matched with a specific observation in the other. When the groups are independent of each other (and observations are independent within each group), there is nothing to pair, so the independent group interval is used. + + +--- &multitext +Consider the `mtcars` dataset. Construct a 95% T interval for MPG comparing +4 to 6 cylinder cars (subtracting in the order 4 - 6), +assuming a constant variance. + +1. What is the lower endpoint of the interval to 1 decimal place? +2. What is the upper endpoint of the interval to 1 decimal place? + +*** .hint +Use `t.test` with `var.equal=TRUE` + +*** .explanation + + +```r +m4 <- mtcars$mpg[mtcars$cyl == 4] +m6 <- mtcars$mpg[mtcars$cyl == 6] +# t.test(x, y) estimates mean(x) - mean(y), so this is 4 - 6 +ci <- round(as.vector(t.test(m4, m6, var.equal = TRUE)$conf.int), 1) +``` + + +3.2 +10.7 + + +--- &radio +If someone put a gun to your head and said "Your confidence interval +must contain what it's estimating or I'll pull the trigger", what would +be the smart thing to do? + +1.
_Make your interval as wide as possible_ +2. Make your interval as small as possible +3. Call the authorities + +*** .hint +C'mon. You don't need a hint + +*** .explanation +This is just an example of what happens to confidence intervals as you +increase the confidence level. You want to be very sure that your interval contains the parameter (i.e. +have a large confidence level), so you would increase the interval's width. + +--- &radio + +Refer back to comparing MPG for 4 versus 6 cylinders. What do you conclude? + +1. The interval is above zero, suggesting 6 is better than 4 in terms of MPG +2. _The interval is above zero, suggesting 4 is better than 6 in terms of MPG_ +3. The interval does not tell you anything about the hypothesis test; you have to do the test. +4. The interval contains 0 suggesting no difference. + +*** .hint +Refer back to the problem: consider what it means for the interval to be +larger than 0, double check the order in which things were subtracted, and +make sure the results make sense in the context of the problem. + +*** .explanation +The interval was computed as 4 - 6 and lies entirely above zero, so 4 cylinder cars have a higher mean MPG than 6 cylinder cars. + +--- &multitext +Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects' body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was 3 kg/m2 for the treated group and 1 kg/m2 for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m2 for the treatment group and 1.8 kg/m2 for the placebo group. The study aims to answer whether the change in BMI over the four-week period appears to differ between the treated and placebo groups. + +1. What is the pooled variance estimate?
(to 2 decimal places) + + +*** .hint +The sample sizes are equal, so the pooled variance is the average of the +individual variances. + + +*** .explanation + +```r +n1 <- n2 <- 9 +x1 <- 3 ##treated +x2 <- 1 ##placebo +s1 <- 1.5 ##treated +s2 <- 1.8 ##placebo +spsq <- ( (n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2) +``` + +2.75 + + +--- &radio + +For Binomial data the maximum likelihood estimate for the probability of +a success is + +1. _The proportion of successes_ +2. The proportion of failures +3. A shrunken version of the proportion of successes +4. A shrunken version of the proportion of failures + +*** .hint +Look back at the notes about likelihood. + +*** .explanation +The MLE for binomial data is always the proportion of successes. + +--- &radio + +Bayesian inference requires + +1. A type I error rate +2. Setting your confidence level +3. _Assigning a prior probability distribution_ +4. Evaluating frequency error rates + +*** .explanation +All of the other answers discuss frequentist concepts. All Bayesian analyses require setting a prior. + + diff --git a/06_StatisticalInference/homework/hw4.Rmd b/06_StatisticalInference/homework/hw4.Rmd index b8e628fe..65a3b6db 100644 --- a/06_StatisticalInference/homework/hw4.Rmd +++ b/06_StatisticalInference/homework/hw4.Rmd @@ -1,6 +1,6 @@ --- title : Homework 4 for Stat Inference -subtitle : Extra problems for Stat Inference +subtitle : (Use arrow keys to navigate) author : Brian Caffo job : Johns Hopkins Bloomberg School of Public Health framework : io2012 @@ -38,14 +38,14 @@ Creating Data Products --- &multitext -Load the data set `mtcars` in the `datasets` R package. You want -to test whether the MPG is $\mu_0$ or smaller using a one sided +Load the data set `mtcars` in the `datasets` R package. Assume that the data set mtcars is a random sample. Compute the mean MPG, $\bar x$, of this sample. + +You want +to test whether the true MPG is $\mu_0$ or smaller using a one sided +5% level test.
($H_0 : \mu = \mu_0$ versus $H_a : \mu < \mu_0$). Using that data set and a Z test: -1. what is the smallest value of $\mu_0$ that you would reject for? - -Both to two decimal places. +1. Based on the mean MPG of the sample $\bar x,$ and by using a Z test: what is the smallest value of $\mu_0$ that you would reject for (to two decimal places)? *** .hint This is the inversion of a one sided hypothesis test. It yields confidence @@ -57,10 +57,12 @@ We want to solve $$ \frac{\sqrt{n}(\bar{X} - \mu_0)}{s} = Z_{0.05} $$ -Or $$\mu_0 = \bar{X} - Z_{0.05} s / \sqrt{n} = \bar{X} + Z_{0.95} s / \sqrt{n}$$. Note that the quantile is negative. +Or $$\mu_0 = \bar{X} - Z_{0.05} s / \sqrt{n} = \bar{X} + Z_{0.95} s / \sqrt{n}$$ Note that the quantile $Z_{0.05}$ is negative. ```{r} -mn <- mean(mtcars$mpg); s <- sd(mtcars$mpg); z <- qnorm(.05) +mn <- mean(mtcars$mpg) +s <- sd(mtcars$mpg) +z <- qnorm(.05) mu0 <- mn - z * s / sqrt(nrow(mtcars)) ``` Note, it's easy to get tripped up in this problem on signs. If you get a value @@ -170,12 +172,12 @@ setting. --- &multitext Suppose that in an AB test, one advertising scheme led to an average of 10 purchases per day for a sample of 100 days, while the other led to 11 purchases per day, also for a sample of 100 days. -Assuming a common standard deviation of 4 purchaces per day. +Assume a common standard deviation of 4 purchases per day. Assuming that the groups are independent and that the days are iid, perform a Z test of equivalence. 1. What is the P-value reported to 3 digits expressed as a proportion? -2. Do you reject the test? (O for no 1 for yes). +2. Do you reject the test? (0 for no 1 for yes). *** .hint The standard error is @@ -300,8 +302,7 @@ innocent. Relate this property back to hypothesis tests. --- &multitext Consider the `mtcars` data set. -1. Give the p-value for a t-test for assuming -constant variance comparing MPG for 6 and 8 cylinder cars as a proportion to 3 decimal places. +1. 
Give the p-value for a t-test comparing MPG for 6 and 8 cylinder cars assuming equal variance, as a proportion to 3 decimal places. 2. Give the associated P-value for a z test. 3. Give the common standard deviation estimate for MPG across cylinders to 3 decimal places. 4. Would the t test reject at the two sided 0.05 level (0 for no 1 for yes)? diff --git a/06_StatisticalInference/homework/hw4.html b/06_StatisticalInference/homework/hw4.html index f82df19b..0bf8ce41 100644 --- a/06_StatisticalInference/homework/hw4.html +++ b/06_StatisticalInference/homework/hw4.html @@ -34,7 +34,7 @@

    Homework 4 for Stat Inference

    -Extra problems for Stat Inference
    +(Use arrow keys to navigate)

    Brian Caffo
    Johns Hopkins Bloomberg School of Public Health

    @@ -63,17 +63,17 @@

    About these slides

    -Load the data set mtcars in the datasets R package. You want
    -to test whether the MPG is \(\mu_0\) or smaller using a one sided
    +Load the data set mtcars in the datasets R package. Assume that the data set mtcars is a random sample. Compute the mean MPG, \(\bar x,\) of this sample.
    +
    +You want
    +to test whether the true MPG is \(\mu_0\) or smaller using a one sided
    5% level test. (\(H_0 : \mu = \mu_0\) versus \(H_a : \mu < \mu_0\)). Using that data set and a Z test:

    -1. what is the smallest value of \(\mu_0\) that you would reject for?
    +1. Based on the mean MPG of the sample \(\bar x,\) and by using a Z test: what is the smallest value of \(\mu_0\) that you would reject for (to two decimal places)?
    -Both to two decimal places.

    -
    @@ -90,9 +90,11 @@

    About these slides

    \[ \frac{\sqrt{n}(\bar{X} - \mu_0)}{s} = Z_{0.05} \]
    -Or \[\mu_0 = \bar{X} - Z_{0.05} s / \sqrt{n} = \bar{X} + Z_{0.95} s / \sqrt{n}\]. Note that the quantile is negative.
    +Or \[\mu_0 = \bar{X} - Z_{0.05} s / \sqrt{n} = \bar{X} + Z_{0.95} s / \sqrt{n}\] Note that the quantile \(Z_{0.05}\) is negative.

    -mn <- mean(mtcars$mpg); s <- sd(mtcars$mpg); z <- qnorm(.05)
    +mn <- mean(mtcars$mpg)
    +s <- sd(mtcars$mpg)
    +z <- qnorm(.05)
     mu0 <- mn - z * s / sqrt(nrow(mtcars))
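A minimal R sketch, assuming only base R and the built-in `mtcars` data, that checks the inversion above: plugging the computed `mu0` back into the Z statistic should land exactly on `qnorm(.05)`, and any larger value should fall in the rejection region. The helper `zstat` is a hypothetical name used only for this illustration.

```r
# Sketch: check the one-sided Z test inversion on mtcars (base R only)
n   <- nrow(mtcars)
mn  <- mean(mtcars$mpg)
s   <- sd(mtcars$mpg)
z   <- qnorm(.05)               # 5% quantile; negative, about -1.645
mu0 <- mn - z * s / sqrt(n)     # boundary value of mu0 from the inversion

# Hypothetical helper: Z statistic for a hypothesized mean mu
zstat <- function(mu) sqrt(n) * (mn - mu) / s

zstat(mu0)          # equals z: mu0 sits on the rejection boundary
zstat(mu0 + 0.01)   # below z, so this larger mu0 is rejected
round(mu0, 2)       # about 21.84
```

Because the alternative is $\mu < \mu_0$, the test rejects when the statistic is at or below `z`, which happens for every $\mu_0$ at or above `mu0`; that is why the boundary is the smallest rejected value.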
     
    @@ -273,13 +275,13 @@

    About these slides

    Suppose that in an AB test, one advertising scheme led to an average of 10 purchases per day for a sample of 100 days, while the other led to 11 purchases per day, also for a sample of 100 days.
    -Assuming a common standard deviation of 4 purchaces per day.
    +Assume a common standard deviation of 4 purchases per day.
    Assuming that the groups are independent and that the days are iid, perform a Z test of equivalence.

    1. What is the P-value reported to 3 digits expressed as a proportion?
    -2. Do you reject the test? (O for no 1 for yes).
    +2. Do you reject the test? (0 for no 1 for yes).
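A sketch of the Z test for this AB question in R. It assumes the usual standard error for a difference of two independent means with a common standard deviation, $\sigma\sqrt{1/n_1 + 1/n_2}$ (the hint's formula is cut off in this diff), and a two-sided alternative; the variable names are illustrative.

```r
# Sketch: two-sided two-sample Z test with a known common SD (assumption)
n1 <- 100; n2 <- 100        # days observed for each scheme
m1 <- 10;  m2 <- 11         # average purchases per day
sigma <- 4                  # common standard deviation

se   <- sigma * sqrt(1 / n1 + 1 / n2)  # SE of the difference in means
zs   <- (m2 - m1) / se                 # test statistic, about 1.77
pval <- 2 * pnorm(-abs(zs))            # two-sided P-value
round(pval, 3)
```

Whether to reject then depends on the chosen level; the question does not state one, so compare `pval` against the level you intend to use.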
    @@ -478,8 +480,7 @@

    About these slides

    Consider the mtcars data set.

    -1. Give the p-value for a t-test for assuming
    -constant variance comparing MPG for 6 and 8 cylinder cars as a proportion to 3 decimal places.
    +1. Give the p-value for a t-test comparing MPG for 6 and 8 cylinder cars assuming equal variance, as a proportion to 3 decimal places.
    2. Give the associated P-value for a z test.
    3. Give the common standard deviation estimate for MPG across cylinders to 3 decimal places.
    4. Would the t test reject at the two sided 0.05 level (0 for no 1 for yes)?
diff --git a/06_StatisticalInference/homework/hw4.md b/06_StatisticalInference/homework/hw4.md index 2bb60c10..74d18d90 100644 --- a/06_StatisticalInference/homework/hw4.md +++ b/06_StatisticalInference/homework/hw4.md @@ -1,6 +1,6 @@ --- title : Homework 4 for Stat Inference -subtitle : Extra problems for Stat Inference +subtitle : (Use arrow keys to navigate) author : Brian Caffo job : Johns Hopkins Bloomberg School of Public Health framework : io2012 @@ -24,14 +24,14 @@ Creating Data Products --- &multitext -Load the data set `mtcars` in the `datasets` R package. You want -to test whether the MPG is $\mu_0$ or smaller using a one sided +Load the data set `mtcars` in the `datasets` R package. Assume that the data set mtcars is a random sample. Compute the mean MPG, $\bar x,$ of this sample. + +You want +to test whether the true MPG is $\mu_0$ or smaller using a one sided 5% level test. ($H_0 : \mu = \mu_0$ versus $H_a : \mu < \mu_0$). Using that data set and a Z test: -1. what is the smallest value of $\mu_0$ that you would reject for? - -Both to two decimal places. +1. Based on the mean MPG of the sample $\bar x,$ and by using a Z test: what is the smallest value of $\mu_0$ that you would reject for (to two decimal places)? *** .hint This is the inversion of a one sided hypothesis test. It yields confidence @@ -43,11 +43,13 @@ We want to solve $$ \frac{\sqrt{n}(\bar{X} - \mu_0)}{s} = Z_{0.05} $$ -Or $$\mu_0 = \bar{X} - Z_{0.05} s / \sqrt{n} = \bar{X} + Z_{0.95} s / \sqrt{n}$$. Note that the quantile is negative. +Or $$\mu_0 = \bar{X} - Z_{0.05} s / \sqrt{n} = \bar{X} + Z_{0.95} s / \sqrt{n}$$ Note that the quantile $Z_{0.05}$ is negative. ```r -mn <- mean(mtcars$mpg); s <- sd(mtcars$mpg); z <- qnorm(.05) +mn <- mean(mtcars$mpg) +s <- sd(mtcars$mpg) +z <- qnorm(.05) mu0 <- mn - z * s / sqrt(nrow(mtcars)) ``` @@ -170,12 +172,12 @@ setting. 
--- &multitext Suppose that in an AB test, one advertising scheme led to an average of 10 purchases per day for a sample of 100 days, while the other led to 11 purchases per day, also for a sample of 100 days. -Assuming a common standard deviation of 4 purchaces per day. +Assume a common standard deviation of 4 purchases per day. Assuming that the groups are independent and that the days are iid, perform a Z test of equivalence. 1. What is the P-value reported to 3 digits expressed as a proportion? -2. Do you reject the test? (O for no 1 for yes). +2. Do you reject the test? (0 for no 1 for yes). *** .hint The standard error is @@ -306,8 +308,7 @@ innocent. Relate this property back to hypothesis tests. --- &multitext Consider the `mtcars` data set. -1. Give the p-value for a t-test for assuming -constant variance comparing MPG for 6 and 8 cylinder cars as a proportion to 3 decimal places. +1. Give the p-value for a t-test comparing MPG for 6 and 8 cylinder cars assuming equal variance, as a proportion to 3 decimal places. 2. Give the associated P-value for a z test. 3. Give the common standard deviation estimate for MPG across cylinders to 3 decimal places. 4. Would the t test reject at the two sided 0.05 level (0 for no 1 for yes)? 
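For the `mtcars` question, a sketch using base R's `t.test()` with `var.equal = TRUE`, plus the pooled standard deviation computed by hand. For the z test it assumes the question means the same standardized difference referred to a standard normal instead of a t distribution; treat that as one interpretation, not the official solution. The name `mt68` is illustrative.

```r
# Sketch: compare MPG for 6 vs 8 cylinder cars in mtcars (base R only)
mt68 <- subset(mtcars, cyl %in% c(6, 8))

# Equal-variance (pooled) two-sample t test
tt <- t.test(mpg ~ cyl, data = mt68, var.equal = TRUE)
round(tt$p.value, 3)

# Pooled (common) standard deviation across the two groups
x6 <- mt68$mpg[mt68$cyl == 6]
x8 <- mt68$mpg[mt68$cyl == 8]
n6 <- length(x6); n8 <- length(x8)
sp <- sqrt(((n6 - 1) * var(x6) + (n8 - 1) * var(x8)) / (n6 + n8 - 2))
round(sp, 3)

# Same statistic referred to a normal distribution (assumed meaning of "z test")
zs <- (mean(x6) - mean(x8)) / (sp * sqrt(1 / n6 + 1 / n8))
round(2 * pnorm(-abs(zs)), 3)

# Reject at the two-sided 0.05 level? (0 for no, 1 for yes)
as.integer(tt$p.value < 0.05)
```

`t.test` orders the groups by the sorted levels of `cyl`, so its statistic is the 6 cylinder mean minus the 8 cylinder mean, matching `zs` above.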
diff --git a/07_RegressionModels/pdfs/Binder1.pdf b/07_RegressionModels/pdfs/Binder1.pdf new file mode 100644 index 00000000..aa3b50e7 Binary files /dev/null and b/07_RegressionModels/pdfs/Binder1.pdf differ diff --git a/09_DevelopingDataProducts/plotly/courseraData.rda b/09_DevelopingDataProducts/plotly/courseraData.rda new file mode 100644 index 00000000..111e3ec6 Binary files /dev/null and b/09_DevelopingDataProducts/plotly/courseraData.rda differ diff --git a/09_DevelopingDataProducts/plotly/plotly.R b/09_DevelopingDataProducts/plotly/plotly.R new file mode 100644 index 00000000..f26647a3 --- /dev/null +++ b/09_DevelopingDataProducts/plotly/plotly.R @@ -0,0 +1,28 @@ +## An analysis of the Coursera Johns Hopkins data (from a few months back) +## Used to illustrate plotly and ggplot +## +## Brian Caffo 7/10/2014 + + +load("courseraData.rda") + + +## Make sure that you've followed the first few setup steps +## https://plot.ly/ggplot2/getting-started/ +## Particularly set_credentials_file(username=FILL IN, api_key=FILL IN) +library(plotly) + + +library(ggplot2) +## First do a bar plot in ggplot +g <- ggplot(myData, aes(y = enrollment, x = class, fill = offering)) +g <- g + geom_bar(stat = "identity") +g + +## Let's try to get it into plot.ly +py <- plotly() +out <- py$ggplotly(g) +out$response$url + + +