Think Stats Chapter 5 Exercise 1 (blue men)

Q4. Think Stats Chapter 5 Exercise 1 (normal distribution of blue men)

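This exercise uses the normal distribution as an analytic model of heights. The fraction of men in a height range is the difference between the model's CDF evaluated at the two ends of the range. Each end can equivalently be expressed as a z-score, its distance from the mean in standard deviations.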

Exercise 1 In the BRFSS (see Section 5.4), the distribution of heights is roughly normal with parameters µ = 178 cm and σ = 7.7 cm for men, and µ = 163 cm and σ = 7.3 cm for women. In order to join Blue Man Group, you have to be male between 5’10” and 6’1” (see http://bluemancasting.com). What percentage of the U.S. male population is in this range? Hint: use scipy.stats.norm.cdf.

Find percentage U.S. male population between 5’10” and 6’1”.
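
As a quick check on the exercise as stated, the calculation can be sketched with the parameters quoted above (µ = 178 cm, σ = 7.7 cm) rather than values estimated from the data. This sketch uses `statistics.NormalDist` from the Python standard library as a stand-in for `scipy.stats.norm.cdf`; the function and variable names are illustrative and are not taken from the original script.

```python
from statistics import NormalDist

CM_PER_INCH = 2.54


def feet_and_inches_to_cm(feet, inches):
    """Convert a height in feet and inches to centimeters."""
    return (feet * 12 + inches) * CM_PER_INCH


def fraction_between(low, high, mu, sigma):
    """Fraction of a Normal(mu, sigma) population with values between low and high."""
    model = NormalDist(mu=mu, sigma=sigma)
    return model.cdf(high) - model.cdf(low)


low_cm = feet_and_inches_to_cm(5, 10)   # 5'10", about 177.8 cm
high_cm = feet_and_inches_to_cm(6, 1)   # 6'1", about 185.42 cm

# Parameters for men as quoted in the exercise statement (not fitted to data).
fraction = fraction_between(low_cm, high_cm, mu=178, sigma=7.7)
print('{:.1f} percent'.format(100 * fraction))
```

With these parameters the fraction comes out at roughly 0.343, so the lower bound of 5'10" sits almost exactly at the mean and the range covers a little less than one standard deviation above it.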


Table of Contents

1) Code
2) Results
3) Explanation
4) Glossary


#!/usr/bin/env python3


import brfss  # ThinkStats2 module that reads the BRFSS survey data
import scipy.stats


def FeetAndInchesToCentimeters(feet, inches):
    """
    Convert a height given in feet and inches to centimeters.
    INPUT: feet and inches
    OUTPUT: centimeters
    """
    cm_per_inch = 2.54
    return (feet*12 + inches) * cm_per_inch


y1 = low_cm = FeetAndInchesToCentimeters(5, 10)
y2 = high_cm = FeetAndInchesToCentimeters(6, 1)

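# Load the BRFSS respondents into a DataFrame.
df = brfss.ReadBrfss()

# Select the heights (column htm3) of the male respondents (sex == 1).
column = 'htm3'
heights = df[df.sex == 1][column]

# Estimate the parameters of the normal model from the sample.
mean = heights.mean()
standard_deviation = heights.std()

# The fraction in range is the difference between the CDF at the two bounds.
cdf_at_y1, cdf_at_y2 = scipy.stats.norm.cdf([y1, y2], loc=mean, scale=standard_deviation)
probability_in_range = cdf_at_y2 - cdf_at_y1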
print('Model from cumulative distribution function of normal distribution predicts {:.1f} percent males with height in range 5\'10" to 6\'1"'.format(100 * probability_in_range))

ThinkStats2/code/metis_q4_ch5_ex1.py
Model from cumulative distribution function of normal distribution predicts 34.3 percent males with height in range 5'10" to 6'1"

We first converted the lower and upper bounds of the height range from feet and inches to centimeters. We then read in the sample data, selected the male respondents, and took the column labeled htm3 as their heights. From those heights we calculated the mean and standard deviation, evaluated the normal CDF with those parameters at each bound, and took the difference. The model predicts 34.3 percent, whereas counting the sample directly gives 73,697 men in range out of 155,703, or about 47.3 percent.
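
The comparison between the model and the direct count can be made concrete with a short sketch. The helper `empirical_fraction` is a hypothetical name, the inclusive bounds are an assumption (the text does not say how the endpoints were treated), and the small `sample` list is made-up data used only to show the call. Only the two counts come from the text above.

```python
def empirical_fraction(values, low, high):
    """Fraction of values in [low, high]; inclusive bounds are an assumption."""
    values = list(values)
    in_range = sum(1 for v in values if low <= v <= high)
    return in_range / len(values)


# Made-up heights in cm, purely to illustrate the call; this is not BRFSS data.
sample = [170, 175, 178, 180, 183, 185, 190, 165]
toy_fraction = empirical_fraction(sample, 177.8, 185.42)

# Counts reported in the text above.
in_range, total = 73697, 155703
reported_fraction = in_range / total
print('{:.1f} percent of the sample is in range'.format(100 * reported_fraction))
```

The reported counts work out to about 47.3 percent, roughly 13 percentage points above the 34.3 percent predicted by the normal model.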


  1. empirical distribution: The distribution of values in a sample.
  2. analytic distribution: A distribution whose CDF is an analytic function.
  3. model: A useful simplification. Analytic distributions are often good models of more complex empirical distributions.
  4. interarrival time: The elapsed time between two events.
  5. complementary CDF: A function that maps from a value, x, to the fraction of values that exceed x, which is 1 − CDF(x).
  6. standard normal distribution: The normal distribution with mean 0 and standard deviation 1.
  7. normal probability plot: A plot of the values in a sample versus random values from a standard normal distribution.
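
Two of the glossary terms, the standard normal distribution and the complementary CDF, can be illustrated with the height model from the exercise statement. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper names `z_score` and `ccdf` are illustrative and do not appear in the original script.

```python
from statistics import NormalDist

MU, SIGMA = 178, 7.7            # parameters for men from the exercise statement
heights = NormalDist(mu=MU, sigma=SIGMA)
standard_normal = NormalDist()  # mean 0, standard deviation 1


def z_score(x, mu=MU, sigma=SIGMA):
    """Distance of x from the mean, in standard deviations."""
    return (x - mu) / sigma


def ccdf(dist, x):
    """Complementary CDF: fraction of values that exceed x, 1 - CDF(x)."""
    return 1 - dist.cdf(x)


high_cm = (6 * 12 + 1) * 2.54   # 6'1" in centimeters

# The model CDF at x agrees with the standard normal CDF at the z-score of x.
direct = heights.cdf(high_cm)
via_z = standard_normal.cdf(z_score(high_cm))

# Fraction of men taller than 6'1" under the model.
taller = ccdf(heights, high_cm)
print('{:.1f} percent taller than 6\'1"'.format(100 * taller))
```

Standardizing first and then using the standard normal gives the same probability as evaluating the fitted model directly, which is why a single table or function for the standard normal is enough for any normal model.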