* cars04 data added
* added life_exp data and adjusted documentation for cars04
* comics data added
* data cleaning update for comics
* nyc dataset added
* iowa dataset added
* adjusted iowa documentation, added iran data
* manhattan data added
* gss_wordsum_class added
* twins data added
* LAhomes data added
* partial movies data set
* movies data set complete
* ucb_admit data added
* updated news.md
* fixed documentation for life_exp
Showing 65 changed files with 32,564 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#' LAhomes | ||
#' | ||
#' Data collected by Andrew Bray at Reed College on characteristics of LA Homes in 2010. | ||
#' | ||
#' @name LAhomes | ||
#' @docType data | ||
#' @format A data frame with 1594 observations on the following 8 variables. | ||
#' \describe{ | ||
#' \item{city}{City where the home is located.} | ||
#' \item{type}{Type of home with levels `Condo/Twh` - condo or townhouse, `SFR` - single family residence, and `NA`} | ||
#' \item{bed}{Number of bedrooms in the home.} | ||
#' \item{bath}{Number of bathrooms in the home.} | ||
#' \item{garage}{Number of cars that can be parked in the garage. Note that a value of `4` refers to 4 or more garage spaces.} | ||
#' \item{sqft}{Square footage of the home.} | ||
#' \item{pool}{Indicates if the home has a pool.} | ||
#' \item{price}{Listing price of the home.} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(ggplot2) | ||
#' | ||
#' ggplot(LAhomes, aes(sqft, price)) + | ||
#' geom_point(alpha = 0.2) + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "Can we predict list price from square footage?", | ||
#' subtitle = "Homes in the Los Angeles area", | ||
#' x = "Square feet", | ||
#' y = "List price" | ||
#' ) | ||
|
||
"LAhomes" |
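
Note on `garage`: the documentation above says a value of `4` stands for 4 or more spaces, so the variable is top-coded. A minimal sketch of one way to make that explicit when tabulating, assuming `garage` is stored as whole numbers from 0 to 4. The vector below is made-up toy data, not values from `LAhomes`, and `garage_label` is a hypothetical name:

```r
# Hypothetical toy values standing in for LAhomes$garage (not real data).
garage <- c(0, 1, 2, 4, NA, 3, 4)

# Relabel the top-coded value so that 4 reads as "4 or more".
garage_label <- factor(
  garage,
  levels = 0:4,
  labels = c("0", "1", "2", "3", "4+")
)

# Count homes per category, keeping missing values visible.
table(garage_label, useNA = "ifany")
```

Treating the column as a labelled factor rather than a number avoids reading a mean of `garage` as an average count of spaces.
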
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
#' cars04 | ||
#' | ||
#' A data frame with 428 rows and 19 columns, recording characteristics of all new car models for sale in the US in 2004. | ||
#' | ||
#' | ||
#' @name cars04 | ||
#' @docType data | ||
#' @format A data frame with 428 observations on the following 19 variables. | ||
#' \describe{ | ||
#' \item{name}{The name of the vehicle including manufacturer and model.} | ||
#' \item{sports_car}{Logical variable indicating if the vehicle is a sports car.} | ||
#' \item{suv}{Logical variable indicating if the vehicle is an SUV.} | ||
#' \item{wagon}{Logical variable indicating if the vehicle is a wagon.} | ||
#' \item{minivan}{Logical variable indicating if the vehicle is a minivan.} | ||
#' \item{pickup}{Logical variable indicating if the vehicle is a pickup.} | ||
#' \item{all_wheel}{Logical variable indicating if the vehicle is all-wheel drive.} | ||
#' \item{rear_wheel}{Logical variable indicating if the vehicle is rear-wheel drive.} | ||
#' \item{msrp}{Manufacturer suggested retail price of the vehicle.} | ||
#' \item{dealer_cost}{Amount of money the dealer paid for the vehicle.} | ||
#' \item{eng_size}{Displacement of the engine - the total volume of all the cylinders, measured in liters.} | ||
#' \item{ncyl}{Number of cylinders in the engine.} | ||
#' \item{horsepwr}{Amount of horsepower produced by the engine.} | ||
#' \item{city_mpg}{Gas mileage for city driving, measured in miles per gallon.} | ||
#' \item{hwy_mpg}{Gas mileage for highway driving, measured in miles per gallon.} | ||
#' \item{weight}{Total weight of the vehicle, measured in pounds.} | ||
#' \item{wheel_base}{Distance between the center of the front wheels and the center of the rear wheels, measured in inches.} | ||
#' \item{length}{Total length of the vehicle, measured in inches.} | ||
#' \item{width}{Total width of the vehicle, measured in inches.} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(ggplot2) | ||
#' | ||
#' # Highway gas mileage | ||
#' ggplot(cars04, aes(x = hwy_mpg)) + | ||
#' geom_histogram(bins = 15, color = "white", | ||
#' fill = openintro::IMSCOL["green", "full"]) + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "Highway gas mileage for cars from 2004", | ||
#' x = "Gas Mileage (miles per gallon)", | ||
#' y = "Number of cars") | ||
|
||
"cars04" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
#' comics | ||
#' | ||
#' A data frame containing information about comic book characters from Marvel Comics and DC Comics. | ||
#' | ||
#' | ||
#' @name comics | ||
#' @docType data | ||
#' @format A data frame with 21821 observations on the following 11 variables. | ||
#' \describe{ | ||
#' \item{name}{Name of the character. May include the real name, hero or villain name, alias(es), and/or the universe they live in (e.g. Earth-616 in Marvel's multiverse).} | ||
#' \item{id}{Status of the character's identity with levels `Secret`, `Public`, `No Dual` and `Unknown`.} | ||
#' \item{align}{Character's alignment with levels `Good`, `Bad`, `Neutral` and `Reformed Criminals`.} | ||
#' \item{eye}{Character's eye color.} | ||
#' \item{hair}{Character's hair color.} | ||
#' \item{gender}{Character's gender.} | ||
#' \item{gsm}{Character's classification as a gender or sexual minority.} | ||
#' \item{alive}{Is the character dead or alive?} | ||
#' \item{appearances}{Number of comic books the character appears in.} | ||
#' \item{first_appear}{Date of publication for the comic book the character first appeared in.} | ||
#' \item{publisher}{Publisher of the comic with levels `Marvel` and `DC`.} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(ggplot2) | ||
#' library(dplyr) | ||
#' | ||
#' # Good v Bad | ||
#' | ||
#' plot_data <- comics %>% | ||
#' filter(align == "Good" | align == "Bad") | ||
#' | ||
#' ggplot(plot_data, aes(x = align, fill = align)) + | ||
#' geom_bar() + | ||
#' facet_wrap(~publisher) + | ||
#' scale_fill_manual(values = c(openintro::IMSCOL["red", "full"], openintro::IMSCOL["blue", "full"])) + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "Is there a balance of power?", | ||
#' x = "", | ||
#' y = "Number of characters", | ||
#' fill = "" | ||
#' ) | ||
|
||
"comics" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
#' gss_wordsum_class | ||
#' | ||
#' A data frame containing data from the General Social Survey. | ||
#' | ||
#' @name gss_wordsum_class | ||
#' @docType data | ||
#' @format A data frame with 795 observations on the following 2 variables. | ||
#' \describe{ | ||
#' \item{wordsum}{A vocabulary score calculated based on a ten-question vocabulary test, where a higher score means better vocabulary. Scores range from 1 to 10.} | ||
#' \item{class}{Self-identified social class with 4 levels: lower, working, middle, and upper class.} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(dplyr) | ||
#' | ||
#' gss_wordsum_class %>% | ||
#' group_by(class) %>% | ||
#' summarize(mean_wordsum = mean(wordsum)) | ||
#' | ||
|
||
"gss_wordsum_class" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
#' iowa | ||
#' | ||
#' A data frame containing information about the 2016 US Presidential Election for the state of Iowa. | ||
#' | ||
#' @name iowa | ||
#' @docType data | ||
#' @format A data frame with 1386 observations on the following 5 variables. | ||
#' \describe{ | ||
#' \item{office}{The office that the candidates were running for.} | ||
#' \item{candidate}{President/Vice President pairs who were running for office.} | ||
#' \item{party}{Political party of the candidate.} | ||
#' \item{county}{County in Iowa where the votes were cast.} | ||
#' \item{votes}{Number of votes received by the candidate.} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(ggplot2) | ||
#' library(dplyr) | ||
#' | ||
#' plot_data <- iowa %>% | ||
#' filter(candidate != "Total") %>% | ||
#' group_by(candidate) %>% | ||
#' summarize(total_votes = sum(votes) / 1000) | ||
#' | ||
#' ggplot(plot_data, aes(total_votes, candidate)) + | ||
#' geom_col() + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "2016 Presidential Election in Iowa", | ||
#' subtitle = "Popular vote", | ||
#' y = "", | ||
#' x = "Number of votes (in thousands)" | ||
#' ) | ||
|
||
"iowa" |
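
Note on the `iowa` example: it drops rows where `candidate` is `"Total"` before summing. The likely reason, assuming each county carries its own total row alongside the per-candidate rows, is to avoid counting every vote twice. A base R sketch on made-up rows (the `toy` data frame, its candidate names, and its counts are all hypothetical):

```r
# Hypothetical rows mimicking the iowa layout (made-up names and counts).
toy <- data.frame(
  candidate = c("A / B", "C / D", "Total", "A / B", "C / D", "Total"),
  county    = c("X", "X", "X", "Y", "Y", "Y"),
  votes     = c(100, 50, 150, 30, 20, 50)
)

# Drop the per-county "Total" rows first, then sum votes per candidate pair.
by_candidate <- aggregate(
  votes ~ candidate,
  data = subset(toy, candidate != "Total"),
  FUN = sum
)

by_candidate
```

Without the `subset()` step the grand total of `votes` would be twice the number of ballots in this toy layout.
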
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
#' iran | ||
#' | ||
#' A data frame containing information about the 2009 Presidential Election in Iran. There were widespread claims of election fraud in this election both internationally and within Iran. | ||
#' | ||
#' @name iran | ||
#' @docType data | ||
#' @format A data frame with 366 observations on the following 9 variables. | ||
#' \describe{ | ||
#' \item{province}{Iranian province where votes were cast.} | ||
#' \item{city}{City within province where votes were cast.} | ||
#' \item{ahmadinejad}{Number of votes received by Ahmadinejad.} | ||
#' \item{rezai}{Number of votes received by Rezai.} | ||
#' \item{karrubi}{Number of votes received by Karrubi.} | ||
#' \item{mousavi}{Number of votes received by Mousavi.} | ||
#' \item{total_votes_cast}{Total number of votes cast.} | ||
#' \item{voided_votes}{Number of votes that were not counted.} | ||
#' \item{legitimate_votes}{Number of votes that were counted.} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(dplyr) | ||
#' library(ggplot2) | ||
#' library(tidyr) | ||
#' library(stringr) | ||
#' | ||
#' plot_data <- iran %>% | ||
#' summarize( | ||
#' ahmadinejad = sum(ahmadinejad) / 1000, | ||
#' rezai = sum(rezai) / 1000, | ||
#' karrubi = sum(karrubi) / 1000, | ||
#' mousavi = sum(mousavi) / 1000 | ||
#' ) %>% | ||
#' pivot_longer( | ||
#' cols = c(ahmadinejad, rezai, karrubi, mousavi), | ||
#' names_to = "candidate", | ||
#' values_to = "votes" | ||
#' ) %>% | ||
#' mutate(candidate = str_to_title(candidate)) | ||
#' | ||
#' ggplot(plot_data, aes(votes, candidate)) + | ||
#' geom_col() + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "2009 Iranian Presidential Election", | ||
#' x = "Number of votes (in thousands)", | ||
#' y = "" | ||
#' ) | ||
|
||
"iran" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
#' life_exp | ||
#' | ||
#' A data frame with 3142 rows and 4 columns. County level data for life expectancy and median income in the United States. | ||
#' | ||
#' | ||
#' @name life_exp | ||
#' @docType data | ||
#' @format A data frame with 3142 observations on the following 4 variables. | ||
#' \describe{ | ||
#' \item{state}{Name of the state.} | ||
#' \item{county}{Name of the county.} | ||
#' \item{expectancy}{Life expectancy in the county.} | ||
#' \item{income}{Median income in the county, measured in US $.} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(ggplot2) | ||
#' | ||
#' # Income V Expectancy | ||
#' ggplot(life_exp, aes(x = income, y = expectancy)) + | ||
#' geom_point(color = openintro::IMSCOL["green", "full"], alpha = 0.2) + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "Is there a relationship between median income and life expectancy?", | ||
#' x = "Median income (US $)", | ||
#' y = "Life expectancy (years)") | ||
|
||
"life_exp" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
#' manhattan | ||
#' | ||
#' A data frame containing data on apartment rentals in Manhattan. | ||
#' | ||
#' @name manhattan | ||
#' @docType data | ||
#' @format A data frame with 20 observations on the following 1 variable. | ||
#' \describe{ | ||
#' \item{rent}{Monthly rent for a 1 bedroom apartment listed as "For rent by owner".} | ||
#' } | ||
#' @keywords datasets | ||
#' @examples | ||
#' | ||
#' library(ggplot2) | ||
#' | ||
#' ggplot(manhattan, aes(rent)) + | ||
#' geom_histogram(color = "white", binwidth = 300) + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "Rent in Manhattan", | ||
#' subtitle = "1 Bedroom Apartments", | ||
#' x = "Rent (in US$)", | ||
#' caption = "Source: Craigslist" | ||
#' ) | ||
|
||
"manhattan" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
#' movies | ||
#' | ||
#' A data set with information about movies released in 2003. | ||
#' | ||
#' @name movies | ||
#' @docType data | ||
#' @format A data frame with 140 observations on the following 5 variables. | ||
#' \describe{ | ||
#' \item{movie}{Title of the movie.} | ||
#' \item{genre}{Genre of the movie.} | ||
#' \item{score}{Critics' score of the movie on a 0 to 100 scale.} | ||
#' \item{rating}{MPAA rating of the film.} | ||
#' \item{box_office}{Millions of dollars earned at the box office in the US and Canada.} | ||
#' } | ||
#' @keywords datasets | ||
#' @source [Investigating Statistical Concepts, Applications and Methods](http://www.rossmanchance.com/iscam2/data/movies03.txt) | ||
#' @examples | ||
#' | ||
#' library(ggplot2) | ||
#' | ||
#' ggplot(movies, aes(score, box_office, color = genre)) + | ||
#' geom_point() + | ||
#' theme_minimal() + | ||
#' labs( | ||
#' title = "Does a critic score predict box office earnings?", | ||
#' x = "Critic rating", | ||
#' y = "Box office earnings (millions US$)", | ||
#' color = "Genre" | ||
#' ) | ||
#' | ||
|
||
"movies" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
#' nyc | ||
#' | ||
#' Zagat is a public survey where anyone can provide scores to a restaurant. The scores from the general public are then gathered to produce ratings. This data set contains a list of 168 NYC restaurants and their Zagat Ratings. | ||
#' | ||
#' For each category the scales are as follows: | ||
#' | ||
#' 0 - 9: poor to fair | ||
#' 10 - 15: fair to good | ||
#' 16 - 19: good to very good | ||
#' 20 - 25: very good to excellent | ||
#' 26 - 30: extraordinary to perfection | ||
#' | ||
#' @name nyc | ||
#' @docType data | ||
#' @format A data frame with 168 observations on the following 6 variables. | ||
#' \describe{ | ||
#' \item{restaurant}{Name of the restaurant.} | ||
#' \item{price}{Price of a meal for two, with drinks, in US $.} | ||
#' \item{food}{Zagat rating for food.} | ||
#' \item{decor}{Zagat rating for decor.} | ||
#' \item{service}{Zagat rating for service.} | ||
#' \item{east}{Indicator variable for location of the restaurant. `0` = west of 5th Avenue, `1` = east of 5th Avenue} | ||
#' } | ||
#' @keywords datasets | ||
#' | ||
#' @examples | ||
#' library(dplyr) | ||
#' library(ggplot2) | ||
#' | ||
#' location_labs <- c("West", "East") | ||
#' names(location_labs) <- c(0, 1) | ||
#' | ||
#' ggplot(nyc, mapping = aes(x = price, group = east, fill = east)) + | ||
#' geom_boxplot(alpha = 0.5) + | ||
#' facet_grid(east ~ ., labeller = labeller(east = location_labs)) + | ||
#' labs( | ||
#' title = "Is food more expensive east of 5th Avenue?", | ||
#' x = "Price (US$)" | ||
#' ) + | ||
#' guides(fill = "none") + | ||
#' theme_minimal() + | ||
#' theme(axis.text.y = element_blank()) | ||
|
||
"nyc" |
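
Note on the Zagat scale in the `nyc` documentation: the five bands can be attached to a rating column with base R's `cut()`. This is only a sketch, assuming integer ratings and assuming a rating of 25 belongs to the "very good to excellent" band. The `food` vector is made-up toy data, not values from `nyc`, and `zagat_band` is a hypothetical name:

```r
# Hypothetical ratings on the 0 to 30 Zagat scale (not taken from nyc).
food <- c(8, 15, 19, 23, 27)

# Break points are the upper bound of each band; with the default
# right = TRUE each interval is (lower, upper], so 15 lands in
# "fair to good" and 25 in "very good to excellent".
zagat_band <- cut(
  food,
  breaks = c(-Inf, 9, 15, 19, 25, 30),
  labels = c(
    "poor to fair",
    "fair to good",
    "good to very good",
    "very good to excellent",
    "extraordinary to perfection"
  )
)

data.frame(food, zagat_band)
```

The same call could be applied to the `decor` and `service` columns, since the documentation says the scale is shared across categories.
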