Skip to content

Latest commit

 

History

History
174 lines (139 loc) · 10.1 KB

readme.md

File metadata and controls

174 lines (139 loc) · 10.1 KB

Coffee beans - Mae Mu @picoftasty

Coffee ratings

The data this week comes from Coffee Quality Database courtesy of Buzzfeed Data Scientist James LeDoux. The original data can be found on James' github. The data was re-posted to Kaggle.

"These data were collected from the Coffee Quality Institute's review pages in January 2018."

Thrillist has an article on the top coffee-producing countries.

Yorgos Askalidis analyzed this data as well.

There is data for both Arabica and Robusta beans, across many countries and professionally rated on a 0-100 scale. All sorts of scoring/ratings for things like acidity, sweetness, fragrance, balance, etc - may be useful for either separating into visualizations/categories or for modeling/recommenders.

Wikipedia on Coffee Beans:

The two most economically important varieties of coffee plant are the Arabica and the Robusta; ~60% of the coffee produced worldwide is Arabica and ~40% is Robusta. Arabica beans consist of 0.8–1.4% caffeine and Robusta beans consist of 1.7–4% caffeine.

Wiki on Cupping

Coffee cupping, or coffee tasting, is the practice of observing the tastes and aromas of brewed coffee. It is a professional practice but can be done informally by anyone or by professionals known as "Q Graders". A standard coffee cupping procedure involves deeply sniffing the coffee, then loudly slurping the coffee so it spreads to the back of the tongue. The coffee taster attempts to measure aspects of the coffee's taste, specifically the body (the texture or mouthfeel, such as oiliness), sweetness, acidity (a sharp and tangy feeling, like when biting into an orange), flavour (the characters in the cup), and aftertaste. Since coffee beans embody telltale flavours from the region where they were grown, cuppers may attempt to identify the coffee's origin.

Importantly - there is the concept of ethical or Fair Trade coffee - we'll be covering more of the production numbers of Coffee in a future dataset.

Based on the simple idea that the products bought and sold every day are connected to the livelihoods of others, fair trade is a way to make a conscious choice for a better world.

Fair Trade Coffee definition from Wikipedia:

Fair trade coffee is coffee that is certified as having been produced to fair trade standards by fair trade organizations, which create trading partnerships that are based on dialogue, transparency and respect, with the goal of achieving greater equity in international trade. These partnerships contribute to sustainable development by offering better trading conditions to coffee bean farmers. Fair trade organizations support producers and sustainable environmental farming practices and prohibit child labor or forced labor.

If you're looking to buy some coffee - check out this list of 12 Black-Owned Coffee Brands.

Get the data here

# Get the Data

# Read in with tidytuesdayR package 
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2020-07-07')
tuesdata <- tidytuesdayR::tt_load(2020, week = 28)

coffee_ratings <- tuesdata$coffee_ratings

# Or read in the data manually

coffee_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-07-07/coffee_ratings.csv')

Data Dictionary

coffee_ratings.csv

Note full description/examples at: Coffee Quality Institute

variable class description
total_cup_points double Total rating/points (0 - 100 scale)
species character Species of coffee bean (arabica or robusta)
owner character Owner of the farm
country_of_origin character Where the bean came from
farm_name character Name of the farm
lot_number character Lot number of the beans tested
mill character Mill where the beans were processed
ico_number character International Coffee Organization number
company character Company name
altitude character Altitude - this is a messy column - I've left it for some cleaning
region character Region where bean came from
producer character Producer of the roasted bean
number_of_bags double Number of bags tested
bag_weight character Bag weight tested
in_country_partner character Partner for the country
harvest_year character When the beans were harvested (year)
grading_date character When the beans were graded
owner_1 character Who owns the beans
variety character Variety of the beans
processing_method character Method for processing
aroma double Aroma grade
flavor double Flavor grade
aftertaste double Aftertaste grade
acidity double Acidity grade
body double Body grade
balance double Balance grade
uniformity double Uniformity grade
clean_cup double Clean cup grade
sweetness double Sweetness grade
cupper_points double Cupper Points
moisture double Moisture Grade
category_one_defects double Category one defects (count)
quakers double quakers
color character Color of bean
category_two_defects double Category two defects (count)
expiration character Expiration date of the beans
certification_body character Who certified it
certification_address character Certification body address
certification_contact character Certification contact
unit_of_measurement character Unit of measurement
altitude_low_meters double Altitude low meters
altitude_high_meters double Altitude high meters
altitude_mean_meters double Altitude mean meters

Cleaning Script

library(tidyverse)

raw_arabica <- read_csv("https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv") %>% 
  janitor::clean_names()

raw_robusta <- read_csv("https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/robusta_data_cleaned.csv",
                        col_types = cols(
                          X1 = col_double(),
                          Species = col_character(),
                          Owner = col_character(),
                          Country.of.Origin = col_character(),
                          Farm.Name = col_character(),
                          Lot.Number = col_character(),
                          Mill = col_character(),
                          ICO.Number = col_character(),
                          Company = col_character(),
                          Altitude = col_character(),
                          Region = col_character(),
                          Producer = col_character(),
                          Number.of.Bags = col_double(),
                          Bag.Weight = col_character(),
                          In.Country.Partner = col_character(),
                          Harvest.Year = col_character(),
                          Grading.Date = col_character(),
                          Owner.1 = col_character(),
                          Variety = col_character(),
                          Processing.Method = col_character(),
                          Fragrance...Aroma = col_double(),
                          Flavor = col_double(),
                          Aftertaste = col_double(),
                          Salt...Acid = col_double(),
                          Balance = col_double(),
                          Uniform.Cup = col_double(),
                          Clean.Cup = col_double(),
                          Bitter...Sweet = col_double(),
                          Cupper.Points = col_double(),
                          Total.Cup.Points = col_double(),
                          Moisture = col_double(),
                          Category.One.Defects = col_double(),
                          Quakers = col_double(),
                          Color = col_character(),
                          Category.Two.Defects = col_double(),
                          Expiration = col_character(),
                          Certification.Body = col_character(),
                          Certification.Address = col_character(),
                          Certification.Contact = col_character(),
                          unit_of_measurement = col_character(),
                          altitude_low_meters = col_double(),
                          altitude_high_meters = col_double(),
                          altitude_mean_meters = col_double()
                        )) %>% 
  janitor::clean_names() %>% 
  rename(acidity = salt_acid, sweetness = bitter_sweet,
         aroma = fragrance_aroma, body = mouthfeel,uniformity = uniform_cup)


all_ratings <- bind_rows(raw_arabica, raw_robusta) %>% 
  select(-x1) %>% 
  select(total_cup_points, species, everything())

all_ratings %>% 
  skimr::skim()

all_ratings %>% 
  write_csv("2020/2020-07-07/coffee_ratings.csv")