The data this week comes from Eurostat.
H/t to Karim Douïeb who created a very nice graphic in Observable (D3), based off a Washington post article for US Energy. Their graphic can be found below.
Additional data can be found via the OECD.
There's also a nice report for 2017 form the EU Power sector. Lots of graphics inside.
Thermal power according to Wikipedia:
A thermal power station is a power station in which heat energy is converted to electric power. In most, a steam-driven turbine converts heat to mechanical power as an intermediate to electrical power. Water is heated, turns into steam and drives a steam turbine which drives an electrical generator.
Clean vs renewable vs fossil fuels by Peninsula Energy.
Clean energy is carbon-free energy that creates little to no greenhouse gas emissions. This is in contrast to fossil fuels, which produce a significant amount of greenhouse gas emissions, including carbon dioxide and methane. Renewable energy is energy that comes from resources that are naturally replenished such as sunlight, wind, water, and geothermal heat. Unlike fossil fuels, such as oil, natural gas and coal, which cannot be replaced, renewable energy regenerates naturally in a short period of time.
Overall, this dataset was an exploration of pulling data from an Excel-based data product. This was a bit messy to start but applying techniques to extract the data out of repeated tables and into a tidy format is possible programatically with a bit of logic + readxl
! Check out the cleaning script for the details.
# Get the Data
# Read in with tidytuesdayR package
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest
# Either ISO-8601 date or year/week works!
tuesdata <- tidytuesdayR::tt_load('2020-08-04')
tuesdata <- tidytuesdayR::tt_load(2020, week = 32)
energy_types <- tuesdata$energy_types
# Or read in the data manually
energy_types <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-08-04/energy_types.csv')
country_totals <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-08-04/country_totals.csv')
Limited to Level 1 or 2 production by type - either conventional thermal (fossil fuels), nuclear, hydro, wind, solar, geothermal, or other.
variable | class | description |
---|---|---|
country | character | Country ID |
country_name | character | Country name |
type | character | Type of energy production |
level | character | Level - either total, level 1 or level 2. Where level 2 makes up level 1 that makes up the total. |
2016 | double | Energy in GWh (Gigawatt hours) |
2017 | double | Energy in GWh (Gigawatt hours) |
2018 | double | Energy in GWh (Gigawatt hours) |
Limited to total net production, along with imports, exports, energy lost, and energy supplied (net + import - export - energy absorbed by pumping).
variable | class | description |
---|---|---|
country | character | Country ID |
country_name | character | Country name |
type | character | Type of energy production |
level | character | Level - either total, level 1 or level 2. Where level 2 makes up level 1 that makes up the total. |
2016 | double | Energy in GWh (Gigawatt hours) |
2017 | double | Energy in GWh (Gigawatt hours) |
2018 | double | Energy in GWh (Gigawatt hours) |
library(tidyverse)
library(readxl)
library(countrycode)
raw_code <- countrycode::codelist %>%
select(country_name = country.name.en, country = eurostat)
raw_excel <- read_excel("2020/2020-08-04/Electricity_generation_statistics_2019.xlsx", sheet = 3)
raw_excel %>%
filter(!is.na(...4)) %>%
mutate(country = str_remove_all(...4, "[:digit:]"), .before = ...1) %>%
mutate(country = if_else(
str_length(country) > 1, country, NA_character_),
country = str_extract(country, "[:alpha:]+")
) %>%
fill(country) %>%
select(-c(...1, ...2, ...14:...18))
row_stat <- read_excel("2020/2020-08-04/Electricity_generation_statistics_2019.xlsx",
sheet = 3,
range = "C48:C61", col_names = FALSE)[[1]][c(1,3:14)] %>%
str_remove("[:digit:]") %>%
str_remove("of which: ") %>%
str_remove("\\.") %>% str_trim()
country_range <- tibble(row_start = seq(from = 46, to = 454, by = 34),
row_end = seq(from = 61, to = 469, by = 34)) %>%
mutate(col1 = 4, col2 = col1 + 5, col3 = col2 + 5) %>%
pivot_longer(cols = col1:col3, names_to = "col_var", values_to = "col_start") %>%
mutate(col_end = col_start + 2) %>%
select(-col_var) %>%
slice(-n(), -(n()-1)) %>%
mutate(row_stat = list(row_stat))
get_country_stats <- function(row_start, row_end, col_start, col_end, row_stat){
# # pull the row_stat names
# row_stat <- row_stat
# create the range programatically
col_range <- glue::glue("{LETTERS[col_start]}{row_start}:{LETTERS[col_end]}{row_end}")
# read in the data section quietly
raw_data <- suppressMessages(
read_excel("2020/2020-08-04/Electricity_generation_statistics_2019.xlsx",
sheet = 3,
col_names = FALSE,
range = col_range))
country_data <- raw_data %>%
# set appropriate names
set_names(nm = c(2016:2018)) %>%
# drop the year ranges
filter(!is.na(`2016`), `2016` != "2016") %>%
# get the country into a column rather than a header
mutate(country = if_else(
is.na(`2017`),
`2016`,
NA_character_),
.before = `2016`) %>%
# fill country down
fill(country) %>%
# drop old country header
filter(!is.na(`2017`)) %>%
# add row stat in
mutate(type = row_stat,
.after = country,
# add levels of the stats
level = c("Total", "Level 1", "Level 1", "Level 1", "Level 2",
"Level 1", "Level 1", "Level 1", "Level 1", "Total",
"Total", "Total", "Total")) %>%
# format as double
mutate(across(c(`2016`:`2018`), as.double))
# return data
country_data
}
all_countries <- country_range %>%
pmap_dfr(get_country_stats) %>%
left_join(raw_code, by = "country") %>%
select(country, country_name, everything())
country_totals <- all_countries %>%
filter(level == "Total")
country_production <- all_countries %>%
filter(level != "Total")
# sanity check
country_totals %>%
# filter(type == "Total net production") %>%
pivot_longer(cols = `2016`:`2018`, names_to = "year", values_to = "value") %>%
filter(type == "Total net production") %>%
# count(type)%>%
ggplot(aes(y = value, x = year, color = country, group = country)) +
geom_line()
write_csv(country_totals, "2020/2020-08-04/country_totals.csv")
write_csv(country_production, "2020/2020-08-04/energy_types.csv")