library(tidytuesdayR)
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)
Introduction
In this session, Sam Langton demonstrated long to wide (and wide to long) transformations using functions available in the tidyr package using data from the London Fire Brigade (Langton, 2022).
Preparation
Identify datasets available (fire service data).
<- tidytuesdayR::tt_load("2021-06-29") tt_df
Downloading file 1 of 1: `animal_rescues.csv`
Pull out the data.
<- tt_df$animal_rescues fire_df
Save for train working.
Harrie: don't know what this means
save.image("data/train_working.Rdata")
load("data/train_working.Rdata")
Basic cleaning.
<- fire_df %>%
fire_clean_df filter(cal_year < 2021) %>%
mutate(animal_group_broad = fct_lump(animal_group_parent, n = 5))
Transformations
Create some wide data. Freq counts of incidents for each animal by year, then wide. 2021 is not complete. Note animal category is just for the demo!
Take care, this is a crude recode.
<- fire_clean_df %>%
fire_wide1_df filter(cal_year < 2021) %>%
select(incident_number, cal_year, animal_group_broad) %>%
group_by(cal_year, animal_group_broad) %>%
summarise(yearly_count = n()) %>%
pivot_wider(names_from = cal_year, values_from = yearly_count)
Non-longitudinal example. Note the fiddly way of keeping zero counts. Requires an ungroup() and then complete(). There might be a better way of doing this!
<- fire_clean_df %>%
fire_wide2_df filter(cal_year == 2009) %>%
select(incident_number, animal_group_broad, property_category) %>%
group_by(animal_group_broad, property_category) %>%
summarise(yearly_count = n()) %>%
ungroup() %>%
complete(animal_group_broad, property_category, fill = list(yearly_count = 0)) %>%
pivot_wider(names_from = property_category, values_from = yearly_count)
Nested data / multilevel example.
<- fire_clean_df %>%
fire_wards_df group_by(borough_code, ward_code, cal_year) %>%
summarise(yearly_count = n()) %>%
ungroup() %>%
filter(ward_code != "NULL") %>%
arrange(borough_code, ward_code, cal_year)
Make wide for example, fill in zeros and arrange. Why interesting? Because it’s nested levels, longitudinal and ‘year’ in the var
names.
<- fire_wards_df %>%
fire_wide3_df arrange(cal_year) %>%
pivot_wider(names_from = cal_year, values_from = yearly_count, names_prefix = "year_") %>%
mutate_if(is.numeric, ~replace_na(., 0)) %>%
arrange(ward_code)
Remove things that might confuse people.
rm(fire_clean_df, fire_df, tt_df, fire_wards_df)
References
Langton, S. (2022, April). NSC-R Workshops: NSC-R Tidy Tuesday. NSCR. Retrieved from https://nscrweb.netlify.app/posts/2022-01-25-nsc-r-tidy-tuesday/