O Freedom, My Freedom!
March 15, 2022
It’s 2022, where technology is driving the world with data and information, often times it turns out to be private data, and misinformation. However this unprecedented ease of communication allowed humans to express themselves in more divers and creative ways. But for this expression to grow, thrive, and produce an output freedom is mandatory.
In this context Freedom house together with the United Nations created a data-set on the state of freedom in the world. As part of the Tidy Tuesday project We will check this data and see what we can come up with.
Loading libraries
library(tidyverse)
# we probably will need GT, sf, and biscale
library(gt)
library(sf)
library(biscale)
Data Summary
Let’s first import data (and clean it’s column names)
freedom <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-22/freedom.csv') %>%
janitor::clean_names()
freedom
## # A tibble: 4,979 × 8
## country year cl pr status region_code region_name is_ldc
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
## 1 Afghanistan 1995 7 7 NF 142 Asia 1
## 2 Afghanistan 1996 7 7 NF 142 Asia 1
## 3 Afghanistan 1997 7 7 NF 142 Asia 1
## 4 Afghanistan 1998 7 7 NF 142 Asia 1
## 5 Afghanistan 1999 7 7 NF 142 Asia 1
## 6 Afghanistan 2000 7 7 NF 142 Asia 1
## 7 Afghanistan 2001 7 7 NF 142 Asia 1
## 8 Afghanistan 2002 6 6 NF 142 Asia 1
## 9 Afghanistan 2003 6 6 NF 142 Asia 1
## 10 Afghanistan 2004 6 5 NF 142 Asia 1
## # … with 4,969 more rows
A quick look at the summary stats
freedom %>%
summary()
## country year cl pr
## Length:4979 Min. :1995 Min. :1.000 Min. :1.000
## Class :character 1st Qu.:2001 1st Qu.:2.000 1st Qu.:1.000
## Mode :character Median :2008 Median :3.000 Median :3.000
## Mean :2008 Mean :3.369 Mean :3.411
## 3rd Qu.:2014 3rd Qu.:5.000 3rd Qu.:6.000
## Max. :2020 Max. :7.000 Max. :7.000
## status region_code region_name is_ldc
## Length:4979 Min. : 2.00 Length:4979 Min. :0.0000
## Class :character 1st Qu.: 2.00 Class :character 1st Qu.:0.0000
## Mode :character Median : 19.00 Mode :character Median :0.0000
## Mean : 72.53 Mean :0.2362
## 3rd Qu.:142.00 3rd Qu.:0.0000
## Max. :150.00 Max. :1.0000
It might be a little easier if we simply convert the character cols to factors
freedom %>%
mutate(
country = as_factor(country),
status = as_factor(status),
region_name = as_factor(region_name)
) %>%
summary()
## country year cl pr
## Afghanistan : 26 Min. :1995 Min. :1.000 Min. :1.000
## Albania : 26 1st Qu.:2001 1st Qu.:2.000 1st Qu.:1.000
## Algeria : 26 Median :2008 Median :3.000 Median :3.000
## Andorra : 26 Mean :2008 Mean :3.369 Mean :3.411
## Angola : 26 3rd Qu.:2014 3rd Qu.:5.000 3rd Qu.:6.000
## Antigua and Barbuda: 26 Max. :2020 Max. :7.000 Max. :7.000
## (Other) :4823
## status region_code region_name is_ldc
## NF:1257 Min. : 2.00 Asia :1218 Min. :0.0000
## PF:1503 1st Qu.: 2.00 Europe :1099 1st Qu.:0.0000
## F :2219 Median : 19.00 Africa :1388 Median :0.0000
## Mean : 72.53 Americas: 910 Mean :0.2362
## 3rd Qu.:142.00 Oceania : 364 3rd Qu.:0.0000
## Max. :150.00 Max. :1.0000
##
Unfortunately we can’t get much information using the transformation we did because there are various years for each of the factors, thus in order to get a better idea on the numbers for each factor let’s choose one year and filter.
freedom %>%
mutate(
country = as_factor(country),
status = as_factor(status),
region_name = as_factor(region_name)
) %>%
filter(
year == 2020
) %>%
summary()
## country year cl pr
## Afghanistan : 1 Min. :2020 Min. :1.00 Min. :1.000
## Albania : 1 1st Qu.:2020 1st Qu.:2.00 1st Qu.:1.000
## Algeria : 1 Median :2020 Median :3.00 Median :3.000
## Andorra : 1 Mean :2020 Mean :3.42 Mean :3.554
## Angola : 1 3rd Qu.:2020 3rd Qu.:5.00 3rd Qu.:6.000
## Antigua and Barbuda: 1 Max. :2020 Max. :7.00 Max. :7.000
## (Other) :187
## status region_code region_name is_ldc
## NF:54 Min. : 2.00 Asia :47 Min. :0.0000
## PF:58 1st Qu.: 2.00 Europe :43 1st Qu.:0.0000
## F :81 Median : 19.00 Africa :54 Median :0.0000
## Mean : 72.66 Americas:35 Mean :0.2383
## 3rd Qu.:142.00 Oceania :14 3rd Qu.:0.0000
## Max. :150.00 Max. :1.0000
##
With this simple filter we can see in the summary output how many countries we have in each region as well as the general status (free(F), partially free(PF), and not free(NF)) for all countries.
Let’s check the data time wise
freedom %>%
count(region_name, year) %>%
ggplot()+
geom_line(
aes(
year,
n,
color = region_name
)
) +
labs(
title = 'The number of observations(countries) for each region over time'
)
It’s probably not the best option to use a line chart to describe slight changes in the data, in our case the number of observation per region. However it’s a fast solution to check things out fast, in this case we can detect that for the African and Asian regions there has been an increase in the number of countries in one instance, however for the Europe region this happened twice. I doubt this is going to affect our approach, but let’s keep it in mind for further analysis.
Exploration
The first idea that comes in mind for me is to check how the freedom status for each region changed over the years. But first, you noticed the first chart is quite vanilla, by my standards I would say it lacks aesthetics. Yes I’m doing all this for fun, but it doesn’t mean I should compromise on beauty and style. Therefore I will start by setting up a theme that will be used hereafter.
Le theme
le_theme <- theme(
panel.background = element_rect(
fill = '#E5E3C9',
color = NA
),
plot.background = element_rect(
fill = '#E5E3C9',
color = NA
),
strip.background = element_rect(
fill = '#E5E3C9',
color = NA
),
legend.background = element_rect(
fill = '#E5E3C9',
color = NA
),
legend.box.background = element_rect(
fill = '#E5E3C9',
color = NA
),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(
colour = '#F4FCD9'
),
panel.grid.minor.y = element_blank(),
panel.grid.minor.x = element_line(
colour = '#F4FCD9'
),
legend.text = element_text(
family = 'URWGothic'
),
legend.title = element_text(
family = 'URWGothic'
),
plot.title = element_text(
family = 'URWGothic'
),
plot.subtitle = element_text(
family = 'URWGothic'
),
axis.text = element_text(
family = 'URWGothic'
),
axis.title = element_text(
family = 'URWGothic'
),
strip.text = element_text(
family = 'URWGothic'
),
legend.key = element_rect(colour = NA, fill = NA),
)
Another Important Aesthetic addition in my oppinion is the use of a different color palette. I didn’t have much time to set my own palettes, but using the MetBrewer
package we can use pre-built palettes inspired from the worlds most famous paintings. It’s brilliant package, shout-out to the maintainers.
freedom %>%
group_by(region_name, year, status) %>%
summarise(
status,
count = n()
) %>%
ggplot(
aes(
year,
count
)
) +
geom_point(
aes(
color = status
) ,
size = 0.6,
alpha = 0.3
)+
geom_line(
aes(
color = status
)
) +
facet_grid(region_name~.) +
##############################################################################
# Using the MetBrewer package to set the palettes manually for the color scales.
scale_color_manual(
values= MetBrewer::met.brewer("VanGogh2", 3)
) +
##############################################################################
labs(
title = 'Count of countries per status and region from 1995 to 2020'
) +
le_theme
I like aggregation, it gives general tendencies and ease up investigation. What if we look at the global aggregate of the status over the years.
freedom %>%
group_by(status, year) %>%
summarise(
count = n()
) %>%
ggplot() +
geom_line(
aes(
year,
count,
color = status
)
) +
scale_color_manual(
values= MetBrewer::met.brewer("VanGogh2", 3)
) +
labs(
title = 'Number of countries per status between 1995 and 2020'
) +
le_theme
At the aggregate level, we can observe that the difference between 1995 and 2020 in the number of free countries is positive, however also the difference in the number of non free country in the period is positive, wile the less countries are partially free in 2020 compared to 1995.
Let’s dig deep on the changes, this a great use case for the gt
package.
freedom %>%
group_by(status, year) %>%
summarise(
count = n()
) %>%
filter(
year %in% c(1995,2020)
) %>%
pivot_wider(
names_from = year,
values_from = count
) %>%
mutate(
difference = `2020` - `1995`,
percentage = difference / `2020`
) %>%
ungroup() %>%
gt() %>%
fmt_percent(
columns = percentage
) %>%
cols_align(
columns = c(percentage, difference),
align = 'right'
) %>%
# add black thick border to the left of the difference column
tab_style(
style = list(
cell_borders(
sides = 'left',
color = 'black',
weight = px(3)
)
),
locations = list(
cells_body(
columns = difference
)
)
) %>%
# add black thick border to the bottom of the column names
tab_style(
style = list(
cell_borders(
sides = 'bottom',
color = 'black',
weight = px(3)
)
),
locations = list(
cells_column_labels(
columns = everything()
)
)
) %>%
data_color(
columns = difference,
colors = scales::col_numeric(
palette = as.character(paletteer::paletteer_d("ggsci::red_material", n = 3)),
domain = NULL
)
) %>%
data_color(
columns = percentage,
colors = scales::col_numeric(
palette = as.character(paletteer::paletteer_d("nord::frost")),
domain = NULL
)
) %>%
tab_header(
title = 'Global difference in Status between 1995 and 2020',
subtitle = 'Is the world more free in 2020 than in 1995?'
) %>%
tab_source_note(
'@Bennour007sin|www.bennour.tn'
) %>%
tab_source_note(
'data: UN & Freedom House'
)
Global difference in Status between 1995 and 2020 | ||||
---|---|---|---|---|
Is the world more free in 2020 than in 1995? | ||||
status | 1995 | 2020 | difference | percentage |
F | 76 | 81 | 5 | 6.17% |
NF | 52 | 54 | 2 | 3.70% |
PF | 61 | 58 | -3 | −5.17% |
@Bennour007sin|www.bennour.tn | ||||
data: UN & Freedom House |
Digging deeper to check for the relative changes in the general status between 1995 and 2020 for each continent is an interesting idea after reviewing these status in the global context. For this one I decided to fo something different and make an interactive chart using ggplotly()
.
plt <- freedom %>%
group_by(region_name, status, year) %>%
summarise(
count = n()
) %>%
filter(
year %in% c(1995,2020)
) %>%
mutate(
year = as_factor(year)
) %>%
ungroup(status, year) %>%
pivot_wider(
names_from = year,
values_from = count
) %>%
mutate(
pct_change = (`1995` - `2020`) / `2020`
) %>%
pivot_longer(
cols = 3:4,
names_to = 'year',
values_to = 'count'
) %>%
ggplot(
aes(
pct_change,
status
)
)+
geom_col(
aes(
count,
status,
fill = year
),
width = 0.09,
alpha = 0.7,
position = 'dodge'
) +
geom_point(
aes(
pct_change,
status
),
shape = 18,
color = '#A68DAD',
size = 4
) +
scale_x_continuous(
labels = scales::percent_format(
accuracy = 1,
scale = 1
)
) +
scale_fill_manual(
values= MetBrewer::met.brewer("VanGogh2", 2)
) +
facet_grid(~region_name, scales = 'free_x') +
labs(
title = 'Number of countries and percentage of change in each status in 1995 and 2020 per region'
) +
le_theme
plotly::ggplotly(plt)
The users can hoover on the components of the chart and check the difference on the spot. But somehow this is not rendering in the right manner, if anyone can help please reach out.
Liberties : Political and Civil
What is the state of Civil and Political liberties over the continents
freedom %>%
group_by(region_name) %>%
summarise(
avg_CL = mean(cl),
avg_PR = mean(pr),
avg_L = (avg_CL + avg_PR)/2
) %>%
pivot_longer(
cols = 2:3,
names_to = 'index',
values_to = 'value'
) %>%
mutate(
region_name = fct_reorder(region_name, avg_L, .desc = T)
) %>%
ggplot()+
geom_col(
aes(
value,
region_name,
fill = index
),
position = 'dodge'
) +
geom_point(
aes(
avg_L,
region_name
),
shape = 18,
size = 10,
color = '#A68DAD'
) +
labs(
title = 'Overall year/region mean of Civil and Political liberties',
subtitle = 'less is better, and Europe is the Freeiest region in the world',
y = 'Region'
) +
scale_fill_manual(
values= MetBrewer::met.brewer("VanGogh2", 2),
name = 'Average Liberty',
labels = c('Civil', 'Political')
) +
le_theme
As an African I’m usually curious in investigating data regarding my region, this time I was inspirded by CÉDRIC SCHERER’s Visualisation of the Malaria Data from the WHO.
You can check his code here and the chart itself here.
The reason I appreciated this chart was the Artist’s use of bi-scale legend to denote the dispurtion of the Malaria virus. In our case it can be ver useful to apply the same principle to make a bi-scale for the civil and political liberty at the same time on the same chart.
Making a 2-dimensional legend scale is made possible by the biscale
package which is what we will make below.
Le map
To make this map we need to get geometric data of the African continent, this possible with the sf
package.
africa_map_data <- st_as_sf(
rworldmap::getMap(resolution = 'high')
) %>%
filter(
continent == 'Africa'
)
Once the geometric data is ready, we must create our main data, the one we will use to make the map itself. i.e: We will need to join the freedom data with the geometric data.
At the same time we will need to let ggplot
know that we will be using bi-dimensional scales by using the bi_class()
and the bi_scale_fill()
functions.
In the bi_class()
function we set up the the dimensions we will use to create our new scale, in our case those are the political right and civil liberty variables. We need to also make the fill mapping geom_sf
assigned to bi_class
, and we also need to set a palette theme in the bi_scale_fill
.
Shout-out to the maintainers of this package for the great documentation and for Cedric for his constant sharing of his art, it definitely helps.
africa_map <- africa_map_data %>%
left_join(
freedom %>%
filter(year == 2020),
by = c("NAME"="country")
) %>%
bi_class(
x = pr,
y = cl,
dim= 3,
style = 'quantile'
) %>%
ggplot() +
geom_sf(
aes(
fill = bi_class
),
color = "grey20",
lwd = 0.2,
show.legend = F
) +
bi_scale_fill(
pal = "DkCyan",
dim = 3
)+
labs(
title = 'Political and Civil Liberties in Africa in 2020',
subtitle = 'Lighter colors denote more freedom',
caption = "@Bennour007sin|www.bennour.tn \n data: UN & Freedom House"
)
All what is left for us to do is set up the legend itself to help the readers navigate the map, we do that by determining the extent of the legend, in our case we will use a 3x3 map. We will also need to set up the palette theme it must be the same as the map theme of course. Lastly we set up the labs for the legend, the background color, the size, and the fonts, then we should be ready to combine all of this in one chart.
africa_map_legend <- bi_legend(
pal = "DkCyan",
dim = 3,
xlab = "Political Rights",
ylab = "Civil Liberties"
) +
bi_theme(
bg_color = "#D3E4CD",
base_family = "Changa One",
base_size = 15
)
Finaly we use the cowplot
package to combine both the map and the legend and we end up with both the map and the legend together. We will use a different theme so everything look better before we render.
le_theme_map <- theme(
axis.ticks = element_blank(),
axis.text = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(color = NA,
fill = "#D3E4CD"),
plot.background = element_rect(color = NA,
fill = "#D3E4CD",
size = 5),
plot.title = element_text(family = "Changa One",
color = "black",
size = 25,
face = "bold",
hjust = 0.5,
margin = margin(t = 36, b = 6)),
plot.subtitle = element_text(family = "Changa One",
color = "#5F939A",
size = 20,
hjust = 0.5,
margin = margin(t = 6, b = 20)),
plot.caption = element_text(family = "Changa One",
color = "#806A8A",
size = 14,
face = "plain",
hjust = 0.5,
margin = margin(t = 0, b = 36))
)
africa_map_final <- cowplot::ggdraw() +
cowplot::draw_plot(africa_map + le_theme, 0, 0, 1, 1) +
cowplot::draw_plot(africa_map_legend, 0.15, 0.25, 0.2, 0.2)
Cheers!