Tunisian primary schools: the data is on!

January 1, 2021

This post explains the data set I collected and tidied using data from the Tunisian ministry of education.

The Tunisian ministry of education releases yearly reports that include statistics and count tables describing various aspects of the primary, preparatory and secondary schools. You can find the reports on the ministry's website here. I used the latest report, which is a PDF document, and extracted the data from two specific tables, on pages 24 and 40 respectively.

Unfortunately for the community, the reports are mostly in Arabic, with some annotations in French. I translated the column names to English. You can access the report I used, entitled “Statistics for schools 2018/2019”.

Brief description

The table on page 24 describes how the characteristics of Tunisian primary schools evolved over time; I cleaned it into the file primary_time_clean.csv. You can access that file in this repo.

The second data set describes the characteristics by state and comes from the table on page 40. The tidy and clean version of this table is in the primary_state_clean.csv file in the same repo here. It describes essentially the same characteristics as the previous table, but on a regional basis, state by state, for the year 2019.

Let’s take a look at the tidy version of the data

Let’s start with the second data set, THE STATE-WISE DATA:

library(tidyverse) # provides read_csv(), %>% and glimpse()

primary_state_clean <- read_csv("https://raw.githubusercontent.com/bennour007/education_data/master/data_primary_schools/primary_state_clean.csv")

primary_state_clean %>%
  glimpse()
## Rows: 1,404
## Columns: 9
## $ state                 <chr> "Tunis1", "Tunis1", "Tunis1", "Tunis1", "Tunis1…
## $ ratios                <chr> "pupil_to_teacher", "pupil_to_teacher", "pupil_…
## $ ratios_value          <dbl> 18.5, 18.5, 18.5, 18.5, 18.5, 18.5, 18.5, 18.5,…
## $ schools_char          <chr> "number_taught_classes", "number_taught_classes…
## $ char_count            <dbl> 1479, 1479, 1479, 1479, 1479, 1479, 1479, 1479,…
## $ teachers_gender       <chr> "teachers_female", "teachers_female", "teachers…
## $ teachers_gender_count <dbl> 1685, 1685, 1685, 465, 465, 465, 2150, 2150, 21…
## $ pupils_gender         <chr> "pupils_female", "pupils_all", "pupils_male", "…
## $ pupils_gender_count   <dbl> 19362, 39865, 20503, 19362, 39865, 20503, 19362…

As we can see, I tried to tidy up this data. The columns describe the following:

  • state: There are 24 states in Tunisia; 2 of them are administratively split into two regions each because they are heavily populated:

    • Tunis: split into Tunis1 and Tunis2.
    • Sfax: split into Sfax1 and Sfax2.
  • ratios: provides two ratio measures:

    • pupil_to_teacher: the number of pupils per teacher (aggregated by average)
    • pupil_to_classroom: the number of pupils per classroom (aggregated by average)
  • ratios_value: provides the value of each ratio.

  • schools_char: provides the characteristics of schools:

    • number_taught_classes: the number of pupil classes
    • number_classrooms: the number of classrooms
    • number_schools: the number of schools
  • char_count: the counts of the provided characteristics in schools_char

  • teachers_gender: the gender of the teaching staff

    • teachers_female: female teachers
    • teachers_male: male teachers
    • teachers_all: teachers of all genders
  • teachers_gender_count: the counts for each teachers_gender category

  • pupils_gender: the gender of the enrolled pupils

    • pupils_female: female pupils
    • pupils_male: male pupils
    • pupils_all: pupils of all genders
  • pupils_gender_count: the counts for each pupils_gender category
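
One practical consequence of this layout: every key/value pair is crossed with all the others, so each count is repeated many times in the glimpse() output above. The sketch below is a toy excerpt for Tunis1 only, built from the values visible in that output (the labels for 465 and 2150 are inferred from 1685 + 465 = 2150, since the printed labels are truncated). It is only meant to illustrate why it seems safer to call distinct() before summing or plotting a count; the object names are mine, not part of the data set.

```r
library(dplyr)

# Toy excerpt: Tunis1 only, with the two gender key/value pairs crossed (3 x 3 = 9 rows).
# Values are copied from the glimpse() output; this is not the full file.
tunis1 <- tibble(
  state                 = "Tunis1",
  teachers_gender       = rep(c("teachers_female", "teachers_male", "teachers_all"), each = 3),
  teachers_gender_count = rep(c(1685, 465, 2150), each = 3),
  pupils_gender         = rep(c("pupils_female", "pupils_all", "pupils_male"), times = 3),
  pupils_gender_count   = rep(c(19362, 39865, 20503), times = 3)
)

# Keep one row per teacher category before doing any arithmetic on the counts.
teachers <- tunis1 %>%
  distinct(state, teachers_gender, teachers_gender_count)

# If every region carries all category combinations, the full file should have
# 26 regions x 2 ratios x 3 school characteristics x 3 teacher rows x 3 pupil rows.
expected_rows <- 26 * 2 * 3 * 3 * 3

teachers
expected_rows
```

The expected row count works out to 1,404, which matches the number of rows reported by glimpse(). The same distinct() pattern should apply to primary_state_clean itself, for any of the key/value pairs.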

The first data set, THE TIME-WISE DATA, describes how the same variables changed from 1985 to 2019. I organized it similarly to the previous data set:

primary_time_clean <- read_csv("https://raw.githubusercontent.com/bennour007/education_data/master/data_primary_schools/primary_time_clean.csv")

primary_time_clean %>% 
  glimpse()
## Rows: 594
## Columns: 10
## $ female_prop_per_calss <dbl> 44.1, 44.1, 44.1, 44.1, 44.1, 44.1, 44.1, 44.1,…
## $ year                  <dbl> 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985,…
## $ ratios                <chr> "pupil_to_class", "pupil_to_class", "pupil_to_c…
## $ ratios_value          <dbl> 32.9, 32.9, 32.9, 32.9, 32.9, 32.9, 32.9, 32.9,…
## $ schools_char          <chr> "taught_classes", "taught_classes", "taught_cla…
## $ char_count            <dbl> 37705, 37705, 37705, 37705, 37705, 37705, 37705…
## $ teachers_gender       <chr> "teachers_female", "teachers_female", "teachers…
## $ teachers_gender_count <dbl> 13150, 13150, 13150, 24262, 24262, 24262, 37412…
## $ pupils_gender         <chr> "pupils_female", "pupils_male", "pupils_all", "…
## $ pupils_gender_count   <dbl> 546089, 692879, 1238968, 546089, 692879, 123896…

As you can see, the only difference is that this is a year-by-year data set with one new variable, female_prop_per_calss (the column name is spelled this way in the file), which is the average female proportion per class for that year. To give you some context, the Tunisian government decided to promote more gender equality, and to create the cultural basis for this move it started with primary education. This proportion seemed to be an acceptable KPI to measure how well the government is achieving its goal.
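
As a rough sanity check on that variable, the share of female pupils can be recomputed from the pupil counts. The sketch below uses only the 1985 values visible in the glimpse() output, and the name female_share is my own. It is a consistency check, not a statement of how the ministry actually defines the proportion.

```r
library(dplyr)

# 1985 pupil counts as they appear in the glimpse() output above.
pupils_1985 <- tibble(
  year                = 1985,
  pupils_gender       = c("pupils_female", "pupils_male", "pupils_all"),
  pupils_gender_count = c(546089, 692879, 1238968)
)

# Female pupils as a percentage of all pupils, one row per year.
female_share <- pupils_1985 %>%
  group_by(year) %>%
  summarise(
    female_share = 100 * pupils_gender_count[pupils_gender == "pupils_female"] /
      pupils_gender_count[pupils_gender == "pupils_all"]
  )

round(female_share$female_share, 1)
```

For 1985 this rounds to 44.1, the same value reported in the female proportion column for that year. On the full file, the same summarise() should work after first reducing the data with distinct(year, pupils_gender, pupils_gender_count).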

One important note

In the state-wise data set, the values concern the year 2019 only and are aggregated by average for each state. In the time-wise (or time series) data set, the values are aggregated nationally by average for each year.

Final thoughts

Of course, this is a mere attempt to tidy what I considered two important tables in that report. However, a more useful data set would include all of the described columns in a panel table, for each state and for consecutive years, to establish a consistent basis for comparison. If this data set successfully makes it to the TidyTuesday project (I submitted an issue on GitHub to propose it), the ministry may actually provide the community with panel data that would allow researchers to conduct sophisticated empirical research, and the community would enjoy a very interesting data set on the Tunisian experience in promoting equity in education.

If you want to check the code I used, you can access it in the same repo. Your feedback is welcome.

Thank you for stopping by.
