Elections project
January 1, 0001
As you may have read on my blog, it has been a while since I decided to look into how elections work and how a change in the voting system in Tunisia could dramatically change the outcome.
At first I wasn’t overwhelmed by the process or by the work itself. However, the deeper I dove into the technical issues with my code, the more I realized THIS SUCKS BIG TIME.
Honestly, most of the problems I ran into were related to low-quality data. Anyway, crying over the incompetence of the ISIE data managers won’t change anything now.
Basically, what took me so long was that I kept getting confused with R during this project. Relearning Python and cleaning the rust out of my memory was really time-consuming, and I still need to do a lot of work to upgrade my coding skills and algorithmic thinking. In addition, some tweaks may be needed to make the code not only functional, as it is now, but also elegant and optimal.
Now that I am more comfortable in the Pythonic world of dynamism and ease, THE COMMUNITY REALLY NEEDS TO FIND A WAY TO MIMIC RMD. This entire website is written in Rmd with RStudio, and I haven’t come across any reporting tool as elegant and pertinent as Rmd and the knitr package in R.
In what follows, we will discuss some functionality and methodology, and define a few concepts for readers with no experience in the subject.
I will not show all the code I used in this report, but it will be available in my repo. If you check it out, please leave me some feedback.
I will divide the work into phases, each focusing on a specific task on the data.
Defining the files directory and paths
This matters for the importation phase, which loops over the paths to access all the files and get all their names.
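As a rough sketch of this step, the snippet below collects every CSV file in a folder and derives a name from each file name. The helper `list_data_files`, the folder name `data`, and the `.csv` extension are assumptions made for illustration; the actual layout of the project is not shown in this post.

```python
from pathlib import Path

def list_data_files(data_dir, pattern="*.csv"):
    """Return {name: path} for every matching file in data_dir, sorted by name."""
    folder = Path(data_dir)
    # Path.stem drops the extension, e.g. "sousse.csv" -> "sousse"
    return {path.stem: path for path in sorted(folder.glob(pattern))}

# Hypothetical usage: "data" is an assumed folder name.
# paths = list_data_files("data")
```

Sorting the paths gives a stable order from one run to the next, which helps when the names are later matched against another table.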
Phase : Importation
Import the files with pandas into a dictionary that we can loop over in the analysis.
We could also import the data from the GitHub repo, but that would require further preprocessing.
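A minimal sketch of the importation, assuming one CSV file per region. The function name `import_all` and the CSV format are assumptions; only the idea of a dictionary of DataFrames called `dfs` comes from this post.

```python
from pathlib import Path
import pandas as pd

def import_all(data_dir, pattern="*.csv"):
    """Read every matching file into a dict of DataFrames keyed by file name."""
    dfs = {}
    for path in sorted(Path(data_dir).glob(pattern)):
        # key: file name without extension, value: the file's contents
        dfs[path.stem] = pd.read_csv(path)
    return dfs

# Hypothetical usage: "data" is an assumed folder name.
# dfs = import_all("data")
```

`pd.read_csv` also accepts a URL, so the same loop could in principle read raw files straight from a repository instead of a local folder.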
Phase : Create a preparation function
We have to prepare the data in ‘dfs’ to get the total votes for each list.
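import numpy as np
import pandas as pd

# This function takes the dataframe, sums the votes row-wise for each list,
# and returns the list name with its total votes.
def prep(df):
    # Sum only the vote columns; the 'list' column holds names, and recent
    # pandas versions may raise an error when asked to sum it with the counts.
    sumv = df.drop(columns='list').sum(axis=1)
    df_0 = pd.concat([pd.DataFrame({'sumv': sumv}), df['list']], axis=1)
    return df_0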
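To make the input and output shapes concrete, here is a toy example with made-up numbers. It assumes one row per list, one vote column per polling center, and a `list` column with the names; the column names `center_a` and `center_b` are invented for the example.

```python
import pandas as pd

# Toy input (made-up numbers): one row per list, one column per polling center.
df = pd.DataFrame({
    "center_a": [120, 80, 15],
    "center_b": [200, 95, 30],
    "list": ["list A", "list B", "list C"],
})

# Same idea as prep(): total votes per row, kept next to the list name.
totals = pd.concat(
    [pd.DataFrame({"sumv": df.drop(columns="list").sum(axis=1)}), df["list"]],
    axis=1,
)
print(totals)
```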
Phase: Define and prepare Hare quota arguments
The Hare quota (HQ) is the total number of votes in a given city divided by the number of seats for that city. Based on the HQ we will create a table that contains:
- the electoral quota Q of each list (column q): Q = 1 means the list gets 1 seat, and so on.
- the quota seats QS (column qseats): the seats won in full through the vote quota.
- the remainder R (column r): the part of the quota that did not earn the list a full seat.
- the percentage P (column p): the list's share of all votes.
def hare(df, s):
    # total votes
    ts = np.sum(df.sumv, axis=0)
    # Hare quota
    hq = np.round(ts / s, decimals=3)
    # Hare quota per list
    df['q'] = df.sumv / hq
    # quota seats
    df['qseats'] = np.fix(df.q)
    # remains
    df['r'] = df.q - df.qseats
    # percentage
    df['p'] = df.sumv / ts
    # sort the values with the highest remains first
    df = df.sort_values('r', ascending=False)
    return df
This is the result in Sousse:
py$tmp %>% kable
index | sumv | list | q | qseats | r | p |
---|---|---|---|---|---|---|
14 | 102604 | قائمة حركة نداء تونس | 4.9290930 | 4 | 0.9290930 | 0.4929093 |
9 | 12360 | قائمة حزب آفاق تونس | 0.5937740 | 0 | 0.5937740 | 0.0593774 |
15 | 50820 | قائمة حزب حركة النهضة | 2.4413912 | 2 | 0.4413912 | 0.2441391 |
41 | 8626 | قائمة حزب المبادرة | 0.4143928 | 0 | 0.4143928 | 0.0414393 |
33 | 5502 | قائمة الجبهة الشعبية | 0.2643159 | 0 | 0.2643159 | 0.0264316 |
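The first row of the table can be checked by hand. The sketch below redoes the arithmetic for the top list in Sousse. The total number of votes is not printed in the table, so it is inferred here as sumv / p, which is an assumption of this example. The 10 seats for Sousse come from the seats table further down.

```python
import math

seats = 10            # Sousse, from the seats table in this post
votes = 102604        # sumv of the first row in the table above
share = 0.4929093     # p of the first row in the table above

# Total votes implied by the table: sumv / p (an inferred figure).
total_votes = round(votes / share)
hare_quota = total_votes / seats

q = votes / hare_quota          # electoral quota of the list
quota_seats = math.floor(q)     # full seats won through the quota
remainder = q - quota_seats     # fractional part used in the next phase

print(total_votes, hare_quota, round(q, 4), quota_seats, round(remainder, 4))
```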
Phase : Prepare and perform computations for seats allocation
In this phase we will implement the largest remainder allocation method for a given dataset. We will also make it possible to set a minimum percentage of the vote that a list must reach to be awarded the remaining seats. The lists are sorted by their remainders, and each list that satisfies the percentage condition receives one extra seat in that order. If every eligible list has received one and some seats are still unallocated, we go through the lists again in the same order until no seats are left.
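def seats(df, s, p):
    # keep only the lists above the minimum percentage p;
    # hare() has already sorted them by remainder, largest first
    df = df[df.p > p].reset_index()
    # seats still unallocated after the quota seats
    rs = int(s - df.qseats.sum())
    # one extra seat per list in remainder order, wrapping around
    # to the top of the table if seats are left after a full pass
    for i in range(rs):
        df.loc[i % len(df), 'qseats'] += 1
    return df[df.qseats > 0]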
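For readers who prefer to see the method without pandas, here is a small standalone sketch of a Hare quota allocation with largest remainders. The function name `largest_remainder` and its signature are made up for this example, and the total of 208160 votes is inferred from the Sousse table (sumv / p) rather than quoted from the data. Only the five rows shown in that table are used, with transliterated names.

```python
import math

def largest_remainder(votes, total_votes, seats, threshold=0.0):
    """Allocate seats with the Hare quota and largest remainders.

    votes: dict mapping list name -> vote count
    total_votes: all votes in the district (may include lists not in votes)
    threshold: minimum vote share a list needs to take part
    """
    quota = total_votes / seats
    # Keep only the lists above the threshold.
    eligible = {n: v for n, v in votes.items() if v / total_votes > threshold}
    if not eligible:
        return {}
    # Quota seats: the whole part of votes / quota.
    alloc = {n: math.floor(v / quota) for n, v in eligible.items()}
    # Order by remainder, largest first.
    order = sorted(eligible, key=lambda n: eligible[n] / quota - alloc[n],
                   reverse=True)
    remaining = seats - sum(alloc.values())
    # One extra seat per list in remainder order, wrapping around if needed.
    for i in range(remaining):
        alloc[order[i % len(order)]] += 1
    return {n: s for n, s in alloc.items() if s > 0}

# Top five Sousse rows from the table above (names transliterated).
sousse = {
    "Nidaa Tounes": 102604,
    "Afek Tounes": 12360,
    "Ennahdha": 50820,
    "Al Moubadara": 8626,
    "Popular Front": 5502,
}
allocation = largest_remainder(sousse, total_votes=208160, seats=10)
print(allocation)
```

With no threshold, six seats are won through the quota and the four remaining seats go to the four largest remainders, so the fifth list ends up with nothing.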
Phase : Combine all processing and computations
For future ease of use, testing, and debugging, it is convenient to create a function that combines all of the above. Let’s call it results().
def results(df, s, p):
    return seats(hare(prep(df), s), s, p)
Here are the results in Sousse:
row | index | sumv | list | q | qseats | r | p |
---|---|---|---|---|---|---|---|
0 | 14 | 102604 | قائمة حركة نداء تونس | 4.9290930 | 5 | 0.9290930 | 0.4929093 |
1 | 9 | 12360 | قائمة حزب آفاق تونس | 0.5937740 | 1 | 0.5937740 | 0.0593774 |
2 | 15 | 50820 | قائمة حزب حركة النهضة | 2.4413912 | 3 | 0.4413912 | 0.2441391 |
3 | 41 | 8626 | قائمة حزب المبادرة | 0.4143928 | 1 | 0.4143928 | 0.0414393 |
Phase : Get data on seats for each region
We use data from the Wikipedia article on the regional distribution of parliamentary seats in Tunisia,
link.
Of course, we do not need the whole page, only the table with each region and its associated seats.
الموقع | الدائرة الانتخابية | الأماكن | المقاعد |
---|---|---|---|
تونس (199 مقعد) | أريانة | ولاية أريانة | 8 |
تونس (199 مقعد) | باجة | ولاية باجة | 6 |
تونس (199 مقعد) | بن عروس | ولاية بن عروس | 10 |
تونس (199 مقعد) | بنزرت | ولاية بنزرت | 9 |
تونس (199 مقعد) | قابس | ولاية قابس | 7 |
Python orders Arabic names and Latin names differently, so the regions came out in two different orders. We had to reorder them so that the Arabic and Latin names line up consistently.
seats | gov |
---|---|
8 | ariana |
6 | beja |
10 | ben_arous |
9 | bizerte |
7 | gabes |
7 | gafsa |
8 | jendouba |
9 | kairouan |
8 | kasserine |
5 | kebili |
7 | mannouba |
6 | kef |
8 | mahdia |
9 | mednine |
9 | monastir |
7 | nabeul1 |
6 | nabeul2 |
7 | sfax1 |
9 | sfax2 |
8 | sidibouzid |
6 | siliana |
10 | sousse |
4 | tataouine |
4 | tozeur |
9 | tunis1 |
8 | tunis2 |
5 | zaghouan |
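One way to avoid depending on how Arabic strings sort is to translate the names first and sort afterwards. The sketch below does this for the first five districts of the tables above. The mapping dictionary `AR_TO_LATIN` and the column name `district` are assumptions of this example, not necessarily how the reordering was actually done.

```python
import pandas as pd

# Arabic district names from the Wikipedia table, mapped to the Latin keys
# of the seats table (first five rows only).
AR_TO_LATIN = {
    "أريانة": "ariana",
    "باجة": "beja",
    "بن عروس": "ben_arous",
    "بنزرت": "bizerte",
    "قابس": "gabes",
}

# Rows deliberately shuffled to show that the input order does not matter.
wiki = pd.DataFrame({
    "district": ["قابس", "أريانة", "بنزرت", "باجة", "بن عروس"],
    "seats": [7, 8, 9, 6, 10],
})

# Translate first, then sort on the Latin key.
wiki["gov"] = wiki["district"].map(AR_TO_LATIN)
seats_table = wiki.sort_values("gov").reset_index(drop=True)[["seats", "gov"]]
print(seats_table)
```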
Phase : Constructing the results
We will now use the table above to loop over the regions: for each one, we take its associated dataset in ‘dfs’ and pass it through the functions one by one, together with the corresponding number of seats from the seats column.
We will collect the results in a dictionary called ‘fr’, for final results. Each key is the name of a dataset, and each value is the dataset returned by the results function for the matching entry of the ‘dfs’ dictionary.
To make this iteration work, the names used as keys when we created the ‘dfs’ dictionary must be in the same order as the table, so we need to reimport the data in the proper order.
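A minimal sketch of this loop, assuming the keys of ‘dfs’ match the names in the gov column. The helper name `build_final_results` is made up, and the `results` function is passed in as an argument so that the snippet stays self-contained. Looking each dataset up by name is one possible way to sidestep the ordering problem, not necessarily the approach used in the project.

```python
import pandas as pd

def build_final_results(dfs, seats_table, results, p):
    """Apply results(df, s, p) to each region and collect the outputs in a dict."""
    fr = {}
    for s, gov in zip(seats_table["seats"], seats_table["gov"]):
        # Look the dataset up by name so the two orderings cannot drift apart.
        fr[gov] = results(dfs[gov], int(s), p)
    return fr

# Hypothetical usage, with dfs, seats_table and results defined as in this post:
# fr = build_final_results(dfs, seats_table, results, p)
```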
Phase : Cleaning the names of the lists.
In this section I discovered that some names in Arabic had been written in a terrible manner. This could create inconsistent results later on in the plotting, so we need to make sure the names of the lists that made it to parliament are clean.
0 |
---|
قائمة حركة نداء تونس |
قائمة حزب حركة النهضة |
قائمة حزب الاتحاد الوطني الحر |
قائمة الجبهة الشعبية |
قائمة حزب التيار الديمقراطي |
قائمة حزب التحالف |
قائمة حزب المؤتمر من أجل الجمهورية |
القائمة المستقلة الإقلاع |
قائمة حزب صوت الفلاحين |
قائمة تيار المحبة |
قائمة حركة الشعب |
قائمة حزب آفاق تونس |
قائمة الجبهة الوطنية للإنقاذ |
قائمة الوفاء لمشروع الشهيد |
قائمة حزب المبادرة |
قائمة المجد الجريد |
We also needed to clean up some issues in the list names, such as spaces, underscores, and further confusion caused by Arabic letters.
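The kind of cleanup described here could look like the sketch below. The function `clean_name` and its exact rules are assumptions; the real issues in the ISIE files may need other fixes. It uses only the standard library: Unicode normalization to fold alternative encodings of the same Arabic letter, removal of the tatweel stretching character, and collapsing of underscores and repeated spaces.

```python
import re
import unicodedata

def clean_name(name):
    """Normalise a list name so that visually identical names compare equal."""
    # NFKC folds Arabic presentation forms into the standard letters.
    name = unicodedata.normalize("NFKC", name)
    # Drop the tatweel (kashida) stretching character, U+0640.
    name = name.replace("\u0640", "")
    # Underscores and any run of whitespace become a single space.
    name = re.sub(r"[_\s]+", " ", name)
    return name.strip()

print(clean_name("  قائمة_حزب   آفاق تونس "))
```

Applying the same function to every name before grouping or plotting means that two spellings of one list collapse into a single label.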
The results : a simple graphic
I only used mainland data here; we haven’t integrated the votes of citizens abroad.
library(tidyverse)  # dplyr, forcats and ggplot2

tmp <- py$win
tmp %>%
  mutate(list = fct_reorder(list, seats)) %>%
  ggplot(aes(x = seats, y = list)) +
  geom_col(fill = "sky blue") +
  geom_text(aes(label = seats), hjust = -.2, color = "grey") +
  theme_light()
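The R chunk above reads a Python object called `win` with a `list` column and a `seats` column. How that table was built is not shown, so the following is only a guess at one way to do it: stack the per-region results from ‘fr’ and add up the seats of each list. The function name `total_seats` is made up for this sketch.

```python
import pandas as pd

def total_seats(fr):
    """Sum the seats won by each list over all regions in fr."""
    combined = pd.concat(list(fr.values()), ignore_index=True)
    win = (
        combined.groupby("list", as_index=False)["qseats"].sum()
        .rename(columns={"qseats": "seats"})
        .astype({"seats": int})
        .sort_values("seats", ascending=False)
        .reset_index(drop=True)
    )
    return win

# Hypothetical usage, with fr built as described earlier in this post:
# win = total_seats(fr)
```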
Summary and conclusion:
I used both Python and R to achieve this; technically, using the two together is ideal. The technical issues I faced during this little project, which I will keep updating and improving, are mainly related to the poor quality of the data that the ISIE shares with the public. The problem is that this is official, legal, court-approved public data. The least that the institution with the biggest budget in the entire country can do is provide quality data with clear standards. Putting all of this aside, I will very shortly be developing a Shiny app to further demonstrate the effect of the minimum percentage on the results. This was fun, and I am looking forward to your feedback.
TO BE CONTINUED.