One of the suggested datasets was the Italian Football Data, so I picked a similar dataset for the Premier League, since I'm more familiar with it.
I got the data for the 2017/2018 season from football-data.co.uk. After exploring the dataset, I found that it has many columns (65) describing the events and results of each match (380 matches). After reading this note to figure out what each column represents, a simple question came to mind: does the home team have a better chance of winning? There are already a lot of studies about home advantage, its effect on the home team's performance, and the factors causing it.

I didn't want to make a complex study. It's my first time, and I want it to be as simple as possible, so I'll check whether the home team wins most of its home matches.
I picked these 3 columns (‘FTHG’, ‘FTAG’, ‘FTR’) to get the final result of each match and do the statistical calculations on them:
FTHG and HG = Full Time Home Team Goals
FTAG and AG = Full Time Away Team Goals
FTR and Res = Full Time Result (H = Home Win, D = Draw, A = Away Win)
So, from the ‘FTR’ column I can count how many matches the home team won, how many it lost and how many were a draw. Here is the count:

FTR        Matches
A (Loss)   108
D (Draw)   99
H (Win)    173
You can notice that home wins have the highest share of the three outcomes: 173 of 380 matches (about 45.5%), against 99 draws (26.1%) and 108 losses (28.4%).
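The counting step can be sketched in Python with only the standard library. This is an illustration, not the Excel workflow used in the project: the local file name "E0.csv" is an assumption (adjust it to wherever the season CSV was saved), and instead of re-reading the file the example plugs in the counts from the table above to compute each outcome's share.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read the season CSV into a list of dicts keyed by column name (FTHG, FTAG, FTR, ...)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def count_results(rows):
    """Count full-time results: 'H' = home win, 'D' = draw, 'A' = away win."""
    return Counter(row["FTR"] for row in rows)


def result_shares(counts):
    """Each result's share of all matches, as fractions that sum to 1."""
    total = sum(counts.values())
    return {result: n / total for result, n in counts.items()}


# With the downloaded file this would be something like:
#   counts = count_results(load_rows("E0.csv"))  # "E0.csv" is an assumed local file name
# Here the counts reported in the table are used directly.
counts = Counter({"H": 173, "D": 99, "A": 108})
for result, share in sorted(result_shares(counts).items()):
    print(f"{result}: {counts[result]} matches, {share:.1%}")
```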
Descriptive Statistics
It was required to include descriptive statistics in the project, so I created a new column labeled ‘HTGD’ (home team goal difference), which is the result of subtracting ‘FTAG’ from ‘FTHG’. The column is numerical, so I can calculate the descriptive statistics I learnt in the course, and it also encodes the result: HTGD > 0 means a home win, HTGD = 0 a draw, and HTGD < 0 a home loss.
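A minimal Python sketch of the HTGD column and a few descriptive statistics, assuming each match is a dict with the ‘FTHG’ and ‘FTAG’ columns as strings (as `csv.DictReader` would return them). The example rows are hypothetical and only show the calculation; they are not real 2017/2018 matches, so the output will not reproduce the season's statistics.

```python
import statistics


def goal_differences(rows):
    """HTGD = FTHG - FTAG for each match (home goals minus away goals)."""
    return [int(row["FTHG"]) - int(row["FTAG"]) for row in rows]


def result_from_gd(gd):
    """Map a home team goal difference to the result: >0 home win, 0 draw, <0 home loss."""
    if gd > 0:
        return "H"
    if gd < 0:
        return "A"
    return "D"


def describe(values):
    """A few basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical rows, not real match data.
example = [
    {"FTHG": "2", "FTAG": "0"},
    {"FTHG": "1", "FTAG": "1"},
    {"FTHG": "0", "FTAG": "3"},
    {"FTHG": "4", "FTAG": "1"},
]
htgd = goal_differences(example)
print(htgd)
print([result_from_gd(gd) for gd in htgd])
print(describe(htgd))
```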



From the previous statistics, the mean equals 0.38, which is > 0, and the histogram shows that values > 0 are more frequent than values < 0. This tells us that the home team wins more of its matches than it loses.
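The histogram comparison can be sketched as a frequency table of HTGD values plus a count of positive, zero and negative differences. The HTGD values below are hypothetical, not the real season; the text bars only stand in for the chart.

```python
from collections import Counter


def htgd_histogram(htgd):
    """Frequency of each goal-difference value, sorted by value."""
    return sorted(Counter(htgd).items())


def sign_split(htgd):
    """How many matches have HTGD > 0, HTGD == 0 and HTGD < 0."""
    return {
        "positive": sum(1 for gd in htgd if gd > 0),
        "zero": sum(1 for gd in htgd if gd == 0),
        "negative": sum(1 for gd in htgd if gd < 0),
    }


htgd = [2, 0, -3, 3, 1, 1, -1]  # hypothetical values, not the real season
for value, freq in htgd_histogram(htgd):
    print(f"{value:>3} {'#' * freq}")  # one '#' per match with that goal difference
print(sign_split(htgd))
```

Because the sign of HTGD encodes the result, on the full season the three counts from `sign_split` should equal the FTR counts (173 positive, 99 zero, 108 negative), which makes a handy consistency check.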
Inferential Statistics
z-test, t-test, ANOVA and chi-squared were all covered in the Intro to Inferential Statistics course. I chose the chi-squared goodness-of-fit test because the data being analyzed is nominal (the match result for the home team: win, draw or loss), and I used Cramer's V as the effect size measure.
Chi-square test hypotheses:
H0: Being the host doesn't affect match results (win: 33.33%, draw: 33.33%, loss: 33.33%)
Ha: Being the host affects match results (the three proportions are not all 33.33%; a home advantage would show up as wins > 33.33%)

The chi-square statistic is the sum of the last row in the previous image.
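The same statistic can be reproduced by hand from the FTR counts. Here O_i is the observed count for result i and E_i is the expected count under H0, which is N/3 = 380/3 for each result:

```latex
E_H = E_D = E_A = \frac{380}{3} \approx 126.67

\chi^2 = \sum_{i \in \{H, D, A\}} \frac{(O_i - E_i)^2}{E_i}
       \approx \frac{(173 - 126.67)^2}{126.67} + \frac{(99 - 126.67)^2}{126.67} + \frac{(108 - 126.67)^2}{126.67}
       \approx 16.95 + 6.04 + 2.75 = 25.74
```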
Results

A chi-square goodness-of-fit test was performed to determine whether the home team has a better chance of winning the match, χ²(2, N = 380) = 25.74, p < .0001.
Effect size measure: Cramer’s V = 0.18
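As a cross-check of the Excel results, here is a minimal standard-library Python sketch that recomputes the statistic, the p-value and Cramer's V from the FTR counts. It assumes the equal-proportions null hypothesis above and the common goodness-of-fit form of Cramer's V, sqrt(χ² / (N(k − 1))). The p-value shortcut is only valid for exactly 2 degrees of freedom, which is the case here with 3 outcomes.

```python
import math


def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over the categories."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom.

    With df = 2 the chi-square distribution is exponential with mean 2,
    so P(X >= x) = exp(-x / 2). Other df values need a different formula.
    """
    return math.exp(-chi2 / 2)


def cramers_v_gof(chi2, n, k):
    """Cramer's V for a goodness-of-fit test with k categories: sqrt(chi2 / (n * (k - 1)))."""
    return math.sqrt(chi2 / (n * (k - 1)))


observed = [173, 99, 108]               # home wins, draws, home losses (FTR counts)
expected_props = [1 / 3, 1 / 3, 1 / 3]  # H0: each result equally likely
n = sum(observed)
df = len(observed) - 1
chi2 = chi_square_gof(observed, expected_props)
p = p_value_df2(chi2)
v = cramers_v_gof(chi2, n, len(observed))
print(f"chi2({df}, N = {n}) = {chi2:.2f}, p = {p:.1e}, Cramer's V = {v:.2f}")
```

If SciPy is available, `scipy.stats.chisquare(observed)` (whose default expected frequencies are uniform) should give the same statistic and a p-value for any number of degrees of freedom.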
Conclusion
From the previous results, the chi-square test suggests that there is a relationship between playing at home and the match result, but Cramer's V points to a weak relationship that is only minimally acceptable.
Note: calculations were done using MS Excel; you can download the Excel spreadsheet and the final report from GitHub.