Dr3bdo

Practicing Statistics on Premier League Match Results


After finishing Udacity’s statistics courses (Intro to Descriptive Statistics, Intro to Inferential Statistics), the final project was to pick a dataset, explore it, pose a question or hypothesis, and try to answer it with statistics.
One of the suggested datasets was the Italian Football Data, so I picked a similar dataset for the Premier League since I’m more familiar with it.
I got the data for the 2017/2018 season from football-data.co.uk. After exploring the dataset I found that it has many columns (65) describing the events and results of each match (380 matches). After reading this note to figure out what each column represents, a simple question came to mind: does the home team have a better chance of winning? There are already a lot of studies about home matches, their effect on the home team’s performance, and the factors behind that effect.


10 rows of the data set


I didn’t want to make a complex study. It’s my first time and I want it to be as simple as possible, so I’ll see whether the home team wins most of its home matches.
I picked three columns (‘FTHG’, ‘FTAG’, ‘FTR’) to get the final match results and do the statistical calculations on them:
FTHG and HG = Full Time Home Team Goals
FTAG and AG = Full Time Away Team Goals
FTR and Res = Full Time Result (H=Home Win, D=Draw, A=Away Win)
So, from the ‘FTR’ column I can count how many matches the home team won, how many it lost, and how many were draws. Here is the count:





count of Home team Match Results
FTR
A    108  (Loss)
D     99  (Draw)
H    173  (Win)
Notice that a home win is the most common of the three outcomes: 173 of 380 matches (about 45.5%), compared with about 28.4% for home losses and 26.1% for draws.
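The tally above can be reproduced in a few lines. The original calculations were done in Excel, so this is only a minimal Python sketch of the same counting step; it rebuilds an ‘FTR’-like column from the three counts reported above rather than reading the real CSV, and the variable names are my own.

```python
from collections import Counter

# Counts reported in the post for the 'FTR' column
# (H = home win, D = draw, A = away win).
reported_counts = {"A": 108, "D": 99, "H": 173}

# Rebuild an FTR-like column from those counts so the tally can be
# shown end to end (the real column would come from the CSV file).
ftr_column = [code for code, n in reported_counts.items() for _ in range(n)]

result_counts = Counter(ftr_column)
total_matches = sum(result_counts.values())

# Share of each outcome as a percentage of all matches.
result_shares = {
    code: 100 * n / total_matches for code, n in result_counts.items()
}

for code in ("H", "D", "A"):
    print(f"{code}: {result_counts[code]:3d} matches ({result_shares[code]:.1f}%)")
```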

Descriptive Statistics

The project required descriptive statistics, so I created a new column labeled ‘HTGD’ (home team goal difference), which is the result of subtracting ‘FTAG’ from ‘FTHG’. The column is numerical, so I can calculate the descriptive statistics I learnt in the course, and you can also read the match result from it [>0: home team win, 0: draw, <0: home team loss].
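As a rough illustration of how the ‘HTGD’ column is built and summarised, here is a small Python sketch. The score pairs are made up for the example and are not rows from the dataset, and the helper names are hypothetical; the original work was done in Excel.

```python
from statistics import mean, median, stdev


def goal_difference(fthg, ftag):
    """Home team goal difference: home goals minus away goals."""
    return fthg - ftag


def result_from_difference(htgd):
    """Map the sign of HTGD back to the FTR code (H, D or A)."""
    if htgd > 0:
        return "H"
    if htgd < 0:
        return "A"
    return "D"


# Made-up (FTHG, FTAG) pairs for illustration only;
# these are NOT rows from the real dataset.
sample_scores = [(2, 0), (1, 1), (0, 3), (4, 1), (1, 2)]

htgd = [goal_difference(h, a) for h, a in sample_scores]
results = [result_from_difference(d) for d in htgd]

print("HTGD:", htgd)
print("FTR :", results)
print("mean:", mean(htgd), "median:", median(htgd), "stdev:", round(stdev(htgd), 2))
```

On the real data the same three summary functions would be applied to all 380 goal differences.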





5 rows of the selected columns





‘HTGD’ descriptive statistics and histogram





‘HTGD’ Descriptive Statistics
From the statistics above, the mean equals 0.38, which is > 0, and the histogram shows that values > 0 occur more often than values < 0. This tells us that home teams win more often than they lose.

Inferential Statistics

The z-test, t-test, ANOVA, and chi-squared test were all covered in the Intro to Inferential Statistics course. I chose the chi-squared goodness-of-fit test because the data being analyzed is nominal (match result for the home team: win, draw, or loss), with Cramer’s V as the effect size measure.
Chi-square test hypotheses:
Ho: Being the host doesn’t affect match results (win: 33.33%, draw: 33.33%, loss: 33.33%)
Ha: Being the host gives you a higher chance to win (win > 33.33%)





chi-square statistic = sum of the last row in the previous image
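For readers who cannot see the spreadsheet image, the same calculation can be sketched in Python using the counts from the ‘FTR’ column and the equal-thirds expectation of Ho. This is an equivalent sketch and not the original Excel sheet; the variable names are mine. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2).

```python
import math

# Observed home-team results from the 'FTR' counts in the post.
observed = {"win": 173, "draw": 99, "loss": 108}

n_matches = sum(observed.values())

# Under Ho every outcome is equally likely, so each expected count is N / 3.
expected = n_matches / len(observed)

# Per-category contribution: (observed - expected)^2 / expected.
contributions = {
    outcome: (count - expected) ** 2 / expected
    for outcome, count in observed.items()
}

# The chi-square statistic is the sum of the contributions.
chi_square = sum(contributions.values())
degrees_of_freedom = len(observed) - 1

# Closed form valid only for 2 degrees of freedom.
p_value = math.exp(-chi_square / 2)

for outcome, value in contributions.items():
    print(f"{outcome:>4}: {value:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, p = {p_value:.2e}")
```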

Results






chi-square, Cramer’s V tests results
A chi-square goodness-of-fit test was performed to determine whether the home team has a better chance of winning the match: χ²(2, N = 380) = 25.74, p < .0001.
Effect size measure: Cramer’s V = 0.18
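The effect size can be checked the same way. The sketch below assumes the goodness-of-fit form of Cramer’s V, V = sqrt(χ² / (N · (k − 1))) with k = 3 outcome categories; I am assuming this is the formula behind the spreadsheet because it reproduces the reported 0.18. The function name is hypothetical.

```python
import math


def cramers_v_goodness_of_fit(chi_square, n, k):
    """Cramer's V for a goodness-of-fit test with k categories (assumed form)."""
    return math.sqrt(chi_square / (n * (k - 1)))


# Observed home-team results: win, draw, loss.
observed = [173, 99, 108]
n = sum(observed)
k = len(observed)

# Equal expected counts under Ho, then the chi-square statistic.
expected = n / k
chi_square = sum((count - expected) ** 2 / expected for count in observed)

v = cramers_v_goodness_of_fit(chi_square, n, k)
print(f"Cramer's V = {v:.2f}")
```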

Conclusion

From the previous results: the chi-square test suggests that there is a relationship between playing at home and the match result, but Cramer’s V indicates that the relationship is weak, which is minimally acceptable.
Note: calculations were done in MS Excel. You can download the Excel spreadsheet and final report from GitHub.
