jonas-richter/Football
Using the worldfootballR package, I scrape football data from FBRef and Transfermarkt.de with the intention of predicting future games. At the moment, everything should work properly for the men's first divisions of England and Germany. Small adjustments might be necessary to predict matches from other divisions as well.

**Example for prediction** [prediction.csv](https://github.com/jonas-richter/Football/blob/main/Output/Matchday_data/Prediction/prediction.csv)

**Workflow:** After cloning this repository, you need to read in the functions from the Scripts directory. You can do so by running `sapply(paste0("./Scripts/", list.files(path = "./Scripts/", pattern = "*.R")), source)`. See the Apply_functions.R script in the Application directory.

You can then get more training data by calling the get_training_data function. However, this step should not be necessary, because the classifier did not seem to improve much with further training data. Subsequently, you can combine all the training data into one big data frame by running the combine_training_data function.

You can then test the accuracy of a categorical random forest classifier, a random forest regression model and a categorical naive Bayes classifier with k-fold cross-validation, using the k_fold_testing function. Under the hood, k_fold_testing runs the train_test_split function k times.

If you want to predict the upcoming matchday of a division of your choice (at the moment the men's first divisions of Germany and England work), you can generate the necessary features using the get_matchday_data function. You need to run get_matchday_data on a date *before* the first match of the matchday. After that, you can run the impute_train_full_model function to impute missing data (NAs) in your dataset and to train the classifiers (random forest, naive Bayes) on all available training data.
These models are used by the predict_matchday function to predict the upcoming matchday.

**Scripts:** This directory contains the code for 8 functions:

- get_matchday_data: generates a data frame with features for the matches of the upcoming matchday. The features are built from team and player statistics of the current season. Player statistics are retrieved for the players who played on the previous matchday. Most features are constructed by *dividing* the value for the home team by the value for the away team. Division is used because some features concern player values, which are on different levels in different divisions (e.g. players in England have higher Transfermarkt values overall than players in Germany). By using ratios, features can be compared between divisions. Prior to division, a small constant (0.01) is added to all features to avoid division by zero. A few features (the points of the teams) are instead constructed by *subtracting* the away team's value from the home team's value. The reason is that zero points occur frequently over the past few matches, so division would lead to misleading results. The features are generalizable because, instead of using player statistics directly, players are grouped by position (e.g. centre-back, winger, attacking midfielder, ...) and can therefore be compared between teams.
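The ratio and difference construction described for get_matchday_data can be illustrated with a minimal base R sketch. This is not the repository's code: the function name `build_features`, the column names and the example values are all hypothetical, and only the constant 0.01 and the ratio/difference idea come from the description above.

```r
# Hypothetical per-team statistics for one match.
# Column names and values are illustrative, not taken from the repository.
home <- data.frame(winger_value = 40, centre_back_value = 25, points_last_games = 0)
away <- data.frame(winger_value = 20, centre_back_value = 50, points_last_games = 3)

# Sketch of the feature construction (hypothetical helper):
# - columns in diff_cols become home minus away
# - all other columns become (home + eps) / (away + eps)
build_features <- function(home, away, diff_cols = "points_last_games", eps = 0.01) {
  ratio_cols <- setdiff(names(home), diff_cols)

  # Adding eps to both sides keeps the ratio finite when a value is zero.
  ratios <- (home[ratio_cols] + eps) / (away[ratio_cols] + eps)
  names(ratios) <- paste0(ratio_cols, "_ratio")

  # Points are often zero, so a difference is used instead of a ratio.
  diffs <- home[diff_cols] - away[diff_cols]
  names(diffs) <- paste0(diff_cols, "_diff")

  cbind(ratios, diffs)
}

features <- build_features(home, away)
print(features)
```

A ratio above 1 means the home team has the larger value for that feature, and a negative difference means the away team collected more points. Because the ratio is relative, a match between two highly valued teams and a match between two modestly valued teams can yield similar feature values, which is what allows training data from different divisions to be pooled.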