arxiv.org - László Csató, András Gyimesi, Dries Goossens, Karel Devriesere, Roel Lambers, Frits Spieksma
Abstract: A fundamental reform has been introduced in the 2024/25 season of club competitions organised by the Union of European Football Associations (UEFA): the well-established group stage has been replaced by an incomplete round-robin format. In this format, the 36 teams are ranked in a single league table, but play against only a subset of the competitors. While this innovative change has highlighted that the incomplete round-robin tournament is a reasonable alternative to the standard design of allocating the teams into round-robin groups, the characteristics of the new format remain unexplored. Our paper contributes to this topic by using simulations to compare the uncertainty generated by the draw in the old format with that in the new format of the UEFA Champions League. We develop a method to break down the impact of the 2024/25 reform into various components for each team. The new format is found to decrease the overall effect of the draw. However, this reduction can mainly be attributed to the inaccurate seeding system used by UEFA. If the teams are seeded based on their actual strengths, the impact of the draw is about the same in a tournament with an incomplete round-robin league or a group stage.
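The abstract does not describe the simulation in detail. The following is a minimal Python sketch of one ingredient. It assumes the publicly announced 2024/25 structure: 36 teams in four pots of nine, with each team drawn against two opponents from every pot. It samples random league-phase draws and reports how much a team's average opponent strength varies from one draw to the next. The sampler, the `strength` values, and all function names are hypothetical. The sampler ignores home/away balance and country restrictions and is not uniform over valid draws. It is neither UEFA's draw procedure nor the authors' simulation model.

```python
import random
import statistics

N_POTS, POT_SIZE = 4, 9  # assumed: 36 teams split into four pots of nine


def random_derangement(n, rng):
    """Permutation d of range(n) with d[k] != k for every k (rejection sampling)."""
    while True:
        d = list(range(n))
        rng.shuffle(d)
        if all(d[k] != k for k in range(n)):
            return d


def draw_schedule(rng):
    """Map each team (pot, index) to 8 opponents, exactly 2 from every pot.

    Hypothetical, simplified sampler: no home/away balance, no country
    restrictions, and not uniform over all valid draws.
    """
    opponents = {(p, k): [] for p in range(N_POTS) for k in range(POT_SIZE)}
    for p in range(N_POTS):
        # Same pot: a random 9-cycle gives every team exactly two pot-mates.
        order = list(range(POT_SIZE))
        rng.shuffle(order)
        for t in range(POT_SIZE):
            a, b = order[t], order[(t + 1) % POT_SIZE]
            opponents[(p, a)].append((p, b))
            opponents[(p, b)].append((p, a))
        # Different pots: the union of two disjoint perfect matchings
        # (sigma, and sigma composed with a derangement) is 2-regular.
        for q in range(p + 1, N_POTS):
            sigma = list(range(POT_SIZE))
            rng.shuffle(sigma)
            d = random_derangement(POT_SIZE, rng)
            for k in range(POT_SIZE):
                for m in (sigma[k], sigma[d[k]]):
                    opponents[(p, k)].append((q, m))
                    opponents[(q, m)].append((p, k))
    return opponents


def draw_effect(team, strength, n_draws=2000, seed=0):
    """Mean and standard deviation of a team's average opponent strength over draws."""
    rng = random.Random(seed)
    avg_opp = []
    for _ in range(n_draws):
        opps = draw_schedule(rng)[team]
        avg_opp.append(statistics.mean(strength[o] for o in opps))
    return statistics.mean(avg_opp), statistics.stdev(avg_opp)


# Hypothetical strengths: a lower pot and a lower index within the pot mean a stronger team.
strength = {(p, k): 2000 - 25 * (POT_SIZE * p + k)
            for p in range(N_POTS) for k in range(POT_SIZE)}
mean_opp, sd_opp = draw_effect((0, 0), strength)
```

A larger standard deviation means that the draw matters more for that team's schedule. A fuller analysis would also simulate the matches and track outcomes such as qualification.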
arxiv.org - Roberto Macrì-Demartino, Leonardo Egidi, Nicola Torelli
Abstract: In recent years, great emphasis has been placed on the prediction of association football. As a result, several studies have proposed different types of statistical models to predict the outcome of a football match. However, most existing approaches assume that the offensive and defensive abilities of teams remain static over time. We introduce a Bayesian dynamic approach for goal-based football models that uses period-specific commensurate priors to flexibly weight the evolution of attacking and defensive abilities. Our approach assigns separate, time-varying precisions for each ability and period, controlled via spike-and-slab hyperpriors. This adaptive shrinkage borrows information about teams' strength when past and current performance align, and allows rapid adjustments when teams undergo substantial changes (e.g., transfer windows or coaching changes). We integrate this framework into six standard goal-based models and evaluate predictive performance using data from the last five seasons of the German Bundesliga, English Premier League, and Spanish La Liga. Compared with other discrete-time dynamic models, our adaptive approach yields better predictive performance. The proposed methodology has also been implemented in the free and open-source R package footBayes.
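The abstract names the ingredients but not the equations. The sketch below is illustrative only and assumes a standard independent-Poisson ("double Poisson") goal model with log-linear rates. It does two things. First, it turns attack and defence abilities into match-outcome probabilities. Second, it draws a next-period ability from a normal distribution centred on the previous period's value, with a precision chosen by a spike-and-slab coin flip in the spirit of the commensurate-prior idea. The parametrisation, default values, and function names are assumptions. They are not the footBayes API or the paper's exact specification.

```python
import math
import random


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(att_home, def_home, att_away, def_away,
                  mu=0.0, home_adv=0.25, max_goals=10):
    """(P(home win), P(draw), P(away win)) under independent Poisson scores.

    Hypothetical log-linear parametrisation: a higher `att` means more goals
    scored, and a higher `def` means more goals conceded. Scores above
    `max_goals` are truncated.
    """
    lam_home = math.exp(mu + home_adv + att_home + def_away)
    lam_away = math.exp(mu + att_away + def_home)
    p_home = p_draw = p_away = 0.0
    for x in range(max_goals + 1):
        px = poisson_pmf(x, lam_home)
        for y in range(max_goals + 1):
            p = px * poisson_pmf(y, lam_away)
            if x > y:
                p_home += p
            elif x == y:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away


def next_period_ability(prev, rng, p_spike=0.8, tau_spike=100.0, tau_slab=1.0):
    """Draw a period-t ability centred on the period t-1 value.

    Illustrative spike-and-slab choice of precision. The high-precision
    "spike" keeps the ability close to last period's value (strong
    borrowing). The low-precision "slab" allows a rapid change, for
    example after a coaching change.
    """
    tau = tau_spike if rng.random() < p_spike else tau_slab
    return rng.gauss(prev, 1.0 / math.sqrt(tau))
```

With `home_adv=0.0` and equal abilities, the home-win and away-win probabilities coincide. Raising `home_adv` shifts probability mass towards the home side.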
arxiv.org - Daniel Groos
Abstract: Fantasy Premier League engages the football community in selecting the Premier League players who will perform best from gameweek to gameweek. Access to accurate performance forecasts gives participants an edge over competitors by guiding expectations about player outcomes and reducing uncertainty in squad selection. However, high-accuracy forecasts are currently limited to commercial services whose inner workings are undisclosed and that rely on proprietary data. This paper aims to democratize access to highly accurate forecasts of player performance by presenting OpenFPL, an open-source Fantasy Premier League forecasting method developed exclusively from public data. Comprising position-specific ensemble models optimized on Fantasy Premier League and Understat data from four previous seasons (2020-21 to 2023-24), OpenFPL achieves accuracy comparable to a leading commercial service when tested prospectively on data from the 2024-25 season. OpenFPL also surpasses the commercial benchmark for high-return players (> 2 points), which are most influential for rank gains. These findings hold across one-, two-, and three-gameweek forecast horizons, supporting long-term planning of transfers and strategies while also informing final-day decisions.
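Here is a minimal sketch of two ideas from the abstract: position-specific ensembles, and a separate error score for high-return players. Simple rolling-average base predictors stand in for the optimized models described in the abstract. The `ENSEMBLES` table, the window lengths, and the use of mean absolute error are hypothetical choices for illustration. So is applying the "> 2 points" threshold to the actual points scored.

```python
from statistics import mean


def rolling_mean(window):
    """Base predictor: mean points over the last `window` gameweeks."""
    def predict(history):
        recent = history[-window:]
        return mean(recent) if recent else 0.0
    return predict


def last_value(history):
    """Base predictor: points scored in the most recent gameweek."""
    return float(history[-1]) if history else 0.0


# Hypothetical position-specific ensembles (placeholders for trained models).
ENSEMBLES = {
    "GK": [rolling_mean(6), rolling_mean(10)],
    "DEF": [rolling_mean(4), rolling_mean(8)],
    "MID": [rolling_mean(3), rolling_mean(6), last_value],
    "FWD": [rolling_mean(3), rolling_mean(6), last_value],
}


def forecast(position, history):
    """Average the base predictions of the ensemble for this position."""
    return mean(model(history) for model in ENSEMBLES[position])


def mae(pairs):
    """Mean absolute error over (prediction, actual) pairs."""
    return mean(abs(pred - actual) for pred, actual in pairs)


def evaluate(records, threshold=2):
    """Overall MAE, and MAE restricted to high-return cases (actual > threshold).

    `records` holds (position, past_points, actual_points) tuples. For a
    prospective test, `past_points` must contain only gameweeks played
    before the one being predicted.
    """
    pairs = [(forecast(pos, hist), actual) for pos, hist, actual in records]
    high = [(p, a) for p, a in pairs if a > threshold]
    return mae(pairs), (mae(high) if high else float("nan"))
```

Reporting the high-return subset separately shows whether a model that looks accurate on average also ranks the few big scorers well.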
arxiv.org - Kenjiro Ide, Taiga Someya, Kohei Kawaguchi, Keisuke Fujii
Abstract: Invasion team sports such as soccer produce a high-dimensional, strongly coupled state space as many players continuously interact on a shared field, challenging quantitative tactical analysis. Traditional rule-based analyses are intuitive, while modern predictive machine learning models often perform pattern-matching without explicit agent representations. The problem we address is how to build player-level agent models from data, whose learned values and policies are both tactically interpretable and robust across heterogeneous data sources. Here, we propose Expandable Decision-Making States (EDMS), a semantically enriched state representation that augments raw positions and velocities with relational variables (e.g., scoring of space, pass, and score), combined with an action-masking scheme that gives on-ball and off-ball agents distinct decision sets. Compared to prior work, EDMS maps learned value functions and action policies to human-interpretable tactical concepts (e.g., marking pressure, passing lanes, ball accessibility) instead of raw coordinate features, and aligns agent choices with the rules of play. In the experiments, EDMS with action masking consistently reduced both action-prediction loss and temporal-difference (TD) error compared to the baseline. Qualitative case studies and Q-value visualizations further indicate that EDMS highlights high-risk, high-reward tactical patterns (e.g., fast counterattacks and defensive breakthroughs). We also integrated our approach into an open-source library and demonstrated compatibility with multiple commercial and open datasets, enabling cross-provider evaluation and reproducible experiments.
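Action masking can be illustrated with a small, framework-free sketch. On-ball and off-ball agents get different legal action sets. The policy assigns zero probability to illegal actions. The bootstrapped value target maximises only over legal next actions. The action names, the `action_mask` helper, and the one-step Q-learning form of the TD error are assumptions for illustration. The abstract does not specify EDMS's action space or learning rule.

```python
import math

# Hypothetical action sets; the abstract does not list EDMS's actual actions.
ACTIONS = ["pass", "dribble", "shoot", "move", "press", "mark"]
ON_BALL = {"pass", "dribble", "shoot"}
OFF_BALL = {"move", "press", "mark"}


def action_mask(has_ball):
    """True where an action is legal for an on-ball or off-ball agent."""
    allowed = ON_BALL if has_ball else OFF_BALL
    return [a in allowed for a in ACTIONS]


def masked_log_softmax(logits, mask):
    """Log-probabilities over legal actions; illegal actions get -inf (probability 0)."""
    legal = [z for z, ok in zip(logits, mask) if ok]
    m = max(legal)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in legal))
    return [z - log_z if ok else -math.inf for z, ok in zip(logits, mask)]


def action_prediction_loss(logits, mask, observed):
    """Cross-entropy of the observed (legal) action under the masked policy."""
    return -masked_log_softmax(logits, mask)[ACTIONS.index(observed)]


def td_error(reward, q_sa, q_next, mask_next, gamma=0.99, terminal=False):
    """One-step Q-learning-style TD error, maximising only over legal next actions."""
    if terminal:
        target = reward
    else:
        target = reward + gamma * max(q for q, ok in zip(q_next, mask_next) if ok)
    return target - q_sa
```

Without the mask in `td_error`, a large Q-value for an illegal action (for example, "shoot" for an off-ball player) would inflate the target.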
arxiv.org - Yi-chen Yao, Jerry Wang, Yi-cheng Lai, Lyn Chao-ling Chen
Abstract: This study examines how age-related decline affects the performance of NBA players. An autoencoder combined with K-means clustering was used to classify the career trends of NBA players, and an LSTM deep learning model was used to predict each player's performance. The dataset was collected from basketball game data of veteran NBA players. The proposed approach performed better than the other methods and showed the ability to generalize across various types of NBA career trends. It can also be applied to other sports in the field of sports analytics.
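The clustering step can be sketched with plain K-means (Lloyd's algorithm). For brevity, this assumes each player is already summarised by a short vector, such as normalised performance over consecutive seasons. The study pairs K-means with an autoencoder, presumably to compress the raw game data first. That step and the LSTM predictor are omitted here. The farthest-first initialisation and the example curves are hypothetical choices.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-first initialisation; returns (centroids, labels)."""
    # Initialisation: start from the first point, then repeatedly add the
    # point farthest from all centroids chosen so far (deterministic).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: the assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical normalised performance over five consecutive seasons.
declining = [[1.0, 0.9, 0.7, 0.5, 0.3], [1.0, 0.85, 0.7, 0.45, 0.3], [0.95, 0.9, 0.65, 0.5, 0.25]]
stable = [[0.6, 0.6, 0.6, 0.55, 0.55], [0.6, 0.65, 0.6, 0.6, 0.55], [0.55, 0.6, 0.6, 0.6, 0.6]]
centroids, labels = kmeans(declining + stable, k=2)
```

Each resulting centroid can be read as a prototype career trend, such as early peak and steady decline versus a flat profile.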
argmin.net - Ben Recht
It’s undeniable that everything in machine learning is an optimization problem. The fundamental problem of machine learning is an optimization problem: minimizing average prediction errors on data we haven’t seen yet. In practice, we more or less do this by minimizing average prediction errors on the data we’ve collected so far. Given its fundamental position, how much numerical optimization should we learn in a machine learning class?
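A toy example makes the gap between the two objectives in this paragraph concrete. It assumes a one-parameter linear model and squared error. Gradient descent minimises the average error on the data collected so far, and a large fresh sample stands in for the data we haven't seen yet. The data-generating slope, noise level, step size, and sample sizes are arbitrary illustrative choices.

```python
import random


def make_data(n, rng, slope=3.0, noise=0.5):
    """Hypothetical data: y = slope * x + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def avg_sq_error(w, xs, ys):
    """Average squared prediction error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_by_gradient_descent(xs, ys, lr=0.5, steps=500):
    """Minimise the average squared error on the collected (training) data."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(10_000, rng)  # stand-in for "data we haven't seen yet"
w = fit_by_gradient_descent(train_x, train_y)
train_err = avg_sq_error(w, train_x, train_y)
test_err = avg_sq_error(w, test_x, test_y)
```

The optimizer only ever sees `train_err`. Whether that was a good proxy is revealed by `test_err`.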
