coursera.org - Stefan Szymanski
Predictive Sports Analytics with Real Sports Data. Anticipate player and team performance using sports analytics principles.
arxiv.org - Quang Nguyen, Ronald Yurko
Abstract:Player tracking data have provided great opportunities to generate novel insights into understudied areas of American football, such as pre-snap motion. Using a Bayesian multilevel model with heterogeneous variances, we provide an assessment of NFL quarterbacks and their ability to synchronize the timing of the ball snap with pre-snap movement from their teammates. We focus on passing plays with receivers in motion at the snap and running a route, and define the snap timing as the time between the moment a receiver begins motioning and the ball snap event. We assume a Gamma distribution for the play-level snap timing and model the mean parameter with player and team random effects, along with relevant fixed effects such as the motion type identified via a Gaussian mixture model. Most importantly, we model the shape parameter with quarterback random effects, which enables us to estimate the differences in snap timing variability among NFL quarterbacks. We demonstrate that higher variability in snap timing is beneficial for the passing game, as it relates to facing less havoc created by the opposing defense. We also obtain a quarterback leaderboard based on our snap timing variability measure, and Patrick Mahomes stands out as the top player.
arxiv.org - Koen W. van Arem, Floris Goes-Smit, Jakob Söhl
Abstract:Transfers in professional football (soccer) are risky investments because of the large transfer fees and high risks involved. Although data-driven models can be used to improve transfer decisions, existing models focus on describing players' historical progress, leaving their future performance unknown. Moreover, recent developments have called for the use of explainable models combined with uncertainty quantification of predictions. This paper assesses explainable machine learning models based on predictive accuracy and uncertainty quantification methods for the prediction of the future development in quality and transfer value of professional football players. Using a historical data set of data-driven indicators describing player quality and the transfer value of a football player, the models are trained to forecast player quality and player value one year ahead. These two prediction problems demonstrate the efficacy of tree-based models, particularly random forest and XGBoost, in making accurate predictions. In general, the random forest model is found to be the most suitable model because it provides accurate predictions as well as an uncertainty quantification method that naturally arises from the bagging procedure of the random forest model. Additionally, our research shows that the development of player performance contains nonlinear patterns and interactions between variables, and that time series information can provide useful information for the modeling of player performance metrics. Our research provides models to help football clubs make more informed, data-driven transfer decisions by forecasting player quality and transfer value.
arxiv.org - Ali Baouan
Abstract:The availability of tracking data in football presents unique opportunities for analyzing team shape and player roles, but leveraging it effectively remains challenging. This difficulty arises from the significant overlap in player positions, which complicates the identification of distinct roles and team formations. In this work, we propose a novel model that incorporates a hidden permutation matrix to simultaneously estimate team formations and assign roles to players at the frame level. To address the cardinality of permutation sets, we develop a statistical procedure to parsimoniously select relevant matrices prior to parameter estimation. Additionally, to capture formation changes during a match, we introduce a latent regime variable, enabling the modeling of dynamic tactical adjustments. This framework disentangles player locations from role-specific positions, providing a clear representation of team structure. We demonstrate the applicability of our approach using player tracking data, showcasing its potential for detailed team and player analysis.
arxiv.org - Thijs Overmeer, Tim Janssen, Wim P.M. Nuijten
Abstract:This paper introduces the first Expected Possession Value (EPV) benchmark and a new and improved EPV model for football. Through the introduction of the OJN-Pass-EPV benchmark, we present a novel method to quantitatively assess the quality of EPV models by using pairs of game states with given relative EPVs. Next, we attempt to replicate the results of Fernández et al. (2021) using a dataset containing Dutch Eredivisie and World Cup matches. Following our failure to do so, we propose a new architecture based on U-net-type convolutional neural networks, achieving good results in model loss and Expected Calibration Error. Finally, we present an improved pass model that incorporates ball height and contains a new dual-component pass value model that analyzes reward and risk. The resulting EPV model correctly identifies the higher value state in 78% of the game state pairs in the OJN-Pass-EPV benchmark, demonstrating its ability to accurately assess goal-scoring potential. Our findings can help assess the quality of EPV models, improve EPV predictions, help assess potential reward and risk of passing decisions, and improve player and team performance.
arxiv.org - Nourah Buhamra, Andreas Groll
Abstract:In this manuscript, we concentrate on a specific type of covariates, which we call statistically enhanced, for modeling tennis matches for men at Grand slam tournaments. Our goal is to assess whether these enhanced covariates have the potential to improve statistical learning approaches, in particular, with regard to the predictive performance. For this purpose, various proposed regression and machine learning model classes are compared with and without such features. To achieve this, we considered three slightly enhanced variables, namely elo rating along with two different player age variables. This concept has already been successfully applied in football, where additional team ability parameters, which were obtained from separate statistical models, were able to improve the predictive performance. In addition, different interpretable machine learning (IML) tools are employed to gain insights into the factors influencing the outcomes of tennis matches predicted by complex machine learning models, such as the random forest. Specifically, partial dependence plots (PDP) and individual conditional expectation (ICE) plots are employed to provide better interpretability for the most promising ML model from this work. Furthermore, we conduct a comparison of different regression and machine learning approaches in terms of various predictive performance measures such as classification rate, predictive Bernoulli likelihood, and Brier score. This comparison is carried out on external test data using cross-validation, rolling window, and expanding window strategies.
arxiv.org - Akhil Penta, Vaibhav Adwani, Ankush Chopra
Abstract:Accurate skier tracking is essential for performance analysis, injury prevention, and optimizing training strategies in alpine sports. Traditional tracking methods often struggle with occlusions, dynamic movements, and varying environmental conditions, limiting their effectiveness. In this work, we used STARK (Spatio-Temporal Transformer Network for Visual Tracking), a transformer-based model, to track skiers. We adapted STARK to address domain-specific challenges such as camera movements, camera changes, occlusions, etc. by optimizing the model's architecture and hyperparameters to better suit the dataset.
arxiv.org - Shounak Desai
Abstract:Technological advancements in soccer have surged over the past decade, transforming aspects of the sport. Unlike binary rules, many soccer regulations, such as the "Offside Rule," rely on subjective interpretation rather than straightforward True or False criteria. The on-field referee holds ultimate authority in adjudicating these nuanced decisions. A significant breakthrough in soccer officiating is the Video Assistant Referee (VAR) system, leveraging a network of 20-30 cameras within stadiums to minimize human errors. VAR's operational scope typically encompasses 10-30 cameras, ensuring high decision accuracy but at a substantial cost. This report proposes an innovative approach to offside detection using a single camera, such as the broadcasting camera, to mitigate expenses associated with sophisticated technological setups.
arxiv.org - Francesco Della Santa, Morgana Lalli
Abstract:This study presents a novel Deep Learning-based and lightweight approach for the automated detection of sports highlights (HLs) from audio and video sources. HL detection is a key task in sports video analysis, traditionally requiring significant human effort. Our solution leverages Deep Learning (DL) models trained on relatively small datasets of audio Mel-spectrograms and grayscale video frames, achieving promising accuracy rates of 89% and 83% for audio and video detection, respectively. The use of small datasets, combined with simple architectures, demonstrates the practicality of our method for fast and cost-effective deployment. Furthermore, an ensemble model combining both modalities shows improved robustness against false positives and false negatives. The proposed methodology offers a scalable solution for automated HL detection across various types of sports video content, reducing the need for manual intervention. Future work will focus on enhancing model architectures and extending this approach to broader scene-detection tasks in media analysis.
arxiv.org - Chang Liu, Chengcheng Ma, XuanQi Zhou
Abstract:This paper presents a comprehensive framework for time series prediction using a hybrid model that combines ARIMA and LSTM. The model incorporates feature engineering techniques, including embedding and PCA, to transform raw data into a lower-dimensional representation while retaining key information. The embedding technique is used to convert categorical data into continuous vectors, facilitating the capture of complex relationships. PCA is applied to reduce dimensionality and extract principal components, enhancing model performance and computational efficiency. To handle both linear and nonlinear patterns in the data, the ARIMA model captures linear trends, while the LSTM model models complex nonlinear dependencies. The hybrid model is trained on historical data and achieves high accuracy, as demonstrated by low RMSE and MAE scores. Additionally, the paper employs the run test to assess the randomness of sequences, providing insights into the underlying patterns. Ablation studies are conducted to validate the roles of different components in the model, demonstrating the significance of each module. The paper also utilizes the SHAP method to quantify the impact of traditional advantages on the predicted results, offering a detailed understanding of feature importance. The KNN method is used to determine the optimal prediction interval, further enhancing the model's accuracy. The results highlight the effectiveness of combining traditional statistical methods with modern deep learning techniques for robust time series forecasting in Sports.
arxiv.org - Tiago Mendes-Neves, Yassine Baghoussi, Luís Meireles, Carlos Soares, João Mendes-Moreira
Abstract:Forecasting sporting events encapsulate a compelling intellectual endeavor, underscored by the substantial financial activity of an estimated $80 billion wagered in global sports betting during 2022, a trend that grows yearly. Motivated by the challenges set forth in the Springer Soccer Prediction Challenge, this study presents a method for forecasting soccer match outcomes by forecasting the shot quantity and quality distributions. The methodology integrates established ELO ratings with machine learning models. The empirical findings reveal that, despite the constraints of the challenge, this approach yields positive returns, taking advantage of the established market odds.