arxiv.org - Purnachandra Mandadapu
Abstract:This paper explores the history of professional football and the betting industry, tracing its evolution from clandestine beginnings to a lucrative multi-million-pound enterprise. Initiated by the legalization of gambling in 1960 and complemented by advances in football data gathering pioneered by Thorold Charles Reep, the symbiotic relationship between these sectors has propelled rapid growth and innovation. Over the past six decades, both industries have undergone radical transformations, with data collection methods evolving from rudimentary note-taking to sophisticated technologies such as high-definition cameras and Artificial Intelligence (AI)-driven analytics. The primary aim of this study is to use Machine Learning (ML) algorithms to forecast Premier League football match outcomes. By analyzing historical data and investigating the significance of various features, the study seeks to identify the most effective predictive models and discern key factors influencing match results. Additionally, the study aims to use these forecasts to inform the setting of bookmaker odds, providing insights into the impact of different variables on match outcomes. By highlighting the potential for informed decision-making in sports forecasting and betting, this study opens up new avenues for research and practical applications in the domain of sports analytics.
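As a rough illustration of the kind of pipeline such a study implies, the sketch below trains a simple classifier on historical match results; the file name and feature columns (home_elo, away_elo, home_form, away_form, result) are hypothetical placeholders, not the paper's actual data.

```python
# Minimal sketch of a match-outcome classifier; column names are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

matches = pd.read_csv("epl_matches.csv")           # historical results (placeholder file)
features = ["home_elo", "away_elo", "home_form", "away_form"]
X, y = matches[features], matches["result"]        # result: H / D / A

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)            # keep chronological order

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Feature importances hint at which variables drive match outcomes.
print(dict(zip(features, clf.feature_importances_.round(3))))
```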
arxiv.org - Chawin Terawong, Dave Cliff
Abstract:We present first results from the use of XGBoost, a highly effective machine learning (ML) method, within the Bristol Betting Exchange (BBE), an open-source agent-based model (ABM) designed to simulate a contemporary sports-betting exchange with in-play betting during track-racing events such as horse races. We use the BBE ABM and its array of minimally-simple bettor-agents as a synthetic data generator which feeds into our XGBoost ML system, with the intention that XGBoost discovers profitable dynamic betting strategies by learning from the more profitable bets made by the BBE bettor-agents. After this XGBoost training, which results in one or more decision trees, a bettor-agent with a betting strategy determined by the XGBoost-learned decision tree(s) is added to the BBE ABM and made to bet on a sequence of races under various conditions and betting-market scenarios, with profitability serving as the primary metric of comparison and evaluation. Our initial findings presented here show that XGBoost trained in this way can indeed learn profitable betting strategies, and can generalise to learn strategies that outperform each of the set of strategies used for creation of the training data. To foster further research and enhancements, the complete version of our extended BBE, including the XGBoost integration, has been made freely available as an open-source release on GitHub.
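A minimal sketch of the training step described above, assuming synthetic bet snapshots labelled as profitable or not; the features and labels here are randomly generated stand-ins, not the BBE schema.

```python
# Sketch: train XGBoost on toy "bet snapshot" data labelled profitable/not.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.uniform(1.5, 20.0, n),   # current decimal odds (toy feature)
    rng.uniform(0.0, 1.0, n),    # fraction of race completed (toy feature)
    rng.normal(0.0, 1.0, n),     # recent odds drift (toy feature)
])
y = (rng.random(n) < 0.3).astype(int)   # 1 = bet turned out profitable (toy label)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# The learned trees can then drive a bettor-agent: bet when the predicted
# probability of profit exceeds a chosen threshold.
print(model.predict_proba(X[:5])[:, 1])
```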
substack.com - Swiss Ramble
The departures of Lionel Messi and Neymar last summer highlighted a dramatic change in Paris Saint-Germain’s strategy, as they appear to be moving away from the “Galacticos” model, replacing their expensive superstars with younger, hungrier players.
theguardian.com - Ed Aarons
“Can fans really enjoy the game when you know the money used to buy your favourite player comes from organised criminal groups' proceeds or from a Russian oligarch who supports the war of aggression against Ukraine? I personally cannot.”
statsbomb.com - Matt Edwards
The opening of the 2024 NFL Draft in Detroit went exactly as you'd expect. The boos when Roger Goodell walked onto the stage quickly turned to cheers when Eminem joined him, then back to boos when the commissioner started talking. The crowd kept getting louder, rising to a fever pitch as each successive Lions great came to the microphone. It may be the offseason, but the crowd (the largest in NFL Draft history for the full weekend) was in midseason form.
arxiv.org - Simon Lacan
Abstract:Data scouting is one of the best-known data applications in professional sport, and specifically in football. Its objective is to analyze huge databases of players in order to detect high potentials who can then be individually considered by human scouts. In this paper, we propose a stacking-based deep learning model to detect high-potential football players. Applied to an open-source database, our model obtains significantly better results than classical statistical methods.
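To make the stacking idea concrete, here is a small sketch using scikit-learn base models in place of the paper's deep learners, on a synthetic, imbalanced dataset standing in for a player database.

```python
# Sketch of a stacking ensemble for "high potential" detection (toy data).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)   # imbalanced: few high potentials

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                                      random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X, y)
print(stack.predict_proba(X[:3])[:, 1])   # probability of being a high potential
```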
arxiv.org - Jan Held, Hani Itani, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck
Abstract:The rapid advancement of artificial intelligence has led to significant improvements in automated decision-making. However, the increased performance of models often comes at the cost of explainability and transparency of their decision-making processes. In this paper, we investigate the capabilities of large language models to explain decisions, using football refereeing as a testing ground, given its decision complexity and subjectivity. We introduce the Explainable Video Assistant Referee System, X-VARS, a multi-modal large language model designed for understanding football videos from the point of view of a referee. X-VARS can perform a multitude of tasks, including video description, question answering, action recognition, and conducting meaningful conversations based on video content and in accordance with the Laws of the Game for football referees. We validate X-VARS on our novel dataset, SoccerNet-XFoul, which consists of more than 22k video-question-answer triplets annotated by over 70 experienced football referees. Our experiments and human study illustrate the impressive capabilities of X-VARS in interpreting complex football clips. Furthermore, we highlight the potential of X-VARS to reach human performance and support football referees in the future.
arxiv.org - Enzo Brox, Michael Lechner
Abstract:This article shows how coworker performance affects individual performance evaluation in a teamwork setting at the workplace. We use high-quality data on football matches to measure an important component of individual performance, shooting performance, isolated from collaborative effects. Employing causal machine learning methods, we address the assortative matching of workers and estimate both average and heterogeneous effects. There is substantial evidence for spillover effects in performance evaluations. Coworker shooting performance meaningfully impacts both manager decisions and third-party expert evaluations of individual performance. Our results underscore the significant role coworkers play in shaping career advancement and highlight a channel, complementary to productivity gains and learning effects, through which coworkers impact career advancement. We characterize the groups of workers that are most and least affected by spillover effects and show that spillover effects are reference-point dependent. While positive deviations from a reference point create positive spillover effects, negative deviations are not harmful for coworkers.
arxiv.org - Quang Nguyen, Ruitong Jiang, Meg Ellingwood, Ronald Yurko
Abstract:Tackling is a fundamental defensive move in American football, with the main purpose of stopping the forward motion of the ball-carrier. However, current tackling metrics are manually recorded outcomes that are inherently flawed due to their discrete and subjective nature. Using player tracking data, we present a novel framework for assessing tackling contribution in a continuous and objective manner. Our approach first identifies when a defender is in a "contact window" of the ball-carrier during a play, before assigning value to each window and the players involved. This enables us to devise a new metric called fractional tackles, which credits defenders for halting the ball-carrier's forward motion toward the end zone. We demonstrate that fractional tackles overcome the shortcomings of traditional metrics such as tackles and assists, by providing greater variation and measurable information for players lacking recorded statistics like defensive linemen. We view our contribution as a significant step forward in measuring defensive performance in American football and a clear demonstration of the capabilities of player tracking data.
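A toy illustration of the contact-window idea on made-up tracking coordinates; the 1.5-yard radius and the arrays are assumptions, not the paper's values.

```python
# Frames where a defender is within a contact radius of the ball-carrier.
import numpy as np

carrier = np.array([[10.0, 5.0], [11.0, 5.2], [12.0, 5.5], [12.5, 5.6]])   # x, y per frame
defender = np.array([[13.0, 5.0], [12.2, 5.1], [12.3, 5.4], [12.6, 5.6]])

dist = np.linalg.norm(carrier - defender, axis=1)
in_contact = dist < 1.5                      # boolean contact-window mask per frame
print(in_contact)                            # -> [False  True  True  True]

# Fractional credit could then be shared among all defenders in contact while
# the carrier's downfield (x) progress is being slowed.
print(np.diff(carrier[:, 0]))                # per-frame forward progress
```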
arxiv.org - Lachlan Thorpe, Lewis Bawden, Karanjot Vendal, John Bronskill, Richard E. Turner
Abstract:We present SportsNGEN, a transformer-decoder-based model trained on sports player and ball tracking sequences that is capable of generating realistic and sustained gameplay. We train and evaluate SportsNGEN on a large database of professional tennis tracking data and demonstrate that, by combining the generated simulations with a shot classifier and logic to start and end rallies, the system is capable of simulating an entire tennis match. In addition, a generic version of SportsNGEN can be customized to a specific player by fine-tuning on match data that includes that player. We show that our model is well calibrated and can be used to derive insights for coaches and broadcasters by evaluating counterfactual or "what if" options. Finally, we show qualitative results indicating that the same approach works for football.
arxiv.org - Benjamin Lucas
Abstract:In December 2023 the Florida State Seminoles became the first Power 5 school to have an undefeated season and miss selection for the College Football Playoff. In order to assess this decision, we employed an Elo ratings model to rank the teams and found that the selection committee's decision was justified and that Florida State was not one of the four best teams in college football that season (ranking only 11th!). We extended this analysis to all other years of the CFP and found that the top four teams by Elo ratings differ greatly from the four teams selected in almost every year of the CFP's existence. Furthermore, we found that there have been more egregious non-selections, including when Alabama was ranked first by Elo ratings in 2022 and was not selected. The analysis suggests that the current criteria are too subjective and that a ratings model should be implemented to provide transparency for the sport, its teams, and its fans.
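For reference, a standard Elo update of the kind such a ratings model relies on; the K-factor and the example ratings are arbitrary choices, not the paper's calibration.

```python
# Standard Elo expected score and rating update.
def expected_score(r_a, r_b):
    """Win probability of team A against team B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a, r_b, score_a, k=25):
    """score_a = 1 if A won, 0 if A lost, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Example: a 1600-rated team upsets a 1750-rated opponent.
print(update_elo(1600, 1750, 1))   # -> approximately (1617.6, 1732.4)
```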
arxiv.org - Daniil Sulimov
Abstract:We developed an artificial intelligence approach to predict the transfer fee of a football player. This model can help clubs make better decisions about which players to buy and sell, which can lead to improved performance and increased club budgets. Having collected data on player performance, transfer fees, and other factors that might affect a player's value, we then used this data to train a machine learning model that can accurately predict a player's impact on the game. We further passed the obtained results as one of the features to the predictor of transfer fees. The model can help clubs identify players who are undervalued and who could be sold for a profit. It can also help clubs avoid overpaying for players. We believe that our model can be a valuable tool for football clubs. It can help them make better decisions about player recruitment and transfers.
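A hedged sketch of the two-stage idea: estimate on-pitch impact first, then feed that estimate into a fee regressor. The file and column names are hypothetical placeholders, not the authors' dataset.

```python
# Two-stage sketch: impact model -> transfer-fee model.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

players = pd.read_csv("players.csv")   # assumed: performance stats + fees

# Stage 1: predict an "impact" score from raw performance statistics.
impact_model = GradientBoostingRegressor(random_state=0)
impact_model.fit(players[["goals", "assists", "minutes"]], players["impact"])
players["impact_hat"] = impact_model.predict(players[["goals", "assists", "minutes"]])

# Stage 2: predict the transfer fee using the estimated impact plus context.
fee_model = GradientBoostingRegressor(random_state=0)
fee_model.fit(players[["impact_hat", "age", "contract_years"]], players["fee"])
print(fee_model.predict(players[["impact_hat", "age", "contract_years"]].head()))
```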
arxiv.org - Luke S. Benz, Thompson J. Bliss, Michael J. Lopez
Abstract:The existence of and justification for the home advantage -- the benefit a sports team receives when playing at home -- has been studied across sports. The majority of research on this topic is limited to individual leagues over short time frames, which hinders extrapolation and a deeper understanding of possible causes. Using nearly two decades of data from the National Football League (NFL), the National Collegiate Athletic Association (NCAA), and high schools from across the United States, we provide a uniform approach to understanding the home advantage in American football. Our findings suggest home advantage is declining in the NFL and the highest levels of collegiate football, but not in amateur football. This increases the possibility that characteristics of the NCAA and NFL, such as travel improvements and instant replay, have helped level the playing field.
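A simple descriptive check in the same spirit: the home-win rate per season, assuming a results table with hypothetical columns (season, home_pts, away_pts).

```python
# Home-win rate by season from a hypothetical results file.
import pandas as pd

games = pd.read_csv("nfl_games.csv")
games["home_win"] = (games["home_pts"] > games["away_pts"]).astype(int)
trend = games.groupby("season")["home_win"].mean()
print(trend)   # a downward drift over seasons would be consistent with the paper
```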
arxiv.org - Adnan Azmat, Su Su Yi
Abstract:This paper introduces the Expected Booking (xB) model, a novel metric designed to estimate the likelihood of a foul resulting in a yellow card in football. Through three iterative experiments employing ensemble methods, the model demonstrates improved performance with additional features and an expanded dataset. Analysis of FIFA World Cup 2022 data validates the model's efficacy in providing insights into team and player fouling tactics, aligning with actual defensive performance. The xB model addresses a gap in the examination of fouling efficiency, emphasizing defensive strategies that are often overlooked. Further enhancements are suggested through the incorporation of more comprehensive data and spatial features.
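A minimal sketch of an xB-style model: a gradient-boosted classifier estimating the probability that a foul draws a yellow card. The feature names and file are illustrative assumptions, not the paper's feature set.

```python
# Sketch: per-foul yellow-card probability, aggregated into expected bookings.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier

fouls = pd.read_csv("fouls.csv")        # one row per foul (placeholder file)
features = ["distance_to_goal", "minute", "is_tactical", "prior_fouls_by_player"]
model = HistGradientBoostingClassifier()
model.fit(fouls[features], fouls["yellow_card"])

# Summing per-foul probabilities gives an expected-bookings total per team.
fouls["xB"] = model.predict_proba(fouls[features])[:, 1]
print(fouls.groupby("team")["xB"].sum().sort_values(ascending=False))
```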
arxiv.org - Ruixin Peng, Ziqing Li
Abstract:Tennis is so popular that coaches and players are curious about factors beyond skill, such as momentum. This article attempts to define and quantify momentum, providing a basis for real-time analysis of tennis matches. Based on men's singles data from recent Grand Slam tournaments, we built two models: one data-driven and the other based on empirical formulas. For the data-driven model, we first gathered a large amount of public data, including data on tennis matches from the past five years and players' personal information. The data were then preprocessed and feature-engineered, and a fusion model of SVM, Random Forest, and XGBoost was established. For the mechanism-analysis model, important features were selected based on the suggestions of many tennis players and enthusiasts, a sliding-window algorithm was used to calculate weights, and different methods were used to visualize momentum. To further analyze momentum fluctuations, we applied the industry-standard CUSUM algorithm together with the runs test; the results show that momentum is not random, while the trend may be. Finally, the robustness of the fusion model is analyzed by Monte Carlo simulation.
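A small sketch of the sliding-window momentum and CUSUM ideas on a toy point-by-point sequence (1 = point won by player A); the window size and threshold are arbitrary choices.

```python
# Sliding-window momentum and a one-sided CUSUM check on a toy point sequence.
import numpy as np

points = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1])

window = 5
momentum = np.convolve(points, np.ones(window) / window, mode="valid")
print(momentum)                          # rolling share of points won by A

# One-sided CUSUM: grows while A wins fewer points than their overall average,
# flagging a possible negative momentum shift; threshold is arbitrary here.
mean = points.mean()
c, cusum = 0.0, []
for x in points:
    c = max(0.0, c + (mean - x))
    cusum.append(round(c, 2))
print(cusum, max(cusum) > 2.0)
```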
arxiv.org - Jingdi Lei, Tianqi Kang, Yuluan Cao, Shiwei Ren
Abstract:This paper presents an analysis of momentum in tennis matches. Owing to its generalization performance, the approach can help construct systems that predict the results of sports games and analyze player performance based on technical statistics. We first use hidden Markov models to predict momentum, which we define as the performance of players. We then use XGBoost to demonstrate the significance of momentum. Finally, we use LightGBM to evaluate the performance of our model and apply SHAP feature-importance ranking and weight analysis to identify the key factors that affect player performance.
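A minimal sketch of the first step, assuming hmmlearn is available: fit a two-state Gaussian HMM to a synthetic performance series and read the inferred hidden state as a momentum label. The series is made up, not the paper's technical statistics.

```python
# Two-state Gaussian HMM over a synthetic per-game performance series.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
low = rng.normal(0.45, 0.05, 40)      # games with a lower point-win rate
high = rng.normal(0.60, 0.05, 40)     # games with a higher point-win rate
series = np.concatenate([low, high]).reshape(-1, 1)

hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=200, random_state=0)
hmm.fit(series)
states = hmm.predict(series)          # 0/1 labels interpretable as momentum regimes
print(states)
```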
arxiv.org - Ethan Baron, Nathan Sandholtz, Devin Pleuler, Timothy C. Y. Chan
Abstract:Measuring soccer shooting skill is a challenging analytics problem due to the scarcity and highly contextual nature of scoring events. The introduction of more advanced data surrounding soccer shots has given rise to model-based metrics which better cope with these challenges. Specifically, metrics such as expected goals added, goals above expectation, and post-shot expected goals all use advanced data to offer an improvement over the classical conversion rate. However, all metrics developed to date assign a value of zero to off-target shots, which account for almost two-thirds of all shots, since these shots have no probability of scoring. We posit that there is non-negligible shooting skill signal contained in the trajectories of off-target shots and propose two shooting skill metrics that incorporate the signal contained in off-target shots. Specifically, we develop a player-specific generative model for shot trajectories based on a mixture of truncated bivariate Gaussian distributions. We use this generative model to compute metrics that allow us to attach non-zero value to off-target shots. We demonstrate that our proposed metrics are more stable than current state-of-the-art metrics and have increased predictive power.
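As a sketch of the generative idea, the code below fits a plain bivariate Gaussian mixture to synthetic shot end-locations; the paper's truncation of the components is omitted here for simplicity, and all coordinates are invented.

```python
# Bivariate Gaussian mixture over (horizontal, height) shot end-locations.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# (y, z) coordinates in the goal plane, metres from the centre of the goal line
shots = np.column_stack([rng.normal(0.0, 1.5, 300),    # horizontal placement
                         rng.normal(1.0, 0.8, 300)])   # height

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(shots)

# The probability mass falling inside the goal mouth (7.32 m x 2.44 m) can be
# estimated by Monte Carlo sampling from the fitted player-specific model.
samples, _ = gmm.sample(20000)
on_target = (np.abs(samples[:, 0]) < 3.66) & (samples[:, 1] > 0) & (samples[:, 1] < 2.44)
print("estimated on-target share:", on_target.mean())
```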
arxiv.org - Grzegorz Rypeść, Grzegorz Kurzejamski
Abstract:This work introduces a novel end-to-end approach for estimating extrinsic parameters of cameras in multi-camera setups on real-life sports fields. We identify the source of significant calibration errors in multi-camera environments and address the limitations of existing calibration methods, particularly the disparity between theoretical models and actual sports field characteristics. We propose the Evolutionary Stitched Camera calibration (ESC) algorithm to bridge this gap. It consists of image segmentation followed by evolutionary optimization of a novel loss function, providing a unified and accurate multi-camera calibration solution with high visual fidelity. The outcome allows the creation of virtual stitched views from multiple video sources, being as important for practical applications as numerical accuracy. We demonstrate the superior performance of our approach compared to state-of-the-art methods across diverse real-life football fields with varying physical characteristics.
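A toy evolutionary loop in the spirit of the abstract (not the ESC algorithm itself): mutate candidate parameter vectors and keep the ones with the lowest loss, with a stand-in quadratic loss in place of a real reprojection error.

```python
# Toy evolutionary optimization of a stand-in calibration loss.
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.2, -1.3, 4.0])          # "true" extrinsics in this toy setup

def loss(params):
    return np.sum((params - target) ** 2)    # stand-in for a reprojection loss

pop = rng.normal(0.0, 2.0, size=(50, 3))
for _ in range(100):
    scores = np.array([loss(p) for p in pop])
    parents = pop[np.argsort(scores)[:10]]                       # keep the 10 best
    pop = parents[rng.integers(0, 10, 50)] + rng.normal(0, 0.1, (50, 3))
print(pop[np.argmin([loss(p) for p in pop])])                    # close to `target`
```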
arxiv.org - Vladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, ...
Abstract:Tracking and identifying athletes on the pitch holds a central role in collecting essential insights from the game, such as estimating the total distance covered by players or understanding team tactics. This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e. a minimap). However, reconstructing the game state from videos captured by a single camera is challenging. It requires understanding the position of the athletes and the viewpoint of the camera to localize and identify players within the field. In this work, we formalize the task of Game State Reconstruction and introduce SoccerNet-GSR, a novel Game State Reconstruction dataset focusing on football videos. SoccerNet-GSR is composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number. Furthermore, we introduce GS-HOTA, a novel metric to evaluate game state reconstruction methods. Finally, we propose and release an end-to-end baseline for game state reconstruction, bootstrapping the research on this task. Our experiments show that GSR is a challenging novel task, which opens the field for future research. Our dataset and codebase are publicly available at this https URL.
arxiv.org - Karolina Seweryn, Gabriel Chęć, Szymon Łukasik, Anna Wróblewska
Abstract:This study explores the potential of super-resolution techniques in enhancing object detection accuracy in football. Given the sport's fast-paced nature and the critical importance of precise object (e.g. ball, player) tracking for both analysis and broadcasting, super-resolution could offer significant improvements. We investigate how advanced image processing through super-resolution impacts the accuracy and reliability of object detection algorithms in processing football match footage. Our methodology involved applying state-of-the-art super-resolution techniques to a diverse set of football match videos from SoccerNet, followed by object detection using Faster R-CNN. The performance of these algorithms, both with and without super-resolution enhancement, was rigorously evaluated in terms of detection accuracy. The results indicate a marked improvement in object detection accuracy when super-resolution preprocessing is applied. The integration of super-resolution techniques yields significant benefits, especially for low-resolution scenarios, with a notable 12% increase in mean Average Precision (mAP) at an IoU (Intersection over Union) range of 0.50:0.95 for 320x240 images when increasing the resolution fourfold using RLFN. As the dimensions increase, the magnitude of improvement becomes more subdued; however, a discernible improvement in the quality of detection is consistently evident. Additionally, we discuss the implications of these findings for real-time sports analytics, player tracking, and the overall viewing experience. The study contributes to the growing field of sports technology by demonstrating the practical benefits and limitations of integrating super-resolution techniques in football analytics and broadcasting.
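A sketch of the evaluation setup, with bicubic-style resizing standing in for a learned super-resolution model such as RLFN (not reproduced here), followed by an off-the-shelf torchvision detector; the input file name is a placeholder.

```python
# Upscale a low-resolution frame, then run a pretrained Faster R-CNN on it.
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import resize, convert_image_dtype

frame = read_image("frame_320x240.png")                  # low-res match frame (placeholder)
upscaled = resize(frame, [960, 1280], antialias=True)    # 4x upscale stand-in for SR
batch = [convert_image_dtype(upscaled, torch.float)]

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
with torch.no_grad():
    detections = model(batch)[0]
keep = detections["scores"] > 0.5
print(detections["boxes"][keep], detections["labels"][keep])
```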
arxiv.org - Atom Scott, Ikuma Uchida, Ning Ding, Rikuhei Umemoto, Rory Bunker, Ren Kobayashi, Takeshi Koyama, Masaki Onishi
Abstract:Multi-object tracking (MOT) is a critical and challenging task in computer vision, particularly in situations involving objects with similar appearances but diverse movements, as seen in team sports. Current methods, largely reliant on object detection and appearance, often fail to track targets accurately in such complex scenarios. This limitation is further exacerbated by the lack of comprehensive and diverse datasets covering the full view of sports pitches. Addressing these issues, we introduce TeamTrack, a pioneering benchmark dataset specifically designed for MOT in sports. TeamTrack is an extensive collection of full-pitch video data from various sports, including soccer, basketball, and handball. Furthermore, we perform a comprehensive analysis and benchmarking effort to underscore TeamTrack's utility and potential impact. Our work signifies a crucial step forward, promising to elevate the precision and effectiveness of MOT in complex, dynamic settings such as team sports. The dataset, project code, and competition are released at: this https URL.
arxiv.org - Floriane Magera, Thomas Hoyoux, Olivier Barnich, Marc Van Droogenbroeck
Abstract:Camera calibration is a crucial component in the realm of sports analytics, as it serves as the foundation to extract 3D information out of the broadcast images. Despite the significance of camera calibration research in sports analytics, progress is impeded by outdated benchmarking criteria. Indeed, the annotation data and evaluation metrics provided by most currently available benchmarks strongly favor and incite the development of sports field registration methods, i.e. methods estimating homographies that map the sports field plane to the image plane. However, such homography-based methods are doomed to overlook the broader capabilities of camera calibration in bridging the 3D world to the image. In particular, real-world non-planar sports field elements (such as goals, corner flags, baskets, ...) and image distortion caused by broadcast camera lenses are out of the scope of sports field registration methods. To overcome these limitations, we designed a new benchmarking protocol, named ProCC, based on two principles: (1) the protocol should be agnostic to the camera model chosen for a camera calibration method, and (2) the protocol should fairly evaluate camera calibration methods using the reprojection of arbitrary yet accurately known 3D objects. Indirectly, we also provide insights into the metric used in SoccerNet-calibration, which solely relies on image annotation data of viewed 3D objects as ground truth, thus implementing our protocol. With experiments on the World Cup 2014, CARWC, and SoccerNet datasets, we show that our benchmarking protocol provides fairer evaluations of camera calibration methods. By defining our requirements for proper benchmarking, we hope to pave the way for a new stage in camera calibration for sports applications with high accuracy standards.
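The core measurement behind such a protocol can be sketched as a reprojection-error computation; every number below is a made-up stand-in for real annotations and calibration parameters.

```python
# Project known 3D points with a candidate calibration and measure pixel error.
import numpy as np
import cv2

object_points = np.array([[0.0, 0.0, 0.0],      # 3D goal-frame corners in field coords (m)
                          [7.32, 0.0, 0.0],
                          [0.0, 0.0, 2.44],
                          [7.32, 0.0, 2.44]], dtype=np.float32)
observed_px = np.array([[512, 420], [780, 418], [514, 300], [778, 298]], dtype=np.float32)

K = np.array([[1200.0, 0.0, 640.0],             # candidate intrinsics
              [0.0, 1200.0, 360.0],
              [0.0, 0.0, 1.0]])
rvec = np.zeros(3)                               # candidate rotation (Rodrigues vector)
tvec = np.array([-3.0, 1.0, 20.0])               # candidate translation
dist = np.zeros(5)                               # no lens distortion assumed here

projected, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist)
error = np.linalg.norm(projected.reshape(-1, 2) - observed_px, axis=1)
print("mean reprojection error (px):", error.mean())
```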
arxiv.org - Qi Li, Tzu-Chen Chiu, Hsiang-Wei Huang, Min-Te Sun, Wei-Shinn Ku
Abstract:In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there's an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton sports. The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.
arxiv.org - Vuko Vukcevic, Robert Keser
Abstract:The paper provides a mathematical model and a tool for the focused investing strategy advocated by Buffett, Munger, and others in that investment community. The approach presented here assumes that the investor's role is to think about the probabilities of different outcomes for a set of businesses. Based on these assumptions, the tool calculates the optimal allocation of capital for each of the investment candidates. The model is based on a generalized Kelly Criterion with options to impose constraints that ensure no shorting, limited use of leverage, a cap on the risk of permanent loss of capital, and a maximum individual allocation. The software is applied to an example portfolio, from which certain observations about excessive diversification are obtained. In addition, the software is made available for public use.
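A minimal sketch of a constrained Kelly-style allocation over discrete outcome scenarios, in the spirit of the tool described; the scenario returns, probabilities, and caps are invented for illustration, not the authors' model.

```python
# Maximize expected log growth subject to no shorting, no leverage, and a
# per-position cap, over a small set of made-up outcome scenarios.
import numpy as np
from scipy.optimize import minimize

# Rows: scenarios, columns: assets; entries are gross returns in that scenario.
R = np.array([[1.50, 1.10, 1.02],
              [0.70, 1.05, 1.02],
              [1.20, 0.60, 1.02]])
p = np.array([0.5, 0.3, 0.2])          # scenario probabilities

def neg_log_growth(w):
    wealth = 1.0 + R @ w - w.sum()     # uninvested cash earns nothing here
    return -np.dot(p, np.log(wealth))

n = R.shape[1]
bounds = [(0.0, 0.4)] * n              # no shorting, max 40% per position
constraints = [{"type": "ineq", "fun": lambda w: 1.0 - w.sum()}]  # no leverage

res = minimize(neg_log_growth, x0=np.full(n, 0.1), bounds=bounds,
               constraints=constraints, method="SLSQP")
print(res.x.round(3))                  # optimal fractions of capital per asset
```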
alphaarchitect.com - Tommi Johnsen, PhD
Can AI models improve on the failures in predicting returns strictly from a practical point of view? In this paper, the possibilities are tested with a battery of AI models including linear regression, dimensional reduction methods, regression trees and neural networks. These machine learning models may be better equipped to address the multidimensional nature of stock returns when compared to traditional sorting and cross-sectional regressions used in factor research. The authors hope to overcome the drawbacks and confirm the results of traditional quant methods. As it turns out, those hopes are only weakly fulfilled by the MLM framework.
bishopbook.com - Christopher Bishop
This book offers a comprehensive introduction to the central ideas that underpin deep learning. It is intended both for newcomers to machine learning and for those already experienced in the field. Covering key concepts relating to contemporary architectures and techniques, this essential book equips readers with a robust foundation for potential future specialization. The field of deep learning is undergoing rapid evolution, and therefore this book focusses on ideas that are likely to endure the test of time. The book is organized into numerous bite-sized chapters, each exploring a distinct topic, and the narrative follows a linear progression, with each chapter building upon content from its predecessors. This structure is well-suited to teaching a two-semester undergraduate or postgraduate machine learning course, while remaining equally relevant to those engaged in active research or in self-study. A full understanding of machine learning requires some mathematical background and so the book includes a self-contained introduction to probability theory. However, the focus of the book is on conveying a clear understanding of ideas, with emphasis on the real-world practical value of techniques rather than on abstract theory. Complex concepts are therefore presented from multiple complementary perspectives including textual descriptions, diagrams, mathematical formulae, and pseudo-code.
A free-to-use online version is available.
arxiv.org - Anna Calissano, Matteo Fontana, Gianluca Zeni, Simone Vantini
Abstract:The analysis of data such as graphs has been gaining increasing attention in the past years. This is justified by the numerous applications in which they appear. Several methods exist to predict graphs, but far fewer quantify the uncertainty of the prediction. The present work proposes an uncertainty quantification methodology for graphs, based on conformal prediction. The method works both for graphs with the same set of nodes (labelled graphs) and for graphs with no clear correspondence between the sets of nodes across the observed graphs (unlabelled graphs). The unlabelled case is handled by creating prediction sets embedded in a quotient space. The proposed method does not rely on distributional assumptions, achieves finite-sample validity, and identifies interpretable prediction sets. To explore the features of this novel forecasting technique, we perform two simulation studies covering both the labelled and the unlabelled case. We showcase the applicability of the method by analysing the performance of different teams during the 2018 FIFA World Cup via their player passing networks.
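For intuition, a generic split-conformal sketch on a scalar response, showing the finite-sample coverage idea that the paper extends to graph-valued data; this is not the paper's quotient-space construction, and the data are synthetic.

```python
# Split conformal prediction: calibrate a residual quantile on held-out data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] * 2 + rng.normal(scale=0.5, size=1000)

X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_fit, y_fit)

alpha = 0.1
scores = np.abs(y_cal - model.predict(X_cal))                   # conformity scores
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

x_new = rng.normal(size=(1, 5))
pred = model.predict(x_new)[0]
print("90% prediction interval:", (pred - q, pred + q))         # valid without distributional assumptions
```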
arxiv.org - Hadiseh Safdari, Caterina De Bacco
Abstract:Anomaly detection is an essential task in the analysis of dynamic networks, as it can provide early warning of potential threats or abnormal behavior. We present a principled approach to detect anomalies in dynamic networks that integrates community structure as a foundational model for regular behavior. Our model identifies anomalies as irregular edges while capturing structural changes. Leveraging a Markovian approach for temporal transitions and incorporating structural information via latent variables for communities and anomaly detection, our model infers these hidden parameters to pinpoint abnormal interactions within the network. Our approach is evaluated on both synthetic and real-world datasets. Real-world network analysis shows strong anomaly detection across diverse scenarios. In a more specific study of transfers of professional male football players, we observe various types of unexpected patterns and investigate how the country and wealth of clubs influence interactions. We identify anomalies between clubs with incompatible community memberships, but also instances of anomalous transactions between clubs with similar memberships. The latter is due in particular to the dynamic nature of the transactions: the frequency of transfers produces anomalous behavior between clubs that would otherwise be expected to interact, as they belong to similar communities.
towardsdatascience.com - Jonte Dancker
An important but easy-to-use tool for uncertainty quantification every data scientist should know.