arxiv.org - Ryan Sides, Jane L. Harvill
For NCAA football, we provide a method for sports bettors to determine if they have a positive expected value bet based on the betting lines available to them and how they believe the game will end. The method we develop modifies probabilities based on a normal distribution using historical data. The result is that more common point differentials are given appropriate weights. We provide a freely available online tool for implementing our technique.
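As a rough illustration of the expected-value check described above, here is a minimal sketch. It assumes American-style odds on a point-spread bet and a plain normal distribution for the final margin; the paper's adjustment of those probabilities using historical point differentials is not reproduced here. The function names, the standard deviation, and the example numbers are all hypothetical.

```python
from statistics import NormalDist


def payout_per_unit(american_odds: int) -> float:
    """Profit on a winning 1-unit stake at American odds (e.g. -110 or +150)."""
    if american_odds > 0:
        return american_odds / 100.0
    return 100.0 / abs(american_odds)


def cover_probability(predicted_margin: float, spread: float, sigma: float) -> float:
    """P(margin + spread > 0) under an unadjusted normal model for the margin.

    predicted_margin: the bettor's expected margin of victory (assumption)
    spread: the line for the team, e.g. -3.5 for a 3.5-point favourite
    sigma: assumed standard deviation of the final margin
    """
    margin = NormalDist(mu=predicted_margin, sigma=sigma)
    return 1.0 - margin.cdf(-spread)


def expected_value(p_win: float, american_odds: int) -> float:
    """Expected profit per unit staked, ignoring pushes."""
    return p_win * payout_per_unit(american_odds) - (1.0 - p_win)


# Hypothetical example: bettor expects a 7-point win, line is -3.5 at -110.
p = cover_probability(predicted_margin=7.0, spread=-3.5, sigma=14.0)
ev = expected_value(p, -110)
```

Under these assumptions a bet has positive expected value only when `p` exceeds the break-even probability implied by the odds (110/210, about 0.524, at -110).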
arxiv.org - Marius Ötting, Christian Deutscher, Carl Singleton, Luca De Angelis
Sports betting markets are proven real-world laboratories to test theories of asset pricing anomalies and risky behaviour. Using a high-frequency dataset provided directly by a major bookmaker, containing the odds and amounts staked throughout German Bundesliga football matches, we test for evidence of momentum in the betting and pricing behaviour after equalising goals. We find that bettors see value in teams that have the apparent momentum, staking about 40% more on them than teams that just conceded an equaliser. Still, there is no evidence that such perceived momentum matters on average for match outcomes or is associated with the bookmaker offering favourable odds. We also confirm that betting on the apparent momentum would lead to substantial losses for bettors.
fantasyfootballcommunity.com - Fantasy Football Community
The 2023 Women’s World Cup is under two weeks away which could mean the fantasy game is just around the corner. We look through some of the most important details to help you get up to speed so you can start planning.
statsbomb.com - Matt Edwards
One of my favorite articles that I’ve written while at StatsBomb was an introduction of a new metric titled Randy. If you haven’t read the article yet, it’s worth taking the time to check it out. A quick recap of Randy to start would be helpful though.
arxiv.org - Brian Szekely, Christian Sinnott, Savannah Halow, Gregory Ryan
The National Football League (NFL) Scouting Combine serves as a tool to evaluate the skills of prospective players and assess their readiness to play in the NFL. The development of machine learning brings new opportunities in assessing the utility of the Scouting Combine. Using machine and statistical learning, it may be possible to predict future success of prospective athletes, as well as predict which Scouting Combine tests are the most important. Results from statistical learning research have been contradictory as to whether the Scouting Combine is a useful metric for player success. In this study, we investigate if machine learning can be used to determine matriculation and future success in the NFL. Using Scouting Combine data, we evaluate six different algorithms' ability to predict whether a potential draft pick will play a single NFL snap (matriculation). If a player is drafted, we predict how many snaps they go on to play (success). We are able to predict matriculation with 83% accuracy; however, we are unable to predict later success. Our best performing algorithm returns large error and low explained variance (RMSE=1,210 snaps; R²=0.17). These findings indicate that while the Scouting Combine can predict NFL matriculation, it may not be a reliable predictor of long-term player success.
arxiv.org - Nicholas Grieshop, Yong Feng, Guanyu Hu, Michael Schweinberger
Technological advances have paved the way for collecting high-resolution tracking data in basketball, football, and other team-based sports. Such data consist of interactions among players of competing teams indexed by space and time. High-resolution tracking data on interactions among players are vital to understanding and predicting the performance of teams, because the performance of a team is more than the sum of the strengths of its individual players. We introduce a continuous-time stochastic process as a model of interactions among players of competing teams indexed by space and time, discuss properties of the continuous-time stochastic process, and learn the stochastic process from high-resolution tracking data by pursuing a Bayesian approach. We present an application to Juventus Turin, Inter Milan, and other Italian football clubs.
arxiv.org - Leszek Szczecinski
Ranking is used in sport leagues to determine a champion and/or to decide on promotion/relegation of teams. Arguably, the best known ranking method relies on scores obtained by cumulating the points associated with the wins and the draws of all teams, which are then ranked by sorting the score obtained. There are two main problems with this ranking method. First, the meaning of the ranking is undefined, and, second, it depends on the relative value of the wins that is arbitrarily set. We remedy these issues by introducing a probabilistic model of the game results and by showing an interpretation of the ranking that is consistent with the model. We also propose a methodology to estimate the parameter of the model which allows us to objectively determine the value of the win. In particular, using data from the association football (soccer), we show that the value of the win is close to five (5) points.
arxiv.org - Ali Baouan, Sébastien Coustou, Mathieu Lacome, Sergio Pulido, Mathieu Rosenbaum
We introduce an innovative methodology to identify football players at the origin of threatening actions in a team. In our framework, a threat is defined as entering the opposing team's danger area. We investigate the timing of threat events and ball touches of players, and capture their correlation using Hawkes processes. Our model-based approach allows us to evaluate a player's ability to create danger both directly and through interactions with teammates. We define a new index, called Generation of Threat (GoT), that measures in an unbiased way the contribution of a player to threat generation. For illustration, we present a detailed analysis of Chelsea's 2016-2017 season, with a standout performance from Eden Hazard. We are able to credit each player for his involvement in danger creation and determine the main circuits leading to threat. In the same spirit, we investigate the danger generation process of Stade Rennais in the 2021-2022 season. Furthermore, we establish a comprehensive ranking of Ligue 1 players based on their generated threat in the 2021-2022 season. Our analysis reveals surprising results, with players such as Jason Berthomier, Moses Simon and Frederic Guilbert among the top performers in the GoT rankings. We also present a ranking of Ligue 1 central defenders in terms of generation of threat and confirm the great performance of some center-back pairs, such as Nayef Aguerd and Warmed Omari.
arxiv.org - Hiroshi Nakahara, Kazushi Tsutsui, Kazuya Takeda, Keisuke Fujii
Analysis of invasive sports such as soccer is challenging because the game situation changes continuously in time and space, and multiple agents individually recognize the game situation and make decisions. Previous studies using deep reinforcement learning have often considered teams as a single agent and valued the teams and players who hold the ball in each discrete event. Then it was challenging to value the actions of multiple players, including players far from the ball, in a spatiotemporally continuous state space. In this paper, we propose a method of valuing possible actions for on- and off-ball soccer players in a single holistic framework based on multi-agent deep reinforcement learning. We consider a discrete action space in a continuous state space that mimics that of Google research football and leverages supervised learning for actions in reinforcement learning. In the experiment, we analyzed the relationships with conventional indicators, season goals, and game ratings by experts, and showed the effectiveness of the proposed method. Our approach can assess how multiple players move continuously throughout the game, which is difficult to be discretized or labeled but vital for teamwork, scouting, and fan engagement.
arxiv.org - Silvio Giancola, Anthony Cioppa, Julia Georgieva, Johsan Billingham, Andreas Serner, Kerry Peek, Bernard Ghanem, Marc Van Drooge
Association football is a complex and dynamic sport, with numerous actions occurring simultaneously in each game. Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns. Despite recent advances in computer vision, current algorithms still face significant challenges when learning from limited annotated data, lowering their performance in detecting these patterns. In this paper, we propose an active learning framework that selects the most informative video samples to be annotated next, thus drastically reducing the annotation effort and accelerating the training of action spotting models to reach the highest accuracy at a faster pace. Our approach leverages the notion of uncertainty sampling to select the most challenging video clips to train on next, hastening the learning process of the algorithm. We demonstrate that our proposed active learning framework effectively reduces the required training data for accurate action spotting in football videos. We achieve similar performances for action spotting with NetVLAD on SoccerNet-v2, using only one-third of the dataset, indicating significant capabilities for reducing annotation time and improving data efficiency. We further validate our approach on two new datasets that focus on temporally localizing actions of headers and passes, proving its effectiveness across different action semantics in football. We believe our active learning framework for action spotting would support further applications of action spotting algorithms and accelerate annotation campaigns.
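The uncertainty-sampling idea mentioned above can be sketched in a few lines. This is a generic illustration and not the authors' implementation: it assumes each unlabelled clip already has a vector of predicted class probabilities, and it uses prediction entropy as the uncertainty score (one common choice among several). The clip identifiers and function names are hypothetical.

```python
from math import log


def entropy(probs):
    """Shannon entropy of a probability vector; higher means less certain."""
    return -sum(p * log(p) for p in probs if p > 0.0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k clips whose predictions are most uncertain.

    predictions: dict mapping a clip id to its predicted class probabilities
    """
    ranked = sorted(predictions, key=lambda cid: entropy(predictions[cid]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs for three unlabelled clips.
predictions = {
    "clip_a": [0.5, 0.5],
    "clip_b": [0.9, 0.1],
    "clip_c": [1.0, 0.0],
}
to_annotate = select_most_uncertain(predictions, k=2)
```

In an active learning loop, the selected clips would be sent for annotation, added to the training set, and the model retrained before the next selection round.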
arxiv.org - Yutao Cui, Chenkai Zeng, Xiaoyu Zhao, Yichun Yang, Gangshan Wu, Limin Wang
Multi-object tracking in sports scenes plays a critical role in gathering player statistics, supporting further analysis, such as automatic tactical analysis. Yet existing MOT benchmarks pay little attention to the domain, limiting its development. In this work, we present a new large-scale multi-object tracking dataset in diverse sports scenes, coined as SportsMOT, where all players on the court are supposed to be tracked. It consists of 240 video sequences, over 150K frames (almost 15× MOT17) and over 1.6M bounding boxes (3× MOT17) collected from 3 sports categories, including basketball, volleyball and football. Our dataset is characterized with two key properties: 1) fast and variable-speed motion and 2) similar yet distinguishable appearance. We expect SportsMOT to encourage the MOT trackers to promote in both motion-based association and appearance-based association. We benchmark several state-of-the-art trackers and reveal the key challenge of SportsMOT lies in object association. To alleviate the issue, we further propose a new multi-object tracking framework, termed as MixSort, introducing a MixFormer-like structure as an auxiliary association model to prevailing tracking-by-detection trackers. By integrating the customized appearance-based association with the original motion-based association, MixSort achieves state-of-the-art performance on SportsMOT and MOT17. Based on MixSort, we give an in-depth analysis and provide some profound insights into SportsMOT. The dataset and code will be available at this https URL.
su.domains - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance, marketing, and astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, deep learning, survival analysis, multiple testing, and more. Color graphics and real-world examples are used to illustrate the methods presented. This book is targeted at statisticians and non-statisticians alike, who wish to use cutting-edge statistical learning techniques to analyze their data.
Four of the authors co-wrote An Introduction to Statistical Learning, With Applications in R (ISLR), which has become a mainstay of undergraduate and graduate classrooms worldwide, as well as an important reference book for data scientists. One of the keys to its success was that each chapter contains a tutorial on implementing the analyses and methods presented in the R scientific computing environment. However, in recent years Python has become a popular language for data science, and there has been increasing demand for a Python-based alternative to ISLR. Hence, this book (ISLP) covers the same materials as ISLR but with labs implemented in Python. These labs will be useful both for Python novices, as well as experienced users.
statsbomb.com - Matt Edwards
There are many statistics available to evaluate wide receivers. Some basic stats like catches, receiving yards, and touchdowns. Some more advanced stats like catch percentage, yards after the catch, and target share. And even more advanced ones still like those captured by the NFL’s Next Gen Stats team, air yards per target, and expected yards after the catch. All of these give a good insight into what type of receiver a player is.
wiley.com - M. J. Maher
First published: September 1982
Previous authors have rejected the Poisson model for association football scores in favour of the Negative Binomial. This paper, however, investigates the Poisson model further. Parameters representing the teams' inherent attacking and defensive strengths are incorporated and the most appropriate model is found from a hierarchy of models. Observed and expected frequencies of scores are compared and goodness-of-fit tests show that although there are some small systematic differences, an independent Poisson model gives a reasonably accurate description of football scores. Improvements can be achieved by the use of a bivariate Poisson model with a correlation between scores of 0.2.
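To make the independent Poisson idea concrete, here is a minimal sketch that turns two scoring rates into win/draw/loss probabilities. It assumes each team's goals follow an independent Poisson distribution whose mean is the product of one side's attacking strength and the other side's defensive weakness. That parameterisation, the strength values, and the truncation at `max_goals` are illustrative assumptions, and no parameter fitting is shown.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the scoring rate is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def outcome_probabilities(home_rate: float, away_rate: float, max_goals: int = 10):
    """Sum independent Poisson score probabilities into (home win, draw, away win).

    Scorelines with more than max_goals for either team are ignored, so the
    three values sum to slightly less than 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical strengths: rate = attack of one team * defensive weakness of the other.
home_attack, home_defence = 1.4, 0.9
away_attack, away_defence = 1.1, 1.2
home_rate = home_attack * away_defence
away_rate = away_attack * home_defence
probs = outcome_probabilities(home_rate, away_rate)
```

The bivariate Poisson improvement mentioned in the abstract adds a correlation between the two scores, which this independent version leaves out.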
wordpress.com - James Grayson
As I’ve said before, it’s all well and good knowing that team ‘x’ took 20 shots in the first half against team ‘y’, but unless we know whether the number of shots a team takes is repeatable over time then trying to put that number into context is essentially useless.
statsbomb.com - Marek Kwiatkowski
I have been involved in football analytics for four years and doing it for a living since 2014. It has been a wonderful adventure, but there is no denying that the public side of the field has stalled. But this is not really a "crisis of analytics" piece or an indictment of the community. Instead, I want to point out one critical barrier to further advancement and plot a course around it. In short, I want to argue for a more theoretical, concept-driven approach to football analysis, which is in my opinion overdue.
researchgate.net - Mitchell Pearson, Glen Livingston Jr, Robert King
Predictive football modelling has become progressively popular over the last two decades. Due to this, numerous studies have proposed different types of statistical models to predict the outcome of a football match. This study provides a review of three different models published in the academic literature and then implements these on recent match data from the top football leagues in Europe. These models are then compared utilising the rank probability score to assess their predictive capability. Additionally, a modification is proposed which includes the travel distance of the away team. When tested on football leagues from both Australia and Russia, it is shown to improve predictive capability according to the rank probability score.
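For readers unfamiliar with the rank probability score used to compare these models, a minimal sketch of the standard definition follows. It assumes the outcomes are ordered (home win, draw, away win) and that lower scores indicate better forecasts; the function name and example forecasts are hypothetical.

```python
def rank_probability_score(probs, outcome_index):
    """Rank probability score for one match.

    probs: forecast probabilities over ordered outcomes, e.g. (home, draw, away)
    outcome_index: position of the outcome that actually occurred
    """
    r = len(probs)
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    # Compare cumulative forecast and cumulative outcome at each of the
    # first r - 1 categories (the last cumulative difference is always 0).
    for i in range(r - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome_index else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (r - 1)


# Hypothetical forecasts scored against a draw (index 1).
uniform = rank_probability_score((1 / 3, 1 / 3, 1 / 3), 1)
confident_wrong = rank_probability_score((1.0, 0.0, 0.0), 2)
```

Because the score works on cumulative probabilities, a forecast that favoured a home win is penalised less when the match ends in a draw than when it ends in an away win.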
semanticscholar.org - S. Koopman, R. Lit
We develop a statistical model for the analysis and forecasting of football match results which assumes a bivariate Poisson distribution with intensity coefficients that change stochastically over time. The dynamic model is a novelty in the statistical time series analysis of match results in team sports. Our treatment is based on state space and importance sampling methods which are computationally efficient. The out‐of‐sample performance of our methodology is verified in a betting strategy that is applied to the match outcomes from the 2010–2011 and 2011–2012 seasons of the English football Premier League. We show that our statistical modelling framework can produce a significant positive return over the bookmaker's odds.
soccermetrics.net - Howard Hamilton
A couple of years ago I read Michael Lewis’ Moneyball, which is an excellent book on the 2002 Oakland Athletics and the behind-the-scenes maneuvering that assembled that team. I was living in the Bay Area when the A’s went on their 20-game winning streak and I remember the excitement in the East Bay during that time, so the book appealed to me on that level. But it’s more than just a baseball book. It’s really a story of the use of statistical methods to assemble a team in a sport that depends so heavily on scouting evaluations, despite its reputation as a statistics-heavy sport. Now that the A’s are running the San José Earthquakes, and the architect of that approach (GM Billy Beane) is seeking to apply it to soccer, the following question has to be asked: Can a ‘Moneyball’ approach be successful in soccer?
sciencedirect.com - Igor Barbosa da Costa, Leandro Balby Marinho, Carlos Eduardo Santos Pires
The continuous growth of available football data presents unprecedented research opportunities for a better understanding of football dynamics. While many research works focus on predicting which team will win a match, other interesting questions, such as whether both teams will score in a game, are still unexplored and have gained momentum with the rise of betting markets. With this in mind, we investigate the following research questions in this paper: “How difficult is the ‘both teams to score’ (BTTS) prediction problem?”, “Are machine learning classifiers capable of predicting BTTS better than bookmakers?”, and “Are machine learning classifiers useful for devising profitable betting strategies in the BTTS market?”. We collected historical football data, extracted groups of features to represent the teams’ strengths, and fed these to state-of-the-art classification models. We performed a comprehensive set of experiments and showed that, although hard to predict, in some scenarios it is possible to outperform bookmakers, which are robust baselines per se. More importantly, in some cases it is possible to beat the market and devise profitable strategies based on machine learning algorithms. The results are encouraging and, besides shedding light on the problem, may provide novel insights for all kinds of football stakeholders.
ssrn.com - D Winkelmann, M Oetting, C Deutscher
Research on sports betting often identifies biased evaluation by bookmakers and corresponding opportunities for profitable strategies to bettors. Such studies repeatedly provide evidence for the existence of biased betting odds for different periods and leagues, leaving the impression that inefficiencies are very common. Since most studies cover only a few seasons, the question remains whether these market inefficiencies persist over time. We review the literature on the big five leagues in European association football and then analyse 14 seasons to detect the occurrence and duration of betting market inefficiencies. While our results replicate the temporal findings of previous research, they also show that biases do not persist systematically over time and across leagues. Furthermore, a Monte Carlo simulation reveals that the number of inefficient periods barely exceeds what would be expected in an efficient market.
substack.com
In the first episode of Post Script, we travel all the way back to what Tiotal Football calls “the spiritual birth of soccer analytics blogging,” when a couple of people writing about soccer data started a conversation.
apple.com
Rob Pizzola and Johnny from betstamp are joined by Jeff Ma! Rob, Johnny, and Jeff discuss the MIT blackjack team, being involved in startups, the current state of twitter, and life advice!