youtube.com
In this special video discussion, David provides a complete overview of what SharpBetting offers, who it's for, and how to get the most from your membership.
What You'll Learn:
* Why SharpBetting was created and who's behind it
* What sports and markets are currently covered
* Which membership tier suits you best
* How to use the filters, settings
pythonfootball.com - Martin
So I'm starting something new from scratch. The Python Football Review
* What it is: A free weekly newsletter with hands-on Python templates and football deep-dives.
* Who it's for: Fans, scouts, journalists, and complete beginners who want practical data skills, with no fluff and no jargon.
* When: Every Thursday.
* What to expect:
+ Deep dives into metrics like xG, xGOT, xT, PPDA, and more
+ Step-by-step Python snippets for scraping, wrangling and analysing data
+ Book reviews, case studies, and "replicate-this-project" guides
+ Occasional forecasting pieces (because that's where many of us started)
pena.lt - Martin Eastwood
I've recently been questioning whether RPS is really the best tool for evaluating football forecasts, especially when our ultimate goal is to identify the most informative models as efficiently and fairly as possible. In this article, I'll explain why RPS might not be the optimal choice, introduce alternative scoring metrics like Log Loss (also known as Ignorance Score) and the multiclass Brier score, and share some experiments and ideas I've tested to explore which metrics are best suited for evaluating football predictive models.
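To make the three metrics concrete, here is a minimal sketch using their standard textbook definitions; it is not code from the article. The function names and the two example forecasts are made up for illustration, and outcomes are assumed to be ordered as home win, draw, away win.

```python
import math

def rps(probs, outcome):
    """Ranked probability score for ordered outcomes (lower is better)."""
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    # Compare cumulative forecast and cumulative outcome; the last
    # category is skipped because both cumulative sums equal 1 there.
    for i, p in enumerate(probs[:-1]):
        cum_forecast += p
        cum_observed += 1.0 if i == outcome else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (len(probs) - 1)

def log_loss(probs, outcome):
    """Log loss (ignorance score): uses only the observed outcome's probability."""
    return -math.log(probs[outcome])

def brier(probs, outcome):
    """Multiclass Brier score: squared error summed over all outcomes."""
    return sum(
        (p - (1.0 if i == outcome else 0.0)) ** 2
        for i, p in enumerate(probs)
    )

# Two hypothetical forecasts for a match that ended in a home win (index 0).
forecast_a = [0.5, 0.3, 0.2]
forecast_b = [0.5, 0.1, 0.4]
for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name, rps(forecast, 0), log_loss(forecast, 0), brier(forecast, 0))
```

Both forecasts give the home win the same probability, so log loss scores them identically, while RPS and the Brier score separate them based on how the remaining probability is spread across the draw and the away win.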
substack.com - Alex Marin Felices
Using Fine-Tuned LEMs to Assess Soccer Players' Impact Across Various Contexts.
pythonfootball.com - Python Football Review
What xG really measures (and what it doesn't), the key misconception that trips up even seasoned pros, why its loudest critics are mostly wrong, and how to pull tons of xG data with Python.
substack.com - Alex Marin Felices
A New Approach for Faster and More Accurate Tournament Outcome Predictions.
pena.lt - Martin Eastwood
Imagine you're watching a soccer match and your team's midfielder has the ball at the halfway line. How dangerous is this position? What about if they dribble forward 10 yards? Or make a pass to the wing? Expected Threat (xT), originally developed by Sarah Rudd and popularised by Karun Singh, attempts to answer these questions by quantifying the offensive value of every position on the pitch. Unlike simpler metrics such as expected goals (xG) that only measure shot quality, xT evaluates both immediate shooting opportunities and the potential for creating future scoring chances. This makes it useful for analyzing buildup play and measuring contributions from those players who don't directly create shots.
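The idea of valuing a position by its immediate shot value plus the value of where the ball might go next can be sketched as a simple fixed-point iteration. This is a toy illustration, not the article's implementation: the three-zone pitch, all probabilities, and the function name are invented, and any probability missing from a transition row is assumed to stand for losing possession.

```python
def expected_threat(shoot_prob, goal_prob, move_prob, transition, n_iter=200):
    """Iterate xT(z) = shoot(z) * goal(z) + move(z) * sum_t T[z][t] * xT(t)."""
    n_zones = len(shoot_prob)
    xt = [0.0] * n_zones
    for _ in range(n_iter):
        xt = [
            shoot_prob[z] * goal_prob[z]
            + move_prob[z]
            * sum(transition[z][t] * xt[t] for t in range(n_zones))
            for z in range(n_zones)
        ]
    return xt

# Hypothetical zones: 0 = own half, 1 = midfield, 2 = final third.
shoot_prob = [0.0, 0.05, 0.3]    # chance the player shoots from the zone
goal_prob = [0.0, 0.02, 0.12]    # chance a shot from the zone scores
move_prob = [1.0, 0.95, 0.7]     # chance the player passes or dribbles instead
transition = [                   # where a move from each zone ends up
    [0.5, 0.3, 0.0],
    [0.2, 0.4, 0.2],
    [0.0, 0.2, 0.5],
]

xt = expected_threat(shoot_prob, goal_prob, move_prob, transition)
print(xt)
```

Even in this toy setup the own half ends up with a non-zero value although nobody shoots from there, which is the point of the metric: it credits positions for the chances they can lead to.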
pena.lt - Martin Eastwood
Football analytics has come a long way in recent years, moving from simple league tables to more sophisticated methods of quantifying team performance. If you've ever looked at Elo ratings or FIFA rankings, you know that rating systems attempt to provide a clearer picture of how good a team really is, beyond just the wins and losses. But are these systems as accurate as they could be? Imagine two teams: Team A beats Team B 1-0 in a closely fought match, while Team C thrashes Team D 5-0. Should Team A and Team C gain the same rating boost? Many traditional rating systems don't differentiate much between these results, even though one clearly signals a more dominant performance. This is where Pi Ratings come in: a dynamic rating system designed to better reflect team ability by considering score discrepancies, home vs. away performances, and recent form.
substack.com - Alex Marin Felices
The following summary critically reviews the research conducted by Lorenzo Lolli, Pascal Bauer, Callum Irving, Daniele Bonanno, Oliver Höner, Warren Gregson, and Valter Di Salvo, titled "Data analytics in the football industry: a survey investigating operational frameworks and practices in professional clubs and national federations from around the world." All data, figures, and analysis presented here are drawn from their original work; I do not claim any authorship or ownership of the content. This summary has been written to provide a concise and technically informed synthesis of the paper's findings, methodologies, and implications, while maintaining fidelity to the authors' intellectual contributions.
substack.com - Alex Marin Felices
The concept of expected goals (xG) has become a fundamental metric in football analytics, estimating the likelihood of a shot resulting in a goal based on contextual features such as shot distance, angle, and body part used. However, mainstream xG models do not account for player-specific attributes, leading to a uniform probability assignment for identical shots taken by different players. This limitation disregards variations in individual skill levels, exemplified by a scenario where Lionel Messi and a National League player take the same shot under identical conditions but are assigned the same xG value. Intuitively, Messi's superior finishing ability should yield a higher probability of scoring, yet conventional xG models fail to incorporate this effect.
arxiv.org - Shomoita Alam, Erica E. M. Moodie, Lucas Y. Wu, Tim B. Swartz
Abstract:Causal inference has become an accepted analytic framework in settings where experimentation is impossible, which is frequently the case in sports analytics, particularly for studying in-game tactics. However, subtle differences in implementation can lead to important differences in interpretation. In this work, we provide a case study to demonstrate the utility and the nuance of these approaches. Motivated by a case study of crossing in soccer, two causal questions are considered: the overall impact of crossing on shot creation (Average Treatment Effect, ATE) and its impact in plays where crossing was actually attempted (Average Treatment Effect on the Treated, ATT). Using data from Shandong Taishan Luneng Football Club's 2017 season, we demonstrate how distinct matching strategies are used for different estimation targets - the ATE and ATT - though both aim to eliminate any spurious relationship between crossing and shot creation. Results suggest crossing yields a 1.6% additive increase in shot probability overall compared to not crossing (ATE), whereas the ATT is 5.0%. We discuss what insights can be gained from each estimand, and provide examples where one may be preferred over the alternative. Understanding and clearly framing analytics questions through a causal lens ensure rigorous analyses of complex questions.
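The gap between the two estimands can be illustrated with a small stratified calculation. This sketch swaps the paper's matching strategies for plain stratification, and every number in it is invented; it only shows that the same stratum-level effects give different answers depending on whether they are weighted by all plays (ATE) or by plays where a cross was attempted (ATT).

```python
def stratified_effects(strata):
    """Return (ATE, ATT) from per-stratum counts and shot rates."""
    n_all = sum(s["n_treated"] + s["n_control"] for s in strata)
    n_treated = sum(s["n_treated"] for s in strata)
    ate = 0.0
    att = 0.0
    for s in strata:
        # Effect of crossing within a stratum of comparable plays.
        effect = s["rate_treated"] - s["rate_control"]
        # ATE weights each stratum by its share of all plays.
        ate += effect * (s["n_treated"] + s["n_control"]) / n_all
        # ATT weights each stratum by its share of plays with a cross.
        att += effect * s["n_treated"] / n_treated
    return ate, att

# Hypothetical strata: "treated" means a cross was attempted.
strata = [
    {"n_treated": 300, "n_control": 200, "rate_treated": 0.30, "rate_control": 0.22},
    {"n_treated": 100, "n_control": 900, "rate_treated": 0.11, "rate_control": 0.10},
]
ate, att = stratified_effects(strata)
print(ate, att)
```

In this made-up data, crosses are mostly attempted in the situations where they help most, so the ATT comes out larger than the ATE.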
arxiv.org - Simon Cha
Abstract:As a dedicated follower of sports statistics and with the MLB season beginning in late March, I set out to predict how many wins each team would accumulate by the end of the 162 game season. The goal was to build a simulation framework capable of forecasting the remainder of the season, starting from a 20 game burn-in period to establish initial estimates of team strength. My approach used a Bayesian inference model incorporating team win percentage, batting average, and pitching ERA to construct a posterior distribution of win probability for each matchup. For each game, I sampled from the posterior and simulated the outcome using a Bernoulli trial. Because future matchup inputs were unobserved, I forecasted batting averages using random walks and modeled pitching ERA with Kalman filters. After simulating many seasons, the model produced a distribution of win totals for all 30 teams and can also be used to estimate each team's probability of making the postseason.
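The sample-then-simulate loop described in the abstract can be sketched with a much simpler stand-in model. The version below assumes a Beta posterior built only from a team's burn-in record and a flat prior; it ignores batting average, ERA, opponents, random walks and Kalman filters, so it illustrates the simulation structure rather than the author's model. The function name and the example record are hypothetical.

```python
import random

def simulate_win_totals(burn_in_wins, burn_in_games=20, season_games=162,
                        n_sims=2000, seed=0):
    """Simulate end-of-season win totals for one team."""
    rng = random.Random(seed)
    remaining = season_games - burn_in_games
    totals = []
    for _ in range(n_sims):
        # Draw a win probability from a Beta posterior (flat Beta(1, 1) prior).
        p = rng.betavariate(1 + burn_in_wins,
                            1 + burn_in_games - burn_in_wins)
        # Each remaining game is one Bernoulli trial with that probability.
        future_wins = sum(rng.random() < p for _ in range(remaining))
        totals.append(burn_in_wins + future_wins)
    return totals

# Hypothetical team that went 12-8 over the burn-in period.
totals = simulate_win_totals(12)
print(sum(totals) / len(totals))
```

Because each simulated season draws its own win probability, the spread of `totals` reflects both uncertainty about team strength and game-to-game randomness.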
arxiv.org - Ali Al-Bustami, Zaid Ghazal
Abstract:Accurate prediction of FIFA World Cup match outcomes holds significant value for analysts, coaches, bettors, and fans. This paper presents a machine learning framework specifically designed to forecast match winners in FIFA World Cup. By integrating both team-level historical data and player-specific performance metrics such as goals, assists, passing accuracy, and tackles, we capture nuanced interactions often overlooked by traditional aggregate models. Our methodology processes multi-year data to create year-specific team profiles that account for evolving rosters and player development. We employ classification techniques complemented by dimensionality reduction and hyperparameter optimization, to yield robust predictive models. Experimental results on data from the FIFA 2022 World Cup demonstrate our approach's superior accuracy compared to baseline method. Our findings highlight the importance of incorporating individual player attributes and team-level composition to enhance predictive performance, offering new insights into player synergy, strategic match-ups, and tournament progression scenarios. This work underscores the transformative potential of rich, player-centric data in sports analytics, setting a foundation for future exploration of advanced learning architectures such as graph neural networks to model complex team interactions.
arxiv.org - Ling You, Wenxuan Huang, Xinni Xie, Xiangyi Wei, Bangyan Li, Shaohui Lin, Yang Li, Changbo Wang
Abstract:Soccer is a globally popular sporting event, typically characterized by long matches and distinctive highlight moments. Recent advances in Multimodal Large Language Models (MLLMs) offer promising capabilities in temporal grounding and video understanding, soccer commentary generation often requires precise temporal localization and semantically rich descriptions over long-form video. However, existing soccer MLLMs often rely on the temporal a priori for caption generation, so they cannot process the soccer video end-to-end. While some traditional approaches follow a two-step paradigm that is complex and fails to capture the global context to achieve suboptimal performance. To solve the above issues, we present TimeSoccer, the first end-to-end soccer MLLM for Single-anchor Dense Video Captioning (SDVC) in full-match soccer videos. TimeSoccer jointly predicts timestamps and generates captions in a single pass, enabling global context modeling across 45-minute matches. To support long video understanding of soccer matches, we introduce MoFA-Select, a training-free, motion-aware frame compression module that adaptively selects representative frames via a coarse-to-fine strategy, and incorporates complementary training paradigms to strengthen the model's ability to handle long temporal sequences. Extensive experiments demonstrate that our TimeSoccer achieves State-of-The-Art (SoTA) performance on the SDVC task in an end-to-end form, generating high-quality commentary with accurate temporal alignment and strong semantic relevance.
arxiv.org - Mohamad Dalal, Artur Xarles, Anthony Cioppa, Silvio Giancola, Marc Van Droogenbroeck, Bernard Ghanem, Albert Clapés...
Abstract:Artificial intelligence has revolutionized the way we analyze sports videos, whether to understand the actions of games in long untrimmed videos or to anticipate the player's motion in future frames. Despite these efforts, little attention has been given to anticipating game actions before they occur. In this work, we introduce the task of action anticipation for football broadcast videos, which consists in predicting future actions in unobserved future frames, within a five- or ten-second anticipation window. To benchmark this task, we release a new dataset, namely the SoccerNet Ball Action Anticipation dataset, based on SoccerNet Ball Action Spotting. Additionally, we propose a Football Action ANticipation TRAnsformer (FAANTRA), a baseline method that adapts FUTR, a state-of-the-art action anticipation model, to predict ball-related actions. To evaluate action anticipation, we introduce new metrics, including mAP@\delta, which evaluates the temporal precision of predicted future actions, as well as mAP@\infty, which evaluates their occurrence within the anticipation window. We also conduct extensive ablation studies to examine the impact of various task settings, input configurations, and model architectures. Experimental results highlight both the feasibility and challenges of action anticipation in football videos, providing valuable insights into the design of predictive models for sports analytics. By forecasting actions before they unfold, our work will enable applications in automated broadcasting, tactical analysis, and player decision-making. Our dataset and code are publicly available at this URL.
arxiv.org - Tanmay Grandhisiri
Abstract:In the NFL draft, teams must strategically balance immediate player impact against long-term value, presenting a complex optimization challenge for draft capital management. This paper introduces a framework for evaluating the fairness and efficiency of draft pick trades using norm-based loss functions. Draft pick valuations are modelled by the Weibull distribution. Utilizing these valuation techniques, the research identifies key trade-offs between aggressive, immediate-impact strategies and conservative, risk-averse approaches. Ultimately, this framework serves as a valuable analytical tool for assessing NFL draft trade fairness and value distribution, aiding team decision-makers and enriching insights within the sports analytics community.
arxiv.org - Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés
Abstract:Action Valuation (AV) has emerged as a key topic in Sports Analytics, offering valuable insights by assigning scores to individual actions based on their contribution to desired outcomes. Despite a few surveys addressing related concepts such as Player Valuation, there is no comprehensive review dedicated to an in-depth analysis of AV across different sports. In this survey, we introduce a taxonomy with nine dimensions related to the AV task, encompassing data, methodological approaches, evaluation techniques, and practical applications. Through this analysis, we aim to identify the essential characteristics of effective AV methods, highlight existing gaps in research, and propose future directions for advancing the field.
arxiv.org - László Csató
Abstract:The organisers of major sports competitions use different policies with respect to constraints in the group draw. Our paper aims to rationalise these choices by analysing the trade-off between attractiveness (the number of games played by teams from the same geographic zone) and fairness (the departure of the draw mechanism from a uniform distribution). A parametric optimisation model is formulated and applied to the 2022 FIFA World Cup draw. A flaw of the draw procedure is identified: the pre-assignment of the host to a group implies additional but unnecessary distortions. All Pareto efficient sets of draw constraints are determined via simulations. The proposed framework can be used to find the optimal draw rules of a tournament and justify the distortion of the draw procedure for the stakeholders.
arxiv.org - Yohei Ogawa, Rikuhei Umemoto, Keisuke Fujii
Abstract:Soccer is a sport played on a pitch where effective use of space is crucial. Decision-making during transitions, when possession switches between teams, has been increasingly important, but research on space evaluation in these moments has been limited. Recent space evaluation methods such as OBSO (Off-Ball Scoring Opportunity) use scoring probability, so it is not well-suited for assessing areas far from the goal, where transitions typically occur. In this paper, we propose OBPV (Off-Ball Positioning Value) to evaluate space across the pitch, including the starting points of transitions. OBPV extends OBSO by introducing the field value model, which evaluates the entire pitch, and by employing the transition kernel model, which reflects positional specificity through kernel density estimation of pass distributions. Experiments using La Liga 2023/24 season tracking and event data show that OBPV highlights effective space utilization during counter-attacks and reveals team-specific characteristics in how the teams utilize space after positive and negative transitions.
arxiv.org - Hao Xu, Arbind Agrahari Baniya, Sam Well, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal
Abstract:Video event detection has become an essential component of sports analytics, enabling automated identification of key moments and enhancing performance analysis, viewer engagement, and broadcast efficiency. Recent advancements in deep learning, particularly Convolutional Neural Networks (CNNs) and Transformers, have significantly improved accuracy and efficiency in Temporal Action Localization (TAL), Action Spotting (AS), and Precise Event Spotting (PES). This survey provides a comprehensive overview of these three key tasks, emphasizing their differences, applications, and the evolution of methodological approaches. We thoroughly review and categorize existing datasets and evaluation metrics specifically tailored for sports contexts, highlighting the strengths and limitations of each. Furthermore, we analyze state-of-the-art techniques, including multi-modal approaches that integrate audio and visual information, methods utilizing self-supervised learning and knowledge distillation, and approaches aimed at generalizing across multiple sports. Finally, we discuss critical open challenges and outline promising research directions toward developing more generalized, efficient, and robust event detection frameworks applicable to diverse sports. This survey serves as a foundation for future research on efficient, generalizable, and multi-modal sports event detection.
openreview.net - Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle...
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks. We are releasing our main model, the dataset, an interactive demo and code.
arxiv.org - Tiancheng Jiang, Henry Wang, Md Sirajus Salekin, Parmida Atighehchian, Shinan Zhang
Abstract:Vision Language Models (VLMs) have demonstrated strong performance in multi-modal tasks by effectively aligning visual and textual representations. However, most video understanding VLM research has been domain-agnostic, leaving the understanding of their transfer learning capability to specialized domains under-explored. In this work, we address this by exploring the adaptability of open-source VLMs to specific domains, and focusing on soccer as an initial case study. Our approach uses large-scale soccer datasets and LLM to create instruction-following data, and use them to iteratively fine-tune the general-domain VLM in a curriculum learning fashion (first teaching the model key soccer concepts to then question answering tasks). The final adapted model, trained using a curated dataset of 20k video clips, exhibits significant improvement in soccer-specific tasks compared to the base model, with a 37.5% relative improvement for the visual question-answering task and an accuracy improvement from 11.8% to 63.5% for the downstream soccer action classification task.
arxiv.org - Vladimir Golovkin, Nikolay Nemtsev, Vasyl Shandyba, Oleg Udin, Nikita Kasatkin, Pavel Kononov, Anton Afanasiev, Sergey Ulasen...
Abstract:Game State Reconstruction (GSR), a critical task in Sports Video Understanding, involves precise tracking and localization of all individuals on the football field-players, goalkeepers, referees, and others - in real-world coordinates. This capability enables coaches and analysts to derive actionable insights into player movements, team formations, and game dynamics, ultimately optimizing training strategies and enhancing competitive advantage. Achieving accurate GSR using a single-camera setup is highly challenging due to frequent camera movements, occlusions, and dynamic scene content. In this work, we present a robust end-to-end pipeline for tracking players across an entire match using a single-camera setup. Our solution integrates a fine-tuned YOLOv5m for object detection, a SegFormer-based camera parameter estimator, and a DeepSORT-based tracking framework enhanced with re-identification, orientation prediction, and jersey number recognition. By ensuring both spatial accuracy and temporal consistency, our method delivers state-of-the-art game state reconstruction, securing first place in the SoccerNet Game State Reconstruction Challenge 2024 and significantly outperforming competing methods.
dynomight.net - dynomight
They say you can't truly hate someone unless you loved them first. I don't know if that's true as a general principle, but it certainly describes my relationship with NumPy. NumPy, by the way, is some software that does computations on arrays in Python. It's insanely popular and has had a huge influence on all the popular machine learning libraries like PyTorch. These libraries share most of the same issues I discuss below, but I'll stick to NumPy for concreteness.
openreview.net - Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia
Data Shapley offers a principled framework for attributing the contribution of data within machine learning contexts. However, the traditional notion of Data Shapley requires re-training models on various data subsets, which becomes computationally infeasible for large-scale models. Additionally, this retraining-based definition cannot evaluate the contribution of data for a specific model training run, which may often be of interest in practice. This paper introduces a novel concept, In-Run Data Shapley, which eliminates the need for model retraining and is specifically designed for assessing data contribution for a particular model of interest. In-Run Data Shapley calculates the Shapley value for each gradient update iteration and accumulates these values throughout the training process. We present several techniques that allow the efficient scaling of In-Run Data Shapley to the size of foundation models. In its most optimized implementation, our method adds negligible runtime overhead compared to standard model training. This dramatic efficiency improvement makes it possible to perform data attribution for the foundation model pretraining stage. We present several case studies that offer fresh insights into pretraining data's contribution and discuss their implications for copyright in generative AI and pretraining data curation.
bloomberg.com - Kit Chellel
Bill Benter did the impossible: He wrote an algorithm that couldn't lose at the track. Close to a billion dollars later, he tells his story for the first time.
pena.lt - Martin Eastwood
I've recently released version 1.1.0 of my penaltyblog Python package, bringing significant improvements to the speed and predictive performance of football (soccer) goals models. With this update, I thought it would be a great opportunity to compare the different models available (such as Poisson, Dixon and Coles, and more), exploring how they work, how to optimize their parameters, and how they perform on real-world data. Let's start off with a high-level look at the different models available, looking at how they work, what their strengths are and what their weaknesses are.
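As a reference point for the simplest model in that list, here is a minimal sketch of an independent Poisson goals model written in plain Python. It does not use the penaltyblog API; the function names and scoring rates are made up, and the rates are taken as given instead of being fitted to data.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals follow a Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def match_probs(home_rate, away_rate, max_goals=10):
    """Home win, draw and away win probabilities, assuming independent goals."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Independence: the scoreline probability is a simple product.
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away

# Hypothetical expected goals for the home and away side.
home, draw, away = match_probs(1.5, 1.1)
print(home, draw, away)
```

The Dixon and Coles model mentioned above adjusts the probabilities of low-scoring results, which this independent version leaves untouched.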