statsbomb.com
The first course in our Academy provides a solid foundation in Football Analytics. You'll learn all the basics, from understanding expected goals to analysing opposition tactics. You can apply this learning in a professional football analysis role, use it to prepare for a future role, or simply treat it as learning material to develop your knowledge of data and analytics in football. This course contains 5 core lessons, each with a tuition video lasting 30 to 50 minutes. The course is also available in Spanish at cursos.statsbomb.com.
arxiv.org - Joris Bekkers, Amod Sahasrabudhe
Abstract: A counterattack in soccer is a high speed, high intensity direct attack that can occur when a team transitions from a defensive state to an attacking state after regaining possession of the ball. The aim is to create a goal-scoring opportunity by covering a lot of ground with minimal passes before the opposing team can recover their defensive shape. The purpose of this research is to build gender-specific Graph Neural Networks to model the likelihood of a counterattack being successful and uncover what factors make them successful in professional soccer. These models are trained on a total of 20863 frames of synchronized on-ball event and spatiotemporal (broadcast) tracking data. This dataset is derived from 632 games of MLS (2022), NWSL (2022) and international soccer (2020-2022). With this data we demonstrate that gender-specific Graph Neural Networks outperform architecturally identical gender-ambiguous models in predicting the successful outcome of counterattacks. We show, using Permutation Feature Importance, that byline to byline speed, angle to the goal, angle to the ball and sideline to sideline speed are the node features with the highest impact on model performance. Additionally, we offer some illustrative examples on how to navigate the infinite solution search space to aid in identifying improvements for player decision making. This research is accompanied by an open-source repository containing all data and code, and it is also accompanied by an open-source Python package which simplifies converting spatiotemporal data into graphs. This package also facilitates testing, validation, training and prediction with this data. This should allow the reader to replicate and improve upon our research more easily.
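For readers unfamiliar with Permutation Feature Importance, the idea can be sketched in a few lines: shuffle one feature column at a time and measure how much the model's score drops. The sketch below is a generic, standard-library illustration and not the authors' implementation; the toy threshold classifier, the data and all function names are hypothetical, and the paper applies the technique to node features of a Graph Neural Network rather than to a flat table.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            # Shuffle only this column, leaving every other feature intact.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:]
                        for row, v in zip(rows, shuffled)]
            total_drop += baseline - accuracy(predict, permuted, labels)
        importances.append(total_drop / n_repeats)
    return importances

# Hypothetical toy data: feature 0 drives the label, feature 1 is noise.
rows = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
        [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predict = lambda row: 1 if row[0] > 0.5 else 0  # stand-in for a trained model

importances = permutation_importance(predict, rows, labels)
```

Because the stand-in model ignores feature 1, shuffling that column leaves accuracy unchanged and its importance is zero, while shuffling feature 0 degrades accuracy.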
arxiv.org - Koichi Fujii, Tomomi Matsui
Abstract: Constructing a suitable schedule for sports competitions is a crucial issue in sports scheduling. The round-robin tournament is a competition adopted in many professional sports. For most round-robin tournaments, it is considered undesirable that a team plays consecutive away or home matches; such an occurrence is called a break. Accordingly, it is preferable to reduce the number of breaks in a tournament. A common approach is first to construct a schedule and then determine a home-away assignment based on the given schedule to minimize the number of breaks (first-schedule-then-break). In this study, we concentrate on the problem that arises in the second stage of the first-schedule-then-break approach, namely, the break minimization problem (BMP). We prove that this problem can be reduced to an odd cycle transversal problem, a well-studied graph problem. These results lead to a new approximation algorithm for the BMP.
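To make the break minimization problem concrete, here is a small brute-force sketch that takes a fixed schedule, tries every home-away assignment and counts breaks. It is only an illustration of the problem statement on a tiny instance; it is not the authors' reduction to odd cycle transversal or their approximation algorithm, and the function names and the 4-team schedule are hypothetical.

```python
from itertools import product

def count_breaks(schedule, home_flags):
    """Count breaks for one home-away assignment.

    schedule: list of rounds, each a list of (a, b) team pairs.
    home_flags: one boolean per match, in schedule order; True means `a` is home.
    """
    last_status = {}  # team -> True if its previous match was at home
    breaks = 0
    flags = iter(home_flags)
    for matches in schedule:
        for a, b in matches:
            a_home = next(flags)
            for team, is_home in ((a, a_home), (b, not a_home)):
                # A break is two consecutive home or two consecutive away matches.
                if last_status.get(team) == is_home:
                    breaks += 1
                last_status[team] = is_home
    return breaks

def min_breaks(schedule):
    """Exhaustively search all assignments (feasible only for tiny schedules)."""
    n_matches = sum(len(matches) for matches in schedule)
    return min(count_breaks(schedule, flags)
               for flags in product([True, False], repeat=n_matches))

# Hypothetical single round-robin schedule for four teams (0..3).
schedule = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(0, 3), (1, 2)],
]
best = min_breaks(schedule)
```

The search space doubles with every match, which is why exact enumeration does not scale and approximation algorithms are of interest.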
arxiv.org - Lee Kennedy-Shaffer
Abstract: From 2020 to 2023, Major League Baseball changed rules affecting team composition, player positioning, and game time. Understanding the effects of these rules is crucial for leagues, teams, players, and other relevant parties to assess their impact and to advocate either for further changes or undoing previous ones. Panel data and quasi-experimental methods provide useful tools for causal inference in these settings. I demonstrate this potential by analyzing the effect of the 2023 shift ban at both the league-wide and player-specific levels. Using difference-in-differences analysis, I show that the policy increased batting average on balls in play and on-base percentage for left-handed batters by a modest amount (nine points). For individual players, synthetic control analyses identify several players whose offensive performance (on-base percentage, on-base plus slugging percentage, and weighted on-base average) improved substantially (over 70 points in several cases) because of the rule change, and other players with previously high shift rates for whom it had little effect. This article both estimates the impact of this specific rule change and demonstrates how these methods for causal inference are potentially valuable for sports analytics -- at the player, team, and league levels -- more broadly.
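The core arithmetic of a two-group, two-period difference-in-differences estimate can be sketched as follows. This is a minimal illustration of the general method, not the paper's model: the function name is hypothetical, the numbers are invented for the example and are not taken from the article, and a real analysis would add regression adjustment and uncertainty estimates.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Invented batting-average-style values, for illustration only.
treated_pre = [0.290, 0.300]   # group exposed to the rule change, before
treated_post = [0.305, 0.315]  # same group, after
control_pre = [0.295, 0.305]   # comparison group, before
control_post = [0.298, 0.308]  # comparison group, after

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
effect_in_points = effect * 1000  # one "point" is 0.001
```

Subtracting the control group's change removes league-wide trends that would have affected both groups regardless of the rule, under the assumption that the two groups would otherwise have moved in parallel.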
arxiv.org - Thorsten Schank, Vivien Voigt, Christian Orthey
Abstract: It is well-established that the home advantage (HA), the phenomenon that on average the local team performs better than the visiting team, exists in many sports. In response to the COVID-19 outbreak, spectators were banned from football stadiums, which we leverage as a natural experiment to examine the impact of stadium spectators on HA. Using data from the first division of the German Bundesliga for seasons 2016/17 to 2023/24, we are the first to focus on a longer time horizon and consider not only the first but all three seasons subject to spectator regulations as well as two subsequent seasons without. We confirm previous studies regarding the disappearance of the HA in the last nine matches of season 2019/20. This drop materialised almost entirely through a reduction of home goals. The HA in season 2020/21 (with spectator ban during most matches) was very close to the pre-COVID-19 season 2018/19, indicating that teams became accustomed to the absence of spectators. For season 2021/22, with varying spectator regulations, we detect a U-shaped relationship between HA and the stadium utilisation rate, where HA increases considerably for matches with medium stadium utilisation, which is associated with a larger difference in running distance between the home and away teams.
arxiv.org - Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie
Abstract: As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions in this paper: (i) we introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, with an automated annotation pipeline; (ii) we present the first visual-language foundation model in the soccer domain, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on action classification, commentary generation, and multi-view foul recognition, and demonstrate state-of-the-art performance on all of them, substantially outperforming existing models and demonstrating the superiority of our proposed data and model. We believe that this work will offer a standard paradigm for sports understanding research. The code and model will be publicly available for reproduction.
research.google - Mathias Bellaiche and Marc Wilson
We compare how well multimodal models understand time-series data when it is presented visually as plots versus as numerical values. We find significant performance improvements when the data is presented as plots, on tasks such as fall detection.