degruyterbrill.com - Scott Powers, Luke Stancil, Naomi Consiglio
The progression of a single point in volleyball starts with a serve and then alternates between teams, each team allowed up to three contacts with the ball. Using charted data from the 2022 NCAA Division I women’s volleyball season (4,147 matches, 600,000 points, more than 5 million recorded contacts), we model the progression of a point as a Markov chain with the state space defined by the sequence of contacts in the current possession. We estimate the probability of each team winning the point, which changes on each contact. We attribute changes in point probability to the player(s) responsible for each contact, facilitating measurement of performance on the same point scale for different skills. Traditional volleyball statistics do not allow apples-to-apples comparisons across skills, and they do not measure the impact of the performances on team success. For adversarial contact groups (serve/reception and set/attack/block/dig), we estimate a hierarchical linear model for the outcome, with random effects for the players involved; and we adjust performance for strength of schedule not only on the conference/team level but on the individual player level. We can use the results to answer practical questions for volleyball coaches.
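To make the idea of "point probability that changes on each contact" concrete, here is a minimal sketch of an absorbing Markov chain for a single point. The state names and every transition probability below are invented for illustration; they are not taken from the paper, whose state space is much richer.

```python
# Toy absorbing Markov chain for one volleyball point.
# ASSUMPTION: states and probabilities are made up for illustration only.
TRANSITIONS = {
    "serve": {"reception": 0.85, "serving_wins": 0.06, "receiving_wins": 0.09},
    "reception": {"attack": 0.90, "serving_wins": 0.10},
    # "attack" = receiving team attacks; "serve_side_attack" = serving team attacks.
    "attack": {"receiving_wins": 0.45, "serving_wins": 0.15, "serve_side_attack": 0.40},
    "serve_side_attack": {"serving_wins": 0.45, "receiving_wins": 0.15, "attack": 0.40},
}


def win_probabilities(transitions, win_state, lose_state, sweeps=200):
    """Probability of eventually reaching win_state from each transient state.

    Solves v = P v by repeated sweeps, with v fixed at 1 on win_state
    and 0 on lose_state.
    """
    value = {state: 0.0 for state in transitions}
    value[win_state] = 1.0
    value[lose_state] = 0.0
    for _ in range(sweeps):
        for state, successors in transitions.items():
            value[state] = sum(p * value[nxt] for nxt, p in successors.items())
    return value


def contact_credit(value, state_before, state_after):
    """Change in the serving team's win probability produced by one contact."""
    return value[state_after] - value[state_before]


if __name__ == "__main__":
    v = win_probabilities(TRANSITIONS, "serving_wins", "receiving_wins")
    for state in TRANSITIONS:
        print(f"{state:18s} P(serving team wins) = {v[state]:.3f}")
    print("credit for an ace:", round(contact_credit(v, "serve", "serving_wins"), 3))
```

Because every contact is scored as a difference in the same win probability, a serve and a dig end up on one scale, which is the property the abstract highlights.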
substack.com - Chris Gunther
Fouls have been an especially hot topic throughout the last year, dating back to the start of 2024. Coming out of that year’s All-Star game, there was a noticeable difference in the way referees officiated games.
github.com - Tanner Manett
A fully-featured, GPU-accelerated Python pipeline for estimating shot-level expected goals (xG) in ice hockey. This repository exposes the entire workflow (raw event data → engineered features → hyper-parameter-tuned model → evaluation plots) so that students and researchers can reproduce results and propose improvements with minimal setup.
degruyterbrill.com - Melis Kizildemir, Ertugrul Akin, Altug Alkan
This work develops a family of solutions related to Shin’s model in producing accurate probability forecasts. More precisely, closed-form solutions are derived based on an analytical approach to known solutions and evaluated experimentally for sports betting data sets.
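For readers unfamiliar with Shin’s model, the sketch below converts bookmaker decimal odds into probabilities using the commonly cited Shin formula, with the insider-trading parameter z found by bisection. This is the standard numerical approach, not the closed-form solutions derived in the paper; the function name and the example odds are assumptions made for illustration.

```python
import math


def shin_probabilities(decimal_odds, iterations=100):
    """Return (probabilities, z) implied by decimal odds under Shin's model.

    ASSUMPTION: the book has a positive margin (inverse odds sum to more than 1).
    """
    inverse_odds = [1.0 / o for o in decimal_odds]
    booksum = sum(inverse_odds)
    if booksum <= 1.0:
        raise ValueError("expected odds with a bookmaker margin (booksum > 1)")

    def probs(z):
        # Shin's formula for each outcome, given the insider share z.
        return [
            (math.sqrt(z * z + 4.0 * (1.0 - z) * p * p / booksum) - z)
            / (2.0 * (1.0 - z))
            for p in inverse_odds
        ]

    # The probabilities sum to more than 1 at z = 0 and to less than 1
    # near z = 1, so bisection finds the z where they sum to exactly 1.
    lo, hi = 0.0, 0.999
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    return probs(z), z


if __name__ == "__main__":
    # Hypothetical home/draw/away odds.
    p, z = shin_probabilities([2.0, 3.5, 4.0])
    print([round(x, 4) for x in p], round(z, 4))
```

Unlike simple proportional normalization, this adjustment shrinks longshot probabilities more than favorite probabilities, which is the usual motivation for using Shin’s model on betting odds.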
hudl.com - Lily Wood-blake
If I wanted to explore a player’s passing in detail, there are several models that could help guide my analysis. I could use xPass to find out the likelihood of a given pass being completed, or pass clustering to group similar passes together and identify patterns. Within a few lines of code, I could understand how likely a passer is to take risks, the quality of their passing relative to expectation, the types of passes they typically make, and common passing relationships.
springer.com - Manan Shah, Arya Shah, Kripa Patel, Ameya Kshirsagar, Shlok Sanghvi & Vrundan Sojitra
In today’s sports world, big data and artificial intelligence (AI) are turning vast amounts of information, ranging from player stats and biometrics to historical records, into practical insights that drive better performance and smarter decisions. The objective of this paper is to explore the integration of big data with advanced technologies for predicting game outcomes, performing game analysis, and preventing injuries. We look at how techniques such as knowledge discovery in databases (KDD) uncover hidden patterns within extensive datasets and how techniques like logistic regression and neural networks apply these patterns to forecast game outcomes and refine strategy. Furthermore, we observe how innovations such as Microelectromechanical Systems (MEMS) and smart jerseys enable continuous, real-time health monitoring, enabling proactive injury prevention and optimized player management. By examining the various studies performed in this area, we better understand the current role of big data in sports and its future direction. Through this review, we aim to contribute to the advancement of sports technology and provide a basis for future research in this field.
degruyterbrill.com - Samuel Luxenberg, Refik Soyer and Sudip Bose
Penalty kicks are critical to game outcomes in soccer. The typical quantitative strategy is as follows. First, model such an interaction as a two or three strategy game between the kicker and the goalkeeper. Second, find the mixed-strategy Nash equilibrium (MSNE) to determine the “optimal” probabilities of each player choosing to kick or dive to either side or to the center of the goal. While this is the usual path to a solution, it is also fraught with many assumptions due to the nature of penalty kick data as well as due to implicit assumptions within the game theory model. In this paper, we introduce an alternative set of strategies, known as adversarial risk analysis (ARA), to determine optimal decisions for the penalty kick game. ARA, which is grounded in the principles of Bayesian decision analysis, allows the decision maker to avoid many of the assumptions and limitations encountered when using the game theory approach. By examining 2018–2019 Major League Soccer (MLS) penalty kicks on an individual basis, we show that our most accurate ARA model predicts the correct goalkeeper decision 65% of the time, while the game-theoretic model predicts correctly 51% of the time, a statistically significant difference.
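The game-theoretic baseline described above can be sketched for the simplest two-strategy case (left or right only). The scoring probabilities in the example are hypothetical, not estimates from the MLS data, and the ARA approach the paper proposes is not shown here.

```python
def penalty_msne(score_prob):
    """Mixed-strategy Nash equilibrium of a 2x2 zero-sum penalty kick game.

    score_prob[i][j] is the probability of a goal when the kicker picks
    side i and the keeper dives to side j (0 = left, 1 = right).
    ASSUMPTION: the game has no pure-strategy equilibrium, so both players mix.
    Returns (kicker P(left), keeper P(left), equilibrium scoring probability).
    """
    (a_ll, a_lr), (a_rl, a_rr) = score_prob
    denom = a_ll - a_lr - a_rl + a_rr
    if denom == 0:
        raise ValueError("degenerate payoff matrix")
    # The kicker mixes so that the keeper is indifferent between diving left and right.
    kicker_left = (a_rr - a_rl) / denom
    # The keeper mixes so that the kicker is indifferent between kicking left and right.
    keeper_left = (a_rr - a_lr) / denom
    value = kicker_left * a_ll + (1.0 - kicker_left) * a_rl
    return kicker_left, keeper_left, value


if __name__ == "__main__":
    # Hypothetical scoring probabilities: rows = kick side, columns = dive side.
    example = [[0.60, 0.95],
               [0.90, 0.70]]
    kicker_left, keeper_left, value = penalty_msne(example)
    print(round(kicker_left, 3), round(keeper_left, 3), round(value, 3))
```

The equilibrium treats every kicker and keeper as facing the same payoff matrix, which is one of the implicit assumptions the abstract says ARA is meant to relax.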
degruyterbrill.com - Marie Hardt and Dan Nettleton
The Hardy distribution, derived by van der Ven (2012. The Hardy distribution for golf hole scores. Math. Gaz. 96: 428–438) and named after an idea by Hardy (1945. A mathematical theorem about golf. Math. Gaz. 29: 226–227), is a discrete probability distribution for modeling golf hole scores. According to the Hardy distribution, a golfer’s score on a hole is determined by the par of the hole, the golfer’s probability of hitting a good shot, and the golfer’s probability of hitting a bad shot. To fit a Hardy distribution to golf scores on a hole, an analyst needs only the scores and a value for the par of the hole. We present four different hierarchical modeling strategies to jointly model the scores of multiple golfers on multiple holes during one golf tournament round using Hardy distributions. We then apply our new modeling strategies to golf hole scores from two very different golfer populations: male professional golfers on the PGA Tour and female high school golfers from the state of Iowa, USA. Probabilities of good and bad shots vary across holes for both the male professional golfers on the PGA Tour and the female high school golfers. We find little variation among the male professional golfers but substantial variation among the female high school golfers.
degruyterbrill.com - Vincent Renner, Konstantin Görgen, Alexander Woll, Hagen Wäsche and Melanie Schienle
Identifying success factors in football is of sporting and economic interest. However, research in this field for national teams and their competitions is rare despite the popularity of teams and events. Therefore, we analyze data for the UEFA EURO 2020 and, for comparison purposes, the previous tournament in 2016. To mitigate the challenges of perceived multicollinearity and a small sample size, and to identify the relevant variables, we apply the “LASSO Cross-fitted Stability-Selection” algorithm. This approach involves iterative splitting of data, with variables chosen via a “least absolute shrinkage and selection operator” (LASSO) model (Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B 58: 267–288) on one half of the observations, while coefficients are estimated on the other half. Subsequently, we inspect the frequency of selection and stability of coefficient estimation for each variable over the repeated samples to identify factors as relevant. By that, we are able to differentiate generally valid success factors such as the market value ratio from on-field variables whose importance is tournament-dependent, e.g. the tackles attempted. As the latter is connected to a team’s tactics, we conclude that their observed relevance is correlated to the results of the linked playing style in the specific tournaments. We also show the changing effect of these playing-styles on success across tournaments.
degruyterbrill.com - Mauro Florez, Michele Guindani, Marina Vannucci
Count data play a crucial role in sports analytics, providing valuable insights into various aspects of the game. Models that accurately capture the characteristics of count data are essential for making reliable inferences. In this paper, we propose the use of the Conway–Maxwell–Poisson (CMP) model for analyzing count data in sports. The CMP model offers flexibility in modeling data with different levels of dispersion. Here we consider a bivariate CMP model that models the potential correlation between home and away scores by incorporating a random effect specification. We illustrate the advantages of the CMP model through simulations. We then analyze data from baseball and soccer games before, during, and after the COVID-19 pandemic. The performance of our proposed CMP model matches or outperforms standard Poisson and Negative Binomial models, providing a good fit and an accurate estimation of the observed effects in count data with any level of dispersion. The results highlight the robustness and flexibility of the CMP model in analyzing count data in sports, making it a suitable default choice for modeling a diverse range of count data types in sports, where the data dispersion may vary.
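As a rough illustration of why the CMP family handles different dispersion levels, the sketch below evaluates the univariate CMP probability mass function, P(Y = y) proportional to lambda^y / (y!)^nu, with the normalizing constant approximated by a truncated sum. It covers only the univariate distribution, not the bivariate random-effect model in the paper; the function names and the truncation length are assumptions.

```python
import math


def cmp_log_weights(lam, nu, max_terms=200):
    """Unnormalized log-probabilities log(lam^k / (k!)^nu) for k = 0..max_terms-1."""
    return [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(max_terms)]


def cmp_pmf(y, lam, nu, max_terms=200):
    """Conway-Maxwell-Poisson pmf with a truncated normalizing constant.

    ASSUMPTION: max_terms is large enough that the omitted tail is negligible,
    which holds for moderate lam and nu not too close to zero.
    """
    log_w = cmp_log_weights(lam, nu, max_terms)
    peak = max(log_w)
    # log-sum-exp keeps the normalizing constant numerically stable.
    log_z = peak + math.log(sum(math.exp(w - peak) for w in log_w))
    return math.exp(log_w[y] - log_z)


def cmp_mean_variance(lam, nu, max_terms=200):
    """Mean and variance computed directly from the truncated pmf."""
    pmf = [cmp_pmf(k, lam, nu, max_terms) for k in range(max_terms)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        mean, var = cmp_mean_variance(2.5, nu)
        print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}")
```

With nu = 1 the distribution reduces to the Poisson; nu below 1 gives overdispersion and nu above 1 gives underdispersion, so one extra parameter covers cases where a Poisson or Negative Binomial model would each fit only one side.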
degruyterbrill.com - Bernardo Nipoti, Lorenzo Schiavon
While the use of expected goals (xG) as a metric for assessing soccer performance is increasingly prevalent, the uncertainty associated with their estimates is often overlooked. This work bridges this gap by providing easy-to-implement methods for uncertainty quantification in xG estimates derived from Bayesian models. Based on a convenient posterior approximation, we devise an online prior-to-posterior update scheme, aligning with the typical in-season model training in soccer. Additionally, we present a novel framework to assess and compare the performance dynamics of two teams during a match, while accounting for evolving match scores. Our approach is well-suited for graphical representation and improves interpretability. We validate the accuracy of our methods through simulations, and provide a real-world illustration using data from the Italian Serie A league.
degruyterbrill.com - Richard De Veaux, Anna Plantinga, Elizabeth Upton
The masters movement in swimming and running has exploded, resulting in an abundance of data to study the impact of age on performance. Analyzing data from masters events in running and swimming for athletes aged 35 to 80, we model the percentage increase in event time (or decrease in performance) by age and sex via stacked models that combine polynomial models, neural networks, and natural splines. To answer fundamental questions on the nature of performance decline for competitive athletes, we bootstrap the procedure to obtain confidence intervals. Cross-sectional masters data from the past decade are used to construct models, and the model predictions are compared to the trajectory of current world records by age and to estimates of decline using longitudinal data. Furthermore, the study explores the impact of constituent year, birth cohort, and participation effects, emphasizing the challenges in distinguishing age-related decline from factors like evolving training practices and varied participation rates. Our results give evidence that men generally decline more slowly than women, performance declines more rapidly for endurance events, athletes who participate more frequently decline more slowly than others, and masters level runners decline at rates roughly equivalent to world record holders.
degruyterbrill.com - Albert Cohen, Jimmy Risk
This paper presents a new framework for player valuation in European football, by fusing principles from financial mathematics and network theory. The valuation model leverages a “passing matrix” to encapsulate player interactions on the field, utilizing centrality measures to quantify individual influence. Unlike traditional approaches, such as regressing on past performance-salary data, this model focuses on in-game performance as a player’s contributions evolve over time. Consequently, our model provides a dynamic and individualized framework for ascertaining a player’s fair market value. The methodology is empirically validated through a case study in European football, employing real-world match and financial data. This cross-disciplinary mechanism for player valuation adapts the effect of connecting pay with performance, first seen in Scully (1974. Pay and performance in major league baseball. Am. Econ. Rev. 64: 915–930), to include in-game contributions as well as expected present valuation of stochastic variables.
degruyterbrill.com - Ryan Pinheiro, Stefan Szymanski
This paper proposes novel approaches to measuring team productivity and evaluating trading efficiency in Major League Baseball (MLB) from 1995 to 2021 through an application of portfolio theory. The performance of individual players is measured using a structural approach relating player outcomes to team runs developed by Lindsey (1963. An investigation of strategies in baseball. Oper. Res. 11: 477–501). Using a portfolio theory framework, we treat MLB teams as a portfolio of players (assets), each of which can be defined by an expected contribution of runs per game and the variance of this measure. It is found that both the expected value and variance have a positive impact on team runs scored. Given our definition of teams characterized by their expected values and variances, we evaluate trading efficiency between teams given their pre-trade expected values and variances and the acquired player’s pre-trade expected value and variance. We find that trade efficiency has improved over our timeframe, consistent with the growth in data-driven decision making used in MLB front offices.
nvidia.com - Mark Harris
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA. I wrote a previous post, An Easy Introduction to CUDA, in 2013 that has been popular over the years. But CUDA programming has gotten easier, and GPUs have gotten much faster, so it’s time for an updated (and even easier) introduction.