nytimes.com - Roshane Thomas and Jacob Whitehead
West Ham United midfielder Lucas Paqueta has been charged with misconduct by the Football Association (FA) for allegedly getting booked deliberately “for the improper purpose of affecting the betting market”.
mlb.com
Several Individual Major League Records Are Now Held by Hall of Famer Josh Gibson, While Other Negro Leagues Stars Newly Appear on Leaderboards; Career Totals of Hall of Famers Like Mays, Willard Brown, Campanella, Doby, Irvin, Miñoso, Paige and Jackie Robinson Now Reflect Their Negro Leagues Feats; All Negro Leagues Stats Assembled by These Major League Ballplayers Are Now Available at MLB.com, with More Data Still Being Discovered
arxiv.org - Roberto Macrì Demartino, Leonardo Egidi, Nicola Torelli
Abstract: Over the last few years, there has been growing interest in predicting and modelling competitive sports outcomes, with particular emphasis placed on this area by the Bayesian statistics and machine learning communities. In this paper, we carry out a comparative evaluation of statistical and machine learning models to assess their predictive performance for the 2022 World Cup and the 2024 Africa Cup of Nations, evaluating alternative summaries of the involved teams' past performances. More specifically, we consider the Bayesian Bradley-Terry-Davidson model, a widely used statistical framework for ranking items based on paired comparisons that has been applied successfully in various domains, including football. The analysis was performed by including both the Bradley-Terry-Davidson-derived ranking and the widely recognized Coca-Cola FIFA ranking, commonly adopted by football fans and amateurs, in some canonical goal-based models.
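For readers unfamiliar with the Bradley-Terry-Davidson model: it extends Bradley-Terry paired comparisons so that a match can also end in a draw. Below is a minimal sketch of its outcome probabilities and log-likelihood, assuming Davidson's tie extension with team strengths π = exp(θ) and a non-negative tie parameter ν. It covers only the likelihood; the paper's Bayesian version additionally places priors on these parameters and samples the posterior, which this sketch does not attempt. Team names and parameter values are hypothetical.

```python
import math

def btd_probabilities(theta_i: float, theta_j: float, nu: float):
    """Return (P(i wins), P(draw), P(j wins)) under the Bradley-Terry-Davidson model.

    theta_i, theta_j are log-strengths; nu >= 0 controls how likely draws are.
    """
    pi_i, pi_j = math.exp(theta_i), math.exp(theta_j)
    tie_term = nu * math.sqrt(pi_i * pi_j)
    denom = pi_i + pi_j + tie_term
    return pi_i / denom, tie_term / denom, pi_j / denom

def btd_log_likelihood(matches, thetas, nu):
    """Sum of log-probabilities of observed results.

    matches: list of (team_i, team_j, outcome) with outcome in {"i", "draw", "j"}.
    thetas: dict mapping (hypothetical) team names to log-strengths.
    """
    total = 0.0
    for team_i, team_j, outcome in matches:
        p_i, p_draw, p_j = btd_probabilities(thetas[team_i], thetas[team_j], nu)
        total += math.log({"i": p_i, "draw": p_draw, "j": p_j}[outcome])
    return total
```

With equal strengths and ν = 1, each of the three outcomes gets probability 1/3; increasing ν shifts mass toward draws without changing the win/loss ratio between the two teams.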
arxiv.org - Benjamin Williams, Erin M. Schliep, Bailey Fosdick, Ryan Elmore
Abstract: Team and player evaluation in professional sport is extremely important given the financial implications of success or failure. It is especially critical to identify and retain elite shooters in the National Basketball Association (NBA), one of the premier basketball leagues worldwide, because the ultimate goal of the game is to score more points than one's opponent. To this end, we propose two novel basketball metrics: "expected points" for team-based comparisons and "expected points above average (EPAA)" as a player-evaluation tool. Both metrics leverage posterior samples from a Bayesian hierarchical modeling framework to cluster teams and players based on their shooting propensities and abilities. We illustrate the concepts for the top 100 shot takers over the last decade and offer our metric as an additional tool for evaluating players.
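The core idea of combining shooting propensity (where a player shoots from) with ability (how often those shots go in) can be illustrated with a simplified calculation. The sketch below assumes a hypothetical zone breakdown and a hypothetical "above average" definition: the player's expected points per shot minus what a league-average shooter would score with the same shot mix. It is not the paper's exact definition; it only shows how such a quantity can be evaluated on each posterior draw to yield a distribution rather than a single number.

```python
from statistics import mean

# Hypothetical shot zones and their point values.
POINT_VALUE = {"rim": 2, "midrange": 2, "corner3": 3, "above_break3": 3}

def expected_points(propensity, make_prob):
    """Expected points per shot: sum over zones of P(shot from zone) * P(make | zone) * value."""
    return sum(propensity[z] * make_prob[z] * POINT_VALUE[z] for z in propensity)

def points_above_average(propensity, player_make, league_make):
    """Hypothetical EPAA-style quantity: the player's expected points per shot minus a
    league-average shooter's expected points per shot on the same shot mix."""
    return expected_points(propensity, player_make) - expected_points(propensity, league_make)

def posterior_summary(propensity, player_make_draws, league_make):
    """Evaluate the metric on each posterior draw of the player's make probabilities
    and return (mean, min, max) of the resulting values."""
    values = [points_above_average(propensity, draw, league_make) for draw in player_make_draws]
    return mean(values), min(values), max(values)
```

Evaluating the metric per draw, rather than plugging in posterior means, keeps the uncertainty about a player's shooting ability visible in the final number.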
arxiv.org - Maria Koshkina, James H. Elder
Abstract: Jersey number recognition is an important task in sports video analysis, partly due to its importance for long-term player tracking. It can be viewed as a variant of scene text recognition. However, there is a lack of published attempts to apply scene text recognition models on jersey number data. Here we introduce a novel public jersey number recognition dataset for hockey and study how scene text recognition methods can be adapted to this problem. We address issues of occlusions and assess the degree to which training on one sport (hockey) can be generalized to another (soccer). For the latter, we also consider how jersey number recognition at the single-image level can be aggregated across frames to yield tracklet-level jersey number labels. We demonstrate high performance on image- and tracklet-level tasks, achieving 91.4% accuracy for hockey images and 87.4% for soccer tracklets. Code, models, and data are available at: https://github.com/mkoshkina/jersey-number-pipeline
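Aggregating single-image predictions into a tracklet-level label can be done in several ways. One simple option, sketched here as an assumption rather than the method used in the linked pipeline, is confidence-weighted voting: each frame votes for its predicted number with its confidence, frames where the number is not legible are skipped, and the number with the highest total wins. The `min_confidence` threshold is a hypothetical parameter.

```python
from collections import defaultdict

def aggregate_tracklet(predictions, min_confidence=0.0):
    """Combine per-frame jersey number predictions into one tracklet label.

    predictions: list of (label, confidence) pairs, one per frame; label is None
    when the number is not legible in that frame (e.g. occluded).
    Returns the label with the largest summed confidence, or None if no frame qualifies.
    """
    scores = defaultdict(float)
    for label, confidence in predictions:
        if label is None or confidence < min_confidence:
            continue  # skip illegible or low-confidence frames
        scores[label] += confidence
    if not scores:
        return None
    return max(scores, key=scores.get)

frames = [("23", 0.9), ("28", 0.95), ("23", 0.8), (None, 0.0)]
print(aggregate_tracklet(frames))  # "23": two moderate votes outweigh one strong misread
```

Summing confidences lets several consistent frames outvote a single confident misread, which matters when occlusion makes individual frames unreliable.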
arxiv.org - Calvin Yeung, Kenjiro Ide, Keisuke Fujii
Abstract: Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), which limits their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset, built from soccer broadcast videos, which to our knowledge is the most extensive sports image dataset with 2D pose annotations. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While full automation proved challenging, we provide a foundational baseline that extends its utility beyond the scope of annotated data. We validate AutoSoccerPose on the SoccerNet and 3DSP datasets and present posture analysis results based on 3DSP. The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset.
priceactionlab.com - Michael Harris
In my 14 years on financial social media, I have seen many market statistics fail verification. Most market statistics are essentially generated by backtests, which make assumptions and are prone to errors. Small sample sizes are also a problem, but that concern presupposes the statistics were calculated correctly in the first place. In many cases, the calculations are wrong. The primary cause of these errors is the hasty calculation of statistics to bolster market theories and validate specific biases, whether bullish or bearish. It is frightening that most people take these statistics at face value, especially if the presentation is fancy and the person presenting them has a large following.
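The small-sample problem is easy to quantify even when the underlying count is correct. Suppose, hypothetically, a post claims the market rose after some signal in 9 of 11 occurrences. A Wilson score interval for that hit rate, a standard confidence interval for a binomial proportion, shows how little such a record pins down. The sketch assumes independent occurrences, which overlapping market events often violate, so real uncertainty is usually wider still.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return center - half_width, center + half_width

# Hypothetical claim: "up 9 of the last 11 times" (about an 82% hit rate).
low, high = wilson_interval(9, 11)
print(f"{low:.2f} to {high:.2f}")  # roughly 0.52 to 0.95
```

An interval stretching from barely better than a coin flip to near certainty is a reminder that an impressive-looking percentage from a handful of cases carries little evidential weight.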
theaiedge.io - Damien Benveniste
I often say that at some point in my career, I became more of an XGBoost modeler than a Machine Learning modeler. On large tabular datasets, it would deliver close to optimal results without much effort. LightGBM and CatBoost are obviously as good and sometimes better, but I will always keep a special place in my heart for XGBoost.
manning.com - Benjamin Tan Wei Hao, Shanoop Padmanabhan, and Varun Mallya
Delivering a successful machine learning project is hard. Design a Machine Learning System (From Scratch) makes it easier. In it, you’ll design a reliable ML system from the ground up, incorporating MLOps and DevOps along with a stack of proven infrastructure tools including Kubeflow, MLflow, BentoML, Evidently, and Feast. In Design a Machine Learning System (From Scratch) you’ll learn how to: set up an MLOps platform; deploy machine learning models to production; build end-to-end data pipelines; and implement effective monitoring and explainability.
apple.com
The gambling industry in the US has exploded in recent years and now suffuses every aspect of sports consumption. You can bet on who will win or lose just about any game in the world from your phone. In fact, you don't even have to bet on the games themselves: you can bet on how many home runs a player will hit, or how many sets it will take to complete a given tennis match. So how does it all work? Who sets the lines? Can a user actually make money? And how do the sportsbooks make money? On this episode, we speak with Isaac Rose-Berman, a professional sports gambler and author of the How Gambling Works newsletter. He talks about the tactics he uses to make money and how the betting sites make money from their users. We discuss market structure, the societal impact of the gambling boom, and what types of regulations might best curb the more harmful aspects of the industry.
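One basic answer to "how do the sportsbooks make money?" is the built-in margin (often called the vig or overround) in the posted odds. The sketch below uses standard bookmaking arithmetic rather than anything specific from the episode: it converts American odds to implied probabilities, which sum to more than 1; computes the theoretical hold, the share of stakes the book keeps if bets are balanced so every outcome costs it the same; and removes the margin by proportional normalization, which is only one of several de-vigging methods.

```python
def implied_probability(american_odds: int) -> float:
    """Convert American odds (e.g. -110 or +150) to the implied win probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def overround(odds) -> float:
    """Sum of implied probabilities across all outcomes; above 1 means a built-in margin."""
    return sum(implied_probability(o) for o in odds)

def theoretical_hold(odds) -> float:
    """Fraction of total stakes kept by the book when action is balanced so that
    every outcome produces the same payout."""
    return 1 - 1 / overround(odds)

def no_vig_probabilities(odds):
    """Strip the margin by scaling implied probabilities to sum to 1 (proportional method)."""
    total = overround(odds)
    return [implied_probability(o) / total for o in odds]

# A typical point-spread market priced at -110 on both sides.
print(round(theoretical_hold([-110, -110]), 4))  # about 0.0455, i.e. roughly 4.5%
```

This is why a bettor picking winners at exactly 50% against -110 lines loses money over time: the fair price for each side is even money, but the posted price charges a premium on every wager.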
github.com - Richard McElreath
This course teaches data analysis, but it focuses on scientific models. The unfortunate truth about data is that nothing much can be done with it until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with high-dimensional, imperfect data of the kind that biologists and social scientists face. https://www.youtube.com/playlist?list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus
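As a small taste of what "connecting models to evidence" means in Bayesian data analysis, here is a grid-approximation posterior, a common first computational technique in introductory Bayesian courses. The sketch assumes a toy binomial model (6 successes in 9 trials) and a flat prior; it is an illustration, not material taken from the course.

```python
import math

def grid_posterior(successes: int, trials: int, grid_size: int = 101):
    """Approximate the posterior of a binomial proportion p on an evenly spaced grid.

    Uses a flat prior; returns (grid, posterior) where posterior sums to 1.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat prior over [0, 1]
    likelihood = [
        math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]
    total = sum(unnormalized)
    return grid, [u / total for u in unnormalized]

grid, posterior = grid_posterior(6, 9)
best = max(range(len(grid)), key=posterior.__getitem__)
print(grid[best])  # posterior mode on the grid, close to 6/9
```

Grid approximation scales poorly beyond a few parameters, which is why courses like this one move on to sampling-based tools for the high-dimensional problems mentioned above.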