Sports Analytics Weekly by kubeia.io - 22/2024

Your weekly serving of sports analytics insights.

🎲 Betting

Lucas Paqueta betting charge: What are West Ham legal options? Could he be banned for life?

nytimes.com - Roshane Thomas and Jacob Whitehead

West Ham United midfielder Lucas Paqueta has been charged with misconduct by the Football Association (FA) for allegedly getting booked deliberately “for the improper purpose of affecting the betting market”.

⚽ Sports

Statistics of the Negro Leagues officially enter the Major League record (Press release)

mlb.com

Several Individual Major League Records Are Now Held by Hall of Famer Josh Gibson, While Other Negro Leagues Stars Newly Appear on Leaderboards; Career Totals of Hall of Famers Like Mays, Willard Brown, Campanella, Doby, Irvin, Miñoso, Paige and Jackie Robinson Now Reflect Their Negro Leagues Feats; All Negro Leagues Stats Assembled by These Major League Ballplayers Are Now Available at MLB.com, with More Data Still Being Discovered

📝 Sports Analytics

Alternative ranking measures to predict international football results

arxiv.org - Roberto Macrì Demartino, Leonardo Egidi, Nicola Torelli

Abstract: Over the last few years, there has been a growing interest in the prediction and modelling of competitive sports outcomes, with particular emphasis placed on this area by the Bayesian statistics and machine learning communities. In this paper, we have carried out a comparative evaluation of statistical and machine learning models to assess their predictive performance for the 2022 World Cup and for the 2024 Africa Cup of Nations by evaluating alternative summaries of past performances related to the involved teams. More specifically, we consider the Bayesian Bradley-Terry-Davidson model, which is a widely used statistical framework for ranking items based on paired comparisons that have been applied successfully in various domains, including football. The analysis was performed including in some canonical goal-based models both the Bradley-Terry-Davidson derived ranking and the widely recognized Coca-Cola FIFA ranking commonly adopted by football fans and amateurs.
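
For readers unfamiliar with the Bradley-Terry-Davidson model named in the abstract, here is a minimal sketch of its outcome probabilities in the classical (non-Bayesian) Davidson form, where a draw parameter extends the Bradley-Terry win probability. The function name, argument names, and example values are illustrative assumptions, not taken from the paper, which fits a Bayesian version whose exact parameterization may differ.

```python
import math


def btd_probabilities(strength_a, strength_b, nu):
    """Return (P(A wins), P(draw), P(B wins)) under the Davidson tie model.

    strength_a, strength_b: positive team strengths (hypothetical values).
    nu: non-negative draw parameter; nu = 0 recovers plain Bradley-Terry.
    """
    if strength_a <= 0 or strength_b <= 0 or nu < 0:
        raise ValueError("strengths must be positive and nu non-negative")
    # The draw term is proportional to the geometric mean of the two strengths.
    tie_term = nu * math.sqrt(strength_a * strength_b)
    denom = strength_a + strength_b + tie_term
    return strength_a / denom, tie_term / denom, strength_b / denom


if __name__ == "__main__":
    # Made-up strengths for illustration only.
    p_win, p_draw, p_loss = btd_probabilities(2.0, 1.0, 0.5)
    print(f"win={p_win:.3f} draw={p_draw:.3f} loss={p_loss:.3f}")
```

A larger draw parameter shifts probability mass toward draws, and the three probabilities always sum to one.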

Expected Points Above Average: A Novel NBA Player Metric Based on Bayesian Hierarchical Modeling

arxiv.org - Benjamin Williams, Erin M. Schliep, Bailey Fosdick, Ryan Elmore

Abstract: Team and player evaluation in professional sport is extremely important given the financial implications of success/failure. It is especially critical to identify and retain elite shooters in the National Basketball Association (NBA), one of the premier basketball leagues worldwide because the ultimate goal of the game is to score more points than one's opponent. To this end we propose two novel basketball metrics: "expected points" for team-based comparisons and "expected points above average (EPAA)" as a player-evaluation tool. Both metrics leverage posterior samples from Bayesian hierarchical modeling framework to cluster teams and players based on their shooting propensities and abilities. We illustrate the concepts for the top 100 shot takers over the last decade and offer our metric as an additional metric for evaluating players.

👁️ Computer Vision

A General Framework for Jersey Number Recognition in Sports Video

arxiv.org - Maria Koshkina, James H. Elder

Abstract: Jersey number recognition is an important task in sports video analysis, partly due to its importance for long-term player tracking. It can be viewed as a variant of scene text recognition. However, there is a lack of published attempts to apply scene text recognition models on jersey number data. Here we introduce a novel public jersey number recognition dataset for hockey and study how scene text recognition methods can be adapted to this problem. We address issues of occlusions and assess the degree to which training on one sport (hockey) can be generalized to another (soccer). For the latter, we also consider how jersey number recognition at the single-image level can be aggregated across frames to yield tracklet-level jersey number labels. We demonstrate high performance on image- and tracklet-level tasks, achieving 91.4% accuracy for hockey images and 87.4% for soccer tracklets. Code, models, and data are available at: https://github.com/mkoshkina/jersey-number-pipeline
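
To make the idea of aggregating single-image predictions into a tracklet-level label concrete, here is a minimal sketch using a confidence-weighted vote. This is one simple aggregation scheme chosen for illustration; the abstract does not say which method the authors use, and the function name, input format, and threshold below are all assumptions.

```python
from collections import defaultdict


def aggregate_tracklet(frame_predictions, min_confidence=0.5):
    """Pick one jersey number for a tracklet from per-frame predictions.

    frame_predictions: iterable of (label, confidence) pairs, where label is
    a string such as "17", or None when no number was legible in that frame.
    min_confidence: hypothetical cutoff below which a frame is ignored.
    """
    votes = defaultdict(float)
    for label, confidence in frame_predictions:
        # Skip occluded or low-confidence frames instead of letting them vote.
        if label is None or confidence < min_confidence:
            continue
        votes[label] += confidence
    if not votes:
        return None  # No usable frame in this tracklet.
    # The label with the largest summed confidence wins.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Made-up predictions for a four-frame tracklet.
    frames = [("17", 0.9), ("11", 0.6), ("17", 0.8), (None, 0.0)]
    print(aggregate_tracklet(frames))
```

Weighting by confidence lets a few clear frames outvote many blurry ones, which matters when the number is visible only briefly.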

AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements

arxiv.org - Calvin Yeung, Kenjiro Ide, Keisuke Fujii

Abstract: Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably in the form of posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), limiting their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset in soccer broadcast videos, which represents the most extensive sports image dataset with 2D pose annotations to our knowledge. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While achieving full automation proved challenging, we provide a foundational baseline, extending its utility beyond the scope of annotated data. We validate AutoSoccerPose on SoccerNet and 3DSP datasets, and present posture analysis results based on 3DSP. The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset.

💰 Quantitative Finance

Market Statistics Should Be Scrutinized Carefully

priceactionlab.com - Michael Harris

In my 14 years of financial social media presence, I have seen many market statistics fail verification. Most market statistics are essentially generated by backtests, which make assumptions and are prone to errors. Small sample sizes also pose a problem, but this relies on the assumption of correct calculation of the statistics initially. In many cases, the calculations are wrong. The primary cause of these errors is the hasty calculation of statistics to bolster market theories and validate specific biases, whether bullish or bearish. It is frightening that most people take these statistics at face value, especially if the presentation is fancy and the person presenting them has a large following.

🤖 Machine Learning

Design a Machine Learning System (From Scratch)

manning.com - Benjamin Tan Wei Hao, Shanoop Padmanabhan, and Varun Mallya

Delivering a successful machine learning project is hard. Design a Machine Learning System (From Scratch) makes it easier. In it, you’ll design a reliable ML system from the ground up, incorporating MLOps and DevOps along with a stack of proven infrastructure tools including Kubeflow, MLFlow, BentoML, Evidently, and Feast. In Design a Machine Learning System (From Scratch) you’ll learn how to set up an MLOps platform, deploy machine learning models to production, build end-to-end data pipelines, and apply effective monitoring and explainability.

Understanding XGBoost from A to Z!

theaiedge.io - Damien Benveniste

I often say that at some point in my career, I became more of an XGBoost modeler than a Machine Learning modeler. On large tabular datasets, it would provide close to optimum results without much effort. LightGBM and Catboost are obviously as good and sometimes better, but I will always keep a special place in my heart for XGBoost.

🎙️ Podcast

Odd Lots: How a Professional Sports Bettor Really Makes Money (Apple Podcasts)

apple.com

The gambling industry in the US has exploded in recent years, and suffused every aspect of sports consumption. You can bet on who will win or lose just about any game in the world from your phone. In fact, you don't even have to just bet on games. You can bet on how many home runs a player will hit, or how many sets it will take to complete a given tennis match. So how does it all work? Who is setting the lines? Can a user actually make money? And how do the sportsbooks make money? On this episode, we speak with Isaac Rose-Berman, a professional sports gambler and author of the How Gambling Works newsletter. He talks about the tactics he uses to make money, and also how the betting sites make money from their users. We discuss market structure, the societal impact of the gambling boom, and what types of regulations might best curb the more harmful aspects of the industry.

🧮 Statistics

rmcelreath/stat_rethinking_2024

github.com - Richard McElreath

This course teaches data analysis, but it focuses on scientific models. The unfortunate truth about data is that nothing much can be done with it, until we say what caused it. We will prioritize conceptual, causal models and precise questions about those models. We will use Bayesian data analysis to connect scientific models to evidence. And we will learn powerful computational tools for coping with high-dimension, imperfect data of the kind that biologists and social scientists face. https://www.youtube.com/playlist?list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus

This newsletter is brought to you by κυβεῖα. Kubeia is an innovative startup revolutionizing sports predictions with its user-friendly, no-code machine learning platform.