Sports Analytics Weekly by kubeia.io - 27/2023

Your weekly serving of sports analytics insights.

📝 Sports Analytics

Extending the Dixon and Coles model: an application to women’s football data

arxiv.org - Rouven Michels, Marius Otting, Dimitris Karlis

The prevalent model by Dixon and Coles (1997) extends the double Poisson model, in which two independent Poisson distributions model the number of goals scored by each team, by moving probabilities between the scores 0-0, 0-1, 1-0, and 1-1. We show that this is a special case of a multiplicative model known as the Sarmanov family. Based on this family, we create more suitable models by moving probabilities between scores and employing other discrete distributions. We apply the new models to women’s football scores, which exhibit some characteristics that differ from those of men’s football.
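
To make the "moving probabilities" idea concrete, here is a minimal sketch of the classic Dixon and Coles (1997) low-score correction, not the Sarmanov-based extensions proposed in the paper. The function names and the example values of `lam`, `mu`, and `rho` are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    """Probability of k goals under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles correction factor; only the four low scores are adjusted."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0

def dc_score_prob(x, y, lam, mu, rho):
    """Probability of the score x-y (home-away) under the corrected model."""
    return dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)

# Hypothetical parameters: home mean 1.4, away mean 1.1, rho = -0.1.
lam, mu, rho = 1.4, 1.1, -0.1
p_00_independent = poisson_pmf(0, lam) * poisson_pmf(0, mu)
p_00_corrected = dc_score_prob(0, 0, lam, mu, rho)
```

With a negative `rho`, the draws 0-0 and 1-1 gain probability at the expense of 0-1 and 1-0, while the marginal goal distributions of both teams stay Poisson.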

Club coefficients in the UEFA Champions League: Time for the shift to an Elo-based formula

arxiv.org - László Csató

One of the most popular club football tournaments, the UEFA Champions League, will see a fundamental reform from the 2024/25 season: the traditional group stage will be replaced by one league where each of the 36 teams plays eight matches. To guarantee that the opponents of the clubs are of the same strength in the new design, it is crucial to forecast the performance of the teams before the tournament as well as possible.
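
For readers unfamiliar with Elo ratings, the following is a minimal sketch of the generic Elo update. It is not necessarily the formula used in the paper: football variants often add home advantage and goal-difference terms, and the `k=20.0` step size and the 400-point scale are assumptions carried over from the standard chess formulation.

```python
def elo_expected(r_a, r_b):
    """Expected score of team A against team B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=20.0):
    """Return the new ratings of A and B after a match with result score_a for A."""
    delta = k * (score_a - elo_expected(r_a, r_b))
    # The update is zero-sum: what A gains, B loses.
    return r_a + delta, r_b - delta

# Two equally rated teams; A wins.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
```

Because the ratings adjust after every match, they react to recent form faster than a coefficient built from several seasons of aggregated results.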

👁️ Computer Vision

State of Computer Vision 2023

sebastianraschka.com - Sebastian Raschka

Large language model (LLM) development is still happening at a rapid pace. At the same time, leaving AI regulation debates aside, LLM news seems to be arriving at a just slightly slower rate than usual.

This is a good opportunity to give the spotlight to computer vision once in a while, discussing the current state of research and development in this field. And this theme also goes nicely with a recap of CVPR 2023 in Vancouver, which was a wonderful conference at probably the nicest conference venue I have attended so far.

💰 Quantitative Finance

Copula for Statistical Arbitrage: Intro to Vine Copula

hudsonthames.org - Hansen Pei

Copula is a great statistical tool to study the relation among multiple random variables: by focusing on the joint cumulative distribution of quantiles of marginals, we can bypass the idiosyncratic features of marginal distributions and directly look at how they are “related”.
Indeed, traders and analysts have been using copulas to exploit statistical arbitrage under the pairs trading framework for some time.
Copula itself is not limited to just 2 dimensions. You can expand it to arbitrarily many dimensions. The disadvantage is a practical one: when probabilists originally created copulas, they focused, for the sake of theoretical analysis, on a very small subset of copulas with strict structural assumptions about their mathematical form. In 2 dimensions, you can in most cases still model a pair of random variables reasonably well with the existing bivariate copulas. In higher dimensions, however, the copula model becomes quite rigid and tends to lose a lot of useful detail.
Vine copula was invented exactly to address this high-dimensional probabilistic modeling problem. Instead of using an N-dimensional copula directly, it decomposes the probability density into conditional probabilities, and further decomposes conditional probabilities into bivariate copulas.
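
As an illustration of that decomposition, the standard three-variable case can be written as follows. This is a sketch of one possible vine structure (conditioning on the second variable), with the usual simplifying assumption that the conditional pair copula does not depend on the value of the conditioning variable. The symbols $f_i$, $F_i$, and $c_{ij}$ for marginal densities, marginal distribution functions, and bivariate copula densities are notation chosen here, not taken from the article.

```latex
\begin{align*}
f(x_1, x_2, x_3)
  &= f_1(x_1)\, f_2(x_2)\, f_3(x_3) \\
  &\quad \times c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)\,
            c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr) \\
  &\quad \times c_{13|2}\bigl(F_{1|2}(x_1 \mid x_2),\, F_{3|2}(x_3 \mid x_2)\bigr)
\end{align*}
```

Every factor beyond the marginals is a bivariate copula density, so each pair can be given its own copula family.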

🤖 Machine Learning

The Safari of Deep Signal Processing: Hyena and Beyond · Hazy Research

stanford.edu - Michael Poli, Stefano Massaroli, Simran Arora, Dan Fu, Stefano Ermon, Chris Ré

The quest for architectures supporting extremely long sequences continues! There have been some exciting developments on long sequence models and alternatives to Transformers.

Selected ML Papers from ICML 2023

gitlab.io - Gautier Marti

This blog post serves as a summary and exploration of ~100 papers, providing insights into the key trends presented at ICML 2023. The papers can be categorized into several sub-fields, including Graph Neural Networks and Transformers, Large Language Models, Optimal Transport, Time Series Analysis, Causality, Clustering, PCA and Autoencoders, as well as a few miscellaneous topics.

Learning Deep Time-index Models for Time Series Forecasting

arxiv.org - Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, Steven Hoi

Deep learning has been actively applied to time series forecasting, leading to a deluge of new methods, belonging to the class of historical-value models. Yet, despite the attractive properties of time-index models, such as being able to model the continuous nature of underlying time series dynamics, little attention has been given to them. Indeed, while naive deep time-index models are far more expressive than the manually predefined function representations of classical time-index models, they are inadequate for forecasting, being unable to generalize to unseen time steps due to the lack of inductive bias.
In this paper, we propose DeepTime, a meta-optimization framework to learn deep time-index models which overcome these limitations, yielding an efficient and accurate forecasting model. Extensive experiments on real-world datasets in the long sequence time-series forecasting setting demonstrate that our approach achieves competitive results with state-of-the-art methods, and is highly efficient. Code is available at https://github.com/salesforce/DeepTime.

🕰️ Blast From the Past

A Course in Machine Learning, Practical Issues

ciml.info - Hal Daumé III

However, before attempting to understand more complex models of learning, it is important to have a firm grasp on how to use machine learning in practice. This chapter is all about how to go from an abstract learning problem to a concrete implementation. You will see some examples of “best practices” along with justifications of these practices.
In many ways, going from an abstract problem to a concrete learning task is more of an art than a science. However, this art can have a huge impact on the practical performance of learning systems. In many cases, moving to a more complicated learning algorithm will gain you a few percent improvement. Going to a better representation will gain you an order of magnitude improvement. To this end, we will discuss several high level ideas to help you develop a better artistic sensibility.

Expected Value vs Expected Growth (Kelly criterion Part I)

sportsbookreview.com

A question I'm often asked is how exactly expected value differs from expected growth. The difference is somewhat subtle, but understanding it is essential to risk management in general and the Kelly criterion in particular.
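
The distinction can be sketched numerically. The example below assumes a simple repeated bet in which a fraction `f` of the bankroll is staked at net odds `b` with win probability `p`; the function names and the example numbers are illustrative and not taken from the article.

```python
import math

def expected_value(f, p, b):
    """Expected bankroll multiplier after one bet of fraction f at net odds b."""
    return p * (1 + f * b) + (1 - p) * (1 - f)

def expected_log_growth(f, p, b):
    """Expected logarithm of the bankroll multiplier (long-run growth rate per bet)."""
    if f >= 1:
        return float("-inf")  # staking everything risks ruin on a single loss
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)

def kelly_fraction(p, b):
    """Fraction that maximizes expected log growth; zero when there is no edge."""
    return max(0.0, (p * b - (1 - p)) / b)

# Hypothetical even-money bet with a 55% win probability.
p, b = 0.55, 1.0
f_star = kelly_fraction(p, b)  # about 0.10
```

Expected value keeps rising as the stake grows, so it is maximized by betting the whole bankroll. Expected log growth peaks at the Kelly fraction and turns negative for stakes that are too large, which is why the two criteria recommend very different bet sizes.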

What’s a growth rate, really?

ergodicityeconomics.com - Ole Peters

Growth rates are at the heart of ergodicity economics, and economic news is full of them, too: “GDP grew by 3% last year,” something like that. Sometimes we also hear “national debt grew by $1,271,000,000,000 over the last year” (which is dimensionally different from 3% per year). So since growth rates come in very different forms: what are they, really?

Predicting Football Results With Statistical Modelling: Dixon-Coles and Time-Weighting

github.io - David Sheehan

This post describes two popular improvements to the standard Poisson model for football predictions, collectively known as the Dixon-Coles model.
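
The time-weighting part can be sketched as an exponentially decaying weight on each match's contribution to the log-likelihood. This is a minimal illustration, assuming time is measured in days and using a hypothetical decay rate `xi`; the per-match log-likelihood values below are made up, and in practice they would come from the fitted score model.

```python
import math

def time_weight(days_ago, xi):
    """Weight of a match played days_ago days before the fitting date."""
    return math.exp(-xi * days_ago)

def weighted_log_likelihood(match_log_likes, days_ago, xi):
    """Sum of per-match log-likelihoods, each down-weighted by its age."""
    return sum(time_weight(t, xi) * ll for ll, t in zip(match_log_likes, days_ago))

# Hypothetical per-match log-likelihoods and match ages in days.
log_likes = [-2.1, -3.0, -2.6]
ages = [0, 180, 720]
total = weighted_log_likelihood(log_likes, ages, xi=0.002)
```

Setting `xi` to zero recovers the unweighted model. Larger values make the fit track recent form more closely, but leave fewer effective matches to estimate from.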

This newsletter is brought to you by κυβεῖα. Kubeia is an innovative startup revolutionizing sports predictions with its user-friendly, no-code machine learning platform.
