Sports Analytics Weekly by kubeia.io - 48/2022

Your weekly serving of sports analytics insights.

🎲 Betting

FPL Models: The Birth of Analytics FC

gumroad.com

Fantasy Premier League (FPL) managers often use data to pick their teams, which has led to models built to predict how many FPL points each player will score. This book discusses the different approaches to building these models and how FPL managers should use data such as expected goals (xG).
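To make the idea concrete, here is a minimal sketch of a naive expected-points calculation that turns a player's per-match xG and expected assists (xA) into FPL points. It assumes the 2022/23 FPL scoring rules for appearances, goals, assists, and clean sheets, and deliberately ignores bonus points, cards, saves, and goals conceded; the function name and inputs are hypothetical and not taken from the book.

```python
# Assumed 2022/23 FPL scoring (subset only): points per goal and per clean sheet by position.
GOAL_POINTS = {"GK": 6, "DEF": 6, "MID": 5, "FWD": 4}
CLEAN_SHEET_POINTS = {"GK": 4, "DEF": 4, "MID": 1, "FWD": 0}
ASSIST_POINTS = 3


def expected_fpl_points(position, xg, xa, expected_minutes, p_clean_sheet):
    """Naive expected FPL points for one match (hypothetical helper).

    Treats xG and xA as expected goals and assists, and ignores bonus,
    cards, saves and goals conceded.
    """
    if position not in GOAL_POINTS:
        raise ValueError(f"unknown position: {position}")
    plays_60 = expected_minutes >= 60
    # Appearance: 2 points for 60+ minutes, 1 point for a shorter appearance.
    appearance = 2 if plays_60 else (1 if expected_minutes > 0 else 0)
    # Clean-sheet points only count for players who play at least 60 minutes.
    clean_sheet = p_clean_sheet * CLEAN_SHEET_POINTS[position] if plays_60 else 0.0
    return appearance + xg * GOAL_POINTS[position] + xa * ASSIST_POINTS + clean_sheet


print(expected_fpl_points("MID", xg=0.3, xa=0.2, expected_minutes=90, p_clean_sheet=0.3))
```

For the midfielder in the example this gives about 4.4 points (2 + 1.5 + 0.6 + 0.3). Treating minutes as a fixed number is a simplification; a real model would also predict the probability of starting.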

⚽ Sports

The Rams' Super Bowl Afterparty Turned Into A Historic Hangover By Alex Kirshner

fivethirtyeight.com

No defending Super Bowl champion has ever produced a record as bad as what the Los Angeles Rams are tracking toward this year. The Rams are 3-8, a .273 winning percentage that puts them on pace to finish comfortably last among champions attempting a sequel.
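The .273 figure is simply wins divided by games played. A quick sketch of the arithmetic, assuming a 17-game regular season and counting ties as half a win (the NFL convention), also shows what "on pace" means here: about 4.6 wins over a full season. The helper names are illustrative.

```python
def win_pct(wins, losses, ties=0):
    """NFL-style winning percentage: ties count as half a win."""
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games


def projected_wins(wins, losses, ties=0, season_games=17):
    """Wins at the current pace over a full regular season (17 games since 2021)."""
    return win_pct(wins, losses, ties) * season_games


print(round(win_pct(3, 8), 3))         # 0.273
print(round(projected_wins(3, 8), 1))  # 4.6
```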

💰 Quantitative Finance

Forecasting and Point Estimates

meetup.com

In this session we will examine two papers from Nassim Nicholas Taleb on the topic of forecasting outcomes of fat-tailed variables:
- On single point forecasts for fat-tailed variables
- On the Statistical Differences between Binary Forecasts and Real World Payoffs
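One reason single point forecasts struggle with fat-tailed variables is that sample statistics converge very slowly. The minimal simulation below is not taken from either paper: under a Gaussian, collecting 100 times more data narrows the spread of the sample mean about 10-fold (the 1/√n rate), while under a Pareto distribution with tail index 1.2 (infinite variance) it narrows only by roughly 100^(1/6) ≈ 2.2 asymptotically. The tail index, sample sizes, and helper names are assumptions chosen for illustration.

```python
import numpy as np


def mean_spread(sampler, n_obs, n_trials=300, seed=0):
    """Width of the 5%-95% range of the sample mean across repeated samples."""
    rng = np.random.default_rng(seed)
    means = sampler(rng, (n_trials, n_obs)).mean(axis=1)
    q05, q95 = np.quantile(means, [0.05, 0.95])
    return q95 - q05


def gaussian(rng, size):
    return rng.normal(loc=6.0, scale=1.0, size=size)


def pareto(rng, size, alpha=1.2):
    # Classical Pareto with minimum 1 and tail index alpha, so mean = alpha / (alpha - 1) = 6.
    # NumPy's pareto() draws the Lomax (Pareto II) form, hence the shift by 1.
    return rng.pareto(alpha, size=size) + 1.0


for name, sampler in [("gaussian", gaussian), ("pareto(1.2)", pareto)]:
    shrink = mean_spread(sampler, 100) / mean_spread(sampler, 10_000)
    print(f"{name}: 100x more data narrows the spread of the mean by {shrink:.1f}x")
```

Both distributions have the same mean, yet a point estimate of the Pareto mean remains far less reliable even with 10,000 observations.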

Volatility is (mostly) path-dependent

ssrn.com

We learn from data that volatility is mostly path-dependent: up to 90% of the variance of the implied volatility of equity indexes is explained endogenously by past index returns, and up to 65% for (noisy estimates of) future daily realized volatility. The path-dependency that we uncover is remarkably simple: a linear combination of a weighted sum of past daily returns and the square root of a weighted sum of past daily squared returns with different time-shifted power-law weights capturing both short and long memory. This simple model, which is homogeneous in volatility, is shown to consistently outperform existing models across equity indexes and train/test sets for both implied and realized volatility. It suggests a simple continuous-time path-dependent volatility (PDV) model that may be fed historical or risk-neutral parameters. The weights can be approximated by superpositions of exponential kernels to produce Markovian models. In particular, we propose a 4-factor Markovian PDV model which captures all the important stylized facts of volatility, produces very realistic price and volatility paths, and jointly fits SPX and VIX smiles remarkably well. We thus show, for the first time, that a continuous-time Markovian parametric stochastic volatility (actually, PDV) model can practically solve the joint SPX/VIX smile calibration problem. This article is dedicated to the memory of Peter Carr, whose works on volatility modeling have been so inspiring to us.
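The core model described in the abstract fits in a few lines: volatility is β0 + β1·R1 + β2·√R2, where R1 is a weighted sum of past daily returns and R2 a weighted sum of past squared daily returns, each with its own time-shifted power-law kernel, here assumed to have the form (τ + δ)^(−α). This is a minimal sketch, not the authors' code. Normalizing the kernel over a finite window, the 252-day year, and the parameter values in the example are all assumptions, and nothing is fitted.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: one observation per trading day


def tspl_weights(n_days, alpha, delta):
    """Time-shifted power-law kernel weights, proportional to (tau + delta)^(-alpha).

    Index 0 is the oldest return and index n_days - 1 the most recent (lag 0).
    Weights are scaled so that sum(w) * dt == 1 over the finite window
    (an assumption), which makes R2 below an annualized variance.
    """
    dt = 1.0 / TRADING_DAYS
    lags = dt * np.arange(n_days - 1, -1, -1)  # time in years since each return
    k = (lags + delta) ** (-alpha)
    return k / (k.sum() * dt)


def pdv_vol(returns, beta0, beta1, beta2, alpha1, delta1, alpha2, delta2):
    """Annualized volatility sigma = beta0 + beta1 * R1 + beta2 * sqrt(R2)."""
    r = np.asarray(returns, dtype=float)  # past daily returns, oldest first
    r1 = tspl_weights(len(r), alpha1, delta1) @ r       # trend: weighted sum of returns
    r2 = tspl_weights(len(r), alpha2, delta2) @ r ** 2  # activity: weighted sum of squares
    return beta0 + beta1 * r1 + beta2 * np.sqrt(r2)


rng = np.random.default_rng(1)
daily_returns = 0.01 * rng.standard_normal(1000)
# Illustrative placeholder parameters, NOT the values fitted in the paper.
print(pdv_vol(daily_returns, beta0=0.05, beta1=-0.1, beta2=0.8,
              alpha1=1.1, delta1=0.02, alpha2=1.6, delta2=0.05))
```

A negative β1 encodes the leverage effect: a recent loss raises predicted volatility more than a gain of the same size. The Markovian variants mentioned in the abstract replace each power-law kernel with a sum of exponentials so that R1 and R2 can be updated recursively instead of re-summing the whole history.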

🤖 Machine Learning

Statistical vs Deep Learning forecasting methods

github.com

Deep-learning ensembles outperform statistical ensembles by just 0.36 points in SMAPE. However, the DL ensemble takes more than 14 days to run and costs around USD 11,000, while the statistical ensemble takes 6 minutes to run and costs $0.5c.
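For reference, SMAPE (symmetric mean absolute percentage error) is commonly defined, for example in the M4 competition, as 200/h · Σ |yₜ − ŷₜ| / (|yₜ| + |ŷₜ|), so the 0.36-point gap is a difference of 0.36 percentage points. A minimal sketch assuming that definition; the experiment's own implementation may differ, for example in how it handles zero denominators.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: 200/h * sum(|y - f| / (|y| + |f|)).

    Terms where both actual and forecast are zero count as zero error
    (a convention; implementations differ).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = 0.0
    for y, f in zip(actual, forecast):
        denom = abs(y) + abs(f)
        if denom > 0:
            total += abs(y - f) / denom
    return 200.0 * total / len(actual)


print(smape([100, 200], [110, 190]))
```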

Data Drift, Concept Drift, and Other Maintenance Issues

deeplearning.ai

Some engineers think that when you deploy an AI system, you’re done. But when you first deploy, you may only be halfway to the goal. Substantial work lies ahead in monitoring and maintaining the system.
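One common way to start monitoring for data drift, which the article does not prescribe, is to compare each feature's production distribution against the training data. The sketch below computes the Population Stability Index (PSI) over quantile bins of the reference sample. The bin count, the epsilon floor, and the often-quoted rule of thumb that a PSI above roughly 0.25 signals a major shift are conventions, not fixed standards.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference (e.g. training) sample and a current (production) sample."""
    ref = np.asarray(reference, dtype=float)
    cur = np.asarray(current, dtype=float)
    # Interior bin edges at reference quantiles, so each bin holds ~equal reference mass.
    inner_edges = np.quantile(ref, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_frac = np.bincount(np.digitize(ref, inner_edges), minlength=n_bins) / len(ref)
    cur_frac = np.bincount(np.digitize(cur, inner_edges), minlength=n_bins) / len(cur)
    # Floor empty bins so the log ratio stays finite.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)
print(population_stability_index(train, rng.normal(0.0, 1.0, 10_000)))  # small: no drift
print(population_stability_index(train, rng.normal(1.0, 1.0, 10_000)))  # large: mean shifted
```

PSI only detects changes in input distributions (data drift); concept drift, where the relationship between inputs and labels changes, generally requires tracking model performance on fresh labeled data.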

An Introduction to SMOTE

kdnuggets.com

Improve model performance by balancing an imbalanced dataset with the Synthetic Minority Oversampling Technique (SMOTE).
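SMOTE creates synthetic minority-class samples by picking a minority point, choosing one of its k nearest minority-class neighbours, and interpolating at a random position on the line segment between them. A minimal NumPy sketch of that core step; it uses a brute-force neighbour search, so it only suits small datasets, and the function name and defaults are illustrative.

```python
import numpy as np


def smote(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic samples from 2-D minority-class data (SMOTE-style)."""
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Brute-force pairwise squared distances within the minority class.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbours
    base = rng.integers(0, n, size=n_new)       # random minority seed points
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X[base] + gap * (X[chosen] - X[base])


rng = np.random.default_rng(0)
minority = rng.normal(size=(20, 3))
synthetic = smote(minority, n_new=50, k=5, seed=1)
print(synthetic.shape)  # (50, 3)
```

In practice the imbalanced-learn library provides `imblearn.over_sampling.SMOTE` with a `fit_resample(X, y)` method. Oversampling should be applied to the training split only, so that synthetic points derived from test data do not leak into training.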

Review of ML and AutoML Solutions to Forecast Time-Series Data

springer.com

Time-series forecasting is a significant discipline of data modeling in which past observations of a variable are analyzed to predict its future values. It matters in many use cases, including economic, weather, stock-price, and business-development forecasting. This work reviews methods for analyzing time series, ranging from traditional linear modeling techniques to automated machine learning (AutoML) frameworks, including deep learning models. The objective of this review is to help characterize the time-series forecasting challenge and the different techniques that address it. The work can also serve as a reference for researchers and practitioners who want to use AutoML to solve forecasting problems, and it identifies gaps in previous work and in the techniques used to forecast time series.
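As a reference point for the traditional linear techniques the review starts from, here is a minimal sketch of an autoregressive AR(p) model fitted by ordinary least squares and used for recursive multi-step forecasts. It is a baseline illustration, not code from the review; real work would typically use a library such as statsmodels (for example its `AutoReg` class) and check stationarity and model order first.

```python
import numpy as np


def fit_ar(y, p):
    """Fit y[t] = c + phi_1*y[t-1] + ... + phi_p*y[t-p] by least squares; returns [c, phi_1..phi_p]."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design matrix: row j holds [1, y[p+j-1], ..., y[j]] for target y[p+j].
    X = np.column_stack([np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef


def forecast_ar(y, coef, steps):
    """Recursive forecasts: each prediction is fed back in as the next lag."""
    history = list(np.asarray(y, dtype=float))
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        lags = history[-1:-p - 1:-1]  # y[t-1], ..., y[t-p]
        yhat = coef[0] + np.dot(coef[1:], lags)
        out.append(yhat)
        history.append(yhat)
    return np.array(out)


rng = np.random.default_rng(0)
y = [0.0]
for _ in range(1999):  # simulate y[t] = 0.5 + 0.8*y[t-1] + noise
    y.append(0.5 + 0.8 * y[-1] + rng.normal(scale=0.1))
coef = fit_ar(y, p=1)  # should be close to [0.5, 0.8]
print(coef, forecast_ar(y, coef, steps=3))
```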

🚀 Engineering

Engineers' billing nightmares

getlago.com

TL;DR: Billing is just 100x harder than you think

Scaling PostgresML to 1 Million Requests per Second

postgresml.org

In this post, we'll discuss how we horizontally scale PostgresML to achieve more than 1 million XGBoost predictions per second on commodity hardware.

Low Latency Optimization: Understanding Huge Pages

hudsonrivertrading.com

This series of posts is relatively technical and requires some high-level understanding of operating system (OS) concepts like memory management, as well as hardware details such as CPU caches. In the first post, we will explain the benefits of huge pages. In the second post, we will explain how they can be used in a production environment.
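The basic benefit is easy to quantify: each TLB entry maps one page, so larger pages let the same number of entries cover far more memory and require far fewer page mappings. A back-of-the-envelope sketch, assuming x86-64's 4 KiB base pages and 2 MiB huge pages and a hypothetical 64-entry TLB (real TLB sizes vary by CPU and cache level):

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE_PAGE = 4 * KIB  # standard x86-64 page size
HUGE_PAGE = 2 * MIB  # x86-64 huge page size (1 GiB pages also exist)
TLB_ENTRIES = 64     # hypothetical TLB size, for illustration only


def pages_needed(region_bytes, page_bytes):
    """Number of pages (and thus page-table mappings) needed to cover a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


def tlb_reach(entries, page_bytes):
    """Bytes of memory that can be translated without a TLB miss."""
    return entries * page_bytes


for label, page in [("4 KiB pages", BASE_PAGE), ("2 MiB pages", HUGE_PAGE)]:
    print(f"{label}: {pages_needed(1 * GIB, page):>7} pages for 1 GiB, "
          f"TLB reach {tlb_reach(TLB_ENTRIES, page) // KIB} KiB")
```

Under these assumptions, mapping 1 GiB takes 262,144 base pages but only 512 huge pages, and the TLB's reach grows from 256 KiB to 128 MiB, which is why workloads with large, randomly accessed working sets tend to see fewer TLB misses with huge pages.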

This newsletter is brought to you by κυβεῖα. Kubeia is an innovative startup revolutionizing sports predictions with its user-friendly, no-code machine learning platform.
