openreview.net - Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel Müller, Frank Hutter
Foundation models for tabular data, like TabPFN, achieve strong performance on small datasets when pre-trained solely on synthetic data. We show that this performance can be significantly boosted by a targeted continued pre-training phase. Specifically, we demonstrate that leveraging a small, curated collection of large, real-world datasets for continued pre-training yields superior downstream predictive accuracy compared to using broader, potentially noisier corpora like CommonCrawl or GitTables. Our resulting model, Real-TabPFN, achieves substantial performance gains on 29 datasets from the OpenML AutoML Benchmark.
openreview.net - Magnus Bühler, Lennart Purucker, Frank Hutter
Tabular foundation models pre-trained on synthetically generated datasets have exhibited strong in-context learning capabilities. While fine-tuning can further enhance predictive performance, overfitting to the training data of a downstream task poses a significant risk in tiny-to-small data regimes. We propose a fine-tuning method that employs synthetically generated fine-tuning data to avoid overfitting and improve generalization performance. We study three variants of data generation methods and empirically demonstrate that they mitigate overfitting and outperform standard fine-tuning approaches across five tiny-to-small real-world datasets. Our data generation methods leverage density estimators and structural causal models, akin to those employed during pre-training, to yield the best performance. Our findings indicate that synthetic data generation, a central element in pre-training, can be successfully adapted to enhance fine-tuning.
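The core idea of generating synthetic fine-tuning data with a density estimator can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: it uses scikit-learn's `KernelDensity` as a stand-in estimator and a toy dataset, whereas the paper studies several generators (including structural causal models) and fine-tunes a tabular foundation model on the sampled data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KernelDensity

# Toy stand-in for a tiny downstream task (40 real rows).
X, y = make_classification(n_samples=40, n_features=5, random_state=0)

# Fit one density estimator per class, then sample synthetic rows from it.
synth_X, synth_y = [], []
for label in np.unique(y):
    kde = KernelDensity(bandwidth=0.5).fit(X[y == label])
    samples = kde.sample(100, random_state=0)
    synth_X.append(samples)
    synth_y.append(np.full(len(samples), label))

synth_X = np.vstack(synth_X)       # (200, 5) synthetic feature matrix
synth_y = np.concatenate(synth_y)  # matching class labels

# Fine-tuning would then run on (synth_X, synth_y) rather than directly
# on the 40 real rows, reducing the risk of overfitting to them.
```

Sampling per class keeps the label distribution intact; the real rows are reserved for evaluating whether the fine-tuned model actually generalizes.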
pythonfootball.com - Python Football Review
For as long as I’ve watched football, I’ve been fascinated by the pundits who sit in match-day studios and announce exactly how the weekend will unfold. But are they actually any good, or are we all just blindly trusting their reputation? It’s time to find out.
argmin.net - Ben Recht
Y is not predictable from X. I call claims of this form “unpredictability arguments.” Papers making unpredictability arguments can get a lot of temporary traction in machine learning discourse. They give fuel to the petty battles inside the community. In our current landscape, they give ammunition for lay critiques of industry. They can even help bring on AI Winters if people take them seriously enough. The problem is that they are much harder to justify as stated.
substack.com - Kieron O’Connor
Like many fans, I have found football radars to be really helpful when assessing the relative qualities of individual players, as they are visually very intuitive and easy to understand at a glance. Therefore, I have long wanted to produce something similar to highlight the financial strengths and weaknesses of football clubs to investors, but have been defeated by various technical limitations (mainly my own). Until now.