theathletic.com - Raphael Honigstein
Bayer Leverkusen won German football’s first title of the season on Saturday: they are Herbstmeister (autumn champions), top of the table midway through the season. The accolade is weirdly named and purely unofficial, but it does carry symbolic meaning as a good omen. A team that can do half the job are widely seen as capable of going all the way, not least because in two-thirds of all seasons since the Bundesliga’s foundation in 1963, they actually did.
learnopencv.com - Soumyadip
The YOLO (You Only Look Once) series of models, renowned for its real-time object detection capabilities, owes much of its effectiveness to its specialized loss functions. In this article, we delve into the various YOLO loss functions integral to YOLO's evolution, focusing on their implementation in PyTorch. Our aim is to provide a clear, technical understanding of these functions, which are crucial for optimizing model training and performance.
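To give a flavor of the kind of loss the article discusses, here is a minimal pure-Python sketch of an IoU (intersection-over-union) box regression loss for a single pair of boxes. This is an illustrative simplification, not the article's PyTorch implementation, which handles batches of anchors and more refined variants (GIoU, CIoU):

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes contribute no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred_box, target_box):
    # 1 - IoU: 0 for a perfect match, 1 for fully disjoint boxes.
    return 1.0 - iou(pred_box, target_box)
```

Identical boxes give a loss of 0, disjoint boxes give 1, and partial overlap falls in between, which is what makes the loss a useful training signal for box regression.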
arxiv.org - Yu Bai, Song Mei, Huan Wang, Caiming Xiong
Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident. It is commonly believed that such over-confidence is mainly due to over-parametrization, in particular when the model is large enough to memorize the training data and maximize the confidence.
In this paper, we show theoretically that over-parametrization is not the only reason for over-confidence. We prove that logistic regression is inherently over-confident, in the realizable, under-parametrized setting where the data is generated from the logistic model, and the sample size is much larger than the number of parameters. Further, this over-confidence happens for general well-specified binary classification problems as long as the activation is symmetric and concave on the positive part. Perhaps surprisingly, we also show that over-confidence is not always the case -- there exists another activation function (and a suitable loss function) under which the learned classifier is under-confident at some probability values. Overall, our theory provides a precise characterization of calibration in realizable binary classification, which we verify on simulations and real data experiments.
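The kind of miscalibration the paper characterizes can be checked empirically with a binned reliability computation: group predictions by confidence and compare each bin's mean confidence against its empirical accuracy. The sketch below is a generic illustration of that check, not code from the paper; the function name and binning scheme are my own:

```python
def reliability_bins(probs, labels, n_bins=10):
    """For binary predictions, bin the top-class probabilities and
    return (mean confidence, empirical accuracy) per non-empty bin.
    Over-confidence shows up as mean confidence > accuracy."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p, 1 - p)              # predicted top probability
        correct = (p >= 0.5) == (y == 1)  # top class matched the label?
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    return [
        (sum(c for c, _ in b) / len(b), sum(ok for _, ok in b) / len(b))
        for b in bins if b
    ]
```

For example, a model that always reports 0.9 but is right only 80% of the time yields a bin with mean confidence 0.9 and accuracy 0.8, i.e. the over-confidence the paper proves can arise even in the well-specified, under-parametrized regime.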
fharrell.com - Frank Harrell
It is important to distinguish prediction and classification. In many decision-making contexts, classification represents a premature decision, because classification combines prediction and decision making and usurps the decision maker in specifying costs of wrong decisions. The classification rule must be reformulated if costs/utilities or sampling criteria change. Predictions are separate from decisions and can be used by any decision maker.

Classification is best used with non-stochastic/deterministic outcomes that occur in, say, 0.3 - 0.7 of the observations, and not when the simplest classifier (always outputting "positive" or always outputting "negative") is highly accurate, or when two individuals with identical inputs can easily have different outcomes. For these situations, modeling tendencies (i.e., probabilities) is key. Classification should be used when outcomes are distinct and predictors are strong enough to provide, for all subjects, a probability near 1.0 for one of the outcomes.
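Harrell's point that costs belong in the decision, not the prediction, can be made concrete with a simple expected-cost argument: given a predicted probability p, calling "positive" costs (1 - p) * cost_fp in expectation and calling "negative" costs p * cost_fn, so "positive" wins exactly when p exceeds cost_fp / (cost_fp + cost_fn). A minimal sketch (the cost values below are illustrative, not from the post):

```python
def decision_threshold(cost_fp, cost_fn):
    """Probability above which acting 'positive' has lower expected cost.
    Derived from: (1 - p) * cost_fp < p * cost_fn."""
    return cost_fp / (cost_fp + cost_fn)

def decide(p, cost_fp, cost_fn):
    # The prediction p is fixed; only the decision rule changes with costs.
    return "positive" if p > decision_threshold(cost_fp, cost_fn) else "negative"
```

With equal costs the threshold is 0.5, but if a false negative is nine times as costly as a false alarm it drops to 0.1: the same predicted probability yields different decisions for different decision makers, which is exactly why the prediction should be delivered as a probability rather than a pre-baked classification.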
mic-journal.no - Rudolf E. Kalman
'Roughly speaking, what we know is science and what we don't know is philosophy.' - Bertrand Russell, ca. 1968.
argmin.net - Ben Recht
No one can explain why or when statistics generalize and transfer.