theathletic.com
“Who’s winning the transfer window?” people will ask on social media — and the misguided response, almost always, will be the club spending in a wild, extravagant but compellingly eye-catching way, like Imelda Marcos on the last day of a trip to New York.
theathletic.com
Some of the likes-gathering community have reacted in a dishonest way, shifting to a not-exactly-true model, an impressions-first economy. Given that many millions have consumed this content, it feels almost ungracious to confirm that, no, Arsenal were not 11 points clear with a game in hand when they signed Jorginho. And, yes, Ederson has conceded a direct free-kick goal in his career. And no, Trent Alexander-Arnold does not have the second-highest number of goals from direct free kicks in Premier League history.
talkpython.fm
If you're looking for fun data sets for learning, for teaching, maybe a conference talk, or even if you're just really into them, sports offers up a continuous stream of rich data that many people can relate to. Yet accessing that data can be tricky. Sometimes it's locked away in obscure file formats. Other times, the data exists but without a clear API to access it. On this episode, we talk about PySport, something of an awesome list of libraries (mostly, but not all, Python) for accessing a wide variety of sports data from the NFL, NBA, F1, and more. We have Koen Vossen, maintainer of PySport, on to talk through some of the more popular projects.
learnopencv.com
YOLO-NAS is currently the latest YOLO object detection model. From the outset, it beats all other YOLO models in terms of accuracy. The pretrained YOLO-NAS models detect more objects with better accuracy compared to the previous YOLO models. But how do we train YOLO-NAS on a custom dataset? That is our goal in this article: to train different YOLO-NAS models on a custom dataset.
cppcast.com
Antony Peacock joins Timur and Phil. After rounding up the news, we chat with Antony about what it's like to work in finance as a C++ developer, the similarities and differences to games development, and how you can break into a role in finance. We also discuss what it's like to work in tech as someone with dyslexia.
lmsys.org
We use the Elo rating system to calculate the relative performance of the models. You can view the voting data, basic analyses, and calculation procedure in this notebook.
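The core of that calculation is the standard Elo update from pairwise comparisons. A minimal sketch of that update (the K-factor of 32 and initial rating of 1000 here are illustrative defaults, not necessarily the notebook's exact choices):

```python
# Minimal Elo rating update from pairwise votes between models.

def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """Return updated ratings after one comparison.
    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

# One vote where model_x beats model_y, both starting at 1000:
ratings = {"model_x": 1000.0, "model_y": 1000.0}
ratings["model_x"], ratings["model_y"] = elo_update(
    ratings["model_x"], ratings["model_y"], score_a=1.0
)
# Equal ratings give an expected score of 0.5, so the winner gains
# k * 0.5 = 16 points and the loser drops by the same amount.
```

Iterating this over the full voting log, in vote order, yields the relative rankings.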
arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training. Adam and its variants have been state-of-the-art for years, and more sophisticated second-order (Hessian-based) optimizers often incur too much per-step overhead. In this paper, we propose Sophia, Second-order Clipped Stochastic Optimization, a simple, scalable second-order optimizer that uses a lightweight estimate of the diagonal Hessian as the pre-conditioner. The update is the moving average of the gradients divided by the moving average of the estimated Hessian, followed by element-wise clipping. The clipping controls the worst-case update size and tames the negative impact of non-convexity and rapid change of the Hessian along the trajectory. Sophia only estimates the diagonal Hessian every handful of iterations, which has negligible average per-step time and memory overhead. On language modeling with GPT-2 models of sizes ranging from 125M to 770M, Sophia achieves a 2x speed-up compared with Adam in the number of steps, total compute, and wall-clock time. Theoretically, we show that Sophia adapts to the curvature in different components of the parameters, which can be highly heterogeneous for language modeling tasks. Our run-time bound does not depend on the condition number of the loss.
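The update described in the abstract can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's implementation: the hyperparameter names (`beta1`, `beta2`, `gamma`, the re-estimation interval `k`) loosely follow the paper's notation, and the diagonal-Hessian estimate is passed in as a stand-in for the cheap stochastic estimators the paper actually uses.

```python
import numpy as np

def sophia_step(theta, m, h, grad, hess_diag_est, t,
                lr=1e-3, beta1=0.96, beta2=0.99, gamma=0.01, k=10, eps=1e-12):
    """One Sophia-style step: EMA of gradients divided by EMA of the
    estimated diagonal Hessian, with element-wise clipping.
    hess_diag_est is an (approximate) diagonal Hessian; it is folded in
    only every k steps, matching the infrequent re-estimation in the text."""
    m = beta1 * m + (1.0 - beta1) * grad              # moving average of gradients
    if t % k == 0:
        h = beta2 * h + (1.0 - beta2) * hess_diag_est  # moving average of Hessian est.
    # Pre-conditioned direction, clipped element-wise to magnitude 1, so the
    # worst-case parameter change per step is bounded by lr:
    update = np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
    return theta - lr * update, m, h
```

The clipping is what keeps a tiny (or stale) Hessian estimate from blowing up the step: wherever the pre-conditioned ratio is huge, the step degrades gracefully to a fixed-size signed move.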