arxiv.org - Rouven Michels, Marius Ötting, Roland Langrock
It is still largely unclear to what extent bettors update their prior assumptions about the strength and form of competing teams considering the dynamics during the match. This is of interest not only from the psychological perspective, but also as the pricing of live odds ideally should be driven both by the (objective) outcome probabilities and also the bettors' behaviour. Using state-space models (SSMs) to account for the dynamically evolving latent sentiment of the betting market, we analyse a unique high-frequency data set on stakes placed during the match. We find that stakes in the live-betting market are driven both by perceived pre-game strength and by in-game strength, the latter as measured by the Valuing Actions by Estimating Probabilities (VAEP) approach. Both effects vary over the course of the match.
researchgate.net - Christian Deutscher, Bernd Frick, Marius Ötting
In this paper, we argue that potential inefficiencies on betting markets are more likely to exist at the very beginning of a season, when the available information on the teams' playing strength is difficult to evaluate. This lack of reliable information should be particularly large in the case of recently promoted teams that have typically undergone major changes in the composition of their roster following their promotion. Without any information on the latter teams' potential performance, they are particularly difficult to evaluate, which may eventually lead to inefficiencies and positive returns on investment in the betting market. We analyse odds from German first division Bundesliga soccer for the seasons 2002/03 to 2015/16 to find betting market inefficiencies at the start of the season. As expected, betting on wins by recently promoted teams generates temporarily positive returns, especially for away games. These results suggest that bookmakers underestimate promoted teams' ability to familiarize themselves with the conditions in the new league, such as having to play in front of larger, often hostile crowds.
sagepub.com - Christian Deutscher, Marius Ötting, Sandra Schneemann, Hendrik Scholten
Betting markets have grown considerably lately. Despite their impact on the economic importance of professional sports, they have only recently received academic interest. This article determines factors affecting the amount of money bet as well as the number of matched bets placed on the largest European soccer league, namely, the English Premier League between 2009-2010 and 2015-2016. Data from the betting exchange Betfair suggest that season progress, weekday, number of substitutes, both teams' market values, as well as uncertainty of outcome determine market transactions and, hence, the economic importance.
theguardian.com - Rob Davies
From brain hacks to dark nudges and near misses - betting companies employ an arsenal of clever tricks to tempt punters into spending more money. Here's how ...
tandfonline.com - David Winkelmann, Christian Deutscher & Marius Ötting
The outbreak of COVID-19 in March 2020 led to a shutdown of economic activities in Europe. This included the sports sector since public gatherings were prohibited. The German Bundesliga was among the first sport leagues realizing a restart without spectators. Several recent studies suggest that the home advantage of teams eroded for the remaining matches. Our paper analyses the reaction by bookmakers to the disappearance of such home advantage. We show that bookmakers had problems to adjust the betting odds in accordance with the disappeared home advantage, opening opportunities for profitable betting strategies.
youtube.com - Pieter Robberechts
Creativity is highly valued in soccer players. It contributes to exciting and unpredictable play, which can help teams to overcome defensive strategies and create scoring opportunities. Consequently, evaluating the creative abilities of players is an important aspect of the player recruitment process. Yet, creativity is generally seen as an intangible quality that cannot be analyzed through statistics. Addressing this challenge, the 2023 SIGKDD paper titled "un-xPass: Measuring Soccer Player's Creativity" defines a novel metric to quantify the level of creativity involved in a player's passes. The innovative approach utilizes machine learning techniques to assess two important factors that characterize creativity: originality and usefulness.
statsbomb.com - Mike Bursik
For anyone that follows football on a regular basis, it's impossible to escape the inevitable cliches around the importance of winning the turnover battle. It's one of the most popular answers a TV analyst will give when asked to cite a key to victory in a specific matchup. Although there is truth to this statement, this reasoning often lacks key contextual information around when turnovers happen within the game.
youtube.com
Increasingly, football is lending itself to plenty of analysis using numbers, but the availability of data doesn't always ensure that it's properly applied. Here are ten mistakes which are commonly made and the best way to correct them.
americansocceranalysis.com - John Muller
Goals added (g+) measures a player's total on-ball contribution in attack and defense. It does this by calculating how much each touch changes their team's chances of scoring and conceding across two possessions.
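The idea of valuing a touch by how it shifts scoring and conceding chances can be sketched roughly as follows. This is only an illustration of the general "before versus after" logic described in the blurb: the function name and the probabilities are hypothetical, and the actual g+ model (including how it estimates those probabilities across two possessions) is not reproduced here.

```python
def action_value(p_score_before, p_concede_before, p_score_after, p_concede_after):
    """Change in (scoring chance - conceding chance) attributed to one action."""
    before = p_score_before - p_concede_before
    after = p_score_after - p_concede_after
    return after - before


# Hypothetical probabilities for a pass that moves the ball into the box:
# scoring chance rises from 2% to 8%, conceding chance from 1% to 1.5%.
pass_value = action_value(0.02, 0.01, 0.08, 0.015)

# A player's total is then the sum over all of their on-ball actions.
player_total = sum([pass_value, action_value(0.08, 0.015, 0.05, 0.02)])
```

Under this reading, an action can be negative even when it is completed successfully, for example a backward pass that lowers the scoring chance more than it lowers the risk of conceding.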
americansocceranalysis.com - Matthias Kullowatz
This is part two of our introductory series on Goals Added (g+). Here is part one, where John first introduced it.
github.io
xPo expands to expected potential. Why Do We Need xPo? I wanted to figure out a way to measure the value of various attacking actions on the pitch. xPo is just an estimate of the potential xG an action could eventually generate. I've listed some differences with the most commonly used method, xG Chain, and a recently popular method, xT, in the Comparisons With Other Models section.
pff.com - Alexander Schram
After more than a decade of shaping the landscape of analytics in the NFL and college football, PFF now brings its renowned player grading system to the beautiful game, evaluating every player for each event during a game. This document describes the entire process from grading each event to ranking player performances for a facet. The process can be broken down into three steps: (1) play-by-play grading, (2) normalizing the grades and (3) converting the grades.
github.com - Devin Pleuler
In the handbook you can find three primary things:
- Resources and suggestions for technical skills worth having for work in soccer analytics (but can probably be extended to other sports)
- A series of tutorials delivered in Jupyter notebook format using StatsBomb Open Data, covering various data science techniques common in soccer analytics.
- Collected research and articles that I believe are required reading to get up to speed with both the history and state-of-the-art in soccer analytics.
arxiv.org - Marius Ötting
In recent years, data-driven approaches have become a popular tool in a variety of sports to gain an advantage by, e.g., analysing potential strategies of opponents. Whereas the availability of play-by-play or player tracking data in sports such as basketball and baseball has led to an increase of sports analytics studies, equivalent datasets for the National Football League (NFL) were not freely available for a long time. In this contribution, we consider a comprehensive play-by-play NFL dataset provided by this http URL, comprising 289,191 observations in total, to predict play calls in the NFL using hidden Markov models. The resulting out-of-sample prediction accuracy for the 2018 NFL season is 71.5%, which is substantially higher compared to similar studies on play call predictions in the NFL.
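To make the hidden Markov model idea concrete, here is a minimal sketch of forward filtering and one-step-ahead prediction for a two-state HMM with categorical emissions (0 = run, 1 = pass). All parameter values and function names are made up for illustration; the paper's actual model, covariates, and fitted parameters are not reproduced.

```python
def forward_filter(init, trans, emis, obs):
    """Filtered state probabilities P(state_t | obs_1..t) after the last observation."""
    n = len(init)
    alpha = [init[i] * emis[i][obs[0]] for i in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]  # normalise to avoid underflow
    for o in obs[1:]:
        # Propagate one step through the transition matrix, then weight by emission.
        pred = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        alpha = [pred[j] * emis[j][o] for j in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha


def predict_next(alpha, trans, emis):
    """Probability of each symbol (play type) at the next time step."""
    n = len(alpha)
    pred = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return [sum(pred[j] * emis[j][k] for j in range(n)) for k in range(len(emis[0]))]


# Hypothetical parameters: state 0 leans towards runs, state 1 towards passes.
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.3, 0.7]]
emis = [[0.7, 0.3], [0.2, 0.8]]

alpha = forward_filter(init, trans, emis, obs=[1, 1, 0])
next_play_probs = predict_next(alpha, trans, emis)
predicted_play = max(range(len(next_play_probs)), key=next_play_probs.__getitem__)
```

The predicted play call is simply the symbol with the highest one-step-ahead probability; out-of-sample accuracy would be the share of plays for which that prediction matches what was actually called.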
springer.com - Marius Ötting, Roland Langrock & Antonello Maruotti
We investigate the potential occurrence of change points, commonly referred to as "momentum shifts", in the dynamics of football matches. For that purpose, we model minute-by-minute in-game statistics of Bundesliga matches using hidden Markov models (HMMs). To allow for within-state dependence of the variables, we formulate multivariate state-dependent distributions using copulas. For the Bundesliga data considered, we find that the fitted HMMs comprise states which can be interpreted as a team showing different levels of control over a match. Our modelling framework enables inference related to causes of momentum shifts and team tactics, which is of much interest to managers, bookmakers, and sports fans.
sciencedirect.com - Stephanie Kovalchik
The Elo rating system is one of the most popular methods for estimating the ability of competitors over time in sport. The standard Elo system focuses on predicting wins and losses, but there is often also interest in the margin of victory (MOV) because it reflects the magnitude of a result. There have been few theoretical investigations and comparisons of Elo-based models. In the present study, we propose four model options for an MOV Elo system: linear, joint additive, multiplicative, and logistic. Notations and guidance for tuning each model are provided. The models were applied to men's tennis for several MOV choices. The results showed that all MOV approaches using within-set statistics improved the predictive performance compared with the standard Elo system, but only the joint additive model yielded unbiased ratings with stable variance in the simulation study. This general framework for MOV Elo ratings provides sports modelers with a new set of tools for building systems to rate competitors and forecast outcomes in sport.
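For orientation, here is a minimal sketch of the standard Elo update next to one possible linear margin-of-victory variant. The standard update is the well-known one; the linear MOV function is an assumption about the general idea (expected margin proportional to the rating gap) and is not claimed to match the paper's exact specification. The constants `k`, `k_mov`, and `sigma` are hypothetical tuning values.

```python
def expected_score(r_a, r_b):
    """Standard Elo win expectancy for competitor A against competitor B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, a_won, k=32.0):
    """Standard win/loss Elo update; returns the new ratings (A, B)."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def linear_mov_update(r_a, r_b, mov, k_mov=5.0, sigma=100.0):
    """Hypothetical linear MOV update: the expected margin is (r_a - r_b) / sigma,
    and ratings move in proportion to the observed minus expected margin."""
    expected_mov = (r_a - r_b) / sigma
    delta = k_mov * (mov - expected_mov)
    return r_a + delta, r_b - delta


win_only = elo_update(1500.0, 1500.0, a_won=True)
with_margin = linear_mov_update(1500.0, 1500.0, mov=2.0)
```

The design difference is that the win/loss update gives the same reward for a narrow and a lopsided win, while the margin-based update scales with the size of the result.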
americansocceranalysis.com - Matthias Kullowatz
While valuations of offensive actions in soccer are, by no means, perfect, they are still significantly more accurate and meaningful than how we evaluate defensive actions and players' defensive contributions. In a challenge-accepted moment of weakness, we took a stab at better assigning a Goals Added (g+) equivalent for defense: g- ("g minus"). What we're about to share will reinforce just how hard it is to quantify the value of an individual's defensive actions, but hopefully I can also entertain you down this rabbit hole we've been playing around in for more than a year.
kuleuven.be - Pieter Robberechts, Maaike Van Roy, Jesse Davis
Creativity is highly valued in soccer players. It contributes to exciting and unpredictable play, which can help teams to overcome defensive strategies and create scoring opportunities. Consequently, evaluating the creative abilities of players is an important aspect of the player recruitment process. However, there is currently no clear way to measure creativity in soccer. It is not captured by the typical result-based performance indicators, as being creative entails going beyond just doing something useful, to accomplishing something useful but in a unique or atypical way. Therefore in this paper, we define a novel metric to quantify the level of creativity involved in a player's passes. Our Creative Decision Rating (CDR) utilizes machine learning techniques to assess two important factors: the originality of a pass, and its value in terms of increasing the team's chances of scoring a goal. We validated our metric on StatsBomb 360 contextual event stream data of the 2021/22 English Premier League season and show through a number of use cases that it provides another angle on a player's skill, complementing existing player evaluation metrics. Overall, our metric provides a concise method for capturing and quantifying the creativity of soccer players and could have important implications for player recruitment and talent development in the sport.
youtube.com - Sam Gregory, Devin Pleuler
Sam Gregory and Devin Pleuler (Toronto FC Analytics) work to demystify tracking data, they discuss how data is being used at the club level, the future of analytics and more.
ssrn.com - Bryan T. Kelly, Dacheng Xiu
We survey the nascent literature on machine learning in the study of financial markets. We highlight the best examples of what this line of research has to offer and recommend promising directions for future research. This survey is designed for both financial economists interested in grasping machine learning tools, as well as for statisticians and machine learners seeking interesting financial contexts where advanced methods may be deployed.
youtube.com - Mathias Gaunard
Electronic trading, in particular for high-frequency and low-latency strategies, is an area that is very much in demand of C++ developers, but sometimes seen as alien by traditional technologists. In this talk, we'll attempt to demystify this industry, present the problems people try to solve in it, and explain which parts of C++ have made it such a prevalent tool there. The talk will be primarily focused on electronic trading on centralized exchanges with continuous matching of limit orders, be it for delta-one or derived assets, on co-located or cloud exchanges. We'll first cover the basics with reference and market data, execution, and order book kinematics, then discuss how one would build a program with low-latency trading capabilities as a result, with optimized thread models, networking and memory management. Finally we'll delve into a few more quantitative topics to understand how it all fits together: matching engine simulation, alpha modeling, slippage and risk management.
twitter.com
- CM3Leon
- LongLLaMA
- AnimateDiff
- Patch n' Pack: NaViT
- Secrets of RLHF in LLMs
- LLMs as General Pattern Machines
arxiv.org - Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive.
arxiv.org - Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at this https URL.
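The link between the parallel and recurrent forms can be illustrated with a stripped-down, single-head sketch in plain Python. It assumes a scalar decay `gamma` and leaves out everything else a full model would need (position-dependent rotations, scaling, normalisation, multiple heads), so it should be read as a toy version of the retention idea rather than the paper's implementation. The function names and inputs are hypothetical.

```python
def retention_parallel(q, k, v, gamma):
    """Parallel form: out_n = sum_{m <= n} gamma^(n-m) * (q_n . k_m) * v_m."""
    d_v = len(v[0])
    out = []
    for n in range(len(q)):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal: only current and past positions
            score = sum(a * b for a, b in zip(q[n], k[m])) * gamma ** (n - m)
            for c in range(d_v):
                row[c] += score * v[m][c]
        out.append(row)
    return out


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, out_n = q_n S_n.
    The state S has a fixed size, so each decoding step costs O(1) in sequence length."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        state = [[gamma * state[a][c] + k_t[a] * v_t[c] for c in range(d_v)]
                 for a in range(d_k)]
        out.append([sum(q_t[a] * state[a][c] for a in range(d_k)) for c in range(d_v)])
    return out


# Tiny made-up inputs: 3 positions, key/query dimension 2, value dimension 2.
q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
k = [[1.0, 1.0], [0.0, 2.0], [1.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
par = retention_parallel(q, k, v, gamma=0.9)
rec = retention_recurrent(q, k, v, gamma=0.9)
```

Both functions return the same outputs up to floating-point error; the parallel form suits training over whole sequences, while the recurrent form only carries the fixed-size state from one token to the next.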
royalsocietypublishing.org - Michael J. Smith and James E. Geach
In this review, we explore the historical development and future prospects of artificial intelligence (AI) and deep learning in astronomy. We trace the evolution of connectionism in astronomy through its three waves, from the early use of multilayer perceptrons, to the rise of convolutional and recurrent neural networks, and finally to the current era of unsupervised and generative deep learning methods. With the exponential growth of astronomical data, deep learning techniques offer an unprecedented opportunity to uncover valuable insights and tackle previously intractable problems. As we enter the anticipated fourth wave of astronomical connectionism, we argue for the adoption of GPT-like foundation models fine-tuned for astronomical applications. Such models could harness the wealth of high-quality, multimodal astronomical data to serve state-of-the-art downstream tasks. To keep pace with advancements driven by Big Tech, we propose a collaborative, open-source approach within the astronomy community to develop and maintain these foundation models, fostering a symbiotic relationship between AI and astronomy that capitalizes on the unique strengths of both fields.
timdettmers.com - Tim Dettmers
Deep learning is a field with intense computational requirements, and your choice of GPU will fundamentally determine your deep learning experience. But what features are important if you want to buy a new GPU? GPU RAM, cores, tensor cores, caches? How to make a cost-efficient choice? This blog post will delve into these questions, tackle common misconceptions, give you an intuitive understanding of how to think about GPUs, and will lend you advice, which will help you to make a choice that is right for you.
stackexchange.com
General Finance Textbooks
- Options, Futures and Other Derivatives, John Hull
- The Concepts and Practice of Mathematical Finance, Mark Joshi
- Paul Wilmott on Quantitative Finance, Paul Wilmott
...
statsbomb.com - StatsBomb
This article is written by the CV team at StatsBomb. In it, we will cover the technical details of the camera calibration algorithm that we have developed to collect the location of players directly from televised footage.
datofutbol.cl - Ismael Gómez Schmidt
In this article I share some details that could be useful if you want to fit your own xG model, along with its performance evaluation and an analysis of the results.
footballoutsiders.com - Lau Sze Yui
Every football fan probably knows that passing in football is generally more efficient than rushing. But how much passing is really enough?
nfeloapp.com - Robby Greer
Home Field Advantage (HFA) is the average advantage, generally measured in points, that a team experiences when playing in their own stadium. In the NFL, HFA is generally accepted to sit somewhere between 2 and 3 points. However, past analysis shows this number to fluctuate by team and over time. From 2007 to 2017, the NFL's strongest observed HFA (Seattle's 5.1) was a full 4.4 points larger than its weakest (Miami's 0.7). Similarly, league wide HFA was 2.2 in 2018, but fell to -0.9 points just one season later in 2019.
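As a quick illustration of the definition, a naive league-wide HFA can be computed as the average home-minus-away point margin. The scores below are made up, and the function name is hypothetical; the article's own estimates may adjust for team strength, which this sketch does not.

```python
from statistics import mean


def home_field_advantage(games):
    """Average home-minus-away margin over (home_points, away_points) pairs."""
    return mean([home - away for home, away in games])


# Made-up results, listed as (home_points, away_points).
games = [(27, 20), (17, 24), (31, 21), (13, 10)]
league_hfa = home_field_advantage(games)

# The team-level spread quoted in the blurb: strongest minus weakest observed HFA.
spread = round(5.1 - 0.7, 1)
```

For a single team, the raw home margin mixes home advantage with overall team quality, so a per-team figure would normally compare home and away performance for the same team rather than use the home margin alone.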
github.com - Andrew Puopolo
Building a Random Forest Model for Expected Goals: In this notebook, we examine how to avoid overfitting/mistakes, learn how to cross validate our models and determine the best hyperparameters. Finally, we take a small section at the end to determine which features are important in a Random Forest.
arxiv.org - Marius Ötting, Rouven Michels, Roland Langrock, Christian Deutscher
Sports betting markets have grown very rapidly recently, with the total European gambling market worth 98.6 billion euro in 2019. Considering a high-resolution (1 Hz) data set provided by a large European bookmaker, we investigate the effect of news on the dynamics of live betting. In particular, we consider stakes placed in a live betting market during football matches. Accounting for the general market activity level within a state-space modelling framework, we focus on the market's response to events such as goals (i.e. major news), but also to the general situation within a match such as the uncertainty about the outcome. Our results indicate that markets might overreact to recent news, confirming cognitive biases known from psychology and behavioural economics.
wordpress.com - Jim Albert
For this blog 20 is a special number since it is the 20th anniversary of two things relevant to exploring baseball data with R. It has been 20 years since the release of version 1.0 of R and 20 years since the beginning of the Baseball Reference site. It seems appropriate to provide some personal comments on the 20th anniversary of these two events, reflecting on the history of statistical software and the availability of baseball data.
wordpress.com - Justin Kubatko, Dean Oliver, Kevin Pelton, and Dan T. Rosenbaum
Journal of Quantitative Analysis in Sports. Volume 3, Issue 3 2007 Article 1
The quantitative analysis of sports is a growing branch of science and, in many ways one that has developed through non-academic and non-traditionally peer-reviewed work. The aim of this paper is to bring to a peer-reviewed journal the generally accepted basics of the analysis of basketball, thereby providing a common starting point for future research in basketball. The possession concept, in particular the concept of equal possessions for opponents in a game, is central to basketball analysis. Estimates of possessions have existed for approximately two decades, but the various formulas have sometimes created confusion. We hope that by showing how most previous formulas are special cases of our more general formulation, we shed light on the relationship between possessions and various statistics. Also, we hope that our new estimates can provide a common basis for future possession estimation. In addition to listing data sources for statistical research on basketball, we also discuss other concepts and methods, including offensive and defensive ratings, plays, per-minute statistics, pace adjustments, true shooting percentage, effective field goal percentage, rebound rates, Four Factors, plus/minus statistics, counterpart statistics, linear weights metrics, individual possession usage, individual efficiency, Pythagorean method, and Bell Curve method. This list is not an exhaustive list of methodologies used in the field, but we believe that they provide a set of tools that fit within the possession framework and form the basis of common conversations on statistical research in basketball.
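Several of the quantities listed above have simple, widely used box-score forms, sketched below. The 0.44 free-throw weight is the commonly cited value and is treated here as an assumption; the paper's more general possession formulation is not reproduced. The box-score numbers are made up.

```python
def possessions(fga, fta, orb, tov, ft_weight=0.44):
    """Simple team possession estimate: shots and weighted free throws that end
    a possession, minus offensive rebounds that extend it, plus turnovers."""
    return fga + ft_weight * fta - orb + tov


def true_shooting_pct(pts, fga, fta, ft_weight=0.44):
    """Points per shooting attempt, scaled so that 0.5 equals one point per attempt."""
    return pts / (2.0 * (fga + ft_weight * fta))


def effective_fg_pct(fgm, fg3m, fga):
    """Field goal percentage with three-pointers given 1.5x credit."""
    return (fgm + 0.5 * fg3m) / fga


def offensive_rating(pts, poss):
    """Points scored per 100 possessions."""
    return 100.0 * pts / poss


# Made-up team box score.
poss = possessions(fga=85, fta=25, orb=10, tov=14)
ts = true_shooting_pct(pts=110, fga=85, fta=25)
efg = effective_fg_pct(fgm=40, fg3m=10, fga=85)
ortg = offensive_rating(pts=110, poss=poss)
```

Expressing scoring per 100 possessions rather than per game is what makes teams that play at different paces comparable.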
hockey-graphs.com - Thibaud Chatel
If you ever work for a hockey team as an analyst, you could be facing two very recurrent questions from the coaching staff. The first one is very practical: How can analytics help us work better and faster? The second one is: What is the real contribution of each player? Meaning beyond the usual on-ice "possession" stats like Corsi or Expected Goals and individual production metrics such as shots taken, scoring chances, expected goals created, zone exits, entries, or even high-danger passes (passes that end or go through the slot). But those events were not yet statistically linked to each other. Finding a way to provide answers to both questions was my goal for the last few months, and the solution was: I needed to split the game in "Sequences".
shinyapps.io - Kostya Medvedovsky, Andrew Patton
The public basketball stats space has advanced wonderfully over the last decade, most prominently with explosion of "all-in-one" metrics like RAPM, RPM, PIPM (RIP), LEBRON, and BPM, among others. Excellent research has also been done on a number of other topics, such as positional versatility, clutch performance, shooting luck, and matchups. However, despite these advances, there has been a relative dearth of focus on forward-looking projections as opposed to backwards-looking explanations, and even less public work on basic box-score metrics (as opposed to "all-in-one" metrics). Krishna Narsu has done excellent work on the "stability" of various stats, and I have contributed myself, but this work has been on a team level. FiveThirtyEight, meanwhile, has been releasing their CARMELO/RAPTOR player projections, but these are likewise rolled-up, "all-in-one"-style projections that tell us relatively little about where a player's growth/decline is going to come from. These metrics don't answer questions such as "how good a three-point shooter is Jaylen Brown?" or "how many two-point attempts can we expect Marcus Smart to take?" DARKO (Daily Adjusted and Regressed Kalman Optimized projections) is an attempt to fill that gap. As will be familiar to baseball fans, DARKO is a basketball projection system similar in concept to Steamer, PECOTA, and ZiPS. To my knowledge, it is one of the few public computer-driven NBA box-score projection systems (as opposed to the "hand-curated" systems offered by some fantasy basketball sites).
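The "Kalman" in the name refers to a family of filters that update an estimate a little after each new observation. A generic scalar version is sketched below to show the mechanics; it is not DARKO's actual model, and the function name, variances, and shooting numbers are all hypothetical.

```python
def kalman_step(mean, var, obs, obs_var, process_var=0.0):
    """One predict-and-update step of a scalar Kalman filter.

    mean, var    -- current estimate of the skill and its uncertainty
    obs, obs_var -- new noisy observation and its noise variance
    process_var  -- how much the true skill is allowed to drift between games
    """
    var_pred = var + process_var            # predict: uncertainty grows with drift
    gain = var_pred / (var_pred + obs_var)  # weight given to the new observation
    new_mean = mean + gain * (obs - mean)   # update: move towards the observation
    new_var = (1.0 - gain) * var_pred       # uncertainty shrinks after observing
    return new_mean, new_var


# Hypothetical three-point skill estimate of 35%, updated after a 45% shooting night.
skill, skill_var = kalman_step(mean=0.35, var=0.01, obs=0.45, obs_var=0.03)
```

Because a single game is a noisy observation (large `obs_var`), the estimate moves only part of the way towards the observed value, which is the "regressed" behaviour a projection system of this kind would want.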
priceactionlab.com - Michael Harris
One would think that by backtesting more ideas and more frequently, the chances of discovering an edge increase. However, the opposite exactly happens, and the chances of discovering something of value diminish with frequent backtesting.
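One common way to reason about this effect is through multiple testing: the more ideas are tried, the more likely it is that at least one looks good purely by chance. The sketch below assumes independent backtests and a fixed per-test false-positive rate, which is a simplification and not necessarily the argument made in the article itself. The function name and the 5% rate are assumptions.

```python
def prob_spurious_edge(n_backtests, alpha=0.05):
    """Probability that at least one of n independent backtests of worthless
    strategies passes a significance threshold with false-positive rate alpha."""
    return 1.0 - (1.0 - alpha) ** n_backtests


# How quickly chance discoveries become near-certain as more ideas are tested.
for n in (1, 20, 100):
    print(n, round(prob_spurious_edge(n), 3))
```

Under these assumptions, a "discovery" found after many backtests carries much less evidential weight than the same result from a single pre-planned test.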