blogspot.com
I've mentioned the legend that is Billy Walters a few times in this blog. If you're serious about sports betting, you will have heard his name, and back in August I mentioned his autobiography, Gambler. The book doesn't disappoint.
degruyter.com - Edward Wheatcroft
Betting odds are generally considered to represent accurate reflections of the underlying probabilities for the outcomes of sporting events. There are, however, known to be a number of inherent biases such as the favorite-longshot bias in which outsiders are generally priced with poorer value odds than favorites. Using data from European soccer matches, this paper demonstrates the existence of another bias in which the match odds overreact to favorable and unfavorable runs of results. A statistic is defined, called the Combined Odds Distribution (COD) statistic, which measures the performance of a team relative to expectations given their odds over previous matches. Teams that overperform expectations tend to have a high COD statistic and those that underperform tend to have a low COD statistic. Using data from twenty different leagues over twelve seasons, it is shown that teams with a low COD statistic tend to be assigned more generous odds by bookmakers. This can be exploited and a sustained and robust profit can be made. It is suggested that the bias in the odds can be explained in the context of the "hot hand fallacy", in which gamblers overestimate variation in the ability of each team over time.
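The abstract doesn't spell out exactly how the COD statistic is computed, but one plausible reading (actual points taken minus the expected points implied by the pre-match odds, summed over a recent window) can be sketched in a few lines; the odds-to-points conversion and the window length below are my assumptions, not the paper's definition:

```python
import numpy as np

def implied_probs(odds_win, odds_draw, odds_lose):
    """Convert decimal odds to normalised implied probabilities (removes the overround)."""
    raw = np.array([1 / odds_win, 1 / odds_draw, 1 / odds_lose])
    return raw / raw.sum()

def cod_statistic(matches, window=6):
    """Sum of (actual points - expected points) over the last `window` matches.

    `matches` holds decimal odds plus the points the team took (3 win, 1 draw, 0 loss).
    This is only one plausible reading of the paper's COD statistic, not its exact definition.
    """
    diffs = []
    for m in matches[-window:]:
        p_win, p_draw, _ = implied_probs(m["odds_win"], m["odds_draw"], m["odds_lose"])
        expected_points = 3 * p_win + 1 * p_draw
        diffs.append(m["points"] - expected_points)
    return sum(diffs)

# Hypothetical recent results for one team
history = [
    {"odds_win": 2.1, "odds_draw": 3.4, "odds_lose": 3.6, "points": 3},
    {"odds_win": 1.8, "odds_draw": 3.8, "odds_lose": 4.5, "points": 1},
    {"odds_win": 2.5, "odds_draw": 3.3, "odds_lose": 2.9, "points": 0},
]
print(cod_statistic(history, window=3))
```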
degruyter.com - Benjamin P. Jacot and Paul V. Mochkovitch
This paper examines how the Kelly criterion, a strategy for maximizing the expected log-growth of capital through informed betting, can be applied to non-mutually exclusive bets. These are bets where there is no one-to-one correspondence between the bets and the possible outcomes of the game. This type of situation is common in horse racing, where multiple types of bets are available for a single race. The paper begins by providing a theoretical overview of the Kelly betting strategy and then discusses how it can be extended to non-mutually exclusive bets. A new formulation of the fractional Kelly strategy, which involves betting a fixed fraction of the amount suggested by the Kelly criterion, is also presented for this type of scenario.
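For readers new to the topic, the single-bet Kelly and fractional Kelly stakes that the paper generalises are easy to compute. This minimal sketch covers only the basic binary-bet case, not the non-mutually-exclusive extension the paper develops:

```python
def kelly_fraction(p_win, decimal_odds):
    """Full Kelly stake as a fraction of bankroll for a single binary bet.

    f* = (b*p - q) / b, where b is the net payout per unit staked,
    p the win probability and q = 1 - p.
    """
    b = decimal_odds - 1.0
    q = 1.0 - p_win
    return max(0.0, (b * p_win - q) / b)

def fractional_kelly(p_win, decimal_odds, fraction=0.5):
    """Bet only a fixed fraction of the full Kelly stake (less volatile, slower growth)."""
    return fraction * kelly_fraction(p_win, decimal_odds)

# Example: you estimate a 55% chance at decimal odds of 2.0
print(kelly_fraction(0.55, 2.0))      # 0.10 -> stake 10% of bankroll
print(fractional_kelly(0.55, 2.0))    # 0.05 -> half Kelly
```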
degruyter.com - David A. Harville
Modern and post-modern portfolio theory were devised by Harry Markowitz (among others) for purposes of allocating some monetary resources among a number of financial assets so as to strike a suitable balance between risk and expected return. The problem it addresses bears a considerable resemblance to one encountered in making "moneyline" bets on the outcomes of contests in sports like American football. In distributing some allotted funds among a number of such bets, it may be desired to account for the risk. By introducing suitable modifications, the procedures employed in modern and post-modern portfolio theory for the allocation of resources among financial assets can be adapted for use in the distribution of funds among multiple bets. As in the case of financial assets, the most appropriate measures of risk are ones like the semi-deviation or semi-variance that penalize only negative or below-target returns. The various procedures are illustrated and compared by applying them retrospectively to moneyline bets on the outcomes of the college football "bowl" games from the 2020 season.
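The downside-risk measures the paper favours, semi-variance and semi-deviation, penalise only below-target returns. A minimal sketch, assuming a simple vector of historical per-unit bet returns:

```python
import numpy as np

def semi_variance(returns, target=0.0):
    """Average squared shortfall below `target`; only below-target returns count as risk."""
    shortfall = np.minimum(returns - target, 0.0)
    return np.mean(shortfall ** 2)

def semi_deviation(returns, target=0.0):
    return np.sqrt(semi_variance(returns, target))

# Hypothetical per-unit returns of a moneyline bet over past games
returns = np.array([0.9, -1.0, 0.9, 0.9, -1.0, 0.9])
print(semi_deviation(returns))   # downside risk only
print(np.std(returns))           # ordinary deviation penalises upside too
```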
theguardian.com - Greg Wood
No one who enjoys betting on racing would be expected to prefer form-filling to form study, but until 18 October, it is an option that many punters may want to consider. That is the closing date for submissions to the Gambling Commission's consultation on the implementation of "affordability" checks for customers of online gambling sites. It is almost certainly the last chance for ordinary punters to make their views known on plans that could have major implications for how – or even if – they can pursue their entirely legal hobby in future.
theathletic.com - Gregg Evans
They may have gone largely unnoticed amid Liverpool's frantic summer transfer window, but the signings of Harvey Owen, Trey Nyoni and Amara Nallo were just as revealing about the club's long-term strategy as any of the high-profile arrivals.
degruyter.com - Erik-Jan van Kesteren and Tom Bergkamp
Successful performance in Formula One is determined by a combination of the driver's skill and race-car constructor advantage. This makes key performance questions in the sport difficult to answer. For example, who is the best Formula One driver, which is the best constructor, and what is their relative contribution to success? In this paper, we answer these questions based on data from the hybrid era in Formula One (2014–2021 seasons). We present a novel Bayesian multilevel rank-ordered logit regression method to model individual race finishing positions. We show that our modelling approach describes our data well, which allows for precise inferences about driver skill and constructor advantage. We conclude that Hamilton and Verstappen are the best drivers in the hybrid era, the top-three teams (Mercedes, Ferrari, and Red Bull) clearly outperform other constructors, and approximately 88% of the variance in race results is explained by the constructor. We argue that this modelling approach may prove useful for sports beyond Formula One, as it creates performance ratings for independent components contributing to success.
arxiv.org - Silvan Vollmer, David Schoch, Ulrik Brandes
Abstract:We compare conversion rates of association football (soccer) penalties during regulation or extra time with those during shootouts. Our data consists of roughly 50,000 penalties from the eleven most recent seasons in European men's football competitions. About one third of the penalties are from more than 1,500 penalty shootouts. We find that shootout conversion rates are significantly lower, and attribute this to worse performance of shooters rather than better performance of goalkeepers. We also find that, statistically, there is no advantage for either team in the usual alternating shooting order. These main findings are complemented by a number of more detailed analyses.
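The headline comparison (shootout conversion rates versus regulation-time rates) is essentially a two-proportion test. A quick sketch with statsmodels, using made-up counts rather than the paper's data:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts, NOT the paper's data: conversions and attempts
converted = [26000, 11000]   # regulation-time penalties, shootout penalties
attempts  = [34000, 15500]

# H1: regulation-time conversion rate > shootout conversion rate
stat, p_value = proportions_ztest(count=converted, nobs=attempts, alternative="larger")
print(f"z = {stat:.2f}, p = {p_value:.4f}")
print([c / n for c, n in zip(converted, attempts)])  # the two conversion rates
```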
degruyter.com - Eric Eager and Tej Seth
In recent years, the game of football has made a shift towards being more quantitative. With the advent of charting and tracking data, player evaluation can be studied from several different angles. In this paper, we build and refine two novel metrics: Bite Distance Under Expected (BDUE) and Ground Covered Over Expected (GCOE) for the evaluation of linebackers in the National Football League (NFL). Here, we show that these metrics are heavily correlated with each other, which demonstrates the trade-off linebackers have to make between being aggressive against the run and being effective when the opposing offense is using play-action. We also show that these metrics are more stable than those in the public space. Finally, we show how these metrics measure deception by opposing offenses.
oreilly.com - Eric A. Eager, Richard A. Erickson
Baseball is not the only sport to use "moneyball." American football fans, teams, and gamblers are increasingly using data to gain an edge against the competition. Professional and college teams use data to help select players and identify team needs. Fans use data to guide fantasy team picks and strategies. Sports bettors and fantasy football players are using data to help inform decision making. This concise book provides a clear introduction to using statistical models to analyze football data. Whether your goal is to produce a winning team, dominate your fantasy football league, qualify for an entry-level football analyst position, or simply learn R and Python using fun example cases, this book is your starting place. You'll learn how to: apply basic statistical concepts to football datasets; describe football data with quantitative methods; create efficient workflows that offer reproducible results; use data science skills such as web scraping, manipulating data, and plotting data; implement statistical models for football data; link data summaries and model outputs to create reports or presentations using tools such as R Markdown and R Shiny; and more.
arxiv.org - Gabriel Calvo, Carmen Armero, Bernd Grimm, Christophe Ley
Abstract:Wheelchair basketball, regulated by the International Wheelchair Basketball Federation, is a sport designed for individuals with physical disabilities. This paper presents a data-driven tool that effectively determines optimal team line-ups based on past performance data and metrics for player effectiveness. Our proposed methodology involves combining a Bayesian longitudinal model with an integer linear problem to optimise the line-up of a wheelchair basketball team. To illustrate our approach, we use real data from a team competing in the Rollstuhlbasketball Bundesliga, namely the Doneck Dolphins Trier. We consider three distinct performance metrics for each player and incorporate uncertainty from the posterior predictive distribution of the longitudinal model into the optimisation process. The results demonstrate the tool's ability to select the most suitable team compositions and calculate posterior probabilities of compatibility or incompatibility among players on the court.
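The integer-programming half of the approach can be sketched with PuLP: pick five players to maximise a rating subject to a cap on total classification points. The ratings below are hypothetical, and in the paper they would come from the posterior of the Bayesian longitudinal model rather than fixed numbers:

```python
import pulp

# Hypothetical players: (name, posterior-mean rating, classification points)
players = [
    ("A", 0.62, 4.5), ("B", 0.58, 1.0), ("C", 0.55, 3.0), ("D", 0.51, 2.5),
    ("E", 0.47, 1.5), ("F", 0.44, 4.0), ("G", 0.40, 1.0), ("H", 0.35, 2.0),
]
MAX_CLASS_POINTS = 14.0   # on-court cap (IWBF rule; league rules may differ)

prob = pulp.LpProblem("lineup", pulp.LpMaximize)
x = {name: pulp.LpVariable(f"pick_{name}", cat="Binary") for name, _, _ in players}

prob += pulp.lpSum(rating * x[name] for name, rating, _ in players)           # objective
prob += pulp.lpSum(x[name] for name, _, _ in players) == 5                    # five on court
prob += pulp.lpSum(cls * x[name] for name, _, cls in players) <= MAX_CLASS_POINTS

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([name for name, _, _ in players if x[name].value() == 1])
```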
arxiv.org - George Nousias, Konstantinos Delibasis, Ilias Maglogiannis
Estimating the homography matrix between two images has various applications, like image stitching or image mosaicing and spatial information retrieval from multiple camera views, but has proved to be a complicated problem, especially in cases of radically different camera poses and zoom factors. Many relevant approaches have been proposed, utilizing direct feature-based or deep learning methodologies. In this paper, we propose a generalized RANSAC algorithm, H-RANSAC, to retrieve homography image transformations from sets of points without descriptive local feature vectors and point pairing. We allow the points to be optionally labelled in two classes. We propose a robust criterion that rejects implausible point selections before each iteration of RANSAC, based on the type of the quadrilaterals formed by random point pair selection (convex or concave and (non-)self-intersecting). A similar post-hoc criterion that rejects implausible homography transformations is included at the end of each iteration. The expected maximum iterations of H-RANSAC are derived for different probabilities of success, according to the number of points per image and per class, and the percentage of outliers. The proposed methodology is tested on a large dataset of images acquired by 12 cameras during real football matches, where radically different views at each timestamp are to be matched. Comparisons with state-of-the-art implementations of RANSAC combined with classic and deep learning image salient point detection indicate the superiority of the proposed H-RANSAC, in terms of average reprojection error and number of successfully processed pairs of frames, rendering it the method of choice in cases of image homography alignment with few tens of points, while local features are not available, or not descriptive enough. The implementation of H-RANSAC is available in this https URL
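For contrast with the descriptor-free H-RANSAC, the classic baseline it is compared against (local features plus RANSAC homography fitting) takes only a few lines of OpenCV; the image paths below are placeholders:

```python
import cv2
import numpy as np

# Classic baseline the paper compares against: ORB features + RANSAC homography.
img1 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC discards outlier correspondences while fitting the 3x3 homography H.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)
print(H)
print(int(inlier_mask.sum()), "inliers out of", len(matches))
```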
arxiv.org - Fiche Guénolé, Sevestre Vincent, Gonzalez-Barral Camila, Leglaive Simon, Séguier Renaud
Abstract:Technologies play an increasingly important role in sports and have become a real competitive advantage for the athletes who benefit from them. Among them, the use of motion capture is developing in various sports to optimize sporting gestures. Unfortunately, traditional motion capture systems are expensive and constraining. Recently developed computer vision-based approaches also struggle in certain sports, like swimming, due to the aquatic environment. One of the reasons for the gap in performance is the lack of labeled datasets with swimming videos. In an attempt to address this issue, we introduce SwimXYZ, a synthetic dataset of swimming motions and videos. SwimXYZ contains 3.4 million frames annotated with ground truth 2D and 3D joints, as well as 240 sequences of swimming motions in the SMPL parameters format. In addition to making this dataset publicly available, we present use cases for SwimXYZ in swimming stroke clustering and 2D pose estimation.
arxiv.org - Fei Wu, Qingzhong Wang, Jian Bian, Haoyi Xiong, Ning Ding, Feixiang Lu, Jun Cheng, Dejing Dou
Abstract:To understand human behaviors, action recognition based on videos is a common approach. Compared with image-based action recognition, videos provide much more information, reducing the ambiguity of actions. In the last decade, many works focusing on datasets, novel models and learning approaches have improved video action recognition to a higher level. However, there are challenges and unsolved problems, in particular in sports analytics, where data collection and labeling are more sophisticated, requiring sport professionals to annotate data. In addition, the actions can be extremely fast, which makes them difficult to recognize. Moreover, in team sports like football and basketball, one action can involve multiple players, and to correctly recognize it we need to analyse all players, which is relatively complicated. In this paper, we present a survey on video action recognition for sports analytics. We introduce more than ten types of sports, including team sports such as football, basketball, volleyball and hockey, and individual sports such as figure skating, gymnastics, table tennis, tennis, diving and badminton. Then we compare numerous existing frameworks for sports analysis to present the status quo of video action recognition in both team sports and individual sports. Finally, we discuss the challenges and unsolved problems in this area and, to facilitate sports analytics, we develop a toolbox using PaddlePaddle, which supports football, basketball, table tennis and figure skating action recognition.
hudsonthames.org - Ruth du Toit
Our recent reading group examined mean reversion and momentum strategies, drawing insights from the article "Dynamically combining mean reversion and momentum investment strategies" by James Velissaris. The aim of the paper was to create a diversified arbitrage approach that combines mean reversion and momentum to exploit the strengths of both strategies. Mean reversion and momentum strategies have distinct characteristics. Mean reversion strategies centre around stocks reverting to their mean values and capitalising on relative mispricing among stocks. In contrast, momentum strategies focus on stocks that have shown strong recent performance and are expected to continue that trend.
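As a toy illustration (not Velissaris's actual model), a mean-reversion signal and a momentum signal can be computed from a price series and blended into a single score:

```python
import numpy as np
import pandas as pd

def combined_signal(prices: pd.Series, mr_window=20, mom_window=60, w_mr=0.5):
    """Blend a mean-reversion and a momentum signal into one score per day.

    Mean reversion: negative z-score of price vs. its rolling mean (buy dips).
    Momentum: trailing return over a longer window (buy recent winners).
    A toy illustration, not the diversified arbitrage model in the paper.
    """
    rolling_mean = prices.rolling(mr_window).mean()
    rolling_std = prices.rolling(mr_window).std()
    mean_reversion = -(prices - rolling_mean) / rolling_std
    momentum = prices.pct_change(mom_window)
    return w_mr * mean_reversion + (1 - w_mr) * momentum

# Hypothetical price series
prices = pd.Series(100 + np.cumsum(np.random.normal(0, 1, 300)))
print(combined_signal(prices).tail())
```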
manning.com - Quan Nguyen
Bayesian optimization helps pinpoint the best configuration for your machine learning models with speed and accuracy. Put its advanced techniques into practice with this hands-on guide.
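As a flavour of the workflow the book teaches, here is a compact Bayesian-optimisation loop using scikit-optimize's gp_minimize on a stand-in objective; in practice the objective would train and validate a model with the proposed setting:

```python
from skopt import gp_minimize
from skopt.space import Real

def objective(params):
    """Toy expensive function of one hyperparameter; in practice this would
    train a model with the given setting and return a validation loss."""
    (x,) = params
    return (x - 0.3) ** 2 + 0.05 * abs(x)

result = gp_minimize(
    objective,
    dimensions=[Real(-1.0, 1.0, name="x")],
    n_calls=25,          # budget of expensive evaluations
    random_state=0,
)
print(result.x, result.fun)   # best setting found and its objective value
```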
arxiv.org - Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson
Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet.
arxiv.org - Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt
Abstract:Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose GRANDE (GRAdieNt-Based Decision Tree Ensembles), a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which makes it possible to use backpropagation with a straight-through operator to jointly optimize all model parameters. Our method combines axis-aligned splits, which are a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that facilitates learning representations for both simple and complex relations within a single model. We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
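The straight-through trick mentioned in the abstract (hard, axis-aligned splits in the forward pass, soft surrogates in the backward pass) can be sketched generically in PyTorch; this is the standard pattern, not GRANDE's exact parameterisation:

```python
import torch

def straight_through_split(feature_logits, x, thresholds):
    """Hard, axis-aligned split that still passes gradients.

    Forward pass uses a hard one-hot feature choice and a hard threshold
    comparison; backward pass uses the soft (softmax / sigmoid) surrogates.
    Generic straight-through pattern, not GRANDE's exact formulation.
    """
    soft_choice = torch.softmax(feature_logits, dim=-1)            # (n_features,)
    hard_choice = torch.nn.functional.one_hot(
        soft_choice.argmax(dim=-1), soft_choice.shape[-1]
    ).float()
    choice = hard_choice + soft_choice - soft_choice.detach()      # straight-through

    feature_value = (x * choice).sum(dim=-1)                       # selected feature per sample
    threshold = (thresholds * choice).sum(dim=-1)
    soft_decision = torch.sigmoid(feature_value - threshold)
    hard_decision = (feature_value > threshold).float()
    return hard_decision + soft_decision - soft_decision.detach()  # 1.0 = go right

x = torch.randn(8, 5, requires_grad=True)     # batch of 8 samples, 5 features
logits = torch.zeros(5, requires_grad=True)
thresh = torch.zeros(5, requires_grad=True)
out = straight_through_split(logits, x, thresh)
out.sum().backward()                          # gradients flow to logits and thresh
print(out, logits.grad is not None)
```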
arxiv.org - Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, Mingsheng Long
Abstract:The recent boom of linear forecasting models questions the ongoing passion for architectural modifications of Transformer-based forecasters. These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp. However, Transformer is challenged in forecasting series with larger lookback windows due to performance degradation and computation explosion. Besides, the unified embedding for each temporal token fuses multiple variates with potentially unaligned timestamps and distinct physical measurements, which may fail in learning variate-centric representations and result in meaningless attention maps. In this work, we reflect on the competent duties of Transformer components and repurpose the Transformer architecture without any adaptation on the basic components. We propose iTransformer that simply inverts the duties of the attention mechanism and the feed-forward network. Specifically, the time points of individual series are embedded into variate tokens which are utilized by the attention mechanism to capture multivariate correlations; meanwhile, the feed-forward network is applied for each variate token to learn nonlinear representations. The iTransformer model achieves consistent state-of-the-art on several real-world datasets, which further empowers the Transformer family with promoted performance, generalization ability across different variates, and better utilization of arbitrary lookback windows, making it a nice alternative as the fundamental backbone of time series forecasting.
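The inversion itself is simple to sketch: each variate's whole lookback window is projected to a single token, and attention then runs across variates rather than time steps. A minimal PyTorch illustration, not the official implementation:

```python
import torch
import torch.nn as nn

class InvertedEmbedding(nn.Module):
    """Embed each variate's full lookback window as one token (the iTransformer idea).

    Input:  (batch, lookback, n_variates)
    Output: (batch, n_variates, d_model) -- attention then mixes variates, not time steps.
    Minimal sketch of the inversion, not the official iTransformer implementation.
    """
    def __init__(self, lookback: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(lookback, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.transpose(1, 2)          # (batch, n_variates, lookback)
        return self.proj(x)            # one token per variate

x = torch.randn(32, 96, 7)              # 32 series, lookback 96, 7 variates
tokens = InvertedEmbedding(96, 128)(x)
attn = nn.MultiheadAttention(embed_dim=128, num_heads=8, batch_first=True)
out, _ = attn(tokens, tokens, tokens)   # multivariate correlations via attention over variates
print(out.shape)                        # torch.Size([32, 7, 128])
```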
reuters.com - By Anna Tong, Max A. Cherney, Christopher Bing and Stephen Nellis
OpenAI, the company behind ChatGPT, is exploring making its own artificial intelligence chips and has gone as far as evaluating a potential acquisition target, according to people familiar with the company's plans.
semianalysis.com - Dylan Patel and Myron Xie
Nvidia's AI solutions are currently at the top of the world, but disruption is coming. Google has unleashed an unprecedented plan of building out their own AI infrastructure. We exclusively detailed volumes and dollar amounts for Google's TPUv5 and TPUv5e buildout for both internal use in training/inference as well as for external customer usage from firms such as Apple, Anthropic, CharacterAI, MidJourney, Assembly, Gridspace, etc. Google isn't the only rising threat to their dominance in AI infrastructure. On software, Meta's PyTorch 2.0 and OpenAI Triton are barreling forward, allowing other hardware vendors to be enabled.
arxiv.org - Abdelhak Bentaleb, May Lim, Mehmet N. Akcay, Ali C. Begen, Sarra Hammoudi, Roger Zimmermann
Abstract:This survey presents the evolution of live media streaming and the technological developments behind today's IP-based low-latency live streaming systems. Live streaming primarily involves capturing, encoding, packaging and delivering real-time events such as live sports, live news, personal broadcasts and surveillance videos. Live streaming also involves concurrent streaming of linear TV programming off the satellite, cable, over-the-air or IPTV broadcast, where the programming is not necessarily a real-time event. The survey starts with a discussion on the latency and latency continuum in streaming applications. Then, it lays out the existing live streaming workflows and protocols, followed by an in-depth analysis of the latency sources in these workflows and protocols. The survey continues with the technology enablers, low-latency extensions for the popular HTTP adaptive streaming methods and enhancements for robust low-latency playback. An entire section is dedicated to the detailed summary and findings of Twitch's grand challenge on low-latency live streaming. The survey concludes with a discussion of ongoing research problems in this space.
degruyter.com - Shannon K. Gallagher, Kayla Frisoli and Amanda Luby
In tennis, the Australian Open, French Open, Wimbledon, and US Open are the four most prestigious events (Grand Slams). These four Grand Slams differ in the composition of the court surfaces, when they are played in the year, and which city hosts the players. Individual Grand Slams come with different expectations, and it is often thought that some players achieve better results at some Grand Slams than others. It is also thought that differences in results may be attributed, at least partially, to surface type of the courts. For example, Rafael Nadal, Roger Federer, and Serena Williams have achieved their best results on clay, grass, and hard courts, respectively. This paper explores differences among Grand Slams, while adjusting for confounders such as tour, competitor strength, and player attributes. More specifically, we examine the effect of the Grand Slam on player performance for matches from 2013 to 2019. We take two approaches to modeling these data: (1) a mixed-effects model accounting for both player and tournament features and (2) models that emphasize individual performance. We identify differences across the Grand Slams at both the tournament and individual player level.
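The first modelling approach, a mixed-effects model with player and tournament features, might look roughly like this in statsmodels; the data and column names below are synthetic stand-ins, and a linear probability model is used purely for brevity:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for match-level data; column names are made up.
rng = np.random.default_rng(1)
n = 800
matches = pd.DataFrame({
    "won": rng.integers(0, 2, n),
    "slam": rng.choice(["AO", "FO", "W", "USO"], n),
    "tour": rng.choice(["ATP", "WTA"], n),
    "rank_diff": rng.normal(0, 20, n),     # opponent rank minus player rank
    "player": rng.integers(0, 60, n),      # player id for the random intercept
})

# Random intercept per player, fixed effects for slam, tour and rank difference.
# A linear probability model for simplicity; the paper's actual specification differs.
model = smf.mixedlm("won ~ C(slam) + C(tour) + rank_diff",
                    data=matches, groups=matches["player"])
print(model.fit().summary())
```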
degruyter.com - Gregory M. Steeger, Johnathon L. Dulin and Gerardo O. Gonzalez
The Saint Louis Blues were hot at the end of the 2018–2019 National Hockey League season, winning eleven games in a row in January and February, and eight of their last ten. They parlayed this momentum to their first Stanley Cup Championship in franchise history. Or did they? Did the series of wins at the end of the season give the Blues the momentum needed to reach the pinnacle of the sport on June 12th, or was the Blues' path to victory the confluence of a series of random events that fell in their favor? In this paper we apply entropy as an unbiased measure to further refute the idea of momentum in sports. We show that game outcomes are not dependent on previous games' outcomes and conclude that the theory of momentum, across the season, is a fallacy that should not affect behavior.
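The core idea, that momentum would show up as reduced uncertainty about a result once you know the previous result, can be illustrated by comparing unconditional and conditional entropy of a win/loss sequence; this is a toy version of the idea, not the paper's estimator:

```python
import numpy as np
from collections import Counter

def shannon_entropy(probs):
    probs = np.array([p for p in probs if p > 0])
    return -np.sum(probs * np.log2(probs))

def conditional_entropy(outcomes):
    """Entropy of an outcome given the previous game's outcome (W/L sequence).

    If momentum were real, knowing the previous result would reduce uncertainty,
    i.e. conditional entropy would sit clearly below the unconditional entropy.
    """
    pairs = Counter(zip(outcomes[:-1], outcomes[1:]))
    total = sum(pairs.values())
    prev_counts = Counter(outcomes[:-1])
    h = 0.0
    for prev in prev_counts:
        p_prev = prev_counts[prev] / total
        cond = [pairs[(prev, nxt)] / prev_counts[prev] for nxt in set(outcomes)]
        h += p_prev * shannon_entropy(cond)
    return h

season = list(np.random.choice(["W", "L"], size=82, p=[0.55, 0.45]))  # hypothetical season
uncond = shannon_entropy([season.count("W") / len(season), season.count("L") / len(season)])
print(uncond, conditional_entropy(season))
```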
degruyter.com - Ronald Yurko, Francesca Matano, Lee F. Richardson, Nicholas Granered, Taylor Pospisil, Konstantinos Pelechrinis
Continuous-time assessments of game outcomes in sports have become increasingly common in the last decade. In American football, only discrete-time estimates of play value were possible, since the most advanced public football datasets were recorded at the play-by-play level. While measures such as expected points and win probability are useful for evaluating football plays and game situations, there has been no research into how these values change throughout the course of a play. In this work, we make two main contributions. First, we introduce a general framework for continuous-time within-play valuation in the National Football League using player-tracking data. Our framework is modular, consisting of several sub-models, which makes it easy to incorporate recent work involving player tracking data in football. Second, we use a long short-term memory recurrent neural network to construct a ball-carrier model to estimate how many yards the ball-carrier is expected to gain from their current position, conditional on the locations and trajectories of the ball-carrier, their teammates and opponents. Additionally, we demonstrate an extension with conditional density estimation so that the expectation of any measure of play value can be calculated in continuous time, which was never before possible at such a granular level.
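The ball-carrier sub-model is essentially a sequence model over tracking frames that outputs expected yards gained. A bare-bones PyTorch sketch of that architecture type, with made-up feature dimensions:

```python
import torch
import torch.nn as nn

class BallCarrierModel(nn.Module):
    """LSTM over tracking frames -> expected yards gained from the current position.

    Each frame is a feature vector built from the ball-carrier, teammates and
    defenders (positions, speeds, directions). The feature size here is made up;
    this sketches the architecture type described in the paper, not its exact model.
    """
    def __init__(self, n_features: int = 40, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(frames)        # h_n: (1, batch, hidden)
        return self.head(h_n[-1]).squeeze(-1)  # expected yards per play

plays = torch.randn(16, 30, 40)               # 16 plays, 30 frames, 40 features each
print(BallCarrierModel()(plays).shape)         # torch.Size([16])
```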
arxiv.org - Michael J. Lopez
Most historical National Football League (NFL) analysis, both mainstream and academic, has relied on public, play-level data to generate team and player comparisons. Given the number of oft-omitted variables that impact on-field results, such as play call, game situation, and opponent strength, findings tend to be more anecdotal than actionable. With the release of player tracking data, however, analysts can better ask and answer questions to isolate skill and strategy. In this article, we highlight the limitations of traditional analyses, and use a decades-old punching bag for analysts, fourth-down strategy, as a microcosm for why tracking data is needed. Specifically, we assert that, in the absence of using the precise yardage needed for a first down, past findings supporting an aggressive fourth-down strategy may have been overstated. Next, we synthesize recent work that comprises this special Journal of Quantitative Analysis in Sports issue on player tracking data in football. Finally, we conclude with some best practices and limitations regarding usage of this data. The release of player tracking data marks a transition for the league and its analysts, and we hope this issue helps guide innovation in football analytics for years to come.
apple.com
In this episode, I had the pleasure of talking to Stephen Harris, a vastly experienced bettor and bookmaker with a speciality in both greyhounds and jumps racing. Starting out life as a bookmaker at the age of 20 taking bets on dogs, Stephen walks me through his early years on track, through to his time working at Sporting Index with some of the sharpest minds going, before his current role providing content and tips for Betting Expert. Along the way Stephen has learnt plenty about how the betting world works, including who makes it pay and how, whether it be punting on course or online with both bookies and exchanges. It's through the latter that Stephen bets exclusively these days, and he has plenty to share on how the betting markets now operate and the importance of adapting and evolving as a punter to survive. He also has some strong thoughts on the current mess surrounding affordability, the motivations of those at the Gambling Commission, through to how easy it is these days to bet into the black market. We also touch on his work at Betting Expert and his superb Free Value Angle column and its edge and educational approach.
netlify.app - Mine Çetinkaya-Rundel and Johanna Hardin
This is the website for Introduction to Modern Statistics, Second Edition by Mine Çetinkaya-Rundel and Johanna Hardin. Introduction to Modern Statistics, which we'll refer to as IMS going forward, is a textbook from the OpenIntro project.
videolectures.net - Christoph Bergmeir
A random walk through the random forest.
medium.com - Florian Aust
Prepare for a journey into advanced statistical evaluation parameters. The following seven metrics hold the keys to deciphering the intricacies of your data and will empower you to make informed decisions in the world of machine learning and regression analysis: correlation analysis, Chi² contingency analysis, p-value analysis, the Kolmogorov-Smirnov test, the R² coefficient of determination, Explained Variance Score, and Mean Squared Error.
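Most of those quantities are one function call away in scipy and scikit-learn; a quick reference sketch on synthetic data:

```python
import numpy as np
from scipy import stats
from sklearn.metrics import r2_score, explained_variance_score, mean_squared_error

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.5, size=200)

# Correlation analysis and its p-value
r, p_corr = stats.pearsonr(y_true, y_pred)

# Chi-squared contingency test on a (here made-up) 2x2 table
table = np.array([[30, 10], [20, 40]])
chi2, p_chi2, dof, _ = stats.chi2_contingency(table)

# Kolmogorov-Smirnov test: do two samples come from the same distribution?
ks_stat, p_ks = stats.ks_2samp(y_true, y_pred)

print(f"Pearson r={r:.3f} (p={p_corr:.3g}), chi2={chi2:.2f} (p={p_chi2:.3g}), KS p={p_ks:.3g}")
print("R^2:", r2_score(y_true, y_pred))
print("Explained variance:", explained_variance_score(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
```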
kalmanfilter.net - Alex Becker
The Kalman Filter algorithm is a powerful tool for estimating and predicting system states in the presence of uncertainty and is widely used as a fundamental component in applications such as target tracking, navigation, and control. Although the Kalman Filter is a straightforward concept, many resources on the subject require extensive mathematical background and fail to provide practical examples and illustrations, making it more complicated than necessary. Back in 2017, I created an online tutorial based on numerical examples and intuitive explanations to make the topic more accessible and understandable. The online tutorial provides introductory material covering the univariate (one-dimensional) and multivariate (multidimensional) Kalman Filters.
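The univariate case the tutorial starts from fits in a handful of lines: estimate a slowly varying state from noisy readings, blending prediction and measurement by the Kalman gain. A minimal sketch in the spirit of the tutorial's one-dimensional examples (the numbers below are not taken from it):

```python
def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Univariate Kalman filter: track a slowly varying state from noisy readings.

    q: process-noise variance, r: measurement-noise variance,
    x0/p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state assumed (nearly) constant, uncertainty grows by q
        p = p + q
        # Update: blend prediction and measurement by the Kalman gain
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy readings of a true value of 10
readings = [9.7, 10.4, 9.9, 10.8, 9.5, 10.1, 10.2, 9.8]
print(kalman_1d(readings))
```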