expectinggoals.com - Michael Caley
If there were no Expecting Goals team ratings model, people could still get insights from soccer analytics about their favorite teams. And while I have worked hard to optimize this model and find new ways to use the statistical record to evaluate clubs, these are surely marginal gains. I decided to build this ratings system for two reasons. The first is that it is fun. The second is that building a ratings system opens up a variety of possible new studies and new ways of approaching studies.
thetransferflow.com - Neel Shelat
Arsenal are now four points clear at the top of the Premier League and certainly look like the best team in the division.
youtube.com - Michael MacKelvie
We know that players are influenced by their environments, but how do we separate them? How does this vary positionally?
expectinggoals.com - Michael Caley
One decision I made in constructing the Expecting Goals Team Ratings system was not to optimize just for the English Premier League. The training and testing set included equal numbers of seasons from the top divisions in Spain, Italy, Germany and France. It is possible that, at the margins, this decision will make the ratings and projections a little less precise for the Premier League. But it also means that for future studies, these methods have been tested on more data and can be used on a more expansive set of data.
substack.com - Alex Marin Felices
The following summary critically reviews the research paper titled "Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)" by Gabriel Anzer, Kilian Arnsmeyer, Pascal Bauer, Joris Bekkers, Ulf Brefeld, Jesse Davis, Nicolas Evans, Matthias Kempe, Samuel J Robertson, Joshua Wyatt Smith and Jan Van Haaren. All data, figures, and analysis presented here are drawn from their original work; I do not claim any authorship or ownership of the content. This summary has been written to provide a concise and technically informed synthesis of the paper’s findings, methodologies, and implications, while maintaining fidelity to the authors’ intellectual contributions.
substack.com - Ernest Chan
Features are inputs to machine learning algorithms. Sometimes also called independent variables, covariates, or just X, they can be used for supervised or unsupervised learning, or for optimization. For example, at QTS, we use more than 100 of them as inputs to dynamically calibrate the allocation between our Tail Reaper strategy and E-mini S&P 500 futures. In general, modelers have no idea a priori which features are useful, or whether they are redundant, for a particular application. Using all of the features can result in overfitting and poor out-of-sample performance, or worse, numerical instability and singularities during matrix inversion. Hence the need for a process called “feature selection”.
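To make the redundancy problem concrete, here is a minimal sketch of one simple filter: drop any feature that is almost perfectly correlated with one already kept. This is not the method described in the excerpt. The feature names, the toy values, and the 0.95 threshold are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_features(columns, threshold=0.95):
    """Greedy filter: keep a feature only if its absolute correlation
    with every feature kept so far is below `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy features; "momentum_x2" is an exact multiple of "momentum",
# so a design matrix containing both would be rank-deficient.
features = {
    "momentum": [1.0, 2.0, 3.0, 4.0, 5.0],
    "volatility": [2.0, 1.0, 4.0, 3.0, 6.0],
    "momentum_x2": [2.0, 4.0, 6.0, 8.0, 10.0],
}

kept = select_features(features)
print(kept)
```

A pairwise filter like this only catches redundancy between two features at a time. A feature that is a linear combination of several others would pass it and still cause the singularity the excerpt warns about, so in practice it would be a first pass at most.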
