openreview.net - Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel Müller, Frank Hutter
Foundation models for tabular data, like TabPFN, achieve strong performance on small datasets when pre-trained solely on synthetic data. We show that this performance can be significantly boosted by a targeted continued pre-training phase. Specifically, we demonstrate that leveraging a small, curated collection of large, real-world datasets for continued pre-training yields superior downstream predictive accuracy compared to using broader, potentially noisier corpora like CommonCrawl or GitTables. Our resulting model, Real-TabPFN, achieves substantial performance gains on 29 datasets from the OpenML AutoML Benchmark.
openreview.net - Magnus Bühler, Lennart Purucker, Frank Hutter
Tabular foundation models pre-trained on synthetically generated datasets have exhibited strong in-context learning capabilities. While fine-tuning can further enhance predictive performance, overfitting to the training data of a downstream task poses a significant risk in tiny-to-small data regimes. We propose a fine-tuning method that employs synthetically generated fine-tuning data to avoid overfitting and improve generalization performance. We study three variants of data generation methods and empirically demonstrate that they mitigate overfitting and outperform standard fine-tuning approaches across five tiny-to-small real-world datasets. Our data generation methods leverage density estimators and structural causal models, akin to those employed during pre-training, to yield the best performance. Our findings indicate that synthetic data generation, a central element in pre-training, can be successfully adapted to enhance fine-tuning.
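The core idea of generating synthetic fine-tuning data with a density estimator can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: it uses scikit-learn's `KernelDensity` as a stand-in estimator and a toy dataset, whereas the paper studies several generators (including structural causal models) and fine-tunes a tabular foundation model on the sampled data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KernelDensity

# Toy stand-in for a tiny downstream task (40 real rows).
X, y = make_classification(n_samples=40, n_features=5, random_state=0)

# Fit one density estimator per class, then sample synthetic rows from it.
synth_X, synth_y = [], []
for label in np.unique(y):
    kde = KernelDensity(bandwidth=0.5).fit(X[y == label])
    samples = kde.sample(100, random_state=0)
    synth_X.append(samples)
    synth_y.append(np.full(len(samples), label))

synth_X = np.vstack(synth_X)       # (200, 5) synthetic feature matrix
synth_y = np.concatenate(synth_y)  # matching class labels

# Fine-tuning would then run on (synth_X, synth_y) rather than directly
# on the 40 real rows, reducing the risk of overfitting to them.
```

Sampling per class keeps the label distribution intact; the real rows are reserved for evaluating whether the fine-tuned model actually generalizes.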
pythonfootball.com - Python Football Review
For as long as I’ve watched football, I’ve been fascinated by the pundits who sit in match-day studios and announce exactly how the weekend will unfold. But are they actually any good, or are we all just blindly trusting their reputation? It’s time to find out.
argmin.net - Ben Recht
Y is not predictable from X. I call claims of this form “unpredictability arguments.” Papers making unpredictability arguments can get a lot of temporary traction in machine learning discourse. They give fuel to the petty battles inside the community. In our current landscape, they give ammunition for lay critiques of industry. They can even help bring on AI Winters if people take them seriously enough. The problem is that they are much harder to justify as stated.
substack.com - Kieron O’Connor
Like many fans, I have found football radars to be really helpful when assessing the relative qualities of individual players, as they are visually very intuitive and easy to understand at a glance. Therefore, I have long wanted to produce something similar to highlight the financial strengths and weaknesses of football clubs to investors, but have been defeated by various technical limitations (mainly my own). Until now.