-
In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
Authors:
Herilalaina Rakotoarison,
Steven Adriaensen,
Neeratyoy Mallik,
Samir Garibov,
Edward Bergman,
Frank Hutter
Abstract:
With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, strongly relying on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this approach pose challenges for existing methods, requiring retraining or fine-tuning their neural network surrogates online, introducing overhead, instability, and hyper-hyperparameters. In this work, we propose FT-PFN, a novel surrogate for freeze-thaw style BO. FT-PFN is a prior-data fitted network (PFN) that leverages the in-context learning ability of transformers to efficiently and reliably do Bayesian learning curve extrapolation in a single forward pass. Our empirical analysis across three benchmark suites shows that the predictions made by FT-PFN are more accurate and 10-100 times faster than those of the deep Gaussian process and deep ensemble surrogates used in previous work. Furthermore, we show that, when combined with our novel acquisition mechanism (MFPI-random), the resulting in-context freeze-thaw BO method (ifBO) yields new state-of-the-art performance in the same three families of deep learning HPO benchmarks considered in prior work.
Submitted 25 April, 2024;
originally announced April 2024.
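The freeze-thaw idea in the abstract above (spend one unit of training at a time, then decide which configuration to resume) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions: the function name `freeze_thaw`, the `curves` argument, and the naive linear extrapolation used as a scoring rule are all hypothetical stand-ins, and none of this is FT-PFN, MFPI-random, or ifBO.

```python
import math


def freeze_thaw(curves, budget, horizon):
    """Toy freeze-thaw loop (illustrative only).

    curves:  dict mapping a configuration name to a callable epoch -> score,
             standing in for "train this configuration for one more epoch".
    budget:  total number of single-epoch training steps to spend.
    horizon: maximum number of epochs per configuration.
    """
    history = {name: [] for name in curves}

    def score(name):
        obs = history[name]
        if not obs:
            return float("inf")  # try every configuration at least once
        if len(obs) >= horizon:
            return float("-inf")  # fully trained, nothing left to thaw
        # Naive linear extrapolation to the horizon; a real method would use
        # a probabilistic surrogate here. Assumes scores start from 0.
        slope = obs[-1] - (obs[-2] if len(obs) > 1 else 0.0)
        return obs[-1] + slope * (horizon - len(obs))

    for _ in range(budget):
        chosen = max(history, key=score)
        if len(history[chosen]) >= horizon:
            break  # every configuration has reached the horizon
        epoch = len(history[chosen]) + 1
        history[chosen].append(curves[chosen](epoch))  # thaw for one epoch
    return history


# Two synthetic saturating learning curves (hypothetical data).
toy_curves = {
    "a": lambda t: 0.9 * (1 - math.exp(-0.5 * t)),
    "b": lambda t: 0.5 * (1 - math.exp(-0.5 * t)),
}
result = freeze_thaw(toy_curves, budget=6, horizon=5)
```

The point of the sketch is the control flow: configurations are paused and resumed one epoch at a time, so the scoring rule is consulted after every training step, which is why a cheap surrogate matters.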
-
Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks
Authors:
Steven Adriaensen,
Herilalaina Rakotoarison,
Samuel Müller,
Frank Hutter
Abstract:
Learning curve extrapolation aims to predict model performance in later epochs of training, based on the performance in earlier epochs. In this work, we argue that, while the inherent uncertainty in the extrapolation of learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive, and/or (ii) computationally expensive. We describe the first application of prior-data fitted neural networks (PFNs) in this context. A PFN is a transformer, pre-trained on data generated from a prior, to perform approximate Bayesian inference in a single forward pass. We propose LC-PFN, a PFN trained to extrapolate 10 million artificial right-censored learning curves generated from a parametric prior proposed in prior art using MCMC. We demonstrate that LC-PFN can approximate the posterior predictive distribution more accurately than MCMC, while being over 10 000 times faster. We also show that the same LC-PFN achieves competitive performance extrapolating a total of 20 000 real learning curves from four learning curve benchmarks (LCBench, NAS-Bench-201, Taskset, and PD1) that stem from training a wide range of model architectures (MLPs, CNNs, RNNs, and Transformers) on 53 different datasets with varying input modalities (tabular, image, text, and protein data). Finally, we investigate its potential in the context of model selection and find that a simple LC-PFN based predictive early stopping criterion obtains 2 - 6x speed-ups on 45 of these datasets, at virtually no overhead.
Submitted 31 October, 2023;
originally announced October 2023.
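The abstract above mentions a predictive early stopping criterion but does not spell it out. One plausible form, sketched here purely as an assumption, is to stop a run when the estimated probability of its final performance beating the best result so far falls below a threshold. The names `probability_of_improvement` and `should_stop`, the threshold value, and the use of plain samples from a posterior predictive distribution are all hypothetical.

```python
def probability_of_improvement(final_samples, best_so_far):
    """Fraction of predicted final scores that exceed the best score so far.

    final_samples: samples of the final-epoch score drawn from some
                   posterior predictive distribution (higher is better).
    """
    samples = list(final_samples)
    if not samples:
        raise ValueError("need at least one sample")
    return sum(s > best_so_far for s in samples) / len(samples)


def should_stop(final_samples, best_so_far, threshold=0.05):
    """Stop the run if it is unlikely to beat the best score so far."""
    return probability_of_improvement(final_samples, best_so_far) < threshold


# Hypothetical predicted final accuracies for two partially trained runs.
weak_run = [0.50, 0.60, 0.70]
strong_run = [0.95, 0.96, 0.50]
stop_weak = should_stop(weak_run, best_so_far=0.9)
stop_strong = should_stop(strong_run, best_so_far=0.9)
```

Under this rule the first run would be stopped and the second continued; the threshold trades compute savings against the risk of discarding a run that would have improved late.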
-
A note on small weight codewords of projective geometric codes and on the smallest sets of even type
Authors:
Sam Adriaensen
Abstract:
In this paper, we study the codes $\mathcal C_k(n,q)$ arising from the incidence of points and $k$-spaces in $\text{PG}(n,q)$ over the field $\mathbb F_p$, with $q = p^h$, $p$ prime. We classify all codewords of minimum weight of the dual code $\mathcal C_k(n,q)^\perp$ in case $q \in \{4,8\}$. This is equivalent to classifying the smallest sets of even type in $\text{PG}(n,q)$ for $q \in \{4,8\}$. We also provide shorter proofs for some already known results, namely of the best known lower bound on the minimum weight of $\mathcal C_k(n,q)^\perp$ for general values of $q$, and of the classification of all codewords of $\mathcal C_{n-1}(n,q)$ of weight up to $2q^{n-1}$.
Submitted 9 February, 2023;
originally announced February 2023.
-
On additive MDS codes with linear projections
Authors:
Sam Adriaensen,
Simeon Ball
Abstract:
We provide some evidence that a long additive MDS code over a finite field must be equivalent to a linear code. More precisely, let $C$ be an $\mathbb F_q$-linear $(n,q^{hk},n-k+1)_{q^h}$ MDS code over $\mathbb F_{q^h}$. If $k=3$, $h \in \{2,3\}$, $n > \max \{q^{h-1},h q -1\} + 3$, and $C$ has three coordinates from which its projections are equivalent to linear codes, we prove that $C$ itself is equivalent to a linear code. If $k>3$, $n > q+k$, and there are two disjoint subsets of coordinates whose combined size is at most $k-2$ from which the projections of $C$ are equivalent to linear codes, we prove that $C$ is equivalent to a code which is linear over a larger field than $\mathbb F_q$.
Submitted 20 September, 2022;
originally announced September 2022.
-
Automated Dynamic Algorithm Configuration
Authors:
Steven Adriaensen,
André Biedenkapp,
Gresa Shala,
Noor Awad,
Theresa Eimer,
Marius Lindauer,
Frank Hutter
Abstract:
The performance of an algorithm often critically depends on its parameter configuration. While a variety of automated algorithm configuration methods have been proposed to relieve users from the tedious and error-prone task of manually tuning parameters, there is still a lot of untapped potential as the learned configuration is static, i.e., parameter settings remain fixed throughout the run. However, it has been shown that some algorithm parameters are best adjusted dynamically during execution, e.g., to adapt to the current part of the optimization landscape. Thus far, this is most commonly achieved through hand-crafted heuristics. A promising recent alternative is to automatically learn such dynamic parameter adaptation policies from data. In this article, we give the first comprehensive account of this new field of automated dynamic algorithm configuration (DAC), present a series of recent advances, and provide a solid foundation for future research in this field. Specifically, we (i) situate DAC in the broader historical context of AI research; (ii) formalize DAC as a computational problem; (iii) identify the methods used in prior-art to tackle this problem; (iv) conduct empirical case studies for using DAC in evolutionary optimization, AI planning, and machine learning.
Submitted 27 May, 2022;
originally announced May 2022.
-
DACBench: A Benchmark Library for Dynamic Algorithm Configuration
Authors:
Theresa Eimer,
André Biedenkapp,
Maximilian Reimer,
Steven Adriaensen,
Frank Hutter,
Marius Lindauer
Abstract:
Dynamic Algorithm Configuration (DAC) aims to dynamically control a target algorithm's hyperparameters in order to improve its performance. Several theoretical and empirical results have demonstrated the benefits of dynamically controlling hyperparameters in domains like evolutionary computation, AI Planning or deep learning. Replicating these results, as well as studying new methods for DAC, however, is difficult since existing benchmarks are often specialized and incompatible with the same interfaces. To facilitate benchmarking and thus research on DAC, we propose DACBench, a benchmark library that seeks to collect and standardize existing DAC benchmarks from different AI domains, as well as provide a template for new ones. For the design of DACBench, we focused on important desiderata, such as (i) flexibility, (ii) reproducibility, (iii) extensibility and (iv) automatic documentation and visualization. To show the potential, broad applicability and challenges of DAC, we explore how a set of six initial benchmarks compare in several dimensions of difficulty.
Submitted 18 May, 2021;
originally announced May 2021.
-
Metaheuristics "In the Large"
Authors:
Jerry Swan,
Steven Adriaensen,
Alexander E. I. Brownlee,
Kevin Hammond,
Colin G. Johnson,
Ahmed Kheiri,
Faustyna Krawiec,
J. J. Merelo,
Leandro L. Minku,
Ender Özcan,
Gisele L. Pappa,
Pablo García-Sánchez,
Kenneth Sörensen,
Stefan Voß,
Markus Wagner,
David R. White
Abstract:
Following decades of sustained improvement, metaheuristics are one of the great success stories of optimization research. However, in order for research in metaheuristics to avoid fragmentation and a lack of reproducibility, there is a pressing need for stronger scientific and computational infrastructure to support the development, analysis and comparison of new approaches. We argue that, via principled choice of infrastructure support, the field can pursue a higher level of scientific enquiry. We describe our vision and report on progress, showing how the adoption of common protocols for all metaheuristics can help liberate the potential of the field, easing the exploration of the design space of metaheuristics.
Submitted 3 June, 2021; v1 submitted 19 November, 2020;
originally announced November 2020.