Showing 1–50 of 139 results for author: Hutter, F

  1. arXiv:2405.03389  [pdf, other]

    cs.LG cs.AI

    Don't Waste Your Time: Early Stopping Cross-Validation

    Authors: Edward Bergman, Lennart Purucker, Frank Hutter

    Abstract: State-of-the-art automated machine learning systems for tabular data often employ cross-validation; ensuring that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While ensuring better generalization…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at Third International Conference on Automated Machine Learning (AutoML 2024); for code, see https://github.com/automl/DontWasteYourTime-early-stopping

  2. arXiv:2404.16795  [pdf, other]

    cs.LG

    In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization

    Authors: Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Edward Bergman, Frank Hutter

    Abstract: With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, strongly relying on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this ap…

    Submitted 25 April, 2024; originally announced April 2024.

  3. arXiv:2404.16551  [pdf, other]

    cs.LG

    Surprisingly Strong Performance Prediction with Neural Graph Features

    Authors: Gabriela Kadlecová, Jovita Lukasik, Martin Pilát, Petra Vidnerová, Mahmoud Safari, Roman Neruda, Frank Hutter

    Abstract: Performance prediction has been a key part of the neural architecture search (NAS) process, allowing to speed up NAS algorithms by avoiding resource-consuming network training. Although many performance predictors correlate well with ground truth performance, they require training data in the form of trained networks. Recently, zero-cost proxies have been proposed as an efficient method to estimat…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 45 pages, 30 figures

  4. arXiv:2403.01888  [pdf, other]

    cs.AI cs.LG

    Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks

    Authors: Shuhei Watanabe, Neeratyoy Mallik, Edward Bergman, Frank Hutter

    Abstract: While deep learning has celebrated many successes, its results often hinge on the meticulous selection of hyperparameters (HPs). However, the time-consuming nature of deep learning training makes HP optimization (HPO) a costly endeavor, slowing down the development of efficient HPO tools. While zero-cost benchmarks, which provide performance and runtime without actual training, offer a solution fo…

    Submitted 17 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Submitted to AutoML Conference 2024 ABCD Track

  5. arXiv:2402.18213  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-objective Differentiable Neural Architecture Search

    Authors: Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter

    Abstract: Pareto front profiling in multi-objective optimization (MOO), i.e. finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training. Typically, in MOO neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints in…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 31 pages, 22 figures

  6. arXiv:2402.18153  [pdf, other]

    cs.LG cs.AI

    Diffusion-based Neural Network Weights Generation

    Authors: Bedionita Soro, Bruno Andreis, Hayeon Lee, Song Chong, Frank Hutter, Sung Ju Hwang

    Abstract: Transfer learning is a topic of significant interest in recent deep learning research because it enables faster convergence and improved performance on new tasks. While the performance of transfer learning depends on the similarity of the source data to the target data, it is costly to train a model on a large number of datasets. Therefore, pretrained models are generally blindly selected with the…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 14 pages

  7. arXiv:2402.11137  [pdf, other]

    cs.LG

    TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks

    Authors: Benjamin Feuer, Robin Tibor Schirrmeister, Valeriia Cherepanova, Chinmay Hegde, Frank Hutter, Micah Goldblum, Niv Cohen, Colin White

    Abstract: While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adopt…

    Submitted 18 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  8. arXiv:2402.03170  [pdf, other]

    cs.LG

    Is Mamba Capable of In-Context Learning?

    Authors: Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter

    Abstract: State of the art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models…

    Submitted 24 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  9. arXiv:2401.05351  [pdf, other]

    q-bio.BM cs.LG

    Rethinking Performance Measures of RNA Secondary Structure Problems

    Authors: Frederic Runge, Jörg K. H. Franke, Daniel Fertmann, Frank Hutter

    Abstract: Accurate RNA secondary structure prediction is vital for understanding cellular regulation and disease mechanisms. Deep learning (DL) methods have surpassed traditional algorithms by predicting complex features like pseudoknots and multi-interacting base pairs. However, traditional distance measures can hardly deal with such tertiary interactions and the currently used evaluation measures (F1 scor…

    Submitted 4 December, 2023; originally announced January 2024.

    Comments: 12 pages, Accepted at the Machine Learning for Structural Biology Workshop, NeurIPS 2023

  10. arXiv:2312.10440  [pdf, other]

    cs.LG cs.AI

    Weight-Entanglement Meets Gradient-Based Neural Architecture Search

    Authors: Rhea Sanjay Sukthanker, Arjun Krishnakumar, Mahmoud Safari, Frank Hutter

    Abstract: Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architecture spaces significantly faster than traditional blackbox approaches. In parallel, weight entanglement has emerged as a technique for intricate parameter sharing among architectures within macro-level search spaces. %However, the macro structure of such…

    Submitted 16 December, 2023; originally announced December 2023.

  11. arXiv:2311.14645  [pdf, other]

    cs.LG stat.ML

    A General Framework for User-Guided Bayesian Optimization

    Authors: Carl Hvarfner, Frank Hutter, Luigi Nardi

    Abstract: The optimization of expensive-to-evaluate black-box functions is prevalent in various scientific disciplines. Bayesian optimization is an automatic, general and sample-efficient method to solve these problems with minimal knowledge of the underlying function dynamics. However, the ability of Bayesian optimization to incorporate prior knowledge or beliefs about the function at hand in order to acce…

    Submitted 17 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: 18 pages, 11 figures

    Journal ref: 12th International Conference on Learning Representations (ICLR 2024)

  12. arXiv:2311.09058  [pdf, other]

    cs.LG

    Constrained Parameter Regularization

    Authors: Jörg K. H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter

    Abstract: Regularization is a critical component in deep learning training, with weight decay being a commonly used approach. It applies a constant penalty coefficient uniformly across all parameters. This may be unnecessarily restrictive for some parameters, while insufficiently restricting others. To dynamically adjust penalty coefficients for different parameter groups, we present constrained parameter r…

    Submitted 6 December, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

  13. arXiv:2310.20447  [pdf, other]

    cs.LG cs.AI stat.ML

    Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks

    Authors: Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, Frank Hutter

    Abstract: Learning curve extrapolation aims to predict model performance in later epochs of training, based on the performance in earlier epochs. In this work, we argue that, while the inherent uncertainty in the extrapolation of learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive, and/or (ii) computationally expensive. We describe the first application of prior-data fi…

    Submitted 31 October, 2023; originally announced October 2023.

  14. arXiv:2310.17688  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Managing AI Risks in an Era of Rapid Progress

    Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

    Abstract: In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose urgent priorities for AI R&D and governance.

    Submitted 12 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

  15. arXiv:2310.03940  [pdf, other]

    cs.CV cs.AI

    Hard View Selection for Self-Supervised Learning

    Authors: Fabio Ferreira, Ivo Rapant, Frank Hutter

    Abstract: Many Self-Supervised Learning (SSL) methods train their models to be invariant to different "views" of an image input for which a good data augmentation pipeline is crucial. While considerable efforts were directed towards improving pre-text tasks, architectures, or robustness (e.g., Siamese networks or teacher-softmax centering), the majority of these methods remain strongly reliant on the random…

    Submitted 31 December, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

  16. arXiv:2307.10073  [pdf, other]

    cs.LG q-bio.BM

    Scalable Deep Learning for RNA Secondary Structure Prediction

    Authors: Jörg K. H. Franke, Frederic Runge, Frank Hutter

    Abstract: The field of RNA secondary structure prediction has made significant progress with the adoption of deep learning techniques. In this work, we present the RNAformer, a lean deep learning model using axial attention and recycling in the latent space. We gain performance improvements by designing the architecture for modeling the adjacency matrix directly in the latent space and by scaling the size o…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted at the 2023 ICML Workshop on Computational Biology. Honolulu, Hawaii, USA, 2023

  17. arXiv:2307.08801  [pdf, other]

    cs.LG q-bio.GN

    Towards Automated Design of Riboswitches

    Authors: Frederic Runge, Jörg K. H. Franke, Frank Hutter

    Abstract: Experimental screening and selection pipelines for the discovery of novel riboswitches are expensive, time-consuming, and inefficient. Using computational methods to reduce the number of candidates for the screen could drastically decrease these costs. However, existing computational approaches do not fully satisfy all requirements for the design of such initial screening libraries. In this work,…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 9 pages, Accepted at the 2023 ICML Workshop on Computational Biology

  18. arXiv:2306.12370  [pdf, other]

    cs.LG

    PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning

    Authors: Neeratyoy Mallik, Edward Bergman, Carl Hvarfner, Danny Stoll, Maciej Janowski, Marius Lindauer, Luigi Nardi, Frank Hutter

    Abstract: Hyperparameters of Deep Learning (DL) pipelines are crucial for their downstream performance. While a large number of methods for Hyperparameter Optimization (HPO) have been developed, their incurred costs are often untenable for modern DL. Consequently, manual experimentation is still the most prevalent approach to optimize hyperparameters, relying on the researcher's intuition, domain knowledge,…

    Submitted 15 November, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  19. arXiv:2306.03828  [pdf, other]

    cs.LG

    Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

    Authors: Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter, Josif Grabocka

    Abstract: With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with which pretrained model to use, and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained mode…

    Submitted 22 February, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

  20. arXiv:2305.17535  [pdf, other]

    cs.LG stat.ML

    PFNs4BO: In-Context Learning for Bayesian Optimization

    Authors: Samuel Müller, Matthias Feurer, Noah Hollmann, Frank Hutter

    Abstract: In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) through in-context learning on any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to…

    Submitted 22 July, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: In: Proceedings of the 40th International Conference on Machine Learning (ICML'23), PMLR 202:25444-25470, 2023

  21. arXiv:2305.04502  [pdf, other]

    cs.LG cs.NE

    MO-DEHB: Evolutionary-based Hyperband for Multi-Objective Optimization

    Authors: Noor Awad, Ayushi Sharma, Philipp Muller, Janek Thomas, Frank Hutter

    Abstract: Hyperparameter optimization (HPO) is a powerful technique for automating the tuning of machine learning (ML) models. However, in many real-world applications, accuracy is only one of multiple performance criteria that must be considered. Optimizing these objectives simultaneously on a complex and diverse search space remains a challenging task. In this paper, we propose MO-DEHB, an effective and f…

    Submitted 11 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

  22. arXiv:2305.03403  [pdf, other]

    cs.AI cs.LG

    Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

    Authors: Noah Hollmann, Samuel Müller, Frank Hutter

    Abstract: As the field of automated machine learning (AutoML) advances, it becomes increasingly important to incorporate domain knowledge into these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that utilizes an LLM to itera…

    Submitted 28 September, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

  23. arXiv:2304.11005  [pdf, other]

    cs.LG stat.ML

    Self-Correcting Bayesian Optimization through Bayesian Active Learning

    Authors: Carl Hvarfner, Erik Hellsten, Frank Hutter, Luigi Nardi

    Abstract: Gaussian processes are the model of choice in Bayesian optimization and active learning. Yet, they are highly dependent on cleverly chosen hyperparameters to reach their full potential, and little effort is devoted to finding good hyperparameters in the literature. We demonstrate the impact of selecting good hyperparameters for GPs and present two acquisition functions that explicitly prioritize h…

    Submitted 15 February, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

    Journal ref: 37th International Conference on Neural Information Processing Systems (NeurIPS 2023)

  24. arXiv:2304.10255  [pdf, other]

    cs.LG stat.ML

    PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

    Authors: Shuhei Watanabe, Archit Bansal, Frank Hutter

    Abstract: The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA).…

    Submitted 26 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Accepted by IJCAI2023

  25. Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML

    Authors: Hilde Weerts, Florian Pfisterer, Matthias Feurer, Katharina Eggensperger, Edward Bergman, Noor Awad, Joaquin Vanschoren, Mykola Pechenizkiy, Bernd Bischl, Frank Hutter

    Abstract: The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to p…

    Submitted 20 February, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 79 (2024) 639-677

  26. arXiv:2301.08727  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Architecture Search: Insights from 1000 Papers

    Authors: Colin White, Mahmoud Safari, Rhea Sukthanker, Binxin Ru, Thomas Elsken, Arber Zela, Debadeepta Dey, Frank Hutter

    Abstract: In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural ar…

    Submitted 25 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

  27. arXiv:2212.06751  [pdf, other]

    cs.LG cs.AI

    Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator

    Authors: Shuhei Watanabe, Noor Awad, Masaki Onishi, Frank Hutter

    Abstract: Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body o…

    Submitted 31 May, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted to IJCAI 2023

  28. Mind the Gap: Measuring Generalization Performance Across Multiple Objectives

    Authors: Matthias Feurer, Katharina Eggensperger, Edward Bergman, Florian Pfisterer, Bernd Bischl, Frank Hutter

    Abstract: Modern machine learning models are often constructed taking into account multiple objectives, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. In practice, we also want to measure generalization when moving from th…

    Submitted 9 February, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  29. arXiv:2211.14411  [pdf, other]

    cs.LG cs.AI

    c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

    Authors: Shuhei Watanabe, Frank Hutter

    Abstract: Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms and real-world applications often impose some constraints, such as memory usage, or latency on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely-used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle t…

    Submitted 26 May, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted to IJCAI 2023

  30. arXiv:2211.01842  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

    Authors: Simon Schrodi, Danny Stoll, Binxin Ru, Rhea Sukthanker, Thomas Brox, Frank Hutter

    Abstract: The discovery of neural architectures from simple building blocks is a long-standing goal of Neural Architecture Search (NAS). Hierarchical search spaces are a promising step towards this goal but lack a unifying search space design framework and typically only search over some limited aspect of architectures. In this work, we introduce a unifying search space design framework based on context-fre…

    Submitted 8 December, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2023

  31. arXiv:2210.09943  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition

    Authors: Samuel Dooley, Rhea Sanjay Sukthanker, John P. Dickerson, Colin White, Frank Hutter, Micah Goldblum

    Abstract: Face recognition systems are widely deployed in safety-critical applications, including law enforcement, yet they exhibit bias across a range of socio-demographic dimensions, such as gender and race. Conventional wisdom dictates that model biases arise from biased training data. As a consequence, previous works on bias mitigation largely focused on pre-processing the training data, adding penaltie…

    Submitted 6 December, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

  32. arXiv:2210.03230  [pdf, other]

    cs.LG cs.AI stat.ML

    NAS-Bench-Suite-Zero: Accelerating Research on Zero Cost Proxies

    Authors: Arjun Krishnakumar, Colin White, Arber Zela, Renbo Tu, Mahmoud Safari, Frank Hutter

    Abstract: Zero-cost proxies (ZC proxies) are a recent architecture performance prediction technique aiming to significantly speed up algorithms for neural architecture search (NAS). Recent work has shown that these techniques show great promise, but certain aspects, such as evaluating and exploiting their complementary strengths, are under-studied. In this work, we create NAS-Bench-Suite: we evaluate 13 ZC…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: NeurIPS Datasets and Benchmarks Track 2022

  33. arXiv:2209.11693  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    T3VIP: Transformation-based 3D Video Prediction

    Authors: Iman Nematollahi, Erick Rosete-Beas, Seyed Mahdi B. Azad, Raghu Rajan, Frank Hutter, Wolfram Burgard

    Abstract: For autonomous skill acquisition, robots have to learn about the physical rules governing the 3D world dynamics from their own past experience to predict and reason about plausible future outcomes. To this end, we propose a transformation-based 3D video prediction (T3VIP) approach that explicitly models the 3D motion by decomposing a scene into its object parts and predicting their corresponding r…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  34. arXiv:2207.07875  [pdf, other]

    cs.LG cs.AI cs.CV

    On the Importance of Hyperparameters and Data Augmentation for Self-Supervised Learning

    Authors: Diane Wagner, Fabio Ferreira, Danny Stoll, Robin Tibor Schirrmeister, Samuel Müller, Frank Hutter

    Abstract: Self-Supervised Learning (SSL) has become a very active area of Deep Learning research where it is heavily used as a pre-training method for classification and other tasks. However, the rapid pace of advancements in this area comes at a price: training pipelines vary significantly across papers, which presents a potentially crucial confounding factor. Here, we show that, indeed, the choice of hype…

    Submitted 16 July, 2022; originally announced July 2022.

    Comments: Accepted at the ICML 2022 Pre-training Workshop

  35. arXiv:2207.01848  [pdf, other]

    cs.LG stat.ML

    TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

    Authors: Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter

    Abstract: We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. TabPFN performs in-context learning (ICL), it learns to make predictions using sequences of labeled examples (x, f(x)) given in the input, without requiring further parameter…

    Submitted 16 September, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

  36. arXiv:2206.08476  [pdf, other]

    cs.LG cs.AI cs.CV

    Zero-Shot AutoML with Pretrained Models

    Authors: Ekrem Öztürk, Fabio Ferreira, Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka, Frank Hutter

    Abstract: Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune to D, and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small? Here, we extend automated machine learning (AutoML) to best make these choices. Our domain-independent meta-learning approach learns a zero-shot surrogate model which, at test time, allows to sel…

    Submitted 25 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Journal ref: International Conference on Machine Learning 2022

  37. arXiv:2206.08138  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    Lessons learned from the NeurIPS 2021 MetaDL challenge: Backbone fine-tuning without episodic meta-learning dominates for few-shot learning image classification

    Authors: Adrian El Baz, Ihsan Ullah, Edesio Alcobaça, André C. P. L. F. Carvalho, Hong Chen, Fabio Ferreira, Henry Gouk, Chaoyu Guan, Isabelle Guyon, Timothy Hospedales, Shell Hu, Mike Huisman, Frank Hutter, Zhengying Liu, Felix Mohr, Ekrem Öztürk, Jan N. van Rijn, Haozhe Sun, Xin Wang, Wenwu Zhu

    Abstract: Although deep neural networks are capable of achieving performance superior to humans on various tasks, they are notorious for requiring large amounts of data and computing resources, restricting their success to domains where such resources are available. Metalearning methods can address this problem by transferring knowledge from related tasks, thus reducing the amount of data and computing reso…

    Submitted 11 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: version 2 is the correct version, including supplementary material at the end

    Journal ref: NeurIPS 2021 Competition and Demonstration Track, Dec 2021, On-line, United States

  38. arXiv:2206.04771  [pdf, other]

    cs.LG stat.ML

    Joint Entropy Search for Maximally-Informed Bayesian Optimization

    Authors: Carl Hvarfner, Frank Hutter, Luigi Nardi

    Abstract: Information-theoretic Bayesian optimization techniques have become popular for optimizing expensive-to-evaluate black-box functions due to their non-myopic qualities. Entropy Search and Predictive Entropy Search both consider the entropy over the optimum in the input space, while the recent Max-value Entropy Search considers the entropy over the optimal value in the output space. We propose Joint…

    Submitted 14 January, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 10 pages, 8 figures

  39. arXiv:2206.03493  [pdf, other]

    cs.LG

    DeepCAVE: An Interactive Analysis Tool for Automated Machine Learning

    Authors: René Sass, Eddie Bergman, André Biedenkapp, Frank Hutter, Marius Lindauer

    Abstract: Automated Machine Learning (AutoML) is used more than ever before to support users in determining efficient hyperparameters, neural architectures, or even full machine learning pipelines. However, users tend to mistrust the optimization process and its results due to a lack of transparency, making manual tuning still widespread. We introduce DeepCAVE, an interactive framework to analyze and monito…

    Submitted 11 July, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: Workshop on Adaptive Experimental Design and Active Learning in the Real World (ReALML@ICML'22)

  40. arXiv:2205.13927  [pdf, other]

    cs.LG q-bio.BM

    Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

    Authors: Jörg K. H. Franke, Frederic Runge, Frank Hutter

    Abstract: Our world is ambiguous and this is reflected in the data we use to train our algorithms. This is particularly true when we try to model natural processes where collected data is affected by noisy measurements and differences in measurement techniques. Sometimes, the process itself is ambiguous, such as in the case of RNA folding, where the same nucleotide sequence can fold into different structure…

    Submitted 14 November, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: 38 pages, Accepted at 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  41. arXiv:2205.13881  [pdf, other]

    cs.AI cs.LG cs.NE

    Automated Dynamic Algorithm Configuration

    Authors: Steven Adriaensen, André Biedenkapp, Gresa Shala, Noor Awad, Theresa Eimer, Marius Lindauer, Frank Hutter

    Abstract: The performance of an algorithm often critically depends on its parameter configuration. While a variety of automated algorithm configuration methods have been proposed to relieve users from the tedious and error-prone task of manually tuning parameters, there is still a lot of untapped potential as the learned configuration is static, i.e., parameter settings remain fixed throughout the run. Howe…

    Submitted 27 May, 2022; originally announced May 2022.

  42. arXiv:2205.05511  [pdf, other]

    cs.LG

    Efficient Automated Deep Learning for Time Series Forecasting

    Authors: Difan Deng, Florian Karl, Frank Hutter, Bernd Bischl, Marius Lindauer

    Abstract: Recent years have witnessed tremendously improved efficiency of Automated Machine Learning (AutoML), especially Automated Deep Learning (AutoDL) systems, but recent work focuses on tabular, image, or NLP tasks. So far, little attention has been paid to general AutoDL frameworks for time series forecasting, despite the enormous success in applying different novel architectures to such tasks. In thi…

    Submitted 22 July, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

  43. arXiv:2204.11051  [pdf, other]

    cs.LG stat.ML

    $π$BO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization

    Authors: Carl Hvarfner, Danny Stoll, Artur Souza, Marius Lindauer, Frank Hutter, Luigi Nardi

    Abstract: Bayesian optimization (BO) has become an established framework and popular tool for hyperparameter optimization (HPO) of machine learning (ML) algorithms. While known for its sample-efficiency, vanilla BO cannot utilize readily available prior beliefs the practitioner has on the potential location of the optimum. Thus, BO disregards a valuable source of information, reducing its appeal to ML prac…

    Submitted 23 April, 2022; originally announced April 2022.

    Comments: 9 pages, 4 figures, Accepted as poster for ICLR 2022

  44. arXiv:2203.01717  [pdf, other]

    cs.LG

    Practitioner Motives to Select Hyperparameter Optimization Methods

    Authors: Niklas Hasebrook, Felix Morsbach, Niclas Kannengießer, Marc Zöller, Jörg Franke, Marius Lindauer, Frank Hutter, Ali Sunyaev

    Abstract: Advanced programmatic hyperparameter optimization (HPO) methods, such as Bayesian optimization, have high sample efficiency in reproducibly finding optimal hyperparameter values of machine learning (ML) models. Yet, ML practitioners often apply less sample-efficient HPO methods, such as grid search, which often results in under-optimized ML models. As a reason for this behavior, we suspect practit…

    Submitted 26 June, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: submitted to JMLR; currently under review

  45. arXiv:2202.07242  [pdf, other]

    cs.CV cs.LG

    Neural Architecture Search for Dense Prediction Tasks in Computer Vision

    Authors: Thomas Elsken, Arber Zela, Jan Hendrik Metzen, Benedikt Staffler, Thomas Brox, Abhinav Valada, Frank Hutter

    Abstract: The success of deep learning in recent years has led to a rising demand for neural network architecture engineering. As a consequence, neural architecture search (NAS), which aims at automatically designing neural network architectures in a data-driven manner rather than manually, has evolved as a popular field of research. With the advent of weight sharing strategies across architectures, NAS ha…

    Submitted 15 February, 2022; originally announced February 2022.

  46. arXiv:2202.04500  [pdf, other]

    cs.LG

    Contextualize Me -- The Case for Context in Reinforcement Learning

    Authors: Carolin Benjamins, Theresa Eimer, Frederik Schubert, Aditya Mohan, Sebastian Döhler, André Biedenkapp, Bodo Rosenhahn, Frank Hutter, Marius Lindauer

    Abstract: While Reinforcement Learning (RL) has made great strides towards solving increasingly complicated problems, many algorithms are still brittle to even slight environmental changes. Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner, thereby enabling flexible, precise and interpretable task specification and generation. Our goal is to show how…

    Submitted 2 June, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.02102

  47. arXiv:2202.03259  [pdf, other]

    cs.NE cs.LG

    Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration

    Authors: André Biedenkapp, Nguyen Dang, Martin S. Krejca, Frank Hutter, Carola Doerr

    Abstract: It has long been observed that the performance of evolutionary algorithms and other randomized search heuristics can benefit from a non-static choice of the parameters that steer their optimization behavior. Mechanisms that identify suitable configurations on the fly ("parameter control") or via a dedicated training process ("dynamic algorithm configuration") are therefore an important component o…

    Submitted 15 April, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

  48. arXiv:2202.02790  [pdf, other]

    cs.LG cs.AI

    Learning Synthetic Environments and Reward Networks for Reinforcement Learning

    Authors: Fabio Ferreira, Thomas Nierhoff, Andreas Saelinger, Frank Hutter

    Abstract: We introduce Synthetic Environments (SEs) and Reward Networks (RNs), represented by neural networks, as proxy environment models for training Reinforcement Learning (RL) agents. We show that an agent, after being trained exclusively on the SE, is able to solve the corresponding real environment. While an SE acts as a full proxy to a real environment by learning about its state dynamics and rewards…

    Submitted 6 February, 2022; originally announced February 2022.

    Journal ref: International Conference on Learning Representations (ICLR 2022)

  49. arXiv:2201.13396  [pdf, other]

    cs.LG cs.AI stat.ML

    NAS-Bench-Suite: NAS Evaluation is (Now) Surprisingly Easy

    Authors: Yash Mehta, Colin White, Arber Zela, Arjun Krishnakumar, Guri Zabergja, Shakiba Moradian, Mahmoud Safari, Kaicheng Yu, Frank Hutter

    Abstract: The release of tabular benchmarks, such as NAS-Bench-101 and NAS-Bench-201, has significantly lowered the computational overhead for conducting scientific research in neural architecture search (NAS). Although they have been widely adopted and used to tune real-world NAS algorithms, these benchmarks are limited to small search spaces and focus solely on image classification. Recently, several new…

    Submitted 11 February, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

  50. Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

    Authors: Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer

    Abstract: The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems,…

    Submitted 2 June, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: Published in JAIR. Co-first authors and co-last authors are listed in alphabetical order

    MSC Class: 68T01 ACM Class: I.2.6

    Journal ref: Journal of Artificial Intelligence Research 74 (2022) 517-568