
Showing 1–50 of 177 results for author: Hutter, F

Searching in archive cs.
  1. arXiv:2506.21387  [pdf, ps, other]

    cs.LG

    Early Stopping Tabular In-Context Learning

    Authors: Jaris Küken, Lennart Purucker, Frank Hutter

    Abstract: Tabular foundation models have shown strong performance across various tabular learning tasks via in-context learning, offering robust generalization without any downstream finetuning. However, their inference-time costs remain high, particularly for larger datasets. To address this, we propose early-stopping the in-context learning process. We achieve this by dynamically evaluating whether to sto…

    Submitted 28 June, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

    Comments: ICML Workshop Paper

  2. arXiv:2506.16791  [pdf, ps, other]

    cs.LG cs.AI

    TabArena: A Living Benchmark for Machine Learning on Tabular Data

    Authors: Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, Frank Hutter

    Abstract: With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular b…

    Submitted 25 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

    Comments: v2: fixed author list. 51 pages. Code available at https://tabarena.ai/code; examples at https://tabarena.ai/code-examples; dataset curation at https://tabarena.ai/data-tabular-ml-iid-study and https://tabarena.ai/dataset-curation

  3. arXiv:2506.07049  [pdf, ps, other]

    cs.LG cs.CY

    FairPFN: A Tabular Foundation Model for Causal Fairness

    Authors: Jake Robertson, Noah Hollmann, Samuel Müller, Noor Awad, Frank Hutter

    Abstract: Machine learning (ML) systems are utilized in critical sectors, such as healthcare, law enforcement, and finance. However, these systems are often trained on historical data that contains demographic biases, leading to ML decisions that perpetuate or exacerbate existing social inequalities. Causal fairness provides a transparent, human-in-the-loop framework to mitigate algorithmic discrimination,…

    Submitted 8 June, 2025; originally announced June 2025.

  4. arXiv:2506.06143  [pdf, ps, other]

    cs.LG

    carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks

    Authors: Carolin Benjamins, Helena Graf, Sarah Segel, Difan Deng, Tim Ruhkopf, Leona Hennig, Soham Basu, Neeratyoy Mallik, Edward Bergman, Deyao Chen, François Clément, Matthias Feurer, Katharina Eggensperger, Frank Hutter, Carola Doerr, Marius Lindauer

    Abstract: Hyperparameter Optimization (HPO) is crucial to develop well-performing machine learning models. In order to ease prototyping and benchmarking of HPO methods, we propose carps, a benchmark framework for Comprehensive Automated Research Performance Studies allowing to evaluate N optimizers on M benchmark tasks. In this first release of carps, we focus on the four most important types of HPO task ty…

    Submitted 6 June, 2025; originally announced June 2025.

  5. arXiv:2506.06039  [pdf, ps, other]

    cs.LG

    Do-PFN: In-Context Learning for Causal Effect Estimation

    Authors: Jake Robertson, Arik Reuter, Siyuan Guo, Noah Hollmann, Frank Hutter, Bernhard Schölkopf

    Abstract: Estimation of causal effects is critical to a range of scientific disciplines. Existing methods for this task either require interventional data, knowledge about the ground truth causal graph, or rely on assumptions such as unconfoundedness, restricting their applicability in real-world settings. In the domain of tabular machine learning, Prior-data fitted networks (PFNs) have achieved state-of-th…

    Submitted 6 June, 2025; originally announced June 2025.

  6. arXiv:2505.23947  [pdf, ps, other]

    cs.LG cs.AI

    Position: The Future of Bayesian Prediction Is Prior-Fitted

    Authors: Samuel Müller, Arik Reuter, Noah Hollmann, David Rügamer, Frank Hutter

    Abstract: Training neural networks on randomly generated artificial datasets yields Bayesian models that capture the prior defined by the dataset-generating distribution. Prior-data Fitted Networks (PFNs) are a class of methods designed to leverage this insight. In an era of rapidly increasing computational resources for pre-training and a near stagnation in the generation of new real-world data in many app…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted as position paper at ICML 2025

  7. arXiv:2505.23032  [pdf, ps, other]

    cs.LG cs.AI

    Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks

    Authors: Dongwoo Lee, Dong Bok Lee, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Frank Hutter, Seon Joo Kim, Hae Beom Lee

    Abstract: Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow the power-law and proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications invol…

    Submitted 15 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025

  8. arXiv:2505.22014  [pdf, ps, other]

    cs.LG

    Learning in Compact Spaces with Approximately Normalized Transformers

    Authors: Jörg K. H. Franke, Urs Spiegelhalter, Marianna Nezhurina, Jenia Jitsev, Frank Hutter, Michael Hefenbrock

    Abstract: In deep learning, regularization and normalization are common solutions for challenges such as overfitting, numerical instabilities, and the increasing variance in the residual stream. An alternative approach is to force all parameters and representations to lie on a hypersphere. This removes the need for regularization and increases convergence speed, but comes with additional costs. In this work…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Preprint

  9. arXiv:2505.21372  [pdf, ps, other]

    cs.LG cs.AI

    Improving LLM-based Global Optimization with Search Space Partitioning

    Authors: Andrej Schwanke, Lyubomir Ivanov, David Salinas, Fabio Ferreira, Aaron Klein, Frank Hutter, Arber Zela

    Abstract: Large Language Models (LLMs) have recently emerged as effective surrogate models and candidate generators within global optimization frameworks for expensive blackbox functions. Despite promising results, LLM-based methods often struggle in high-dimensional search spaces or when lacking domain-specific priors, leading to sparse or uninformative suggestions. To overcome these limitations, we propos…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 25 pages, 10 figures, 3 tables

  10. arXiv:2504.10735  [pdf, other]

    cs.LG cs.AI

    Frozen Layers: Memory-efficient Many-fidelity Hyperparameter Optimization

    Authors: Timur Carstensen, Neeratyoy Mallik, Frank Hutter, Martin Rapp

    Abstract: As model sizes grow, finding efficient and cost-effective hyperparameter optimization (HPO) methods becomes increasingly crucial for deep learning pipelines. While multi-fidelity HPO (MF-HPO) trades off computational resources required for DL training with lower fidelity estimations, existing fidelity sources often fail under lower compute and memory constraints. We propose a novel fidelity source…

    Submitted 17 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  11. arXiv:2504.04945  [pdf, other]

    cs.LG cs.AI cs.CL

    A Llama walks into the 'Bar': Efficient Supervised Fine-Tuning for Legal Reasoning in the Multi-state Bar Exam

    Authors: Rean Fernandes, André Biedenkapp, Frank Hutter, Noor Awad

    Abstract: Legal reasoning tasks present unique challenges for large language models (LLMs) due to the complexity of domain-specific knowledge and reasoning processes. This paper investigates how effectively smaller language models (Llama 2 7B and Llama 3 8B) can be fine-tuned with a limited dataset of 1,514 Multi-state Bar Examination (MBE) questions to improve legal question answering accuracy. We evaluate…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: COLM 2025 preprint, 9 pages, 3 figures, 16 appendix pages

    ACM Class: I.2.7; I.2.1

  12. arXiv:2503.09159  [pdf, other]

    cs.LG

    Unreflected Use of Tabular Data Repositories Can Undermine Research Quality

    Authors: Andrej Tschalzev, Lennart Purucker, Stefan Lüdtke, Frank Hutter, Christian Bartelt, Heiner Stuckenschmidt

    Abstract: Data repositories have accumulated a large number of tabular datasets from various domains. Machine Learning researchers are actively using these datasets to evaluate novel approaches. Consequently, data repositories have an important standing in tabular data research. They not only host datasets but also provide information on how to use them in supervised learning tasks. In this paper, we argue…

    Submitted 12 March, 2025; originally announced March 2025.

  13. arXiv:2502.10297  [pdf, ps, other]

    cs.LG cs.CL cs.FL

    DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

    Authors: Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, Riccardo Grazzi

    Abstract: Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling, offering efficient training and linear-time inference. However, existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. Diagonal matrices, used in models such as Mamba, GLA, or m…

    Submitted 19 June, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: v5: Characterization of DeltaProduct's state-tracking ability. Analysis of hidden state's effective rank. Improved scaling analysis. v6: Added analysis for products of RWKV-7 matrices

  14. arXiv:2502.06684  [pdf, ps, other]

    cs.LG cs.AI

    EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks

    Authors: Michael Arbel, David Salinas, Frank Hutter

    Abstract: Recent foundational models for tabular data, such as TabPFN, excel at adapting to new tasks via in-context learning, but remain constrained to a fixed, pre-defined number of target dimensions, often necessitating costly ensembling strategies. We trace this constraint to a deeper architectural shortcoming: these models lack target equivariance, so that permuting target dimension orderings alters the…

    Submitted 3 July, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  15. arXiv:2502.03654  [pdf, other]

    cs.LG cs.AI cs.CV

    Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics

    Authors: Indrashis Das, Mahmoud Safari, Steven Adriaensen, Frank Hutter

    Abstract: Activation functions are fundamental elements of deep learning architectures as they significantly influence training dynamics. ReLU, while widely used, is prone to the dying neuron problem, which has been mitigated by variants such as LeakyReLU, PReLU, and ELU that better handle negative neuron outputs. Recently, self-gated activations like GELU and Swish have emerged as state-of-the-art alternat…

    Submitted 21 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 8 pages, excluding references and appendix; v2: slight improvement in presentation. Equation (4) added, with proof in Appendix A. Appendices B (Flipped Mish) and I (Machine Translation) added. Figure 9 added to Appendix C. Appendix D extended with Heatmaps 12 and 13

  16. arXiv:2502.02672  [pdf, other]

    cs.CL cs.LG

    Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes

    Authors: Mayuka Jayawardhana, Renbo, Samuel Dooley, Valeriia Cherepanova, Andrew Gordon Wilson, Frank Hutter, Colin White, Tom Goldstein, Micah Goldblum

    Abstract: Large language models (LLMs) perform remarkably well on tabular datasets in zero- and few-shot settings, since they can extract meaning from natural language column headers that describe features and labels. Similarly, TabPFN, a recent non-LLM transformer pretrained on numerous tables for in-context learning, has demonstrated excellent performance for dataset sizes up to a thousand samples. In con…

    Submitted 5 February, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

    Comments: 12 pages, 6 figures

    ACM Class: I.2.m; I.2.6; I.2.7

  17. arXiv:2501.17178  [pdf, other]

    cs.CL cs.AI cs.LG

    Tuning LLM Judge Design Decisions for 1/1000 of the Cost

    Authors: David Salinas, Omar Swelam, Frank Hutter

    Abstract: Evaluating Large Language Models (LLMs) often requires costly human annotations. To address this, LLM-based judges have been proposed, which compare the outputs of two LLMs enabling the ranking of models without human intervention. While several approaches have been proposed, many confounding factors are present between different papers. For instance the model, the prompt and other hyperparameters…

    Submitted 27 May, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

  18. arXiv:2501.02945  [pdf, other]

    cs.LG

    From Tables to Time: How TabPFN-v2 Outperforms Specialized Time Series Forecasting Models

    Authors: Shi Bin Hoo, Samuel Müller, David Salinas, Frank Hutter

    Abstract: Foundation models have become increasingly popular for forecasting due to their ability to provide predictions without requiring a lot of training data. In this work, we demonstrate how TabPFN-v2, a general tabular foundation model, can be effectively applied to time series forecasting. We introduce TabPFN-TS, a simple method that combines TabPFN-v2 with lightweight feature engineering to enable b…

    Submitted 26 May, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

    Comments: This version extends our NeurIPS 2024 workshop paper, The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features, presented at the Table Representation Learning and Time Series in the Age of Large Models workshops

  19. arXiv:2411.12537  [pdf, other]

    cs.LG cs.CL cs.FL

    Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

    Authors: Riccardo Grazzi, Julien Siems, Arber Zela, Jörg K. H. Franke, Frank Hutter, Massimiliano Pontil

    Abstract: Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers for long sequences. However, both Transformers and LRNNs struggle to perform state-tracking, which may impair performance in tasks such as code evaluation. In one forward pass, current architectures are unable to solve even parity, the simplest state-trackin…

    Submitted 18 March, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

    Comments: V2: Correction to Theorem 1 and 2 and to point 3 of Proposition 1. V3: ICLR Camera Ready, V4: ICLR Camera Ready, added figures to theory section, updated modular arithmetic with brackets results because previous results did not contain multiplication

  20. arXiv:2411.10634  [pdf, other]

    cs.LG stat.ML

    Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data

    Authors: Kai Helli, David Schnurr, Noah Hollmann, Samuel Müller, Frank Hutter

    Abstract: While most ML models expect independent and identically distributed data, this assumption is often violated in real-world scenarios due to distribution shifts, resulting in the degradation of machine learning model performance. Until now, no tabular method has consistently outperformed classical supervised learning, which ignores these shifts. To address temporal distribution shifts, we present Dr…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: Accepted at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

    MSC Class: 68T07 ACM Class: I.2.6

  21. arXiv:2411.07340  [pdf, other]

    cs.LG cs.AI

    Warmstarting for Scaling Language Models

    Authors: Neeratyoy Mallik, Maciej Janowski, Johannes Hog, Herilalaina Rakotoarison, Aaron Klein, Josif Grabocka, Frank Hutter

    Abstract: Scaling model sizes to scale performance has worked remarkably well for the current large language models paradigm. The research and empirical findings of various scaling studies led to novel scaling results and laws that guide subsequent research. High training costs for contemporary scales of data and models result in a lack of thorough understanding of how to tune and arrive at such training s…

    Submitted 11 November, 2024; originally announced November 2024.

  22. arXiv:2411.01195  [pdf, other]

    cs.CL cs.LG

    Transfer Learning for Finetuning Large Language Models

    Authors: Tobias Strangmann, Lennart Purucker, Jörg K. H. Franke, Ivo Rapant, Fabio Ferreira, Frank Hutter

    Abstract: As the landscape of large language models expands, efficiently finetuning for specific tasks becomes increasingly crucial. At the same time, the landscape of parameter-efficient finetuning methods rapidly expands. Consequently, practitioners face a multitude of complex choices when searching for an optimal finetuning pipeline for large language models. To reduce the complexity for practitioners, w…

    Submitted 2 November, 2024; originally announced November 2024.

    Comments: Accepted at NeurIPS 2024 Workshop on Adaptive Foundation Models

  23. arXiv:2410.19889  [pdf, other]

    cs.CL cs.LG

    Ensembling Finetuned Language Models for Text Classification

    Authors: Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka

    Abstract: Finetuning is a common practice widespread across different communities to adapt pretrained models to particular tasks. Text classification is one of these tasks for which many pretrained models are available. On the other hand, ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates. However, ensembling pretrained models for text classificat…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: Workshop on Fine-Tuning in Modern Machine Learning @ NeurIPS 2024. arXiv admin note: text overlap with arXiv:2410.04520

  24. arXiv:2410.17787  [pdf, other]

    cs.LG cs.AI

    Large Language Models Engineer Too Many Simple Features For Tabular Data

    Authors: Jaris Küken, Lennart Purucker, Frank Hutter

    Abstract: Tabular machine learning problems often require time-consuming and labor-intensive feature engineering. Recent efforts have focused on using large language models (LLMs) to capitalize on their potential domain knowledge. At the same time, researchers have observed ethically concerning negative biases in other LLM-related use cases, such as text generation. These developments motivated us to invest…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: Preprint

  25. arXiv:2410.13286  [pdf, other]

    cs.LG

    A Human-in-the-Loop Fairness-Aware Model Selection Framework for Complex Fairness Objective Landscapes

    Authors: Jake Robertson, Thorsten Schmidt, Frank Hutter, Noor Awad

    Abstract: Fairness-aware Machine Learning (FairML) applications are often characterized by complex social objectives and legal requirements, frequently involving multiple, potentially conflicting notions of fairness. Despite the well-known Impossibility Theorem of Fairness and extensive theoretical research on the statistical and socio-technical trade-offs between fairness metrics, many FairML tools still o…

    Submitted 21 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

  26. arXiv:2410.09385  [pdf, other]

    cs.LG cs.AI

    Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models

    Authors: Sathya Kamesh Bhethanabhotla, Omar Swelam, Julien Siems, David Salinas, Frank Hutter

    Abstract: This paper introduces Mamba4Cast, a zero-shot foundation model for time series forecasting. Based on the Mamba architecture and inspired by Prior-data Fitted Networks (PFNs), Mamba4Cast generalizes robustly across diverse time series tasks without the need for dataset specific fine-tuning. Mamba4Cast's key innovation lies in its ability to achieve strong zero-shot performance on real-world dataset…

    Submitted 12 October, 2024; originally announced October 2024.

  27. arXiv:2410.06479  [pdf, other]

    cs.CL

    Compressing Large Language Models with Automated Sub-Network Search

    Authors: Rhea Sanjay Sukthanker, Benedikt Staffler, Frank Hutter, Aaron Klein

    Abstract: Large Language Models (LLMs) demonstrate exceptional reasoning abilities, enabling strong generalization across diverse tasks such as commonsense reasoning and instruction following. However, as LLMs scale, inference costs become increasingly prohibitive, accumulating significantly over their life cycle. In this paper we consider model compression for LLMs to reduce model size while improving down…

    Submitted 5 February, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

  28. arXiv:2410.04560  [pdf, other]

    cs.LG stat.ML

    GAMformer: In-Context Learning for Generalized Additive Models

    Authors: Andreas Mueller, Julien Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter

    Abstract: Generalized Additive Models (GAMs) are widely recognized for their ability to create fully interpretable machine learning models for tabular data. Traditionally, training GAMs involves iterative learning algorithms, such as splines, boosted trees, or neural networks, which refine the additive components through repeated error reduction. In this paper, we introduce GAMformer, the first method to le…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: 20 pages, 12 figures

  29. arXiv:2410.04520  [pdf, ps, other]

    cs.LG

    Regularized Neural Ensemblers

    Authors: Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka

    Abstract: Ensemble methods are known for enhancing the accuracy and robustness of machine learning models by combining multiple base learners. However, standard approaches like greedy or random ensembling often fall short, as they assume a constant weight across samples for the ensemble members. This can limit expressiveness and hinder performance when aggregating the ensemble predictions. In this study, we…

    Submitted 23 June, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: Accepted in AutoML Conference 2025

  30. arXiv:2410.01565  [pdf, other]

    cs.LG stat.ML

    Bayes' Power for Explaining In-Context Learning Generalizations

    Authors: Samuel Müller, Noah Hollmann, Frank Hutter

    Abstract: Traditionally, neural network training has been primarily viewed as an approximation of maximum likelihood estimation (MLE). This interpretation originated in a time when training for multiple epochs on small datasets was common and performance was data bound; but it falls short in the era of large-scale single-epoch trainings ushered in by large self-supervised setups, like language models. In th…

    Submitted 2 October, 2024; originally announced October 2024.

  31. arXiv:2409.18827  [pdf, other]

    cs.LG

    ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning

    Authors: Jannis Becktepe, Julian Dierkes, Carolin Benjamins, Aditya Mohan, David Salinas, Raghu Rajan, Frank Hutter, Holger Hoos, Marius Lindauer, Theresa Eimer

    Abstract: Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often only evaluated on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizab…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: Accepted at the 17th European Workshop on Reinforcement Learning

    Journal ref: 17th European Workshop on Reinforcement Learning 2024

  32. arXiv:2409.14084  [pdf, other]

    cs.LG cs.AI

    One-shot World Models Using a Transformer Trained on a Synthetic Prior

    Authors: Fabio Ferreira, Moreno Schlageter, Raghu Rajan, Andre Biedenkapp, Frank Hutter

    Abstract: A World Model is a compressed spatial and temporal representation of a real world environment that allows one to train an agent or execute planning methods. However, world models are typically trained on observations from the real world environment, and they usually do not enable learning policies for other real environments. We propose One-Shot World Model (OSWM), a transformer world model that i…

    Submitted 24 October, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

  33. arXiv:2408.06820  [pdf, other]

    cs.LG cs.AI

    Efficient Search for Customized Activation Functions with Gradient Descent

    Authors: Lukas Strack, Mahmoud Safari, Frank Hutter

    Abstract: Different activation functions work best for different deep learning models. To exploit this, we leverage recent advancements in gradient-based search techniques for neural architectures to efficiently identify high-performing activation functions for a given application. We propose a fine-grained search cell that combines basic mathematical operations to model activation functions, allowing for t…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 10 pages, 1 figure, excluding references and appendix

  34. arXiv:2408.02533  [pdf, other]

    cs.LG

    LMEMs for post-hoc analysis of HPO Benchmarking

    Authors: Anton Geburek, Neeratyoy Mallik, Danny Stoll, Xavier Bouthillier, Frank Hutter

    Abstract: The importance of tuning hyperparameters in Machine Learning (ML) and Deep Learning (DL) is established through empirical research and applications, evident from the increase in new hyperparameter optimization (HPO) algorithms and benchmarks steadily added by the community. However, current benchmarking practices using averaged performance across many datasets may obscure key differences between H…

    Submitted 5 August, 2024; originally announced August 2024.

  35. arXiv:2407.05732  [pdf, other]

    cs.LG cs.AI cs.CY

    FairPFN: Transformers Can do Counterfactual Fairness

    Authors: Jake Robertson, Noah Hollmann, Noor Awad, Frank Hutter

    Abstract: Machine Learning systems are increasingly prevalent across healthcare, law enforcement, and finance but often operate on historical data, which may carry biases against certain demographic groups. Causal and counterfactual fairness provides an intuitive way to define fairness that closely aligns with legal standards. Despite its theoretical benefits, counterfactual fairness comes with several prac…

    Submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2406.18701  [pdf, other]

    cs.LG cs.AI

    Fast Optimizer Benchmark

    Authors: Simon Blauth, Tobias Bürger, Zacharias Häringer, Jörg Franke, Frank Hutter

    Abstract: In this paper, we present the Fast Optimizer Benchmark (FOB), a tool designed for evaluating deep learning optimizers during their development. The benchmark supports tasks from multiple domains such as computer vision, natural language processing, and graph learning. The focus is on convenient usage, featuring human-readable YAML configurations, SLURM integration, and plotting utilities. FOB can…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 5 pages + 12 appendix pages, submitted to AutoML Conf 2024 Workshop Track

  37. arXiv:2406.03348  [pdf, other]

    cs.LG

    Position: A Call to Action for a Human-Centered AutoML Paradigm

    Authors: Marius Lindauer, Florian Karl, Anne Klier, Julia Moosbauer, Alexander Tornede, Andreas Mueller, Frank Hutter, Matthias Feurer, Bernd Bischl

    Abstract: Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive p…

    Submitted 5 June, 2024; originally announced June 2024.

  38. arXiv:2405.10299  [pdf, other]

    cs.LG cs.AI

    HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

    Authors: Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Aaron Klein, Lennart Purucker, Joerg K. H. Franke, Frank Hutter

    Abstract: The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training a…

    Submitted 3 November, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: 59 pages, 73 figures, 11 tables

  39. arXiv:2405.03389  [pdf, other]

    cs.LG cs.AI

    Don't Waste Your Time: Early Stopping Cross-Validation

    Authors: Edward Bergman, Lennart Purucker, Frank Hutter

    Abstract: State-of-the-art automated machine learning systems for tabular data often employ cross-validation; ensuring that measured performances generalize to unseen data, or that subsequent ensembling does not overfit. However, using k-fold cross-validation instead of holdout validation drastically increases the computational cost of validating a single configuration. While ensuring better generalization…

    Submitted 2 August, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at Third International Conference on Automated Machine Learning (AutoML 2024); for code, see https://github.com/automl/DontWasteYourTime-early-stopping

  40. arXiv:2404.16795  [pdf, other]

    cs.LG

    In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization

    Authors: Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Edward Bergman, Frank Hutter

    Abstract: With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, strongly relying on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this ap…

    Submitted 12 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Published at the 41st International Conference on Machine Learning (ICML), Vienna, Austria

  41. arXiv:2404.16551  [pdf, other]

    cs.LG

    Surprisingly Strong Performance Prediction with Neural Graph Features

    Authors: Gabriela Kadlecová, Jovita Lukasik, Martin Pilát, Petra Vidnerová, Mahmoud Safari, Roman Neruda, Frank Hutter

    Abstract: Performance prediction has been a key part of the neural architecture search (NAS) process, speeding up NAS algorithms by avoiding resource-consuming network training. Although many performance predictors correlate well with ground truth performance, they require training data in the form of trained networks. Recently, zero-cost proxies have been proposed as an efficient method to estimat…

    Submitted 13 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: ICML 2024. Code at https://github.com/gabikadlecova/zc_combine , blog post: https://gabikadlecova.github.io/blog/2024/graf/

  42. arXiv:2403.01888  [pdf, other]

    cs.AI cs.LG

    Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks

    Authors: Shuhei Watanabe, Neeratyoy Mallik, Edward Bergman, Frank Hutter

    Abstract: While deep learning has celebrated many successes, its results often hinge on the meticulous selection of hyperparameters (HPs). However, the time-consuming nature of deep learning training makes HP optimization (HPO) a costly endeavor, slowing down the development of efficient HPO tools. While zero-cost benchmarks, which provide performance and runtime without actual training, offer a solution fo…

    Submitted 19 August, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to AutoML Conference 2024 ABCD Track

  43. arXiv:2402.18213  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-objective Differentiable Neural Architecture Search

    Authors: Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter

    Abstract: Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives that require training a neural network. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware…

    Submitted 4 February, 2025; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 44 pages, 34 figures

  44. arXiv:2402.18153  [pdf, other]

    cs.LG cs.AI

    Diffusion-Based Neural Network Weights Generation

    Authors: Bedionita Soro, Bruno Andreis, Hayeon Lee, Wonyong Jeong, Song Chong, Frank Hutter, Sung Ju Hwang

    Abstract: Transfer learning has gained significant attention in recent deep learning research due to its ability to accelerate convergence and enhance performance on new tasks. However, its success is often contingent on the similarity between source and target data, and training on numerous datasets can be costly, leading to blind selection of pretrained models with limited insight into their effectiveness…

    Submitted 25 October, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 32 pages

  45. arXiv:2402.11137  [pdf, other]

    cs.LG

    TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks

    Authors: Benjamin Feuer, Robin Tibor Schirrmeister, Valeriia Cherepanova, Chinmay Hegde, Frank Hutter, Micah Goldblum, Niv Cohen, Colin White

    Abstract: While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adopt…

    Submitted 21 October, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024 Poster

  46. arXiv:2402.03170  [pdf, other]

    cs.LG

    Is Mamba Capable of In-Context Learning?

    Authors: Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter

    Abstract: State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models…

    Submitted 24 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  47. arXiv:2401.05351  [pdf, other]

    q-bio.BM cs.LG

    Rethinking Performance Measures of RNA Secondary Structure Problems

    Authors: Frederic Runge, Jörg K. H. Franke, Daniel Fertmann, Frank Hutter

    Abstract: Accurate RNA secondary structure prediction is vital for understanding cellular regulation and disease mechanisms. Deep learning (DL) methods have surpassed traditional algorithms by predicting complex features like pseudoknots and multi-interacting base pairs. However, traditional distance measures can hardly deal with such tertiary interactions and the currently used evaluation measures (F1 scor…

    Submitted 4 December, 2023; originally announced January 2024.

    Comments: 12 pages, Accepted at the Machine Learning for Structural Biology Workshop, NeurIPS 2023

  48. arXiv:2312.10440  [pdf, other]

    cs.LG cs.AI

    Weight-Entanglement Meets Gradient-Based Neural Architecture Search

    Authors: Rhea Sanjay Sukthanker, Arjun Krishnakumar, Mahmoud Safari, Frank Hutter

    Abstract: Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architecture spaces significantly faster than traditional blackbox approaches. In parallel, weight entanglement has emerged as a technique for intricate parameter sharing among architectures within macro-level search spaces. …

    Submitted 16 December, 2023; originally announced December 2023.

  49. arXiv:2311.14645  [pdf, other]

    cs.LG stat.ML

    A General Framework for User-Guided Bayesian Optimization

    Authors: Carl Hvarfner, Frank Hutter, Luigi Nardi

    Abstract: The optimization of expensive-to-evaluate black-box functions is prevalent in various scientific disciplines. Bayesian optimization is an automatic, general and sample-efficient method to solve these problems with minimal knowledge of the underlying function dynamics. However, the ability of Bayesian optimization to incorporate prior knowledge or beliefs about the function at hand in order to acce…

    Submitted 17 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: 18 pages, 11 figures

    Journal ref: 12th International Conference on Learning Representations (ICLR 2024)

  50. arXiv:2311.09058  [pdf, other]

    cs.LG

    Improving Deep Learning Optimization through Constrained Parameter Regularization

    Authors: Jörg K. H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter

    Abstract: Regularization is a critical component in deep learning. The most commonly used approach, weight decay, applies a constant penalty coefficient uniformly across all parameters. This may be overly restrictive for some parameters, while insufficient for others. To address this, we present Constrained Parameter Regularization (CPR) as an alternative to traditional weight decay. Unlike the uniform appl…

    Submitted 7 December, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), 35 pages