Showing 1–17 of 17 results for author: Vollmer, S

Searching in archive cs.
  1. arXiv:2404.16118  [pdf, other]

    cs.CR

    Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models

    Authors: Daniel Reti, Norman Becker, Tillmann Angeli, Anasuya Chattopadhyay, Daniel Schneider, Sebastian Vollmer, Hans D. Schotten

    Abstract: With the increasing prevalence of security incidents, the adoption of deception-based defense strategies has become pivotal in cyber security. This work addresses the challenge of scalability in designing honeytokens, a key component of such defense mechanisms. The manual creation of honeytokens is a tedious task. Although automated generators exist, they often lack versatility, being specialized…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 12 pages

  2. arXiv:2402.07770  [pdf, other]

    cs.IR cs.CL stat.AP

    Quantitative knowledge retrieval from large language models

    Authors: David Selby, Kai Spriestersbach, Yuichiro Iwashita, Dennis Bappert, Archana Warrier, Sumantrak Mukherjee, Muhammad Nabeel Asim, Koichi Kise, Sebastian Vollmer

    Abstract: Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less well understood. In this paper we explore the feasibility of LLMs as a mechanism for quantitative knowledge retrieval to aid data analysis tasks such as elicitation of prior distributions for Bayesian…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 13 pages plus supplementary materials

  3. arXiv:2401.08513  [pdf, other]

    cs.LG cs.CR

    X Hacking: The Threat of Misguided AutoML

    Authors: Rahul Sharma, Sergey Redyuk, Sumantrak Mukherjee, Andrea Sipka, Sebastian Vollmer, David Selby

    Abstract: Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as SHAP values. We show how an automated machine learning pipe…

    Submitted 12 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 13 pages, 8 figures, plus supplementary materials

  4. arXiv:2307.06431  [pdf, other]

    stat.ML cs.LG

    Energy Discrepancies: A Score-Independent Loss for Energy-Based Models

    Authors: Tobias Schröder, Zijing Ou, Jen Ning Lim, Yingzhen Li, Sebastian J. Vollmer, Andrew B. Duncan

    Abstract: Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likeli…

    Submitted 27 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Camera Ready version for the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Changes in this revision: Appendix A1: Corrected proof of Theorem 1. Appendix D3: Added definition and numerical experiments for energy discrepancy on binary discrete spaces. Minor changes in the main text and correction of typos. Added new references

  5. arXiv:2206.03256  [pdf, other]

    cs.CY cs.LG stat.AP stat.ME

    Flexible Group Fairness Metrics for Survival Analysis

    Authors: Raphael Sonabend, Florian Pfisterer, Alan Mishler, Moritz Schauer, Lukas Burk, Sumantrak Mukherjee, Sebastian Vollmer

    Abstract: Algorithmic fairness is an increasingly important field concerned with detecting and mitigating biases in machine learning models. There has been a wealth of literature for algorithmic fairness in regression and classification; however, there has been little exploration of the field for survival analysis. Survival analysis is the prediction task in which one attempts to predict the probability of an…

    Submitted 22 July, 2022; v1 submitted 26 May, 2022; originally announced June 2022.

    Comments: Accepted in DSHealth 2022 (Workshop on Applied Data Science for Healthcare)

  6. arXiv:2202.01929  [pdf, other]

    cs.LG stat.ML

    Energy-Based Models for Functional Data using Path Measure Tilting

    Authors: Jen Ning Lim, Sebastian Vollmer, Lorenz Wolf, Andrew Duncan

    Abstract: Energy-Based Models (EBMs) have proven to be a highly effective approach for modelling densities on finite-dimensional spaces. Their ability to incorporate domain-specific choices and constraints into the structure of the model through composition makes EBMs an appealing candidate for applications in physics, biology, computer vision and various other fields. Recently, Energy-Based Processes (EB…

    Submitted 22 February, 2023; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: Updated for AISTATS 2023

  7. arXiv:2112.04828  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures

    Authors: Raphael Sonabend, Andreas Bender, Sebastian Vollmer

    Abstract: In this paper we consider how to evaluate survival distribution predictions with measures of discrimination. This is a non-trivial problem as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantage…

    Submitted 9 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  8. arXiv:2108.10934  [pdf, other]

    stat.ML cs.CR cs.LG

    Mitigating Statistical Bias within Differentially Private Synthetic Data

    Authors: Sahra Ghalebikesabi, Harrison Wilde, Jack Jewson, Arnaud Doucet, Sebastian Vollmer, Chris Holmes

    Abstract: Increasing interest in privacy-preserving machine learning has led to new and evolved approaches for generating private synthetic data from undisclosed real data. However, mechanisms of privacy preservation can significantly reduce the utility of synthetic data, which in turn impacts downstream tasks such as learning predictive models or inference. We propose several re-weighting strategies using…

    Submitted 19 May, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

  9. arXiv:2012.15505  [pdf, other]

    cs.LG

    Flexible model composition in machine learning and its implementation in MLJ

    Authors: Anthony D. Blaom, Sebastian J. Vollmer

    Abstract: A graph-based protocol called 'learning networks', which combines assorted machine learning models into meta-models, is described. Learning networks are shown to overcome several limitations of model composition as implemented in the dominant machine learning platforms. After illustrating the protocol in simple examples, a concise syntax for specifying a learning network, implemented in the MLJ frame…

    Submitted 31 December, 2020; originally announced December 2020.

    Comments: 13 pages, 3 figures

    ACM Class: I.2.6

  10. arXiv:2011.08299  [pdf, other]

    cs.LG stat.AP stat.ME stat.ML

    Foundations of Bayesian Learning from Synthetic Data

    Authors: Harrison Wilde, Jack Jewson, Sebastian Vollmer, Chris Holmes

    Abstract: There is significant growth and interest in the use of synthetic data as an enabler for machine learning in environments where the release of real data is restricted due to privacy or availability constraints. Despite a large number of methods for synthetic data generation, there are comparatively few results on the statistical properties of models learnt on synthetic data, and fewer still for sit…

    Submitted 24 November, 2020; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: 43 pages (10 main text, 33 supplement), 32 figures (4 main text, 28 supplement)

  11. arXiv:2011.02407  [pdf, other]

    cs.LG cs.CY econ.EM

    Debiasing classifiers: is reality at variance with expectation?

    Authors: Ashrya Agrawal, Florian Pfisterer, Bernd Bischl, Francois Buet-Golfouse, Srijan Sood, Jiahao Chen, Sameena Shah, Sebastian Vollmer

    Abstract: We present an empirical study of debiasing methods for classifiers, showing that debiasers often fail in practice to generalize out-of-sample, and can in fact make fairness worse rather than better. A rigorous evaluation of the debiasing treatment effect requires extensive cross-validation beyond what is usually done. We demonstrate that this phenomenon can be explained as a consequence of bias-va…

    Submitted 30 May, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: 13 pages, under review

    MSC Class: 68T01; 68Q32; 68T05 ACM Class: G.4; I.2.0; J.4

  12. arXiv:2010.11530  [pdf, other]

    stat.ML cs.LG

    Model updating after interventions paradoxically introduces bias

    Authors: James Liley, Samuel R Emerson, Bilal A Mateen, Catalina A Vallejos, Louis J M Aslett, Sebastian J Vollmer

    Abstract: Machine learning is increasingly being used to generate prediction models for use in a number of real-world settings, from credit risk assessment to clinical decision support. Recent discussions have highlighted potential problems in the updating of a predictive score for a binary outcome when an existing predictive score forms part of the standard workflow, driving interventions. In this setting,…

    Submitted 22 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Sections of this preprint on 'Successive adjuvancy' (section 4, theorem 2, figures 4,5, and associated discussions) were not included in the originally submitted version of this paper due to length. This material does not appear in the published version of this manuscript, and the reader should be aware that these sections did not undergo peer review

  13. MLJ: A Julia package for composable machine learning

    Authors: Anthony D. Blaom, Franz Kiraly, Thibaut Lienart, Yiannis Simillides, Diego Arenas, Sebastian J. Vollmer

    Abstract: MLJ (Machine Learning in Julia) is an open source software package providing a common interface for interacting with machine learning models written in Julia and other languages. It provides tools and meta-algorithms for selecting, tuning, evaluating, composing and comparing those models, with a focus on flexible model composition. In this design overview we detail chief novelties of the framework,…

    Submitted 3 November, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: Shortened version of previous version

    Journal ref: Journal of Open Source Software, 2020, vol. 5(55), p. 2704

  14. arXiv:1908.08737  [pdf, other]

    cs.CR

    Design choices for productive, secure, data-intensive research at scale in the cloud

    Authors: Diego Arenas, Jon Atkins, Claire Austin, David Beavan, Alvaro Cabrejas Egea, Steven Carlysle-Davies, Ian Carter, Rob Clarke, James Cunningham, Tom Doel, Oliver Forrest, Evelina Gabasova, James Geddes, James Hetherington, Radka Jersakova, Franz Kiraly, Catherine Lawrence, Jules Manser, Martin T. O'Reilly, James Robinson, Helen Sherwood-Taylor, Serena Tierney, Catalina A. Vallejos, Sebastian Vollmer, Kirstie Whitaker

    Abstract: We present a policy and process framework for secure environments for productive data science research projects at scale, by combining prevailing data security threat and risk profiles into five sensitivity tiers, and, at each tier, specifying recommended policies for data classification, data ingress, software ingress, data egress, user access, user device control, and analysis environments. By p…

    Submitted 15 September, 2019; v1 submitted 23 August, 2019; originally announced August 2019.

  15. arXiv:1812.10404  [pdf]

    cs.CY cs.LG stat.AP stat.ML

    Machine learning and AI research for Patient Benefit: 20 Critical Questions on Transparency, Replicability, Ethics and Effectiveness

    Authors: Sebastian Vollmer, Bilal A. Mateen, Gergo Bohner, Franz J Király, Rayid Ghani, Pall Jonsson, Sarah Cumbers, Adrian Jonas, Katherine S. L. McAllister, Puja Myles, David Granger, Mark Birse, Richard Branson, Karel GM Moons, Gary S Collins, John P. A. Ioannidis, Chris Holmes, Harry Hemingway

    Abstract: Machine learning (ML), artificial intelligence (AI) and other modern statistical methods are providing new opportunities to operationalize previously untapped and rapidly growing sources of data for patient benefit. Whilst there is a lot of promising research currently being undertaken, the literature as a whole lacks: transparency; clear reporting to facilitate replicability; exploration for pote…

    Submitted 21 December, 2018; originally announced December 2018.

    Comments: 25 pages, 2 boxes, 1 figure

    MSC Class: 68T01

  16. arXiv:1611.06972  [pdf, other]

    stat.ML cs.LG math.PR

    Measuring Sample Quality with Diffusions

    Authors: Jackson Gorham, Andrew B. Duncan, Sebastian J. Vollmer, Lester Mackey

    Abstract: Stein's method for measuring convergence to a continuous target distribution relies on an operator characterizing the target and Stein factor bounds on the solutions of an associated differential equation. While such operators and bounds are readily available for a diversity of univariate targets, few multivariate targets have been analyzed. We introduce a new class of characterizing operators bas…

    Submitted 12 November, 2018; v1 submitted 21 November, 2016; originally announced November 2016.

    MSC Class: 60J60; 62-04; 62E17; 60E15; 65C60 (Primary) 62-07; 65C05; 68T05 (Secondary)

  17. arXiv:1512.09327  [pdf, other]

    cs.LG stat.ML

    Distributed Bayesian Learning with Stochastic Natural-gradient Expectation Propagation and the Posterior Server

    Authors: Leonard Hasenclever, Stefan Webb, Thibaut Lienart, Sebastian Vollmer, Balaji Lakshminarayanan, Charles Blundell, Yee Whye Teh

    Abstract: This paper makes two contributions to Bayesian machine learning algorithms. Firstly, we propose stochastic natural gradient expectation propagation (SNEP), a novel alternative to expectation propagation (EP), a popular variational inference algorithm. SNEP is a black box variational algorithm, in that it does not require any simplifying assumptions on the distribution of interest, beyond the exist…

    Submitted 7 September, 2017; v1 submitted 31 December, 2015; originally announced December 2015.

    Comments: 37 pages, 7 figures

    Journal ref: Journal of Machine Learning Research 18 (2017) 1-37