Showing 1–49 of 49 results for author: Steinke, T

  1. arXiv:2404.16706  [pdf, other]

    cs.DS cs.CC cs.CR cs.LG

    Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy

    Authors: Krishnamurthy Dvijotham, H. Brendan McMahan, Krishna Pillutla, Thomas Steinke, Abhradeep Thakurta

    Abstract: In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differential…

    Submitted 6 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  2. arXiv:2403.06634  [pdf, other]

    cs.CR

    Stealing Part of a Production Language Model

    Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr

    Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \…

    Submitted 11 March, 2024; originally announced March 2024.

  3. arXiv:2310.15526  [pdf, other]

    cs.LG cs.CR

    Privacy Amplification for Matrix Mechanisms

    Authors: Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

    Abstract: Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In…

    Submitted 4 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Appearing in ICLR 2024. Changes made to match the conference version of the paper

  4. arXiv:2310.06771  [pdf, other]

    cs.LG cs.AI cs.CR math.OC

    Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

    Authors: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

    Abstract: Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choic…

    Submitted 7 May, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, and Krishna Pillutla contributed equally

    Journal ref: ICLR 2024

  5. arXiv:2308.12947  [pdf, ps, other]

    cs.DS cs.CR

    Counting Distinct Elements Under Person-Level Differential Privacy

    Authors: Alexander Knop, Thomas Steinke

    Abstract: We study the problem of counting the number of distinct elements in a dataset subject to the constraint of differential privacy. We consider the challenging setting of person-level DP (a.k.a. user-level DP) where each person may contribute an unbounded number of items and hence the sensitivity is unbounded. Our approach is to compute a bounded-sensitivity version of this query, which reduces to…

    Submitted 27 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  6. arXiv:2305.13440  [pdf, ps, other]

    cs.DS cs.LG

    Differentially Private Medians and Interior Points for Non-Pathological Data

    Authors: Maryam Aliakbarpour, Rose Silver, Thomas Steinke, Jonathan Ullman

    Abstract: We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over $\mathbb{R}$ satisfying very mild moment conditions. Our result stands in contrast to the surprising negative result of Bun et al. (FOCS 2015) that showed there is no differentially private estimator with any finite sample complexity that returns any non-trivial appr…

    Submitted 22 May, 2023; originally announced May 2023.

  7. arXiv:2305.13209  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Faster Differentially Private Convex Optimization via Second-Order Methods

    Authors: Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

    Abstract: Differentially private (stochastic) gradient descent is the workhorse of DP machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than first-order methods like gradient descent. In this work, we investigate the prospect of using the second-order information from the loss function to accelerate D…

    Submitted 22 May, 2023; originally announced May 2023.

  8. arXiv:2305.08846  [pdf, other]

    cs.LG cs.CR cs.DS

    Privacy Auditing with One (1) Training Run

    Authors: Thomas Steinke, Milad Nasr, Matthew Jagielski

    Abstract: We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions a…

    Submitted 15 May, 2023; originally announced May 2023.

  9. arXiv:2303.18086  [pdf, other]

    cs.CR cs.DB

    Differentially Private Stream Processing at Scale

    Authors: Bing Zhang, Vadym Doroshenko, Peter Kairouz, Thomas Steinke, Abhradeep Thakurta, Ziyin Ma, Eidan Cohen, Himani Apte, Jodi Spacek

    Abstract: We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic…

    Submitted 4 April, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

  10. arXiv:2302.09483  [pdf, other]

    cs.LG

    Why Is Public Pretraining Necessary for Private Model Training?

    Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang

    Abstract: In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretrai…

    Submitted 19 February, 2023; originally announced February 2023.

  11. arXiv:2302.07956  [pdf, other]

    cs.LG cs.CR

    Tight Auditing of Differentially Private Machine Learning

    Authors: Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

    Abstract: Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implaus…

    Submitted 15 February, 2023; originally announced February 2023.

  12. arXiv:2301.13334  [pdf, ps, other]

    math.ST cs.CR cs.DS stat.ML

    A Bias-Variance-Privacy Trilemma for Statistical Estimation

    Authors: Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

    Abstract: The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low varia…

    Submitted 28 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  13. arXiv:2210.00597  [pdf, other]

    cs.CR cs.DS cs.LG

    Composition of Differential Privacy & Privacy Amplification by Subsampling

    Authors: Thomas Steinke

    Abstract: This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications." We give an introduction to the most important property of differential privacy -- composition: running multiple independent analyses on the data of a set of people will still be differentially private as long as each of the analyses is private on its own -- as well as the related topic of…

    Submitted 26 October, 2022; v1 submitted 2 October, 2022; originally announced October 2022.

  14. arXiv:2209.04053  [pdf, ps, other]

    cs.CR cs.DS cs.LG

    Algorithms with More Granular Differential Privacy Guarantees

    Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thomas Steinke

    Abstract: Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed. In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis. In this framework, we study several basic data analysis and l…

    Submitted 8 September, 2022; originally announced September 2022.

  15. arXiv:2202.12219  [pdf, other]

    cs.LG

    Debugging Differential Privacy: A Case Study for Privacy Auditing

    Authors: Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

    Abstract: Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private s…

    Submitted 28 March, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  16. arXiv:2112.00193  [pdf, other]

    cs.LG cs.CR

    Public Data-Assisted Mirror Descent for Private Model Training

    Authors: Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta

    Abstract: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by t…

    Submitted 27 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: 20 pages, 8 figures, 3 tables

  17. arXiv:2111.04609  [pdf, ps, other]

    stat.ML cs.CR cs.DS cs.IT cs.LG

    A Private and Computationally-Efficient Estimator for Unbounded Gaussians

    Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

    Abstract: We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is…

    Submitted 11 February, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  18. arXiv:2110.03620  [pdf, other]

    cs.LG cs.CR cs.DS

    Hyperparameter Tuning with Renyi Differential Privacy

    Authors: Nicolas Papernot, Thomas Steinke

    Abstract: For many differentially private algorithms, such as the prominent noisy stochastic gradient descent (DP-SGD), the analysis needed to bound the privacy leakage of a single training run is well understood. However, few studies have reasoned about the privacy leakage resulting from the multiple training runs needed to fine tune the value of the training algorithm's hyperparameters. In this work, we f…

    Submitted 14 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

  19. arXiv:2106.09683  [pdf, other]

    cs.LG cs.IT stat.ML

    PAC-Bayes, MAC-Bayes and Conditional Mutual Information: Fast rate bounds that handle general VC classes

    Authors: Peter Grünwald, Thomas Steinke, Lydia Zakynthinou

    Abstract: We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, itself derived from conditional PAC-Bayesian bounds, where `conditional' means that one can use priors conditioned on a joint training and gho…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 24 pages, accepted for publication at COLT 2021

  20. arXiv:2106.00001  [pdf, other]

    cs.CR cs.DS cs.LG stat.CO

    Privately Learning Subspaces

    Authors: Vikrant Singhal, Thomas Steinke

    Abstract: Private data analysis suffers a costly curse of dimensionality. However, the data often has an underlying low-dimensional structure. For example, when optimizing via gradient descent, the gradients often lie in or near a low-dimensional subspace. If that low-dimensional structure can be identified, then we can avoid paying (in terms of privacy or accuracy) for the high ambient dimension. We pres…

    Submitted 10 August, 2021; v1 submitted 28 May, 2021; originally announced June 2021.

  21. arXiv:2105.07260  [pdf, ps, other]

    cs.CR

    The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise

    Authors: Zeyu Ding, Daniel Kifer, Sayed M. Saghaian N. E., Thomas Steinke, Yuxin Wang, Yingtai Xiao, Danfeng Zhang

    Abstract: The permute-and-flip mechanism is a recently proposed differentially private selection algorithm that was shown to outperform the exponential mechanism. In this paper, we show that permute-and-flip is equivalent to the well-known report noisy max algorithm with exponential noise.

    Submitted 5 June, 2021; v1 submitted 15 May, 2021; originally announced May 2021.

  22. arXiv:2102.08598  [pdf, other]

    cs.LG cs.CR cs.DS

    Leveraging Public Data for Practical Private Query Release

    Authors: Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonathan Ullman, Zhiwei Steven Wu

    Abstract: In many statistical problems, incorporating priors can significantly improve performance. However, the use of prior knowledge in differentially private query release has remained underexplored, despite such priors commonly being available in the form of public datasets, such as previous US Census releases. With the goal of releasing statistics about a private dataset, we present PMW^Pub, which --…

    Submitted 10 June, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  23. arXiv:2102.06387  [pdf, other]

    cs.LG cs.DS stat.ML

    The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation

    Authors: Peter Kairouz, Ziyu Liu, Thomas Steinke

    Abstract: We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analys…

    Submitted 8 September, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: International Conference on Machine Learning (ICML), 2021

  24. arXiv:2009.05401  [pdf, ps, other]

    cs.CR

    Multi-Central Differential Privacy

    Authors: Thomas Steinke

    Abstract: Differential privacy is typically studied in the central model where a trusted "aggregator" holds the sensitive data of all the individuals and is responsible for protecting their privacy. A popular alternative is the local model in which the aggregator is untrusted and instead each individual is responsible for their own privacy. The decentralized privacy guarantee of the local model comes at a h…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: Short working paper (10 pages) - comments welcome

  25. arXiv:2007.05453  [pdf, other]

    cs.LG cs.DS stat.ML

    New Oracle-Efficient Algorithms for Private Synthetic Data Release

    Authors: Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, Zhiwei Steven Wu

    Abstract: We present three new algorithms for constructing differentially private synthetic data---a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented…

    Submitted 10 July, 2020; originally announced July 2020.

  26. arXiv:2006.06783  [pdf, other]

    cs.CR cs.LG math.OC stat.ML

    Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent

    Authors: Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta

    Abstract: We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy para…

    Submitted 2 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  27. arXiv:2004.00010  [pdf, other]

    cs.DS cs.CR stat.ML

    The Discrete Gaussian for Differential Privacy

    Authors: Clément L. Canonne, Gautam Kamath, Thomas Steinke

    Abstract: A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset. Unfortunately, using a continuous distribution presents several practical challenges. First and foremost, finite computers cannot exactly represent samples from continuous distributions, and previous work has demonstrated that seemingly innocuous numerical e…

    Submitted 18 January, 2021; v1 submitted 31 March, 2020; originally announced April 2020.

    Comments: Improved time analysis, and generalisation to the multivariate case

  28. arXiv:2001.09122  [pdf, other]

    cs.LG cs.CR cs.DS cs.IT stat.ML

    Reasoning About Generalization via Conditional Mutual Information

    Authors: Thomas Steinke, Lydia Zakynthinou

    Abstract: We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. Our framework ties together existing approaches, including uniform convergence bounds and recent methods for adaptive data analysis. Specifically, we use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the out…

    Submitted 18 June, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    Comments: 58 pages. Changes from previous version: Added discussion on related work and updated references. Simplified part of the proof of Theorem 4.10

  29. arXiv:1906.02830  [pdf, other]

    math.ST cs.CR cs.DS

    Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation

    Authors: Mark Bun, Thomas Steinke

    Abstract: The simplest and most widely applied method for guaranteeing differential privacy is to add instance-independent noise to a statistic of interest that is scaled to its global sensitivity. However, global sensitivity is a worst-case notion that is often too conservative for realized dataset instances. We provide methods for scaling noise in an instance-dependent way and demonstrate that they provid…

    Submitted 6 June, 2019; originally announced June 2019.

  30. arXiv:1905.13229  [pdf, ps, other]

    cs.DS cs.CR cs.LG stat.ML

    Private Hypothesis Selection

    Authors: Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu

    Abstract: We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we deno…

    Submitted 4 January, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: Appeared in NeurIPS 2019. Final version to appear in IEEE Transactions on Information Theory

  31. arXiv:1812.03224  [pdf, other]

    cs.LG stat.ML

    A Hybrid Approach to Privacy-Preserving Federated Learning

    Authors: Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, Yi Zhou

    Abstract: Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training processes does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained mode…

    Submitted 14 August, 2019; v1 submitted 7 December, 2018; originally announced December 2018.

  32. arXiv:1811.03763  [pdf, ps, other]

    cs.DS

    Towards Instance-Optimal Private Query Release

    Authors: Jaroslaw Blasiok, Mark Bun, Aleksandar Nikolov, Thomas Steinke

    Abstract: We study efficient mechanisms for the query release problem in differential privacy: given a workload of $m$ statistical queries, output approximate answers to the queries while satisfying the constraints of differential privacy. In particular, we are interested in mechanisms that optimally adapt to the given workload. Building on the projection mechanism of Nikolov, Talwar, and Zhang, and using t…

    Submitted 8 November, 2018; originally announced November 2018.

    Comments: To appear in SODA 2019

  33. arXiv:1806.06100  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    The Limits of Post-Selection Generalization

    Authors: Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

    Abstract: While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure post hoc generalization (also called robust or post-select…

    Submitted 15 June, 2018; originally announced June 2018.

  34. arXiv:1712.07196  [pdf, ps, other]

    cs.LG cs.CR cs.DS cs.IT

    Calibrating Noise to Variance in Adaptive Data Analysis

    Authors: Vitaly Feldman, Thomas Steinke

    Abstract: Datasets are often used multiple times and each successive analysis may depend on the outcome of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data d…

    Submitted 11 June, 2018; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2018

  35. arXiv:1706.05069  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Generalization for Adaptively-chosen Estimators via Stable Median

    Authors: Vitaly Feldman, Thomas Steinke

    Abstract: Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this pr…

    Submitted 15 June, 2017; originally announced June 2017.

    Comments: To appear in Conference on Learning Theory (COLT) 2017

  36. arXiv:1704.03024  [pdf, other]

    cs.DS cs.CR

    Tight Lower Bounds for Differentially Private Selection

    Authors: Thomas Steinke, Jonathan Ullman

    Abstract: A pervasive task in the differential privacy literature is to select the $k$ items of "highest quality" out of a set of $d$ items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algo…

    Submitted 10 April, 2017; originally announced April 2017.

  37. arXiv:1701.03493  [pdf, other]

    cs.DM cs.DS

    Subgaussian Tail Bounds via Stability Arguments

    Authors: Thomas Steinke, Jonathan Ullman

    Abstract: Sums of independent, bounded random variables concentrate around their expectation approximately as well as a Gaussian of the same variance. Well known results of this form include the Bernstein, Hoeffding, and Chernoff inequalities and many others. We present an alternative proof of these tail bounds based on what we call a stability argument, which avoids bounding the moment generating function or…

    Submitted 21 April, 2017; v1 submitted 12 January, 2017; originally announced January 2017.

  38. arXiv:1605.02065  [pdf, other]

    cs.CR cs.DS cs.IT cs.LG

    Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds

    Authors: Mark Bun, Thomas Steinke

    Abstract: "Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations. We present an alternative formulation of the concept of concentrated differential privacy in terms of the Renyi divergence between the distributions obtained by running an algorithm on neighboring inputs.…

    Submitted 6 May, 2016; originally announced May 2016.

  39. arXiv:1604.04618  [pdf, other]

    cs.CR cs.DS cs.LG

    Make Up Your Mind: The Price of Online Queries in Differential Privacy

    Authors: Mark Bun, Thomas Steinke, Jonathan Ullman

    Abstract: We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer: Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch. Online:…

    Submitted 15 April, 2016; originally announced April 2016.

  40. arXiv:1511.02513  [pdf, other]

    cs.LG cs.CR cs.DS

    Algorithmic Stability for Adaptive Data Analysis

    Authors: Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

    Abstract: Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated the formal stu…

    Submitted 8 November, 2015; originally announced November 2015.

    Comments: This work unifies and subsumes the two arXiv manuscripts arXiv:1503.04843 and arXiv:1504.05800

  41. arXiv:1504.04675  [pdf, other]

    cs.CC

    Pseudorandomness for Read-Once, Constant-Depth Circuits

    Authors: Sitan Chen, Thomas Steinke, Salil Vadhan

    Abstract: For Boolean functions computed by read-once, depth-$D$ circuits with unbounded fan-in over the de Morgan basis, we present an explicit pseudorandom generator with seed length $\tilde{O}(\log^{D+1} n)$. The previous best seed length known for this model was $\tilde{O}(\log^{D+4} n)$, obtained by Trevisan and Xue (CCC `13) for all of $AC^0$ (not just read-once). Our work makes use of Fourier analyti…

    Submitted 18 September, 2015; v1 submitted 17 April, 2015; originally announced April 2015.

  42. arXiv:1503.04843   

    cs.LG cs.DS

    More General Queries and Less Generalization Error in Adaptive Data Analysis

    Authors: Raef Bassily, Adam Smith, Thomas Steinke, Jonathan Ullman

    Abstract: Adaptivity is an important feature of data analysis---typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC '15) and Hardt and Ullman (FOCS '14) initiated the formal stu…

    Submitted 9 November, 2015; v1 submitted 16 March, 2015; originally announced March 2015.

    Comments: This paper was merged with another manuscript and is now subsumed by arXiv:1511.02513

  43. arXiv:1501.06095  [pdf, other]

    cs.DS cs.CR cs.LG

    Between Pure and Approximate Differential Privacy

    Authors: Thomas Steinke, Jonathan Ullman

    Abstract: We show a new lower bound on the sample complexity of $(\varepsilon, δ)$-differentially private algorithms that accurately answer statistical queries on high-dimensional databases. The novelty of our bound is that it depends optimally on the parameter $δ$, which loosely corresponds to the probability that the algorithm fails to be private, and is the first to smoothly interpolate between approxima…

    Submitted 24 January, 2015; originally announced January 2015.

  44. arXiv:1412.2457  [pdf, ps, other]

    cs.CC cs.LG

    Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

    Authors: Mark Bun, Thomas Steinke

    Abstract: Polynomial approximations to Boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underlie algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstl…

    Submitted 8 December, 2014; originally announced December 2014.

    Comments: 22 pages

  45. arXiv:1410.1228  [pdf, other]

    cs.CR cs.DS cs.LG

    Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery

    Authors: Thomas Steinke, Jonathan Ullman

    Abstract: We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given $n$ samples from an unknown distribution. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is "close" to the correct expectation over the d…

    Submitted 20 February, 2015; v1 submitted 5 October, 2014; originally announced October 2014.

  46. arXiv:1405.7028  [pdf, ps, other]

    cs.CC

    Pseudorandomness and Fourier Growth Bounds for Width 3 Branching Programs

    Authors: Thomas Steinke, Salil Vadhan, Andrew Wan

    Abstract: We present an explicit pseudorandom generator for oblivious, read-once, width-$3$ branching programs, which can read their input bits in any order. The generator has seed length $\tilde{O}(\log^3 n)$. The previously best known seed length for this model is $n^{1/2+o(1)}$ due to Impagliazzo, Meka, and Zuckerman (FOCS '12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RAN…

    Submitted 27 May, 2014; originally announced May 2014.

    Comments: arXiv admin note: text overlap with arXiv:1306.3004

  47. arXiv:1306.3004  [pdf, ps, other]

    cs.CC

    Pseudorandomness for Regular Branching Programs via Fourier Analysis

    Authors: Omer Reingold, Thomas Steinke, Salil Vadhan

    Abstract: We present an explicit pseudorandom generator for oblivious, read-once, permutation branching programs of constant width that can read their input bits in any order. The seed length is $O(\log^2 n)$, where $n$ is the length of the branching program. The previous best seed length known for this model was $n^{1/2+o(1)}$, which follows as a special case of a generator due to Impagliazzo, Meka, and Zu…

    Submitted 19 June, 2013; v1 submitted 12 June, 2013; originally announced June 2013.

    Comments: RANDOM 2013

  48. arXiv:1102.5540  [pdf, other]

    cs.DS

    Hierarchical Heavy Hitters with the Space Saving Algorithm

    Authors: Michael Mitzenmacher, Thomas Steinke, Justin Thaler

    Abstract: The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged in a hierarchy. This problem has applications to network traffic monitoring, anomaly detection, and DDoS detection. We present a new streaming approximation algorithm for computing Hierarchical Heavy Hitters that has several advantages over previous algorithms. It improves on the worst-case time and space…

    Submitted 9 August, 2011; v1 submitted 27 February, 2011; originally announced February 2011.

    Comments: 22 pages, 18 figures

  49. arXiv:1006.0405  [pdf, ps, other]

    math.NA cs.CR cs.DS

    A Rigorous Extension of the Schönhage-Strassen Integer Multiplication Algorithm Using Complex Interval Arithmetic

    Authors: Thomas Steinke, Raazesh Sainudiin

    Abstract: Multiplication of n-digit integers by long multiplication requires O(n^2) operations and can be time-consuming. In 1970 A. Schoenhage and V. Strassen published an algorithm capable of performing the task with only O(n log(n)) arithmetic operations over the complex field C; naturally, finite-precision approximations to C are used and rounding errors need to be accounted for. Overall, using variabl…

    Submitted 2 June, 2010; originally announced June 2010.

    Journal ref: EPTCS 24, 2010, pp. 151-159