Skip to main content

Showing 1–50 of 63 results for author: Thakurta, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2507.01129  [pdf, ps, other

    cs.LG cs.CR

    On Design Principles for Private Adaptive Optimizers

    Authors: Arun Ganesh, Brendan McMahan, Abhradeep Thakurta

    Abstract: The spherical noise added to gradients in differentially private (DP) training undermines the performance of adaptive optimizers like AdaGrad and Adam, and hence many recent works have proposed algorithms to address this challenge. However, the empirical results in these works focus on simple tasks and models and the conclusions may not generalize to model training in practice. In this paper we su… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: PPML 2025

  2. arXiv:2506.08201  [pdf, ps, other

    cs.LG cs.CR

    Correlated Noise Mechanisms for Differentially Private Learning

    Authors: Krishna Pillutla, Jalaj Upadhyay, Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Arun Ganesh, Monika Henzinger, Jonathan Katz, Ryan McKenna, H. Brendan McMahan, Keith Rush, Thomas Steinke, Abhradeep Thakurta

    Abstract: This monograph explores the design and analysis of correlated noise mechanisms for differential privacy (DP), focusing on their application to private training of AI and machine learning models via the core primitive of estimation of weighted prefix sums. While typical DP mechanisms inject independent noise into each step of a stochastic gradient (SGD) learning algorithm in order to protect the pr… ▽ More

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 212 pages

  3. arXiv:2506.00201  [pdf, ps, other

    cs.CR

    Hush! Protecting Secrets During Model Training: An Indistinguishability Approach

    Authors: Arun Ganesh, Brendan McMahan, Milad Nasr, Thomas Steinke, Abhradeep Thakurta

    Abstract: We consider the problem of secret protection, in which a business or organization wishes to train a model on their own data, while attempting to not leak secrets potentially contained in that data via the model. The standard method for training models to avoid memorization of secret information is via differential privacy (DP). However, DP requires a large loss in utility or a large dataset to ach… ▽ More

    Submitted 30 May, 2025; originally announced June 2025.

  4. arXiv:2411.13598  [pdf, other

    cs.CR cs.LG

    Preserving Expert-Level Privacy in Offline Reinforcement Learning

    Authors: Navodita Sharma, Vishnu Vinod, Abhradeep Thakurta, Alekh Agarwal, Borja Balle, Christoph Dann, Aravindan Raghuveer

    Abstract: The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) by interacting with an environment. However, the individual experts may be privacy-sensitive in that the learnt policy may retain information about their precise choices. In some domains like personalized retrieval, advertising and healthcare,… ▽ More

    Submitted 18 November, 2024; originally announced November 2024.

  5. arXiv:2410.06266  [pdf, other

    cs.CR

    Near Exact Privacy Amplification for Matrix Mechanisms

    Authors: Christopher A. Choquette-Choo, Arun Ganesh, Saminul Haque, Thomas Steinke, Abhradeep Thakurta

    Abstract: We study the problem of computing the privacy parameters for DP machine learning when using privacy amplification via random batching and noise correlated across rounds via a correlation matrix $\textbf{C}$ (i.e., the matrix mechanism). Past work on this problem either only applied to banded $\textbf{C}$, or gave loose privacy parameters. In this work, we give a framework for computing near-exact… ▽ More

    Submitted 20 March, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: To appear in ICLR 2025

  6. arXiv:2410.06186  [pdf, other

    cs.CR cs.LG

    The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD

    Authors: Thomas Steinke, Milad Nasr, Arun Ganesh, Borja Balle, Christopher A. Choquette-Choo, Matthew Jagielski, Jamie Hayes, Abhradeep Guha Thakurta, Adam Smith, Andreas Terzis

    Abstract: We propose a simple heuristic privacy analysis of noisy clipped stochastic gradient descent (DP-SGD) in the setting where only the last iterate is released and the intermediate iterates remain hidden. Namely, our heuristic assumes a linear structure for the model. We show experimentally that our heuristic is predictive of the outcome of privacy auditing applied to various training procedures. Th… ▽ More

    Submitted 6 March, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 camera-ready version

  7. arXiv:2410.01948  [pdf, other

    cs.CR

    Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models

    Authors: Hongbin Liu, Lun Wang, Om Thakkar, Abhradeep Thakurta, Arun Narayanan

    Abstract: Large ASR models can inadvertently leak sensitive information, which can be mitigated by formal privacy measures like differential privacy (DP). However, traditional DP training is computationally expensive, and can hurt model performance. Our study explores DP parameter-efficient fine-tuning as a way to mitigate privacy risks with smaller computation and performance costs for ASR models. Through… ▽ More

    Submitted 2 October, 2024; originally announced October 2024.

  8. arXiv:2409.13953  [pdf, other

    cs.SD cs.CR cs.LG eess.AS

    Training Large ASR Encoders with Differential Privacy

    Authors: Geeticka Chauhan, Steve Chien, Om Thakkar, Abhradeep Thakurta, Arun Narayanan

    Abstract: Self-supervised learning (SSL) methods for large speech models have proven to be highly effective at ASR. With the interest in public deployment of large pre-trained models, there is a rising concern for unintended memorization and leakage of sensitive data points from the training data. In this paper, we apply differentially private (DP) pre-training to a SOTA Conformer-based encoder, and study i… ▽ More

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: In proceedings of the IEEE Spoken Language Technologies Workshop, 2024

  9. arXiv:2406.02716  [pdf, ps, other

    cs.LG cs.CR

    Optimal Rates for $O(1)$-Smooth DP-SCO with a Single Epoch and Large Batches

    Authors: Christopher A. Choquette-Choo, Arun Ganesh, Abhradeep Thakurta

    Abstract: In this paper we revisit the DP stochastic convex optimization (SCO) problem. For convex smooth losses, it is well-known that the canonical DP-SGD (stochastic gradient descent) achieves the optimal rate of $O\left(\frac{LR}{\sqrt{n}} + \frac{LR \sqrt{p \log(1/δ)}}{εn}\right)$ under $(ε, δ)$-DP, and also well-known that variants of DP-SGD can achieve the optimal rate in a single epoch. However, the… ▽ More

    Submitted 2 October, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Differences from v1: Stronger results assuming global minimizer in constraint set, more detailed comparison to past work, cut empirical section

  10. arXiv:2404.16706  [pdf, other

    cs.DS cs.CC cs.CR cs.LG

    Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy

    Authors: Krishnamurthy Dvijotham, H. Brendan McMahan, Krishna Pillutla, Thomas Steinke, Abhradeep Thakurta

    Abstract: In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differential… ▽ More

    Submitted 6 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2402.13531  [pdf, other

    cs.LG cs.CR

    Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation

    Authors: Gavin Brown, Krishnamurthy Dvijotham, Georgina Evans, Daogao Liu, Adam Smith, Abhradeep Thakurta

    Abstract: We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss. Under modest assumptions on the input, we characterize the distribution of the iterate at each time step. Our analysis leads to new results on the algorithm's accuracy: for a proper fixed choice of hyperparameters, the sample complexity depends only linearly on… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 22 pages, 11 figures

  12. arXiv:2312.11534  [pdf, ps, other

    cs.CR cs.DS cs.LG stat.ML

    Improved Differentially Private and Lazy Online Convex Optimization

    Authors: Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta

    Abstract: We study the task of $(ε, δ)$-differentially private online convex optimization (OCO). In the online setting, the release of each distinct decision or iterate carries with it the potential for privacy loss. This problem has a long history of research starting with Jain et al. [2012] and the best known results for the regime of ε not being very small are presented in Agarwal et al. [2023]. In this… ▽ More

    Submitted 20 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

  13. arXiv:2310.15526  [pdf, other

    cs.LG cs.CR

    Privacy Amplification for Matrix Mechanisms

    Authors: Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

    Abstract: Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In… ▽ More

    Submitted 4 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Appearing in ICLR 2024. Changes made to match the conference version of the paper

  14. arXiv:2310.15454  [pdf, other

    cs.LG cs.CR stat.ML

    Private Learning with Public Features

    Authors: Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

    Abstract: We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not re… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

  15. arXiv:2310.06771  [pdf, other

    cs.LG cs.AI cs.CR math.OC

    Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

    Authors: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

    Abstract: Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choic… ▽ More

    Submitted 7 May, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, and Krishna Pillutla contributed equally

    Journal ref: ICLR 2024

  16. arXiv:2306.08153  [pdf, other

    cs.LG cs.CR

    (Amplified) Banded Matrix Factorization: A unified approach to private training

    Authors: Christopher A. Choquette-Choo, Arun Ganesh, Ryan McKenna, H. Brendan McMahan, Keith Rush, Abhradeep Thakurta, Zheng Xu

    Abstract: Matrix factorization (MF) mechanisms for differential privacy (DP) have substantially improved the state-of-the-art in privacy-utility-computation tradeoffs for ML applications in a variety of scenarios, but in both the centralized and federated settings there remain instances where either MF cannot be easily applied, or other algorithms provide better tradeoffs (typically, as $ε$ becomes small).… ▽ More

    Submitted 1 November, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 34 pages, 13 figures

  17. arXiv:2305.18393  [pdf, other

    cs.LG cs.CR

    Training Private Models That Know What They Don't Know

    Authors: Stephan Rabanser, Anvith Thudi, Abhradeep Thakurta, Krishnamurthy Dvijotham, Nicolas Papernot

    Abstract: Training reliable deep learning models which avoid making overconfident but incorrect predictions is a longstanding challenge. This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selec… ▽ More

    Submitted 28 May, 2023; originally announced May 2023.

  18. arXiv:2305.13209  [pdf, other

    cs.LG cs.CR math.OC stat.ML

    Faster Differentially Private Convex Optimization via Second-Order Methods

    Authors: Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

    Abstract: Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than first-order methods like gradient descent. In this work, we investigate the prospect of using the second-order information from the loss function to accelerate D… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

  19. arXiv:2304.06929  [pdf

    cs.CR

    Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

    Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

    Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus of advancing DP's deployment in real-world applications. Key points and high-level contents of the article were originated from the discussions from "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 20… ▽ More

    Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  20. arXiv:2303.18086  [pdf, other

    cs.CR cs.DB

    Differentially Private Stream Processing at Scale

    Authors: Bing Zhang, Vadym Doroshenko, Peter Kairouz, Thomas Steinke, Abhradeep Thakurta, Ziyin Ma, Eidan Cohen, Himani Apte, Jodi Spacek

    Abstract: We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic… ▽ More

    Submitted 12 July, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

  21. arXiv:2303.00654  [pdf, other

    cs.LG cs.CR stat.ML

    How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

    Authors: Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta

    Abstract: ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP t… ▽ More

    Submitted 31 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 77 (2023) 1113-1201

  22. arXiv:2302.09699  [pdf, ps, other

    cs.LG cs.CR math.OC stat.ML

    Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks

    Authors: Arun Ganesh, Daogao Liu, Sewoong Oh, Abhradeep Thakurta

    Abstract: We consider the problem of minimizing a non-convex objective while preserving the privacy of the examples in the training data. Building upon the previous variance-reduced algorithm SpiderBoost, we introduce a new framework that utilizes two different kinds of gradient oracles. The first kind of oracles can estimate the gradient of one point, and the second kind of oracles, less precise and more c… ▽ More

    Submitted 19 February, 2023; originally announced February 2023.

  23. arXiv:2302.09483  [pdf, other

    cs.LG

    Why Is Public Pretraining Necessary for Private Model Training?

    Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang

    Abstract: In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretrai… ▽ More

    Submitted 19 February, 2023; originally announced February 2023.

  24. arXiv:2302.07975  [pdf, other

    cs.LG cs.CR stat.ML

    Multi-Task Differential Privacy Under Distribution Skew

    Authors: Walid Krichene, Prateek Jain, Shuang Song, Mukund Sundararajan, Abhradeep Thakurta, Li Zhang

    Abstract: We study the problem of multi-task learning under user-level differential privacy, in which $n$ users contribute data to $m$ tasks, each involving a subset of users. One important aspect of the problem, that can significantly impact quality, is the distribution skew among tasks. Certain tasks may have much fewer data samples than others, making them more susceptible to the noise added for privacy.… ▽ More

    Submitted 15 February, 2023; originally announced February 2023.

  25. arXiv:2211.13403  [pdf, other

    cs.LG cs.CR cs.CV

    Differentially Private Image Classification from Features

    Authors: Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific c… ▽ More

    Submitted 23 November, 2022; originally announced November 2022.

  26. arXiv:2211.06530  [pdf, other

    cs.LG cs.CR cs.DS stat.ML

    Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning

    Authors: Christopher A. Choquette-Choo, H. Brendan McMahan, Keith Rush, Abhradeep Thakurta

    Abstract: We introduce new differentially private (DP) mechanisms for gradient-based machine learning (ML) with multiple passes (epochs) over a dataset, substantially improving the achievable privacy-utility-computation tradeoffs. We formalize the problem of DP mechanisms for adaptive streams with multiple participations and introduce a non-trivial extension of online matrix factorization DP mechanisms to o… ▽ More

    Submitted 8 June, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 9 pages main-text, 3 figures. 40 pages with 13 figures total

  27. arXiv:2210.17520  [pdf, ps, other

    cs.CR cs.LG

    Fully Adaptive Composition for Gaussian Differential Privacy

    Authors: Adam Smith, Abhradeep Thakurta

    Abstract: We show that Gaussian Differential Privacy, a variant of differential privacy tailored to the analysis of Gaussian noise addition, composes gracefully even in the presence of a fully adaptive analyst. Such an analyst selects mechanisms (to be run on a sensitive data set) and their privacy budgets adaptively, that is, based on the answers from other mechanisms run previously on the same data set. I… ▽ More

    Submitted 31 October, 2022; originally announced October 2022.

  28. arXiv:2210.03505  [pdf, other

    cs.LG cs.CR math.OC stat.ML

    Sample-Efficient Personalization: Modeling User Parameters as Low Rank Plus Sparse Components

    Authors: Soumyabrata Pal, Prateek Varshney, Prateek Jain, Abhradeep Guha Thakurta, Gagan Madan, Gaurav Aggarwal, Pradeep Shenoy, Gaurav Srivastava

    Abstract: Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems. Standard personalization approaches involve learning a user/domain specific embedding that is fed into a fixed global model which can be limiting. On the other hand, personalizing/fine-tuning model itself for each user/domain -- a.k.a meta-learning -- has… ▽ More

    Submitted 5 September, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: 104 pages, 7 figures, 2 Tables

  29. arXiv:2210.02156  [pdf, ps, other

    cs.LG cs.CR

    Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search

    Authors: Yannis Cattan, Christopher A. Choquette-Choo, Nicolas Papernot, Abhradeep Thakurta

    Abstract: Models need to be trained with privacy-preserving learning algorithms to prevent leakage of possibly sensitive information contained in their training data. However, canonical algorithms like differentially private stochastic gradient descent (DP-SGD) do not benefit from model scale in the same way as non-private learning. This manifests itself in the form of unappealing tradeoffs between privacy… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

  30. arXiv:2210.01864  [pdf, other

    cs.LG cs.CR

    Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints

    Authors: Virat Shejwalkar, Arun Ganesh, Rajiv Mathews, Yarong Mu, Shuang Song, Om Thakkar, Abhradeep Thakurta, Xinyi Zheng

    Abstract: In this work, we focus on improving the accuracy-variance trade-off for state-of-the-art differentially private machine learning (DP ML) methods. First, we design a general framework that uses aggregates of intermediate checkpoints \emph{during training} to increase the accuracy of DP ML techniques. Specifically, we demonstrate that training over aggregates can provide significant gains in predict… ▽ More

    Submitted 17 September, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: New results on pCVR task

  31. arXiv:2207.04686  [pdf, ps, other

    cs.LG cs.CR math.OC stat.ML

    (Nearly) Optimal Private Linear Regression via Adaptive Clipping

    Authors: Prateek Varshney, Abhradeep Thakurta, Prateek Jain

    Abstract: We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement. Noise is added for DP but the noise standard deviation is estimated online. Compared to existing… ▽ More

    Submitted 12 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 41 Pages, Accepted in the 35th Annual Conference on Learning Theory (COLT 2022)

  32. arXiv:2207.02794  [pdf, ps, other

    cs.DS cs.CR cs.LG math.MG stat.ML

    Private Matrix Approximation and Geometry of Unitary Orbits

    Authors: Oren Mangoubi, Yikai Wu, Satyen Kale, Abhradeep Guha Thakurta, Nisheeth K. Vishnoi

    Abstract: Consider the following optimization problem: Given $n \times n$ matrices $A$ and $Λ$, maximize $\langle A, UΛU^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spectrum is the same as $Λ$ and, by setting $Λ$ to be appropriate diagonal matrices, one can recover matrix approximation problems such as PCA and rank-$k$ approximat… ▽ More

    Submitted 6 July, 2022; originally announced July 2022.

    Journal ref: Proceedings of Thirty Fifth Conference on Learning Theory (COLT), PMLR 178:3547-3588, 2022

  33. arXiv:2207.00160  [pdf, other

    cs.LG cs.CR stat.ML

    When Does Differentially Private Learning Not Suffer in High Dimensions?

    Authors: Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta

    Abstract: Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following researc… ▽ More

    Submitted 26 October, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: 26 pages; v3 includes additional experiments and clarification

  34. arXiv:2207.00099  [pdf, other

    cs.LG

    Measuring Forgetting of Memorized Training Examples

    Authors: Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang

    Abstract: Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what… ▽ More

    Submitted 9 May, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: Appeared at ICLR '23, 22 pages, 12 figures

  35. arXiv:2205.02973  [pdf, other

    cs.LG cs.CR cs.CV

    Large Scale Transfer Learning for Differentially Private Image Classification

    Authors: Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Unfortunately, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private trai… ▽ More

    Submitted 20 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

  36. arXiv:2204.01585  [pdf, ps, other

    cs.LG cs.CR math.OC

    Differentially Private Sampling from Rashomon Sets, and the Universality of Langevin Diffusion for Convex Optimization

    Authors: Arun Ganesh, Abhradeep Thakurta, Jalaj Upadhyay

    Abstract: In this paper we provide an algorithmic framework based on Langevin diffusion (LD) and its corresponding discretizations that allow us to simultaneously obtain: i) An algorithm for sampling from the exponential mechanism, whose privacy analysis does not depend on convexity and which can be stopped at anytime without compromising privacy, and ii) tight uniform stability guarantees for the exponenti… ▽ More

    Submitted 28 August, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Appeared in COLT 2023. For ease of presentation, some results appear in the previous version of this paper on arXiv (v3) that do not appear in this version, nor are subsumed by results in this version. Please see Section 1.4 for more details

  37. arXiv:2202.08312  [pdf, other

    cs.LG math.OC

    Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams

    Authors: Sergey Denisov, Brendan McMahan, Keith Rush, Adam Smith, Abhradeep Guha Thakurta

    Abstract: Motivated by recent applications requiring differential privacy over adaptive streams, we investigate the question of optimal instantiations of the matrix mechanism in this setting. We prove fundamental theoretical results on the applicability of matrix factorizations to adaptive streams, and provide a parameter-free fixed-point algorithm for computing optimal factorizations. We instantiate this f… ▽ More

    Submitted 17 January, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: 33 pages, 6 figures. Associated code at https://github.com/google-research/federated/tree/master/dp_matrix_factorization

  38. arXiv:2201.12328  [pdf, other

    cs.LG

    Toward Training at ImageNet Scale with Differential Privacy

    Authors: Alexey Kurakin, Shuang Song, Steve Chien, Roxana Geambasu, Andreas Terzis, Abhradeep Thakurta

    Abstract: Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set ou… ▽ More

    Submitted 8 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 25 pages, 7 figures. Code available at https://github.com/google-research/dp-imagenet

  39. arXiv:2112.00193  [pdf, other

    cs.LG cs.CR

    Public Data-Assisted Mirror Descent for Private Model Training

    Authors: Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta

    Abstract: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by t… ▽ More

    Submitted 27 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: 20 pages, 8 figures, 3 tables

  40. arXiv:2111.15521  [pdf, other

    cs.LG cs.CR

    Node-Level Differentially Private Graph Neural Networks

    Authors: Ameya Daigavane, Gagan Madan, Aditya Sinha, Abhradeep Guha Thakurta, Gaurav Aggarwal, Prateek Jain

    Abstract: Graph Neural Networks (GNNs) are a popular technique for modelling graph-structured data and computing node-level representations via aggregation of information from the neighborhood of each node. However, this aggregation implies an increased risk of revealing sensitive information, as a node can participate in the inference for multiple nodes. This implies that standard privacy-preserving machin… ▽ More

    Submitted 26 August, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: 20 pages, 4 figures

  41. arXiv:2107.09802  [pdf, other

    cs.LG cs.CR stat.ML

    Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates

    Authors: Steve Chien, Prateek Jain, Walid Krichene, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

    Abstract: We study the problem of differentially private (DP) matrix completion under user-level privacy. We design a joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) the best known privacy/utility trade-off both theoretically, as well as on bench… ▽ More

    Submitted 20 July, 2021; originally announced July 2021.

  42. arXiv:2103.00039  [pdf, other

    cs.CR cs.LG

    Practical and Private (Deep) Learning without Sampling or Shuffling

    Authors: Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu

    Abstract: We consider training models with differential privacy (DP) using mini-batch gradients. The existing state-of-the-art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in importan… ▽ More

    Submitted 10 December, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

  43. arXiv:2101.04535  [pdf, other

    cs.LG cs.CR

    Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning

    Authors: Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini

    Abstract: Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example.If observing the training algorithm does not meaningfully increase the adversary's odds of successfull… ▽ More

    Submitted 11 January, 2021; originally announced January 2021.

  44. arXiv:2011.05315  [pdf, other

    cs.CR cs.CV cs.LG

    Is Private Learning Possible with Instance Encoding?

    Authors: Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, Florian Tramer

    Abstract: A private machine learning algorithm hides as much as possible about its training data while still preserving accuracy. In this work, we study whether a non-private learning algorithm can be made private by relying on an instance-encoding mechanism that modifies the training inputs before feeding them to a normal learner. We formalize both the notion of instance encoding and its privacy by providi… ▽ More

    Submitted 27 April, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

  45. arXiv:2008.06570  [pdf, ps, other

    cs.LG stat.ML

    Fast Dimension Independent Private AdaGrad on Publicly Estimated Subspaces

    Authors: Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta

    Abstract: We revisit the problem of empirical risk minimziation (ERM) with differential privacy. We show that noisy AdaGrad, given appropriate knowledge and conditions on the subspace from which gradients can be drawn, achieves a regret comparable to traditional AdaGrad plus a well-controlled term due to noise. We show a convergence rate of $O(\text{Tr}(G_T)/T)$, where $G_T$ captures the geometry of the gra… ▽ More

    Submitted 30 January, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

  46. arXiv:2007.14191  [pdf, other

    stat.ML cs.CR cs.LG

    Tempered Sigmoid Activations for Deep Learning with Differential Privacy

    Authors: Nicolas Papernot, Abhradeep Thakurta, Shuang Song, Steve Chien, Úlfar Erlingsson

    Abstract: Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to l… ▽ More

    Submitted 28 July, 2020; originally announced July 2020.

  47. arXiv:2007.06605  [pdf, other

    cs.LG cs.CR stat.ML

    Privacy Amplification via Random Check-Ins

    Authors: Borja Balle, Peter Kairouz, H. Brendan McMahan, Om Thakkar, Abhradeep Thakurta

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data. Two standard approaches, privacy amplification by subsampling, and privacy amplification by shuffling, permit adding lower noise in DP-SGD than via naïve schemes. A key assumption in both these approaches is that the elements in the data set can be u… ▽ More

    Submitted 30 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Updated proof for $(ε_0, δ_0)$-DP local randomizers

  48. arXiv:2006.06783  [pdf, other

    cs.CR cs.LG math.OC stat.ML

    Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent

    Authors: Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta

    Abstract: We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy para… ▽ More

    Submitted 2 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  49. arXiv:2003.12020  [pdf, ps, other

    cs.LG cs.CR stat.ML

    A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks

    Authors: Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Abhradeep Thakurta

    Abstract: Poisoning attacks have emerged as a significant security threat to machine learning algorithms. It has been demonstrated that adversaries who make small changes to the training set, such as adding specially crafted data points, can hurt the performance of the output model. Some of the stronger poisoning attacks require the full knowledge of the training data. This leaves open the possibility of ac… ▽ More

    Submitted 13 December, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

  50. arXiv:2002.04049  [pdf, other

    cs.CR

    Guidelines for Implementing and Auditing Differentially Private Systems

    Authors: Daniel Kifer, Solomon Messing, Aaron Roth, Abhradeep Thakurta, Danfeng Zhang

    Abstract: Differential privacy is an information theoretic constraint on algorithms and code. It provides quantification of privacy leakage and formal privacy guarantees that are currently considered the gold standard in privacy protections. In this paper we provide an initial set of "best practices" for developing differentially private platforms, techniques for unit testing that are specific to differenti… ▽ More

    Submitted 12 May, 2020; v1 submitted 10 February, 2020; originally announced February 2020.