Showing 1–50 of 54 results for author: Thakurta, A

  1. arXiv:2404.16706  [pdf, other]

    cs.DS cs.CC cs.CR cs.LG

    Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy

    Authors: Krishnamurthy Dvijotham, H. Brendan McMahan, Krishna Pillutla, Thomas Steinke, Abhradeep Thakurta

    Abstract: In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differential…

    Submitted 6 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  2. arXiv:2402.13531  [pdf, other]

    cs.LG cs.CR

    Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation

    Authors: Gavin Brown, Krishnamurthy Dvijotham, Georgina Evans, Daogao Liu, Adam Smith, Abhradeep Thakurta

    Abstract: We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss. Under modest assumptions on the input, we characterize the distribution of the iterate at each time step. Our analysis leads to new results on the algorithm's accuracy: for a proper fixed choice of hyperparameters, the sample complexity depends only linearly on…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 22 pages, 11 figures

  3. arXiv:2312.11534  [pdf, ps, other]

    cs.CR cs.DS cs.LG stat.ML

    Improved Differentially Private and Lazy Online Convex Optimization

    Authors: Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta

    Abstract: We study the task of $(ε, δ)$-differentially private online convex optimization (OCO). In the online setting, the release of each distinct decision or iterate carries with it the potential for privacy loss. This problem has a long history of research starting with Jain et al. [2012] and the best known results for the regime of ε not being very small are presented in Agarwal et al. [2023]. In this…

    Submitted 20 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

  4. arXiv:2310.15526  [pdf, other]

    cs.LG cs.CR

    Privacy Amplification for Matrix Mechanisms

    Authors: Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

    Abstract: Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In…

    Submitted 4 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Appearing in ICLR 2024. Changes made to match the conference version of the paper

  5. arXiv:2310.15454  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Learning with Public Features

    Authors: Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

    Abstract: We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not re…

    Submitted 23 October, 2023; originally announced October 2023.

  6. arXiv:2310.06771  [pdf, other]

    cs.LG cs.AI cs.CR math.OC

    Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

    Authors: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

    Abstract: Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choic…

    Submitted 7 May, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, and Krishna Pillutla contributed equally

    Journal ref: ICLR 2024

  7. arXiv:2306.08153  [pdf, other]

    cs.LG cs.CR

    (Amplified) Banded Matrix Factorization: A unified approach to private training

    Authors: Christopher A. Choquette-Choo, Arun Ganesh, Ryan McKenna, H. Brendan McMahan, Keith Rush, Abhradeep Thakurta, Zheng Xu

    Abstract: Matrix factorization (MF) mechanisms for differential privacy (DP) have substantially improved the state-of-the-art in privacy-utility-computation tradeoffs for ML applications in a variety of scenarios, but in both the centralized and federated settings there remain instances where either MF cannot be easily applied, or other algorithms provide better tradeoffs (typically, as $ε$ becomes small).…

    Submitted 1 November, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 34 pages, 13 figures

  8. arXiv:2305.18393  [pdf, other]

    cs.LG cs.CR

    Training Private Models That Know What They Don't Know

    Authors: Stephan Rabanser, Anvith Thudi, Abhradeep Thakurta, Krishnamurthy Dvijotham, Nicolas Papernot

    Abstract: Training reliable deep learning models which avoid making overconfident but incorrect predictions is a longstanding challenge. This challenge is further exacerbated when learning has to be differentially private: protection provided to sensitive data comes at the price of injecting additional randomness into the learning process. In this work, we conduct a thorough empirical investigation of selec…

    Submitted 28 May, 2023; originally announced May 2023.

  9. arXiv:2305.13209  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Faster Differentially Private Convex Optimization via Second-Order Methods

    Authors: Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

    Abstract: Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than first-order methods like gradient descent. In this work, we investigate the prospect of using the second-order information from the loss function to accelerate D…

    Submitted 22 May, 2023; originally announced May 2023.

  10. arXiv:2304.06929  [pdf]

    cs.CR

    Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

    Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

    Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from the discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 20…

    Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  11. arXiv:2303.18086  [pdf, other]

    cs.CR cs.DB

    Differentially Private Stream Processing at Scale

    Authors: Bing Zhang, Vadym Doroshenko, Peter Kairouz, Thomas Steinke, Abhradeep Thakurta, Ziyin Ma, Eidan Cohen, Himani Apte, Jodi Spacek

    Abstract: We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic…

    Submitted 4 April, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

  12. arXiv:2303.00654  [pdf, other]

    cs.LG cs.CR stat.ML

    How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

    Authors: Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta

    Abstract: ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP t…

    Submitted 31 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 77 (2023) 1113-1201

  13. arXiv:2302.09699  [pdf, ps, other]

    cs.LG cs.CR math.OC stat.ML

    Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks

    Authors: Arun Ganesh, Daogao Liu, Sewoong Oh, Abhradeep Thakurta

    Abstract: We consider the problem of minimizing a non-convex objective while preserving the privacy of the examples in the training data. Building upon the previous variance-reduced algorithm SpiderBoost, we introduce a new framework that utilizes two different kinds of gradient oracles. The first kind of oracles can estimate the gradient of one point, and the second kind of oracles, less precise and more c…

    Submitted 19 February, 2023; originally announced February 2023.

  14. arXiv:2302.09483  [pdf, other]

    cs.LG

    Why Is Public Pretraining Necessary for Private Model Training?

    Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang

    Abstract: In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretrai…

    Submitted 19 February, 2023; originally announced February 2023.

  15. arXiv:2302.07975  [pdf, other]

    cs.LG cs.CR stat.ML

    Multi-Task Differential Privacy Under Distribution Skew

    Authors: Walid Krichene, Prateek Jain, Shuang Song, Mukund Sundararajan, Abhradeep Thakurta, Li Zhang

    Abstract: We study the problem of multi-task learning under user-level differential privacy, in which $n$ users contribute data to $m$ tasks, each involving a subset of users. One important aspect of the problem, that can significantly impact quality, is the distribution skew among tasks. Certain tasks may have much fewer data samples than others, making them more susceptible to the noise added for privacy.…

    Submitted 15 February, 2023; originally announced February 2023.

  16. arXiv:2211.13403  [pdf, other]

    cs.LG cs.CR cs.CV

    Differentially Private Image Classification from Features

    Authors: Harsh Mehta, Walid Krichene, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific c…

    Submitted 23 November, 2022; originally announced November 2022.

  17. arXiv:2211.06530  [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning

    Authors: Christopher A. Choquette-Choo, H. Brendan McMahan, Keith Rush, Abhradeep Thakurta

    Abstract: We introduce new differentially private (DP) mechanisms for gradient-based machine learning (ML) with multiple passes (epochs) over a dataset, substantially improving the achievable privacy-utility-computation tradeoffs. We formalize the problem of DP mechanisms for adaptive streams with multiple participations and introduce a non-trivial extension of online matrix factorization DP mechanisms to o…

    Submitted 8 June, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 9 pages main-text, 3 figures. 40 pages with 13 figures total

  18. arXiv:2210.17520  [pdf, ps, other]

    cs.CR cs.LG

    Fully Adaptive Composition for Gaussian Differential Privacy

    Authors: Adam Smith, Abhradeep Thakurta

    Abstract: We show that Gaussian Differential Privacy, a variant of differential privacy tailored to the analysis of Gaussian noise addition, composes gracefully even in the presence of a fully adaptive analyst. Such an analyst selects mechanisms (to be run on a sensitive data set) and their privacy budgets adaptively, that is, based on the answers from other mechanisms run previously on the same data set. I…

    Submitted 31 October, 2022; originally announced October 2022.

  19. arXiv:2210.03505  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Sample-Efficient Personalization: Modeling User Parameters as Low Rank Plus Sparse Components

    Authors: Soumyabrata Pal, Prateek Varshney, Prateek Jain, Abhradeep Guha Thakurta, Gagan Madan, Gaurav Aggarwal, Pradeep Shenoy, Gaurav Srivastava

    Abstract: Personalization of machine learning (ML) predictions for individual users/domains/enterprises is critical for practical recommendation systems. Standard personalization approaches involve learning a user/domain specific embedding that is fed into a fixed global model which can be limiting. On the other hand, personalizing/fine-tuning model itself for each user/domain -- a.k.a meta-learning -- has…

    Submitted 5 September, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: 104 pages, 7 figures, 2 Tables

  20. arXiv:2210.02156  [pdf, ps, other]

    cs.LG cs.CR

    Fine-Tuning with Differential Privacy Necessitates an Additional Hyperparameter Search

    Authors: Yannis Cattan, Christopher A. Choquette-Choo, Nicolas Papernot, Abhradeep Thakurta

    Abstract: Models need to be trained with privacy-preserving learning algorithms to prevent leakage of possibly sensitive information contained in their training data. However, canonical algorithms like differentially private stochastic gradient descent (DP-SGD) do not benefit from model scale in the same way as non-private learning. This manifests itself in the form of unappealing tradeoffs between privacy…

    Submitted 5 October, 2022; originally announced October 2022.

  21. arXiv:2210.01864  [pdf, other]

    cs.LG cs.CR

    Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints

    Authors: Virat Shejwalkar, Arun Ganesh, Rajiv Mathews, Om Thakkar, Abhradeep Thakurta

    Abstract: All state-of-the-art (SOTA) differentially private machine learning (DP ML) methods are iterative in nature, and their privacy analyses allow publicly releasing the intermediate training checkpoints. However, DP ML benchmarks, and even practical deployments, typically use only the final training checkpoint to make predictions. In this work, for the first time, we comprehensively explore various me…

    Submitted 4 October, 2022; originally announced October 2022.

  22. arXiv:2207.04686  [pdf, ps, other]

    cs.LG cs.CR math.OC stat.ML

    (Nearly) Optimal Private Linear Regression via Adaptive Clipping

    Authors: Prateek Varshney, Abhradeep Thakurta, Prateek Jain

    Abstract: We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement. Noise is added for DP but the noise standard deviation is estimated online. Compared to existing…

    Submitted 12 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 41 Pages, Accepted in the 35th Annual Conference on Learning Theory (COLT 2022)

  23. arXiv:2207.02794  [pdf, ps, other]

    cs.DS cs.CR cs.LG math.MG stat.ML

    Private Matrix Approximation and Geometry of Unitary Orbits

    Authors: Oren Mangoubi, Yikai Wu, Satyen Kale, Abhradeep Guha Thakurta, Nisheeth K. Vishnoi

    Abstract: Consider the following optimization problem: Given $n \times n$ matrices $A$ and $Λ$, maximize $\langle A, UΛU^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spectrum is the same as $Λ$ and, by setting $Λ$ to be appropriate diagonal matrices, one can recover matrix approximation problems such as PCA and rank-$k$ approximat…

    Submitted 6 July, 2022; originally announced July 2022.

    Journal ref: Proceedings of Thirty Fifth Conference on Learning Theory (COLT), PMLR 178:3547-3588, 2022

  24. arXiv:2207.00160  [pdf, other]

    cs.LG cs.CR stat.ML

    When Does Differentially Private Learning Not Suffer in High Dimensions?

    Authors: Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta

    Abstract: Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following researc…

    Submitted 26 October, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: 26 pages; v3 includes additional experiments and clarification

  25. arXiv:2207.00099  [pdf, other]

    cs.LG

    Measuring Forgetting of Memorized Training Examples

    Authors: Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Chiyuan Zhang

    Abstract: Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what…

    Submitted 9 May, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: Appeared at ICLR '23, 22 pages, 12 figures

  26. arXiv:2205.02973  [pdf, other]

    cs.LG cs.CR cs.CV

    Large Scale Transfer Learning for Differentially Private Image Classification

    Authors: Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

    Abstract: Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Unfortunately, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private trai…

    Submitted 20 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

  27. arXiv:2204.01585  [pdf, ps, other]

    cs.LG cs.CR math.OC

    Differentially Private Sampling from Rashomon Sets, and the Universality of Langevin Diffusion for Convex Optimization

    Authors: Arun Ganesh, Abhradeep Thakurta, Jalaj Upadhyay

    Abstract: In this paper we provide an algorithmic framework based on Langevin diffusion (LD) and its corresponding discretizations that allow us to simultaneously obtain: i) An algorithm for sampling from the exponential mechanism, whose privacy analysis does not depend on convexity and which can be stopped at anytime without compromising privacy, and ii) tight uniform stability guarantees for the exponenti…

    Submitted 28 August, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Appeared in COLT 2023. For ease of presentation, some results appear in the previous version of this paper on arXiv (v3) that do not appear in this version, nor are subsumed by results in this version. Please see Section 1.4 for more details

  28. arXiv:2202.08312  [pdf, other]

    cs.LG math.OC

    Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams

    Authors: Sergey Denisov, Brendan McMahan, Keith Rush, Adam Smith, Abhradeep Guha Thakurta

    Abstract: Motivated by recent applications requiring differential privacy over adaptive streams, we investigate the question of optimal instantiations of the matrix mechanism in this setting. We prove fundamental theoretical results on the applicability of matrix factorizations to adaptive streams, and provide a parameter-free fixed-point algorithm for computing optimal factorizations. We instantiate this f…

    Submitted 17 January, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: 33 pages, 6 figures. Associated code at https://github.com/google-research/federated/tree/master/dp_matrix_factorization

  29. arXiv:2201.12328  [pdf, other]

    cs.LG

    Toward Training at ImageNet Scale with Differential Privacy

    Authors: Alexey Kurakin, Shuang Song, Steve Chien, Roxana Geambasu, Andreas Terzis, Abhradeep Thakurta

    Abstract: Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set ou…

    Submitted 8 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 25 pages, 7 figures. Code available at https://github.com/google-research/dp-imagenet

  30. arXiv:2112.00193  [pdf, other]

    cs.LG cs.CR

    Public Data-Assisted Mirror Descent for Private Model Training

    Authors: Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta

    Abstract: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by t…

    Submitted 27 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: 20 pages, 8 figures, 3 tables

  31. arXiv:2111.15521  [pdf, other]

    cs.LG cs.CR

    Node-Level Differentially Private Graph Neural Networks

    Authors: Ameya Daigavane, Gagan Madan, Aditya Sinha, Abhradeep Guha Thakurta, Gaurav Aggarwal, Prateek Jain

    Abstract: Graph Neural Networks (GNNs) are a popular technique for modelling graph-structured data and computing node-level representations via aggregation of information from the neighborhood of each node. However, this aggregation implies an increased risk of revealing sensitive information, as a node can participate in the inference for multiple nodes. This implies that standard privacy-preserving machin…

    Submitted 26 August, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: 20 pages, 4 figures

  32. arXiv:2107.09802  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates

    Authors: Steve Chien, Prateek Jain, Walid Krichene, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

    Abstract: We study the problem of differentially private (DP) matrix completion under user-level privacy. We design a joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) the best known privacy/utility trade-off both theoretically, as well as on bench…

    Submitted 20 July, 2021; originally announced July 2021.

  33. arXiv:2103.00039  [pdf, other]

    cs.CR cs.LG

    Practical and Private (Deep) Learning without Sampling or Shuffling

    Authors: Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu

    Abstract: We consider training models with differential privacy (DP) using mini-batch gradients. The existing state-of-the-art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in importan…

    Submitted 10 December, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

  34. arXiv:2101.04535  [pdf, other]

    cs.LG cs.CR

    Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning

    Authors: Milad Nasr, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, Nicholas Carlini

    Abstract: Differentially private (DP) machine learning allows us to train models on private data while limiting data leakage. DP formalizes this data leakage through a cryptographic game, where an adversary must predict if a model was trained on a dataset D, or a dataset D' that differs in just one example. If observing the training algorithm does not meaningfully increase the adversary's odds of successfull…

    Submitted 11 January, 2021; originally announced January 2021.

  35. arXiv:2011.05315  [pdf, other]

    cs.CR cs.CV cs.LG

    Is Private Learning Possible with Instance Encoding?

    Authors: Nicholas Carlini, Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Shuang Song, Abhradeep Thakurta, Florian Tramer

    Abstract: A private machine learning algorithm hides as much as possible about its training data while still preserving accuracy. In this work, we study whether a non-private learning algorithm can be made private by relying on an instance-encoding mechanism that modifies the training inputs before feeding them to a normal learner. We formalize both the notion of instance encoding and its privacy by providi…

    Submitted 27 April, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

  36. arXiv:2008.06570  [pdf, ps, other]

    cs.LG stat.ML

    Fast Dimension Independent Private AdaGrad on Publicly Estimated Subspaces

    Authors: Peter Kairouz, Mónica Ribero, Keith Rush, Abhradeep Thakurta

    Abstract: We revisit the problem of empirical risk minimization (ERM) with differential privacy. We show that noisy AdaGrad, given appropriate knowledge and conditions on the subspace from which gradients can be drawn, achieves a regret comparable to traditional AdaGrad plus a well-controlled term due to noise. We show a convergence rate of $O(\text{Tr}(G_T)/T)$, where $G_T$ captures the geometry of the gra…

    Submitted 30 January, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

  37. arXiv:2007.14191  [pdf, other]

    stat.ML cs.CR cs.LG

    Tempered Sigmoid Activations for Deep Learning with Differential Privacy

    Authors: Nicolas Papernot, Abhradeep Thakurta, Shuang Song, Steve Chien, Úlfar Erlingsson

    Abstract: Because learning sometimes involves sensitive data, machine learning algorithms have been extended to offer privacy for training data. In practice, this has been mostly an afterthought, with privacy-preserving models obtained by re-running training with a different optimizer, but using the model architectures that already performed well in a non-privacy-preserving setting. This approach leads to l…

    Submitted 28 July, 2020; originally announced July 2020.

  38. arXiv:2007.06605  [pdf, other]

    cs.LG cs.CR stat.ML

    Privacy Amplification via Random Check-Ins

    Authors: Borja Balle, Peter Kairouz, H. Brendan McMahan, Om Thakkar, Abhradeep Thakurta

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data. Two standard approaches, privacy amplification by subsampling, and privacy amplification by shuffling, permit adding lower noise in DP-SGD than via naïve schemes. A key assumption in both these approaches is that the elements in the data set can be u…

    Submitted 30 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: Updated proof for $(ε_0, δ_0)$-DP local randomizers

  39. arXiv:2006.06783  [pdf, other]

    cs.CR cs.LG math.OC stat.ML

    Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent

    Authors: Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta

    Abstract: We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy para…

    Submitted 2 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  40. arXiv:2003.12020  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks

    Authors: Samuel Deng, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Abhradeep Thakurta

    Abstract: Poisoning attacks have emerged as a significant security threat to machine learning algorithms. It has been demonstrated that adversaries who make small changes to the training set, such as adding specially crafted data points, can hurt the performance of the output model. Some of the stronger poisoning attacks require the full knowledge of the training data. This leaves open the possibility of ac…

    Submitted 13 December, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

  41. arXiv:2002.04049  [pdf, other]

    cs.CR

    Guidelines for Implementing and Auditing Differentially Private Systems

    Authors: Daniel Kifer, Solomon Messing, Aaron Roth, Abhradeep Thakurta, Danfeng Zhang

    Abstract: Differential privacy is an information theoretic constraint on algorithms and code. It provides quantification of privacy leakage and formal privacy guarantees that are currently considered the gold standard in privacy protections. In this paper we provide an initial set of "best practices" for developing differentially private platforms, techniques for unit testing that are specific to differenti…

    Submitted 12 May, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  42. arXiv:2001.03618  [pdf, other]

    cs.CR

    Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

    Authors: Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar, Abhradeep Thakurta

    Abstract: Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in…

    Submitted 10 January, 2020; originally announced January 2020.

  43. arXiv:1908.09970  [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Private Stochastic Convex Optimization with Optimal Rates

    Authors: Raef Bassily, Vitaly Feldman, Kunal Talwar, Abhradeep Thakurta

    Abstract: We study differentially private (DP) algorithms for stochastic convex optimization (SCO). In this problem the goal is to approximately minimize the population loss given i.i.d. samples from a distribution over convex and Lipschitz loss functions. A long line of existing work on private convex optimization focuses on the empirical loss and derives asymptotically tight bounds on the excess empirical…

    Submitted 26 August, 2019; originally announced August 2019.

  44. arXiv:1811.12469  [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

    Authors: Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Abhradeep Thakurta

    Abstract: Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to…

    Submitted 25 July, 2020; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Stated amplification bounds for epsilon > 1 explicitly and also stated the bounds for Renyi DP. Fixed an incorrect statement in one of the proofs

  45. arXiv:1808.06651  [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Privacy Amplification by Iteration

    Authors: Vitaly Feldman, Ilya Mironov, Kunal Talwar, Abhradeep Thakurta

    Abstract: Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then reasoning about the cumulative privacy cost of the algorithm. This is enabled by composition theorems for differential privacy that allow releasing of…

    Submitted 10 December, 2018; v1 submitted 20 August, 2018; originally announced August 2018.

    Comments: Extended abstract appears in Foundations of Computer Science (FOCS) 2018

  46. arXiv:1803.05101  [pdf, ps, other]

    cs.LG

    Model-Agnostic Private Learning via Stability

    Authors: Raef Bassily, Om Thakkar, Abhradeep Thakurta

    Abstract: We design differentially private learning algorithms that are agnostic to the learning model. Our algorithms are interactive in nature, i.e., instead of outputting a model based on the training data, they provide predictions for a set of $m$ feature vectors that arrive online. We show that, for the feature vectors on which an ensemble of models (trained on random disjoint subsets of a dataset) mak…

    Submitted 13 March, 2018; originally announced March 2018.

  47. arXiv:1712.09765  [pdf, other]

    cs.LG

    Differentially Private Matrix Completion Revisited

    Authors: Prateek Jain, Om Thakkar, Abhradeep Thakurta

    Abstract: We provide the first provably joint differentially private algorithm with formal utility guarantees for the problem of user-level privacy-preserving collaborative filtering. Our algorithm is based on the Frank-Wolfe method, and it consistently estimates the underlying preference matrix as long as the number of users $m$ is $ω(n^{5/4})$, where $n$ is the number of items, and each user provides her…

    Submitted 11 June, 2018; v1 submitted 28 December, 2017; originally announced December 2017.

    Comments: Updated version. Accepted for presentation at International Conference on Machine Learning (ICML) 2018

  48. arXiv:1707.04982  [pdf, other]

    cs.DS

    Practical Locally Private Heavy Hitters

    Authors: Raef Bassily, Kobbi Nissim, Uri Stemmer, Abhradeep Thakurta

    Abstract: We present new practical local differentially private heavy hitters algorithms achieving optimal or near-optimal worst-case error and running time -- TreeHist and Bitstogram. In both algorithms, server running time is $\tilde O(n)$ and user running time is $\tilde O(1)$, hence improving on the prior state-of-the-art result of Bassily and Smith [STOC 2015] requiring $O(n^{5/2})$ server time and…

    Submitted 16 July, 2017; originally announced July 2017.

  49. arXiv:1607.05786  [pdf, ps, other]

    cs.DS cs.CC cs.DM

    Erasure-Resilient Property Testing

    Authors: Kashyap Dixit, Sofya Raskhodnikova, Abhradeep Thakurta, Nithin Varma

    Abstract: Property testers form an important class of sublinear algorithms. In the standard property testing model, an algorithm accesses the input function via an oracle that returns function values at all queried domain points. In many realistic situations, the oracle may be unable to reveal the function values at some domain points due to privacy concerns, or when some of the values get erased by mistake…

    Submitted 19 July, 2016; originally announced July 2016.

    MSC Class: 68W20; 68W25; 68P10; 68Q87; 68W40 ACM Class: F.2.2

  50. arXiv:1503.02031  [pdf, other]

    cs.LG cs.NE stat.ML

    To Drop or Not to Drop: Robustness, Consistency and Differential Privacy Properties of Dropout

    Authors: Prateek Jain, Vivek Kulkarni, Abhradeep Thakurta, Oliver Williams

    Abstract: Training deep belief networks (DBNs) requires optimizing a non-convex function with an extremely large number of parameters. Naturally, existing gradient descent (GD) based methods are prone to arbitrarily poor local minima. In this paper, we rigorously show that such local minima can be avoided (up to an approximation error) by using the dropout technique, a widely used heuristic in this domain. I…

    Submitted 6 March, 2015; originally announced March 2015.

    Comments: Currently under review for ICML 2015