Showing 1–50 of 67 results for author: Feldman, V

  1. arXiv:2404.10201 [pdf, other]

    cs.DS cs.CR cs.IT cs.LG

    Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

    Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar, Samson Zhou

    Abstract: We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each use…

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Fixed author ordering

  2. arXiv:2312.11788 [pdf, other]

    cs.LG math.OC

    Faster Convergence with Multiway Preferences

    Authors: Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour

    Abstract: We address the problem of convex optimization with preference feedback, where the goal is to minimize a convex function given a weaker form of comparison queries. Each query consists of two points and the dueling feedback returns a (noisy) single-bit binary comparison of the function values of the two queried points. Here we consider the sign-function-based comparison feedback model and analyze th…

    Submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2310.00098 [pdf, other]

    cs.LG cs.CR stat.ML

    Federated Learning with Differential Privacy for End-to-End Speech Recognition

    Authors: Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Tatiana Likhomanenko

    Abstract: While federated learning (FL) has recently emerged as a promising approach to train machine learning models, it is limited to only preliminary explorations in the domain of automatic speech recognition (ASR). Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: Under review

  4. arXiv:2307.15835 [pdf, ps, other]

    cs.CR cs.DS cs.LG stat.ML

    Mean Estimation with User-level Privacy under Data Heterogeneity

    Authors: Rachel Cummings, Vitaly Feldman, Audra McMillan, Kunal Talwar

    Abstract: A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example, in language data, where different speech styles result in data heterogeneity. In this work we propose a simple model…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Conference version published at NeurIPS 2022

  5. arXiv:2307.15017 [pdf, other]

    cs.CR cs.LG

    Samplable Anonymous Aggregation for Private Federated Data Analysis

    Authors: Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, Gianni Parsa, et al. (11 additional authors not shown)

    Abstract: We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Our first contribution is to propose a simple primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust as…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 24 pages

  6. arXiv:2307.11749 [pdf, other]

    cs.LG cs.CR

    Differentially Private Heavy Hitter Detection using Federated Analytics

    Authors: Karan Chadha, Junye Chen, John Duchi, Vitaly Feldman, Hanieh Hashemi, Omid Javidbakht, Audra McMillan, Kunal Talwar

    Abstract: In this work, we study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning…

    Submitted 21 July, 2023; originally announced July 2023.

  7. arXiv:2306.04444 [pdf, other]

    cs.LG cs.CR stat.ML

    Fast Optimal Locally Private Mean Estimation via Random Projections

    Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar

    Abstract: We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity…

    Submitted 26 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Added the correct github link

  8. arXiv:2302.14154 [pdf, ps, other]

    cs.LG cs.CR math.OC stat.ML

    Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime

    Authors: Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds. For the problem of online prediction from experts, we design new algorithms that obtain near-optimal regret ${O} \big( \varepsilon^{-1} \log^{1.5}{d} \big)$ where $d$ is the number of experts. This signif…

    Submitted 27 February, 2023; originally announced February 2023.

  9. arXiv:2211.10082 [pdf, other]

    cs.CR

    Private Federated Statistics in an Interactive Setting

    Authors: Audra McMillan, Omid Javidbakht, Kunal Talwar, Elliot Briggs, Mike Chatzidakis, Junye Chen, John Duchi, Vitaly Feldman, Yusuf Goren, Michael Hesse, Vojta Jina, Anil Katti, Albert Liu, Cheney Lyford, Joey Meyer, Alex Palmer, David Park, Wonhee Park, Gianni Parsa, Paul Pelzl, Rehan Rishi, Congzheng Song, Shan Wang, Shundong Zhou

    Abstract: Privately learning statistics of events on devices can enable improved user experience. Differentially private algorithms for such problems can benefit significantly from interactivity. We argue that an aggregation protocol can enable an interactive private federated statistics system where users' devices maintain control of the privacy assurance. We describe the architecture of such a system, and…

    Submitted 18 November, 2022; originally announced November 2022.

  10. arXiv:2210.13537 [pdf, ps, other]

    cs.LG cs.CR math.OC stat.ML

    Private Online Prediction from Experts: Separations and Faster Rates

    Authors: Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of…

    Submitted 29 June, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Removed the results for the realizable setting which we uploaded with additional results for that setting in a separate paper. Added a proof sketch for the lower bound

  11. arXiv:2210.13497 [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Subspace Recovery from Heterogeneous Data with Non-isotropic Noise

    Authors: John Duchi, Vitaly Feldman, Lunjia Hu, Kunal Talwar

    Abstract: Recovering linear subspaces from data is a fundamental and important task in statistics and machine learning. Motivated by heterogeneity in Federated Learning settings, we study a basic formulation of this problem: the principal component analysis (PCA), with a focus on dealing with irregular noise. Our data come from $n$ users with user $i$ contributing data samples from a $d$-dimensional distrib…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: In NeurIPS 2022

  12. arXiv:2209.14987 [pdf, other]

    cs.LG cs.CR

    No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy"

    Authors: Nicholas Carlini, Vitaly Feldman, Milad Nasr

    Abstract: New methods designed to preserve data privacy require careful scrutiny. Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked. A recent work selected for an Outstanding Paper Award at ICML 2022 (Dong et al., 2022) claims that dataset condensation (DC) significantly improves data privacy when tr…

    Submitted 29 September, 2022; originally announced September 2022.

  13. arXiv:2208.04591 [pdf, other]

    cs.CR cs.DS cs.LG stat.ML

    Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy

    Authors: Vitaly Feldman, Audra McMillan, Kunal Talwar

    Abstract: The shuffle model of differential privacy has gained significant interest as an intermediate trust model between the standard local and central models [EFMRTT19; CSUZZ19]. A key result in this model is that randomly shuffling locally randomized data amplifies differential privacy guarantees. Such amplification implies substantially stronger privacy guarantees for systems in which data is contribut…

    Submitted 30 October, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Errata added. 14 pages, 4 figures

  14. arXiv:2205.02466 [pdf, other]

    cs.LG cs.CR

    Optimal Algorithms for Mean Estimation under Local Differential Privacy

    Authors: Hilal Asi, Vitaly Feldman, Kunal Talwar

    Abstract: We study the problem of mean estimation of $\ell_2$-bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the asymptotically optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of…

    Submitted 5 May, 2022; originally announced May 2022.

  15. arXiv:2203.00194 [pdf, other]

    cs.CR cs.DS cs.LG

    Private Frequency Estimation via Projective Geometry

    Authors: Vitaly Feldman, Jelani Nelson, Huy Lê Nguyen, Kunal Talwar

    Abstract: In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe size of $k$ and with $n$ users, our $\varepsilon$-LDP algorithm has communication cost $\lceil\log_2k\rceil$ bits in the private coin setting and $\varepsilon\log_2 e + O(1)$ in the public coin setting, and has computation cost…

    Submitted 28 February, 2022; originally announced March 2022.

  16. arXiv:2103.01516 [pdf, ps, other]

    cs.LG cs.CR math.OC stat.ML

    Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry

    Authors: Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,δ)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/\varepsilon n.$ The upper bound is based…

    Submitted 2 March, 2021; originally announced March 2021.

  17. arXiv:2102.12099 [pdf, other]

    cs.CR cs.DS cs.LG

    Lossless Compression of Efficient Private Local Randomizers

    Authors: Vitaly Feldman, Kunal Talwar

    Abstract: Locally Differentially Private (LDP) Reports are commonly used for collection of statistics and machine learning in the federated setting. In many cases the best known LDP algorithms require sending prohibitively large messages from the client device to the server (such as when constructing histograms over a large domain or learning a high-dimensional model). This has led to significant efforts on r…

    Submitted 24 February, 2021; originally announced February 2021.

  18. arXiv:2012.12803 [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling

    Authors: Vitaly Feldman, Audra McMillan, Kunal Talwar

    Abstract: Recent work of Erlingsson, Feldman, Mironov, Raghunathan, Talwar, and Thakurta [EFMRTT19] demonstrates that random shuffling amplifies differential privacy guarantees of locally randomized data. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously [BEMMRLRKTS17] and has led to significant interest in the shuffle model of privacy…

    Submitted 7 September, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: Updated to include numerical experiments for Renyi differential privacy

  19. When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?

    Authors: Gavin Brown, Mark Bun, Vitaly Feldman, Adam Smith, Kunal Talwar

    Abstract: Modern machine learning models are complex and frequently encode surprising amounts of information about individual inputs. In extreme cases, complex models appear to memorize entire input examples, including seemingly irrelevant information (social security numbers from text, for example). In this paper, we aim to understand whether this sort of memorization is necessary for accurate learning. We…

    Submitted 21 July, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Journal ref: STOC 2021 Pages 123-132

  20. arXiv:2008.11193 [pdf, other]

    cs.CR cs.LG stat.ML

    Individual Privacy Accounting via a Renyi Filter

    Authors: Vitaly Feldman, Tijana Zrnic

    Abstract: We consider a sequential setting in which a single dataset of individuals is used to perform adaptively-chosen analyses, while ensuring that the differential privacy loss of each participant does not exceed a pre-specified privacy budget. The standard approach to this problem relies on bounding a worst-case estimate of the privacy loss over all individuals and all possible values of their data, fo…

    Submitted 8 January, 2022; v1 submitted 25 August, 2020; originally announced August 2020.

  21. arXiv:2008.03703 [pdf, other]

    cs.LG stat.ML

    What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation

    Authors: Vitaly Feldman, Chiyuan Zhang

    Abstract: Deep learning algorithms are well-known to have a propensity for fitting the training data very well and often fit even outliers and mislabeled data points. Such fitting requires memorization of training data labels, a phenomenon that has attracted significant research interest but has not been given a compelling explanation so far. A recent work of Feldman (2019) proposes a theoretical explanatio…

    Submitted 9 August, 2020; originally announced August 2020.

  22. arXiv:2006.06914 [pdf, ps, other]

    cs.LG math.OC stat.ML

    Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

    Authors: Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar

    Abstract: Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important prog…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: 32 pages

    MSC Class: 90-08

    ACM Class: F.2.1; G.1.6; G.3

  23. arXiv:2006.06021 [pdf]

    cs.SI cs.CY physics.soc-ph

    Modeling and Simulation of COVID-19 Pandemic for Cincinnati Tri-State Area

    Authors: Michael Rechtin, Vince Feldman, Sam Klare, Nathan Riddle, Rajnikant Sharma

    Abstract: In this paper, we use an SIR model to simulate the COVID-19 pandemic for the Cincinnati Tri-State Area. We have built a representative population of Cincinnati that includes movements for traveling to stores, schools, workplaces, and friends' houses. Using this model, we simulate the effect of quarantine, return to work, and panic buying. We show that there will be a second wave of infec…

    Submitted 15 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

  24. arXiv:2005.04763 [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Private Stochastic Convex Optimization: Optimal Rates in Linear Time

    Authors: Vitaly Feldman, Tomer Koren, Kunal Talwar

    Abstract: We study differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. (2019) has established the optimal bound on the excess population loss achievable given $n$ samples. Unfortunately, their algorithm achieving this bound is relatively in…

    Submitted 10 May, 2020; originally announced May 2020.

  25. arXiv:2001.03618 [pdf, other]

    cs.CR

    Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

    Authors: Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar, Abhradeep Thakurta

    Abstract: Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in…

    Submitted 10 January, 2020; originally announced January 2020.

  26. arXiv:1911.10541 [pdf, ps, other]

    cs.LG cs.CR cs.DS stat.ML

    PAC learning with stable and private predictions

    Authors: Yuval Dagan, Vitaly Feldman

    Abstract: We study binary classification algorithms for which the prediction on any point is not too sensitive to individual examples in the dataset. Specifically, we consider the notions of uniform stability (Bousquet and Elisseeff, 2001) and prediction privacy (Dwork and Feldman, 2018). Previous work on these notions shows how they can be achieved in the standard PAC model via simple aggregation of models…

    Submitted 23 September, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

  27. arXiv:1911.04014 [pdf, ps, other]

    cs.LG cs.CR cs.DS stat.ML

    Interaction is necessary for distributed learning with privacy or communication constraints

    Authors: Yuval Dagan, Vitaly Feldman

    Abstract: Local differential privacy (LDP) is a model where users send privatized data to an untrusted central server whose goal is to solve some data analysis task. In the non-interactive version of this model the protocol consists of a single round in which a server sends requests to all users then receives their responses. This version is deployed in industry due to its practical advantages and has attra…

    Submitted 23 September, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

  28. arXiv:1908.09970 [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Private Stochastic Convex Optimization with Optimal Rates

    Authors: Raef Bassily, Vitaly Feldman, Kunal Talwar, Abhradeep Thakurta

    Abstract: We study differentially private (DP) algorithms for stochastic convex optimization (SCO). In this problem the goal is to approximately minimize the population loss given i.i.d. samples from a distribution over convex and Lipschitz loss functions. A long line of existing work on private convex optimization focuses on the empirical loss and derives asymptotically tight bounds on the excess empirical…

    Submitted 26 August, 2019; originally announced August 2019.

  29. arXiv:1906.05271 [pdf, other]

    cs.LG stat.ML

    Does Learning Require Memorization? A Short Tale about a Long Tail

    Authors: Vitaly Feldman

    Abstract: State-of-the-art results on image recognition tasks are achieved using over-parameterized learning algorithms that (nearly) perfectly fit the training set and are known to fit well even random labels. This tendency to memorize the labels of the training data is not explained by existing theoretical analyses. Memorization of the training data also presents significant privacy risks when the trainin…

    Submitted 10 January, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Significant revision: revised introduction/overview; added formal treatment of noise in the labels and explanation for the disparate effects of limiting memorization

  30. arXiv:1905.10360 [pdf, other]

    cs.LG cs.DS stat.ML

    The advantages of multiple classes for reducing overfitting from test set reuse

    Authors: Vitaly Feldman, Roy Frostig, Moritz Hardt

    Abstract: Excessive reuse of holdout data can lead to overfitting. However, there is little concrete evidence of significant overfitting due to holdout reuse in popular multiclass benchmarks today. Known results show that, in the worst case, revealing the accuracy of $k$ adaptively chosen classifiers on a data set of size $n$ allows one to create a classifier with bias of $Θ(\sqrt{k/n})$ for any binary predicti…

    Submitted 24 May, 2019; originally announced May 2019.

  31. arXiv:1902.10710 [pdf, ps, other]

    cs.LG cs.DS stat.ML

    High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

    Authors: Vitaly Feldman, Jan Vondrak

    Abstract: Algorithmic stability is a classical approach to understanding and analysis of the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability has been established in a landmark paper of Bousquet and Elisseeff (2002) albeit at the expense of an additional $\sqrt{n}$ factor…

    Submitted 23 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: this is a follow-up to and has minor text overlap with arXiv:1812.09859; v2: minor revision following acceptance for presentation at COLT 2019

  32. arXiv:1812.09859 [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Generalization Bounds for Uniformly Stable Algorithms

    Authors: Vitaly Feldman, Jan Vondrak

    Abstract: Uniform stability of a learning algorithm is a classical notion of algorithmic stability introduced to derive high-probability bounds on the generalization error (Bousquet and Elisseeff, 2002). Specifically, for a loss function with range bounded in $[0,1]$, the generalization error of a $γ$-uniformly stable learning algorithm on $n$ samples is known to be within $O((γ+1/n) \sqrt{n \log(1/δ)})$ of…

    Submitted 18 March, 2019; v1 submitted 24 December, 2018; originally announced December 2018.

    Comments: Appeared in Neural Information Processing Systems (NeurIPS), 2018

  33. arXiv:1811.12469 [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity

    Authors: Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Abhradeep Thakurta

    Abstract: Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to…

    Submitted 25 July, 2020; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Stated amplification bounds for epsilon > 1 explicitly and also stated the bounds for Renyi DP. Fixed an incorrect statement in one of the proofs

  34. arXiv:1809.09165 [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Locally Private Learning without Interaction Requires Separation

    Authors: Amit Daniely, Vitaly Feldman

    Abstract: We consider learning under the constraint of local differential privacy (LDP). For many learning problems, known efficient algorithms in this model require many rounds of communication between the server and the clients holding the data points. Yet multi-round protocols are prohibitively slow in practice due to network latency and, as a result, currently deployed large-scale systems are limited to…

    Submitted 28 October, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

  35. arXiv:1808.06651 [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Privacy Amplification by Iteration

    Authors: Vitaly Feldman, Ilya Mironov, Kunal Talwar, Abhradeep Thakurta

    Abstract: Many commonly used learning algorithms work by iteratively updating an intermediate solution using one or a few data points in each iteration. Analysis of differential privacy for such algorithms often involves ensuring privacy of each step and then reasoning about the cumulative privacy cost of the algorithm. This is enabled by composition theorems for differential privacy that allow releasing of…

    Submitted 10 December, 2018; v1 submitted 20 August, 2018; originally announced August 2018.

    Comments: Extended abstract appears in Foundations of Computer Science (FOCS) 2018

  36. arXiv:1803.10266 [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Privacy-preserving Prediction

    Authors: Cynthia Dwork, Vitaly Feldman

    Abstract: Ensuring differential privacy of models learned from sensitive user data is an important goal that has been studied extensively in recent years. It is now known that for some basic learning problems, especially those involving high-dimensional data, producing an accurate private model requires much more data than learning without privacy. At the same time, in many applications it is not necessary…

    Submitted 8 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2018

  37. arXiv:1803.04307 [pdf, ps, other]

    cs.LG

    The Everlasting Database: Statistical Validity at a Fair Price

    Authors: Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro

    Abstract: The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional s…

    Submitted 2 April, 2019; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: 22 pages, accepted to NeurIPS 2018

  38. arXiv:1712.07196 [pdf, ps, other]

    cs.LG cs.CR cs.DS cs.IT

    Calibrating Noise to Variance in Adaptive Data Analysis

    Authors: Vitaly Feldman, Thomas Steinke

    Abstract: Datasets are often used multiple times and each successive analysis may depend on the outcome of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data d…

    Submitted 11 June, 2018; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2018

  39. arXiv:1706.05069 [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Generalization for Adaptively-chosen Estimators via Stable Median

    Authors: Vitaly Feldman, Thomas Steinke

    Abstract: Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this pr…

    Submitted 15 June, 2017; originally announced June 2017.

    Comments: To appear in Conference on Learning Theory (COLT) 2017

  40. arXiv:1703.00066 [pdf, ps, other]

    cs.LG cs.DS

    On the Power of Learning from $k$-Wise Queries

    Authors: Vitaly Feldman, Badih Ghazi

    Abstract: Several well-studied models of access to data samples, including statistical queries, local differential privacy and low-communication algorithms rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of ${\bf E}_{x \sim D}[q(x)]$ for any choice of the query function $q$ mapping $X$ to the reals, where $D$ is an unknown dat…

    Submitted 28 February, 2017; originally announced March 2017.

    Comments: 32 pages, Appeared in Innovations in Theoretical Computer Science (ITCS) 2017

  41. arXiv:1611.06475 [pdf, ps, other]

    cs.LG stat.ML

    Dealing with Range Anxiety in Mean Estimation via Statistical Queries

    Authors: Vitaly Feldman

    Abstract: We give algorithms for estimating the expectation of a given real-valued function $φ:X\to {\bf R}$ on a sample drawn randomly from some unknown distribution $D$ over domain $X$, namely ${\bf E}_{{\bf x}\sim D}[φ({\bf x})]$. Our algorithms work in two well-studied models of restricted access to data samples. The first one is the statistical query (SQ) model in which an algorithm has access to an SQ…

    Submitted 25 August, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

  42. Direct-dynamical entanglement-discord relations

    Authors: Virginia Feldman, Jonas Maziero, A. Auyuanet

    Abstract: In this article, by considering Bell-diagonal two-qubit initial states submitted to local dynamics generated by the phase damping, bit flip, phase flip, bit-phase flip, and depolarizing channels, we report some elegant direct-dynamical relations between geometric measures of entanglement and discord. The complex scenario appearing already in this simplified case study indicates that similarly simp…

    Submitted 4 April, 2017; v1 submitted 27 October, 2016; originally announced October 2016.

    Comments: 7 pages, 3 figures, Published version

    Journal ref: Quantum Information Processing (2017) 16:128

  43. arXiv:1608.04414  [pdf, other]

    cs.LG stat.ML

    Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back

    Authors: Vitaly Feldman

    Abstract: In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq {\mathbf E}_{{\mathbf f}\sim D}[{\mathbf f}(x)]$ over a convex set $\cal K \subset {\mathbb R}^d$ where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\cal K$. The optimization is commonly based on i.i.d. samples $f^1,f^2,\ldots,f^n$ from $D$. A standard approach to…

    Submitted 26 December, 2016; v1 submitted 15 August, 2016; originally announced August 2016.

    Comments: Added illustrations of functions used in some of the constructions

  44. arXiv:1608.02198  [pdf, ps, other]

    cs.LG cs.CC stat.ML

    A General Characterization of the Statistical Query Complexity

    Authors: Vitaly Feldman

    Abstract: Statistical query (SQ) algorithms are algorithms that have access to an {\em SQ oracle} for the input distribution $D$ instead of i.i.d. samples from $D$. Given a query function $φ:X \rightarrow [-1,1]$, the oracle returns an estimate of ${\bf E}_{ x\sim D}[φ(x)]$ within some tolerance $τ_φ$ that roughly corresponds to the number of samples. In this work we demonstrate that the complexity of so…

    Submitted 17 April, 2017; v1 submitted 7 August, 2016; originally announced August 2016.

    Comments: Minor revision

  45. arXiv:1512.09170  [pdf, ps, other]

    cs.LG cs.DS

    Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

    Authors: Vitaly Feldman, Cristobal Guzman, Santosh Vempala

    Abstract: Stochastic convex optimization, where the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular fi…

    Submitted 21 November, 2016; v1 submitted 30 December, 2015; originally announced December 2015.

    Comments: Substantial revision. To appear in SODA 2017

  46. arXiv:1506.02629  [pdf, other]

    cs.LG cs.DS

    Generalization in Adaptive Data Analysis and Holdout Reuse

    Authors: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

    Abstract: Overfitting is the bane of data analysts, even when data are plentiful. Formal approaches to understanding this problem focus on statistical inference and generalization of individual analysis procedures. Yet the practice of data analysis is an inherently interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on th…

    Submitted 25 September, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

  47. arXiv:1504.03391  [pdf, ps, other]

    cs.DS cs.LG

    Tight Bounds on Low-degree Spectral Concentration of Submodular and XOS functions

    Authors: Vitaly Feldman, Jan Vondrak

    Abstract: Submodular and fractionally subadditive (or equivalently XOS) functions play a fundamental role in combinatorial optimization, algorithmic game theory and machine learning. Motivated by learnability of these classes of functions from random examples, we consider the question of how well such functions can be approximated by low-degree polynomials in $\ell_2$ norm over the uniform distribution. Thi…

    Submitted 2 August, 2015; v1 submitted 13 April, 2015; originally announced April 2015.

  48. arXiv:1501.02911  [pdf, ps, other]

    cs.DS

    Sorting and Selection with Imprecise Comparisons

    Authors: Miklos Ajtai, Vitaly Feldman, Avinatan Hassidim, Jelani Nelson

    Abstract: We consider a simple model of imprecise comparisons: there exists some $δ>0$ such that when a subject is given two elements to compare, if the values of those elements (as perceived by the subject) differ by at least $δ$, then the comparison will be made correctly; when the two elements have values that are within $δ$, the outcome of the comparison is unpredictable. This model is inspired by both…

    Submitted 13 January, 2015; originally announced January 2015.

    ACM Class: F.2.2; G.2.2

  49. arXiv:1411.2664  [pdf, ps, other]

    cs.LG cs.DS

    Preserving Statistical Validity in Adaptive Data Analysis

    Authors: Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth

    Abstract: A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of sophisticated validation techniques, to deep statistical methods for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of data analysis: the theory of statistical inference ass…

    Submitted 2 March, 2016; v1 submitted 10 November, 2014; originally announced November 2014.

    Comments: Updated related work with recent developments

  50. arXiv:1407.2774  [pdf, other]

    cs.DS math.CO math.PR

    Subsampled Power Iteration: a Unified Algorithm for Block Models and Planted CSP's

    Authors: Vitaly Feldman, Will Perkins, Santosh Vempala

    Abstract: We present an algorithm for recovering planted solutions in two well-known models, the stochastic block model and planted constraint satisfaction problems, via a common generalization in terms of random bipartite graphs. Our algorithm matches up to a constant factor the best-known bounds for the number of edges (or constraints) needed for perfect recovery and its running time is linear in the numb…

    Submitted 28 April, 2015; v1 submitted 10 July, 2014; originally announced July 2014.