-
Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
Authors:
Hilal Asi,
Vitaly Feldman,
Jelani Nelson,
Huy L. Nguyen,
Kunal Talwar,
Samson Zhou
Abstract:
We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $Ω(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $Ω(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = Θ(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.
Submitted 25 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
Authors:
Aadirupa Saha,
Hilal Asi
Abstract:
We consider the well-studied dueling bandit problem, where a learner aims to identify near-optimal actions using pairwise comparisons, under the constraint of differential privacy. We consider a general class of utility-based preference matrices for large (potentially unbounded) decision spaces and give the first differentially private dueling bandit algorithm for active learning with user preferences. Our proposed algorithms are computationally efficient with near-optimal performance, both in terms of the private and non-private regret bound. More precisely, we show that when the decision space is of finite size $K$, our proposed algorithm yields order optimal $O\Big(\sum_{i = 2}^K\log\frac{KT}{Δ_i} + \frac{K}ε\Big)$ regret bound for pure $ε$-DP, where $Δ_i$ denotes the suboptimality gap of the $i$-th arm. We also present a matching lower bound analysis which proves the optimality of our algorithms. Finally, we extend our results to any general decision space in $d$-dimensions with potentially infinite arms and design an $ε$-DP algorithm with regret $\tilde{O} \left( \frac{d^6}{κε} + \frac{ d\sqrt{T }}κ \right)$, providing privacy for free when $T \gg d$.
Submitted 22 March, 2024;
originally announced March 2024.
-
User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates
Authors:
Hilal Asi,
Daogao Liu
Abstract:
We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy, where each user may hold multiple data items. Existing work for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al. (2023)] or requires the number of users to grow polynomially with the dimensionality of the problem with additional strict assumptions [Bassily et al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain optimal rates for both convex and strongly convex functions in polynomial time and require the number of users to grow only logarithmically in the dimension. Moreover, our algorithms are the first to obtain optimal rates for non-smooth functions in polynomial time. These algorithms are based on multiple-pass DP-SGD, combined with a novel private mean estimation procedure for concentrated data, which applies an outlier removal step before estimating the mean of the gradients.
Submitted 7 November, 2023;
originally announced November 2023.
-
Fast Optimal Locally Private Mean Estimation via Random Projections
Authors:
Hilal Asi,
Vitaly Feldman,
Jelani Nelson,
Huy L. Nguyen,
Kunal Talwar
Abstract:
We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a $1+o(1)$-factor. Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm such as PrivUnitG in the lower-dimensional space. In addition, we show that, by appropriately correlating the random projection matrices across devices, we can achieve fast server run-time. We mathematically analyze the error of the algorithm in terms of properties of the random projections, and study two instantiations. Lastly, our experiments for private mean estimation and private federated learning demonstrate that our algorithms empirically obtain nearly the same utility as optimal ones while having significantly lower communication and computational cost.
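The client-side step described in this abstract (project to a random low-dimensional subspace, normalize, then hand off to a private randomizer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the i.i.d. Gaussian projection matrix, the function names `project_and_normalize` and `proj_unit_client`, and the `randomizer` callable are all assumptions introduced here. The actual private randomizer (such as PrivUnitG) and the server-side aggregation are not shown.

```python
import math
import random


def project_and_normalize(x, k, rng):
    """Project x (length d) to k dimensions and rescale to unit norm.

    Assumption: W is a k x d matrix of i.i.d. standard Gaussians. The
    abstract only says "random low-dimensional subspace"; the paper
    studies specific instantiations that are not reproduced here.
    """
    d = len(x)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    y = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    norm = math.sqrt(sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("projection has zero norm; cannot normalize")
    return W, [v / norm for v in y]


def proj_unit_client(x, k, randomizer, rng):
    """Hypothetical client routine: project, normalize, then privatize.

    `randomizer` stands in for a local randomizer on unit vectors in
    R^k; it is a placeholder argument, not a real library call.
    """
    W, u = project_and_normalize(x, k, rng)
    return W, randomizer(u)
```

Passing the identity function as `randomizer` returns the normalized projection itself, which is useful for checking the non-private part in isolation; it provides no privacy.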
Submitted 26 June, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime
Authors:
Hilal Asi,
Vitaly Feldman,
Tomer Koren,
Kunal Talwar
Abstract:
We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds. For the problem of online prediction from experts, we design new algorithms that obtain near-optimal regret ${O} \big( \varepsilon^{-1} \log^{1.5}{d} \big)$ where $d$ is the number of experts. This significantly improves over the best existing regret bounds for the DP non-realizable setting which are ${O} \big( \varepsilon^{-1} \min\big\{d, T^{1/3}\log d\big\} \big)$. We also develop an adaptive algorithm for the small-loss setting with regret $O(L^\star\log d + \varepsilon^{-1} \log^{1.5}{d})$ where $L^\star$ is the total loss of the best expert. Additionally, we consider DP online convex optimization in the realizable setting and propose an algorithm with near-optimal regret $O \big(\varepsilon^{-1} d^{1.5} \big)$, as well as an algorithm for the smooth case with regret $O \big( \varepsilon^{-2/3} (dT)^{1/3} \big)$, both significantly improving over existing bounds in the non-realizable regime.
Submitted 27 February, 2023;
originally announced February 2023.
-
From Robustness to Privacy and Back
Authors:
Hilal Asi,
Jonathan Ullman,
Lydia Zakynthinou
Abstract:
We study the relationship between two desiderata of algorithms in statistical inference and machine learning: differential privacy and robustness to adversarial data corruptions. Their conceptual similarity was first observed by Dwork and Lei (STOC 2009), who observed that private algorithms satisfy robustness, and gave a general method for converting robust algorithms to private ones. However, all general methods for transforming robust algorithms into private ones lead to suboptimal error rates. Our work gives the first black-box transformation that converts any adversarially robust algorithm into one that satisfies pure differential privacy. Moreover, we show that for any low-dimensional estimation task, applying our transformation to an optimal robust estimator results in an optimal private estimator. Thus, we conclude that for any low-dimensional task, the optimal error rate for $\varepsilon$-differentially private estimators is essentially the same as the optimal error rate for estimators that are robust to adversarially corrupting $1/\varepsilon$ training samples. We apply our transformation to obtain new optimal private estimators for several high-dimensional tasks, including Gaussian (sparse) linear regression and PCA. Finally, we present an extension of our transformation that leads to approximate differentially private algorithms whose error does not depend on the range of the output space, which is impossible under pure differential privacy.
Submitted 3 February, 2023;
originally announced February 2023.
-
Private optimization in the interpolation regime: faster rates and hardness results
Authors:
Hilal Asi,
Karan Chadha,
Gary Cheng,
John Duchi
Abstract:
In non-private stochastic convex optimization, stochastic gradient methods converge much faster on interpolation problems -- problems where there exists a solution that simultaneously minimizes all of the sample losses -- than on non-interpolating ones; we show that generally similar improvements are impossible in the private setting. However, when the functions exhibit quadratic growth around the optimum, we show (near) exponential improvements in the private sample complexity. In particular, we propose an adaptive algorithm that improves the sample complexity to achieve expected error $α$ from $\frac{d}{\varepsilon \sqrtα}$ to $\frac{1}{α^ρ} + \frac{d}{\varepsilon} \log\left(\frac{1}α\right)$ for any fixed $ρ>0$, while retaining the standard minimax-optimal sample complexity for non-interpolation problems. We prove a lower bound that shows the dimension-dependent term is tight. Furthermore, we provide a superefficiency result which demonstrates the necessity of the polynomial term for adaptive algorithms: any algorithm that has a polylogarithmic sample complexity for interpolation problems cannot achieve the minimax-optimal rates for the family of non-interpolation problems.
Submitted 31 October, 2022;
originally announced October 2022.
-
Private Online Prediction from Experts: Separations and Faster Rates
Authors:
Hilal Asi,
Vitaly Feldman,
Tomer Koren,
Kunal Talwar
Abstract:
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of $\tilde{O}(\sqrt{T \log d} + \log d/\varepsilon)$ for the stochastic setting and $\tilde{O}(\sqrt{T \log d} + T^{1/3} \log d/\varepsilon)$ for oblivious adversaries (where $d$ is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the high-dimensional regime $d \ge T$. Moreover, we prove new lower bounds for adaptive adversaries. Our results imply that unlike the non-private setting, there is a strong separation between the optimal regret for adaptive and non-adaptive adversaries for this problem. Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.
Submitted 29 June, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
How many labelers do you have? A closer look at gold-standard labels
Authors:
Chen Cheng,
Hilal Asi,
John Duchi
Abstract:
The construction of most supervised learning datasets revolves around collecting multiple labels for each instance, then aggregating the labels to form a type of ``gold-standard.'' We question the wisdom of this pipeline by developing a (stylized) theoretical model of this process and analyzing its statistical consequences, showing how access to non-aggregated label information can make training well-calibrated models easier or -- in some cases -- even feasible, whereas it is impossible with only gold-standard labels. The entire story, however, is subtle, and the contrasts between aggregated and fuller label information depend on the particulars of the problem, where estimators that use aggregated information exhibit robust but slower rates of convergence, while estimators that can effectively leverage all labels converge more quickly if they have fidelity to (or can learn) the true labeling process. The theory we develop in the stylized model makes several predictions for real-world datasets, including when non-aggregate labels should improve learning performance, which we test to corroborate the validity of our predictions.
Submitted 23 June, 2022;
originally announced June 2022.
-
Optimal Algorithms for Mean Estimation under Local Differential Privacy
Authors:
Hilal Asi,
Vitaly Feldman,
Kunal Talwar
Abstract:
We study the problem of mean estimation of $\ell_2$-bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the asymptotically optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of designing the protocol with the smallest variance. We show that PrivUnit (Bhowmick et al. 2018) with optimized parameters achieves the optimal variance among a large family of locally private randomizers. To prove this result, we establish some properties of local randomizers, and use symmetrization arguments that allow us to write the optimal randomizer as the optimizer of a certain linear program. These structural results, which should extend to other problems, then allow us to show that the optimal randomizer belongs to the PrivUnit family.
We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees. This allows us to establish several useful properties on the exact constants of the optimal error as well as to numerically estimate these constants.
Submitted 5 May, 2022;
originally announced May 2022.
-
Adapting to Function Difficulty and Growth Conditions in Private Optimization
Authors:
Hilal Asi,
Daniel Levy,
John Duchi
Abstract:
We develop algorithms for private stochastic convex optimization that adapt to the hardness of the specific function we wish to optimize. While previous work provides worst-case bounds for arbitrary convex functions, it is often the case that the function at hand belongs to a smaller class that enjoys faster rates. Concretely, we show that for functions exhibiting $κ$-growth around the optimum, i.e., $f(x) \ge f(x^*) + λκ^{-1} \|x-x^*\|_2^κ$ for $κ> 1$, our algorithms improve upon the standard ${\sqrt{d}}/{n\varepsilon}$ privacy rate to the faster $({\sqrt{d}}/{n\varepsilon})^{\tfracκ{κ- 1}}$. Crucially, they achieve these rates without knowledge of the growth constant $κ$ of the function. Our algorithms build upon the inverse sensitivity mechanism, which adapts to instance difficulty (Asi & Duchi, 2020), and recent localization techniques in private optimization (Feldman et al., 2020). We complement our algorithms with matching lower bounds for these function classes and demonstrate that our adaptive algorithm is simultaneously (minimax) optimal over all $κ\ge 1+c$ whenever $c = Θ(1)$.
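As a worked illustration of the rate in this abstract (constants and logarithmic factors ignored, and assuming the regime $\sqrt{d}/(n\varepsilon) < 1$ so that a larger exponent means a smaller error), the exponent $\tfrac{κ}{κ-1}$ can be evaluated at a few values of $κ$:

$$
κ = 2:\quad \Big(\frac{\sqrt{d}}{n\varepsilon}\Big)^{2} = \frac{d}{n^2\varepsilon^2},
\qquad
κ = \tfrac{3}{2}:\quad \Big(\frac{\sqrt{d}}{n\varepsilon}\Big)^{3},
\qquad
κ \to \infty:\quad \frac{κ}{κ-1} \to 1 \;\Rightarrow\; \frac{\sqrt{d}}{n\varepsilon}.
$$

So quadratic growth ($κ = 2$) squares the standard privacy rate, sharper growth ($κ$ closer to 1) improves it further, and very flat growth ($κ$ large) recovers the standard rate.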
Submitted 5 August, 2021;
originally announced August 2021.
-
Private Adaptive Gradient Methods for Convex Optimization
Authors:
Hilal Asi,
John Duchi,
Alireza Fallah,
Omid Javidbakht,
Kunal Talwar
Abstract:
We study adaptive methods for differentially private convex optimization, proposing and analyzing differentially private variants of a Stochastic Gradient Descent (SGD) algorithm with adaptive stepsizes, as well as the AdaGrad algorithm. We provide upper bounds on the regret of both algorithms and show that the bounds are (worst-case) optimal. As a consequence of our development, we show that our private versions of AdaGrad outperform adaptive SGD, which in turn outperforms traditional SGD in scenarios with non-isotropic gradients where (non-private) AdaGrad provably outperforms SGD. The major challenge is that the isotropic noise typically added for privacy dominates the signal in gradient geometry for high-dimensional problems; approaches to this that effectively optimize over lower-dimensional subspaces simply ignore the actual problems that varying gradient geometries introduce. In contrast, we study non-isotropic clipping and noise addition, developing a principled theoretical approach; the consequent procedures also enjoy significantly stronger empirical performance than prior approaches.
Submitted 25 June, 2021;
originally announced June 2021.
-
Stochastic Bias-Reduced Gradient Methods
Authors:
Hilal Asi,
Yair Carmon,
Arun Jambulapati,
Yujia Jin,
Aaron Sidford
Abstract:
We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $δ$, variance $O(\log(1/δ))$, and an expected sampling cost of $O(\log(1/δ))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing.
We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.
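The multilevel Monte Carlo idea mentioned in this abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's estimator: `solve(j)` is a hypothetical callable returning an approximate minimizer computed from about `2**j` stochastic gradients, the level is drawn with probability `2**-j`, and levels above `j_max` fall back to the base estimate. In a real construction `solve(j)` and `solve(j - 1)` would need to be coupled (share randomness) to keep the variance small; that coupling is omitted here.

```python
import random


def mlmc_estimate(solve, j_max, rng=random):
    """Return one multilevel Monte Carlo draw.

    solve: hypothetical callable, solve(j) -> estimate using ~2**j samples.
    j_max: largest level that is actually evaluated (truncation level).
    rng:   any object with a random() method returning a float in [0, 1).
    """
    # Draw level J with P(J = j) = 2**-j for j = 1, 2, ...
    j = 1
    while rng.random() < 0.5:
        j += 1
    x0 = solve(0)
    if j > j_max:
        # Truncation: skip the correction term for rare deep levels.
        return x0
    # Reweight the level-j correction by 1 / P(J = j) = 2**j.
    return x0 + (2 ** j) * (solve(j) - solve(j - 1))
```

By telescoping, the expectation of this draw for a deterministic `solve` equals `solve(j_max)`, so the bias matches that of the most accurate level, while the expected number of samples used grows only linearly in `j_max` because level `j` costs about `2**j` and is chosen with probability `2**-j`.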
Submitted 28 October, 2021; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry
Authors:
Hilal Asi,
Vitaly Feldman,
Tomer Koren,
Kunal Talwar
Abstract:
Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,δ)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/\varepsilon n$. The upper bound is based on a new algorithm that combines the iterative localization approach of Feldman et al. (2020) with a new analysis of private regularized mirror descent. It applies to $\ell_p$ bounded domains for $p\in [1,2]$ and queries at most $n^{3/2}$ gradients, improving over the best previously known algorithm for the $\ell_2$ case, which needs $n^2$ gradients. Further, we show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/\varepsilon n)^{2/3}$. This bound is achieved by a new variance-reduced version of the Frank-Wolfe algorithm that requires just a single pass over the data. We also show that the lower bound in this case is the minimum of the two rates mentioned above.
Submitted 2 March, 2021;
originally announced March 2021.
-
Near Instance-Optimality in Differential Privacy
Authors:
Hilal Asi,
John C. Duchi
Abstract:
We develop two notions of instance optimality in differential privacy, inspired by classical statistical theory: one by defining a local minimax risk and the other by considering unbiased mechanisms and analogizing the Cramér-Rao bound, and we show that the local modulus of continuity of the estimand of interest completely determines these quantities. We also develop a complementary collection of mechanisms, which we term the inverse sensitivity mechanisms, that are instance optimal (or nearly instance optimal) for a large class of estimands. Moreover, these mechanisms uniformly outperform the smooth sensitivity framework on each instance for several function classes of interest, including real-valued continuous functions. We carefully present two instantiations of the mechanisms for median and robust regression estimation with corresponding experiments.
Submitted 16 May, 2020;
originally announced May 2020.
-
Finding Planted Cliques in Sublinear Time
Authors:
Jay Mardia,
Hilal Asi,
Kabir Aladin Chandrasekher
Abstract:
We study the planted clique problem in which a clique of size $k$ is planted in an Erdős-Rényi graph $G(n,1/2)$ and one is interested in recovering this planted clique. It is widely believed that this problem exhibits a statistical-computational gap when computational efficiency is equated with the existence of polynomial time algorithms. We study this problem under a more fine-grained computational lens and consider the following two questions.
1. Do there exist sublinear time algorithms for recovering the planted clique?
2. What is the smallest running time any algorithm can hope to have?
We show that because of a well-known clique-completion property, very elementary sublinear time recovery algorithms do indeed exist for clique sizes $k = ω(\sqrt{n})$. This points to a qualitatively stronger statistical-computational gap. The planted clique recovery problem can be solved without even looking at most of the input above the $Θ(\sqrt{n})$ threshold and cannot be solved by any efficient algorithm below it.
A running time lower bound for the recovery problem follows easily from the results of [RS19], and this implies our recovery algorithms are optimal whenever $k = Ω(n^{2/3})$. However, for $k = o(n^{2/3})$ there is a gap between our algorithmic upper bound and the information-theoretic lower bound implied by [RS19].
With some caveats, we show stronger detection lower bounds based on the Planted Clique Conjecture for a natural but restricted class of algorithms. The key idea is to relate very fast sublinear time algorithms for detecting large planted cliques to polynomial time algorithms for detecting small planted cliques.
Submitted 17 October, 2022; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Element Level Differential Privacy: The Right Granularity of Privacy
Authors:
Hilal Asi,
John Duchi,
Omid Javidbakht
Abstract:
Differential Privacy (DP) provides strong guarantees on the risk of compromising a user's data in statistical learning applications, though these strong protections make learning challenging and may be too stringent for some use cases. To address this, we propose element level differential privacy, which extends differential privacy to provide protection against leaking information about any particular "element" a user has, allowing better utility and more robust results than classical DP. By carefully choosing these "elements," it is possible to provide privacy protections at a desired granularity. We provide definitions, associated privacy guarantees, and analysis to identify the tradeoffs with the new definition; we also develop several private estimation and learning methodologies, providing careful examples for item frequency and M-estimation (empirical risk minimization) with concomitant privacy and utility analysis. We complement our theoretical and methodological advances with several real-world applications, estimating histograms and fitting several large-scale prediction models, including deep networks.
Submitted 5 December, 2019;
originally announced December 2019.
-
Nearly Optimal Constructions of PIR and Batch Codes
Authors:
Hilal Asi,
Eitan Yaakobi
Abstract:
In this work we study two families of codes with availability, namely private information retrieval (PIR) codes and batch codes. While the former requires that every information symbol has $k$ mutually disjoint recovering sets, the latter asks this property for every multiset request of $k$ information symbols. The main problem under this paradigm is to minimize the number of redundancy symbols. We denote this value by $r_P(n,k)$ and $r_B(n,k)$ for PIR and batch codes, respectively, where $n$ is the number of information symbols. Previous results showed that for any constant $k$, $r_P(n,k) = Θ(\sqrt{n})$ and $r_B(n,k)=O(\sqrt{n}\log(n))$. In this work we study the asymptotic behavior of these codes for non-constant $k$ and specifically for $k=Θ(n^ε)$. We also study the largest value of $k$ such that the rate of the codes approaches 1, and show that for all $ε<1$, $r_P(n,n^ε) = o(n)$, while for batch codes, this property holds for all $ε< 0.5$.
Submitted 5 June, 2017; v1 submitted 25 January, 2017;
originally announced January 2017.