Search | arXiv e-print repository

arXiv:2505.20612 [pdf, ps, other]

Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models

Authors: Peter Robicheaux, Matvei Popov, Anish Madan, Isaac Robinson, Joseph Nelson, Deva Ramanan, Neehar Peri

Abstract: Vision-language models (VLMs) trained on internet-scale data achieve remarkable zero-shot detection performance on common objects like car, truck, and pedestrian. However, state-of-the-art models still struggle to generalize to out-of-distribution classes, tasks and imaging modalities not typically found in their pre-training. Rather than simply re-training VLMs on more visual data, we argue that one should align VLMs to new concepts with annotation instructions containing a few visual examples and rich textual descriptions. To this end, we introduce Roboflow100-VL, a large-scale collection of 100 multi-modal object detection datasets with diverse concepts not commonly found in VLM pre-training. We evaluate state-of-the-art models on our benchmark in zero-shot, few-shot, semi-supervised, and fully-supervised settings, allowing for comparison across data regimes. Notably, we find that VLMs like GroundingDINO and Qwen2.5-VL achieve less than 2% zero-shot accuracy on challenging medical imaging datasets within Roboflow100-VL, demonstrating the need for few-shot concept alignment. Lastly, we discuss our recent CVPR 2025 Foundational FSOD competition and share insights from the community. Notably, the winning team significantly outperforms our baseline by 16.8 mAP! Our code and dataset are available at https://github.com/roboflow/rf100-vl/ and https://universe.roboflow.com/rf100-vl/

Submitted 16 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

Comments: The first two authors contributed equally. Project Page: https://rf100-vl.org/

arXiv:2503.23611 [pdf, other]

doi 10.1145/3713082.3730393

My CXL Pool Obviates Your PCIe Switch

Authors: Yuhong Zhong, Daniel S. Berger, Pantea Zardoshti, Enrique Saurez, Jacob Nelson, Antonis Psistakis, Joshua Fried, Asaf Cidon

Abstract: Pooling PCIe devices across multiple hosts offers a promising solution to mitigate stranded I/O resources, enhance device utilization, address device failures, and reduce total cost of ownership. The only viable option today is a PCIe switch, which decouples PCIe devices from hosts by connecting them through a hardware switch. However, the high cost and limited flexibility of PCIe switches hinder their widespread adoption beyond specialized datacenter use cases. This paper argues that PCIe device pooling can be effectively implemented in software using CXL memory pools. CXL memory pools improve memory utilization and already have positive return on investment. We find that, once CXL pools are in place, they can serve as a building block for pooling any kind of PCIe device. We demonstrate that PCIe devices can directly use CXL memory as I/O buffers without device modifications, which enables routing PCIe traffic through CXL pool memory. This software-based approach is deployable on today's hardware and is more flexible than hardware PCIe switches. In particular, we explore how disaggregating devices such as NICs can transform datacenter infrastructure.

Submitted 21 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

arXiv:2503.07783 [pdf]

Sensemaking in Novel Environments: How Human Cognition Can Inform Artificial Agents

Authors: Robert E. Patterson, Regina Buccello-Stout, Mary E. Frame, Anna M. Maresca, Justin Nelson, Barbara Acker-Mills, Erica Curtis, Jared Culbertson, Kevin Schmidt, Scott Clouse, Steve Rogers

Abstract: One of the most vital cognitive skills to possess is the ability to make sense of objects, events, and situations in the world. In the current paper, we offer an approach for creating artificially intelligent agents with the capacity for sensemaking in novel environments. Objectives: to present several key ideas: (1) a novel unified conceptual framework for sensemaking (which includes the existence of sign relations embedded within and across frames); (2) interaction among various content-addressable, distributed-knowledge structures via shared attributes (whose net response would represent a synthesized object, event, or situation serving as a sign for sensemaking in a novel environment). Findings: we suggest that attributes across memories can be shared and recombined in novel ways to create synthesized signs, which can denote certain outcomes in novel environments (i.e., sensemaking).

Submitted 10 March, 2025; originally announced March 2025.

Comments: 14 pages, 5 figures

MSC Class: I.2.0

arXiv:2503.05113 [pdf]

FOSS solution for Molecular Dynamics Simulation Automation and Collaboration with MDSGAT

Authors: Jai Geddes Nelson, Xiaochen Liu, Ken Tye Yong

Abstract: Setting up and successfully running Molecular Dynamics Simulations (MDS) is labour-intensive and computationally expensive, with a very high barrier to entry for newcomers wishing to utilise the benefits and insights of MDS. We present a Free and Open-Source Software (FOSS) solution that aims not only to reduce the barrier to entry for new Molecular Dynamics (MD) users, but also to significantly reduce the setup time and hardware utilisation overhead for even highly experienced MD researchers. This is accomplished through the creation of the Molecular Dynamics Simulation Generator and Analysis Tool (MDSGAT), which currently serves as a viable alternative to other restrictive or privatised MDS graphical solutions, with a design that allows for seamless collaboration and distribution of exact MD simulation setups and initialisation parameters through a single setup file. The solution is designed from the start to be modular, allowing for additional software expansion to incorporate further MDS packages and analysis methods over time.

Submitted 14 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

arXiv:2501.17754 [pdf]

doi 10.1002/aisy.202400993

Analysis of the navigation of magnetic microrobots through cerebral bifurcations

Authors: Pedro G. Alves, Maria Pinto, Rosa Moreira, Derick Sivakumaran, Fabian C. Landers, Maria Guix, Bradley J. Nelson, Andreas D. Flouris, Salvador Pané, Josep Puigmartí-Luis, Tiago Sotto Mayor

Abstract: Local administration of thrombolytics in ischemic stroke could accelerate clot lysis and the ensuing reperfusion while minimizing the side effects of systemic administration. Medical microrobots could be injected into the bloodstream and magnetically navigated to the clot for administering the drugs directly to the target. The magnetic manipulation required to navigate medical microrobots will depend on various parameters such as the microrobots' size, the blood velocity, and the imposed magnetic field gradients. Numerical simulation was used to study the motion of magnetically controlled microrobots flowing through representative cerebral bifurcations, for predicting the magnetic gradients required to navigate the microrobots from the injection point to the target location. Upon thorough validation of the model against several independent analytical and experimental results, the model was used to generate maps and a predictive equation providing quantitative information on the required magnetic gradients, for different scenarios. The developed maps and predictive equation are crucial to inform the design, operation and optimization of magnetic navigation systems for healthcare applications.

Submitted 29 January, 2025; originally announced January 2025.

Journal ref: Adv. Intell. Syst. 2400993 (2025)

arXiv:2501.11553 [pdf]

Clinically Ready Magnetic Microrobots for Targeted Therapies

Authors: Fabian C. Landers, Lukas Hertle, Vitaly Pustovalov, Derick Sivakumaran, Oliver Brinkmann, Kirstin Meiners, Pascal Theiler, Valentin Gantenbein, Andrea Veciana, Michael Mattmann, Silas Riss, Simone Gervasoni, Christophe Chautems, Hao Ye, Semih Sevim, Andreas D. Flouris, Josep Puigmartí-Luis, Tiago Sotto Mayor, Pedro Alves, Tessa Lühmann, Xiangzhong Chen, Nicole Ochsenbein, Ueli Moehrlen, Philipp Gruber, Miriam Weisskopf, et al. (3 additional authors not shown)

Abstract: Systemic drug administration often causes off-target effects limiting the efficacy of advanced therapies. Targeted drug delivery approaches increase local drug concentrations at the diseased site while minimizing systemic drug exposure. We present a magnetically guided microrobotic drug delivery system capable of precise navigation under physiological conditions. This platform integrates a clinical electromagnetic navigation system, a custom-designed release catheter, and a dissolvable capsule for accurate therapeutic delivery. In vitro tests showed precise navigation in human vasculature models, and in vivo experiments confirmed tracking under fluoroscopy and successful navigation in large animal models. The microrobot balances magnetic material concentration, contrast agent loading, and therapeutic drug capacity, enabling effective hosting of therapeutics despite the integration complexity of its components, offering a promising solution for precise targeted drug delivery.

Submitted 20 January, 2025; originally announced January 2025.

arXiv:2412.01143 [pdf, other]

Space Complexity of Minimum Cut Problems in Single-Pass Streams

Authors: Matthew Ding, Alexandro Garces, Jason Li, Honghao Lin, Jelani Nelson, Vihan Shah, David P. Woodruff

Abstract: We consider the problem of finding a minimum cut of a weighted graph presented as a single-pass stream. While graph sparsification in streams has been intensively studied, the specific application of finding minimum cuts in streams is less well-studied. To this end, we show upper and lower bounds on minimum cut problems in insertion-only streams for a variety of settings, including for both randomized and deterministic algorithms, for both arbitrary and random order streams, and for both approximate and exact algorithms. One of our main results is an $\widetilde{O}(n/\varepsilon)$ space algorithm with fast update time for approximating a spectral cut query with high probability on a stream given in an arbitrary order. Our result breaks the $\Omega(n/\varepsilon^2)$ space lower bound required of a sparsifier that approximates all cuts simultaneously. Using this result, we provide streaming algorithms with near optimal space of $\widetilde{O}(n/\varepsilon)$ for minimum cut and approximate all-pairs effective resistances, with matching space lower-bounds. The amortized update time of our algorithms is $\widetilde{O}(1)$, provided that the number of edges in the input graph is at least $(n/\varepsilon^2)^{1+o(1)}$. We also give a generic way of incorporating sketching into a recursive contraction algorithm to improve the post-processing time of our algorithms. In addition to these results, we give a random-order streaming algorithm that computes the {\it exact} minimum cut on a simple, unweighted graph using $\widetilde{O}(n)$ space. Finally, we give an $\Omega(n/\varepsilon^2)$ space lower bound for deterministic minimum cut algorithms which matches the best-known upper bound up to polylogarithmic factors.

Submitted 6 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: 25+3 pages, 2 figures. Accepted to ITCS 2025. v2: minor updates to author information

arXiv:2411.06370 [pdf, ps, other]

One Attack to Rule Them All: Tight Quadratic Bounds for Adaptive Queries on Cardinality Sketches

Authors: Edith Cohen, Jelani Nelson, Tamás Sarlós, Mihir Singhal, Uri Stemmer

Abstract: Cardinality sketches are compact data structures for representing sets or vectors. These sketches are space-efficient, typically requiring only logarithmic storage in the input size, and enable approximation of cardinality (or the number of nonzero entries). A crucial property in applications is \emph{composability}, meaning that the sketch of a union of sets can be computed from individual sketches. Existing designs provide strong statistical guarantees, ensuring that a randomly sampled sketching map remains robust for an exponential number of queries in terms of the sketch size $k$. However, these guarantees degrade to quadratic in $k$ when queries are \emph{adaptive}, meaning they depend on previous responses. Prior works on statistical queries (Steinke and Ullman, 2015) and specific MinHash cardinality sketches (Ahmadian and Cohen, 2024) established that this is tight in that they can be compromised using a quadratic number of adaptive queries. In this work, we develop a universal attack framework that applies to broad classes of cardinality sketches. We show that any union-composable sketching map can be compromised with $\tilde{O}(k^4)$ adaptive queries and this improves to a tight bound of $\tilde{O}(k^2)$ for monotone maps (including MinHash, statistical queries, and Boolean linear maps). Similarly, any linear sketching map over the reals $\mathbb{R}$ and finite fields $\mathbb{F}_p$ can be compromised using $\tilde{O}(k^2)$ adaptive queries, which is optimal and strengthens some of the recent results by Gribelyuk et al. (FOCS 2024), who established a weaker polynomial bound.

Submitted 13 March, 2025; v1 submitted 10 November, 2024; originally announced November 2024.
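
The composability property named in the abstract above can be illustrated with a small bottom-k (MinHash-style) sketch. This is only a minimal illustration under stated assumptions, not the construction or the attack from the paper: the function names `hash01`, `sketch`, `merge`, and `estimate` are hypothetical, SHA-256 stands in for the random sketching map, and the estimator is the standard (k - 1) / (k-th smallest hash) rule for bottom-k sketches.

```python
import hashlib


def hash01(item, seed=0):
    """Map an item to a pseudo-random value in [0, 1) (SHA-256 is an assumption)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sketch(items, k, seed=0):
    """Bottom-k sketch: the k smallest distinct hash values of the set."""
    return sorted({hash01(x, seed) for x in items})[:k]


def merge(sketch_a, sketch_b, k):
    """Composability: the sketch of a union is computed from the two sketches alone."""
    return sorted(set(sketch_a) | set(sketch_b))[:k]


def estimate(s, k):
    """Estimate the cardinality from a bottom-k sketch."""
    if len(s) < k:
        # Fewer than k distinct hashes were seen, so the sketch holds the whole set.
        return float(len(s))
    # Standard bottom-k estimator: (k - 1) divided by the k-th smallest hash value.
    return (k - 1) / s[-1]


if __name__ == "__main__":
    k = 256
    a = range(0, 3000)
    b = range(2000, 6000)
    merged = merge(sketch(a, k), sketch(b, k), k)
    # The merged sketch equals the sketch built directly from the union.
    print(merged == sketch(set(a) | set(b), k))
    print(round(estimate(merged, k)))  # an approximation of |A ∪ B| = 6000
```

The sketch stores only k hash values regardless of the set size, and merging never needs the original sets; that is the sense in which the map is union-composable. The adaptive-query setting discussed in the abstract concerns how many such estimates an adversary needs to observe before the fixed hash seed can be exploited.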

arXiv:2411.02535 [pdf, other]

Polynomial-Time Classical Simulation of Noisy Circuits with Naturally Fault-Tolerant Gates

Authors: Jon Nelson, Joel Rajakumar, Dominik Hangleiter, Michael J. Gullans

Abstract: We construct a polynomial-time classical algorithm that samples from the output distribution of low-depth noisy Clifford circuits with any product-state inputs and final single-qubit measurements in any basis. This class of circuits includes Clifford-magic circuits and Conjugated-Clifford circuits, which are important candidates for demonstrating quantum advantage using non-universal gates. Additionally, our results generalize a simulation algorithm for IQP circuits [Rajakumar et al., SODA'25] to the case of IQP circuits augmented with CNOT gates, which is another class of non-universal circuits that are relevant to current experiments. Importantly, our results do not require randomness assumptions over the circuit families considered (such as anticoncentration properties) and instead hold for every circuit in each class. This allows us to place tight limitations on the robustness of these circuits to noise. In particular, we show that there is no quantum advantage at large depths with realistically noisy Clifford circuits, even with perfect magic state inputs, or IQP circuits with CNOT gates, even with arbitrary diagonal non-Clifford gates. The key insight behind the algorithm is that interspersed noise causes a decay of long-range entanglement, and at depths beyond a critical threshold, the noise builds up to an extent that most correlations can be classically simulated. To prove our results, we merge techniques from percolation theory with tools from Pauli path analysis.

Submitted 10 December, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

arXiv:2405.01437 [pdf, other]

Two competing populations with a common environmental resource

Authors: Keith Paarporn, James Nelson

Abstract: Feedback-evolving games is a framework that models the co-evolution between payoff functions and an environmental state. It serves as a useful tool to analyze many social dilemmas such as natural resource consumption, behaviors in epidemics, and the evolution of biological populations. However, it has primarily focused on the dynamics of a single population of agents. In this paper, we consider the impact of two populations of agents that share a common environmental resource. We focus on a scenario where individuals in one population are governed by an environmentally "responsible" incentive policy, and individuals in the other population are environmentally "irresponsible". An analysis of the asymptotic stability of the coupled system is provided, and conditions under which the resource collapses are identified. We then derive consumption rates for the irresponsible population that optimally exploit the environmental resource, and analyze, via a sensitivity analysis, how incentives should be allocated to the responsible population to most effectively promote the environment.

Submitted 21 August, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.10201 [pdf, other]

Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar, Samson Zhou

Abstract: We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $\mathcal{O}(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that any single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.

Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: Fixed author ordering

arXiv:2403.14770 [pdf, other]

Beehive: A Flexible Network Stack for Direct-Attached Accelerators

Authors: Katie Lim, Matthew Giordano, Theano Stavrinos, Irene Zhang, Jacob Nelson, Baris Kasikci, Tom Anderson

Abstract: Direct-attached accelerators, where application accelerators are directly connected to the datacenter network via a hardware network stack, offer substantial benefits in terms of reduced latency, CPU overhead, and energy use. However, a key challenge is that modern datacenter network stacks are complex, with interleaved protocol layers, network management functions, and virtualization support. To operators, network feature agility, diagnostics, and manageability are often considered just as important as raw performance. By contrast, existing hardware network stacks only support basic protocols and are often difficult to extend since they use fixed processing pipelines. We propose Beehive, a new, open-source FPGA network stack for direct-attached accelerators designed to enable flexible and adaptive construction of complex network functionality in hardware. Application and network protocol elements are modularized as tiles over a network-on-chip substrate. Elements can be added or scaled up/down to match workload characteristics with minimal effort or changes to other elements. Flexible diagnostics and control are integral, with tooling to ensure deadlock safety. Our implementation interoperates with standard Linux TCP and UDP clients, with a 4x improvement in end-to-end RPC tail latency for Linux UDP clients versus a CPU-attached accelerator. Beehive is available at https://github.com/beehive-fpga/beehive

Submitted 11 September, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: To appear at MICRO 2024

arXiv:2403.00028 [pdf, ps, other]

Lower Bounds for Differential Privacy Under Continual Observation and Online Threshold Queries

Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Uri Stemmer

Abstract: One of the most basic problems for studying the "price of privacy over time" is the so-called private counter problem, introduced by Dwork et al. (2010) and Chan et al. (2010). In this problem, we aim to track the number of events that occur over time, while hiding the existence of every single event. More specifically, in every time step $t\in[T]$ we learn (in an online fashion) that $\Delta_t\geq 0$ new events have occurred, and must respond with an estimate $n_t\approx\sum_{j=1}^t \Delta_j$. The privacy requirement is that all of the outputs together, across all time steps, satisfy event-level differential privacy. The main question here is how our error needs to depend on the total number of time steps $T$ and the total number of events $n$. Dwork et al. (2015) showed an upper bound of $O\left(\log(T)+\log^2(n)\right)$, and Henzinger et al. (2023) showed a lower bound of $\Omega\left(\min\{\log n, \log T\}\right)$. We show a new lower bound of $\Omega\left(\min\{n,\log T\}\right)$, which is tight w.r.t. the dependence on $T$, and is tight in the sparse case where $\log^2 n=O(\log T)$. Our lower bound has the following implications: (i) We show that our lower bound extends to the "online thresholds problem", where the goal is to privately answer many "quantile queries" when these queries are presented one-by-one. This resolves an open question of Bun et al. (2017). (ii) Our lower bound implies, for the first time, a separation between the number of mistakes obtainable by a private online learner and a non-private online learner. This partially resolves a COLT'22 open question published by Sanyal and Ramponi. (iii) Our lower bound also yields the first separation between the standard model of private online learning and a recently proposed relaxed variant of it, called private online prediction.

Submitted 17 April, 2024; v1 submitted 28 February, 2024; originally announced March 2024.
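
To make the private counter problem in the abstract above concrete, the following is a minimal sketch of a dyadic-interval (binary-tree) counter with Laplace noise, a well-known upper-bound approach to this problem. It is not the lower-bound argument of the paper, and its error guarantee is not the bound quoted in the abstract. The names `private_counter` and `laplace` are hypothetical, and the privacy accounting assumes that neighbouring inputs differ by one event in a single time step.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_counter(deltas, epsilon, rng=None):
    """Return estimates n_t of sum_{j<=t} deltas[j] for t = 1..T.

    Every dyadic interval of time steps gets one noisy partial sum. A single
    event falls into at most `levels` such intervals (one per level), so
    Laplace noise of scale levels / epsilon on each interval is the usual
    calibration for event-level epsilon-differential privacy.
    """
    rng = rng or random.Random()
    T = len(deltas)
    levels = T.bit_length()  # interval sizes 1, 2, 4, ..., 2**(levels - 1)
    scale = levels / epsilon

    # Exact prefix sums; index i is only read once time step i has passed.
    prefix = [0]
    for d in deltas:
        prefix.append(prefix[-1] + d)

    noisy = {}  # (level, index) -> noisy sum of that dyadic interval
    estimates = []
    for t in range(1, T + 1):
        start, est = 0, 0.0
        # Decompose [1, t] into dyadic intervals using the binary digits of t.
        for level in reversed(range(levels)):
            size = 1 << level
            if t & size:
                key = (level, start // size)
                if key not in noisy:
                    exact = prefix[start + size] - prefix[start]
                    noisy[key] = exact + laplace(scale, rng)
                est += noisy[key]
                start += size
        estimates.append(est)
    return estimates


if __name__ == "__main__":
    events = [3, 0, 1, 0, 5, 0, 0, 1]
    print(private_counter(events, epsilon=1.0, rng=random.Random(0)))
```

Each estimate sums at most `levels` noisy values, which is why the error of this approach grows only polylogarithmically in $T$ rather than linearly. The noisy partial sums are cached so that every interval is perturbed exactly once, as the privacy argument requires.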

arXiv:2312.02132 [pdf, other]

Hot PATE: Private Aggregation of Distributions for Diverse Task

Authors: Edith Cohen, Benjamin Cohen-Wang, Xin Lyu, Jelani Nelson, Tamas Sarlos, Uri Stemmer

Abstract: The Private Aggregation of Teacher Ensembles (PATE) framework enables privacy-preserving machine learning by aggregating responses from disjoint subsets of sensitive data. Adaptations of PATE to tasks with inherent output diversity such as text generation face a core tension: preserving output diversity reduces teacher agreement, which in turn increases the noise required for differential privacy, degrading utility. Yet suppressing diversity is counterproductive, as modern large language models encapsulate knowledge in their output distributions. We propose Hot PATE, a variant tailored to settings where outputs are distributions. We formally define what it means to preserve diversity and introduce an efficient aggregation mechanism that transfers diversity to the randomized output without incurring additional privacy cost. Our method can be implemented with only API access to proprietary models and serves as a drop-in replacement for existing "cold" PATE aggregators. Empirically, Hot PATE achieves orders-of-magnitude improvement on in-context learning tasks.

Submitted 17 May, 2025; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.01242 [pdf, other]

Pushing the Limits of Quantum Computing for Simulating PFAS Chemistry

Authors: Emil Dimitrov, Goar Sanchez-Sanz, James Nelson, Lee O'Riordan, Myles Doyle, Sean Courtney, Venkatesh Kannan, Hassan Naseri, Alberto Garcia Garcia, James Tricker, Marisa Faraggi, Joshua Goings, Luning Zhao

Abstract: Accurate and scalable methods for computational quantum chemistry can accelerate research and development in many fields, ranging from drug discovery to advanced material design. Solving the electronic Schrödinger equation is the core problem of computational chemistry. However, the combinatorial complexity of this problem makes it intractable to find exact solutions, except for very small systems. The idea of quantum computing originated from this computational challenge in simulating quantum mechanics. We propose an end-to-end quantum chemistry pipeline based on the variational quantum eigensolver (VQE) algorithm and integrated with both HPC-based simulators and a trapped-ion quantum computer. Our platform orchestrates hundreds of simulation jobs on compute resources to efficiently complete a set of ab initio chemistry experiments with a wide range of parameterization. Per- and poly-fluoroalkyl substances (PFAS) are a large family of human-made chemicals that pose a major environmental and health issue globally. Our simulations include breaking a carbon-fluorine bond in trifluoroacetic acid (TFA), a common PFAS chemical. This is a common pathway towards destruction and removal of PFAS. Molecules are modeled on both a quantum simulator and a trapped-ion quantum computer, specifically IonQ Aria. Using basic error mitigation techniques, the 11-qubit TFA model (56 entangling gates) on IonQ Aria yields near-quantitative results with milli-Hartree accuracy. Our novel results show the current state and future projections for quantum computing in solving the electronic structure problem, push the boundaries for the VQE algorithm and quantum computers, and facilitate development of quantum chemistry workflows.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.12871 [pdf, other]

The origins of unpredictability in life trajectory prediction tasks

Authors: Ian Lundberg, Rachel Brown-Weinstock, Susan Clampet-Lundquist, Sarah Pachman, Timothy J. Nelson, Vicki Yang, Kathryn Edin, Matthew J. Salganik

Abstract: Why are life trajectories difficult to predict? We investigated this question through in-depth qualitative interviews with 40 families sampled from a multi-decade longitudinal study. Our sampling and interviewing processes were informed by the earlier efforts of hundreds of researchers to predict life outcomes for participants in this study. The qualitative evidence we uncovered in these interviews, combined with a well-known mathematical decomposition of prediction error, helps us identify some origins of unpredictability and create a new conceptual framework. Our specific evidence and our more general framework suggest that unpredictability should be expected in many life trajectory prediction tasks, even in the presence of complex algorithms and large datasets. Our work also provides a foundation for future empirical and theoretical work on unpredictability in human lives.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 54 pages, 8 figures

ACM Class: J.4

arXiv:2310.01347 [pdf, ps, other]

Hamiltonians whose low-energy states require $Ω(n)$ T gates

Authors: Nolan J. Coble, Matthew Coudron, Jon Nelson, Seyed Sajjad Nezhadi

Abstract: The recent resolution of the NLTS Conjecture [ABN22] establishes a prerequisite to the Quantum PCP (QPCP) Conjecture through a novel use of newly-constructed QLDPC codes [LZ22]. Even with NLTS now solved, there remain many independent and unresolved prerequisites to the QPCP Conjecture, such as the NLSS Conjecture of [GL22]. In this work we focus on a specific and natural prerequisite to both NLSS and the QPCP Conjecture, namely, the existence of local Hamiltonians whose low-energy states all require $ω(\log n)$ T gates to prepare. In fact, we prove a stronger result which is not necessarily implied by either conjecture: we construct local Hamiltonians whose low-energy states require $Ω(n)$ T gates. We further show that our procedure can be applied to the NLTS Hamiltonians of [ABN22] to yield local Hamiltonians whose low-energy states require both $Ω(\log n)$-depth and $Ω(n)$ T gates to prepare. In order to accomplish this we define a "pseudo-stabilizer" property of a state with respect to each local Hamiltonian term, and prove an additive local energy lower bound for each term at which the state is pseudo-stabilizer. By proving a relationship between the number of T gates preparing a state and the number of terms at which the state is pseudo-stabilizer, we are able to give a constant energy lower bound which applies to any state with T-count less than $c \cdot n$ for some fixed positive constant $c$. This result represents a significant improvement over [CCNN23] where we used a different technique to give an energy bound which only distinguishes between stabilizer states and states which require a non-zero number of T gates.

Submitted 10 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: fixed typos, updated abstract, additional references added

arXiv:2310.00145 [pdf, other]

3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View Planning

Authors: Athanasios Bacharis, Konstantinos D. Polyzos, Henry J. Nelson, Georgios B. Giannakis, Nikolaos Papanikolopoulos

Abstract: 3D reconstruction is a fundamental task in robotics that gained attention due to its major impact in a wide variety of practical settings, including agriculture, underwater, and urban environments. This task can be carried out via view planning (VP), which aims to optimally place a certain number of cameras in positions that maximize the visual information, improving the resulting 3D reconstruction. Nonetheless, in most real-world settings, existing environmental noise can significantly affect the performance of 3D reconstruction. To that end, this work advocates a novel geometric-based reconstruction quality function for VP, that accounts for the existing noise of the environment, without requiring its closed-form expression. With no analytic expression of the objective function, this work puts forth an adaptive Bayesian optimization algorithm for accurate 3D reconstruction in the presence of noise. Numerical tests on noisy agricultural environments showcase the merits of the proposed approach for 3D reconstruction with even a small number of available cameras.

Submitted 18 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

arXiv:2308.14733 [pdf, other]

Differentially Private Aggregation via Imperfect Shuffling

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Jelani Nelson, Samson Zhou

Abstract: In this paper, we introduce the imperfect shuffle differential privacy model, where messages sent from users are shuffled in an almost uniform manner before being observed by a curator for private aggregation. We then consider the private summation problem. We show that the standard split-and-mix protocol by Ishai et al. [FOCS 2006] can be adapted to achieve near-optimal utility bounds in the imperfect shuffle model. Specifically, we show that, surprisingly, there is no additional error overhead necessary in the imperfect shuffle model.

Submitted 28 August, 2023; originally announced August 2023.
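
For readers unfamiliar with split-and-mix, the following is a minimal sketch of the basic idea: each user splits its input into additive shares modulo q, all shares are shuffled together, and the curator sums what it receives. It assumes a perfectly uniform shuffle, so it does not model the imperfect shuffle studied in the paper, and it is not the authors' adapted protocol. The names `split` and `split_and_mix_sum` and the parameters `m` (shares per user) and `q` (modulus) are illustrative choices.

```python
import random


def split(x, m, q, rng):
    """Split x into m additive shares modulo q (the shares sum to x mod q)."""
    shares = [rng.randrange(q) for _ in range(m - 1)]
    shares.append((x - sum(shares)) % q)
    return shares


def split_and_mix_sum(inputs, m, q, seed=0):
    """Return the sum of inputs mod q as the curator would recover it."""
    # random.Random is used only for illustration; it is not a
    # cryptographically secure source of randomness.
    rng = random.Random(seed)
    messages = [s for x in inputs for s in split(x, m, q, rng)]
    rng.shuffle(messages)  # stand-in for an ideal (uniform) shuffler
    return sum(messages) % q


if __name__ == "__main__":
    print(split_and_mix_sum([3, 5, 7], m=4, q=101))
```

The curator sees only the shuffled multiset of shares, yet the modular sum of all shares equals the modular sum of the inputs regardless of the permutation applied.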

arXiv:2308.02025 [pdf]

Applications and Societal Implications of Artificial Intelligence in Manufacturing: A Systematic Review

Authors: John P. Nelson, Justin B. Biddle, Philip Shapira

Abstract: This paper undertakes a systematic review of relevant extant literature to consider the potential societal implications of the growth of AI in manufacturing. We analyze the extensive range of AI applications in this domain, such as interfirm logistics coordination, firm procurement management, predictive maintenance, and shop-floor monitoring and control of processes, machinery, and workers. Additionally, we explore the uncertain societal implications of industrial AI, including its impact on the workforce, job upskilling and deskilling, cybersecurity vulnerability, and environmental consequences. After building a typology of AI applications in manufacturing, we highlight the diverse possibilities for AI's implementation at different scales and application types. We discuss the importance of considering AI's implications both for individual firms and for society at large, encompassing economic prosperity, equity, environmental health, and community safety and security. The study finds that there is a predominantly optimistic outlook in prior literature regarding AI's impact on firms, but that there is substantial debate and contention about adverse effects and the nature of AI's societal implications. The paper draws analogies to historical cases and other examples to provide a contextual perspective on potential societal effects of industrial AI. Ultimately, beneficial integration of AI in manufacturing will depend on the choices and priorities of various stakeholders, including firms and their managers and owners, technology developers, civil society organizations, and governments. A broad and balanced awareness of opportunities and risks among stakeholders is vital not only for successful and safe technical implementation but also to construct a socially beneficial and sustainable future for manufacturing in the age of AI.

Submitted 25 July, 2023; originally announced August 2023.

arXiv:2306.04444 [pdf, other]

Fast Optimal Locally Private Mean Estimation via Random Projections

Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar

Abstract: We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a $1+o(1)$-factor. Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm such as PrivUnitG in the lower-dimensional space. In addition, we show that, by appropriately correlating the random projection matrices across devices, we can achieve fast server run-time. We mathematically analyze the error of the algorithm in terms of properties of the random projections, and study two instantiations. Lastly, our experiments for private mean estimation and private federated learning demonstrate that our algorithms empirically obtain nearly the same utility as optimal ones while having significantly lower communication and computational cost.

Submitted 26 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Added the correct github link

arXiv:2305.10834 [pdf]

AIwriting: Relations Between Image Generation and Digital Writing

Authors: Scott Rettberg, Talan Memmott, Jill Walker Rettberg, Jason Nelson, Patrick Lichty

Abstract: During 2022, both transformer-based AI text generation systems such as GPT-3 and AI text-to-image generation systems such as DALL-E 2 and Stable Diffusion made exponential leaps forward and are unquestionably altering the fields of digital art and electronic literature. In this panel a group of electronic literature authors and theorists consider new opportunities for human creativity presented by these systems and present new works they have produced during the past year that specifically address these systems as environments for literary expressions that are translated through iterative interlocutive processes into visual representations. The premise that binds these presentations is that these systems and the works generated must be considered from a literary perspective, as they originate in human writing. In works ranging from a visual memoir of the personal experience of a health crisis, to interactive web comics, to architectures based on abstract poetic language, to political satire, four artists explore the capabilities of these writing environments for new genres of literary artist practice, while a digital culture theorist considers the origins and effects of the particular training datasets of human language and images on which these new hybrid forms are based.

Submitted 18 May, 2023; originally announced May 2023.

Comments: Extended abstract for panel presented at ISEA 2023, Paris 16-22 May 2023

ACM Class: J.5

arXiv:2304.04488 [pdf, other]

Hybrid Computing for Interactive Datacenter Applications

Authors: Pratyush Patel, Katie Lim, Kushal Jhunjhunwalla, Ashlie Martinez, Max Demoulin, Jacob Nelson, Irene Zhang, Thomas Anderson

Abstract: Field-Programmable Gate Arrays (FPGAs) are more energy efficient and cost effective than CPUs for a wide variety of datacenter applications. Yet, for latency-sensitive and bursty workloads, this advantage can be difficult to harness due to high FPGA spin-up costs. We propose that a hybrid FPGA and CPU computing framework can harness the energy efficiency benefits of FPGAs for such workloads at reasonable cost. Our key insight is to use FPGAs for stable-state workload and CPUs for short-term workload bursts. Using this insight, we design Spork, a lightweight hybrid scheduler that can realize these energy efficiency and cost benefits in practice. Depending on the desired objective, Spork can trade off energy efficiency for cost reduction and vice versa. It is parameterized with key differences between FPGAs and CPUs in terms of power draw, performance, cost, and spin-up latency. We vary this parameter space and analyze various application and worker configurations on production and synthetic traces. Our evaluation of cloud workloads shows that energy-optimized Spork is not only more energy efficient but it is also cheaper than homogeneous platforms--for short application requests with tight deadlines, it is 1.53x more energy efficient and 2.14x cheaper than using only FPGAs. Relative to an idealized version of an existing cost-optimized hybrid scheduler, energy-optimized Spork provides 1.2-2.4x higher energy efficiency at comparable cost, while cost-optimized Spork provides 1.1-2x higher energy efficiency at 1.06-1.2x lower cost.

Submitted 10 April, 2023; originally announced April 2023.

Comments: 13 pages

arXiv:2303.01229 [pdf, other]

Almanac: Retrieval-Augmented Language Models for Clinical Medicine

Authors: Cyril Zakka, Akash Chaurasia, Rohan Shad, Alex R. Dalal, Jennifer L. Kim, Michael Moor, Kevin Alexander, Euan Ashley, Jack Boyd, Kathleen Boyd, Karen Hirsch, Curt Langlotz, Joanna Nelson, William Hiesinger

Abstract: Large-language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question-answering. Despite many promising applications in clinical medicine, adoption of these models in real-world settings has been largely limited by their tendency to generate incorrect and sometimes even toxic statements. In this study, we develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations. Performance on a novel dataset of clinical scenarios (n = 130) evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality (mean of 18% at p-value < 0.05) across all specialties, with improvements in completeness and safety. Our results demonstrate the potential for large language models to be effective tools in the clinical decision-making process, while also emphasizing the importance of careful testing and deployment to mitigate their shortcomings.

Submitted 31 May, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

arXiv:2302.14755 [pdf, other]

doi 10.4230/LIPIcs.TQC.2023.14

Local Hamiltonians with no low-energy stabilizer states

Authors: Nolan J. Coble, Matthew Coudron, Jon Nelson, Seyed Sajjad Nezhadi

Abstract: The recently-defined No Low-energy Sampleable States (NLSS) conjecture of Gharibian and Le Gall [GL22] posits the existence of a family of local Hamiltonians where all states of low-enough constant energy do not have succinct representations allowing perfect sampling access. States that can be prepared using only Clifford gates (i.e. stabilizer states) are an example of sampleable states, so the NLSS conjecture implies the existence of local Hamiltonians whose low-energy space contains no stabilizer states. We describe families that exhibit this requisite property via a simple alteration to local Hamiltonians corresponding to CSS codes. Our method can also be applied to the recent NLTS Hamiltonians of Anshu, Breuckmann, and Nirkhe [ABN22], resulting in a family of local Hamiltonians whose low-energy space contains neither stabilizer states nor trivial states. We hope that our techniques will eventually be helpful for constructing Hamiltonians which simultaneously satisfy NLSS and NLTS.

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.12170 [pdf, other]

Language Model Crossover: Variation through Few-Shot Prompting

Authors: Elliot Meyerson, Mark J. Nelson, Herbie Bradley, Adam Gaier, Arash Moradi, Amy K. Hoover, Joel Lehman

Abstract: This paper pursues the insight that language models naturally enable an intelligent variation operator similar in spirit to evolutionary crossover. In particular, language models of sufficient scale demonstrate in-context learning, i.e. they can learn from associations between a small number of input patterns to generate outputs incorporating such associations (also called few-shot prompting). This ability can be leveraged to form a simple but powerful variation operator, i.e. to prompt a language model with a few text-based genotypes (such as code, plain-text sentences, or equations), and to parse its corresponding output as those genotypes' offspring. The promise of such language model crossover (which is simple to implement and can leverage many different open-source language models) is that it enables a simple mechanism to evolve semantically-rich text representations (with few domain-specific tweaks), and naturally benefits from current progress in language models. Experiments in this paper highlight the versatility of language-model crossover, through evolving binary bit-strings, sentences, equations, text-to-image prompts, and Python code. The conclusion is that language model crossover is a promising method for evolving genomes representable as text.

Submitted 13 May, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.06165 [pdf, ps, other]

Sparse Dimensionality Reduction Revisited

Authors: Mikael Møller Høgsgaard, Lion Kamma, Kasper Green Larsen, Jelani Nelson, Chris Schwiegelshohn

Abstract: The sparse Johnson-Lindenstrauss transform is one of the central techniques in dimensionality reduction. It supports embedding a set of $n$ points in $\mathbb{R}^d$ into $m=O(\varepsilon^{-2} \lg n)$ dimensions while preserving all pairwise distances to within $1 \pm \varepsilon$. Each input point $x$ is embedded to $Ax$, where $A$ is an $m \times d$ matrix having $s$ non-zeros per column, allowing for an embedding time of $O(s \|x\|_0)$. Since the sparsity of $A$ governs the embedding time, much work has gone into improving the sparsity $s$. The current state-of-the-art by Kane and Nelson (JACM'14) shows that $s = O(\varepsilon ^{-1} \lg n)$ suffices. This is almost matched by a lower bound of $s = Ω(\varepsilon ^{-1} \lg n/\lg(1/\varepsilon))$ by Nelson and Nguyen (STOC'13). Previous work thus suggests that we have near-optimal embeddings. In this work, we revisit sparse embeddings and identify a loophole in the lower bound. Concretely, it requires $d \geq n$, which in many applications is unrealistic. We exploit this loophole to give a sparser embedding when $d = o(n)$, achieving $s = O(\varepsilon^{-1}(\lg n/\lg(1/\varepsilon)+\lg^{2/3}n \lg^{1/3} d))$. We also complement our analysis by strengthening the lower bound of Nelson and Nguyen to hold also when $d \ll n$, thereby matching the first term in our new sparsity upper bound. Finally, we also improve the sparsity of the best oblivious subspace embeddings for optimal embedding dimensionality.

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2211.12063 [pdf, ps, other]

Generalized Private Selection and Testing with High Confidence

Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Uri Stemmer

Abstract: Composition theorems are general and powerful tools that facilitate privacy accounting across multiple data accesses from per-access privacy bounds. However they often result in weaker bounds compared with end-to-end analysis. Two popular tools that mitigate that are the exponential mechanism (or report noisy max) and the sparse vector technique. They were generalized in a couple of recent private selection/test frameworks, including the work by Liu and Talwar (STOC 2019), and Papernot and Steinke (ICLR 2022). In this work, we first present an alternative framework for private selection and testing with a simpler privacy proof and equally-good utility guarantee. Second, we observe that the private selection framework (both previous ones and ours) can be applied to improve the accuracy/confidence trade-off for many fundamental privacy-preserving data-analysis tasks, including query releasing, top-$k$ selection, and stable selection. Finally, for online settings, we apply the private testing to design a mechanism for adaptive query releasing, which improves the sample complexity dependence on the confidence parameter for the celebrated private multiplicative weights algorithm of Hardt and Rothblum (FOCS 2010).

Submitted 9 February, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: Appeared in ITCS 2023; This version: revised introduction and related works sections;

arXiv:2211.11718 [pdf, ps, other]

Private Counting of Distinct and k-Occurring Items in Time Windows

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Jelani Nelson

Abstract: In this work, we study the task of estimating the numbers of distinct and $k$-occurring items in a time window under the constraint of differential privacy (DP). We consider several variants depending on whether the queries are on general time windows (between times $t_1$ and $t_2$), or are restricted to being cumulative (between times $1$ and $t_2$), and depending on whether the DP neighboring relation is event-level or the more stringent item-level. We obtain nearly tight upper and lower bounds on the errors of DP algorithms for these problems. En route, we obtain an event-level DP algorithm for estimating, at each time step, the number of distinct items seen over the last $W$ updates with error polylogarithmic in $W$; this answers an open question of Bolot et al. (ICDT 2013).

Submitted 21 November, 2022; originally announced November 2022.

Comments: To appear in ITCS 2023

arXiv:2211.06387 [pdf, ps, other]

Õptimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization

Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Uri Stemmer

Abstract: The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of $O(ξ^{-1} \log(1/β))$ (for generalization error $ξ$ with confidence $1-β$). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of $\tilde{O}(\log^* |X|)$, which matches a lower bound by Alon et al. (that applies even with improper learning) and improves over a prior upper bound of $\tilde{O}((\log^* |X|)^{1.5})$ by Kaplan et al. We also provide matching upper and lower bounds of $\tilde{Θ}(2^{\log^*|X|})$ for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis which we believe will have further applications.

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2211.03917 [pdf, ps, other]

On the amortized complexity of approximate counting

Authors: Ishaq Aden-Ali, Yanjun Han, Jelani Nelson, Huacheng Yu

Abstract: Naively storing a counter up to value $n$ would require $Ω(\log n)$ bits of memory. Nelson and Yu [NY22], following work of [Morris78], showed that if the query answers need only be $(1+ε)$-approximate with probability at least $1 - δ$, then $O(\log\log n + \log\log(1/δ) + \log(1/ε))$ bits suffice, and in fact this bound is tight. Morris' original motivation for studying this problem though, as well as modern applications, require not only maintaining one counter, but rather $k$ counters for $k$ large. This motivates the following question: for $k$ large, can $k$ counters be simultaneously maintained using asymptotically less memory than $k$ times the cost of an individual counter? That is to say, does this problem benefit from an improved amortized space complexity bound? We answer this question in the negative. Specifically, we prove a lower bound for nearly the full range of parameters showing that, in terms of memory usage, there is no asymptotic benefit possible via amortization when storing multiple counters. Our main proof utilizes a certain notion of "information cost" recently introduced by Braverman, Garg and Woodruff in FOCS 2020 to prove lower bounds for streaming algorithms.

Submitted 7 November, 2022; originally announced November 2022.
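
As background for the approximate-counting problem, here is a minimal sketch of the classic Morris counter that the abstract cites as [Morris78]. It is the textbook base-2 version only; it is not the [NY22] algorithm and does not achieve the $(1+ε)$, $δ$ guarantee quoted above (a single counter of this form has high variance). The class and method names are illustrative.

```python
import random


class MorrisCounter:
    """Base-2 Morris counter: stores only an exponent x, roughly log2 of the count."""

    def __init__(self, rng=None):
        self.x = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Increase the exponent with probability 2^(-x).
        if self.rng.random() < 2.0 ** -self.x:
            self.x += 1

    def estimate(self):
        # 2^x - 1 is an unbiased estimate of the number of increments.
        return 2 ** self.x - 1


if __name__ == "__main__":
    counter = MorrisCounter(random.Random(0))
    for _ in range(1000):
        counter.increment()
    print(counter.x, counter.estimate())
```

Because only the exponent is stored, the state needs about log log n bits, which is the source of the $\log\log n$ term in the bound discussed in the abstract.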

arXiv:2210.03305 [pdf, other]

How Do Data Science Workers Communicate Intermediate Results?

Authors: Rock Yuren Pang, Ruotong Wang, Joely Nelson, Leilani Battle

Abstract: Data science workers increasingly collaborate on large-scale projects before communicating insights to a broader audience in the form of visualization. While prior work has modeled how data science teams, oftentimes with distinct roles and work processes, communicate knowledge to outside stakeholders, we have little knowledge of how data science workers communicate intermediately before delivering the final products. In this work, we contribute a nuanced description of the intermediate communication process within data science teams. By analyzing interview data with 8 self-identified data science workers, we characterized the data science intermediate communication process with four factors, including the types of audience, communication goals, shared artifacts, and mode of communication. We also identified overarching challenges in the current communication process. We also discussed design implications that might inform better tools that facilitate intermediate communication within data science teams.

Submitted 6 October, 2022; originally announced October 2022.

Comments: This paper was accepted for presentation as part of the eighth Symposium on Visualization in Data Science (VDS) at ACM KDD 2022 as well as IEEE VIS 2022. http://www.visualdatascience.org/2022/index.html

arXiv:2207.00956 [pdf, ps, other]

Tricking the Hashing Trick: A Tight Lower Bound on the Robustness of CountSketch to Adaptive Inputs

Authors: Edith Cohen, Jelani Nelson, Tamás Sarlós, Uri Stemmer

Abstract: CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $\ell_2$-heavy hitters (keys $i$ where $v_i^2 > ε\|\boldsymbol{v}\|_2^2$) and approximate inner products. When the inputs are not adaptive (do not depend on prior outputs), classic estimators applied to a sketch of size $O(\ell/ε)$ are accurate for a number of queries that is exponential in $\ell$. When inputs are adaptive, however, an adversarial input can be constructed after $O(\ell)$ queries with the classic estimator and the best known robust estimator only supports $\tilde{O}(\ell^2)$ queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after $O(\ell^2)$ queries produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive inputs (only the final adversarial input is chosen adaptively) and universally applies with any correct estimator, including one that is unknown to the attacker. In that, we expose inherent vulnerability of this fundamental method.

Submitted 28 August, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

arXiv:2205.09804 [pdf, ps, other]

Estimation of Entropy in Constant Space with Improved Sample Complexity

Authors: Maryam Aliakbarpour, Andrew McGregor, Jelani Nelson, Erik Waingarten

Abstract: Recent work of Acharya et al. (NeurIPS 2019) showed how to estimate the entropy of a distribution $\mathcal D$ over an alphabet of size $k$ up to $\pmε$ additive error by streaming over $(k/ε^3) \cdot \text{polylog}(1/ε)$ i.i.d. samples and using only $O(1)$ words of memory. In this work, we give a new constant memory scheme that reduces the sample complexity to $(k/ε^2)\cdot \text{polylog}(1/ε)$. We conjecture that this is optimal up to $\text{polylog}(1/ε)$ factors.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.07362 [pdf, ps, other]

What is an equivariant neural network?

Authors: Lek-Heng Lim, Bradley J. Nelson

Abstract: We explain equivariant neural networks, a notion underlying breakthroughs in machine learning from deep convolutional neural networks for computer vision to AlphaFold 2 for protein structure prediction, without assuming knowledge of equivariance or neural networks. The basic mathematical ideas are simple but are often obscured by engineering complications that come with practical realizations. We extract and focus on the mathematical aspects, and limit ourselves to a cursory treatment of the engineering issues at the end.

Submitted 16 November, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

Comments: 8 pages, 3 figures

ACM Class: I.2.6

arXiv:2205.01539 [pdf, other]

Parameterized Vietoris-Rips Filtrations via Covers

Authors: Bradley J. Nelson

Abstract: A challenge in computational topology is to deal with large filtered geometric complexes built from point cloud data such as Vietoris-Rips filtrations. This has led to the development of schemes for parallel computation and compression which restrict simplices to lie in open sets in a cover of the data. We extend the method of acyclic carriers to the setting of persistent homology to give detailed bounds on the relationship between Vietoris-Rips filtrations restricted to covers and the full construction. We show how these complexes can be used to study data over a base space and use our results to guide the selection of covers of data. We demonstrate these techniques on a variety of covers, and show the utility of this construction in investigating higher-order homology of a model of high-dimensional image patches.

Submitted 3 May, 2022; originally announced May 2022.

Comments: 18 pages, 6 figures

MSC Class: 55N31 (Primary); 68T09 (Secondary)

arXiv:2203.16476 [pdf, ps, other]

Differentially Private All-Pairs Shortest Path Distances: Improved Algorithms and Lower Bounds

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Jelani Nelson

Abstract: We study the problem of releasing the weights of all-pair shortest paths in a weighted undirected graph with differential privacy (DP). In this setting, the underlying graph is fixed and two graphs are neighbors if their edge weights differ by at most $1$ in the $\ell_1$-distance. We give an $ε$-DP algorithm with additive error $\tilde{O}(n^{2/3} / ε)$ and an $(ε, δ)$-DP algorithm with additive error $\tilde{O}(\sqrt{n} / ε)$ where $n$ denotes the number of vertices. This positively answers a question of Sealfon (PODS'16), who asked whether a $o(n)$-error algorithm exists. We also show that an additive error of $Ω(n^{1/6})$ is necessary for any sufficiently small $ε, δ> 0$. Finally, we consider a relaxed setting where a multiplicative approximation is allowed. We show that, with a multiplicative approximation factor $k$, the additive error can be reduced to $\tilde{O}\left(n^{1/2 + O(1/k)} / ε\right)$ in the $ε$-DP case and $\tilde{O}(n^{1/3 + O(1/k)} / ε)$ in the $(ε, δ)$-DP case, respectively.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.08906 [pdf, other]

ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications

Authors: Yifan Yuan, Jinghan Huang, Yan Sun, Tianchen Wang, Jacob Nelson, Dan R. K. Ports, Yipeng Wang, Ren Wang, Charlie Tai, Nam Sung Kim

Abstract: Responding to the "datacenter tax" and "killer microseconds" problems for datacenter applications, diverse solutions including Smart NIC-based ones have been proposed. Nonetheless, they often suffer from high overhead of communications over network and/or PCIe links. To tackle the limitations of the current solutions, this paper proposes ORCA, a holistic network and architecture co-design solution that leverages current RDMA and emerging cache-coherent off-chip interconnect technologies. Specifically, ORCA consists of four hardware and software components: (1) unified abstraction of inter- and intra-machine communications managed by one-sided RDMA write and cache-coherent memory write; (2) efficient notification of requests to accelerators assisted by cache coherence; (3) cache-coherent accelerator architecture directly processing requests received by NIC; and (4) adaptive device-to-host data transfer for modern server memory systems consisting of both DRAM and NVM exploiting state-of-the-art features in CPUs and PCIe. We prototype ORCA with a commercial system and evaluate three popular datacenter applications: in-memory key-value store, chain replication-based distributed transaction system, and deep learning recommendation model inference. The evaluation shows that ORCA provides 30.1~69.1% lower latency, up to 2.5x higher throughput, and 3x higher power efficiency than the current state-of-the-art solutions.

Submitted 17 October, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

Comments: This paper has been accepted by HPCA'23. This arxiv paper is not the final camera-ready version

arXiv:2203.01599 [pdf, ps, other]

Uniform Approximations for Randomized Hadamard Transforms with Applications

Authors: Yeshwanth Cherapanamjeri, Jelani Nelson

Abstract: Randomized Hadamard Transforms (RHTs) have emerged as a computationally efficient alternative to the use of dense unstructured random matrices across a range of domains in computer science and machine learning. For several applications such as dimensionality reduction and compressed sensing, the theoretical guarantees for methods based on RHTs are comparable to approaches using dense random matrices with i.i.d. entries. However, several such applications are in the low-dimensional regime where the number of rows sampled from the matrix is rather small. Prior arguments are not applicable to the high-dimensional regime often found in machine learning applications like kernel approximation. Given an ensemble of RHTs with Gaussian diagonals, $\{M^i\}_{i = 1}^m$, and any $1$-Lipschitz function, $f: \mathbb{R} \to \mathbb{R}$, we prove that the average of $f$ over the entries of $\{M^i v\}_{i = 1}^m$ converges to its expectation uniformly over $\| v \| \leq 1$ at a rate comparable to that obtained from using truly Gaussian matrices. We use our inequality to then derive improved guarantees for two applications in the high-dimensional regime: 1) kernel approximation and 2) distance estimation. For kernel approximation, we prove the first \emph{uniform} approximation guarantees for random features constructed through RHTs lending theoretical justification to their empirical success while for distance estimation, our convergence result implies data structures with improved runtime guarantees over previous work by the authors. We believe our general inequality is likely to find use in other applications.

Submitted 3 March, 2022; originally announced March 2022.

Comments: STOC 2022

arXiv:2203.00194 [pdf, other]

Private Frequency Estimation via Projective Geometry

Authors: Vitaly Feldman, Jelani Nelson, Huy Lê Nguyen, Kunal Talwar

Abstract: In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe size of $k$ and with $n$ users, our $\varepsilon$-LDP algorithm has communication cost $\lceil\log_2k\rceil$ bits in the private coin setting and $\varepsilon\log_2 e + O(1)$ in the public coin setting, and has computation cost $O(n + k\exp(\varepsilon) \log k)$ for the server to approximately reconstruct the frequency histogram, while achieving the state-of-the-art privacy-utility tradeoff. In many parameter settings used in practice this is a significant improvement over the $O(n+k^2)$ computation cost that is achieved by the recent PI-RAPPOR algorithm (Feldman and Talwar; 2021). Our empirical evaluation shows a speedup of over 50x over PI-RAPPOR while using approximately 75x less memory for practically relevant parameter settings. In addition, the running time of our algorithm is within an order of magnitude of HadamardResponse (Acharya, Sun, and Zhang; 2019) and RecursiveHadamardResponse (Chen, Kairouz, and Ozgur; 2020) which have significantly worse reconstruction error. The error of our algorithm essentially matches that of the communication- and time-inefficient but utility-optimal SubsetSelection (SS) algorithm (Ye and Barg; 2017). Our new algorithm is based on using Projective Planes over a finite field to define a small collection of sets that are close to being pairwise independent and a dynamic programming algorithm for approximate histogram reconstruction on the server side. We also give an extension of PGR, which we call HybridProjectiveGeometryResponse, that allows trading off computation time with utility smoothly.

Submitted 28 February, 2022; originally announced March 2022.

arXiv:2202.13736 [pdf, other]

On the Robustness of CountSketch to Adaptive Inputs

Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Moshe Shechner, Uri Stemmer

Abstract: CountSketch is a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The sketch supports recovering $\ell_2$-heavy hitters of a vector (entries with $v[i]^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$). We study the robustness of the sketch in adaptive settings where input vectors may depend on the output from prior inputs. Adaptive settings arise in processes with feedback or with adversarial attacks. We show that the classic estimator is not robust, and can be attacked with a number of queries of the order of the sketch size. We propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior work.

Submitted 28 February, 2022; originally announced February 2022.
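
Illustrative note (not part of the arXiv record): the abstract above refers to the classic CountSketch with a median estimator. The following is a minimal sketch of that textbook construction, assuming a small universe so that the bucket and sign assignments can be stored as explicit random tables rather than drawn from a limited-independence hash family. The class name `CountSketch` and its parameters `n`, `rows`, `width`, and `seed` are hypothetical names chosen for this example; the robust estimator proposed in the paper is not shown.

```python
import random
from statistics import median


class CountSketch:
    """Textbook CountSketch over a universe of n keys (illustrative only)."""

    def __init__(self, n, rows, width, seed=0):
        rng = random.Random(seed)
        self.rows, self.width = rows, width
        # For each row, every key gets a random bucket and a random sign.
        # Assumption: explicit tables stand in for pairwise-independent hashes.
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]

    def sketch(self, v):
        """Linear map from a length-n vector to a rows x width table."""
        table = [[0] * self.width for _ in range(self.rows)]
        for i, x in enumerate(v):
            if x:
                for r in range(self.rows):
                    table[r][self.bucket[r][i]] += self.sign[r][i] * x
        return table

    def estimate(self, table, i):
        """Classic estimator: median over rows of the signed bucket value."""
        return median(
            self.sign[r][i] * table[r][self.bucket[r][i]] for r in range(self.rows)
        )


if __name__ == "__main__":
    cs = CountSketch(n=50, rows=5, width=8, seed=1)
    v = [0] * 50
    v[7] = 9  # a single heavy key and nothing else
    print(cs.estimate(cs.sketch(v), 7))
```

Because the sketch is a linear map, the sketch of a sum of vectors equals the entrywise sum of their sketches. The adaptive setting studied in the paper concerns how the estimator behaves when later input vectors are chosen based on earlier estimates.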

arXiv:2201.13012 [pdf, other]

Topology-Preserving Dimensionality Reduction via Interleaving Optimization

Authors: Bradley J. Nelson, Yuan Luo

Abstract: Dimensionality reduction techniques are powerful tools for data preprocessing and visualization which typically come with few guarantees concerning the topological correctness of an embedding. The interleaving distance between the persistent homology of Vietoris-Rips filtrations can be used to identify a scale at which topological features such as clusters or holes in an embedding and original data set are in correspondence. We show how optimization seeking to minimize the interleaving distance can be incorporated into dimensionality reduction algorithms, and explicitly demonstrate its use in finding an optimal linear projection. We demonstrate the utility of this framework to data visualization.

Submitted 31 January, 2022; originally announced January 2022.

arXiv:2112.06095 [pdf, other]

Unlocking the Power of Inline Floating-Point Operations on Programmable Switches

Authors: Yifan Yuan, Omar Alama, Amedeo Sapio, Jiawei Fei, Jacob Nelson, Dan R. K. Ports, Marco Canini, Nam Sung Kim

Abstract: The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality, as well as providing a platform for acceleration of a broad range of application-level functionality. However, existing switch hardware was not designed with application acceleration in mind, and thus applications requiring operations or datatypes not used in traditional network protocols must resort to expensive workarounds. Applications involving floating point data, including distributed training for machine learning and distributed query processing, are key examples. In this paper, we propose FPISA, a floating point representation designed to work efficiently in programmable switches. We first implement FPISA on an Intel Tofino switch, but find that it has limitations that impact throughput and accuracy. We then propose hardware changes to address these limitations based on the open-source Banzai switch architecture, and synthesize them in a 15-nm standard-cell library to demonstrate their feasibility. Finally, we use FPISA to implement accelerators for training for machine learning and for query processing, and evaluate their performance on a switch implementing our changes using emulation. We find that FPISA allows distributed training to use 25-75% fewer CPU cores and provide up to 85.9% better throughput in a CPU-constrained environment than SwitchML. For distributed query processing with floating point data, FPISA enables up to 2.7x better throughput than Spark.

Submitted 11 December, 2021; originally announced December 2021.

Comments: This paper has been accepted by NSDI'22. This arxiv paper is not the final camera-ready version

arXiv:2111.10984 [pdf, other]

Topological Regularization for Dense Prediction

Authors: Deqing Fu, Bradley J. Nelson

Abstract: Dense prediction tasks such as depth perception and semantic segmentation are important applications in computer vision that have a concrete topological description in terms of partitioning an image into connected components or estimating a function with a small number of local extrema corresponding to objects in the image. We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with these topological descriptions. Experimental results show that the output topology can also appear in the internal activations of trained neural networks which allows for a novel use of topological regularization to the internal states of neural networks during training, reducing the computational cost of the regularization. We demonstrate that this topological regularization of internal activations leads to improved convergence and test benchmarks on several problems and architectures.

Submitted 24 October, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

arXiv:2111.04867 [pdf, other]

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

Authors: Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh

Abstract: Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in training large models. Thus, it is important to use efficient algorithms for collective communication. We develop TACCL, a tool that enables algorithm designers to guide a synthesizer into automatically generating algorithms for a given hardware configuration and communication collective. TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms. TACCL also uses a novel encoding of the problem that allows it to scale beyond single-node topologies. We use TACCL to synthesize algorithms for three collectives and two hardware topologies: DGX-2 and NDv2. We demonstrate that the algorithms synthesized by TACCL outperform the Nvidia Collective Communication Library (NCCL) by up to 6.7x. We also show that TACCL can speed up end-to-end training of Transformer-XL and BERT models by 11%--2.3x for different batch sizes.

Submitted 5 October, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: Accepted at NSDI'23. Contains 20 pages, 11 figures, including Appendix

arXiv:2110.08691 [pdf, other]

doi 10.46298/theoretics.24.6

Terminal Embeddings in Sublinear Time

Authors: Yeshwanth Cherapanamjeri, Jelani Nelson

Abstract: Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $ρ\ge 1$ if $ρ$ is the smallest value such that there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C ρd_X(x, q) . \end{equation*} When $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+ε$ is achievable via such a terminal embedding with $m = O(ε^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside of prior work is that evaluating their embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $Θ(n)$ constraints in $m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $O^* (n^{1-Θ(ε^2)} + d)$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.

Submitted 13 March, 2024; v1 submitted 16 October, 2021; originally announced October 2021.

Journal ref: TheoretiCS, Volume 3 (March 14, 2024) theoretics:9167

arXiv:2109.13120 [pdf]

An End-to-end Entangled Segmentation and Classification Convolutional Neural Network for Periodontitis Stage Grading from Periapical Radiographic Images

Authors: Tanjida Kabir, Chun-Teh Lee, Jiman Nelson, Sally Sheng, Hsiu-Wan Meng, Luyao Chen, Muhammad F Walji, Xiaoqian Jiang, Shayan Shams

Abstract: Periodontitis is a biofilm-related chronic inflammatory disease characterized by gingivitis and bone loss in the teeth area. Approximately 61 million adults over 30 suffer from periodontitis (42.2%), with 7.8% having severe periodontitis in the United States. The measurement of radiographic bone loss (RBL) is necessary to make a correct periodontal diagnosis, especially if the comprehensive and longitudinal periodontal mapping is unavailable. However, doctors can interpret X-rays differently depending on their experience and knowledge. Computerized diagnosis support for doctors sheds light on making the diagnosis with high accuracy and consistency and drawing up an appropriate treatment plan for preventing or controlling periodontitis. We developed an end-to-end deep learning network HYNETS (Hybrid NETwork for pEriodoNTiTiS STagES from radiograpH) by integrating segmentation and classification tasks for grading periodontitis from periapical radiographic images. HYNETS leverages a multi-task learning strategy by combining a set of segmentation networks and a classification network to provide an end-to-end interpretable solution and highly accurate and consistent results. HYNETS achieved the average dice coefficient of 0.96 and 0.94 for the bone area and tooth segmentation and the average AUC of 0.97 for periodontitis stage assignment. Additionally, conventional image processing techniques provide RBL measurements and build transparency and trust in the model's prediction. HYNETS will potentially transform clinical diagnosis from a manual, time-consuming, and error-prone task to an efficient and automated periodontitis stage assignment based on periapical radiographic images.

Submitted 27 September, 2021; originally announced September 2021.

Comments: 8 pages, 8 figures, 5 tables

arXiv:2109.12115 [pdf]

Use of the Deep Learning Approach to Measure Alveolar Bone Level

Authors: Chun-Teh Lee, Tanjida Kabir, Jiman Nelson, Sally Sheng, Hsiu-Wan Meng, Thomas E. Van Dyke, Muhammad F. Walji, Xiaoqian Jiang, Shayan Shams

Abstract: Aim: The goal was to use a Deep Convolutional Neural Network to measure the radiographic alveolar bone level to aid periodontal diagnosis. Material and methods: A Deep Learning (DL) model was developed by integrating three segmentation networks (bone area, tooth, cementoenamel junction) and image analysis to measure the radiographic bone level and assign radiographic bone loss (RBL) stages. The percentage of RBL was calculated to determine the stage of RBL for each tooth. A provisional periodontal diagnosis was assigned using the 2018 periodontitis classification. RBL percentage, staging, and presumptive diagnosis were compared to the measurements and diagnoses made by the independent examiners. Results: The average Dice Similarity Coefficient (DSC) for segmentation was over 0.91. There was no significant difference in RBL percentage measurements determined by DL and examiners (p=0.65). The Area Under the Receiver Operating Characteristics Curve of RBL stage assignment for stage I, II and III was 0.89, 0.90 and 0.90, respectively. The accuracy of the case diagnosis was 0.85. Conclusion: The proposed DL model provides reliable RBL measurements and image-based periodontal diagnosis using periapical radiographic images. However, this model has to be further optimized and validated by a larger number of images to facilitate its application.

Submitted 24 September, 2021; originally announced September 2021.

Comments: Word count: 3485; Number of figures: 4; tables: 2; references: 34

arXiv:2109.01690 [pdf, other]

doi 10.1103/PhysRevApplied.17.044046

High-quality Thermal Gibbs Sampling with Quantum Annealing Hardware

Authors: Jon Nelson, Marc Vuffray, Andrey Y. Lokhov, Tameem Albash, Carleton Coffrin

Abstract: Quantum Annealing (QA) was originally intended for accelerating the solution of combinatorial optimization tasks that have natural encodings as Ising models. However, recent experiments on QA hardware platforms have demonstrated that, in the operating regime corresponding to weak interactions, the QA hardware behaves like a noisy Gibbs sampler at a hardware-specific effective temperature. This work builds on those insights and identifies a class of small hardware-native Ising models that are robust to noise effects and proposes a procedure for executing these models on QA hardware to maximize Gibbs sampling performance. Experimental results indicate that the proposed protocol results in high-quality Gibbs samples from a hardware-specific effective temperature. Furthermore, we show that this effective temperature can be adjusted by modulating the annealing time and energy scale. The procedure proposed in this work provides an approach to using QA hardware for Ising model sampling presenting potential new opportunities for applications in machine learning and physics simulation.

Submitted 23 February, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Report number: LA-UR-21-28692

Journal ref: Phys. Rev. Applied 17, 044046 (2022)

arXiv:2108.05022 [pdf, other]

Accelerating Iterated Persistent Homology Computations with Warm Starts

Authors: Yuan Luo, Bradley J. Nelson

Abstract: Persistent homology is a topological feature used in a variety of applications such as generating features for data analysis and penalizing optimization problems. We develop an approach to accelerate persistent homology computations performed on many similar filtered topological spaces which is based on updating associated matrix factorizations. Our approach improves the update scheme of Cohen-Steiner, Edelsbrunner, and Morozov for permutations by additionally handling addition and deletion of cells in a filtered topological space and by processing changes in a single batch. We show that the complexity of our scheme scales with the number of elementary changes to the filtration which as a result is often less expensive than the full persistent homology computation. Finally, we perform computational experiments demonstrating practical speedups in several situations including feature generation and optimization guided by persistent homology.

Submitted 17 January, 2023; v1 submitted 11 August, 2021; originally announced August 2021.

Showing 1–50 of 114 results for author: Nelson, J