Search | arXiv e-print repository

Practical limitations on robustness and scalability of quantum Internet

Authors: Abhishek Sadhu, Meghana Ayyala Somayajula, Karol Horodecki, Siddhartha Das

Abstract: As quantum theory allows for information processing and computing tasks that otherwise are not possible with classical systems, there is a need and use of quantum Internet beyond existing network systems. At the same time, the realization of a desirably functional quantum Internet is hindered by fundamental and practical challenges such as high loss during transmission of quantum systems, decohere… ▽ More As quantum theory allows for information processing and computing tasks that otherwise are not possible with classical systems, there is a need and use of quantum Internet beyond existing network systems. At the same time, the realization of a desirably functional quantum Internet is hindered by fundamental and practical challenges such as high loss during transmission of quantum systems, decoherence due to interaction with the environment, fragility of quantum states, etc. We study the implications of these constraints by analyzing the limitations on the scaling and robustness of quantum Internet. Considering quantum networks, we present practical bottlenecks for secure communication, delegated computing, and resource distribution among end nodes. Motivated by the power of abstraction in graph theory (in association with quantum information theory), we consider graph-theoretic quantifiers to assess network robustness and provide critical values of communication lines for viable communication over quantum Internet. In particular, we begin by discussing limitations on usefulness of isotropic states as device-independent quantum key repeaters which otherwise could be useful for device-independent quantum key distribution. We consider some quantum networks of practical interest, ranging from satellite-based networks connecting far-off spatial locations to currently available quantum processor architectures within computers, and analyze their robustness to perform quantum information processing tasks. Some of these tasks form primitives for delegated quantum computing, e.g., entanglement distribution and quantum teleportation. For some examples of quantum networks, we present algorithms to perform different quantum network tasks of interest such as constructing the network structure, finding the shortest path between a pair of end nodes, and optimizing the flow of resources at a node. △ Less

Submitted 6 November, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: Happy about the successful soft landing of Chandrayaan-3 on the moon by ISRO. 37 pages, 34 figures. Revised version

arXiv:2110.00031 [pdf]

Variance of Twitter Embeddings and Temporal Trends of COVID-19 cases

Authors: Mayank Sethi, Ambika Sadhu, Khushbu Pahwa, Sargun Nagpal, Tavpritesh Sethi

Abstract: The severity of the coronavirus pandemic necessitates the need of effective administrative decisions. Over 4 lakh people in India succumbed to COVID-19, with over 3 crore confirmed cases, and still counting. The threat of a plausible third wave continues to haunt millions. In this ever changing dynamic of the virus, predictive modeling methods can serve as an integral tool. The pandemic has furthe… ▽ More The severity of the coronavirus pandemic necessitates the need of effective administrative decisions. Over 4 lakh people in India succumbed to COVID-19, with over 3 crore confirmed cases, and still counting. The threat of a plausible third wave continues to haunt millions. In this ever changing dynamic of the virus, predictive modeling methods can serve as an integral tool. The pandemic has further triggered an unprecedented usage of social media. This paper aims to propose a method for harnessing social media, specifically Twitter, to predict the upcoming scenarios related to COVID-19 cases. In this study, we seek to understand how the surges in COVID-19 related tweets can indicate rise in the cases. This prospective analysis can be utilised to aid administrators about timely resource allocation to lessen the severity of the damage. Using word embeddings to capture the semantic meaning of tweets, we identify Significant Dimensions (SDs).Our methodology predicts the rise in cases with a lead time of 15 days and 30 days with R2 scores of 0.80 and 0.62 respectively. Finally, we explain the thematic utility of the SDs. △ Less

Submitted 29 December, 2021; v1 submitted 30 September, 2021; originally announced October 2021.

arXiv:2108.11501 [pdf, other]

Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction

Authors: Zhaoheng Zheng, Arka Sadhu, Ram Nevatia

Abstract: We explore object detection with two attributes: color and material. The task aims to simultaneously detect objects and infer their color and material. A straight-forward approach is to add attribute heads at the very end of a usual object detection pipeline. However, we observe that the two goals are in conflict: Object detection should be attribute-independent and attributes be largely object-in… ▽ More We explore object detection with two attributes: color and material. The task aims to simultaneously detect objects and infer their color and material. A straight-forward approach is to add attribute heads at the very end of a usual object detection pipeline. However, we observe that the two goals are in conflict: Object detection should be attribute-independent and attributes be largely object-independent. Features computed by a standard detection network entangle the category and attribute features; we disentangle them by the use of a two-stream model where the category and attribute features are computed independently but the classification heads share Regions of Interest (RoIs). Compared with a traditional single-stream model, our model shows significant improvements over VG-20, a subset of Visual Genome, on both supervised and attribute transfer tasks. △ Less

Submitted 25 August, 2021; originally announced August 2021.

Comments: Camera-ready for ICIP 2021

arXiv:2104.03762 [pdf, other]

Video Question Answering with Phrases via Semantic Roles

Authors: Arka Sadhu, Kan Chen, Ram Nevatia

Abstract: Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases. These metrics limit the VidQA models' application scenario. In this work, we leverage semantic roles derived from video descriptions to mask out certain phrases, to introduce VidQAP which poses VidQA as a fill-in-the-phrase task. To enable evaluation of a… ▽ More Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases. These metrics limit the VidQA models' application scenario. In this work, we leverage semantic roles derived from video descriptions to mask out certain phrases, to introduce VidQAP which poses VidQA as a fill-in-the-phrase task. To enable evaluation of answer phrases, we compute the relative improvement of the predicted answer compared to an empty string. To reduce the influence of language bias in VidQA datasets, we retrieve a video having a different answer for the same question. To facilitate research, we construct ActivityNet-SRL-QA and Charades-SRL-QA and benchmark them by extending three vision-language models. We further perform extensive analysis and ablative studies to guide future work. △ Less

Submitted 8 April, 2021; originally announced April 2021.

Comments: NAACL21 Camera Ready including appendix

arXiv:2104.00990 [pdf, other]

Visual Semantic Role Labeling for Video Understanding

Authors: Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi

Abstract: We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill various roles relevant to that event. To study the challenging task of semantic role labeling in videos or VidSRL, we introduce the VidSitu benchm… ▽ More We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill various roles relevant to that event. To study the challenging task of semantic role labeling in videos or VidSRL, we introduce the VidSitu benchmark, a large-scale video understanding data source with $29K$ $10$-second movie clips richly annotated with a verb and semantic-roles every $2$ seconds. Entities are co-referenced across events within a movie clip and events are connected to each other via event-event relations. Clips in VidSitu are drawn from a large collection of movies (${\sim}3K$) and have been chosen to be both complex (${\sim}4.2$ unique verbs within a video) as well as diverse (${\sim}200$ verbs have more than $100$ annotations each). We provide a comprehensive analysis of the dataset in comparison to other publicly available video understanding benchmarks, several illustrative baselines and evaluate a range of standard video recognition models. Our code and dataset is available at vidsitu.org. △ Less

Submitted 2 April, 2021; originally announced April 2021.

Comments: CVPR21 camera-ready including appendix. Project Page at https://vidsitu.org/

arXiv:2011.02655 [pdf, other]

Utilizing Every Image Object for Semi-supervised Phrase Grounding

Authors: Haidong Zhu, Arka Sadhu, Zhaoheng Zheng, Ram Nevatia

Abstract: Phrase grounding models localize an object in the image given a referring expression. The annotated language queries available during training are limited, which also limits the variations of language combinations that a model can see during training. In this paper, we study the case applying objects without labeled queries for training the semi-supervised phrase grounding. We propose to use learn… ▽ More Phrase grounding models localize an object in the image given a referring expression. The annotated language queries available during training are limited, which also limits the variations of language combinations that a model can see during training. In this paper, we study the case applying objects without labeled queries for training the semi-supervised phrase grounding. We propose to use learned location and subject embedding predictors (LSEP) to generate the corresponding language embeddings for objects lacking annotated queries in the training set. With the assistance of the detector, we also apply LSEP to train a grounding model on images without any annotation. We evaluate our method based on MAttNet on three public datasets: RefCOCO, RefCOCO+, and RefCOCOg. We show that our predictors allow the grounding system to learn from the objects without labeled queries and improve accuracy by 34.9\% relatively with the detection results. △ Less

Submitted 4 November, 2020; originally announced November 2020.

arXiv:2006.15294 [pdf, other]

Gradient-based Editing of Memory Examples for Online Task-free Continual Learning

Authors: Xisen Jin, Arka Sadhu, Junyi Du, Xiang Ren

Abstract: We explore task-free continual learning (CL), in which a model is trained to avoid catastrophic forgetting in the absence of explicit task boundaries or identities. Among many efforts on task-free CL, a notable family of approaches are memory-based that store and replay a subset of training examples. However, the utility of stored seen examples may diminish over time since CL models are continuall… ▽ More We explore task-free continual learning (CL), in which a model is trained to avoid catastrophic forgetting in the absence of explicit task boundaries or identities. Among many efforts on task-free CL, a notable family of approaches are memory-based that store and replay a subset of training examples. However, the utility of stored seen examples may diminish over time since CL models are continually updated. Here, we propose Gradient based Memory EDiting (GMED), a framework for editing stored examples in continuous input space via gradient updates, in order to create more "challenging" examples for replay. GMED-edited examples remain similar to their unedited forms, but can yield increased loss in the upcoming model updates, thereby making the future replays more effective in overcoming catastrophic forgetting. By construction, GMED can be seamlessly applied in conjunction with other memory-based CL algorithms to bring further improvement. Experiments validate the effectiveness of GMED, and our best method significantly outperforms baselines and previous state-of-the-art on five out of six datasets. Code can be found at https://github.com/INK-USC/GMED. △ Less

Submitted 7 December, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

Comments: 10 pages. Accepted at NeurIPS 2021

arXiv:2005.00785 [pdf, other]

Visually Grounded Continual Learning of Compositional Phrases

Authors: Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, Xiang Ren

Abstract: Humans acquire language continually with much more limited access to data samples at a time, as compared to contemporary NLP systems. To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes. In the task, models are trained on a paired image-… ▽ More Humans acquire language continually with much more limited access to data samples at a time, as compared to contemporary NLP systems. To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes. In the task, models are trained on a paired image-caption stream which has shifting object distribution; while being constantly evaluated by a visually-grounded masked language prediction task on held-out test sets. VisCOLL compounds the challenges of continual learning (i.e., learning from continuously shifting data distribution) and compositional generalization (i.e., generalizing to novel compositions). To facilitate research on VisCOLL, we construct two datasets, COCO-shift and Flickr-shift, and benchmark them using different continual learning methods. Results reveal that SoTA continual learning approaches provide little to no improvements on VisCOLL, since storing examples of all possible compositions is infeasible. We conduct further ablations and analysis to guide future work. △ Less

Submitted 16 November, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

Comments: EMNLP 2020; Fixed typos

arXiv:2003.10606 [pdf, other]

Video Object Grounding using Semantic Roles in Language Description

Authors: Arka Sadhu, Kan Chen, Ram Nevatia

Abstract: We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions. Previous methods apply image grounding based algorithms to address VOG, fail to explore the object relation information and suffer from limited generalization. Here, we investigate the role of object relations in VOG and propose a novel framework VOGNet to encode multi… ▽ More We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions. Previous methods apply image grounding based algorithms to address VOG, fail to explore the object relation information and suffer from limited generalization. Here, we investigate the role of object relations in VOG and propose a novel framework VOGNet to encode multi-modal object relations via self-attention with relative position encoding. To evaluate VOGNet, we propose novel contrasting sampling methods to generate more challenging grounding input samples, and construct a new dataset called ActivityNet-SRL (ASRL) based on existing caption and grounding datasets. Experiments on ASRL validate the need of encoding object relations in VOG, and our VOGNet outperforms competitive baselines by a significant margin. △ Less

Submitted 23 March, 2020; originally announced March 2020.

Comments: CVPR20 camera-ready including appendix

arXiv:1908.07129 [pdf, other]

Zero-Shot Grounding of Objects from Natural Language Queries

Authors: Arka Sadhu, Kan Chen, Ram Nevatia

Abstract: A phrase grounding system localizes a particular object in an image referred to by a natural language query. In previous work, the phrases were restricted to have nouns that were encountered in training, we extend the task to Zero-Shot Grounding(ZSG) which can include novel, "unseen" nouns. Current phrase grounding systems use an explicit object detection network in a 2-stage framework where one s… ▽ More A phrase grounding system localizes a particular object in an image referred to by a natural language query. In previous work, the phrases were restricted to have nouns that were encountered in training, we extend the task to Zero-Shot Grounding(ZSG) which can include novel, "unseen" nouns. Current phrase grounding systems use an explicit object detection network in a 2-stage framework where one stage generates sparse proposals and the other stage evaluates them. In the ZSG setting, generating appropriate proposals itself becomes an obstacle as the proposal generator is trained on the entities common in the detection and grounding datasets. We propose a new single-stage model called ZSGNet which combines the detector network and the grounding system and predicts classification scores and regression parameters. Evaluation of ZSG system brings additional subtleties due to the influence of the relationship between the query and learned categories; we define four distinct conditions that incorporate different levels of difficulty. We also introduce new datasets, sub-sampled from Flickr30k Entities and Visual Genome, that enable evaluations for the four conditions. Our experiments show that ZSGNet achieves state-of-the-art performance on Flickr30k and ReferIt under the usual "seen" settings and performs significantly better than baseline in the zero-shot setting. △ Less

Submitted 19 August, 2019; originally announced August 2019.

Comments: ICCV19 oral, camera-ready version

arXiv:1810.03390 [pdf]

Constant Time Quantum search Algorithm Over A Datasets: An Experimental Study Using IBM Q Experience

Authors: Kunal Das, Arindam Sadhu

Abstract: In this work, a constant time Quantum searching algorithm over a datasets is proposed and subsequently the algorithm is executed in real chip quantum computer developed by IBM Quantum experience (IBMQ). QISKit, the software platform developed by IBM is used for this algorithm implementation. Quantum interference, Quantum superposition and π phase shift of quantum state applied for this constant ti… ▽ More In this work, a constant time Quantum searching algorithm over a datasets is proposed and subsequently the algorithm is executed in real chip quantum computer developed by IBM Quantum experience (IBMQ). QISKit, the software platform developed by IBM is used for this algorithm implementation. Quantum interference, Quantum superposition and π phase shift of quantum state applied for this constant time search algorithm. The proposed quantum algorithm is executed in QISKit SDK local backend 'local_qasm_simulator', real chip 'ibmq_16_melbourne' and 'ibmqx4' IBMQ. Result also suggest that real chip ibmq_16_melbourne is more quantum error or noise prone than ibmqx4. △ Less

Submitted 8 October, 2018; originally announced October 2018.

Comments: 9 pages

Showing 1–11 of 11 results for author: Sadhu, A