-
Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions
Authors:
Divyansh Agarwal,
Alexander R. Fabbri,
Philippe Laban,
Ben Risher,
Shafiq Joty,
Caiming Xiong,
Chien-Sheng Wu
Abstract:
Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.
Submitted 26 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Waves Around the Body
Authors:
Saif Mahmud,
Vineet Parikh,
Qikang Liang,
Ke Li,
Ruidong Zhang,
Ashwin Ajit,
Vipin Gunda,
Devansh Agarwal,
François Guimbretière,
Cheng Zhang
Abstract:
We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body with a time resolution of one second. It only needs a pair of miniature speakers and microphones mounted on each hinge of eyeglasses to emit ultrasonic waves to create an acoustic aura around the body. Based on the position and motion of various body parts, the acoustic signals are reflected with unique patterns captured by the microphone and analyzed by a customized self-supervised deep learning framework to infer the performed activities. ActSonic was deployed in a user study with 19 participants across 19 households to evaluate its efficacy. Without requiring any training data from a new user (leave-one-participant-out evaluation), ActSonic was able to detect 27 activities, achieving an average F1-score of 86.6% in fully unconstrained scenarios and 93.4% in prompted settings at participants' homes.
Submitted 8 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Ring-a-Pose: A Ring for Continuous Hand Pose Tracking
Authors:
Tianhong Catherine Yu,
Guilin Hu,
Ruidong Zhang,
Hyunchul Lim,
Saif Mahmud,
Chi-Jung Lee,
Ke Li,
Devansh Agarwal,
Shuyang Nie,
Jinseok Oh,
François Guimbretière,
Cheng Zhang
Abstract:
We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three user studies with a total of 30 participants, we evaluate Ring-a-Pose's performance on pose tracking and micro-finger gesture recognition. Without collecting any training data from a user, Ring-a-Pose tracks continuous hand poses with a joint error of 14.1mm. The joint error decreases to 10.3mm for fine-tuned user-dependent models. Ring-a-Pose recognizes 7-class micro-gestures with a 90.60% and 99.27% accuracy for user-independent and user-dependent models, respectively. Furthermore, the ring exhibits promising performance when worn on any finger. Ring-a-Pose enables the future of smart rings to track and recognize hand poses using relatively low-power acoustic sensing.
Submitted 19 April, 2024;
originally announced April 2024.
-
GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models
Authors:
Sai Sree Harsha,
Ambareesh Revanur,
Dhwanit Agarwal,
Shradha Agrawal
Abstract:
Video editing methods based on diffusion models that rely solely on a text prompt for the edit are hindered by the limited expressive power of text prompts. Thus, incorporating a reference target image as a visual guide becomes desirable for precise control over edit. Also, most existing methods struggle to accurately edit a video when the shape and size of the object in the target image differ from the source object. To address these challenges, we propose "GenVideo" for editing videos leveraging target-image aware T2I models. Our approach handles edits with target objects of varying shapes and sizes while maintaining the temporal consistency of the edit using our novel target and shape aware InvEdit masks. Further, we propose a novel target-image aware latent noise correction strategy during inference to improve the temporal consistency of the edits. Experimental analyses indicate that GenVideo can effectively handle edits with objects of varying shapes, where existing approaches fail.
Submitted 18 April, 2024;
originally announced April 2024.
-
Data-driven Discovery with Large Generative Models
Authors:
Bodhisattwa Prasad Majumder,
Harshit Surana,
Dhruv Agarwal,
Sanchaita Hazra,
Ashish Sabharwal,
Peter Clark
Abstract:
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery -- a paradigm encompassing the search and verification of hypotheses purely from a set of provided datasets, without the need for additional data collection or physical experiments. We first outline several desiderata for an ideal data-driven discovery system. Then, through DATAVOYAGER, a proof-of-concept utilizing GPT-4, we demonstrate how LGMs fulfill several of these desiderata -- a feat previously unattainable -- while also highlighting important limitations in the current system that open up opportunities for novel ML research. We contend that achieving accurate, reliable, and robust end-to-end discovery systems solely through the current capabilities of LGMs is challenging. We instead advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms, to foster data-driven scientific discoveries with efficiency and reproducibility.
Submitted 21 February, 2024;
originally announced February 2024.
-
EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband
Authors:
Chi-Jung Lee,
Ruidong Zhang,
Devansh Agarwal,
Tianhong Catherine Yu,
Vipin Gunda,
Oliver Lopez,
James Kim,
Sicheng Yin,
Boao Dong,
Ke Li,
Mose Sakashita,
Francois Guimbretiere,
Cheng Zhang
Abstract:
Our hands serve as a fundamental means of interaction with the world around us. Therefore, understanding hand poses and interaction context is critical for human-computer interaction. We present EchoWrist, a low-power wristband that continuously estimates 3D hand pose and recognizes hand-object interactions using active acoustic sensing. EchoWrist is equipped with two speakers emitting inaudible sound waves toward the hand. These sound waves interact with the hand and its surroundings through reflections and diffractions, carrying rich information about the hand's shape and the objects it interacts with. The information captured by the two microphones goes through a deep learning inference system that recovers hand poses and identifies various everyday hand activities. Results from the two 12-participant user studies show that EchoWrist is effective and efficient at tracking 3D hand poses and recognizing hand-object interactions. Operating at 57.9mW, EchoWrist is able to continuously reconstruct 20 3D hand joints with MJEDE of 4.81mm and recognize 12 naturalistic hand-object interactions with 97.6% accuracy.
Submitted 29 March, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
'One Style Does Not Regulate All': Moderation Practices in Public and Private WhatsApp Groups
Authors:
Farhana Shahid,
Dhruv Agarwal,
Aditya Vashistha
Abstract:
WhatsApp is the largest social media platform in the Global South and is a virulent force in global misinformation and political propaganda. Due to end-to-end encryption, WhatsApp can barely review any content, and this often pushes the responsibility of moderation towards group admins. Yet, little is known about how WhatsApp group admins manage their groups, what factors and values influence moderation decisions, and what challenges they face in moderating their groups. To fill this gap, we interviewed admins of 32 diverse groups and reviewed content from 30 public groups in India and Bangladesh. We observed notable differences in the formation, members' behavior, and moderation of public versus private groups, as well as in how WhatsApp admins operate compared to those on other platforms. We used Baumrind's typology of 'parenting styles' as a lens to explore moderation practices in WhatsApp groups and identified four moderation styles based on how responsive and controlling the admins were. We also discuss design recommendations to help them better manage problematic content in WhatsApp groups.
Submitted 15 January, 2024;
originally announced January 2024.
-
Active Foundational Models for Fault Diagnosis of Electrical Motors
Authors:
Sriram Anbalagan,
Sai Shashank GP,
Deepesh Agarwal,
Balasubramaniam Natarajan,
Babji Srinivasan
Abstract:
Fault detection and diagnosis of electrical motors are of utmost importance in ensuring the safe and reliable operation of several industrial systems. Detection and diagnosis of faults at the incipient stage allows corrective actions to be taken in order to reduce the severity of faults. The existing data-driven deep learning approaches for machine fault diagnosis rely extensively on huge amounts of labeled samples, where annotations are expensive and time-consuming. However, a major portion of unlabeled condition monitoring data is not exploited in the training process. To overcome this limitation, we propose a foundational model-based Active Learning framework that utilizes a smaller number of labeled samples, chosen to be the most informative, and harnesses a large amount of available unlabeled data by effectively combining Active Learning and Contrastive Self-Supervised Learning techniques. It consists of a transformer network-based backbone model trained using an advanced nearest-neighbor contrastive self-supervised learning method. This approach empowers the backbone to learn improved representations of samples derived from raw, unlabeled vibration data. Subsequently, the backbone can undergo fine-tuning to address a range of downstream tasks, both within the same machines and across different machines. The effectiveness of the proposed methodology has been assessed through the fine-tuning of the backbone for multiple target tasks using three distinct machine-bearing fault datasets. The experimental evaluation demonstrates superior performance compared to existing state-of-the-art fault diagnosis methods while using less labeled data.
Submitted 26 November, 2023;
originally announced November 2023.
-
Eye Disease Prediction using Ensemble Learning and Attention on OCT Scans
Authors:
Gauri Naik,
Nandini Narvekar,
Dimple Agarwal,
Nishita Nandanwar,
Himangi Pande
Abstract:
Eye diseases have posed significant challenges for decades, but advancements in technology have opened new avenues for their detection and treatment. Machine learning and deep learning algorithms have become instrumental in this domain, particularly when combined with Optical Coherence Tomography (OCT) imaging. We propose a novel method for efficient detection of eye diseases from OCT images. Our technique enables the classification of patients as disease-free (normal eyes) or affected by specific conditions such as Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), or Drusen. In this work, we introduce an end-to-end web application that utilizes machine learning and deep learning techniques for efficient eye disease prediction. The application allows patients to submit their raw OCT scanned images, which undergo segmentation using a trained custom UNet model. The segmented images are then fed into an ensemble model, comprising InceptionV3 and Xception networks, enhanced with a self-attention layer. This self-attention approach leverages the feature maps of individual models to achieve improved classification accuracy. The ensemble model's output is aggregated to predict and classify various eye diseases. Extensive experimentation and optimization have been conducted to ensure the application's efficiency and optimal performance. Our results demonstrate the effectiveness of the proposed approach in accurate eye disease prediction. The developed web application holds significant potential for early detection and timely intervention, thereby contributing to improved eye healthcare outcomes.
Submitted 26 November, 2023;
originally announced November 2023.
-
Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA
Authors:
Dhruv Agarwal,
Rajarshi Das,
Sopan Khosla,
Rashmi Gangadharaiah
Abstract:
We present BYOKG, a universal question-answering (QA) system that can operate on any knowledge graph (KG), requires no human-annotated training data, and can be ready to use within a day -- attributes that are out-of-scope for current KGQA systems. BYOKG draws inspiration from the remarkable ability of humans to comprehend information present in an unseen KG through exploration -- starting at random nodes, inspecting the labels of adjacent nodes and edges, and combining them with their prior world knowledge. In BYOKG, exploration leverages an LLM-backed symbolic agent that generates a diverse set of query-program exemplars, which are then used to ground a retrieval-augmented reasoning procedure to predict programs for arbitrary questions. BYOKG is effective over both small- and large-scale graphs, showing dramatic gains in QA accuracy over a zero-shot baseline of 27.89 and 58.02 F1 on GrailQA and MetaQA, respectively. On GrailQA, we further show that our unsupervised BYOKG outperforms a supervised in-context learning method, demonstrating the effectiveness of exploration. Lastly, we find that performance of BYOKG reliably improves with continued exploration as well as improvements in the base LLM, notably outperforming a state-of-the-art fine-tuned model by 7.08 F1 on a sub-sampled zero-shot split of GrailQA.
Submitted 13 November, 2023;
originally announced November 2023.
-
Art or Artifice? Large Language Models and the False Promise of Creativity
Authors:
Tuhin Chakrabarty,
Philippe Laban,
Divyansh Agarwal,
Smaranda Muresan,
Chien-Sheng Wu
Abstract:
Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities from blogs to stories. However, objectively evaluating the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product. TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration. We recruit 10 creative writers and implement a human assessment of 48 stories written either by professional authors or LLMs using TTCW. Our analysis shows that LLM-generated stories pass 3-10X fewer TTCW tests than stories written by professionals. In addition, we explore the use of LLMs as assessors to automate the TTCW evaluation, revealing that none of the LLMs positively correlate with the expert assessments.
Submitted 8 March, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Foundational Models for Fault Diagnosis of Electrical Motors
Authors:
Sriram Anbalagan,
Deepesh Agarwal,
Balasubramaniam Natarajan,
Babji Srinivasan
Abstract:
A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing studies for fault diagnosis, as they rely on fully labelled training data spanning all operating conditions and assume a consistent distribution. This is because obtaining a large number of labelled samples for several machines across different fault cases and operating scenarios may be unfeasible. In order to overcome the aforementioned limitations, this work proposes a framework to develop a foundational model for fault diagnosis of electrical motors. It involves building a neural network-based backbone to learn high-level features using self-supervised learning, and then fine-tuning the backbone to achieve specific objectives. The primary advantage of such an approach is that the backbone can be fine-tuned to achieve a wide variety of target tasks using far less training data than traditional supervised learning methodologies. The empirical evaluation demonstrates the effectiveness of the proposed approach by obtaining more than 90% classification accuracy by fine-tuning the backbone not only across different types of fault scenarios or operating conditions, but also across different machines. This illustrates the promising potential of the proposed approach for cross-machine fault diagnosis tasks in real-world applications.
Submitted 31 July, 2023;
originally announced July 2023.
-
SPLAL: Similarity-based pseudo-labeling with alignment loss for semi-supervised medical image classification
Authors:
Md Junaid Mahmood,
Pranaw Raj,
Divyansh Agarwal,
Suruchi Kumari,
Pravendra Singh
Abstract:
Medical image classification is a challenging task due to the scarcity of labeled samples and class imbalance caused by the high variance in disease prevalence. Semi-supervised learning (SSL) methods can mitigate these challenges by leveraging both labeled and unlabeled data. However, SSL methods for medical image classification need to address two key challenges: (1) estimating reliable pseudo-labels for the images in the unlabeled dataset and (2) reducing biases caused by class imbalance. In this paper, we propose a novel SSL approach, SPLAL, that effectively addresses these challenges. SPLAL leverages class prototypes and a weighted combination of classifiers to predict reliable pseudo-labels over a subset of unlabeled images. Additionally, we introduce alignment loss to mitigate model biases toward majority classes. To evaluate the performance of our proposed approach, we conduct experiments on two publicly available medical image classification benchmark datasets: the skin lesion classification (ISIC 2018) and the blood cell classification dataset (BCCD). The experimental results empirically demonstrate that our approach outperforms several state-of-the-art SSL methods over various evaluation metrics. Specifically, our proposed approach achieves a significant improvement over the state-of-the-art approach on the ISIC 2018 dataset in both Accuracy and F1 score, with relative margins of 2.24% and 11.40%, respectively. Finally, we conduct extensive ablation experiments to examine the contribution of different components of our approach, validating its effectiveness.
Submitted 10 July, 2023;
originally announced July 2023.
-
Machine Reading Comprehension using Case-based Reasoning
Authors:
Dung Thai,
Dhruv Agarwal,
Mudit Chaudhary,
Wenlong Zhao,
Rajarshi Das,
Manzil Zaheer,
Jay-Yoon Lee,
Hannaneh Hajishirzi,
Andrew McCallum
Abstract:
We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC) builds upon the hypothesis that contextualized answers to similar questions share semantic similarities with each other. Given a test question, CBR-MRC first retrieves a set of similar cases from a nonparametric memory and then predicts an answer by selecting the span in the test context that is most similar to the contextualized representations of answers in the retrieved cases. The semi-parametric nature of our approach allows it to attribute a prediction to the specific set of evidence cases, making it a desirable choice for building reliable and debuggable QA systems. We show that CBR-MRC provides high accuracy comparable with large reader models and outperforms baselines by 11.5 and 8.4 EM on NaturalQuestions and NewsQA, respectively. Further, we demonstrate the ability of CBR-MRC in identifying not just the correct answer tokens but also the span with the most relevant supporting evidence. Lastly, we observe that contexts for certain question types show higher lexical diversity than others and find that CBR-MRC is robust to these variations while performance using fully-parametric methods drops.
Submitted 5 December, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
Authors:
Philippe Laban,
Wojciech Kryściński,
Divyansh Agarwal,
Alexander R. Fabbri,
Caiming Xiong,
Shafiq Joty,
Chien-Sheng Wu
Abstract:
With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency detection compared to traditional non-LLM methods. However, a closer analysis reveals that most LLMs fail on more complex formulations of the task and exposes issues with existing evaluation benchmarks, affecting evaluation precision. To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits. This new benchmark is 20 times more cost-effective per sample than previous benchmarks and highly reproducible, as we estimate inter-annotator agreement at about 0.9. Most LLMs struggle on SummEdits, with performance close to random chance. The best-performing model, GPT-4, is still 8% below estimated human performance, highlighting the gaps in LLMs' ability to reason about facts and detect inconsistencies when they occur.
Submitted 23 May, 2023;
originally announced May 2023.
-
CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
Authors:
Ambareesh Revanur,
Debraj Basu,
Shradha Agrawal,
Dhwanit Agarwal,
Deepak Pai
Abstract:
Edit fidelity is a significant issue in open-world controllable generative image editing. Recently, CLIP-based approaches have traded off simplicity to alleviate these problems by introducing spatial attention in a handpicked layer of a StyleGAN. In this paper, we propose CoralStyleCLIP, which incorporates a multi-layer attention-guided blending strategy in the feature space of StyleGAN2 for obtaining high-fidelity edits. We propose multiple forms of our co-optimized region and layer selection strategy to demonstrate the variation of time complexity with the quality of edits over different architectural intricacies while preserving simplicity. We conduct extensive experimental analysis and benchmark our method against state-of-the-art CLIP-based methods. Our findings suggest that CoralStyleCLIP results in high-quality edits while preserving the ease of use.
Submitted 8 March, 2023;
originally announced March 2023.
-
AugTriever: Unsupervised Dense Retrieval by Scalable Data Augmentation
Authors:
Rui Meng,
Ye Liu,
Semih Yavuz,
Divyansh Agarwal,
Lifu Tu,
Ning Yu,
Jianguo Zhang,
Meghana Bhat,
Yingbo Zhou
Abstract:
Dense retrievers have made significant strides in text retrieval and open-domain question answering, even though most achievements were made possible only with large amounts of human supervision. In this work, we aim to develop unsupervised methods by proposing two methods that create pseudo query-document pairs and train dense retrieval models in an annotation-free and scalable manner: query extraction and transferred query generation. The former method produces pseudo queries by selecting salient spans from the original document. The latter utilizes generation models trained for other NLP tasks (e.g., summarization) to produce pseudo queries. Extensive experiments show that models trained with the proposed augmentation methods can perform comparably well (or better) to multiple strong baselines. Combining those strategies leads to further improvements, achieving the state-of-the-art performance of unsupervised dense retrieval on both BEIR and ODQA datasets.
Submitted 7 March, 2023; v1 submitted 17 December, 2022;
originally announced December 2022.
-
CREATIVESUMM: Shared Task on Automatic Summarization for Creative Writing
Authors:
Divyansh Agarwal,
Alexander R. Fabbri,
Simeng Han,
Wojciech Kryściński,
Faisal Ladhak,
Bryan Li,
Kathleen McKeown,
Dragomir Radev,
Tianyi Zhang,
Sam Wiseman
Abstract:
This paper introduces the shared task of summarizing documents in several creative domains, namely literary texts, movie scripts, and television scripts. Summarizing these creative documents requires making complex literary interpretations, as well as understanding non-trivial temporal dependencies in texts containing varied styles of plot development and narrative structure. This poses unique challenges and is yet underexplored for text summarization systems. In this shared task, we introduce four sub-tasks and their corresponding datasets, focusing on summarizing books, movie scripts, primetime television scripts, and daytime soap opera scripts. We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions. As part of the CREATIVESUMM workshop at COLING 2022, the shared task attracted 18 submissions in total. We discuss the submissions and the baselines for each sub-task in this paper, along with directions for facilitating future work in the field.
Submitted 6 December, 2022; v1 submitted 10 November, 2022;
originally announced November 2022.
-
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Authors:
Zifeng Wang,
Zhenbang Wu,
Dinesh Agarwal,
Jimeng Sun
Abstract:
Existing vision-text contrastive learning like CLIP aims to match the paired image and caption embeddings while pushing others apart, which improves representation transferability and supports zero-shot prediction. However, medical image-text datasets are orders of magnitude below the general images and captions from the internet. Moreover, previous methods encounter many false negatives, i.e., images and reports from separate patients probably carry the same semantics but are wrongly treated as negatives. In this paper, we decouple images and texts for multimodal contrastive learning thus scaling the usable training data in a combinatorial magnitude with low cost. We also propose to replace the InfoNCE loss with semantic matching loss based on medical knowledge to eliminate false negatives in contrastive learning. We prove that MedCLIP is a simple yet effective framework: it outperforms state-of-the-art methods on zero-shot prediction, supervised classification, and image-text retrieval. Surprisingly, we observe that with only 20K pre-training data, MedCLIP wins over the state-of-the-art method (using around 200K data). Our code is available at https://github.com/RyanWangZf/MedCLIP.
Submitted 18 October, 2022;
originally announced October 2022.
-
Challenges and Opportunities in Deep Reinforcement Learning with Graph Neural Networks: A Comprehensive review of Algorithms and Applications
Authors:
Sai Munikoti,
Deepesh Agarwal,
Laya Das,
Mahantesh Halappanavar,
Balasubramaniam Natarajan
Abstract:
Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommendation-systems, and gaming. Similarly, graph neural networks (GNN) have also demonstrated their superior performance in supervised learning for graph-structured data. In recent times, the fusion of GNN with DRL for graph-structured environments has attracted a lot of attention. This paper provides a comprehensive review of these hybrid works. These works can be classified into two categories: (1) algorithmic enhancement, where DRL and GNN complement each other for better utility; (2) application-specific enhancement, where DRL and GNN support each other. This fusion effectively addresses various complex problems in engineering and life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of increasing generalizability and reducing computational complexity. Finally, the key challenges in integrating DRL and GNN, and potential future research directions are highlighted, which will be of interest to the broader machine learning community.
Submitted 7 November, 2022; v1 submitted 16 June, 2022;
originally announced June 2022.
-
A General Framework for quantifying Aleatoric and Epistemic uncertainty in Graph Neural Networks
Authors:
Sai Munikoti,
Deepesh Agarwal,
Laya Das,
Balasubramaniam Natarajan
Abstract:
Graph Neural Networks (GNN) provide a powerful framework that elegantly integrates Graph theory with Machine learning for modeling and analysis of networked data. We consider the problem of quantifying the uncertainty in predictions of GNN stemming from modeling errors and measurement uncertainty. We consider aleatoric uncertainty in the form of probabilistic links and noise in the feature vectors of nodes, while epistemic uncertainty is incorporated via a probability distribution over the model parameters. We propose a unified approach to treat both sources of uncertainty in a Bayesian framework, where Assumed Density Filtering is used to quantify aleatoric uncertainty and Monte Carlo dropout captures uncertainty in model parameters. Finally, the two sources of uncertainty are aggregated to estimate the total uncertainty in predictions of a GNN. Results on real-world datasets demonstrate that the Bayesian model performs on par with a frequentist model and provides additional information about prediction uncertainty that is sensitive to uncertainties in the data and model.
Submitted 20 May, 2022;
originally announced May 2022.
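As a rough illustration of the Monte Carlo dropout idea mentioned in this abstract, the sketch below keeps dropout active at prediction time and treats the spread of repeated forward passes as an estimate of parameter (epistemic) uncertainty. It is not the authors' method: the model is a tiny hypothetical two-layer perceptron rather than a GNN, and the names `forward` and `mc_dropout_predict`, the weights, and the dropout rate are all assumptions made for the example.

```python
import random
import statistics


def forward(x, w1, w2, drop_prob, rng):
    """One stochastic forward pass of a toy 2-layer network with dropout."""
    hidden = []
    for col in range(len(w1[0])):
        # ReLU activation of the hidden unit
        h = max(0.0, sum(x[i] * w1[i][col] for i in range(len(x))))
        # Inverted dropout: drop the unit or rescale it to keep the expectation
        keep = rng.random() >= drop_prob
        hidden.append(h / (1.0 - drop_prob) if keep else 0.0)
    return sum(h * w for h, w in zip(hidden, w2))


def mc_dropout_predict(x, w1, w2, drop_prob=0.5, n_samples=200, seed=0):
    """Return (mean, variance) over repeated dropout-enabled passes."""
    rng = random.Random(seed)
    samples = [forward(x, w1, w2, drop_prob, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


# Hypothetical weights and input, chosen only for illustration
W1 = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.2]]
W2 = [1.0, -1.0, 0.5]
mean, var = mc_dropout_predict([1.0, 2.0], W1, W2)
print(mean, var)
```

The variance returned here reflects only the dropout-induced spread; the aleatoric part described in the abstract (Assumed Density Filtering) is not modeled in this sketch.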
-
Masked Image Modeling Advances 3D Medical Image Analysis
Authors:
Zekai Chen,
Devansh Agarwal,
Kshitij Aggarwal,
Wiem Safta,
Samit Hirawat,
Venkat Sethuraman,
Mariann Micsinai Balan,
Kevin Brown
Abstract:
Recently, masked image modeling (MIM) has gained considerable attention due to its capacity to learn from vast amounts of unlabeled data and has been demonstrated to be effective on a wide variety of vision tasks involving natural images. Meanwhile, the potential of self-supervised learning in modeling 3D medical images is anticipated to be immense due to the high quantities of unlabeled images, and the expense and difficulty of quality labels. However, MIM's applicability to medical images remains uncertain. In this paper, we demonstrate that masked image modeling approaches can also advance 3D medical image analysis in addition to natural images. We study how masked image modeling strategies leverage performance from the viewpoints of 3D medical image segmentation as a representative downstream task: i) when compared to naive contrastive learning, masked image modeling approaches accelerate the convergence of supervised training even faster (1.40×) and ultimately produce a higher dice score; ii) predicting raw voxel values with a high masking ratio and a relatively smaller patch size is a non-trivial self-supervised pretext task for medical image modeling; iii) a lightweight decoder or projection head design for reconstruction is powerful for masked image modeling on 3D medical images, which speeds up training and reduces cost; iv) finally, we also investigate the effectiveness of MIM methods under different practical scenarios where different image resolutions and labeled data ratios are applied.
Submitted 23 August, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Detecting, Tracking and Counting Motorcycle Rider Traffic Violations on Unconstrained Roads
Authors:
Aman Goyal,
Dev Agarwal,
Anbumani Subramanian,
C. V. Jawahar,
Ravi Kiran Sarvadevabhatla,
Rohit Saluja
Abstract:
In many Asian countries with unconstrained road traffic conditions, driving violations such as not wearing helmets and triple-riding are a significant source of fatalities involving motorcycles. Identifying and penalizing such riders is vital in curbing road accidents and improving citizens' safety. With this motivation, we propose an approach for detecting, tracking, and counting motorcycle riding violations in videos taken from a vehicle-mounted dashboard camera. We employ a curriculum learning-based object detector to better tackle challenging scenarios such as occlusions. We introduce a novel trapezium-shaped object boundary representation to increase robustness and tackle the rider-motorcycle association. We also introduce an amodal regressor that generates bounding boxes for the occluded riders. Experimental results on a large-scale unconstrained driving dataset demonstrate the superiority of our approach compared to existing approaches and other ablative variants.
Submitted 18 April, 2022;
originally announced April 2022.
-
Long-Term Missing Value Imputation for Time Series Data Using Deep Neural Networks
Authors:
Jangho Park,
Juliane Muller,
Bhavna Arora,
Boris Faybishenko,
Gilberto Pastorello,
Charuleka Varadharajan,
Reetik Sahu,
Deborah Agarwal
Abstract:
We present an approach that uses a deep learning model, in particular, a MultiLayer Perceptron (MLP), for estimating the missing values of a variable in multivariate time series data. We focus on filling a long continuous gap (e.g., multiple months of missing daily observations) rather than on individual randomly missing observations. Our proposed gap filling algorithm uses an automated method for determining the optimal MLP model architecture, thus allowing for optimal prediction performance for the given time series. We tested our approach by filling gaps of various lengths (three months to three years) in three environmental datasets with different time series characteristics, namely daily groundwater levels, daily soil moisture, and hourly Net Ecosystem Exchange. We compared the accuracy of the gap-filled values obtained with our approach to the widely-used R-based time series gap filling methods ImputeTS and mtsdi. The results indicate that using an MLP for filling a large gap leads to better results, especially when the data behave nonlinearly. Thus, our approach enables the use of datasets that have a large gap in one variable, which is common in many long-term environmental monitoring observations.
Submitted 24 February, 2022;
originally announced February 2022.
-
Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding
Authors:
Tanay Agrawal,
Dhruv Agarwal,
Michal Balazia,
Neelabh Sinha,
Francois Bremond
Abstract:
Personality computing and affective computing have gained recent interest in many research areas. The datasets for the task generally have multiple modalities like video, audio, language and bio-signals. In this paper, we propose a flexible model for the task which exploits all available data. The task involves complex relations and to avoid using a large model for video processing specifically, we propose the use of behaviour encoding which boosts performance with minimal change to the model. Cross-attention using transformers has become popular in recent times and is utilised for fusion of different modalities. Since long term relations may exist, breaking the input into chunks is not desirable, thus the proposed model processes the entire input together. Our experiments show the importance of each of the above contributions.
Submitted 12 January, 2023; v1 submitted 22 December, 2021;
originally announced December 2021.
-
From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation
Authors:
Dhruv Agarwal,
Tanay Agrawal,
Laura M. Ferrari,
François Bremond
Abstract:
Multimodal Deep Learning has garnered much interest, and transformers have triggered novel approaches, thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resource demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, we reduced the number of parameters by 2.5 times and the inference time by 22%. Such performance-computation tradeoff can be exploited in many applications and we aim at opening a new research area where the deployment of complex models with limited resources is demanded.
Submitted 19 October, 2021; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Addressing practical challenges in Active Learning via a hybrid query strategy
Authors:
Deepesh Agarwal,
Pravesh Srivastava,
Sergio Martin-del-Campo,
Balasubramaniam Natarajan,
Babji Srinivasan
Abstract:
Active Learning (AL) is a powerful tool to address modern machine learning problems with significantly fewer labeled training instances. However, implementation of traditional AL methodologies in practical scenarios is accompanied by multiple challenges due to the inherent assumptions. There are several hindrances, such as unavailability of labels for the AL algorithm at the beginning; unreliable external source of labels during the querying process; or incompatible mechanisms to evaluate the performance of Active Learner. Inspired by these practical challenges, we present a hybrid query strategy-based AL framework that addresses three practical challenges simultaneously: cold-start, oracle uncertainty and performance evaluation of Active Learner in the absence of ground truth. While a pre-clustering approach is employed to address the cold-start problem, the uncertainty surrounding the expertise of labeler and confidence in the given labels is incorporated to handle oracle uncertainty. The heuristics obtained during the querying process serve as the fundamental premise for assessing the performance of Active Learner. The robustness of the proposed AL framework is evaluated across three different environments and industrial settings. The results demonstrate the capability of the proposed framework to tackle practical challenges during AL implementation in real-world scenarios.
Submitted 7 October, 2021;
originally announced October 2021.
-
Entity Linking and Discovery via Arborescence-based Supervised Clustering
Authors:
Dhruv Agarwal,
Rico Angell,
Nicholas Monath,
Andrew McCallum
Abstract:
Previous work has shown promising results in performing entity linking by measuring not only the affinities between mentions and entities but also those amongst mentions. In this paper, we present novel training and inference procedures that fully utilize mention-to-mention affinities by building minimum arborescences (i.e., directed spanning trees) over mentions and entities across documents in order to make linking decisions. We also show that this method gracefully extends to entity discovery, enabling the clustering of mentions that do not have an associated entity in the knowledge base. We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset, and show significant improvements in performance for both entity linking and discovery compared to identically parameterized models. We further show significant efficiency improvements with only a small loss in accuracy over previous work, which use more computationally expensive models.
Submitted 10 May, 2022; v1 submitted 2 September, 2021;
originally announced September 2021.
-
On-Device Content Moderation
Authors:
Anchal Pandey,
Sukumar Moharana,
Debi Prasanna Mohanty,
Archit Panwar,
Dewang Agarwal,
Siva Prasad Thota
Abstract:
With the advent of the internet, not safe for work (NSFW) content moderation is a major problem today. Since smartphones are now part of the daily life of billions of people, it becomes even more important to have a solution which could detect and alert users about potential NSFW content present on their phones. In this paper we present a novel on-device solution for detecting NSFW images. In addition to conventional pornographic content moderation, we have also included semi-nude content moderation, as it is still NSFW in a large demography. We have curated a dataset comprising three major categories, namely nude, semi-nude and safe images. We have created an ensemble of object detector and classifier for filtering of nude and semi-nude contents. The solution provides unsafe body part annotations along with identification of semi-nude images. We extensively tested our proposed solution on several public datasets and also on our custom dataset. The model achieves an F1 score of 0.91 with 95% precision and 88% recall on our custom NSFW16k dataset and 0.92 MAP on the NPDI dataset. Moreover, it achieves an average 0.002 false positive rate on a collection of safe image open datasets.
Submitted 25 July, 2021;
originally announced July 2021.
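The F1 score quoted in this abstract can be sanity-checked from the reported precision and recall, since F1 is their harmonic mean. The short sketch below does only that arithmetic; the helper name `f1_score` is chosen for this example and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: 95% precision, 88% recall
f1 = f1_score(0.95, 0.88)
print(round(f1, 2))  # 0.91
```

The result, about 0.914, agrees with the 0.91 stated in the abstract.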
-
BookSum: A Collection of Datasets for Long-form Narrative Summarization
Authors:
Wojciech Kryściński,
Nazneen Rajani,
Divyansh Agarwal,
Caiming Xiong,
Dragomir Radev
Abstract:
The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays and stories, and includes highly abstractive, human written summaries on three levels of granularity of increasing difficulty: paragraph-, chapter-, and book-level. The domain and structure of our dataset poses a unique set of challenges for summarization systems, which include: processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.
Submitted 6 December, 2022; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Accurate and Scalable Matching of Translators to Displaced Persons for Overcoming Language Barriers
Authors:
Divyansh Agarwal,
Yuta Baba,
Pratik Sachdeva,
Tanya Tandon,
Thomas Vetterli,
Aziz Alghunaim
Abstract:
Residents of developing countries are disproportionately susceptible to displacement as a result of humanitarian crises. During such crises, language barriers impede aid workers in providing services to those displaced. To build resilience, such services must be flexible and robust to a host of possible languages. Tarjimly aims to overcome the barriers by providing a platform capable of matching bilingual volunteers to displaced persons or aid workers in need of translating. However, Tarjimly's large pool of translators comes with the challenge of selecting the right translator per request. In this paper, we describe a machine learning system that matches translator requests to volunteers at scale. We demonstrate that a simple logistic regression, operating on easily computable features, can accurately predict and rank translator response. In deployment, this lightweight system matches 82% of requests with a median response time of 59 seconds, allowing aid workers to accelerate their services supporting displaced persons.
Submitted 30 November, 2020;
originally announced December 2020.
-
Solving Physics Puzzles by Reasoning about Paths
Authors:
Augustin Harter,
Andrew Melnik,
Gaurav Kumar,
Dhruv Agarwal,
Animesh Garg,
Helge Ritter
Abstract:
We propose a new deep learning model for goal-driven tasks that require intuitive physical reasoning and intervention in the scene to achieve a desired end goal. Its modular structure is motivated by hypothesizing a sequence of intuitive steps that humans apply when trying to solve such a task. The model first predicts the path the target object would follow without intervention and the path the target object should follow in order to solve the task. Next, it predicts the desired path of the action object and generates the placement of the action object. All components of the model are trained jointly in a supervised way; each component receives its own learning signal but learning signals are also backpropagated through the entire architecture. To evaluate the model we use PHYRE - a benchmark test for goal-driven physical reasoning in 2D mechanics puzzles.
Submitted 14 November, 2020;
originally announced November 2020.
-
DeText: A Deep Text Ranking Framework with BERT
Authors:
Weiwei Guo,
Xiaowei Liu,
Sida Wang,
Huiji Gao,
Ananth Sankar,
Zimeng Yang,
Qi Guo,
Liang Zhang,
Bo Long,
Bee-Chung Chen,
Deepak Agarwal
Abstract:
Ranking is the most important component in a search system. Most search systems deal with large amounts of natural language data, hence an effective ranking system requires a deep understanding of text semantics. Recently, deep learning based natural language processing (deep NLP) models have generated promising results on ranking systems. BERT is one of the most successful models that learn contextual embedding, which has been applied to capture complex query-document relations for search ranking. However, this is generally done by exhaustively interacting each query word with each document word, which is inefficient for online serving in search product systems. In this paper, we investigate how to build an efficient BERT-based ranking model for industry use cases. The solution is further extended to a general ranking framework, DeText, that is open sourced and can be applied to various ranking productions. Offline and online experiments of DeText on three real-world search systems present significant improvement over state-of-the-art approaches.
Submitted 6 August, 2020;
originally announced August 2020.
-
On Adversarial Robustness: A Neural Architecture Search perspective
Authors:
Chaitanya Devaguptapu,
Devansh Agarwal,
Gaurav Mittal,
Pulkit Gopalani,
Vineeth N Balasubramanian
Abstract:
Adversarial robustness of deep learning models has gained much traction in the last few years. Various attacks and defenses are proposed to improve the adversarial robustness of modern-day deep learning architectures. While all these approaches help improve the robustness, one promising direction for improving adversarial robustness is unexplored, i.e., the complex topology of the neural network architecture. In this work, we address the following question: Can the complex topology of a neural network give adversarial robustness without any form of adversarial training? We answer this empirically by experimenting with different hand-crafted and NAS-based architectures. Our findings show that, for small-scale attacks, NAS-based architectures are more robust for small-scale datasets and simple tasks than hand-crafted architectures. However, as the size of the dataset or the complexity of task increases, hand-crafted architectures are more robust than NAS-based architectures. Our work is the first large-scale study to understand adversarial robustness purely from an architectural perspective. Our study shows that random sampling in the search space of DARTS (a popular NAS method) with simple ensembling can improve the robustness to PGD attack by nearly 12%. We show that NAS, which is popular for achieving SoTA accuracy, can provide adversarial accuracy as a free add-on without any form of adversarial training. Our results show that leveraging the search space of NAS methods with methods like ensembles can be an excellent way to achieve adversarial robustness without any form of adversarial training. We also introduce a metric that can be used to calculate the trade-off between clean accuracy and adversarial robustness. Code and pre-trained models will be made available at https://github.com/tdchaitanya/nas-robustness.
Submitted 26 August, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
-
A kinetic model for qualitative understanding and analysis of the effect of complete lockdown imposed by India for controlling the COVID-19 disease spread by the SARS-CoV-2 virus
Authors:
Raj Kishore,
Prashant Kumar Jha,
Shreeja Das,
Dheeresh Agarwal,
Tanmay Maloo,
Hansraj Pegu,
Devadatta Sahoo,
Ankita Singhal,
Kisor K. Sahu
Abstract:
The present ongoing global pandemic caused by SARS-CoV-2 virus is creating havoc across the world. The absence of any vaccine as well as any definitive drug to cure, has made the situation very grave. Therefore only few effective tools are available to contain the rapid pace of spread of this disease, named as COVID-19. On 24th March, 2020, the the Union Government of India made an announcement of…
▽ More
The present ongoing global pandemic caused by SARS-CoV-2 virus is creating havoc across the world. The absence of any vaccine as well as any definitive drug to cure, has made the situation very grave. Therefore only few effective tools are available to contain the rapid pace of spread of this disease, named as COVID-19. On 24th March, 2020, the the Union Government of India made an announcement of unprecedented complete lockdown of the entire country effective from the next day. No exercise of similar scale and magnitude has been ever undertaken anywhere on the globe in the history of entire mankind. This study aims to scientifically analyze the implications of this decision using a kinetic model covering more than 96% of Indian territory. This model was further constrained by large sets of realistic parameters pertinent to India in order to capture the ground realities prevailing in India, such as: (i) true state wise population density distribution, (ii) accurate state wise infection distribution for the zeroth day of simulation (20th March, 2020), (iii) realistic movements of average clusters, (iv) rich diversity in movements patterns across different states, (v) migration patterns across different geographies, (vi) different migration patterns for pre- and post-COVID-19 outbreak, (vii) Indian demographic data based on the 2011 census, (viii) World Health Organization (WHO) report on demography wise infection rate and (ix) incubation period as per WHO report. This model does not attempt to make a long-term prediction about the disease spread on a standalone basis; but to compare between two different scenarios (complete lockdown vs. no lockdown). In the framework of model assumptions, our model conclusively shows significant success of the lockdown in containing the disease within a tiny fraction of the population and in the absence of it, it would have led to a very grave situation.
Submitted 12 April, 2020;
originally announced April 2020.
-
A Dataset for measuring reading levels in India at scale
Authors:
Dolly Agarwal,
Jayant Gupchup,
Nishant Baghel
Abstract:
One out of four children in India is leaving grade eight without basic reading skills. Measuring reading levels in a vast country like India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, datasets of children's speech are not only rare but also primarily in English. To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children in the age group of 6-14. The dataset consists of 5,301 subjects generating 81,330 labeled audio clips in Hindi, Marathi and English. These labels represent expert opinions on the child's ability to read at a specified level. Using this dataset, we built a simple ASR-based classifier. Early results indicate that we can achieve a prediction accuracy of 86% for the English language. Considering that the ASER survey spans half a million subjects, this dataset can grow to those scales.
Submitted 13 February, 2020; v1 submitted 27 November, 2019;
originally announced December 2019.
-
Modulo: Drive-by Sensing at City-scale on the Cheap
Authors:
Dhruv Agarwal,
Srinivasan Iyengar,
Manohar Swaminathan
Abstract:
Drive-by sensing is gaining popularity as an inexpensive way to perform fine-grained, city-scale, spatiotemporal monitoring of physical phenomena. Prior work explores several challenges in the design of low-cost sensors, the reliability of these sensors, and their application for specific use-cases like pothole detection and pollution monitoring. However, the process of deploying a drive-by sensing network at city scale is still unexplored. Despite the rise of ride-sharing services, there is still no way to optimally select vehicles from a fleet that can accomplish the sensing task by providing enough coverage of the city. In this paper, we propose Modulo -- a system to bootstrap drive-by sensing deployment by taking into consideration a variety of aspects such as spatiotemporal coverage and budget constraints. Further, Modulo is well-suited to satisfy unique deployment constraints such as colocations with other sensors (needed for gas and PM sensor calibration). We compare Modulo with two baseline algorithms on real-world taxi and bus datasets. We find that Modulo marginally outperforms the two baselines for datasets with only random-route vehicles such as taxis. However, it significantly outperforms the baselines when a fleet comprises both taxis and fixed-route vehicles such as public transport buses. Finally, we present a real deployment that uses Modulo to select vehicles for an air pollution sensing application.
Submitted 21 October, 2019;
originally announced October 2019.
-
Surrogate Optimization of Deep Neural Networks for Groundwater Predictions
Authors:
Juliane Mueller,
Jangho Park,
Reetik Sahu,
Charuleka Varadharajan,
Bhavna Arora,
Boris Faybishenko,
Deborah Agarwal
Abstract:
Sustainable management of groundwater resources under changing climatic conditions requires reliable and accurate predictions of groundwater levels. Mechanistic multi-scale, multi-physics simulation models are often too hard to use for this purpose, especially for groundwater managers who do not have access to complex compute resources and data. Therefore, we analyzed the applicability and performance of four modern deep learning computational models for predictions of groundwater levels. We compare three methods for optimizing the models' hyperparameters, including two surrogate model-based algorithms and a random sampling method. The models were tested using predictions of the groundwater level in Butte County, California, USA, taking into account the temporal variability of streamflow, precipitation, and ambient temperature. Our numerical study shows that the optimization of the hyperparameters can lead to reasonably accurate performance of all models (root mean squared errors of groundwater predictions of 2 meters or less), but the "simplest" network, namely a multilayer perceptron (MLP), performs better overall for learning and predicting groundwater data than the more advanced long short-term memory or convolutional neural networks in terms of prediction accuracy and time-to-solution, making the MLP a suitable candidate for groundwater prediction.
Submitted 3 February, 2020; v1 submitted 28 August, 2019;
originally announced August 2019.
-
F1/10: An Open-Source Autonomous Cyber-Physical Platform
Authors:
Matthew O'Kelly,
Varundev Sukhil,
Houssam Abbas,
Jack Harkins,
Chris Kao,
Yash Vardhan Pant,
Rahul Mangharam,
Dipshil Agarwal,
Madhur Behl,
Paolo Burgio,
Marko Bertogna
Abstract:
In 2005, DARPA labeled the realization of viable autonomous vehicles (AVs) a grand challenge; a short time later the idea became a moonshot that could change the automotive industry. Today, the question of safety stands between reality and solved. Given the right platform, the CPS community is poised to offer unique insights. However, testing the limits of safety and performance on real vehicles is costly and hazardous. The use of such vehicles is also outside the reach of most researchers and students. In this paper, we present F1/10: an open-source, affordable, and high-performance 1/10 scale autonomous vehicle testbed. The F1/10 testbed carries a full suite of sensors, perception, planning, control, and networking software stacks that are similar to full-scale solutions. We demonstrate key examples of the research enabled by the F1/10 testbed, and how the platform can be used to augment research and education in autonomous systems, making autonomy more accessible.
Submitted 24 January, 2019;
originally announced January 2019.
-
Bringing Salary Transparency to the World: Computing Robust Compensation Insights via LinkedIn Salary
Authors:
Krishnaram Kenthapadi,
Stuart Ambler,
Liang Zhang,
Deepak Agarwal
Abstract:
The recently launched LinkedIn Salary product has been designed with the goal of providing compensation insights to the world's professionals and thereby helping them optimize their earning potential. We describe the overall design and architecture of the statistical modeling system underlying this product. We focus on the unique data mining challenges while designing and implementing the system, and describe the modeling components such as Bayesian hierarchical smoothing that help to compute and present robust compensation insights to users. We report on extensive evaluation with nearly one year of de-identified compensation data collected from over one million LinkedIn users, thereby demonstrating the efficacy of the statistical models. We also highlight the lessons learned through the deployment of our system at LinkedIn.
Submitted 1 September, 2017; v1 submitted 28 March, 2017;
originally announced March 2017.
-
Me, Myself and My Killfie: Characterizing and Preventing Selfie Deaths
Authors:
Hemank Lamba,
Varun Bharadhwaj,
Mayank Vachher,
Divyansh Agarwal,
Megha Arora,
Ponnurangam Kumaraguru
Abstract:
Over the past couple of years, clicking and posting selfies has become a popular trend. However, since March 2014, 127 people have died and many have been injured while trying to click a selfie. Researchers have studied selfies to understand the psychology of their authors and their effect on social media platforms. In this work, we perform a comprehensive analysis of selfie-related casualties and infer various reasons behind these deaths. Using inferences from the incidents and our understanding of the features, we create a system to make people more aware of the dangerous situations in which these selfies are taken. We use a combination of text-based, image-based and location-based features to classify a particular selfie as dangerous or not. Our method, run on 3,155 annotated selfies collected on Twitter, gave 73% accuracy. Individually, the image-based features were the most informative for the prediction task. The combination of image-based and location-based features resulted in the best accuracy. We have made our code and dataset available at http://labs.precog.iiitd.edu.in/killfie.
Submitted 11 November, 2016; v1 submitted 7 November, 2016;
originally announced November 2016.
-
Efficient Optimal Algorithm of Task Scheduling in Cloud Computing Environment
Authors:
Dr. Amit Agarwal,
Saloni Jain
Abstract:
Cloud computing is an emerging technology in distributed computing which facilitates a pay-per-use model according to user demand and requirements. A cloud consists of a collection of virtual machines, which include both computational and storage facilities. The primary aim of cloud computing is to provide efficient access to remote and geographically distributed resources. Cloud computing is developing day by day and faces many challenges, one of which is scheduling. Scheduling refers to a set of policies to control the order of work to be performed by a computer system. A good scheduler adapts its scheduling strategy according to the changing environment and the type of task. In this research paper, we present a Generalized Priority algorithm for efficient execution of tasks and compare it with FCFS and Round Robin scheduling. The algorithm is tested in the CloudSim toolkit, and the results show that it gives better performance than other traditional scheduling algorithms.
Submitted 8 April, 2014;
originally announced April 2014.
-
Multi-Faceted Ranking of News Articles using Post-Read Actions
Authors:
Deepak Agarwal,
Bee-Chung Chen,
Xuanhui Wang
Abstract:
Personalized article recommendation is important to improve user engagement on news sites. Existing work quantifies engagement primarily through click rates. We argue that the quality of recommendations can be improved by incorporating different types of "post-read" engagement signals like sharing, commenting, printing and e-mailing article links. More specifically, we propose a multi-faceted ranking problem for recommending news articles where each facet corresponds to a ranking problem to maximize actions of a post-read action type. The key technical challenge is to estimate the rates of post-read action types by mitigating the impact of enormous data sparsity; we do so through several variations of factor models. To exploit correlations among post-read action types, we also introduce a novel variant called the locally augmented tensor (LAT) model. Through data obtained from a major news site in the US, we show that factor models significantly outperform a few baseline IR models and that the LAT model significantly outperforms several other variations of factor models. Our findings show that it is possible to incorporate post-read signals that are commonly available on online news sites to improve the quality of recommendations.
Submitted 2 May, 2012;
originally announced May 2012.
-
Parallel Matrix Factorization for Binary Response
Authors:
Rajiv Khanna,
Liang Zhang,
Deepak Agarwal,
Beechung Chen
Abstract:
Predicting user affinity to items is an important problem in applications like content optimization, computational advertising, and many more. While bilinear random effect models (matrix factorization) provide state-of-the-art performance when minimizing RMSE through a Gaussian response model on explicit ratings data, applying them to imbalanced binary response data presents additional challenges that we carefully study in this paper. Data in many applications usually consist of users' implicit responses, which are often binary -- clicking an item or not; the goal is to predict click rates, which are often combined with other measures to calculate utilities to rank items at runtime of the recommender systems. Because of their implicit nature, such data are usually much larger than explicit rating data and often have an imbalanced distribution with a small fraction of click events, making accurate click rate prediction difficult. In this paper, we address two problems. First, we show that previous techniques to estimate bilinear random effect models with binary data are less accurate than our new approach based on adaptive rejection sampling, especially for imbalanced response. Second, we develop a parallel bilinear random effect model fitting framework using the Map-Reduce paradigm that scales to massive datasets. Our parallel algorithm is based on a "divide and conquer" strategy coupled with an ensemble approach. Through experiments on the benchmark MovieLens data, a small Yahoo! Front Page data set, and a large Yahoo! Front Page data set that contains 8M users and 1B binary observations, we show that careful handling of binary response as well as identifiability issues is needed to achieve good performance for click rate prediction, and that the proposed adaptive rejection sampler and the partitioning as well as ensemble techniques significantly improve model performance.
Submitted 22 March, 2012;
originally announced March 2012.
-
A Reference Based, Tree Structured Time Synchronization Approach and its Analysis in WSN
Authors:
Surendra Rahamatkar,
Dr. Ajay Agarwal
Abstract:
Time synchronization for wireless sensor networks (WSNs) has been studied in recent years as a fundamental and significant research issue. Many applications based on these WSNs assume local clocks at each sensor node that need to be synchronized to a common notion of time. Time synchronization in a WSN is critical for accurate time stamping of events and fine-tuned coordination among the sensor nodes to reduce power consumption. This paper proposes a bidirectional, reference-based, tree-structured time synchronization service for WSNs along with a network evaluation phase. This offers a push mechanism for (i) accurate and (ii) low-overhead global time synchronization. Analysis of the proposed approach shows that it is lightweight, as the number of required broadcasting messages is constant in one broadcasting domain.
Submitted 25 March, 2011;
originally announced March 2011.
-
The Hunting of the Bump: On Maximizing Statistical Discrepancy
Authors:
Deepak Agarwal,
Jeff M. Phillips,
Suresh Venkatasubramanian
Abstract:
Anomaly detection has important applications in biosurveillance and environmental monitoring. When comparing measured data to data drawn from a baseline distribution, merely finding clusters in the measured data may not actually reveal true anomalies. These clusters may simply be the clusters of the baseline distribution. Hence, a discrepancy function is often used to examine how different measured data is from baseline data within a region. An anomalous region is thus defined to be one with high discrepancy.
In this paper, we present algorithms for maximizing statistical discrepancy functions over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one needs only maximize a linear discrepancy function over the same set of shapes.
We derive general discrepancy functions for data generated from a one-parameter exponential family. This generalizes the widely-used Kulldorff scan statistic for data from a Poisson distribution. We present an algorithm running in time $O(\frac{1}{\epsilon} n^2 \log^2 n)$ that computes the maximum discrepancy rectangle to within additive error $\epsilon$, for the Kulldorff scan statistic. Similar results hold for relative error and for discrepancy functions for data coming from Gaussian, Bernoulli, and gamma distributions. Prior to our work, the best known algorithms were exact and ran in time $O(n^4)$.
Submitted 2 October, 2005;
originally announced October 2005.
-
Supporting Dynamic Ad hoc Collaboration Capabilities
Authors:
D. Agarwal,
K. Berket
Abstract:
Modern HENP experiments such as CMS and ATLAS involve as many as 2000 collaborators around the world. Collaborations this large will be unable to meet often enough to support working closely together. Many of the tools currently available for collaboration focus on heavy-weight applications such as videoconferencing tools. While these are important, there is a more basic need for tools that support connecting physicists to work together on an ad hoc or continuous basis. Tools that support the day-to-day connectivity and underlying needs of a group of collaborators are important for providing light-weight, non-intrusive, and flexible ways to work collaboratively. Some example tools include messaging, file-sharing, and shared plot viewers. An important component of the environment is a scalable underlying communication framework. In this paper, we describe our current progress on building a dynamic and ad hoc collaboration environment and our vision for its evolution into a HENP collaboration environment.
Submitted 14 July, 2003;
originally announced July 2003.