Showing 1–50 of 137 results for author: Sabharwal, A

Searching in archive cs.
  1. arXiv:2404.08819

    cs.LG cs.CC cs.CL cs.FL

    The Illusion of State in State-Space Models

    Authors: William Merrill, Jackson Petty, Ashish Sabharwal

    Abstract: State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill and Sabharwal, 2023), which SSMs are explicitly designed to address via their…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Preprint

  2. arXiv:2404.02040

    cs.FL cs.LG

    Transformers as Transducers

    Authors: Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal

    Abstract: We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence funct…

    Submitted 2 April, 2024; originally announced April 2024.

  3. arXiv:2402.13610

    cs.CL cs.AI cs.LG

    Data-driven Discovery with Large Generative Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Sanchaita Hazra, Ashish Sabharwal, Peter Clark

    Abstract: With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery -- a paradigm encompassing the search and verification of hypotheses purely from a se…

    Submitted 21 February, 2024; originally announced February 2024.

  4. arXiv:2402.02656

    cs.CL q-bio.QM

    RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews

    Authors: Satpreet Harcharan Singh, Kevin Jiang, Kanchan Bhasin, Ashutosh Sabharwal, Nidal Moukaddam, Ankit B Patel

    Abstract: Semi-structured interviews (SSIs) are a commonly employed data-collection method in healthcare research, offering in-depth qualitative insights into subject experiences. Despite their value, the manual analysis of SSIs is notoriously time-consuming and labor-intensive, in part due to the difficulty of extracting and categorizing emotional responses, and challenges in scaling human evaluation for l…

    Submitted 4 February, 2024; originally announced February 2024.

  5. arXiv:2311.09519

    cs.CL

    Leveraging Code to Improve In-context Learning for Semantic Parsing

    Authors: Ben Bogin, Shivanshu Gupta, Peter Clark, Ashish Sabharwal

    Abstract: In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization. However, learning to parse to rare domain-specific languages (DSLs) from just a few demonstrations is challenging, limiting the performance of even the most capable LLMs. In this work, we improve the effectiveness of ICL for semantic parsing by (1) using general-purpose p…

    Submitted 27 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024

  6. arXiv:2311.05772

    cs.AI cs.CL cs.LG

    ADaPT: As-Needed Decomposition and Planning with Language Models

    Authors: Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot

    Abstract: Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative executors) or generating plans and executing sub-tasks using LLMs (plan-and-execute). However, these methods struggle with task complexity, as the…

    Submitted 8 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 (findings) camera-ready. Project Page: https://allenai.github.io/adaptllm

  7. arXiv:2311.04892

    cs.CL

    Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

    Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot

    Abstract: Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs' capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of…

    Submitted 27 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Project page: https://allenai.github.io/persona-bias. Paper to appear at ICLR 2024. Added results for other LLMs in v2 (similar findings)

  8. arXiv:2311.02807

    cs.LG cs.AI cs.CL

    QualEval: Qualitative Evaluation for Model Improvement

    Authors: Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan

    Abstract: Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar to quantify and compare is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a…

    Submitted 5 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  9. arXiv:2310.07923

    cs.LG cs.CC cs.CL cs.LO

    The Expressive Power of Transformers with Chain of Thought

    Authors: William Merrill, Ashish Sabharwal

    Abstract: Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers' reasoning can be improved by allowing them to use a "chain of thought" or "scratchpad",…

    Submitted 11 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 9-page preprint. ICLR camera ready posted April 11

  10. arXiv:2310.01693

    cs.CL

    Closing the Curious Case of Neural Text Degeneration

    Authors: Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal

    Abstract: Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of the truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonze…

    Submitted 2 October, 2023; originally announced October 2023.

    MSC Class: 68T50; ACM Class: I.2.7

  11. arXiv:2305.14596

    cs.CL cs.LG

    Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

    Authors: Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal

    Abstract: When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer choices. Spreading probability mass across multiple surface forms with identical meaning (such as "bath" and "bathtub") is thought to cause an underestimation of a model's true performance, referred to as th…

    Submitted 31 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  12. arXiv:2305.14250

    cs.CL cs.AI

    Language Models with Rationality

    Authors: Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark

    Abstract: While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs". This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that…

    Submitted 29 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  13. arXiv:2305.14010

    cs.CL

    IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

    Authors: Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal

    Abstract: Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we introduce the first such dataset, named IfQA, where each question is based on a counterfactual presupposition via an "if" clause. For example, if L…

    Submitted 23 May, 2023; originally announced May 2023.

  14. arXiv:2305.14002

    cs.CL

    Improving Language Models via Plug-and-Play Retrieval Feedback

    Authors: Wenhao Yu, Zhihan Zhang, Zhenwen Liang, Meng Jiang, Ashish Sabharwal

    Abstract: Large language models (LLMs) exhibit remarkable performance across various NLP tasks. However, they often generate incorrect or hallucinated information, which hinders their practical applicability in real-world scenarios. Human feedback has been shown to effectively enhance the factuality and quality of generated content, addressing some of these limitations. However, this approach is resource-in…

    Submitted 23 May, 2023; originally announced May 2023.

  15. arXiv:2304.08789

    cs.IT eess.SP

    Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges

    Authors: Besma Smida, Ashutosh Sabharwal, Gabor Fodor, George C. Alexandropoulos, Himal A. Suraweera, Chan-Byoung Chae

    Abstract: The use of in-band full-duplex (FD) enables nodes to simultaneously transmit and receive on the same frequency band, which challenges the traditional assumption in wireless network design. The full-duplex capability enhances spectral efficiency and decreases latency, which are two key drivers pushing the performance expectations of next-generation mobile networks. In less than ten years, in-band F…

    Submitted 24 April, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: 21 pages, 15 figures, accepted to an IEEE Journal

  16. arXiv:2303.10727

    cs.LG cs.SD eess.AS

    ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement

    Authors: Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Lin, Ashutosh Sabharwal

    Abstract: Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measure (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned…

    Submitted 24 March, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP'23

  17. A Deep Reinforcement Learning-Based Resource Scheduler for Massive MIMO Networks

    Authors: Qing An, Santiago Segarra, Chris Dick, Ashutosh Sabharwal, Rahman Doost-Mohammady

    Abstract: The large number of antennas in massive MIMO systems allows the base station to communicate with multiple users at the same time and frequency resource with multi-user beamforming. However, highly correlated user channels could drastically impede the spectral efficiency that multi-user beamforming can achieve. As such, it is critical for the base station to schedule a suitable group of users in ea…

    Submitted 13 September, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: IEEE Transactions on Machine Learning in Communications and Networking (TMLCN) 2023

  18. arXiv:2301.12726

    cs.CL cs.AI cs.LG

    Specializing Smaller Language Models towards Multi-Step Reasoning

    Authors: Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot

    Abstract: The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such abilities can, in fact, be distilled down from GPT-3.5 (≥ 175B) to T5 variants (≤ 11B). We propose model specialization, to specialize the model's ability to…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Preprint

  19. arXiv:2301.04030

    cs.SI

    Conversational Turn-taking as a Stochastic Process on Networks

    Authors: Lisa O'Bryan, Santiago Segarra, Jensine Paoletti, Stephanie Zajac, Margaret E. Beier, Ashutosh Sabharwal, Matthew Wettergreen, Eduardo Salas

    Abstract: Understanding why certain individuals work well (or poorly) together as a team is a key research focus in the psychological and behavioral sciences and a fundamental problem for team-based organizations. Nevertheless, we have a limited ability to predict the social and work-related dynamics that will emerge from a given combination of team members. In this work, we model vocal turn-taking behavior…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: 5 pages, 2 figures. To be published in the 2022 Conference Proceedings of the Asilomar Conference on Signals, Systems and Computers

  20. arXiv:2212.10534

    cs.CL

    DISCO: Distilling Counterfactuals with Large Language Models

    Authors: Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, Kyle Richardson

    Abstract: Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at scale. When crowdsourced, such data is typically limited in scale and diversity; when generated using supervised methods, it is computationally expensive to extend t…

    Submitted 5 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023 camera ready, final title change

  21. arXiv:2212.10509

    cs.CL

    Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source he…

    Submitted 22 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL'23 Camera Ready

  22. arXiv:2211.07950

    cs.CL

    Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs

    Authors: Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna Shahaf, Ashish Sabharwal

    Abstract: Can we teach natural language understanding models to track their beliefs through intermediate points in text? We propose a representation learning framework called breakpoint modeling that allows for learning of this type. Given any text encoder and data marked with intermediate states (breakpoints) along with corresponding textual queries viewed as true/false propositions (i.e., the candidate be…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  23. arXiv:2210.17517

    cs.CL cs.AI

    Lila: A Unified Benchmark for Mathematical Reasoning

    Authors: Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan

    Abstract: Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities e.g., arithmetic, calculus (ii) language format e.g., q…

    Submitted 8 March, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

    MSC Class: 68T50; ACM Class: I.2.7

  24. arXiv:2210.02671

    cs.LG cs.CC

    A Logic for Expressing Log-Precision Transformers

    Authors: William Merrill, Ashish Sabharwal

    Abstract: One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be equivalently expressed in a generalization of first-order logic. However, finite-precision transformers are a weak transformer variant because, as we show, a sin…

    Submitted 6 November, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: May 24, 2023: Restructured version of old preprint. Oct 12, 2023: To appear at NeurIPS

  25. arXiv:2210.02406

    cs.CL

    Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

    Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by deco…

    Submitted 11 April, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: ICLR'23 Camera Ready

  26. arXiv:2210.00720

    cs.CL cs.AI cs.LG

    Complexity-Based Prompting for Multi-Step Reasoning

    Authors: Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot

    Abstract: We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make…

    Submitted 30 January, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Preprint

  27. arXiv:2209.03901

    cs.SD cs.AI eess.AS

    Dyadic Interaction Assessment from Free-living Audio for Depression Severity Assessment

    Authors: Bishal Lamichhane, Nidal Moukaddam, Ankit B. Patel, Ashutosh Sabharwal

    Abstract: Psychomotor retardation in depression has been associated with speech timing changes from dyadic clinical interviews. In this work, we investigate speech timing features from free-living dyadic interactions. Apart from the possibility of continuous monitoring to complement clinical visits, a study in free-living conditions would also allow inferring sociability features such as dyadic interaction…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: Accepted to INTERSPEECH 2022

  28. arXiv:2207.00729

    cs.CC cs.CL

    The Parallelism Tradeoff: Limitations of Log-Precision Transformers

    Authors: William Merrill, Ashish Sabharwal

    Abstract: Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens (and whose feedforward nets are computable using space linear in their input) can be simulated by constant-depth logspace-uniform threshold circuits. This…

    Submitted 26 April, 2023; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted at TACL. Formerly entitled "Log-Precision Transformers are Constant-Depth Threshold Circuits". Updated with minor corrections in Section 2 (Implications) on March 6, 2023. Updated with minor edits to the proof of Lemma 3 on April 26, 2023

  29. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  30. arXiv:2205.12496

    cs.CL cs.AI

    Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion. Specifically, we use widely available QDMR representations to programmatically create hard-to-cheat synthetic contexts for real questions in six multi-step reasoning datasets. These contexts are carefully designed…

    Submitted 3 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP'22

  31. arXiv:2205.03685

    cs.CL

    Better Retrieval May Not Lead to Better Question Answering

    Authors: Zhengzhong Liang, Tushar Khot, Steven Bethard, Mihai Surdeanu, Ashish Sabharwal

    Abstract: Considerable progress has been made recently in open-domain question answering (QA) problems, which require Information Retrieval (IR) and Reading Comprehension (RC). A popular approach to improve the system's performance is to improve the quality of the retrieved context from the IR stage. In this work we show that for StrategyQA, a challenging open-domain QA dataset that requires multi-hop reaso…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: 10 pages

  32. arXiv:2204.09148

    cs.CL cs.AI

    What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

    Authors: Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark

    Abstract: The instruction learning paradigm -- where a model learns to perform new tasks from task descriptions alone -- has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly understood. We use a controlled synthetic environment to characterize such capabilities. Specifically, we use the task of deciding whether a g…

    Submitted 24 May, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Typos corrected, rewordings

    MSC Class: 68T50; ACM Class: I.2.7

  33. arXiv:2112.09054

    cs.CL

    Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

    Authors: Kyle Richardson, Ashish Sabharwal

    Abstract: Investigating the reasoning abilities of transformer models, and discovering new challenging tasks for them, has been a topic of much interest. Recent studies have found these models to be surprisingly strong at performing deductive reasoning over formal logical theories expressed in natural language. A shortcoming of these studies, however, is that they do not take into account that logical theor…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted to AAAI-2022, AAAI preprint

  34. arXiv:2112.08348

    cs.CL

    Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

    Authors: Daniel Khashabi, Shane Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh, Yejin Choi

    Abstract: Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of extracting a discrete (textual) interpretation of continuous prompts that is faithful to the problem they solve. In practice, we observe a "wayward" behavior between the task solved by continuous prompts and…

    Submitted 4 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  35. arXiv:2110.14207

    cs.CL cs.AI

    How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI

    Authors: Ashwin Kalyan, Abhinav Kumar, Arjun Chandrasekaran, Ashish Sabharwal, Peter Clark

    Abstract: Many real-world problems require the combined application of multiple reasoning abilities employing suitable abstractions, commonsense knowledge, and creative synthesis of problem-solving strategies. To help advance AI systems towards such capabilities, we propose a new reasoning challenge, namely Fermi Problems (FPs), which are questions whose answers can only be approximately estimated because t…

    Submitted 20 December, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at EMNLP 2021, 11 pages, 5 tables, 4 figures

  36. arXiv:2110.08542

    cs.CL

    Hey AI, Can You Solve Complex Tasks by Talking to Agents?

    Authors: Tushar Khot, Kyle Richardson, Daniel Khashabi, Ashish Sabharwal

    Abstract: Training giant models from scratch for each complex task is resource- and data-inefficient. To help develop models that can leverage existing systems, we propose a new challenge: Learning to solve complex tasks by communicating with existing agents (or models) in natural language. We design a synthetic benchmark, CommaQA, with three complex reasoning tasks (explicit, implicit, numeric) designed to…

    Submitted 9 May, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of ACL 2022

  37. arXiv:2110.07053

    eess.SP cs.LG

    Robust MIMO Detection using Hypernetworks with Learned Regularizers

    Authors: Nicolas Zilberstein, Chris Dick, Rahman Doost-Mohammady, Ashutosh Sabharwal, Santiago Segarra

    Abstract: Optimal symbol detection in multiple-input multiple-output (MIMO) systems is known to be an NP-hard problem. Recently, there has been a growing interest to get reasonably close to the optimal solution using neural networks while keeping the computational complexity in check. However, existing work based on deep learning shows that it is difficult to design a generic network that works well for a v…

    Submitted 13 October, 2021; originally announced October 2021.

  38. arXiv:2108.00573

    cs.CL cs.AI

    MuSiQue: Multihop Questions via Single-hop Question Composition

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Multihop reasoning remains an elusive goal as existing multihop benchmarks are known to be largely solvable via shortcuts. Can we create a question answering (QA) dataset that, by construction, requires proper multihop reasoning? To this end, we introduce a bottom-up approach that systematically selects composable pairs of single-hop questions that are connected, i.e., where one reasoning s…

    Submitted 5 May, 2022; v1 submitted 1 August, 2021; originally announced August 2021.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2022

  39. arXiv:2106.16213

    cs.CL cs.CC cs.LG

    Saturated Transformers are Constant-Depth Threshold Circuits

    Authors: William Merrill, Ashish Sabharwal, Noah A. Smith

    Abstract: Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite limited in power (Hahn, 2020), as they can be simulated by constant-depth AND/OR circuits (Hao et al. 2021). However, hard attention is a strong assumption, which may…

    Submitted 10 April, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: To appear in TACL

  40. arXiv:2106.01465

    cs.CL cs.AI cs.LG

    Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

    Authors: Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Kai-Wei Chang

    Abstract: Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 9 pages, Findings of ACL-IJCNLP 2021

  41. arXiv:2104.08727

    cs.CL cs.AI

    GooAQ: Open Question Answering with Diverse Answer Types

    Authors: Daniel Khashabi, Amos Ng, Tushar Khot, Ashish Sabharwal, Hannaneh Hajishirzi, Chris Callison-Burch

    Abstract: While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions. To this end, we present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatic…

    Submitted 10 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: EMNLP-Findings 2021

  42. arXiv:2103.12248  [pdf, other]

    cs.CV cs.CL

    Multi-Modal Answer Validation for Knowledge-Based VQA

    Authors: Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

    Abstract: The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in various forms, including visual, textual, and commonsense knowledge. Using more knowledge sources increases the chance of retrieving more irrelevant or noisy facts, making it challenging to comprehend the fa…

    Submitted 13 December, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: AAAI 2022

  43. arXiv:2102.03315  [pdf, other]

    cs.CL cs.AI

    Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

    Authors: Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark

    Abstract: We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple-choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting…

    Submitted 5 February, 2021; originally announced February 2021.

  44. arXiv:2010.12854  [pdf, other]

    cs.CL cs.AI

    ReadOnce Transformers: Reusable Representations of Text for Transformers

    Authors: Shih-Ting Lin, Ashish Sabharwal, Tushar Khot

    Abstract: We present ReadOnce Transformers, an approach to convert a transformer-based model into one that can build an information-capturing, task-independent, and compressed representation of text. The resulting representation is reusable across different examples and tasks, thereby requiring a document shared across many examples or tasks to only be \emph{read once}. This leads to faster training and eva…

    Submitted 3 August, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: Accepted to ACL 2021 (Camera Ready)

  45. arXiv:2010.12753  [pdf, other]

    cs.CL

    Temporal Reasoning on Implicit Events from Distant Supervision

    Authors: Ben Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, Dan Roth

    Abstract: We propose TRACIE, a novel temporal reasoning dataset that evaluates the degree to which systems understand implicit events -- events that are not mentioned explicitly in natural language text but can be inferred from it. This introduces a new challenge in temporal reasoning research, where prior work has focused on explicitly mentioned events. Human readers can infer implicit events via commonsen…

    Submitted 7 May, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at NAACL 2021

  46. arXiv:2010.02428  [pdf, other]

    cs.CL

    UnQovering Stereotyping Biases via Underspecified Questions

    Authors: Tao Li, Tushar Khot, Daniel Khashabi, Ashish Sabharwal, Vivek Srikumar

    Abstract: While language embeddings have been shown to have stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence an…

    Submitted 9 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Accepted at Findings of EMNLP 2020

  47. arXiv:2009.00751  [pdf, other]

    cs.CL cs.AI

    Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models

    Authors: Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal

    Abstract: We propose a general framework called Text Modular Networks (TMNs) for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models. To ensure solvability of simpler tasks, TMNs learn the textual input-output behavior (i.e., language) of existing models through their datasets. This differs from prior decomposition-based approache…

    Submitted 12 April, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: Accepted to NAACL 2021

  48. arXiv:2007.12761  [pdf, other]

    cs.CY cs.HC cs.IR

    Understanding Reflection Needs for Personal Health Data in Diabetes

    Authors: Temiloluwa Prioleau, Ashutosh Sabharwal, Madhuri M. Vasudevan

    Abstract: To empower users of wearable medical devices, it is important to enable methods that facilitate reflection on previous care to improve future outcomes. In this work, we conducted a two-phase user study involving patients, caregivers, and clinicians to understand gaps in current approaches that support reflection and user needs for new solutions. Our results show that users desire to have specific…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: 11 pages, 6 figures, paper to appear in Pervasive Health 2020

    ACM Class: J.3

  49. arXiv:2007.00295  [pdf, ps, other]

    cs.LG stat.ML

    Belief Propagation Neural Networks

    Authors: Jonathan Kuck, Shuvam Chakraborty, Hao Tang, Rachel Luo, Jiaming Song, Ashish Sabharwal, Stefano Ermon

    Abstract: Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP…

    Submitted 1 July, 2020; originally announced July 2020.

  50. arXiv:2005.00789  [pdf, other]

    cs.CL cs.AI cs.LG

    Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Has there been real progress in multi-hop question-answering? Models often exploit dataset artifacts to produce correct answers, without connecting information across multiple supporting facts. This limits our ability to measure true progress and defeats the purpose of building multi-hop QA datasets. We make three contributions towards addressing this. First, we formalize such undesirable behavior…

    Submitted 16 November, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted at EMNLP'20