Showing 1–50 of 147 results for author: Sabharwal, A

Searching in archive cs.
  1. arXiv:2506.13109

    cs.CL cs.AI

    Leveraging In-Context Learning for Language Model Agents

    Authors: Shivanshu Gupta, Sameer Singh, Ashish Sabharwal, Tushar Khot, Ben Bogin

    Abstract: In-context learning (ICL) with dynamically selected demonstrations combines the flexibility of prompting large language models (LLMs) with the ability to leverage training data to improve performance. While ICL has been highly successful for prediction and generation tasks, leveraging it for agentic tasks that require sequential decision making is challenging -- one must think not only about how t…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 16 pages, 12 figures

  2. arXiv:2506.04206

    cs.LG

    A Few Moments Please: Scalable Graphon Learning via Moment Matching

    Authors: Reza Ramezanpour, Victor M. Tenorio, Antonio G. Marques, Ashutosh Sabharwal, Santiago Segarra

    Abstract: Graphons, as limit objects of dense graph sequences, play a central role in the statistical analysis of network data. However, existing graphon estimation methods often struggle with scalability to large networks and resolution-independent approximation, due to their reliance on estimating latent variables or costly metrics such as the Gromov-Wasserstein distance. In this work, we propose a novel,…

    Submitted 4 June, 2025; originally announced June 2025.

  3. arXiv:2505.18948

    cs.LG cs.CC cs.FL

    Exact Expressive Power of Transformers with Padding

    Authors: William Merrill, Ashish Sabharwal

    Abstract: Chain of thought is a natural inference-time method for increasing the computational power of transformer-based large language models (LLMs), but comes at the cost of sequential decoding. Are there more efficient alternatives to expand a transformer's expressive power without adding parameters? We consider transformers with padding tokens as a form of parallelizable test-time compute. We show that…

    Submitted 24 May, 2025; originally announced May 2025.

  4. arXiv:2503.03961

    cs.LG cs.CC

    A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

    Authors: William Merrill, Ashish Sabharwal

    Abstract: Recent theoretical results show transformers cannot express sequential reasoning problems over long inputs, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to what degree bounded depth may suffice for solving problems over short inputs, or how increasing the transformer's depth affects its expressive power. We address…

    Submitted 22 May, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: Preprint

  5. arXiv:2502.01100

    cs.AI cs.CL cs.LG

    ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

    Authors: Bill Yuchen Lin, Ronan Le Bras, Kyle Richardson, Ashish Sabharwal, Radha Poovendran, Peter Clark, Yejin Choi

    Abstract: We investigate the logical reasoning capabilities of large language models (LLMs) and their scalability in complex non-monotonic reasoning. To this end, we introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance on logic grid puzzles derived from constraint satisfaction problems (CSPs). ZebraLogic enables the generation of puzzles with controllable and qu…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: Website: https://huggingface.co/spaces/WildEval/ZebraLogic

  6. arXiv:2412.17696

    cs.CL

    Understanding the Logic of Direct Preference Alignment through Logic

    Authors: Kyle Richardson, Vivek Srikumar, Ashish Sabharwal

    Abstract: Recent direct preference alignment algorithms (DPA), such as DPO, have shown great promise in aligning large language models to human preferences. While this has motivated the development of many new variants of the original DPO loss, understanding the differences between these recent proposals, as well as developing new DPA loss functions, remains difficult given the lack of a technical and conce…

    Submitted 27 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

  7. arXiv:2409.07440

    cs.AI cs.CL cs.SE

    SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

    Authors: Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, Tushar Khot

    Abstract: Given that Large Language Models (LLMs) have made significant progress in writing code, can they now be used to autonomously reproduce results from research repositories? Such a capability would be a boon to the research community, helping researchers validate, understand, and extend prior work. To advance towards this goal, we introduce SUPER, the first benchmark designed to evaluate the capabili…

    Submitted 11 September, 2024; originally announced September 2024.

  8. arXiv:2407.18901

    cs.SE cs.AI cs.CL cs.LG

    AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

    Authors: Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian

    Abstract: Autonomous agents that address day-to-day digital tasks (e.g., ordering groceries for a household), must not only operate multiple apps (e.g., notes, messaging, shopping app) via APIs, but also generate rich code with complex control flow in an iterative manner based on their interaction with the environment. However, existing benchmarks for tool use are inadequate, as they only cover tasks that r…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: ACL'24 Camera Ready

  9. arXiv:2407.15018

    cs.CL cs.AI

    Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

    Authors: Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov, Hannaneh Hajishirzi, Ashish Sabharwal

    Abstract: Multiple-choice question answering (MCQA) is a key competence of performant transformer language models that is tested by mainstream benchmarks. However, recent evidence shows that models can have quite a range of performance, particularly when the task format is diversified slightly (such as by shuffling answer choice order). In this work we ask: how do successful models perform formatted MCQA? W…

    Submitted 7 March, 2025; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: ICLR 2025 (spotlight). Substantially updated from previous preprint to contain experiments on 4-way multiple-choice with various answer choice symbols, 3 open model families, and extensive activation patching results, including on individual attention heads

  10. arXiv:2407.01725

    cs.CL cs.AI cs.LG

    DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systemat…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Website: https://github.com/allenai/discoverybench

  11. arXiv:2404.08819

    cs.LG cs.CC cs.CL cs.FL

    The Illusion of State in State-Space Models

    Authors: William Merrill, Jackson Petty, Ashish Sabharwal

    Abstract: State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their cl…

    Submitted 5 March, 2025; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: To appear at ICML 2024. 9 pages + appendices

  12. arXiv:2404.02040

    cs.FL cs.LG

    Transformers as Transducers

    Authors: Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal

    Abstract: We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of transductions. We do so using variants of RASP, a programming language designed to help people "think like transformers," as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence funct…

    Submitted 5 November, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: To appear in Transactions of the Association for Computational Linguistics

  13. arXiv:2402.13610

    cs.CL cs.AI cs.LG

    Data-driven Discovery with Large Generative Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Sanchaita Hazra, Ashish Sabharwal, Peter Clark

    Abstract: With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery -- a paradigm encompassing the search and verification of hypotheses purely from a se…

    Submitted 21 February, 2024; originally announced February 2024.

  14. arXiv:2402.02656

    cs.CL q-bio.QM

    RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews

    Authors: Satpreet Harcharan Singh, Kevin Jiang, Kanchan Bhasin, Ashutosh Sabharwal, Nidal Moukaddam, Ankit B Patel

    Abstract: Semi-structured interviews (SSIs) are a commonly employed data-collection method in healthcare research, offering in-depth qualitative insights into subject experiences. Despite their value, the manual analysis of SSIs is notoriously time-consuming and labor-intensive, in part due to the difficulty of extracting and categorizing emotional responses, and challenges in scaling human evaluation for l…

    Submitted 4 February, 2024; originally announced February 2024.

  15. arXiv:2311.09519

    cs.CL

    Leveraging Code to Improve In-context Learning for Semantic Parsing

    Authors: Ben Bogin, Shivanshu Gupta, Peter Clark, Ashish Sabharwal

    Abstract: In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization. However, learning to parse to rare domain-specific languages (DSLs) from just a few demonstrations is challenging, limiting the performance of even the most capable LLMs. In this work, we improve the effectiveness of ICL for semantic parsing by (1) using general-purpose p…

    Submitted 27 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024

  16. arXiv:2311.05772

    cs.AI cs.CL cs.LG

    ADaPT: As-Needed Decomposition and Planning with Language Models

    Authors: Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot

    Abstract: Large Language Models (LLMs) are increasingly being used for interactive decision-making tasks requiring planning and adapting to the environment. Recent works employ LLMs-as-agents in broadly two ways: iteratively determining the next action (iterative executors) or generating plans and executing sub-tasks using LLMs (plan-and-execute). However, these methods struggle with task complexity, as the…

    Submitted 8 April, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 (findings) camera-ready. Project Page: https://allenai.github.io/adaptllm

  17. arXiv:2311.04892

    cs.CL

    Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

    Authors: Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande, Ashwin Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot

    Abstract: Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs' capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of…

    Submitted 27 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Project page: https://allenai.github.io/persona-bias. Paper to appear at ICLR 2024. Added results for other LLMs in v2 (similar findings)

  18. arXiv:2311.02807

    cs.LG cs.AI cs.CL

    QualEval: Qualitative Evaluation for Model Improvement

    Authors: Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan

    Abstract: Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar to quantify and compare is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a…

    Submitted 5 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  19. arXiv:2310.07923

    cs.LG cs.CC cs.CL cs.LO

    The Expressive Power of Transformers with Chain of Thought

    Authors: William Merrill, Ashish Sabharwal

    Abstract: Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers' reasoning can be improved by allowing them to use a "chain of thought" or "scratchpad",…

    Submitted 11 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 9-page preprint. ICLR camera ready posted April 11

  20. arXiv:2310.01693

    cs.CL

    Closing the Curious Case of Neural Text Degeneration

    Authors: Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal

    Abstract: Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of the truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonze…

    Submitted 2 October, 2023; originally announced October 2023.

    MSC Class: 68T50 ACM Class: I.2.7

  21. arXiv:2305.14596

    cs.CL cs.LG

    Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy

    Authors: Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal

    Abstract: When pretrained language models (LMs) are applied to discriminative tasks such as multiple-choice questions, they place probability mass on vocabulary tokens that aren't among the given answer choices. Spreading probability mass across multiple surface forms with identical meaning (such as "bath" and "bathtub") is thought to cause an underestimation of a model's true performance, referred to as th…

    Submitted 31 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  22. arXiv:2305.14250

    cs.CL cs.AI

    Language Models with Rationality

    Authors: Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark

    Abstract: While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs". This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that…

    Submitted 29 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  23. arXiv:2305.14010

    cs.CL

    IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

    Authors: Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal

    Abstract: Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we introduce the first such dataset, named IfQA, where each question is based on a counterfactual presupposition via an "if" clause. For example, if L…

    Submitted 23 May, 2023; originally announced May 2023.

  24. arXiv:2305.14002

    cs.CL

    Improving Language Models via Plug-and-Play Retrieval Feedback

    Authors: Wenhao Yu, Zhihan Zhang, Zhenwen Liang, Meng Jiang, Ashish Sabharwal

    Abstract: Large language models (LLMs) exhibit remarkable performance across various NLP tasks. However, they often generate incorrect or hallucinated information, which hinders their practical applicability in real-world scenarios. Human feedback has been shown to effectively enhance the factuality and quality of generated content, addressing some of these limitations. However, this approach is resource-in…

    Submitted 23 May, 2023; originally announced May 2023.

  25. arXiv:2304.08789

    cs.IT eess.SP

    Full-Duplex Wireless for 6G: Progress Brings New Opportunities and Challenges

    Authors: Besma Smida, Ashutosh Sabharwal, Gabor Fodor, George C. Alexandropoulos, Himal A. Suraweera, Chan-Byoung Chae

    Abstract: The use of in-band full-duplex (FD) enables nodes to simultaneously transmit and receive on the same frequency band, which challenges the traditional assumption in wireless network design. The full-duplex capability enhances spectral efficiency and decreases latency, which are two key drivers pushing the performance expectations of next-generation mobile networks. In less than ten years, in-band F…

    Submitted 24 April, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: 21 pages, 15 figures, accepted to an IEEE Journal

  26. arXiv:2303.10727

    cs.LG cs.SD eess.AS

    ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement

    Authors: Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Celine Lin, Ashutosh Sabharwal

    Abstract: Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measure (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned…

    Submitted 28 March, 2025; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP'23

  27. A Deep Reinforcement Learning-Based Resource Scheduler for Massive MIMO Networks

    Authors: Qing An, Santiago Segarra, Chris Dick, Ashutosh Sabharwal, Rahman Doost-Mohammady

    Abstract: The large number of antennas in massive MIMO systems allows the base station to communicate with multiple users at the same time and frequency resource with multi-user beamforming. However, highly correlated user channels could drastically impede the spectral efficiency that multi-user beamforming can achieve. As such, it is critical for the base station to schedule a suitable group of users in ea…

    Submitted 13 September, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: IEEE Transactions on Machine Learning in Communications and Networking (TMLCN) 2023

  28. arXiv:2301.12726

    cs.CL cs.AI cs.LG

    Specializing Smaller Language Models towards Multi-Step Reasoning

    Authors: Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot

    Abstract: The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such abilities can, in fact, be distilled down from GPT-3.5 ($\ge$ 175B) to T5 variants ($\le$ 11B). We propose model specialization, to specialize the model's ability to…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Preprint

  29. arXiv:2301.04030

    cs.SI

    Conversational Turn-taking as a Stochastic Process on Networks

    Authors: Lisa O'Bryan, Santiago Segarra, Jensine Paoletti, Stephanie Zajac, Margaret E. Beier, Ashutosh Sabharwal, Matthew Wettergreen, Eduardo Salas

    Abstract: Understanding why certain individuals work well (or poorly) together as a team is a key research focus in the psychological and behavioral sciences and a fundamental problem for team-based organizations. Nevertheless, we have a limited ability to predict the social and work-related dynamics that will emerge from a given combination of team members. In this work, we model vocal turn-taking behavior…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: 5 pages, 2 figures. To be published in the 2022 Conference Proceedings of the Asilomar Conference on Signals, Systems and Computers

  30. arXiv:2212.10534

    cs.CL

    DISCO: Distilling Counterfactuals with Large Language Models

    Authors: Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, Kyle Richardson

    Abstract: Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at scale. When crowdsourced, such data is typically limited in scale and diversity; when generated using supervised methods, it is computationally expensive to extend t…

    Submitted 5 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023 camera ready, final title change

  31. arXiv:2212.10509

    cs.CL

    Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source he…

    Submitted 22 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL'23 Camera Ready

  32. arXiv:2211.07950

    cs.CL

    Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs

    Authors: Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna Shahaf, Ashish Sabharwal

    Abstract: Can we teach natural language understanding models to track their beliefs through intermediate points in text? We propose a representation learning framework called breakpoint modeling that allows for learning of this type. Given any text encoder and data marked with intermediate states (breakpoints) along with corresponding textual queries viewed as true/false propositions (i.e., the candidate be…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  33. arXiv:2210.17517

    cs.CL cs.AI

    Lila: A Unified Benchmark for Mathematical Reasoning

    Authors: Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan

    Abstract: Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities e.g., arithmetic, calculus (ii) language format e.g., q…

    Submitted 8 March, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

    MSC Class: 68T50 ACM Class: I.2.7

  34. arXiv:2210.02671

    cs.LG cs.CC

    A Logic for Expressing Log-Precision Transformers

    Authors: William Merrill, Ashish Sabharwal

    Abstract: One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be equivalently expressed in a generalization of first-order logic. However, finite-precision transformers are a weak transformer variant because, as we show, a sin…

    Submitted 6 November, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: May 24, 2023: Restructured version of old preprint. Oct 12, 2023: To appear at NeurIPS

  35. arXiv:2210.02406

    cs.CL

    Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

    Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by deco…

    Submitted 11 April, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: ICLR'23 Camera Ready

  36. arXiv:2210.00720

    cs.CL cs.AI cs.LG

    Complexity-Based Prompting for Multi-Step Reasoning

    Authors: Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot

    Abstract: We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make…

    Submitted 30 January, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Preprint

  37. arXiv:2209.03901

    cs.SD cs.AI eess.AS

    Dyadic Interaction Assessment from Free-living Audio for Depression Severity Assessment

    Authors: Bishal Lamichhane, Nidal Moukaddam, Ankit B. Patel, Ashutosh Sabharwal

    Abstract: Psychomotor retardation in depression has been associated with speech timing changes from dyadic clinical interviews. In this work, we investigate speech timing features from free-living dyadic interactions. Apart from the possibility of continuous monitoring to complement clinical visits, a study in free-living conditions would also allow inferring sociability features such as dyadic interaction…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: Accepted to INTERSPEECH 2022

  38. arXiv:2207.00729

    cs.CC cs.CL

    The Parallelism Tradeoff: Limitations of Log-Precision Transformers

    Authors: William Merrill, Ashish Sabharwal

    Abstract: Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens (and whose feedforward nets are computable using space linear in their input) can be simulated by constant-depth logspace-uniform threshold circuits. This…

    Submitted 26 April, 2023; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted at TACL. Formerly entitled "Log-Precision Transformers are Constant-Depth Threshold Circuits". Updated with minor corrections in Section 2 (Implications) on March 6, 2023. Update with minor edits to the proof of Lemma 3 on April 26, 2023

  39. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  40. arXiv:2205.12496

    cs.CL cs.AI

    Teaching Broad Reasoning Skills for Multi-Step QA by Generating Hard Contexts

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion. Specifically, we use widely available QDMR representations to programmatically create hard-to-cheat synthetic contexts for real questions in six multi-step reasoning datasets. These contexts are carefully designed…

    Submitted 3 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP'22

  41. arXiv:2205.03685  [pdf, other]

    cs.CL

    Better Retrieval May Not Lead to Better Question Answering

    Authors: Zhengzhong Liang, Tushar Khot, Steven Bethard, Mihai Surdeanu, Ashish Sabharwal

    Abstract: Considerable progress has been made recently in open-domain question answering (QA) problems, which require Information Retrieval (IR) and Reading Comprehension (RC). A popular approach to improve the system's performance is to improve the quality of the retrieved context from the IR stage. In this work we show that for StrategyQA, a challenging open-domain QA dataset that requires multi-hop reaso…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: 10 pages

  42. arXiv:2204.09148  [pdf, other]

    cs.CL cs.AI

    What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

    Authors: Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark

    Abstract: The instruction learning paradigm -- where a model learns to perform new tasks from task descriptions alone -- has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly understood. We use a controlled synthetic environment to characterize such capabilities. Specifically, we use the task of deciding whether a g…

    Submitted 24 May, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Typos corrected, rewordings

    MSC Class: 68T50; ACM Class: I.2.7

  43. arXiv:2112.09054  [pdf, other]

    cs.CL

    Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

    Authors: Kyle Richardson, Ashish Sabharwal

    Abstract: Investigating the reasoning abilities of transformer models, and discovering new challenging tasks for them, has been a topic of much interest. Recent studies have found these models to be surprisingly strong at performing deductive reasoning over formal logical theories expressed in natural language. A shortcoming of these studies, however, is that they do not take into account that logical theor…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted to AAAI-2022, AAAI preprint

  44. arXiv:2112.08348  [pdf, other]

    cs.CL

    Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

    Authors: Daniel Khashabi, Shane Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh, Yejin Choi

    Abstract: Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of extracting a discrete (textual) interpretation of continuous prompts that is faithful to the problem they solve. In practice, we observe a "wayward" behavior between the task solved by continuous prompts and…

    Submitted 4 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  45. arXiv:2110.14207  [pdf, other]

    cs.CL cs.AI

    How Much Coffee Was Consumed During EMNLP 2019? Fermi Problems: A New Reasoning Challenge for AI

    Authors: Ashwin Kalyan, Abhinav Kumar, Arjun Chandrasekaran, Ashish Sabharwal, Peter Clark

    Abstract: Many real-world problems require the combined application of multiple reasoning abilities employing suitable abstractions, commonsense knowledge, and creative synthesis of problem-solving strategies. To help advance AI systems towards such capabilities, we propose a new reasoning challenge, namely Fermi Problems (FPs), which are questions whose answers can only be approximately estimated because t…

    Submitted 20 December, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at EMNLP 2021, 11 pages, 5 tables, 4 figures

  46. arXiv:2110.08542  [pdf, other]

    cs.CL

    Hey AI, Can You Solve Complex Tasks by Talking to Agents?

    Authors: Tushar Khot, Kyle Richardson, Daniel Khashabi, Ashish Sabharwal

    Abstract: Training giant models from scratch for each complex task is resource- and data-inefficient. To help develop models that can leverage existing systems, we propose a new challenge: Learning to solve complex tasks by communicating with existing agents (or models) in natural language. We design a synthetic benchmark, CommaQA, with three complex reasoning tasks (explicit, implicit, numeric) designed to…

    Submitted 9 May, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of ACL 2022

  47. arXiv:2110.07053  [pdf, other]

    eess.SP cs.LG

    Robust MIMO Detection using Hypernetworks with Learned Regularizers

    Authors: Nicolas Zilberstein, Chris Dick, Rahman Doost-Mohammady, Ashutosh Sabharwal, Santiago Segarra

    Abstract: Optimal symbol detection in multiple-input multiple-output (MIMO) systems is known to be an NP-hard problem. Recently, there has been a growing interest to get reasonably close to the optimal solution using neural networks while keeping the computational complexity in check. However, existing work based on deep learning shows that it is difficult to design a generic network that works well for a v…

    Submitted 13 October, 2021; originally announced October 2021.

  48. arXiv:2108.00573  [pdf, other]

    cs.CL cs.AI

    MuSiQue: Multihop Questions via Single-hop Question Composition

    Authors: Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

    Abstract: Multihop reasoning remains an elusive goal as existing multihop benchmarks are known to be largely solvable via shortcuts. Can we create a question answering (QA) dataset that, by construction, requires proper multihop reasoning? To this end, we introduce a bottom-up approach that systematically selects composable pairs of single-hop questions that are connected, i.e., where one reasoning s…

    Submitted 5 May, 2022; v1 submitted 1 August, 2021; originally announced August 2021.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2022

  49. arXiv:2106.16213  [pdf, other]

    cs.CL cs.CC cs.LG

    Saturated Transformers are Constant-Depth Threshold Circuits

    Authors: William Merrill, Ashish Sabharwal, Noah A. Smith

    Abstract: Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages. Recent work has shown that transformers with hard attention are quite limited in power (Hahn, 2020), as they can be simulated by constant-depth AND/OR circuits (Hao et al., 2021). However, hard attention is a strong assumption, which may…

    Submitted 10 April, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: To appear in TACL

  50. arXiv:2106.01465  [pdf, other]

    cs.CL cs.AI cs.LG

    Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

    Authors: Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Kai-Wei Chang

    Abstract: Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 9 pages, Findings of ACL-IJCNLP 2021