Showing 1–27 of 27 results for author: Laban, P

  1. arXiv:2404.16251  [pdf, ps, other]

    cs.CR cs.AI cs.CL

    Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and ope…

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  2. arXiv:2404.10774  [pdf, other]

    cs.CL cs.AI

    MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

    Authors: Liyan Tang, Philippe Laban, Greg Durrett

    Abstract: Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of "fact-checking" are based on verifying each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many call…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: LLM-AggreFact benchmark, MiniCheck models, data generation code at https://github.com/Liyan06/MiniCheck

  3. arXiv:2311.08596  [pdf, other]

    cs.CL

    Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment

    Authors: Philippe Laban, Lidiya Murakhovs'ka, Caiming Xiong, Chien-Sheng Wu

    Abstract: The interactive nature of Large Language Models (LLMs) theoretically allows models to refine and improve their answers, yet systematic analysis of the multi-turn behavior of LLMs remains limited. In this paper, we propose the FlipFlop experiment: in the first round of the conversation, an LLM completes a classification task. In a second round, the LLM is challenged with a follow-up phrase like "Ar…

    Submitted 21 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  4. arXiv:2310.17749  [pdf, other]

    cs.CL cs.AI

    Salespeople vs SalesBot: Exploring the Role of Educational Value in Conversational Recommender Systems

    Authors: Lidiya Murakhovs'ka, Philippe Laban, Tian Xie, Caiming Xiong, Chien-Sheng Wu

    Abstract: Making big purchases requires consumers to research or consult a salesperson to gain domain expertise. However, existing conversational recommender systems (CRS) often overlook users' lack of background knowledge, focusing solely on gathering preferences. In this work, we define a new problem space for conversational agents that aim to provide both product recommendations and educational value thr…

    Submitted 26 October, 2023; originally announced October 2023.

  5. arXiv:2310.03878  [pdf, other]

    cs.CL

    Automatic and Human-AI Interactive Text Generation

    Authors: Yao Dou, Philippe Laban, Claire Gardent, Wei Xu

    Abstract: In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks, that takes a piece of text as input and then generates a revision that is improved according to some specific criteria (e.g., readability or linguistic styles), while largely retaining the original meaning and the length of the text. This includes many useful applications, such as text simpli…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: To appear at ACL 2024, Tutorial

  6. arXiv:2309.15337  [pdf, other]

    cs.CL cs.HC

    Beyond the Chat: Executable and Verifiable Text-Editing with LLMs

    Authors: Philippe Laban, Jesse Vig, Marti A. Hearst, Caiming Xiong, Chien-Sheng Wu

    Abstract: Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing. However, standard chat-based conversational interfaces do not support transparency and verifiability of the editing changes that they suggest. To give the author more agency when editing with an LLM, we present InkSync, an editing interface that suggests…

    Submitted 26 September, 2023; originally announced September 2023.

  7. arXiv:2309.14556  [pdf, other]

    cs.CL cs.AI cs.HC

    Art or Artifice? Large Language Models and the False Promise of Creativity

    Authors: Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu

    Abstract: Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities from blogs to stories. However, evaluating objectively the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writ…

    Submitted 8 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: ACM CHI 2024

  8. arXiv:2309.09369  [pdf, other]

    cs.CL

    Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

    Authors: Kung-Hsiang Huang, Philippe Laban, Alexander R. Fabbri, Prafulla Kumar Choubey, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To…

    Submitted 22 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: NAACL 2024

  9. arXiv:2309.03450  [pdf, other]

    cs.CL cs.AI cs.LG

    XGen-7B Technical Report

    Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong

    Abstract: Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many t…

    Submitted 6 September, 2023; originally announced September 2023.

  10. arXiv:2306.15774  [pdf]

    cs.HC cs.CL cs.CV cs.LG

    Next Steps for Human-Centered Generative AI: A Technical Perspective

    Authors: Xiang 'Anthony' Chen, Jeff Burke, Ruofei Du, Matthew K. Hong, Jennifer Jacobs, Philippe Laban, Dingzeyu Li, Nanyun Peng, Karl D. D. Willis, Chien-Sheng Wu, Bolei Zhou

    Abstract: Through iterative, cross-disciplinary discussions, we define and propose next-steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next-steps, we intend to draw interdisciplinary…

    Submitted 22 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  11. arXiv:2306.01150  [pdf, other]

    cs.CL cs.AI

    Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning

    Authors: Fan Yin, Jesse Vig, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Jason Wu

    Abstract: Large language models (LLMs) have shown impressive performance in following natural language instructions to solve unseen tasks. However, it remains unclear whether models truly understand task definitions and whether the human-written definitions are optimal. In this paper, we systematically study the role of task definitions in instruction learning. We first conduct an ablation analysis informed…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ACL 2023, camera-ready; 10 pages

  12. arXiv:2305.19204  [pdf, other]

    cs.CL

    SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages

    Authors: Philippe Laban, Jesse Vig, Wojciech Kryscinski, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Text simplification research has mostly focused on sentence-level simplification, even though many desirable edits - such as adding relevant background information or reordering content - may require document-level context. Prior work has also predominantly framed simplification as a single-step, input-to-output task, only implicitly modeling the fine-grained, span-level edits that elucidate the s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: ACL 2023, Long Paper

  13. arXiv:2305.14540  [pdf, other]

    cs.CL

    LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

    Authors: Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu

    Abstract: With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency de…

    Submitted 23 May, 2023; originally announced May 2023.

  14. arXiv:2302.08997  [pdf, other]

    cs.HC cs.CL

    Designing and Evaluating Interfaces that Highlight News Coverage Diversity Using Discord Questions

    Authors: Philippe Laban, Chien-Sheng Wu, Lidiya Murakhovs'ka, Xiang 'Anthony' Chen, Caiming Xiong

    Abstract: Modern news aggregators do the hard work of organizing a large news stream, creating collections for a given news story with tens of source options. This paper shows that navigating large source collections for a news story can be challenging without further guidance. In this work, we design three interfaces -- the Annotated Article, the Recomposed Article, and the Question Grid -- aimed at accomp…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: CHI2023 Accepted Paper

  15. arXiv:2211.05007  [pdf, other]

    cs.CL

    Discord Questions: A Computational Approach To Diversity Analysis in News Coverage

    Authors: Philippe Laban, Chien-Sheng Wu, Lidiya Murakhovs'ka, Xiang 'Anthony' Chen, Caiming Xiong

    Abstract: There are many potential benefits to news readers accessing diverse sources. Modern news aggregators do the hard work of organizing the news, offering readers a plethora of source options, but choosing which source to read remains challenging. We propose a new framework to assist readers in identifying source differences and gaining an understanding of news coverage diversity. The framework is bas…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022 Findings - Long Paper

  16. arXiv:2207.08401  [pdf]

    cs.HC

    Marvista: Exploring the Design of a Human-AI Collaborative News Reading Tool

    Authors: Xiang 'Anthony' Chen, Chien-Sheng Wu, Lidiya Murakhovs'ka, Philippe Laban, Tong Niu, Wenhao Liu, Caiming Xiong

    Abstract: We explore the design of Marvista -- a human-AI collaborative tool that employs a suite of natural language processing models to provide end-to-end support for reading online news articles. Before reading an article, Marvista helps a user plan what to read by filtering text based on how much time one can spend and what questions one is interested to find out from the article. During reading, Marvi…

    Submitted 23 June, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

  17. arXiv:2205.12854  [pdf, other]

    cs.CL cs.AI

    Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

    Authors: Liyan Tang, Tanya Goyal, Alexander R. Fabbri, Philippe Laban, Jiacheng Xu, Semih Yavuz, Wojciech Kryściński, Justin F. Rousseau, Greg Durrett

    Abstract: The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has be…

    Submitted 25 May, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted to ACL 2023

  18. arXiv:2205.06871  [pdf, other]

    cs.CL

    Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets

    Authors: Philippe Laban, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

    Abstract: Precisely assessing the progress in natural language generation (NLG) tasks is challenging, and human evaluation to establish a preference in a model's output over another is often necessary. However, human evaluation is usually costly, difficult to reproduce, and non-reusable. In this paper, we propose a new and simple automatic evaluation method for NLG called Near-Negative Distinction (NND) tha…

    Submitted 9 November, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022 - Long Paper

  19. arXiv:2205.01730  [pdf, other]

    cs.CL cs.HC

    Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation

    Authors: Philippe Laban, Chien-Sheng Wu, Lidiya Murakhovs'ka, Wenhao Liu, Caiming Xiong

    Abstract: Question generation (QGen) models are often evaluated with standardized NLG metrics that are based on n-gram overlap. In this paper, we measure whether these metric improvements translate to gains in a practical setting, focusing on the use case of helping teachers automate the generation of reading comprehension quizzes. In our study, teachers building a quiz receive question suggestions, which t…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted at NAACL 2022 Special HCI Theme (Findings, short paper), 10 pages, 6 figures

  20. NewsPod: Automatic and Interactive News Podcasts

    Authors: Philippe Laban, Elicia Ye, Srujay Korlakunta, John Canny, Marti A. Hearst

    Abstract: News podcasts are a popular medium to stay informed and dive deep into news topics. Today, most podcasts are handcrafted by professionals. In this work, we advance the state-of-the-art in automatically generated podcasts, making use of recent advances in natural language processing and text-to-speech technology. We present NewsPod, an automatically generated, interactive news podcast. The podcast…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: Accepted at IUI 2022, 16 pages, 10 figures

  21. arXiv:2111.09525  [pdf, other]

    cs.CL

    SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

    Authors: Philippe Laban, Tobias Schnabel, Paul N. Bennett, Marti A. Hearst

    Abstract: In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: TACL pre-MIT Press publication version; 11 pages, 2 figures, 5 tables

  22. arXiv:2110.08175  [pdf, other]

    cs.CL

    MixQG: Neural Question Generation with Mixed Answer Types

    Authors: Lidiya Murakhovs'ka, Chien-Sheng Wu, Philippe Laban, Tong Niu, Wenhao Liu, Caiming Xiong

    Abstract: Asking good questions is an essential ability for both human and machine intelligence. However, existing neural question generation approaches mainly focus on the short factoid type of answers. In this paper, we propose a neural question generator, MixQG, to bridge this gap. We combine 9 question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstr…

    Submitted 31 May, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: camera-ready version

  23. arXiv:2107.03448  [pdf, other]

    cs.CL

    Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

    Authors: Philippe Laban, Luke Dai, Lucas Bandarkar, Marti A. Hearst

    Abstract: The Shuffle Test is the most common task to evaluate whether NLP models can measure coherence in text. Most recent work uses direct supervision on the task; we show that by simply finetuning a RoBERTa model, we can achieve a near perfect accuracy of 97.8%, a state-of-the-art. We argue that this outstanding performance is unlikely to lead to a good model of text coherence, and suggest that the Shuf…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted at ACL-IJCNLP 2021 (short paper), 7 pages, 4 figures

    Journal ref: Association for Computational Linguistics (2021)

  24. arXiv:2107.03444  [pdf, other]

    cs.CL

    Keep it Simple: Unsupervised Simplification of Multi-Paragraph Text

    Authors: Philippe Laban, Tobias Schnabel, Paul Bennett, Marti A. Hearst

    Abstract: This work presents Keep it Simple (KiS), a new approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity. We train the model with a novel algorithm to optimize the reward (k-SCST), in which the model proposes several candidate simplifications, computes each candidate's reward, and encourages candidates that outperform th…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: Accepted at ACL-IJCNLP 2021, 14 pages, 7 figures

    Journal ref: Association for Computational Linguistics (2021)

  25. What's The Latest? A Question-driven News Chatbot

    Authors: Philippe Laban, John Canny, Marti A. Hearst

    Abstract: This work describes an automatic news chatbot that draws content from a diverse set of news articles and creates conversations with a user about the news. Key components of the system include the automatic organization of news articles into topical chatrooms, integration of automatically generated questions into the conversation, and a novel method for choosing which questions to present which avo…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: ACL2020 Demo Track, 8 pages, 5 figures

    Journal ref: ACL Demos (2020) 380-387

  26. arXiv:2105.05391  [pdf, other]

    cs.CL

    News Headline Grouping as a Challenging NLU Task

    Authors: Philippe Laban, Lucas Bandarkar, Marti A. Hearst

    Abstract: Recent progress in Natural Language Understanding (NLU) has seen the latest models outperform human performance on many standard tasks. These impressive results have led the community to introspect on dataset limitations, and iterate on more nuanced challenges. In this paper, we introduce the task of HeadLine Grouping (HLG) and a corresponding dataset (HLGD) consisting of 20,056 pairs of news head…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: NAACL2021, 13 pages, 8 figures

  27. The Summary Loop: Learning to Write Abstractive Summaries Without Examples

    Authors: Philippe Laban, Andrew Hsi, John Canny, Marti A. Hearst

    Abstract: This work presents a new approach to unsupervised abstractive summarization based on maximizing a combination of coverage and fluency for a given length constraint. It introduces a novel method that encourages the inclusion of key terms from the original document into the summary: key terms are masked out of the original document and must be filled in by a coverage model using the current generate…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: ACL2020, 16 pages, 9 figures

    Journal ref: Association for Computational Linguistics (2020) 5135-5150