Showing 1–50 of 406 results for author: Huang, M

Searching in archive cs.
  1. arXiv:2404.19652  [pdf, other]

    cs.CV cs.AI

    VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

    Authors: Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai

    Abstract: Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Typically, we propose a Prompt Queri…

    Submitted 4 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  2. arXiv:2404.18919  [pdf, other]

    cs.CV

    TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

    Authors: Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: Recent advances in diffusion models can generate high-quality and stunning images from text. However, multi-turn image generation, which is of high demand in real-world scenarios, still faces challenges in maintaining semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen…

    Submitted 29 April, 2024; originally announced April 2024.

  3. arXiv:2404.18033  [pdf, other]

    cs.CV

    Exposing Text-Image Inconsistency Using Diffusion Models

    Authors: Mingzhen Huang, Shan Jia, Zhou Zhou, Yan Ju, Jialing Cai, Siwei Lyu

    Abstract: In the battle against widespread online misinformation, a growing problem is text-image inconsistency, where images are misleadingly paired with texts with different intent or meaning. Existing classification-based methods for text-image inconsistency can identify contextual inconsistencies but fail to provide explainable justifications for their decisions that humans can understand. Although more…

    Submitted 27 April, 2024; originally announced April 2024.

  4. arXiv:2404.17607  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG cs.SI

    Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions

    Authors: Sai Krishna Revanth Vuruma, Dezhi Wu, Saborny Sen Gupta, Lucas Aust, Valerie Lookingbill, Caleb Henry, Yang Ren, Erin Kasson, Li-Shiun Chen, Patricia Cavazos-Rehg, Dian Hu, Ming Huang

    Abstract: The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countr…

    Submitted 25 April, 2024; originally announced April 2024.

  5. arXiv:2404.16792  [pdf, other]

    cs.LG cs.AI cs.CL

    Weak-to-Strong Extrapolation Expedites Alignment

    Authors: Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng

    Abstract: Although the capabilities of large language models (LLMs) ideally scale up with increasing data and compute, they are inevitably constrained by limited resources in reality. Suppose we have a moderately trained LLM (e.g., trained to align with human preference) in hand, can we further exploit its potential and cheaply acquire a stronger model? In this paper, we propose a simple method called ExPO…

    Submitted 25 April, 2024; originally announced April 2024.

  6. arXiv:2404.15790  [pdf, other]

    cs.CV

    Leveraging Large Language Models for Multimodal Search

    Authors: Oriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua

    Abstract: Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries. The problem becomes harder with the…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Published at CVPRW 2024

  7. arXiv:2404.14228  [pdf, other]

    cs.NE

    A Survey of Decomposition-Based Evolutionary Multi-Objective Optimization: Part II -- A Data Science Perspective

    Authors: Mingyu Huang, Ke Li

    Abstract: This paper presents the second part of the two-part survey series on decomposition-based evolutionary multi-objective optimization where we mainly focus on discussing the literature related to multi-objective evolutionary algorithms based on decomposition (MOEA/D). Complementary to the first part, here we employ a series of advanced data mining approaches to provide a comprehensive anatomy of the…

    Submitted 22 April, 2024; originally announced April 2024.

  8. arXiv:2404.13667  [pdf, other]

    cs.CV cs.AI

    MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition

    Authors: Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, Alireza Darvishy

    Abstract: Printed mathematical expression recognition (MER) models are usually trained and tested using LaTeX-generated mathematical expressions (MEs) as input and the LaTeX source code as ground truth. As the same ME can be generated by various different LaTeX source codes, this leads to unwanted variations in the ground truth data that bias test performance results and hinder efficient learning. In additi…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures

  9. arXiv:2404.12602  [pdf]

    cs.CV cs.LG

    A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks

    Authors: Minzhe Huang, Changwei Nie, Weihong Zhong

    Abstract: In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology. With the rise of counterfeit face generation techniques, the challenge posed by digitally edited faces to face anti-spoofing is escalating. Existing FAS technologies primarily focus on intercepting physically forged faces and lack a robust solution for cross-domain FAS cha…

    Submitted 18 April, 2024; originally announced April 2024.

  10. arXiv:2404.10494  [pdf, other]

    cs.HC cs.LG

    BDAN: Mitigating Temporal Difference Across Electrodes in Cross-Subject Motor Imagery Classification via Generative Bridging Domain

    Authors: Zhige Chen, Rui Yang, Mengjie Huang, Chengxuan Qin, Zidong Wang

    Abstract: Because of "the non-repeatability of the experiment settings and conditions" and "the variability of brain patterns among subjects", the data distributions across sessions and electrodes are different in cross-subject motor imagery (MI) studies, eventually reducing the performance of the classification model. Systematically summarised based on the existing studies, a novel temporal-electrode data…

    Submitted 16 April, 2024; originally announced April 2024.

  11. arXiv:2404.05569  [pdf, other]

    cs.AI cs.CL cs.MA

    360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

    Authors: Shen Gao, Hao Li, Zhengliang Shi, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang

    Abstract: Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue…

    Submitted 8 April, 2024; originally announced April 2024.

  12. arXiv:2404.04624  [pdf, other]

    cs.CV

    Bridging the Gap Between End-to-End and Two-Step Text Spotting

    Authors: Mingxin Huang, Hongliang Li, Yuliang Liu, Xiang Bai, Lianwen Jin

    Abstract: Modularity plays a crucial role in the development and maintenance of complex systems. While end-to-end text spotting efficiently mitigates the issues of error accumulation and sub-optimal performance seen in traditional two-step methodologies, the two-step methods continue to be favored in many competitions and practical settings due to their superior modularity. In this paper, we introduce Bridg…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  13. arXiv:2404.02655  [pdf, other]

    cs.CL

    Calibrating the Confidence of Large Language Models by Eliciting Fidelity

    Authors: Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu

    Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate. In this paper, we decompose the language model confidence into the Uncertainty about the question and the…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 17 pages, 13 figures

  14. arXiv:2404.00934  [pdf, other]

    cs.CL

    ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback

    Authors: Zhenyu Hou, Yilin Niu, Zhengxiao Du, Xiaohan Zhang, Xiao Liu, Aohan Zeng, Qinkai Zheng, Minlie Huang, Hongning Wang, Jie Tang, Yuxiao Dong

    Abstract: ChatGLM is a free-to-use AI service powered by the ChatGLM family of large language models (LLMs). In this paper, we present the ChatGLM-RLHF pipeline -- a reinforcement learning from human feedback (RLHF) system -- designed to enhance ChatGLM's alignment with human preferences. ChatGLM-RLHF encompasses three major components: the collection of human preference data, the training of the reward mod…

    Submitted 3 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  15. arXiv:2403.19116  [pdf]

    cs.CL cs.AI

    MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering

    Authors: Che Guan, Mengyu Huang, Peng Zhang

    Abstract: In today's fast-paced industry, professionals face the challenge of summarizing a large number of documents and extracting vital information from them on a daily basis. These metrics are frequently hidden away in tables and/or their nested hyperlinks. To address this challenge, the approach of Table Question Answering (QA) has been developed to extract the relevant information. However, traditiona…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages

  16. arXiv:2403.19112  [pdf, other]

    cs.CR cs.SE

    Uncover the Premeditated Attacks: Detecting Exploitable Reentrancy Vulnerabilities by Identifying Attacker Contracts

    Authors: Shuo Yang, Jiachi Chen, Mingyuan Huang, Zibin Zheng, Yuan Huang

    Abstract: Reentrancy, a notorious vulnerability in smart contracts, has led to millions of dollars in financial loss. However, current smart contract vulnerability detection tools suffer from a high false positive rate in identifying contracts with reentrancy vulnerabilities. Moreover, only a small portion of the detected reentrant contracts can actually be exploited by hackers, making these tools less effe…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by ICSE 2024

  17. arXiv:2403.12108  [pdf, other]

    cs.AI econ.GN stat.AP stat.ME

    Does AI help humans make better decisions? A methodological framework for experimental evaluation

    Authors: Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin

    Abstract: The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI alone. We introduce a new methodological framework that can be used to ans…

    Submitted 17 March, 2024; originally announced March 2024.

  18. arXiv:2403.08857  [pdf, other]

    cs.CV

    DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

    Authors: Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

    Abstract: Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language M…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Project page: https://hunyuan-dialoggen.github.io/

  19. arXiv:2403.02718  [pdf, other]

    cs.CL

    DP-CRE: Continual Relation Extraction via Decoupled Contrastive Learning and Memory Structure Preservation

    Authors: Mengyi Huang, Meng Xiao, Ludi Wang, Yi Du

    Abstract: Continuous Relation Extraction (CRE) aims to incrementally learn relation knowledge from a non-stationary stream of data. Since the introduction of new relational tasks can overshadow previously learned information, catastrophic forgetting becomes a significant challenge in this domain. Current replay-based training paradigms prioritize all data uniformly and train memory samples through multiple…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted By LREC-Coling-2024, 10 pages with 2 pages of appendix

  20. arXiv:2403.00483  [pdf, other]

    cs.CV

    RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

    Authors: Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang

    Abstract: Text-to-image customization, which aims to synthesize text-driven images for the given subjects, has recently revolutionized content creation. Existing works follow the pseudo-word paradigm, i.e., represent the given subjects as pseudo-words and then compose them with the given text. However, the inherent entangled influence scope of pseudo-words with the given text results in a dual-optimum parad…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  21. arXiv:2402.17759  [pdf, other]

    cs.CL

    Towards Optimal Learning of Language Models

    Authors: Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, Furu Wei

    Abstract: This work studies the general principles of improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance. Specifically, we present a theory for the optimal learning of LMs. We first propose an objective that optimizes LM learning by maximizing the data compression ratio in an "LM-training-as-lossless-compression" view. Then…

    Submitted 3 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  22. arXiv:2402.17042  [pdf, ps, other]

    stat.ME cs.AI cs.LG econ.EM

    Towards Generalizing Inferences from Trials to Target Populations

    Authors: Melody Y Huang, Sarah E Robertson, Harsh Parikh

    Abstract: Randomized Controlled Trials (RCTs) are pivotal in generating internally valid estimates with minimal assumptions, serving as a cornerstone for researchers dedicated to advancing causal inference methods. However, extending these findings beyond the experimental cohort to achieve externally valid estimates is crucial for broader scientific inquiry. This paper delves into the forefront of addressin…

    Submitted 26 February, 2024; originally announced February 2024.

  23. arXiv:2402.16810  [pdf]

    cs.CL

    OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

    Authors: Fujian Jia, Xin Liu, Lixi Deng, Jiwen Gu, Chunchao Pu, Tunan Bai, Mengjiang Huang, Yuanzhi Lu, Kang Liu

    Abstract: In the past year, there has been a growing trend in applying Large Language Models (LLMs) to the field of medicine, particularly with the advent of advanced language models such as ChatGPT developed by OpenAI. However, there is limited research on LLMs specifically addressing oncology-related queries. The primary aim of this research was to develop a specialized language model that demonstrates im…

    Submitted 26 February, 2024; originally announced February 2024.

  24. arXiv:2402.16515  [pdf, other]

    cs.CL cs.CR

    LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification

    Authors: Yiping Song, Juhua Zhang, Zhiliang Tian, Yuxin Yang, Minlie Huang, Dongsheng Li

    Abstract: As sufficient data are not always publicly accessible for model training, researchers exploit limited data with advanced learning algorithms or expand the dataset via data augmentation (DA). Conducting DA in private domain requires private protection approaches (i.e. anonymization and perturbation), but those methods cannot provide protection guarantees. Differential privacy (DP) learning method…

    Submitted 26 February, 2024; originally announced February 2024.

  25. arXiv:2402.16444  [pdf, other]

    cs.CL

    ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

    Authors: Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang

    Abstract: The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but there is still no comprehensive approach for detecting safety issues within LLMs' responses in an aligned, customizable and explainable manner. In this paper, we propose ShieldLM, an LLM-based safety detector, which aligns with general human safety standards, supports customizable detection rules, and…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 17 pages

  26. arXiv:2402.16006  [pdf, other]

    cs.CL

    From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings

    Authors: Hao Wang, Hao Li, Minlie Huang, Lei Sha

    Abstract: The safety defense methods of large language models (LLMs) stay limited because the dangerous prompts are manually curated to just a few known attack types, which fails to keep pace with emerging varieties. Recent studies found that attaching suffixes to harmful instructions can hack the defense of LLMs and lead to dangerous outputs. This method, while effective, leaves a gap in understanding the un…

    Submitted 25 February, 2024; originally announced February 2024.

  27. arXiv:2402.15052  [pdf, other]

    cs.CL cs.AI

    ToMBench: Benchmarking Theory of Mind in Large Language Models

    Authors: Zhuang Chen, Jincenzi Wu, Jinfeng Zhou, Bosi Wen, Guanqun Bi, Gongyao Jiang, Yaru Cao, Mengting Hu, Yunghwei Lai, Zexuan Xiong, Minlie Huang

    Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Under review

  28. arXiv:2402.14398  [pdf, other]

    cs.CV cs.AI

    Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

    Authors: Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao

    Abstract: GAN-based image attribute editing firstly leverages GAN Inversion to project real images into the latent space of GAN and then manipulates corresponding latent codes. Recent inversion methods mainly utilize additional high-bit features to improve image details preservation, as low-bit codes cannot faithfully reconstruct source images, leading to the loss of details. However, during editing, existi…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 18 pages, 18 figures, published to AAAI24

  29. arXiv:2402.12071  [pdf, other]

    cs.CL cs.AI

    EmoBench: Evaluating the Emotional Intelligence of Large Language Models

    Authors: Sahand Sabour, Siyang Liu, Zheyuan Zhang, June M. Liu, Jinfeng Zhou, Alvionna S. Sunaryo, Juanzi Li, Tatia M. C. Lee, Rada Mihalcea, Minlie Huang

    Abstract: Recent advances in Large Language Models (LLMs) have highlighted the need for robust, comprehensive, and challenging benchmarks. Yet, research on evaluating their Emotional Intelligence (EI) is considerably limited. Existing benchmarks have two major shortcomings: first, they mainly focus on emotion recognition, neglecting essential EI capabilities such as emotion regulation and thought facilitati…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Work in progress

  30. arXiv:2402.09658  [pdf]

    eess.IV cs.CV

    Towards Precision Cardiovascular Analysis in Zebrafish: The ZACAF Paradigm

    Authors: Amir Mohammad Naderi, Jennifer G. Casey, Mao-Hsiang Huang, Rachelle Victorio, David Y. Chiang, Calum MacRae, Hung Cao, Vandana A. Gupta

    Abstract: Quantifying cardiovascular parameters like ejection fraction in zebrafish as a host of biological investigations has been extensively studied. Since current manual monitoring techniques are time-consuming and fallible, several image processing frameworks have been proposed to automate the process. Most of these works rely on supervised deep-learning architectures. However, supervised methods tend…

    Submitted 14 February, 2024; originally announced February 2024.

  31. arXiv:2402.03256  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Learning Best-in-Class Policies for the Predict-then-Optimize Framework

    Authors: Michael Huang, Vishal Gupta

    Abstract: We propose a novel family of decision-aware surrogate losses, called Perturbation Gradient (PG) losses, for the predict-then-optimize framework. These losses directly approximate the downstream decision loss and can be optimized using off-the-shelf gradient-based methods. Importantly, unlike existing surrogate losses, the approximation error of our PG losses vanishes as the number of samples grows…

    Submitted 8 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  32. arXiv:2402.01469  [pdf, other]

    cs.CL

    AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

    Authors: Jian Guan, Wei Wu, Zujie Wen, Peng Xu, Hongning Wang, Minlie Huang

    Abstract: The notable success of large language models (LLMs) has sparked an upsurge in building language agents to complete various complex tasks. We present AMOR, an agent framework based on open-source LLMs, which reasons with external knowledge bases and adapts to specific domains through human supervision to the reasoning process. AMOR builds reasoning logic over a finite state machine (FSM) that solve…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Work in progress

  33. arXiv:2402.01238  [pdf, other]

    cs.LG cs.AI cs.IT

    Flexible Variational Information Bottleneck: Achieving Diverse Compression with a Single Training

    Authors: Sota Kudo, Naoaki Ono, Shigehiko Kanaya, Ming Huang

    Abstract: Information Bottleneck (IB) is a widely used framework that enables the extraction of information related to a target random variable from a source random variable. In the objective function, IB controls the trade-off between data compression and predictiveness through the Lagrange multiplier $β$. Traditionally, to find the trade-off to be learned, IB requires a search for $β$ through multiple tra…

    Submitted 2 February, 2024; originally announced February 2024.

  34. arXiv:2402.01031  [pdf]

    eess.IV cs.CV

    MRAnnotator: A Multi-Anatomy Deep Learning Model for MRI Segmentation

    Authors: Alexander Zhou, Zelong Liu, Andrew Tieu, Nikhil Patel, Sean Sun, Anthony Yang, Peter Choi, Valentin Fauveau, George Soultanidis, Mingqian Huang, Amish Doshi, Zahi A. Fayad, Timothy Deyer, Xueyan Mei

    Abstract: Purpose: To develop a deep learning model for multi-anatomy and many-class segmentation of diverse anatomic structures on MRI imaging. Materials and Methods: In this retrospective study, two datasets were curated and annotated for model development and evaluation. An internal dataset of 1022 MRI sequences from various clinical sites within a health system and an external dataset of 264 MRI sequenc…

    Submitted 1 February, 2024; originally announced February 2024.

  35. arXiv:2402.00856  [pdf, other]

    cs.CL

    Towards Efficient and Exact Optimization of Language Model Alignment

    Authors: Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang

    Abstract: The alignment of language models with human preferences is vital for their application in real-world tasks. The problem is formulated as optimizing the model's policy to maximize the expected reward that reflects human preferences with minimal deviation from the initial policy. While considered as a straightforward solution, reinforcement learning (RL) suffers from high variance in policy updates,…

    Submitted 23 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 24 pages, 9 figures

  36. arXiv:2401.18018  [pdf, other]

    cs.LG cs.AI cs.CL

    On Prompt-Driven Safeguarding for Large Language Models

    Authors: Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

    Abstract: Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) from complying with queries that contain harmful intents. However, the working mechanisms of safety prompts have not been revealed yet, which hinders the potential for automatically optimizing them to improve LLM safety. To this end, we investigate the impact of safety prompts from the pe…

    Submitted 4 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  37. arXiv:2401.17880  [pdf, other]

    cs.MA cs.IT cs.LG

    Graph Attention-based Reinforcement Learning for Trajectory Design and Resource Assignment in Multi-UAV Assisted Communication

    Authors: Zikai Feng, Di Wu, Mengxing Huang, Chau Yuen

    Abstract: In the multiple unmanned aerial vehicle (UAV)-assisted downlink communication, it is challenging for UAV base stations (UAV BSs) to realize trajectory design and resource assignment in unknown environments. The cooperation and competition between UAV BSs in the communication network leads to a Markov game problem. Multi-agent reinforcement learning is a significant solution for the above decision…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 13 pages

    MSC Class: 68M11; ACM Class: I.2.11

  38. Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

    Authors: Susan Lin, Jeremy Warner, J. D. Zamfirescu-Pereira, Matthew G. Lee, Sauhard Jain, Michael Xuelin Huang, Piyawat Lertvittayakumjorn, Shanqing Cai, Shumin Zhai, Björn Hartmann, Can Liu

    Abstract: Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates key…

    Submitted 7 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: To appear at ACM CHI 2024

  39. arXiv:2401.07641  [pdf, other]

    cs.CV

    SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting

    Authors: Mingxin Huang, Dezhi Peng, Hongliang Li, Zhenghao Peng, Chongyu Liu, Dahua Lin, Yuliang Liu, Xiang Bai, Lianwen Jin

    Abstract: End-to-end scene text spotting, which aims to read the text in natural images, has garnered significant attention in recent years. However, recent state-of-the-art methods usually incorporate detection and recognition simply by sharing the backbone, which does not directly take advantage of the feature interaction between the two tasks. In this paper, we propose a new end-to-end scene text spottin…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2203.10209

  40. arXiv:2401.02088  [pdf, other]

    cs.LG cs.CL cs.DC

    Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe

    Authors: Mincong Huang, Chao Wang, Chi Ma, Yineng Zhang, Peng Zhang, Lei Yu

    Abstract: Pipeline parallelism is an essential technique in the training of large-scale Transformer models. However, it suffers from imbalanced memory consumption, leading to insufficient memory utilization. The BPipe technique was proposed to address this issue and has proven effective in the GPT-3 model. Nevertheless, our experiments have not yielded similar benefits for LLaMA training. Additionally, BPip…

    Submitted 4 January, 2024; originally announced January 2024.

  41. arXiv:2312.13778   

    cs.CV

    Progressive Evolution from Single-Point to Polygon for Scene Text

    Authors: Linger Deng, Mingxin Huang, Xudong Xie, Yuliang Liu, Lianwen Jin, Xiang Bai

    Abstract: The advancement of text shape representations towards compactness has enhanced text detection and spotting performance, but at a high annotation cost. Current models use single-point annotations to reduce costs, yet they lack sufficient localization information for downstream applications. To overcome this limitation, we introduce Point2Polygon, which can efficiently transform single-points into c…

    Submitted 29 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: The paper lacks innovation and has insufficient rigor in experiments

  42. arXiv:2312.08880  [pdf, other]

    cs.CV

    GenDet: Towards Good Generalizations for AI-Generated Image Detection

    Authors: Mingjian Zhu, Hanting Chen, Mouxiao Huang, Wei Li, Hailin Hu, Jie Hu, Yunhe Wang

    Abstract: The misuse of AI imagery can have harmful societal effects, prompting the creation of detectors to combat issues like the spread of fake news. Existing methods can effectively detect images generated by seen generators, but it is challenging to detect those generated by unseen generators. They do not concentrate on amplifying the output discrepancy when detectors process real versus fake images. T…

    Submitted 12 December, 2023; originally announced December 2023.

  43. arXiv:2312.07937  [pdf, other]

    cs.CV

    BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics

    Authors: Wenqian Zhang, Molin Huang, Yuxuan Zhou, Juze Zhang, Jingyi Yu, Jingya Wang, Lan Xu

    Abstract: The recently emerging text-to-motion advances have inspired numerous attempts for convenient and interactive human motion generation. Yet, existing methods are largely limited to generating body motions only without considering the rich two-hand motions, let alone handling various conditions like body dynamics or texts. To break the data bottleneck, we propose BOTH57M, a novel multi-modal dataset fo…

    Submitted 10 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024

  44. arXiv:2312.02720  [pdf, other]

    cs.LG cs.AI

    Towards the Inferrence of Structural Similarity of Combinatorial Landscapes

    Authors: Mingyu Huang, Ke Li

    Abstract: One of the most common problem-solving heuristics is by analogy. For a given problem, a solver can be viewed as a strategic walk on its fitness landscape. Thus if a solver works for one problem instance, we expect it will also be effective for other instances whose fitness landscapes essentially share structural similarities with each other. However, due to the black-box nature of combinatorial op…

    Submitted 5 December, 2023; originally announced December 2023.

  45. arXiv:2312.02161  [pdf, other]

    cs.IT cs.NE

    Efficient LDPC Decoding using Physical Computation

    Authors: Uday Kumar Reddy Vengalam, Andrew Hahn, Yongchao Liu, Anshujit Sharma, Hui Wu, Michael Huang

    Abstract: Due to 5G deployment, there is significant interest in LDPC decoding. While much research is devoted to efficient hardwiring of algorithms based on Belief Propagation (BP), it has been shown that LDPC decoding can be formulated as a combinatorial optimization problem, which could benefit from significant acceleration of physical computation mechanisms such as Ising machines. This approach has so f…

    Submitted 20 September, 2023; originally announced December 2023.

  46. arXiv:2311.18743  [pdf, other]

    cs.CL cs.AI cs.LG

    AlignBench: Benchmarking Chinese Alignment of Large Language Models

    Authors: Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

    Abstract: Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, effective evaluation of alignment for emerging Chinese LLMs is still significantly lacking, calling for real-scenario grounded, open-ended, challenging and automatic evaluations tailored for alignment. To fill in this gap, we introduce AlignBench, a comprehensive multi-dim…

    Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

  47. arXiv:2311.18702  [pdf, other]

    cs.CL cs.AI

    CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation

    Authors: Pei Ke, Bosi Wen, Zhuoer Feng, Xiao Liu, Xuanyu Lei, Jiale Cheng, Shengyuan Wang, Aohan Zeng, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang

    Abstract: Since the natural language processing (NLP) community started to make large language models (LLMs), such as GPT-4, act as a critic to evaluate the quality of generated texts, most of them only train a critique generation model of a specific scale on specific datasets. We argue that a comprehensive investigation on the key factor of LLM-based evaluation models, such as scaling properties, is lackin…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 18 pages, 5 figures

  48. arXiv:2311.17391  [pdf, other]

    cs.CL

    Unveiling the Implicit Toxicity in Large Language Models

    Authors: Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang

    Abstract: The open-endedness of large language models (LLMs) combined with their impressive capabilities may lead to new safety issues when being exploited for malicious use. While recent studies primarily focus on probing toxic outputs that can be easily detected with existing toxicity classifiers, we show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Main Conference

  49. arXiv:2311.16832  [pdf, other]

    cs.CL cs.AI

    CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

    Authors: Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters. Our CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization for satisfying people's inherent social desires and emotional needs. On top of CharacterGLM, we can custom…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Work in progress

  50. arXiv:2311.14014  [pdf, other]

    cs.LG

    On the Hyperparameter Landscapes of Machine Learning Algorithms

    Authors: Mingyu Huang, Ke Li

    Abstract: Despite the recent success in a plethora of hyperparameter optimization (HPO) methods for machine learning (ML) models, the intricate interplay between model hyperparameters (HPs) and predictive losses (a.k.a. fitness), which is a key prerequisite for understanding HPO, remains notably underexplored in our community. This results in limited explainability in the HPO process, rendering a lack of huma…

    Submitted 23 November, 2023; originally announced November 2023.