
Showing 1–50 of 520 results for author: Xiong, C

Searching in archive cs.
  1. arXiv:2506.23139  [pdf, ps, other]

    cs.CL cs.AI

    Benchmarking Deep Search over Heterogeneous Enterprise Data

    Authors: Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Kung-Hsiang Huang, Caiming Xiong, Chien-Sheng Wu

    Abstract: We present a new benchmark for evaluating Deep Search, a realistic and complex form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. These include documents, meeting transcripts, Slack messages, GitHub, and URLs, which vary in structure and often contain human-to-human interactions. We build it using a synthetic dat…

    Submitted 29 June, 2025; originally announced June 2025.

  2. arXiv:2506.15651  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

    Authors: Tevin Wang, Chenyan Xiong

    Abstract: Rule-based rewards offer a promising strategy for improving reinforcement learning from human feedback (RLHF), but current approaches often rely on manual rule engineering. We present AutoRule, a fully automated method for extracting rules from preference feedback and formulating them into rule-based rewards. AutoRule extraction operates in three stages: it leverages a reasoning model to interpret…

    Submitted 18 June, 2025; originally announced June 2025.

  3. arXiv:2506.07530  [pdf, ps, other]

    cs.RO cs.CV

    BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

    Authors: Hongyu Wang, Chuyan Xiong, Ruiping Wang, Xilin Chen

    Abstract: Vision-Language-Action (VLA) models have shown impressive capabilities across a wide range of robotics manipulation tasks. However, their growing model size poses significant challenges for deployment on resource-constrained robotic systems. While 1-bit pretraining has proven effective for enhancing the inference efficiency of large language models with minimal performance loss, its application to…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Work in progress

  4. arXiv:2506.06057  [pdf, ps, other]

    cs.CL cs.AI

    Hey, That's My Data! Label-Only Dataset Inference in Large Language Models

    Authors: Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado

    Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing by excelling at interpreting, reasoning about, and generating human language. However, their reliance on large-scale, often proprietary datasets poses a critical challenge: unauthorized usage of such data can lead to copyright infringement and significant financial harm. Existing dataset-inference methods typically depend…

    Submitted 6 June, 2025; originally announced June 2025.

  5. arXiv:2506.04723  [pdf, ps, other]

    cs.AI

    Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

    Authors: Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala

    Abstract: Reinforcement learning (RL) has become the dominant paradigm for endowing language models with advanced reasoning capabilities. Despite the substantial empirical gains demonstrated by RL-based training methods like GRPO, a granular understanding of their advantages is still lacking. To address this gap, we introduce a fine-grained analytic framework to dissect the impact of RL on reasoning. Our fr…

    Submitted 5 June, 2025; originally announced June 2025.

  6. arXiv:2506.02929  [pdf, ps, other]

    cs.AR

    Large Processor Chip Model

    Authors: Kaiyan Chang, Mingzhi Chen, Yunji Chen, Zhirong Chen, Dongrui Fan, Junfeng Gong, Nan Guo, Yinhe Han, Qinfen Hao, Shuo Hou, Xuan Huang, Pengwei Jin, Changxin Ke, Cangyuan Li, Guangli Li, Huawei Li, Kuan Li, Naipeng Li, Shengwen Liang, Cheng Liu, Hongwei Liu, Jiahua Liu, Junliang Lv, Jianan Mu, Jin Qin, et al. (18 additional authors not shown)

    Abstract: Computer System Architecture serves as a crucial bridge between software applications and the underlying hardware, encompassing components like compilers, CPUs, coprocessors, and RTL designs. Its development, from early mainframes to modern domain-specific architectures, has been driven by rising computational demands and advancements in semiconductor technology. However, traditional paradigms in…

    Submitted 3 June, 2025; originally announced June 2025.

  7. arXiv:2506.02298  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback

    Authors: Thai Hoang, Kung-Hsiang Huang, Shirley Kokane, Jianguo Zhang, Zuxin Liu, Ming Zhu, Jake Grigsby, Tian Lan, Michael S Ryoo, Chien-Sheng Wu, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

    Abstract: Large Action Models (LAMs) for AI Agents offer incredible potential but face challenges due to the need for high-quality training data, especially for multi-step tasks that involve planning, executing tool calls, and responding to feedback. To address these issues, we present LAM SIMULATOR, a comprehensive framework designed for online exploration of agentic tasks with high-quality feedback. Our…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: LAM Simulator framework for agentic data generation

  8. arXiv:2506.01704  [pdf, other]

    cs.AI cs.CL

    Generate, Not Recommend: Personalized Multimodal Content Generation

    Authors: Jiongnan Liu, Zhicheng Dou, Ning Hu, Chenyan Xiong

    Abstract: To address the challenge of information overload from massive web content, recommender systems are widely applied to retrieve and present personalized results for users. However, recommendation tasks are inherently constrained to filtering existing items and lack the ability to generate novel concepts, limiting their capacity to fully satisfy user demands and preferences. In this paper, we propos…

    Submitted 2 June, 2025; originally announced June 2025.

  9. arXiv:2506.01689  [pdf, ps, other]

    cs.AI cs.CL

    Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents

    Authors: Shuting Wang, Yunqi Liu, Zixin Yang, Ning Hu, Zhicheng Dou, Chenyan Xiong

    Abstract: Querying generative AI models, e.g., large language models (LLMs), has become a prevalent method for information acquisition. However, existing query-answer datasets primarily focus on textual responses, making it challenging to address complex user queries that require visual demonstrations or explanations for better understanding. To bridge this gap, we construct a benchmark, RealVideoQuest, des…

    Submitted 2 June, 2025; originally announced June 2025.

  10. arXiv:2506.01275  [pdf, ps, other]

    cs.AI

    Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D

    Authors: Artemis Panagopoulou, Le Xue, Honglu Zhou, Silvio Savarese, Ran Xu, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles

    Abstract: Real-world decision-making often begins with identifying which modality contains the most relevant information for a given query. While recent multimodal models have made impressive progress in processing diverse inputs, it remains unclear whether they can reason contrastively across multiple modalities to select the one that best satisfies a natural language prompt. We argue this capability is fo…

    Submitted 1 June, 2025; originally announced June 2025.

  11. arXiv:2506.00781  [pdf, ps, other]

    cs.AI

    CoP: Agentic Red-teaming for Large Language Models using Composition of Principles

    Authors: Chen Xiong, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Recent advances in Large Language Models (LLMs) have spurred transformative applications in various domains, ranging from open-source to proprietary LLMs. However, jailbreak attacks, which aim to break safety alignment and user compliance by tricking the target LLMs into giving harmful and risky responses, are becoming an urgent concern. The practice of red-teaming for LLMs is to proactively ex…

    Submitted 31 May, 2025; originally announced June 2025.

  12. arXiv:2506.00209  [pdf, ps, other]

    cs.LG

    Intercept Cancer: Cancer Pre-Screening with Large Scale Healthcare Foundation Models

    Authors: Liwen Sun, Hao-Ren Yao, Gary Gao, Ophir Frieder, Chenyan Xiong

    Abstract: Cancer screening, leading to early detection, saves lives. Unfortunately, existing screening techniques require expensive and intrusive medical procedures that are not globally available, so too many lives that could have been saved are lost. We present CATCH-FM, CATch Cancer early with Healthcare Foundation Models, a cancer pre-screening methodology that identifies high-risk patients for further screening sole…

    Submitted 30 May, 2025; originally announced June 2025.

  13. arXiv:2505.24305  [pdf, ps, other]

    cs.RO cs.CV

    SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping

    Authors: Mingxu Zhang, Xiaoqi Li, Jiahui Xu, Kaichen Zhou, Hojin Bae, Yan Shen, Chuyan Xiong, Hao Dong

    Abstract: Recent advancements in 3D robotic manipulation have improved grasping of everyday objects, but transparent and specular materials remain challenging due to depth sensing limitations. While several 3D reconstruction and depth completion approaches address these challenges, they suffer from setup complexity or limited observation information utilization. To address this, leveraging the power of sing…

    Submitted 20 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  14. arXiv:2505.24217  [pdf, ps, other]

    cs.CL

    Semi-structured LLM Reasoners Can Be Rigorously Audited

    Authors: Jixuan Leng, Cassandra A. Cohen, Zhixian Zhang, Chenyan Xiong, William W. Cohen

    Abstract: As Large Language Models (LLMs) become increasingly capable at reasoning, the problem of "faithfulness" persists: LLM "reasoning traces" can contain errors and omissions that are difficult to detect, and may obscure biases in model outputs. To address these limitations, we introduce Semi-Structured Reasoning Models (SSRMs), which internalize a semi-structured Chain-of-Thought (CoT) reasoning forma…

    Submitted 30 May, 2025; originally announced May 2025.

  15. arXiv:2505.23852  [pdf, ps, other]

    cs.CL cs.AI cs.MA stat.AP

    Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

    Authors: Nic Dobbins, Christelle Xiong, Kristine Lan, Meliha Yetisgen

    Abstract: Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset. Materials and Methods: We used the "Quick Access" dataset of the National Alzheimer's Coordinating Center (NACC). We identified highly cited published research manuscripts using NACC data and selected five studies t…

    Submitted 28 May, 2025; originally announced May 2025.

  16. arXiv:2505.22130  [pdf, other]

    cs.IR

    ConsRec: Denoising Sequential Recommendation through User-Consistent Preference Modeling

    Authors: Haidong Xin, Qiushi Xiong, Zhenghao Liu, Sen Mei, Yukun Yan, Shi Yu, Shuo Wang, Yu Gu, Ge Yu, Chenyan Xiong

    Abstract: User-item interaction histories are pivotal for sequential recommendation systems but often include noise, such as unintended clicks or actions that fail to reflect genuine user preferences. To address this issue, we propose the User-Consistent Preference-based Sequential Recommendation System (ConsRec), designed to capture stable user preferences and filter noisy items from interaction histories.…

    Submitted 28 May, 2025; originally announced May 2025.

  17. arXiv:2505.20225  [pdf, ps, other]

    cs.CL cs.LG

    FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models

    Authors: Hao Kang, Zichun Yu, Chenyan Xiong

    Abstract: Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for investigating scaling, routing, and expert behavior. We release FLAME-MoE, a compl…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: All code, training logs, and model checkpoints are available at https://github.com/cmu-flame/FLAME-MoE

  18. arXiv:2505.19307  [pdf, ps, other]

    cs.IR

    Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization

    Authors: João Coelho, Bruno Martins, João Magalhães, Chenyan Xiong

    Abstract: Neural retrieval models excel in Web search, but their training requires substantial amounts of labeled query-document pairs, which are costly to obtain. With the widespread availability of Web document collections like ClueWeb22, synthetic queries generated by large language models offer a scalable alternative. Still, synthetic training queries often vary in quality, which leads to suboptimal dow…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted at SIGIR 2025

  19. arXiv:2505.19253  [pdf, ps, other]

    cs.IR

    DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

    Authors: João Coelho, Jingjie Ning, Jingyuan He, Kangrui Mao, Abhijay Paladugu, Pranav Setlur, Jiahe Jin, Jamie Callan, João Magalhães, Bruno Martins, Chenyan Xiong

    Abstract: Deep research systems represent an emerging class of agentic information retrieval methods that generate comprehensive and well-supported reports to complex queries. However, most existing frameworks rely on dynamic commercial search APIs, which pose reproducibility and transparency challenges in addition to their cost. To address these limitations, we introduce DeepResearchGym, an open-source san…

    Submitted 30 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

  20. arXiv:2505.18878  [pdf, other]

    cs.CL cs.AI

    CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions

    Authors: Kung-Hsiang Huang, Akshara Prabhakar, Onkar Thorat, Divyansh Agarwal, Prafulla Kumar Choubey, Yixin Mao, Silvio Savarese, Caiming Xiong, Chien-Sheng Wu

    Abstract: While AI agents hold transformative potential in business, effective performance benchmarking is hindered by the scarcity of public, realistic business data on widely used platforms. Existing benchmarks often lack fidelity in their environments, data, and agent-user interactions, with limited coverage of diverse business scenarios and industries. To address these gaps, we introduce CRMArena-Pro, a…

    Submitted 24 May, 2025; originally announced May 2025.

  21. arXiv:2505.15504  [pdf, ps, other]

    cs.CV cs.AI

    Beyond Linearity: Squeeze-and-Recalibrate Blocks for Few-Shot Whole Slide Image Classification

    Authors: Conghao Xiong, Zhengrui Guo, Zhe Xu, Yifei Zhang, Raymond Kai-Yu Tong, Si Yong Yeo, Hao Chen, Joseph J. Y. Sung, Irwin King

    Abstract: Deep learning has advanced computational pathology but expert annotations remain scarce. Few-shot learning mitigates annotation burdens yet suffers from overfitting and discriminative feature mischaracterization. In addition, the current few-shot multiple instance learning (MIL) approaches leverage pretrained vision-language models to alleviate these issues, but at the cost of complex preprocessin…

    Submitted 21 May, 2025; originally announced May 2025.

  22. arXiv:2505.14996  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision

    Authors: Zixuan Ke, Austin Xu, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty

    Abstract: Multi-agent systems (MAS) leveraging the impressive capabilities of Large Language Models (LLMs) hold significant potential for tackling complex tasks. However, most current MAS depend on manually designed agent roles and communication protocols. These manual designs often fail to align with the underlying LLMs' strengths and struggle to adapt to novel tasks. Recent automatic MAS approaches attemp…

    Submitted 25 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  23. arXiv:2505.13346  [pdf, ps, other]

    cs.CL cs.AI

    J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization

    Authors: Austin Xu, Yilun Zhou, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty

    Abstract: To keep up with the accelerating pace of large language model (LLM) development, model output evaluation has transitioned away from time-consuming human evaluation to automatic evaluation, where LLMs themselves are tasked with assessing and critiquing other model outputs. LLM-as-judge models are a class of generative evaluators that excel in evaluating relatively simple domains, like chat quality…

    Submitted 18 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 25 pages, 4 figures, 6 tables. Updated with code and benchmark

  24. arXiv:2505.13227  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.HC

    Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

    Authors: Tianbao Xie, Jiaqi Deng, Xiaochuan Li, Junlin Yang, Haoyuan Wu, Jixuan Chen, Wenjing Hu, Xinyuan Wang, Yuhui Xu, Zekun Wang, Yiheng Xu, Junli Wang, Doyen Sahoo, Tao Yu, Caiming Xiong

    Abstract: Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user interfaces, remains a critical bottleneck in computer use agent development. Current benchmarks oversimplify grounding tasks as short referring expressions, failing to capture the complexity of real-world interactions that require software commonsense, layout understandi…

    Submitted 17 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: 49 pages, 13 figures

  25. arXiv:2505.12992  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    Fractured Chain-of-Thought Reasoning

    Authors: Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong

    Abstract: Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining. Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs th…

    Submitted 18 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  26. arXiv:2505.10554  [pdf, ps, other]

    cs.CL

    Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

    Authors: Zhiyuan Hu, Yibo Wang, Hanze Dong, Yuhui Xu, Amrita Saha, Caiming Xiong, Bryan Hooi, Junnan Li

    Abstract: Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. Prior work has shown that outcome-based reinforcement learning (RL) can incidentally elicit advanced reasoning behaviors such as self-correction, backtracking, and verification, phenomena often referred to as the model's "aha moment". However, the timing and consistency of these emergent behaviors r…

    Submitted 27 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

    Comments: In Progress

  27. arXiv:2505.09568  [pdf, ps, other]

    cs.CV cs.AI

    BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

    Authors: Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, Le Xue, Caiming Xiong, Ran Xu

    Abstract: Unifying image understanding and generation has gained growing attention in recent research on multimodal models. Although design choices for image understanding have been extensively studied, the optimal model architecture and training recipe for a unified framework with image generation remain underexplored. Motivated by the strong potential of autoregressive and diffusion models for high-qualit…

    Submitted 14 May, 2025; originally announced May 2025.

  28. arXiv:2505.07849  [pdf, ps, other]

    cs.SE cs.AI cs.IR

    SweRank: Software Issue Localization with Code Ranking

    Authors: Revanth Gangi Reddy, Tarun Suresh, JaeHyeok Doo, Ye Liu, Xuan Phi Nguyen, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Heng Ji, Shafiq Joty

    Abstract: Software issue localization, the task of identifying the precise code locations (files, classes, or functions) relevant to a natural language issue description (e.g., bug report, feature request), is a critical yet time-consuming aspect of software development. While recent LLM-based agentic approaches demonstrate promise, they often incur significant latency and cost due to complex multi-step rea…

    Submitted 7 May, 2025; originally announced May 2025.

  29. arXiv:2505.06496  [pdf, ps, other]

    cs.CL cs.AI

    xGen-small Technical Report

    Authors: Erik Nijkamp, Bo Pang, Egor Pakhomov, Akash Gokul, Jin Qu, Silvio Savarese, Yingbo Zhou, Caiming Xiong

    Abstract: We introduce xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context applications. Our vertically integrated pipeline unites domain-balanced, frequency-aware data curation; multi-stage pre-training with quality annealing and length extension to 128k tokens; and targeted post-training via supervised fine-tuning, preference learning, and online reinforcement learning.…

    Submitted 9 May, 2025; originally announced May 2025.

  30. arXiv:2505.05315  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Scalable Chain of Thoughts via Elastic Reasoning

    Authors: Yuhui Xu, Hanze Dong, Lei Wang, Doyen Sahoo, Junnan Li, Caiming Xiong

    Abstract: Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT). However, their uncontrolled output lengths pose significant challenges for real-world deployment, where inference-time budgets on tokens, latency, or compute are strictly constrained. We propose Elastic Reasoning, a novel framework for scalable chain of thoughts that exp…

    Submitted 21 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  31. arXiv:2504.20131  [pdf, ps, other]

    cs.LG cs.AI cs.IT

    LZ Penalty: An information-theoretic repetition penalty for autoregressive language models

    Authors: Antonio A. Ginart, Naveen Kodali, Jason Lee, Caiming Xiong, Silvio Savarese, John R. Emmons

    Abstract: We introduce the LZ penalty, a penalty specialized for reducing degenerate repetitions in autoregressive language models without loss of capability. The penalty is based on the codelengths in the LZ77 universal lossless compression algorithm. Through the lens of the prediction-compression duality, decoding with the LZ penalty has the interpretation of sampling from the residual distribution after remov…

    Submitted 1 July, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

    Comments: Preprint (draft)

  32. arXiv:2504.19867  [pdf, other]

    cs.CL cs.DC cs.LG

    semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage

    Authors: Ke Hong, Lufang Chen, Zhong Wang, Xiuhong Li, Qiuli Mao, Jianping Ma, Chao Xiong, Guanyu Wu, Buhe Han, Guohao Dai, Yun Liang, Yu Wang

    Abstract: Existing large language model (LLM) serving systems fall into two categories: 1) a unified system where prefill phase and decode phase are co-located on the same GPU, sharing the unified computational resource and storage, and 2) a disaggregated system where the two phases are disaggregated to different GPUs. The design of the disaggregated system addresses the latency interference and sophisticat…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 18 pages, 16 figures

  33. arXiv:2504.17040  [pdf, other]

    cs.CV cs.AI

    DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs

    Authors: Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong, Silvio Savarese, Heng Ji, Ran Xu

    Abstract: We present DyMU, an efficient, training-free framework that dynamically reduces the computational burden of vision-language models (VLMs) while maintaining high task performance. Our approach comprises two key components. First, Dynamic Token Merging (DToMe) reduces the number of visual token embeddings by merging similar tokens based on image complexity, addressing the inherent inefficiency of fi…

    Submitted 10 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  34. arXiv:2504.15253  [pdf, other]

    cs.CL cs.LG

    Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

    Authors: Yilun Zhou, Austin Xu, Peifeng Wang, Caiming Xiong, Shafiq Joty

    Abstract: Scaling test-time computation, or affording a generator large language model (LLM) extra compute during inference, typically employs the help of external non-generative evaluators (i.e., reward models). Concurrently, LLM-judges, models trained to generate evaluations and critiques (explanations) in natural language, are becoming increasingly popular in automatic evaluation. Despite judge empirical…

    Submitted 21 May, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: ICML 2025. The first two authors contributed equally. The codebase is at https://github.com/SalesforceAIResearch/jetts-benchmark

  35. arXiv:2504.11343  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

    Authors: Wei Xiong, Jiarui Yao, Yuhui Xu, Bo Pang, Lei Wang, Doyen Sahoo, Junnan Li, Nan Jiang, Tong Zhang, Caiming Xiong, Hanze Dong

    Abstract: Reinforcement learning (RL) has become a prevailing approach for fine-tuning large language models (LLMs) on complex reasoning tasks. Among recent methods, GRPO stands out for its empirical success in training models such as DeepSeek-R1, yet the sources of its effectiveness remain poorly understood. In this work, we revisit GRPO from a reinforce-like algorithm perspective and analyze its core comp…

    Submitted 12 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  36. arXiv:2504.09037  [pdf, other]

    cs.AI cs.CL

    A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

    Authors: Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, Caiming Xiong, Shafiq Joty

    Abstract: Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from conventional models that empower chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, whi…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 72 pages, 6 figures

  37. arXiv:2504.04045  [pdf, other]

    cs.CV cs.AI cs.LG

    A Survey of Pathology Foundation Model: Progress and Future Directions

    Authors: Conghao Xiong, Hao Chen, Joseph J. Y. Sung

    Abstract: Computational pathology, which involves analyzing whole slide images for automated cancer diagnosis, relies on multiple instance learning, where performance depends heavily on the feature extractor and aggregator. Recent Pathology Foundation Models (PFMs), pretrained on large-scale histopathology data, have significantly enhanced both the extractor and aggregator, but they lack a systematic analys…

    Submitted 21 May, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: Accepted to IJCAI 2025 Survey Track, 10 Pages

  38. arXiv:2504.03794  [pdf, other]

    cs.CL cs.AI

    Entropy-Based Block Pruning for Efficient Large Language Models

    Authors: Liangwei Yang, Yuhui Xu, Juntao Tan, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke

    Abstract: As large language models continue to scale, their growing computational and storage demands pose significant challenges for real-world deployment. In this work, we investigate redundancy within Transformer-based models and propose an entropy-based pruning strategy to enhance efficiency while maintaining performance. Empirical analysis reveals that the entropy of hidden representations decreases in…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 9 pages, 8 figures

  39. arXiv:2504.03601  [pdf, other]

    cs.CL cs.AI cs.LG

    APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

    Authors: Akshara Prabhakar, Zuxin Liu, Ming Zhu, Jianguo Zhang, Tulika Awalgaonkar, Shiyu Wang, Zhiwei Liu, Haolin Chen, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Weiran Yao, Huan Wang, Silvio Savarese, Caiming Xiong

    Abstract: Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, le…

    Submitted 5 May, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: 12 pages plus references and appendices

  40. Let AI Read First: Enhancing Reading Abilities for Individuals with Dyslexia through Artificial Intelligence

    Authors: Sihang Zhao, Shoucong Carol Xiong, Bo Pang, Xiaoying Tang, Pinjia He

    Abstract: Dyslexia, a neurological condition affecting approximately 12% of the global population, presents significant challenges to reading ability and quality of life. Existing assistive technologies are limited by factors such as unsuitability for quiet environments, high costs, and the risk of distorting meaning or failing to provide real-time support. To address these issues, we introduce LARF (Let AI…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 6 pages, 3 figures; CHI 2025 (Late Breaking Work)

  41. arXiv:2503.22673  [pdf, other]

    cs.AI cs.CL

    ActionStudio: A Lightweight Framework for Data and Training of Large Action Models

    Authors: Jianguo Zhang, Thai Hoang, Ming Zhu, Zuxin Liu, Shiyu Wang, Tulika Awalgaonkar, Akshara Prabhakar, Haolin Chen, Weiran Yao, Zhiwei Liu, Juntao Tan, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

    Abstract: Action models are essential for enabling autonomous agents to perform complex tasks. However, training large action models remains challenging due to the diversity of agent environments and the complexity of agentic data. Despite growing interest, existing infrastructure provides limited support for scalable, agent-specific fine-tuning. We present ActionStudio, a lightweight and extensible data an…

    Submitted 31 March, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

    Comments: 15 pages; large action models; xLAM

  42. arXiv:2503.11411  [pdf, other]

    cs.LG

    Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models

    Authors: Xu Liu, Taha Aksu, Juncheng Liu, Qingsong Wen, Yuxuan Liang, Caiming Xiong, Silvio Savarese, Doyen Sahoo, Junnan Li, Chenghao Liu

    Abstract: Time series analysis is crucial for understanding dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs), enabling generalized learning and integrating contextual information. However, their success depends on large, diverse, and high-quality datasets, which are cha…

    Submitted 14 March, 2025; originally announced March 2025.

  43. arXiv:2503.09146  [pdf, other]

    cs.CV cs.MM

    Generative Frame Sampler for Long Video Understanding

    Authors: Linli Yao, Haoning Wu, Kun Ouyang, Yuanxing Zhang, Caiming Xiong, Bei Chen, Xu Sun, Junnan Li

    Abstract: Despite recent advances in Video Large Language Models (VideoLLMs), effectively understanding long-form videos remains a significant challenge. Perceiving lengthy videos containing thousands of frames poses substantial computational burden. To mitigate this issue, this paper introduces Generative Frame Sampler (GenS), a plug-and-play module integrated with VideoLLMs to facilitate efficient lengthy…

    Submitted 12 March, 2025; originally announced March 2025.

  44. arXiv:2503.06844  [pdf, other]

    cs.RO

    A2I-Calib: An Anti-noise Active Multi-IMU Spatial-temporal Calibration Framework for Legged Robots

    Authors: Chaoran Xiong, Fangyu Jiang, Kehui Ma, Zhen Sun, Zeyu Zhang, Ling Pei

    Abstract: Recently, multi-node inertial measurement unit (IMU)-based odometry for legged robots has gained attention due to its cost-effectiveness, power efficiency, and high accuracy. However, the spatial and temporal misalignment between foot-end motion derived from forward kinematics and foot IMU measurements can introduce inconsistent constraints, resulting in odometry drift. Therefore, accurate spatial…

    Submitted 9 March, 2025; originally announced March 2025.

  45. arXiv:2503.06550  [pdf, other]

    cs.CL

    BingoGuard: LLM Content Moderation Tools with Risk Levels

    Authors: Fan Yin, Philippe Laban, Xiangyu Peng, Yilun Zhou, Yixin Mao, Vaibhav Vats, Linnea Ross, Divyansh Agarwal, Caiming Xiong, Chien-Sheng Wu

    Abstract: Malicious content generated by large language models (LLMs) can pose varying degrees of harm. Although existing LLM-based moderators can detect harmful content, they struggle to assess risk levels and may miss lower-risk outputs. Accurate risk assessment allows platforms with different safety thresholds to tailor content filtering and rejection. In this paper, we introduce per-topic severity rubri…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 10 pages, 4 figures, 4 tables. ICLR 2025 poster

  46. arXiv:2503.06072  [pdf, other]

    cs.CL cs.AI

    Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning

    Authors: Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, et al. (1 additional author not shown)

    Abstract: The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific per…

    Submitted 20 May, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

    Comments: 87 pages, 21 figures, 9 tables

  47. arXiv:2503.05112  [pdf, other]

    cs.RO

    THE-SEAN: A Heart Rate Variation-Inspired Temporally High-Order Event-Based Visual Odometry with Self-Supervised Spiking Event Accumulation Networks

    Authors: Chaoran Xiong, Litao Wei, Kehui Ma, Zhen Sun, Yan Xiang, Zihan Nan, Trieu-Kien Truong, Ling Pei

    Abstract: Event-based visual odometry has recently gained attention for its high accuracy and real-time performance in fast-motion systems. Unlike traditional synchronous estimators that rely on constant-frequency (zero-order) triggers, event-based visual odometry can actively accumulate information to generate temporally high-order estimation triggers. However, existing methods primarily focus on adaptive…

    Submitted 6 March, 2025; originally announced March 2025.

  48. arXiv:2503.03108  [pdf, other]

    cs.CR cs.AI

    SoK: Knowledge is All You Need: Accelerating Last Mile Delivery for Automated Provenance-based Intrusion Detection with LLMs

    Authors: Wenrui Cheng, Tiantian Zhu, Chunlin Xiong, Haofei Sun, Zijun Wang, Shunan Jing, Mingqi Lv, Yan Chen

    Abstract: Recently, provenance-based intrusion detection systems (PIDSes) have been widely proposed for endpoint threat analysis. However, due to the lack of systematic integration and utilization of knowledge, existing PIDSes still require significant manual intervention for practical deployment, making full automation challenging. This paper presents a disruptive innovation by categorizing PIDSes accordin…

    Submitted 28 April, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  49. arXiv:2502.20616  [pdf, other]

    cs.AI

    PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data

    Authors: Juntao Tan, Liangwei Yang, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Tulika Manoj Awalgaonkar, Jianguo Zhang, Weiran Yao, Ming Zhu, Shirley Kokane, Silvio Savarese, Huan Wang, Caiming Xiong, Shelby Heinecke

    Abstract: Personalization is critical in AI assistants, particularly in the context of private AI models that work with individual users. A key scenario in this domain involves enabling AI models to access and interpret a user's private data (e.g., conversation history, user-AI interactions, app usage) to understand personal details such as biographical information, preferences, and social connections. Howe…

    Submitted 27 February, 2025; originally announced February 2025.

  50. arXiv:2502.17321  [pdf, other]

    cs.CL

    Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents

    Authors: Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Caiming Xiong, Shiva Kumar Pentyala, Chien-Sheng Wu

    Abstract: Automated service agents require well-structured workflows to provide consistent and accurate responses to customer queries. However, these workflows are often undocumented, and their automatic extraction from conversations remains unexplored. In this work, we present a novel framework for extracting and evaluating dialog workflows from historical interactions. Our extraction process consists of t…

    Submitted 24 February, 2025; originally announced February 2025.