
Showing 1–50 of 973 results for author: Cheng, X

Searching in archive cs.
  1. arXiv:2507.06250  [pdf, ps, other]

    cs.CR cs.AI cs.SE

    We Urgently Need Privilege Management in MCP: A Measurement of API Usage in MCP Ecosystems

    Authors: Zhihao Li, Kun Li, Boyang Ma, Minghui Xu, Yue Zhang, Xiuzhen Cheng

    Abstract: The Model Context Protocol (MCP) has emerged as a widely adopted mechanism for connecting large language models to external tools and resources. While MCP promises seamless extensibility and rich integrations, it also introduces a substantially expanded attack surface: any plugin can inherit broad system privileges with minimal isolation or oversight. In this work, we conduct the first large-scale…

    Submitted 4 July, 2025; originally announced July 2025.

  2. arXiv:2507.04931  [pdf, ps, other]

    cs.CR

    LIFT: Automating Symbolic Execution Optimization with Large Language Models for AI Networks

    Authors: Ruoxi Wang, Kun Li, Minghui Xu, Yue Zhang, Kaidi Xu, Chunchi Liu, Yinhao Xiao, Xiuzhen Cheng

    Abstract: Dynamic Symbolic Execution (DSE) is a key technique in program analysis, widely used in software testing, vulnerability discovery, and formal verification. In distributed AI systems, DSE plays a crucial role in identifying hard-to-detect bugs, especially those arising from complex network communication patterns. However, traditional approaches to symbolic execution are often hindered by scalabilit…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM SIGCOMM 2025 - 2nd Workshop on Networks for AI Computing (NAIC). 7 pages, 2 figures, 2 tables

  3. arXiv:2507.03253  [pdf, ps, other]

    cs.CL cs.AI

    RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

    Authors: Baolong Bi, Shenghua Liu, Xingzhang Ren, Dayiheng Liu, Junyang Lin, Yiwei Wang, Lingrui Mei, Junfeng Fang, Jiafeng Guo, Xueqi Cheng

    Abstract: The foundational capabilities of large language models (LLMs) are deeply influenced by the quality of their pre-training corpora. However, enhancing data quality at scale remains a significant challenge, primarily due to the trade-off between refinement effectiveness and processing efficiency. While rule-based filtering remains the dominant paradigm, it typically operates at the document level and…

    Submitted 8 July, 2025; v1 submitted 3 July, 2025; originally announced July 2025.

  4. arXiv:2507.02982  [pdf]

    cs.CL

    We Need Knowledge Distillation for Solving Math Word Problems

    Authors: Zhenquan Shen, Xinguo Yu, Xiaotian Cheng, Rao Peng, Hao Ming

    Abstract: The enhancement of mathematical capabilities in large language models (LLMs) fosters new developments in mathematics education within primary and secondary schools, particularly as they relate to intelligent tutoring systems. However, LLMs require substantial computational resources, resulting in significant costs in educational contexts. To mitigate this drawback, this paper investigates the feas…

    Submitted 1 July, 2025; originally announced July 2025.

  5. arXiv:2507.00356  [pdf]

    cs.CV cs.AI

    CGEarthEye: A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation

    Authors: Zhiwei Yi, Xin Cheng, Jingyu Ma, Ruifei Zhu, Junwei Tian, Yuanxiu Zhou, Xinge Zhao, Hongzhe Li

    Abstract: Deep learning methods have significantly advanced the development of intelligent interpretation in remote sensing (RS), with foundational model research based on large-scale pre-training paradigms rapidly reshaping various domains of Earth Observation (EO). However, compared to the open accessibility and high spatiotemporal coverage of medium-resolution data, the limited acquisition channels for…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: A Remote Sensing Foundation Model for Very High Resolution Images

  6. arXiv:2506.23481  [pdf, ps, other]

    cs.CV eess.IV

    Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks

    Authors: Xian Zhang, Xiang Cheng

    Abstract: Objectives: The rapid advancement of Multimodal Large Language Models (MLLMs) has significantly enhanced their reasoning capabilities, enabling a wide range of intelligent applications. However, these advancements also raise critical concerns regarding privacy and ethics. MLLMs are now capable of inferring the geographic location of images -- such as those shared on social media or captured from s…

    Submitted 29 June, 2025; originally announced June 2025.

  7. arXiv:2506.22303  [pdf, ps, other]

    cs.IR

    Education-Oriented Graph Retrieval-Augmented Generation for Learning Path Recommendation

    Authors: Xinghe Cheng, Zihan Zhang, Jiapu Wang, Liangda Fang, Chaobo He, Quanlong Guan, Shirui Pan, Weiqi Luo

    Abstract: Learning path recommendation seeks to provide learners with a structured sequence of learning items (e.g., knowledge concepts or exercises) to optimize their learning efficiency. Despite significant efforts in this area, most existing methods primarily rely on prerequisite relationships, which present two major limitations: 1) Many educational datasets do not explicitly provide prerequisite relati…

    Submitted 27 June, 2025; originally announced June 2025.

  8. arXiv:2506.18394  [pdf, ps, other]

    cs.SE

    Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval

    Authors: Xiao Cheng, Zhihao Guo, Huan Huo, Yulei Sui

    Abstract: Memory-related errors in C programming continue to pose significant challenges in software development, primarily due to the complexities of manual memory management inherent in the language. These errors frequently serve as vectors for severe vulnerabilities, while their repair requires extensive knowledge of program logic and C's memory model. Automated Program Repair (APR) has emerged as a crit…

    Submitted 23 June, 2025; originally announced June 2025.

  9. arXiv:2506.17311  [pdf, ps, other]

    cs.CY

    Can Large Language Models Be Trusted Paper Reviewers? A Feasibility Study

    Authors: Chuanlei Li, Xu Hu, Minghui Xu, Kun Li, Yue Zhang, Xiuzhen Cheng

    Abstract: Academic paper review typically requires substantial time, expertise, and human resources. Large Language Models (LLMs) present a promising method for automating the review process due to their extensive training data, broad knowledge base, and relatively low usage cost. This work explores the feasibility of using LLMs for academic paper review by proposing an automated review system. The system i…

    Submitted 18 June, 2025; originally announced June 2025.

  10. arXiv:2506.14770  [pdf, ps, other]

    cs.RO

    GMT: General Motion Tracking for Humanoid Whole-Body Control

    Authors: Zixuan Chen, Mazeyu Ji, Xuxin Cheng, Xuanbin Peng, Xue Bin Peng, Xiaolong Wang

    Abstract: The ability to track general whole-body motions in the real world is a useful way to build general-purpose humanoid robots. However, achieving this can be challenging due to the temporal and kinematic diversity of the motions, the policy's capability, and the difficulty of coordination of the upper and lower bodies. To address these issues, we propose GMT, a general and scalable motion-tracking fr…

    Submitted 17 June, 2025; originally announced June 2025.

  11. arXiv:2506.14641  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot

    Authors: Xiang Cheng, Chengyan Pan, Minjun Zhao, Deyang Li, Fangchao Liu, Xinyu Zhang, Xiao Zhang, Yong Liu

    Abstract: In-Context Learning (ICL) is an essential emergent ability of Large Language Models (LLMs), and recent studies introduce Chain-of-Thought (CoT) to exemplars of ICL to enhance the reasoning capability, especially in mathematics tasks. However, given the continuous advancement of model capabilities, it remains unclear whether CoT exemplars still benefit recent, stronger models in such tasks. Through…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 19 pages, 22 figures

  12. arXiv:2506.13795  [pdf, ps, other]

    cs.AI cs.SI

    BotTrans: A Multi-Source Graph Domain Adaptation Approach for Social Bot Detection

    Authors: Boshen Shi, Yongqing Wang, Fangda Guo, Jiangli Shao, Huawei Shen, Xueqi Cheng

    Abstract: Transferring extensive knowledge from relevant social networks has emerged as a promising solution to overcome label scarcity in detecting social bots and other anomalies with GNN-based models. However, effective transfer faces two critical challenges. Firstly, the network heterophily problem, which is caused by bots hiding malicious behaviors via indiscriminately interacting with human users, hin…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted to ECML-PKDD 2025 Research Track as oral; Code & data: https://github.com/Skyorca/BotTrans

  13. arXiv:2506.12831  [pdf, ps, other]

    eess.SP cs.AI

    Synesthesia of Machines (SoM)-Enhanced Sub-THz ISAC Transmission for Air-Ground Network

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng, Liuqing Yang

    Abstract: Integrated sensing and communication (ISAC) within sub-THz frequencies is crucial for future air-ground networks, but unique propagation characteristics and hardware limitations present challenges in optimizing ISAC performance while increasing operational latency. This paper introduces a multi-modal sensing fusion framework inspired by synesthesia of machine (SoM) to enhance sub-THz ISAC transmis…

    Submitted 15 June, 2025; originally announced June 2025.

  14. arXiv:2506.12527  [pdf, ps, other]

    cs.CL

    Detection, Classification, and Mitigation of Gender Bias in Large Language Models

    Authors: Xiaoqing Cheng, Hongying Zan, Lulu Kong, Jinwang Song, Min Peng

    Abstract: With their rapid development, large language models (LLMs) have significantly improved efficiency across a wide range of domains. However, recent studies have revealed that LLMs often exhibit gender bias, leading to serious social implications. Detecting, classifying, and mitigating gender bias in LLMs has therefore become a critical research focus. In the NLPCC 2025 Shared Task 7: Chinese C…

    Submitted 14 June, 2025; originally announced June 2025.

  15. arXiv:2506.12339  [pdf, ps, other]

    cs.HC cs.AI

    SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation

    Authors: Ruiyan Zhu, Xi Cheng, Ke Liu, Brian Zhu, Daniel Jin, Neeraj Parihar, Zhoutian Xu, Oliver Gao

    Abstract: We present SheetMind, a modular multi-agent framework powered by large language models (LLMs) for spreadsheet automation via natural language instructions. The system comprises three specialized agents: a Manager Agent that decomposes complex user instructions into subtasks; an Action Agent that translates these into structured commands using a Backus Naur Form (BNF) grammar; and a Reflection Agen…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: Ruiyan Zhu and Xi Cheng contributed equally to this work

  16. arXiv:2506.12264  [pdf]

    cs.ET cs.AR

    A Novel Thermal Network Model and Electro-Thermal Coupling Study for NSFETs and CFETs Considering Thermal Crosstalk

    Authors: Tianci Miao, Qihang Zheng, Yangyang Hu, Xiaoyu Cheng, Jie Liang, Liang Chen, Aiying Guo, Jingjing Liu, Kailin Ren, Jianhua Zhang

    Abstract: As the technology node continues to shrink, nanosheet field effect transistors (NSFETs) and complementary FETs (CFETs) become valid candidates for the 3nm and sub-nanometre nodes. However, due to the shrinking device size, self-heating and inter-device thermal crosstalk of NSFETs and CFETs become more severe. It is important to accurately calculate the self-heating and thermal crosstalk of devices…

    Submitted 9 March, 2025; originally announced June 2025.

  17. arXiv:2506.11063  [pdf, ps, other]

    cs.CL cs.AI

    Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation

    Authors: Jiayu Yao, Shenghua Liu, Yiwei Wang, Lingrui Mei, Baolong Bi, Yuyao Ge, Zhecheng Li, Xueqi Cheng

    Abstract: Multimodal Retrieval-Augmented Generation (RAG) systems have become essential in knowledge-intensive and open-domain tasks. As retrieval complexity increases, ensuring the robustness of these systems is critical. However, current RAG models are highly sensitive to the order in which evidence is presented, often resulting in unstable performance and biased reasoning, particularly as the number of r…

    Submitted 30 May, 2025; originally announced June 2025.

  18. arXiv:2506.10826  [pdf, ps, other]

    cs.RO

    RationalVLA: A Rational Vision-Language-Action Model with Dual System

    Authors: Wenxuan Song, Jiayi Chen, Wenxue Li, Xu He, Han Zhao, Can Cui, Pengxiang Ding, Shiyan Su, Feilong Tang, Xuelian Cheng, Donglin Wang, Zongyuan Ge, Xinhu Zheng, Zhe Liu, Hesheng Wang, Haoang Li

    Abstract: A fundamental requirement for real-world robotic deployment is the ability to understand and respond to natural language instructions. Existing language-conditioned manipulation tasks typically assume that instructions are perfectly aligned with the environment. This assumption limits robustness and generalization in realistic scenarios where instructions may be ambiguous, irrelevant, or infeasibl…

    Submitted 13 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 14 pages

  19. arXiv:2506.10813  [pdf, ps, other]

    cs.CV eess.IV eess.SP

    Unsupervised Deformable Image Registration with Structural Nonparametric Smoothing

    Authors: Hang Zhang, Xiang Chen, Renjiu Hu, Rongguang Wang, Jinwei Zhang, Min Liu, Yaonan Wang, Gaolei Li, Xinxing Cheng, Jinming Duan

    Abstract: Learning-based deformable image registration (DIR) accelerates alignment by amortizing traditional optimization via neural networks. Label supervision further enhances accuracy, enabling efficient and precise nonlinear alignment of unseen scans. However, images with sparse features amid large smooth regions, such as retinal vessels, introduce aperture and large-displacement challenges that unsuper…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted for publication at Information Processing in Medical Imaging (IPMI) 2025

  20. arXiv:2506.08970  [pdf, ps, other]

    cs.AI

    A Survey of Link Prediction in N-ary Knowledge Graphs

    Authors: Jiyao Wei, Saiping Guan, Da Li, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: N-ary Knowledge Graphs (NKGs) are a specialized type of knowledge graph designed to efficiently represent complex real-world facts. Unlike traditional knowledge graphs, where a fact typically involves two entities, NKGs can capture n-ary facts containing more than two entities. Link prediction in NKGs aims to predict missing elements within these n-ary facts, which is essential for completing NKGs…

    Submitted 10 June, 2025; originally announced June 2025.

  21. arXiv:2506.07542  [pdf]

    cs.CV cs.AI

    APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

    Authors: Bowen Liu, Weiyi Zhang, Peranut Chotcomwongse, Xiaolan Chen, Ruoyu Chen, Pawin Pakaymaskul, Niracha Arjkongharn, Nattaporn Vongsa, Xuelian Cheng, Zongyuan Ge, Kun Huang, Xiaohui Li, Yiru Duan, Zhenbang Wang, BaoYe Xie, Qiang Chen, Huazhu Fu, Michael A. Mahr, Jiaqi Qu, Wangyiyang Chen, Shiye Wang, Yubo Tan, Yongjie Li, Mingguang He, Danli Shi, et al. (1 additional author not shown)

    Abstract: Optical Coherence Tomography (OCT) provides high-resolution, 3D, and non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. In comparison, 2D color fundus photography offers faster acquisition and greater accessibility with…

    Submitted 9 June, 2025; originally announced June 2025.

  22. arXiv:2506.06881  [pdf, other]

    cs.AI

    KnowCoder-V2: Deep Knowledge Analysis

    Authors: Zixuan Li, Wenxuan Liu, Long Bai, Chunmao Zhang, Wei Li, Fenghui Zhang, Quanxin Jin, Ruoyun He, Zhuo Chen, Zhilei Hu, Fei Wang, Bingbing Xu, Xuhui Jiang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: Deep knowledge analysis tasks always involve the systematic extraction and association of knowledge from large volumes of data, followed by logical reasoning to discover insights. However, to solve such complex tasks, existing deep research frameworks face three major challenges: 1) They lack systematic organization and management of knowledge; 2) They operate purely online, making it inefficient…

    Submitted 7 June, 2025; originally announced June 2025.

  23. arXiv:2506.06341  [pdf, ps, other]

    cs.IR cs.AI cs.CY

    NR4DER: Neural Re-ranking for Diversified Exercise Recommendation

    Authors: Xinghe Cheng, Xufang Zhou, Liangda Fang, Chaobo He, Yuyu Zhou, Weiqi Luo, Zhiguo Gong, Quanlong Guan

    Abstract: With the widespread adoption of online education platforms, an increasing number of students are gaining new knowledge through Massive Open Online Courses (MOOCs). Exercise recommendation has made strides toward improving student learning outcomes. However, existing methods not only struggle with high dropout rates but also fail to match the diverse learning pace of students. They frequently face…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted for presentation at the SIGIR 2025 Full Papers track

  24. arXiv:2506.05276  [pdf, ps, other]

    cs.LG

    How to Unlock Time Series Editing? Diffusion-Driven Approach with Multi-Grained Control

    Authors: Hao Yu, Chu Xin Cheng, Runlong Yu, Yuyang Ye, Shiwei Tong, Zhaofeng Liu, Defu Lian

    Abstract: Recent advances in time series generation have shown promise, yet controlling properties in generated sequences remains challenging. Time Series Editing (TSE) - making precise modifications while preserving temporal coherence - considers both point-level constraints and segment-level controls that current methods struggle to provide. We introduce the CocktailEdit framework to enable simultaneous, f…

    Submitted 5 June, 2025; originally announced June 2025.

  25. arXiv:2506.05142  [pdf, ps, other]

    cs.CL

    Do Large Language Models Judge Error Severity Like Humans?

    Authors: Diege Sun, Guanyi Chen, Zhao Fan, Xiaorong Cheng, Tingting He

    Abstract: Large Language Models (LLMs) are increasingly used as automated evaluators in natural language generation, yet it remains unclear whether they can accurately replicate human judgments of error severity. In this study, we systematically compare human and LLM assessments of image descriptions containing controlled semantic errors. We extend the experimental framework of van Miltenburg et al. (2020)…

    Submitted 8 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  26. arXiv:2506.04163  [pdf, ps, other]

    cs.IT

    On the Synthetic Channels in Polar Codes over Binary-Input Discrete Memoryless Channels

    Authors: Yadong Jiao, Xiaoyan Cheng, Yuansheng Tang, Ming Xu

    Abstract: Polar codes introduced by Arikan in 2009 are the first code family achieving the capacity of binary-input discrete memoryless channels (BIDMCs) with low-complexity encoding and decoding. Identifying unreliable synthetic channels in polar code construction is crucial. Currently, because of the large size of the output alphabets of synthetic channels, there is no effective approach to evaluate their…

    Submitted 4 June, 2025; originally announced June 2025.

  27. arXiv:2506.03147  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

    Authors: Bin Lin, Zongjian Li, Xinhua Cheng, Yuwei Niu, Yang Ye, Xianyi He, Shenghai Yuan, Wangbo Yu, Shaodong Wang, Yunyang Ge, Yatian Pang, Li Yuan

    Abstract: Although existing unified models achieve strong performance in vision-language understanding and text-to-image generation, they remain limited in addressing image perception and manipulation -- capabilities increasingly demanded in practical applications. Recently, OpenAI introduced the powerful GPT-4o-Image model, which showcases advanced capabilities in comprehensive image perception and manipul…

    Submitted 18 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  28. arXiv:2506.03028  [pdf, ps, other]

    cs.LG q-bio.BM

    Protein Inverse Folding From Structure Feedback

    Authors: Junde Xu, Zijun Gao, Xinyi Zhou, Jie Hu, Xingyi Cheng, Le Song, Guangyong Chen, Pheng-Ann Heng, Jiezhong Qiu

    Abstract: The inverse folding problem, aiming to design amino acid sequences that fold into desired three-dimensional structures, is pivotal for various biotechnological applications. Here, we introduce a novel approach leveraging Direct Preference Optimization (DPO) to fine-tune an inverse folding model using feedback from a protein folding model. Given a target protein structure, we begin by sampling cand…

    Submitted 3 June, 2025; originally announced June 2025.

  29. arXiv:2506.02761  [pdf, ps, other]

    cs.AI cs.CL cs.CR cs.CV

    Rethinking Machine Unlearning in Image Generation Models

    Authors: Renyang Liu, Wenjie Feng, Tianwei Zhang, Wei Zhou, Xueqi Cheng, See-Kiong Ng

    Abstract: With the surge and widespread application of image generation models, data privacy and content safety have become major concerns and attracted great attention from users, service providers, and policymakers. Machine unlearning (MU) is recognized as a cost-effective and promising means to address these challenges. Despite some advancements, image generation model unlearning (IGMU) still faces remar…

    Submitted 6 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted by ACM CCS 2025

    Journal ref: ACM Conference on Computer and Communications Security (CCS 2025)

  30. arXiv:2506.02461  [pdf, ps, other]

    cs.CL

    XToM: Exploring the Multilingual Theory of Mind for Large Language Models

    Authors: Chunkit Chan, Yauwai Yim, Hongchuan Zeng, Zhiying Zou, Xinyuan Cheng, Zhifan Sun, Zheye Deng, Kawai Chung, Yuzhuo Ao, Yixiang Fan, Cheng Jiayang, Ercong Nie, Ginny Y. Wong, Helmut Schmid, Hinrich Schütze, Simon See, Yangqiu Song

    Abstract: Theory of Mind (ToM), the ability to infer mental states in others, is pivotal for human social cognition. Existing evaluations of ToM in LLMs are largely limited to English, neglecting the linguistic diversity that shapes human cognition. This limitation raises a critical question: can LLMs exhibit Multilingual Theory of Mind, which is the capacity to reason about mental states across diverse lin…

    Submitted 3 June, 2025; originally announced June 2025.

  31. arXiv:2506.02362  [pdf, ps, other]

    cs.CR cs.AI

    MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models

    Authors: Xueqi Cheng, Minxing Zheng, Shixiang Zhu, Yushun Dong

    Abstract: Model extraction attacks aim to replicate the functionality of a black-box model through query access, threatening the intellectual property (IP) of machine-learning-as-a-service (MLaaS) providers. Defending against such attacks is challenging, as it must balance efficiency, robustness, and utility preservation in real-world scenarios. Despite recent advances, most existing defenses presume…

    Submitted 2 June, 2025; originally announced June 2025.

  32. arXiv:2506.01380  [pdf, ps, other]

    cs.CV cs.AI

    Playing with Transformer at 30+ FPS via Next-Frame Diffusion

    Authors: Xinle Cheng, Tianyu He, Jiayi Xu, Junliang Guo, Di He, Jiang Bian

    Abstract: Autoregressive video models offer distinct advantages over bidirectional diffusion models in creating interactive video content and supporting streaming applications with arbitrary duration. In this work, we present Next-Frame Diffusion (NFD), an autoregressive diffusion transformer that incorporates block-wise causal attention, enabling iterative sampling and efficient inference via parallel toke…

    Submitted 4 July, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: Project page: https://nextframed.github.io/

  33. arXiv:2506.01014  [pdf, ps, other]

    eess.AS cs.SD

    Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching

    Authors: Jialong Zuo, Shengpeng Ji, Minghui Fang, Mingze Li, Ziyue Jiang, Xize Cheng, Xiaoda Yang, Chen Feiyang, Xinyu Duan, Zhou Zhao

    Abstract: Zero-Shot Voice Conversion (VC) aims to transform the source speaker's timbre into an arbitrary unseen one while retaining speech content. Most prior work focuses on preserving the source's prosody, while fine-grained timbre information may leak through prosody, and transferring target prosody to synthesized speech is rarely studied. In light of this, we propose R-VC, a rhythm-controllable and eff…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 (Main Conference)

  34. arXiv:2506.00925  [pdf, ps, other]

    q-bio.BM cs.CV cs.LG

    ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search

    Authors: Mengdi Liu, Xiaoxue Cheng, Zhangyang Gao, Hong Chang, Cheng Tan, Shiguang Shan, Xilin Chen

    Abstract: Designing protein sequences that fold into a target 3D structure, known as protein inverse folding, is a fundamental challenge in protein engineering. While recent deep learning methods have achieved impressive performance by recovering native sequences, they often overlook the one-to-many nature of the problem: multiple diverse sequences can fold into the same structure. This motivates the need f…

    Submitted 1 June, 2025; originally announced June 2025.

  35. arXiv:2506.00445  [pdf, ps, other]

    cs.CL

    G2S: A General-to-Specific Learning Framework for Temporal Knowledge Graph Forecasting with Large Language Models

    Authors: Long Bai, Zixuan Li, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng, Tat-Seng Chua

    Abstract: Forecasting over Temporal Knowledge Graphs (TKGs), which predicts future facts based on historical ones, has received much attention. Recent studies have introduced Large Language Models (LLMs) for this task to enhance the models' generalization abilities. However, these models perform forecasting via simultaneously learning two kinds of entangled knowledge in the TKG: (1) general patterns, i.e., in…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Findings of ACL 2025

  36. arXiv:2505.24279  [pdf, ps, other]

    cs.IR

    On the Scaling of Robustness and Effectiveness in Dense Retrieval

    Authors: Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Robustness and Effectiveness are critical aspects of developing dense retrieval models for real-world applications. It is known that there is a trade-off between the two. Recent work has addressed scaling laws of effectiveness in dense retrieval, revealing a power-law relationship between effectiveness and the size of models and data. Does robustness follow scaling laws too? If so, can scaling imp…

    Submitted 30 May, 2025; originally announced May 2025.

  37. arXiv:2505.23835  [pdf, ps, other]

    cs.CL

    Say What You Mean: Natural Language Access Control with Large Language Models for Internet of Things

    Authors: Ye Cheng, Minghui Xu, Yue Zhang, Kun Li, Hao Wu, Yechao Zhang, Shaoyong Guo, Wangjie Qiu, Dongxiao Yu, Xiuzhen Cheng

    Abstract: Access control in the Internet of Things (IoT) is becoming increasingly complex, as policies must account for dynamic and contextual factors such as time, location, user behavior, and environmental conditions. However, existing platforms either offer only coarse-grained controls or rely on rigid rule matching, making them ill-suited for semantically rich or ambiguous access scenarios. Moreover, th…

    Submitted 28 May, 2025; originally announced May 2025.

  38. arXiv:2505.23829  [pdf, other]

    cs.CL

    BiasFilter: An Inference-Time Debiasing Framework for Large Language Models

    Authors: Xiaoqing Cheng, Ruizhe Chen, Hongying Zan, Yuxiang Jia, Min Peng

    Abstract: Mitigating social bias in large language models (LLMs) has become an increasingly important research objective. However, existing debiasing methods often incur high human and computational costs, exhibit limited effectiveness, and struggle to scale to larger models and open-ended generation tasks. To address these limitations, this paper proposes BiasFilter, a model-agnostic, inference-time debias…

    Submitted 28 May, 2025; originally announced May 2025.

  39. arXiv:2505.20081  [pdf, ps, other]

    cs.CL cs.AI

    Inference-time Alignment in Continuous Space

    Authors: Yige Yuan, Teng Xiao, Li Yunfan, Bingbing Xu, Shuchang Tao, Yunqi Qiu, Huawei Shen, Xueqi Cheng

    Abstract: Aligning large language models with human feedback at inference time has received increasing attention due to its flexibility. Existing methods rely on generating multiple responses from the base policy for search using a reward model, which can be considered as searching in a discrete response space. However, these methods struggle to explore informative candidates when the base policy is weak or…

    Submitted 28 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  40. arXiv:2505.18583  [pdf, ps, other]

    cs.IR

    The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems

    Authors: Hongru Song, Yu-an Liu, Ruqing Zhang, Jiafeng Guo, Jianming Lv, Maarten de Rijke, Xueqi Cheng

    Abstract: We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate s…

    Submitted 28 May, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: 18 pages, accepted to ACL 2025 Findings

  41. BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models

    Authors: Zezhi Shao, Yujie Li, Fei Wang, Chengqing Yu, Yisong Fu, Tangwen Qian, Bin Xu, Boyu Diao, Yongjun Xu, Xueqi Cheng

    Abstract: The advent of universal time series forecasting models has revolutionized zero-shot forecasting across diverse domains, yet the critical role of data diversity in training these models remains underexplored. Existing large-scale time series datasets often suffer from inherent biases and imbalanced distributions, leading to suboptimal model performance and generalization. To address this gap, we in…

    Submitted 26 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted by SIGKDD 2025 (Research Track)

  42. arXiv:2505.17656  [pdf, ps, other]

    cs.CL

    Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs

    Authors: Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, Xin Chen, Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng

    Abstract: As large language models (LLMs) often generate plausible but incorrect content, error detection has become increasingly critical to ensure truthfulness. However, existing detection methods often overlook a critical problem we term self-consistent error, where LLMs repeatedly generate the same incorrect response across multiple stochastic samples. This work formally defines self-consistent errors…

    Submitted 29 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  43. arXiv:2505.17537  [pdf, ps, other

    cs.CL

    How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception

    Authors: Shiyu Ni, Keping Bi, Jiafeng Guo, Xueqi Cheng

    Abstract: Large language models (LLMs) often fail to recognize their knowledge boundaries, producing confident yet incorrect answers. In this paper, we investigate how knowledge popularity affects LLMs' ability to perceive their knowledge boundaries. Focusing on entity-centric factual question answering (QA), we quantify knowledge popularity from three perspectives: the popularity of entities in the questio…

    Submitted 23 May, 2025; originally announced May 2025.

  44. arXiv:2505.17104  [pdf, ps, other

    cs.CL cs.MM

    P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark

    Authors: Tao Sun, Enhao Pan, Zhengkai Yang, Kaixin Sui, Jiajun Shi, Xianfu Cheng, Tongliang Li, Wenhao Huang, Ge Zhang, Jian Yang, Zhoujun Li

    Abstract: Academic posters are vital for scholarly communication, yet their manual creation is time-consuming. However, automated academic poster generation faces significant challenges in preserving intricate scientific details and achieving effective visual-textual integration. Existing approaches often struggle with semantic richness and structural nuances, and lack standardized benchmarks for evaluating…

    Submitted 21 May, 2025; originally announced May 2025.

  45. arXiv:2505.17078  [pdf, ps, other

    cs.CL cs.AI

    GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace

    Authors: Zenghao Duan, Zhiyi Yin, Zhichao Shi, Liang Pang, Shaoling Jing, Jiayi Wu, Yu Yan, Huawei Shen, Xueqi Cheng

    Abstract: This paper investigates the underlying mechanisms of toxicity generation in Large Language Models (LLMs) and proposes an effective detoxification approach. Prior work typically considers the Feed-Forward Network (FFN) as the main source of toxicity, representing toxic regions as a set of toxic vectors or layer-wise subspaces. However, our in-depth analysis reveals that the global toxic subspace of…

    Submitted 20 May, 2025; originally announced May 2025.

  46. arXiv:2505.16652  [pdf, ps, other

    cs.CV cs.LG

    Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

    Authors: Feilong Tang, Chengzhi Liu, Zhongxing Xu, Ming Hu, Zelin Peng, Zhiwei Yang, Jionglong Su, Minquan Lin, Yifan Peng, Xuelian Cheng, Imran Razzak, Zongyuan Ge

    Abstract: Recent advancements in multimodal large language models (MLLMs) have significantly improved performance in visual question answering. However, they often suffer from hallucinations. In this work, hallucinations are categorized into two main types: initial hallucinations and snowball hallucinations. We argue that adequate contextual information can be extracted directly from the token interaction p…

    Submitted 7 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Clarification note for the CVPR 2025 paper (FarSight). Prepared by a subset of the original authors; remaining co-authors are acknowledged in the text

  47. arXiv:2505.16315  [pdf, ps, other

    cs.AI cs.CL

    Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning

    Authors: Xiaoxue Cheng, Junyi Li, Zhenduo Zhang, Xinyu Tang, Wayne Xin Zhao, Xinyu Kong, Zhiqiang Zhang

    Abstract: Large reasoning models (LRMs) have demonstrated strong performance on complex reasoning tasks, but often suffer from overthinking, generating redundant content regardless of task difficulty. Inspired by the dual process theory in cognitive science, we propose Adaptive Cognition Policy Optimization (ACPO), a reinforcement learning framework that enables LRMs to achieve efficient reasoning through a…

    Submitted 22 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: work in progress

  48. arXiv:2505.16142  [pdf, ps, other

    cs.CL

    Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning

    Authors: Shicheng Xu, Liang Pang, Yunchang Zhu, Jia Gu, Zihao Wei, Jingcheng Deng, Feiyang Pan, Huawei Shen, Xueqi Cheng

    Abstract: Distilling reasoning paths from teacher to student models via supervised fine-tuning (SFT) provides a shortcut for improving the reasoning ability of smaller Large Language Models (LLMs). However, the reasoning paths generated by teacher models often reflect only surface-level traces of their underlying authentic reasoning. Insights from cognitive neuroscience suggest that authentic reasoning invo…

    Submitted 5 June, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: 15 pages

  49. arXiv:2505.15647  [pdf, ps, other

    cs.LG cs.AI

    Second-Order Convergence in Private Stochastic Non-Convex Optimization

    Authors: Youming Tao, Zuyuan Zhang, Dongxiao Yu, Xiuzhen Cheng, Falko Dressler, Di Wang

    Abstract: We investigate the problem of finding second-order stationary points (SOSP) in differentially private (DP) stochastic non-convex optimization. Existing methods suffer from two key limitations: (i) inaccurate convergence error rate due to overlooking gradient variance in the saddle point escape analysis, and (ii) dependence on auxiliary private model selection procedures for identifying DP-SOSP, wh…

    Submitted 21 May, 2025; originally announced May 2025.

  50. arXiv:2505.15178  [pdf, ps, other

    cs.LG

    A Unified Gradient-based Framework for Task-agnostic Continual Learning-Unlearning

    Authors: Zhehao Huang, Xinwen Cheng, Jie Zhang, Jinghao Zheng, Haoran Wang, Zhengbao He, Tao Li, Xiaolin Huang

    Abstract: Recent advancements in deep models have highlighted the need for intelligent systems that combine continual learning (CL) for knowledge acquisition with machine unlearning (MU) for data removal, forming the Continual Learning-Unlearning (CLU) paradigm. While existing work treats CL and MU as separate processes, we reveal their intrinsic connection through a unified optimization framework based on…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: text overlap with arXiv:2409.19732