Showing 1–50 of 2,694 results for author: Xu, Z

Searching in archive cs.
  1. arXiv:2507.02694  [pdf, ps, other]

    cs.CL

    Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers

    Authors: Zhijian Xu, Yilun Zhao, Manasi Patwardhan, Lovekesh Vig, Arman Cohan

    Abstract: Peer review is fundamental to scientific research, but the growing volume of publications has intensified the challenges of this expertise-intensive process. While LLMs show promise in various scientific tasks, their potential to assist with peer review, particularly in identifying paper limitations, remains understudied. We first present a comprehensive taxonomy of limitation types in scientific…

    Submitted 3 July, 2025; originally announced July 2025.

  2. arXiv:2507.02288  [pdf, ps, other]

    cs.CV cs.LG

    Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

    Authors: De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, Xinbo Gao

    Abstract: Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.01006  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    Authors: GLM-V Team: Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, et al. (54 additional authors not shown)

    Abstract: We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the fi…

    Submitted 2 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

  4. arXiv:2507.00389  [pdf, ps, other]

    cs.CL

    Causal Prompting for Implicit Sentiment Analysis with Large Language Models

    Authors: Jing Ren, Wenhao Zhou, Bowen Li, Mujie Liu, Nguyen Linh Dan Le, Jiade Cen, Liping Chen, Ziqi Xu, Xiwei Xu, Xiaodong Li

    Abstract: Implicit Sentiment Analysis (ISA) aims to infer sentiment that is implied rather than explicitly stated, requiring models to perform deeper reasoning over subtle contextual cues. While recent prompting-based methods using Large Language Models (LLMs) have shown promise in ISA, they often rely on majority voting over chain-of-thought (CoT) reasoning paths without evaluating their causal validity, m…

    Submitted 30 June, 2025; originally announced July 2025.

  5. arXiv:2506.23692  [pdf, ps, other]

    cs.AI

    Agent4S: The Transformation of Research Paradigms from the Perspective of Large Language Models

    Authors: Boyuan Zheng, Zerui Fang, Zhe Xu, Rui Wang, Yiwen Chen, Cunshi Wang, Mengwei Qu, Lei Lei, Zhen Feng, Yan Liu, Yuyang Li, Mingzhou Tan, Jiaji Wu, Jianwei Shuai, Jia Li, Fangfu Ye

    Abstract: While AI for Science (AI4S) serves as an analytical tool in the current research paradigm, it doesn't solve its core inefficiency. We propose "Agent for Science" (Agent4S)-the use of LLM-driven agents to automate the entire research workflow-as the true Fifth Scientific Paradigm. This paper introduces a five-level classification for Agent4S, outlining a clear roadmap from simple task automation to…

    Submitted 30 June, 2025; originally announced June 2025.

  6. arXiv:2506.23334  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation

    Authors: Hongyi Pan, Ziliang Hong, Gorkem Durak, Ziyue Xu, Ulas Bagci

    Abstract: Federated learning (FL) has emerged as a promising paradigm for collaboratively training deep learning models across institutions without exchanging sensitive medical data. However, its effectiveness is often hindered by limited data availability and non-independent, identically distributed data across participating clients, which can degrade model performance and generalization. To address these…

    Submitted 29 June, 2025; originally announced June 2025.

  7. arXiv:2506.23281  [pdf, ps, other]

    cs.SE cs.PL

    On the Feasibility of Deduplicating Compiler Bugs with Bisection

    Authors: Xintong Zhou, Zhenyang Xu, Chengnian Sun

    Abstract: Random testing has proven to be an effective technique for compiler validation. However, the debugging of bugs identified through random testing presents a significant challenge due to the frequent occurrence of duplicate test programs that expose identical compiler bugs. The process to identify duplicates is a practical research problem known as bug deduplication. Prior methodologies for compiler…

    Submitted 29 June, 2025; originally announced June 2025.

  8. arXiv:2506.21609  [pdf, ps, other]

    cs.CL cs.AI cs.CR

    From Thinking to Output: Chain-of-Thought and Text Generation Characteristics in Reasoning Language Models

    Authors: Junhao Liu, Zhenhao Xu, Yuxin Fang, Yichuan Chen, Zuobin Ying, Wenhan Chang

    Abstract: Recently, there have been notable advancements in large language models (LLMs), demonstrating their growing abilities in complex reasoning. However, existing research largely overlooks a thorough and systematic comparison of these models' reasoning processes and outputs, particularly regarding their self-reflection pattern (also termed "Aha moment") and the interconnections across diverse domains.…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 18 pages, 3 figures

  9. arXiv:2506.21001  [pdf, ps, other]

    cs.CV

    Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology

    Authors: Qiuyi Qi, Xin Li, Ming Kong, Zikang Xu, Bingdi Chen, Qiang Zhu, S Kevin Zhou

    Abstract: Challenges such as the lack of high-quality annotations, long-tailed data distributions, and inconsistent staining styles pose significant obstacles to training neural networks to detect abnormal cells in cytopathology robustly. This paper proposes a style-aligned image composition (SAIC) method that composes high-fidelity and style-preserved pathological images to enhance the effectiveness and ro…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: MIDL 2025 Oral

  10. arXiv:2506.20947  [pdf, ps, other]

    cs.CV cs.MM

    Hierarchical Sub-action Tree for Continuous Sign Language Recognition

    Authors: Dejie Yang, Zhu Xu, Xinjie Gao, Yang Liu

    Abstract: Continuous sign language recognition (CSLR) aims to transcribe untrimmed videos into glosses, which are typically textual words. Recent studies indicate that the lack of large datasets and precise annotations has become a bottleneck for CSLR due to insufficient training data. To address this, some works have developed cross-modal solutions to align visual and textual modalities. However, they typi…

    Submitted 25 June, 2025; originally announced June 2025.

  11. arXiv:2506.19877  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Robust Anomaly Detection in Network Traffic: Evaluating Machine Learning Models on CICIDS2017

    Authors: Zhaoyang Xu, Yunbo Liu

    Abstract: Identifying suitable machine learning paradigms for intrusion detection remains critical for building effective and generalizable security solutions. In this study, we present a controlled comparison of four representative models - Multi-Layer Perceptron (MLP), 1D Convolutional Neural Network (CNN), One-Class Support Vector Machine (OCSVM) and Local Outlier Factor (LOF) - on the CICIDS2017 dataset…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: submitted to IEEE CNS 2025

  12. arXiv:2506.19843  [pdf, ps, other]

    cs.AI

    Temporal-IRL: Modeling Port Congestion and Berth Scheduling with Inverse Reinforcement Learning

    Authors: Guo Li, Zixiang Xu, Wei Zhang, Yikuan Hu, Xinyu Yang, Nikolay Aristov, Mingjie Tang, Elenna R Dugundji

    Abstract: Predicting port congestion is crucial for maintaining reliable global supply chains. Accurate forecasts enable improved shipment planning, reduce delays and costs, and optimize inventory and distribution strategies, thereby ensuring timely deliveries and enhancing supply chain resilience. To achieve accurate predictions, analyzing vessel behavior and their stay times at specific port terminals is essentia…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: TRB2025

  13. arXiv:2506.19742  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    NeRF-based CBCT Reconstruction needs Normalization and Initialization

    Authors: Zhuowei Xu, Han Li, Dai Sun, Zhicheng Li, Yujia Li, Qingpeng Kong, Zhiwei Cheng, Nassir Navab, S. Kevin Zhou

    Abstract: Cone Beam Computed Tomography (CBCT) is widely used in medical imaging. However, the limited number and intensity of X-ray projections make reconstruction an ill-posed problem with severe artifacts. NeRF-based methods have achieved great success in this task. However, they suffer from a local-global training mismatch between their two key components: the hash encoder and the neural network. Specif…

    Submitted 24 June, 2025; originally announced June 2025.

  14. arXiv:2506.19676  [pdf, ps, other]

    cs.CR

    A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures

    Authors: Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Hujin Peng, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Ningyu Zhang, Chaochao Chen, Muhammad Khurram Khan, Meng Han

    Abstract: In recent years, Large-Language-Model-driven AI agents have exhibited unprecedented intelligence and adaptability, and are rapidly changing human production and life. Nowadays, agents are undergoing a new round of evolution. They no longer act as an isolated island like LLMs. Instead, they start to communicate with diverse external entities, such as other agents and tools, to perform more complex…

    Submitted 2 July, 2025; v1 submitted 24 June, 2025; originally announced June 2025.

    Comments: 41 pages, 13 figures, submitted to IEEE COMST

  15. arXiv:2506.19256  [pdf, ps, other]

    cs.NE cs.AI

    Enhancing Generalization of Spiking Neural Networks Through Temporal Regularization

    Authors: Boxuan Zhang, Zhen Xu, Kuan Tao

    Abstract: Spiking Neural Networks (SNNs) have received widespread attention due to their event-driven and low-power characteristics, making them particularly effective for processing event-based neuromorphic data. Recent studies have shown that directly trained SNNs suffer from severe overfitting issues due to the limited scale of neuromorphic datasets and the gradient mismatching problem, which fundamental…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Code is available at https://github.com/ZBX05/Temporal-Regularization-Training

  16. arXiv:2506.18890  [pdf, ps, other]

    cs.CV

    4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

    Authors: Ziqiao Ma, Xuweiyi Chen, Shoubin Yu, Sai Bi, Kai Zhang, Chen Ziwen, Sihan Xu, Jianing Yang, Zexiang Xu, Kalyan Sunkavalli, Mohit Bansal, Joyce Chai, Hao Tan

    Abstract: Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at some times to any view at any time? We provide an affirmative answer with 4D-LRM, the first large-scale 4D reconstruction model that takes input from unconstrained views and timestamps and renders arbitrary novel view-time combinations. Unlike prior 4D approaches, e.g., optimizati…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project page: https://4dlrm.github.io/

  17. arXiv:2506.17572  [pdf, ps, other]

    cs.IT math.AG

    Signal Recovery on Algebraic Varieties Using Linear Samples

    Authors: Zhiqiang Xu

    Abstract: The recovery of an unknown signal from its linear measurements is a fundamental problem spanning numerous scientific and engineering disciplines. Commonly, prior knowledge suggests that the underlying signal resides within a known algebraic variety. This context naturally leads to a question: what is the minimum number of measurements required to uniquely recover any signal belonging to such an al…

    Submitted 26 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

    Comments: 16 pages

  18. arXiv:2506.17288  [pdf, ps, other]

    cs.IR cs.AI cs.CL

    SlimRAG: Retrieval without Graphs via Entity-Aware Context Selection

    Authors: Jiale Zhang, Jiaxiang Chen, Zhucong Li, Jie Ding, Kui Zhao, Zenglin Xu, Xin Pang, Yinghui Xu

    Abstract: Retrieval-Augmented Generation (RAG) enhances language models by incorporating external knowledge at inference time. However, graph-based RAG systems often suffer from structural overhead and imprecise retrieval: they require costly pipelines for entity linking and relation extraction, yet frequently return subgraphs filled with loosely related or tangential content. This stems from a fundamental…

    Submitted 15 June, 2025; originally announced June 2025.

  19. arXiv:2506.17264  [pdf, ps, other]

    cs.LG cs.AI

    OAT-Rephrase: Optimization-Aware Training Data Rephrasing for Zeroth-Order LLM Fine-Tuning

    Authors: Jikai Long, Zijian Hu, Xiaodong Yu, Jianwen Xie, Zhaozhuo Xu

    Abstract: Fine-tuning large language models (LLMs) using zeroth-order optimization (ZO) offers a memory-efficient alternative to gradient-based methods but suffers from slower convergence and unstable optimization due to noisy gradient estimates. This paper introduces OAT-Rephrase, an Optimization-Aware Training data rephrasing strategy that leverages an LLM to rephrase training instances based on its under…

    Submitted 9 June, 2025; originally announced June 2025.

  20. arXiv:2506.17201  [pdf, ps, other]

    cs.CV

    Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

    Authors: Jiaqi Li, Junshu Tang, Zhiyong Xu, Longhuang Wu, Yuan Zhou, Shuai Shao, Tianbao Yu, Zhiguo Cao, Qinglin Lu

    Abstract: Recent advances in diffusion-based and controllable video generation have enabled high-quality and temporally coherent video synthesis, laying the groundwork for immersive interactive gaming experiences. However, current methods face limitations in dynamics, generality, long-term consistency, and efficiency, which limit the ability to create various gameplay videos. To address these gaps, we intro…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Project page: https://hunyuan-gamecraft.github.io/

  21. arXiv:2506.17067  [pdf, ps, other]

    eess.SP cs.IT cs.LG

    Empowering Near-Field Communications in Low-Altitude Economy with LLM: Fundamentals, Potentials, Solutions, and Future Directions

    Authors: Zhuo Xu, Tianyue Zheng, Linglong Dai

    Abstract: The low-altitude economy (LAE) is gaining significant attention from academia and industry. Fortunately, LAE naturally aligns with near-field communications in extremely large-scale MIMO (XL-MIMO) systems. By leveraging near-field beamfocusing, LAE can precisely direct beam energy to unmanned aerial vehicles, while the additional distance dimension boosts overall spectrum efficiency. However, near…

    Submitted 20 June, 2025; originally announced June 2025.

  22. arXiv:2506.16411  [pdf, ps, other]

    cs.CL cs.LG

    When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework

    Authors: Zhen Xu, Shang Zhu, Jue Wang, Junlin Wang, Ben Athiwaratkun, Chi Wang, James Zou, Ce Zhang

    Abstract: We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that distinguishes the failure modes of long context tasks into three categories: cross-chunk dependence (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise). Under this view, we analyze when it is…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: under review

  23. arXiv:2506.16151  [pdf, ps, other]

    cs.CL cs.AI

    Under the Shadow of Babel: How Language Shapes Reasoning in LLMs

    Authors: Chenxi Wang, Yixuan Zhang, Lang Gao, Zixiang Xu, Zirui Song, Yanbo Wang, Xiuying Chen

    Abstract: Language is not only a tool for communication but also a medium for human cognition and reasoning. If, as linguistic relativity suggests, the structure of language shapes cognitive patterns, then large language models (LLMs) trained on human language may also internalize the habitual logical structures embedded in different languages. To examine this hypothesis, we introduce BICAUSE, a structured…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 15 pages, 10 figures

  24. arXiv:2506.16110  [pdf, ps, other]

    cs.LG

    Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification

    Authors: Langzhang Liang, Fanchen Bu, Zixing Song, Zenglin Xu, Shirui Pan, Kijung Shin

    Abstract: The message-passing paradigm of Graph Neural Networks often struggles with exchanging information across distant nodes typically due to structural bottlenecks in certain graph regions, a limitation known as \textit{over-squashing}. To reduce such bottlenecks, \textit{graph rewiring}, which modifies graph topology, has been widely used. However, existing graph rewiring techniques often overlook the…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Published as a conference paper at ICML 2025

  25. arXiv:2506.15912  [pdf, ps, other]

    cs.LG cs.CL cs.SD eess.AS

    Early Attentive Sparsification Accelerates Neural Speech Transcription

    Authors: Zifei Xu, Sayeh Sharify, Hesham Mostafa, Tristan Webb, Wanzin Yazar, Xin Wang

    Abstract: Transformer-based neural speech processing has achieved state-of-the-art performance. Since speech audio signals are known to be highly compressible, here we seek to accelerate neural speech transcription by time-domain signal sparsification early in the neural encoding stage, taking advantage of the interpretability of the self-attention mechanism in transformer audio encoders. With the Whisper f…

    Submitted 18 June, 2025; originally announced June 2025.

  26. arXiv:2506.15788  [pdf, ps, other]

    cs.RO eess.SY

    Robust control for multi-legged elongate robots in noisy environments

    Authors: Baxi Chong, Juntao He, Daniel Irvine, Tianyu Wang, Esteban Flores, Daniel Soto, Jianfeng Lin, Zhaochen Xu, Vincent R Nienhusser, Grigoriy Blekherman, Daniel I. Goldman

    Abstract: Modern two and four legged robots exhibit impressive mobility on complex terrain, largely attributed to advancement in learning algorithms. However, these systems often rely on high-bandwidth sensing and onboard computation to perceive/respond to terrain uncertainties. Further, current locomotion strategies typically require extensive robot-specific training, limiting their generalizability across…

    Submitted 18 June, 2025; originally announced June 2025.

  27. arXiv:2506.15675  [pdf, ps, other]

    cs.CV cs.AI

    Sekai: A Video Dataset towards World Exploration

    Authors: Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, Zhaopan Xu, Xinyue Li, Yukang Feng, Jianwen Sun, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Zhixiang Wang, Yuwei Wu, Tong He, Jiangmiao Pang, Yu Qiao, Yunde Jia, Kaipeng Zhang

    Abstract: Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited for world exploration training as they suffer from some limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai…

    Submitted 20 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

    Comments: 12 pages, 6 figures

  28. arXiv:2506.15662  [pdf, ps, other]

    cs.CL

    CC-LEARN: Cohort-based Consistency Learning

    Authors: Xiao Ye, Shaswat Shrivastava, Zhaonan Li, Jacob Dineen, Shijie Lu, Avneet Ahuja, Ming Shen, Zhikun Xu, Ben Zhou

    Abstract: Large language models excel at many tasks but still struggle with consistent, robust reasoning. We introduce Cohort-based Consistency Learning (CC-Learn), a reinforcement learning framework that improves the reliability of LLM reasoning by training on cohorts of similar questions derived from shared programmatic abstractions. To enforce cohort-level consistency, we define a composite objective com…

    Submitted 18 June, 2025; originally announced June 2025.

  29. arXiv:2506.15307  [pdf, ps, other]

    cs.LG

    SecFwT: Efficient Privacy-Preserving Fine-Tuning of Large Language Models Using Forward-Only Passes

    Authors: Jinglong Luo, Zhuo Zhang, Yehong Zhang, Shiyu Liu, Ye Dong, Xun Zhou, Hui Wang, Yue Yu, Zenglin Xu

    Abstract: Large language models (LLMs) have transformed numerous fields, yet their adaptation to specialized tasks in privacy-sensitive domains, such as healthcare and finance, is constrained by the scarcity of accessible training data due to stringent privacy requirements. Secure multi-party computation (MPC)-based privacy-preserving machine learning offers a powerful approach to protect both model paramet…

    Submitted 18 June, 2025; originally announced June 2025.

  30. arXiv:2506.15084  [pdf, ps, other]

    cs.SE cs.CV cs.HC

    An Empirical Study of Bugs in Data Visualization Libraries

    Authors: Weiqi Lu, Yongqiang Tian, Xiaohan Zhong, Haoyang Ma, Zhenyang Xu, Shing-Chi Cheung, Chengnian Sun

    Abstract: Data visualization (DataViz) libraries play a crucial role in presentation, data analysis, and application development, underscoring the importance of their accuracy in transforming data into visual representations. Incorrect visualizations can adversely impact user experience, distort information conveyance, and influence user perception and decision-making processes. Visual bugs in these librari…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Proc. ACM Softw. Eng. 2, FSE

  31. arXiv:2506.14731  [pdf, ps, other]

    cs.CL cs.AI

    Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

    Authors: Ling Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo, Jiaming Liu, Jiewei Wu, Jun Mei, Jun Zhou, Junbo Zhao, Junwu Xiong, Kaihong Zhang, Kuan Xu, Lei Liang, Liang Jiang, Liangcheng Fu, Longfei Zheng, Qiang Gao, Qing Cui, Quan Wan, et al. (21 additional authors not shown)

    Abstract: We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities. Built upon the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challeng…

    Submitted 17 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: Technical Report

  32. arXiv:2506.14245  [pdf, ps, other]

    cs.AI cs.CL

    Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

    Authors: Xumeng Wen, Zihan Liu, Shun Zheng, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for advancing the reasoning capabilities of Large Language Models (LLMs). However, a critical paradox clouds its efficacy: RLVR-tuned models often underperform their base models on the $Pass@K$ metric for solution-finding, leading to the hypothesis that RLVR merely re-weights existing reasoning paths at the c…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Preprint

  33. arXiv:2506.13827  [pdf, ps, other]

    cs.GR cs.AI

    Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing

    Authors: Zhuoying Li, Zhu Xu, Yuxin Peng, Yang Liu

    Abstract: Instruction-based image editing, which aims to modify the image faithfully according to the instruction while preserving irrelevant content unchanged, has made significant progress. However, there still lacks a comprehensive metric for assessing the editing quality. Existing metrics either require high human evaluation costs, which hinder large-scale evaluation, or are adapted from other tasks and…

    Submitted 15 June, 2025; originally announced June 2025.

  34. arXiv:2506.13691  [pdf, ps, other]

    cs.CV

    UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

    Authors: Zhucun Xue, Jiangning Zhang, Teng Hu, Haoyang He, Yinan Chen, Yuxuan Cai, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Dacheng Tao

    Abstract: The quality of the video dataset (image quality, resolution, and fine-grained caption) greatly influences the performance of the video generation model. The growing demand for video applications sets higher requirements for high-quality video generation models. For example, the generation of movie-level Ultra-High Definition (UHD) videos and the creation of 4K short video content. However, the exi…

    Submitted 16 June, 2025; originally announced June 2025.

  35. arXiv:2506.13606  [pdf, ps, other]

    math.CO cs.CG

    Largest dyadic dual VC-dimension of non-piercing families

    Authors: Xinqi Huang, Yuzhen Qi, Mingyuan Rong, Zixiang Xu

    Abstract: The dyadic dual VC-dimension of a set system \( \mathcal{F} \) is the largest integer \( \ell \) such that there exist \( \ell \) sets \( F_1, F_{2}, \dots, F_\ell \in \mathcal{F} \), where every pair \( \{i, j\} \in \binom{[\ell]}{2} \) is witnessed by an element \( a_{i,j} \in F_i \cap F_j \) that does not belong to any other set \( F_k \) with \( k \in [\ell] \setminus \{i, j\} \). In this pape…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 5 pages, 2 figures

    MSC Class: 52A35; 52C45

  36. arXiv:2506.13589  [pdf, ps, other]

    cs.CV

    AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding

    Authors: Zhucun Xue, Jiangning Zhang, Xurong Xie, Yuxuan Cai, Yong Liu, Xiangtai Li, Dacheng Tao

    Abstract: Multimodal Large Language Models (MLLMs) struggle with long videos due to fixed context windows and weak long-term dependency modeling. Existing Retrieval-Augmented Generation (RAG) methods for videos use static retrieval strategies, leading to inefficiencies for simple queries and information loss for complex tasks. To address this, we propose AdaVideoRAG, a novel framework that dynamically adapt…

    Submitted 17 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  37. arXiv:2506.13585  [pdf, ps, other]

    cs.CL cs.LG

    MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Authors: MiniMax: Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, et al. (103 additional authors not shown)

    Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

  38. arXiv:2506.13502  [pdf, ps, other]

    cs.CL

    BOW: Bottlenecked Next Word Exploration

    Authors: Ming Shen, Zhikun Xu, Xiao Ye, Jacob Dineen, Ben Zhou

    Abstract: Large language models (LLMs) are typically trained via next-word prediction (NWP), which provides strong surface-level fluency but often lacks support for robust reasoning. We propose BOttlenecked next Word exploration (BOW), a novel RL framework that rethinks NWP by introducing a reasoning bottleneck where a policy model first generates a reasoning path rather than predicting the next token direc…

    Submitted 16 June, 2025; originally announced June 2025.

  39. arXiv:2506.12924  [pdf, ps, other]

    cs.IT math.CO

    Optimal Reconstruction Codes with Given Reads in Multiple Burst-Substitutions Channels

    Authors: Wenjun Yu, Yubo Sun, Zixiang Xu, Gennian Ge, Moshe Schwartz

    Abstract: We study optimal reconstruction codes over the multiple-burst substitution channel. Our main contribution is establishing a trade-off between the error-correction capability of the code, the number of reads used in the reconstruction process, and the decoding list size. We show that over a channel that introduces at most $t$ bursts, we can use a length-$n$ code capable of correcting $ε$ errors, wi…

    Submitted 15 June, 2025; originally announced June 2025.

  40. arXiv:2506.12551  [pdf, ps, other]

    cs.CR cs.AI

    MEraser: An Effective Fingerprint Erasure Approach for Large Language Models

    Authors: Jingxuan Zhang, Zhenhua Xu, Rui Hu, Wenpeng Xing, Xuhong Zhang, Meng Han

    Abstract: Large Language Models (LLMs) have become increasingly prevalent across various sectors, raising critical concerns about model ownership and intellectual property protection. Although backdoor-based fingerprinting has emerged as a promising solution for model authentication, effective attacks for removing these fingerprints remain largely unexplored. Therefore, we present Mismatched Eraser (MEraser…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025, Main Conference, Long Paper

  41. arXiv:2506.12509  [pdf, ps, other]

    cs.AI

    Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs

    Authors: Jiwei Fang, Bin Zhang, Changwei Wang, Jin Wan, Zhiwei Xu

    Abstract: Verifying the reliability of complex, multi-step reasoning in Large Language Models (LLMs) remains a fundamental challenge, as existing methods often lack both faithfulness and precision. To address this issue, we propose the Graph of Verification (GoV) framework. GoV offers three key contributions: First, it explicitly models the underlying deductive process as a directed acyclic graph (DAG), whe…

    Submitted 14 June, 2025; originally announced June 2025.

  42. arXiv:2506.12352  [pdf, ps, other]

    cs.AI cs.LG stat.ML

    Efficient Network Automatic Relevance Determination

    Authors: Hongwei Zhang, Ziqi Ye, Xinyuan Wang, Xin Guo, Zenglin Xu, Yuan Cheng, Zixin Hu, Yuan Qi

    Abstract: We propose Network Automatic Relevance Determination (NARD), an extension of ARD for linearly probabilistic models, to simultaneously model sparse relationships between inputs $X \in \mathbb R^{d \times N}$ and outputs $Y \in \mathbb R^{m \times N}$, while capturing the correlation structure among the $Y$. NARD employs a matrix normal prior which contains a sparsity-inducing parameter to identify…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  43. arXiv:2506.12339  [pdf, ps, other]

    cs.HC cs.AI

    SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation

    Authors: Ruiyan Zhu, Xi Cheng, Ke Liu, Brian Zhu, Daniel Jin, Neeraj Parihar, Zhoutian Xu, Oliver Gao

    Abstract: We present SheetMind, a modular multi-agent framework powered by large language models (LLMs) for spreadsheet automation via natural language instructions. The system comprises three specialized agents: a Manager Agent that decomposes complex user instructions into subtasks; an Action Agent that translates these into structured commands using a Backus Naur Form (BNF) grammar; and a Reflection Agen…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: Ruiyan Zhu and Xi Cheng contributed equally to this work

  44. arXiv:2506.11454  [pdf]

    eess.IV cs.CV

    FAD-Net: Frequency-Domain Attention-Guided Diffusion Network for Coronary Artery Segmentation using Invasive Coronary Angiography

    Authors: Nan Mu, Ruiqi Song, Xiaoning Li, Zhihui Xu, Jingfeng Jiang, Chen Zhao

    Abstract: Background: Coronary artery disease (CAD) remains one of the leading causes of mortality worldwide. Precise segmentation of coronary arteries from invasive coronary angiography (ICA) is critical for effective clinical decision-making. Objective: This study aims to propose a novel deep learning model based on frequency-domain analysis to enhance the accuracy of coronary artery segmentation and sten…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 35 pages, 12 figures

  45. arXiv:2506.11028  [pdf, ps, other]

    cs.LG cs.AI

    Enhancing Epidemic Forecasting: Evaluating the Role of Mobility Data and Graph Convolutional Networks

    Authors: Suhan Guo, Zhenghao Xu, Furao Shen, Jian Zhao

    Abstract: Accurate prediction of contagious disease outbreaks is vital for informed decision-making. Our study addresses the gap between machine learning algorithms and their epidemiological applications, noting that methods optimal for benchmark datasets often underperform with real-world data due to difficulties in incorporating mobility information. We adopt a two-phase approach: first, assessing the sig…

    Submitted 20 May, 2025; originally announced June 2025.

  46. arXiv:2506.10988  [pdf, other]

    cs.SE cs.LG

    You Only Train Once: A Flexible Training Framework for Code Vulnerability Detection Driven by Vul-Vector

    Authors: Bowen Tian, Zhengyang Xu, Mingqiang Wu, Songning Lai, Yutai Yue

    Abstract: With the pervasive integration of computer applications across industries, the presence of vulnerabilities within code bases poses significant risks. The diversity of software ecosystems coupled with the intricate nature of modern software engineering has led to a shift from manual code vulnerability identification towards the adoption of automated tools. Among these, deep learning-based approache…

    Submitted 12 March, 2025; originally announced June 2025.

    Comments: Under Review

  47. arXiv:2506.10831  [pdf, ps, other]

    cs.LG cs.AI

    Efficiency Robustness of Dynamic Deep Learning Systems

    Authors: Ravishka Rathnasuriya, Tingxi Li, Zexin Xu, Zihe Song, Mirazul Haque, Simin Chen, Wei Yang

    Abstract: Deep Learning Systems (DLSs) are increasingly deployed in real-time applications, including those in resource-constrained environments such as mobile and IoT devices. To address efficiency challenges, Dynamic Deep Learning Systems (DDLSs) adapt inference computation based on input complexity, reducing overhead. While this dynamic behavior improves efficiency, such behavior introduces new attack sur…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted to USENIX Security '25

  48. arXiv:2506.10751  [pdf, ps, other]

    cs.LG cs.CL

    Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering

    Authors: Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanatha Reddy Gajjala, Zhuoyan Xu, Siddhant Rai, Vaishnav Potlapalli

    Abstract: Automated question answering (QA) over electronic health records (EHRs) can bridge critical information gaps for clinicians and patients, yet it demands both precise evidence retrieval and faithful answer generation under limited supervision. In this work, we present Neural, the runner-up in the BioNLP 2025 ArchEHR-QA shared task on evidence-grounded clinical QA. Our proposed method decouples the…

    Submitted 12 June, 2025; originally announced June 2025.

  49. arXiv:2506.10531  [pdf, ps, other]

    cs.DC

    GPU-Accelerated Distributed QAOA on Large-scale HPC Ecosystems

    Authors: Zhihao Xu, Srikar Chundury, Seongmin Kim, Amir Shehata, Xinyi Li, Ang Li, Tengfei Luo, Frank Mueller, In-Saeng Suh

    Abstract: Quantum computing holds great potential to accelerate the process of solving complex combinatorial optimization problems. The Distributed Quantum Approximate Optimization Algorithm (DQAOA) addresses high-dimensional, dense problems using current quantum computing techniques and high-performance computing (HPC) systems. In this work, we improve the scalability and efficiency of DQAOA through advanc…

    Submitted 12 June, 2025; originally announced June 2025.

  50. arXiv:2506.10395  [pdf, ps, other]

    cs.CV cs.AI

    Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation

    Authors: Zhiyang Xu, Jiuhai Chen, Zhaojiang Lin, Xichen Pan, Lifu Huang, Tianyi Zhou, Madian Khabsa, Qifan Wang, Di Jin, Michihiro Yasunaga, Lili Yu, Xi Victoria Lin, Shaoliang Nie

    Abstract: Recent advances in large language models (LLMs) have enabled multimodal foundation models to tackle both image understanding and generation within a unified framework. Despite these gains, unified models often underperform compared to specialized models in either task. A key challenge in developing unified models lies in the inherent differences between the visual features needed for image underst…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Unified image understanding and generation model