Showing 1–50 of 3,116 results for author: Li, L

Searching in archive cs.
  1. arXiv:2507.02844  [pdf, ps, other]

    cs.CV cs.CL cs.CR

    Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

    Authors: Ziqi Miao, Yi Ding, Lijun Li, Jing Shao

    Abstract: With the emergence of strong visual-language capabilities, multimodal large language models (MLLMs) have demonstrated tremendous potential for real-world applications. However, the security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. Recent studies have successfully induced harmful responses from target MLLMs by…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 16 pages

  2. arXiv:2507.02616  [pdf, ps, other]

    cs.AI

    DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making

    Authors: Tianqi Shang, Weiqing He, Charles Zheng, Lingyao Li, Li Shen, Bingxin Zhao

    Abstract: The rise of Large Language Models (LLMs) has enabled the development of specialized AI agents with domain-specific reasoning and interaction capabilities, particularly in healthcare. While recent frameworks simulate medical decision-making, they largely focus on single-turn tasks where a doctor agent receives full case information upfront -- diverging from the real-world diagnostic process, which…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 16 pages

  3. arXiv:2507.02289  [pdf, ps, other]

    eess.IV cs.CV

    CineMyoPS: Segmenting Myocardial Pathologies from Cine Cardiac MR

    Authors: Wangbin Ding, Lei Li, Junyi Qiu, Bogen Lin, Mingjing Yang, Liqin Huang, Lianming Wu, Sihan Wang, Xiahai Zhuang

    Abstract: Myocardial infarction (MI) is a leading cause of death worldwide. Late gadolinium enhancement (LGE) and T2-weighted cardiac magnetic resonance (CMR) imaging can respectively identify scarring and edema areas, both of which are essential for MI risk stratification and prognosis assessment. Although combining complementary information from multi-sequence CMR is useful, acquiring these sequences can…

    Submitted 2 July, 2025; originally announced July 2025.

  4. arXiv:2507.01961  [pdf, ps, other]

    cs.RO cs.AI

    AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

    Authors: Sixiang Chen, Jiaming Liu, Siyuan Qian, Han Jiang, Lily Li, Renrui Zhang, Zhuoyang Liu, Chenyang Gu, Chengkai Hou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

    Abstract: Recently, mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. However, existing methods still face challenges in coordinating mobile base and manipulator, primarily due to two limitations. On the one hand, they fail to explicitly model the influence of the mobile base on manipulator control, which easily leads to error accumu…

    Submitted 3 July, 2025; v1 submitted 2 July, 2025; originally announced July 2025.

    Comments: Project website: https://ac-dit.github.io/

  5. arXiv:2507.01926  [pdf, ps, other]

    cs.CV

    IC-Custom: Diverse Image Customization via In-Context Learning

    Authors: Yaowei Li, Xiaoyu Li, Zhaoyang Zhang, Yuxuan Bian, Gan Liu, Xinyuan Li, Jiale Xu, Wenbo Hu, Yating Liu, Lingen Li, Jing Cai, Yuexian Zou, Yancheng He, Ying Shan

    Abstract: Image customization, a crucial technique for industrial media production, aims to generate content that is consistent with reference images. However, current approaches conventionally separate image customization into position-aware and position-free customization paradigms and lack a universal framework for diverse customization, limiting their applications across various scenarios. To overcome t…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Project page: https://liyaowei-stu.github.io/project/IC_Custom

  6. arXiv:2507.01831  [pdf, ps, other]

    cs.LG stat.ML

    Out-of-Distribution Detection Methods Answer the Wrong Questions

    Authors: Yucen Lily Li, Daohan Lu, Polina Kirichenko, Shikai Qiu, Tim G. J. Rudner, C. Bayan Bruss, Andrew Gordon Wilson

    Abstract: To detect distribution shifts and improve model safety, many out-of-distribution (OOD) detection methods rely on the predictive uncertainty or features of supervised models trained on in-distribution data. In this paper, we critically re-examine this popular family of OOD detection procedures, and we argue that these methods are fundamentally answering the wrong questions for OOD detection. There…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Extended version of ICML 2025 paper

  7. arXiv:2507.01216  [pdf, ps, other]

    cs.LG cs.CR

    PAE MobiLLM: Privacy-Aware and Efficient LLM Fine-Tuning on the Mobile Device via Additive Side-Tuning

    Authors: Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Hao Wang, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan

    Abstract: There is a huge gap between numerous intriguing applications fostered by on-device large language model (LLM) fine-tuning (FT) from fresh mobile data and the limited resources of a mobile device. While existing server-assisted methods (e.g., split learning or side-tuning) may enable LLM FT on the local mobile device, they suffer from heavy communication burdens of activation transmissions, and may…

    Submitted 1 July, 2025; originally announced July 2025.

  8. arXiv:2507.01012  [pdf, ps, other]

    cs.CV

    DAM-VSR: Disentanglement of Appearance and Motion for Video Super-Resolution

    Authors: Zhe Kong, Le Li, Yong Zhang, Feng Gao, Shaoshu Yang, Tao Wang, Kaihao Zhang, Zhuoliang Kang, Xiaoming Wei, Guanying Chen, Wenhan Luo

    Abstract: Real-world video super-resolution (VSR) presents significant challenges due to complex and unpredictable degradations. Although some recent methods utilize image diffusion models for VSR and have shown improved detail generation capabilities, they still struggle to produce temporally consistent frames. We attempt to use Stable Video Diffusion (SVD) combined with ControlNet to address this issue. H…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM SIGGRAPH 2025, Homepage: https://kongzhecn.github.io/projects/dam-vsr/ Github: https://github.com/kongzhecn/DAM-VSR

  9. arXiv:2507.00028  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    HiT-JEPA: A Hierarchical Self-supervised Trajectory Embedding Framework for Similarity Computation

    Authors: Lihuan Li, Hao Xue, Shuang Ao, Yang Song, Flora Salim

    Abstract: The representation of urban trajectory data plays a critical role in effectively analyzing spatial movement patterns. Despite considerable progress, the challenge of designing trajectory representations that can capture diverse and complementary information remains an open research problem. Existing methods struggle to incorporate fine-grained trajectory details and high-level summary in a singl…

    Submitted 17 June, 2025; originally announced July 2025.

  10. arXiv:2507.00018  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections

    Authors: Bo Wang, Qinyuan Cheng, Runyu Peng, Rong Bao, Peiji Li, Qipeng Guo, Linyang Li, Zhiyuan Zeng, Yunhua Zhou, Xipeng Qiu

    Abstract: Post-training processes are essential phases in grounding pre-trained language models to real-world tasks, with learning from demonstrations or preference signals playing a crucial role in this adaptation. We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. Through rigorous mathematical derivation, we…

    Submitted 15 June, 2025; originally announced July 2025.

  11. arXiv:2506.23918  [pdf, ps, other]

    cs.CV

    Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

    Authors: Zhaochen Su, Peng Xia, Hangyu Guo, Zhenhua Liu, Yan Ma, Xiaoye Qu, Jiaqi Liu, Yanshu Li, Kaide Zeng, Zhengyuan Yang, Linjie Li, Yu Cheng, Heng Ji, Junxian He, Yi R. Fung

    Abstract: Recent progress in multimodal reasoning has been significantly advanced by textual Chain-of-Thought (CoT), a paradigm where models conduct reasoning within language. This text-centric approach, however, treats vision as a static, initial context, creating a fundamental "semantic gap" between rich perceptual data and discrete symbolic thought. Human cognition often transcends language, utilizing vi…

    Submitted 3 July, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

    Comments: Preprint in progress. We maintain a real-time GitHub repository tracking progress at: https://github.com/zhaochen0110/Awesome_Think_With_Images

  12. arXiv:2506.23488  [pdf, ps, other]

    cs.NI

    Generative AI-enhanced Low-Altitude UAV-Mounted Stacked Intelligent Metasurfaces

    Authors: Geng Sun, Mingzhe Fan, Lei Zhang, Hongyang Pan, Jiahui Li, Chuang Zhang, Linyao Li, Changyuan Zhao, Chau Yuen

    Abstract: Wireless communication systems face significant challenges in meeting the increasing demands for higher data rates and more reliable connectivity in complex environments. Stacked intelligent metasurfaces (SIMs) have emerged as a promising technology for realizing wave-domain signal processing, with mobile SIMs offering superior communication performance compared to their fixed counterparts. In thi…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: This paper has been already submitted to TCCN

  13. arXiv:2506.23266  [pdf, ps, other]

    cs.LG

    Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging

    Authors: Lujun Li, Zhu Qiyuan, Jiacheng Wang, Wei Li, Hao Gu, Sirui Han, Yike Guo

    Abstract: Mixture of Experts (MoE) LLMs face significant obstacles due to their massive parameter scale, which imposes memory, storage, and deployment challenges. Although recent expert merging methods promise greater efficiency by consolidating multiple experts, they are fundamentally hindered by parameter conflicts arising from expert specialization. In this paper, we present Sub-MoE, a novel MoE compress…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Work in progress, revisions ongoing

  14. arXiv:2506.23263  [pdf, ps, other]

    cs.CV

    Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

    Authors: Lei-lei Li, Jianwu Fang, Junbin Xiao, Shanmin Pang, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

    Abstract: Egocentrically comprehending the causes and effects of car accidents is crucial for the safety of self-driving cars, and synthesizing causal-entity reflected accident videos can facilitate the capability test to respond to unaffordable accidents in reality. However, incorporating causal relations as seen in real-world videos into synthetic videos remains challenging. This work argues that precisely…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV2025

  15. arXiv:2506.22521  [pdf, ps, other]

    cs.CR cs.AI

    A Survey on Model Extraction Attacks and Defenses for Large Language Models

    Authors: Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong

    Abstract: Model extraction attacks pose significant security threats to deployed language models, potentially compromising intellectual property and user privacy. This survey provides a comprehensive taxonomy of LLM-specific extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks. We analyze various attack methodologies inclu…

    Submitted 26 June, 2025; originally announced June 2025.

  16. arXiv:2506.22467  [pdf]

    eess.SP cs.CV

    SegmentAnyMuscle: A universal muscle segmentation model across different locations in MRI

    Authors: Roy Colglazier, Jisoo Lee, Haoyu Dong, Hanxue Gu, Yaqian Chen, Joseph Cao, Zafer Yildiz, Zhonghao Liu, Nicholas Konz, Jichen Yang, Jikai Zhang, Yuwen Chen, Lin Li, Adrian Camarena, Maciej A. Mazurowski

    Abstract: The quantity and quality of muscles are increasingly recognized as important predictors of health outcomes. While MRI offers a valuable modality for such assessments, obtaining precise quantitative measurements of musculature remains challenging. This study aimed to develop a publicly available model for muscle segmentation in MRIs and demonstrate its applicability across various anatomical locati…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 24 pages, 6 figures

  17. arXiv:2506.21977  [pdf, ps, other]

    eess.IV cs.CV

    StableCodec: Taming One-Step Diffusion for Extreme Image Compression

    Authors: Tianyu Zhang, Xin Luo, Li Li, Dong Liu

    Abstract: Diffusion-based image compression has shown remarkable potential for achieving ultra-low bitrate coding (less than 0.05 bits per pixel) with high realism, by leveraging the generative priors of large pre-trained text-to-image diffusion models. However, current approaches require a large number of denoising steps at the decoder to generate realistic results under extreme bitrate constraints, limiti…

    Submitted 27 June, 2025; originally announced June 2025.

  18. arXiv:2506.21864  [pdf, ps, other]

    cs.CL cs.AI

    DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE

    Authors: Hang Shao, Heting Gao, Yunhang Shen, Jiawei Chen, Lijiang Li, Zuwei Long, Bo Tong, Ke Li, Xing Sun

    Abstract: Native multimodal large language models (MLLMs) restructure a single large language model (LLM) into a spoken language model (SLM) capable of both speech and text generation. Compared to modular and aligned MLLMs, native MLLMs preserve richer paralinguistic features such as emotion and prosody, and generate speech responses directly within the backbone LLM rather than using a separate speech decod…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Under Review

  19. arXiv:2506.21630  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    TOMD: A Trail-based Off-road Multimodal Dataset for Traversable Pathway Segmentation under Challenging Illumination Conditions

    Authors: Yixin Sun, Li Li, Wenke E, Amir Atapour-Abarghouei, Toby P. Breckon

    Abstract: Detecting traversable pathways in unstructured outdoor environments remains a significant challenge for autonomous robots, especially in critical applications such as wide-area search and rescue, as well as incident management scenarios like forest fires. Existing datasets and models primarily target urban settings or wide, vehicle-traversable off-road tracks, leaving a substantial gap in addressi…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 8 pages, 9 figures, 2025 IJCNN

  20. arXiv:2506.21263  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster

    Authors: Ji Qi, WenPeng Zhu, Li Li, Ming Wu, YingJun Wu, Wu He, Xun Gao, Jason Zeng, Michael Heinrich

    Abstract: The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper,…

    Submitted 26 June, 2025; originally announced June 2025.

  21. arXiv:2506.20966  [pdf, ps, other]

    cs.RO cs.AI

    Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends

    Authors: Tian-Yu Xiang, Ao-Qun Jin, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Sheng-Bin Duan, Fu-Chao Xie, Wen-Kai Wang, Si-Cheng Wang, Ling-Yun Li, Tian Tu, Zeng-Guang Hou

    Abstract: Vision-language-action (VLA) models extend vision-language models (VLM) by integrating action generation modules for robotic manipulation. Leveraging strengths of VLM in vision perception and instruction understanding, VLA models exhibit promising generalization across diverse manipulation tasks. However, applications demanding high precision and accuracy reveal performance gaps without further ad…

    Submitted 25 June, 2025; originally announced June 2025.

  22. arXiv:2506.20954  [pdf, ps, other]

    cs.RO

    Cooperative Circumnavigation for Multi-Quadrotor Systems via Onboard Sensing

    Authors: Xueming Liu, Lin Li, Xiang Zhou, Qingrui Zhang, Tianjiang Hu

    Abstract: A cooperative circumnavigation framework is proposed for multi-quadrotor systems to enclose and track a moving target without reliance on external localization systems. The distinct relationships between quadrotor-quadrotor and quadrotor-target interactions are evaluated using a heterogeneous perception strategy and corresponding state estimation algorithms. A modified Kalman filter is developed t…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 8 Pages, 7 figures. Accepted by RA-L

  23. arXiv:2506.20187  [pdf, ps, other]

    cs.OS cs.CR

    Breaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPU

    Authors: He Sun, Li Li, Mingjun Xiao, Chengzhong Xu

    Abstract: Advanced Large Language Models (LLMs) have achieved impressive performance across a wide range of complex and long-context natural language tasks. However, performing long-context LLM inference locally on a commodity GPU (a PC) with privacy concerns remains challenging due to the increasing memory demands of the key-value (KV) cache. Existing systems typically identify important tokens and selecti…

    Submitted 2 July, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: 15 pages, 23 figures

    MSC Class: 68M20 ACM Class: C.4

  24. arXiv:2506.20014  [pdf, ps, other]

    physics.app-ph astro-ph.IM cs.AR eess.SY physics.optics

    Development of an Open-Source Spacecraft Bus for the PULSE-A CubeSat

    Authors: Graydon Schulze-Kalt, Robert Pitu, Spencer Shelton, Catherine Todd, Zane Ebel, Ian Goldberg, Leon Gold, Henry Czarnecki, Mason McCormack, Larry Li, Zumi Riekse, Brian Yu, Akash Piya, Vidya Suri, Dylan Hu, Colleen Kim, John Baird, Seth Knights, Logan Hanssler, Michael Lembeck, Tian Zhong

    Abstract: The undergraduate-led Polarization-modUlated Laser Satellite Experiment (PULSE-A) at the University of Chicago seeks to demonstrate the feasibility of circular polarization shift keyed satellite-to-ground laser communication. PULSE-A's low-cost open-source bus serves as the backbone of the mission and has been designed in tandem with the Payload, with design driven by strict requirements for point…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Submitted to Advanced Technologies II at the 2025 SmallSat Conference, reference number SSC25-P1-42

  25. arXiv:2506.19591  [pdf, ps, other]

    cs.CV cs.AI cs.LG eess.IV

    Vision Transformer-Based Time-Series Image Reconstruction for Cloud-Filling Applications

    Authors: Lujun Li, Yiqun Wang, Radu State

    Abstract: Cloud cover in multispectral imagery (MSI) poses significant challenges for early season crop mapping, as it leads to missing or corrupted spectral information. Synthetic aperture radar (SAR) data, which is not affected by cloud interference, offers a complementary solution, but lacks sufficient spectral detail for precise crop mapping. To address this, we propose a novel framework, Time-series MSI…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted as a conference paper at the 2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

  26. arXiv:2506.18945  [pdf, ps, other]

    cs.LG cs.CL

    Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

    Authors: Zihan Wang, Rui Pan, Jiarui Yao, Robert Csordas, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li, Shiwei Liu

    Abstract: We propose Chain-of-Experts (CoE), a new Mixture-of-Experts (MoE) architecture that introduces sequential expert communication within each layer. Unlike traditional MoE models, where experts operate independently in parallel, CoE processes tokens iteratively across a chain of experts inside a layer. To support dynamic expert selection across iterations, CoE employs a dedicated router at each itera…

    Submitted 22 June, 2025; originally announced June 2025.

  27. arXiv:2506.18178  [pdf, ps, other]

    cs.RO

    Integrating LLMs and Digital Twins for Adaptive Multi-Robot Task Allocation in Construction

    Authors: Min Deng, Bo Fu, Lingyao Li, Xi Wang

    Abstract: Multi-robot systems are emerging as a promising solution to the growing demand for productivity, safety, and adaptability across industrial sectors. However, effectively coordinating multiple robots in dynamic and uncertain environments, such as construction sites, remains a challenge, particularly due to unpredictable factors like material delays, unexpected site conditions, and weather-induced d…

    Submitted 22 June, 2025; originally announced June 2025.

  28. arXiv:2506.17613  [pdf, ps, other]

    cs.DS cs.DB

    Contextual Pattern Mining and Counting

    Authors: Ling Li, Daniel Gibney, Sharma V. Thankachan, Solon P. Pissis, Grigorios Loukides

    Abstract: Given a string $P$ of length $m$, a longer string $T$ of length $n>m$, and two integers $l\geq 0$ and $r\geq 0$, the context of $P$ in $T$ is the set of all string pairs $(L,R)$, with $|L|=l$ and $|R|=r$, such that the string $LPR$ occurs in $T$. We introduce two problems related to the notion of context: (1) the Contextual Pattern Mining (CPM) problem, which given $T$, $(m,l,r)$, and an integer…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 27 pages, 15 figures

  29. arXiv:2506.17609  [pdf, ps, other]

    cs.CL cs.LG

    TyphoFormer: Language-Augmented Transformer for Accurate Typhoon Track Forecasting

    Authors: Lincan Li, Eren Erman Ozguven, Yue Zhao, Guang Wang, Yiqun Xie, Yushun Dong

    Abstract: Accurate typhoon track forecasting is crucial for early system warning and disaster response. While Transformer-based models have demonstrated strong performance in modeling the temporal dynamics of dense trajectories of humans and vehicles in smart cities, they usually lack access to broader contextual knowledge that enhances the forecasting reliability of sparse meteorological trajectories, such…

    Submitted 29 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

    Comments: Short research paper

  30. arXiv:2506.17232  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    PCaM: A Progressive Focus Attention-Based Information Fusion Method for Improving Vision Transformer Domain Adaptation

    Authors: Zelin Zang, Fei Wang, Liangyu Li, Jinlin Wu, Chunshui Zhao, Zhen Lei, Baigui Sun

    Abstract: Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Recent UDA methods based on Vision Transformers (ViTs) have achieved strong performance through attention-based feature alignment. However, we identify a key limitation: foreground object mismatch, where the discrepancy in foreground object size and spatial distribution acros…

    Submitted 27 May, 2025; originally announced June 2025.

  31. arXiv:2506.17204  [pdf, ps, other]

    cs.LG cs.AI

    Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

    Authors: Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao

    Abstract: Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyo…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted to ICML 2025

  32. arXiv:2506.16995  [pdf, ps, other]

    cs.AI

    Elevating Styled Mahjong Agents with Learning from Demonstration

    Authors: Lingfeng Li, Yunlong Lu, Yongyi Wang, Wenxin Li

    Abstract: A wide variety of bots in games enriches the gameplay experience and enhances replayability. Recent advancements in game artificial intelligence have predominantly focused on improving the proficiency of bots. Nevertheless, developing highly competent bots with a wide range of distinct play styles remains a relatively under-explored area. We select the Mahjong game environment as a case study. The…

    Submitted 20 June, 2025; originally announced June 2025.

  33. arXiv:2506.16803  [pdf, ps, other]

    eess.IV cs.CV

    Temperature calibration of surface emissivities with an improved thermal image enhancement network

    Authors: Ning Chu, Siya Zheng, Shanqing Zhang, Li Li, Caifang Cai, Ali Mohammad-Djafari, Feng Zhao, Yuanbo Song

    Abstract: Infrared thermography faces persistent challenges in temperature accuracy due to material emissivity variations, where existing methods often neglect the joint optimization of radiometric calibration and image degradation. This study introduces a physically guided neural framework that unifies temperature correction and image enhancement through a symmetric skip-CNN architecture and an emissivity-…

    Submitted 20 June, 2025; originally announced June 2025.

  34. arXiv:2506.16495  [pdf, ps, other]

    cs.MM cs.CV

    DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation

    Authors: Changsheng Gao, Zijie Liu, Li Li, Dong Liu, Xiaoyan Sun, Weisi Lin

    Abstract: Like image coding in visual data transmission, feature coding is essential for the distributed deployment of large models by significantly reducing transmission and storage overhead. However, prior studies have mostly targeted task- or model-specific scenarios, leaving the challenge of universal feature coding across diverse large models largely unaddressed. In this paper, we present the first sys…

    Submitted 19 June, 2025; originally announced June 2025.

  35. arXiv:2506.16102  [pdf, ps, other]

    eess.IV cs.CV

    Fast Training-free Perceptual Image Compression

    Authors: Ziran Zhu, Tongda Xu, Minye Huang, Dailan He, Xingtong Ge, Xinjie Zhang, Ling Li, Yan Wang

    Abstract: Training-free perceptual image codecs adopt a pre-trained unconditional generative model during decoding to avoid training a new conditional generative model. However, they rely heavily on diffusion inversion or sample communication, which take from 1 min to an intractable amount of time to decode a single image. In this paper, we propose a training-free algorithm that improves the perceptual quality of any ex…

    Submitted 19 June, 2025; originally announced June 2025.

  36. arXiv:2506.15959  [pdf]

    cs.DL physics.soc-ph

    Can Recombination Displace Dominant Scientific Ideas

    Authors: Linzhuo Li, Yiling Lin, Lingfei Wu

    Abstract: Scientific breakthroughs are widely attributed to the novel recombination of existing ideas. Yet despite explosive global growth in scientific labor and publications -- creating more opportunities to reconfigure knowledge -- the rate of breakthroughs has not kept pace. To investigate this disconnect, we analyze 49 million scholarly works from 1960 to 2024 using measures of atypical recombination a…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 10 figures

  37. arXiv:2506.15929  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    MoiréXNet: Adaptive Multi-Scale Demoiréing with Linear Attention Test-Time Training and Truncated Flow Matching Prior

    Authors: Liangyan Li, Yimo Ning, Kevin Le, Wei Dong, Yunzhe Li, Jun Chen, Xiaohong Liu

    Abstract: This paper introduces a novel framework for image and video demoiréing by integrating Maximum A Posteriori (MAP) estimation with advanced deep learning techniques. Demoiréing addresses inherently nonlinear degradation processes, which pose significant challenges for existing methods. Traditional supervised learning approaches either fail to remove moiré patterns completely or produce overly smoo…

    Submitted 18 June, 2025; originally announced June 2025.

  38. arXiv:2506.15923  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    PNCS: Power-Norm Cosine Similarity for Diverse Client Selection in Federated Learning

    Authors: Liangyan Li, Yangyi Liu, Yimo Ning, Stefano Rini, Jun Chen

    Abstract: Federated Learning (FL) has emerged as a powerful paradigm for leveraging diverse datasets from multiple sources while preserving data privacy by avoiding centralized storage. However, many existing approaches fail to account for the intricate gradient correlations between remote clients, a limitation that becomes especially problematic in data heterogeneity scenarios. In this work, we propose a n…

    Submitted 18 June, 2025; originally announced June 2025.

  39. arXiv:2506.15544  [pdf, ps, other]

    cs.LG

    Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning

    Authors: Roger Creus Castanyer, Johan Obando-Ceron, Lu Li, Pierre-Luc Bacon, Glen Berseth, Aaron Courville, Pablo Samuel Castro

    Abstract: Scaling deep reinforcement learning networks is challenging and often results in degraded performance, yet the root causes of this failure mode remain poorly understood. Several recent works have proposed mechanisms to address this, but they are often complex and fail to highlight the causes underlying this difficulty. In this work, we conduct a series of empirical analyses which suggest that the…

    Submitted 18 June, 2025; originally announced June 2025.

  40. arXiv:2506.14805  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?

    Authors: Yang Yao, Lingyu Li, Jiaxin Song, Chiyu Chen, Zhenqi He, Yixu Wang, Xin Wang, Tianle Gu, Jie Li, Yan Teng, Yingchun Wang

    Abstract: As Multimodal Large Language Models (MLLMs) continue to evolve, their cognitive and reasoning capabilities have seen remarkable progress. However, challenges in visual fine-grained perception and commonsense causal inference persist. This paper introduces Argus Inspection, a multimodal benchmark with two levels of difficulty, emphasizing detailed visual recognition while incorporating real-world c…

    Submitted 3 June, 2025; originally announced June 2025.

  41. arXiv:2506.14674  [pdf, ps, other]

    cs.CV

    Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models

    Authors: Ling Li, Yao Zhou, Yuxuan Liang, Fugee Tsung, Jiaheng Wei

    Abstract: Previous methods for image geo-localization have typically treated the task as either classification or retrieval, often relying on black-box decisions that lack interpretability. The rise of large vision-language models (LVLMs) has enabled a rethinking of geo-localization as a reasoning-driven task grounded in visual cues. However, two major challenges persist. On the data side, existing reasonin…

    Submitted 17 June, 2025; originally announced June 2025.

  42. arXiv:2506.14299  [pdf, ps, other]

    cs.AI

    ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems

    Authors: Fanzhi Zeng, Siqi Wang, Chuzhao Zhu, Li Li

    Abstract: How to construct an interpretable autonomous driving decision-making system has become a focal point in academic research. In this study, we propose a novel approach that leverages large language models (LLMs) to generate executable, rule-based decision systems to address this challenge. Specifically, harnessing the strong reasoning and programming capabilities of LLMs, we introduce the ADRD(LLM-D…

    Submitted 17 June, 2025; originally announced June 2025.

  43. arXiv:2506.14246  [pdf, ps, other]

    cs.AI

    Mxplainer: Explain and Learn Insights by Imitating Mahjong Agents

    Authors: Lingfeng Li, Yunlong Lu, Yongyi Wang, Qifan Zheng, Wenxin Li

    Abstract: People need to internalize the skills of AI agents to improve their own capabilities. Our paper focuses on Mahjong, a multiplayer game involving imperfect information and requiring effective long-term decision-making amidst randomness and hidden information. Through the efforts of AI researchers, several impressive Mahjong AI agents have already achieved performance levels comparable to those of p…

    Submitted 17 June, 2025; originally announced June 2025.

  44. arXiv:2506.14116  [pdf, other]

    cs.RO

    Haptic-Based User Authentication for Tele-robotic System

    Authors: Rongyu Yu, Kan Chen, Zeyu Deng, Chen Wang, Burak Kizilkaya, Liying Emma Li

    Abstract: Tele-operated robots rely on real-time user behavior mapping for remote tasks, but ensuring secure authentication remains a challenge. Traditional methods, such as passwords and static biometrics, are vulnerable to spoofing and replay attacks, particularly in high-stakes, continuous interactions. This paper presents a novel anti-spoofing and anti-replay authentication approach that leverages disti…

    Submitted 16 June, 2025; originally announced June 2025.

  45. arXiv:2506.13585  [pdf, ps, other]

    cs.CL cs.LG

    MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

    Authors: MiniMax: Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li, Enwei Jiao, Haigang Zhou, et al. (103 additional authors not shown)

    Abstract: We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: A technical report from MiniMax. The authors are listed in alphabetical order. We open-source our MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1

  46. arXiv:2506.13192  [pdf, ps, other]

    cs.CL cs.AI

    Breaking Thought Patterns: A Multi-Dimensional Reasoning Framework for LLMs

    Authors: Xintong Tang, Meiru Zhang, Shang Xiao, Junzhao Jin, Zihan Zhao, Liwei Li, Yang Zheng, Bangyi Wu

    Abstract: Large language models (LLMs) are often constrained by rigid reasoning processes, limiting their ability to generate creative and diverse responses. To address this, a novel framework called LADDER is proposed, combining Chain-of-Thought (CoT) reasoning, Mixture of Experts (MoE) models, and multi-dimensional up/down-sampling strategies which breaks the limitations of traditional LLMs. First, CoT re…

    Submitted 16 June, 2025; originally announced June 2025.

  47. arXiv:2506.13143  [pdf, ps, other]

    cs.CL

    CMU's IWSLT 2025 Simultaneous Speech Translation System

    Authors: Siqi Ouyang, Xi Xu, Lei Li

    Abstract: This paper presents CMU's submission to the IWSLT 2025 Simultaneous Speech Translation (SST) task for translating unsegmented English speech into Chinese and German text in a streaming manner. Our end-to-end speech-to-text system integrates a chunkwise causal Wav2Vec 2.0 speech encoder, an adapter, and the Qwen2.5-7B-Instruct as the decoder. We use a two-stage simultaneous training procedure on ro…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: IWSLT 2025 System Description

  48. arXiv:2506.12760  [pdf, ps, other]

    cs.SE

    IDOL: Improved Different Optimization Levels Testing for Solidity Compilers

    Authors: Lantian Li, Yejian Liang, Zhongxing Yu

    Abstract: As blockchain technology continues to evolve and mature, smart contracts have become a key driving force behind the digitization and automation of transactions. Smart contracts greatly simplify and refine the traditional business transaction processes, and thus have had a profound impact on various industries such as finance and supply chain management. However, because smart contracts cannot be m…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: Accepted by QRS 2025 (Fast Abstracts track)

  49. arXiv:2506.12756  [pdf, ps, other]

    cs.IR cs.LG

    Hierarchical Group-wise Ranking Framework for Recommendation Models

    Authors: YaChen Yan, Liubo Li, Ravi Choudhary

    Abstract: In modern recommender systems, CTR/CVR models are increasingly trained with ranking objectives to improve item ranking quality. While this shift aligns training more closely with serving goals, most existing methods rely on in-batch negative sampling, which predominantly surfaces easy negatives. This limits the model's ability to capture fine-grained user preferences and weakens overall ranking pe…

    Submitted 15 June, 2025; originally announced June 2025.

  50. arXiv:2506.12355  [pdf, ps, other]

    cs.LG cs.CL

    QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm

    Authors: Qirui Zhou, Shaohui Peng, Weiqiang Xiong, Haixin Chen, Yuanbo Wen, Haochen Li, Ling Li, Qi Guo, Yongwei Zhao, Ke Gao, Ruizhi Chen, Yanjun Wu, Chen Zhao, Yunji Chen

    Abstract: The attention operator remains a critical performance bottleneck in large language models (LLMs), particularly for long-context scenarios. While FlashAttention is the most widely used and effective GPU-aware acceleration algorithm, it must require time-consuming and hardware-specific manual implementation, limiting adaptability across GPU architectures. Existing LLMs have shown a lot of promise in…

    Submitted 14 June, 2025; originally announced June 2025.

    ACM Class: I.2.7