
Showing 1–50 of 233 results for author: Zhong, H

  1. arXiv:2506.20214  [pdf, ps, other]

    cs.CV cs.MM

    UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation

    Authors: Yanzhe Chen, Huasong Zhong, Yan Li, Zhenheng Yang

    Abstract: Unified multimodal large language models (MLLMs) have shown promise in jointly advancing multimodal understanding and generation, with visual codebooks discretizing images into tokens for autoregressive modeling. Existing codebook-based methods either rely on small vocabularies (~16K entries) that lack fine-grained semantics or naively scale up, resulting in low token utilization and unstable trai…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 19 pages, 5 figures

  2. arXiv:2506.19262  [pdf, ps, other]

    cs.CL cs.LG

    What Matters in LLM-generated Data: Diversity and Its Effect on Model Fine-Tuning

    Authors: Yuchang Zhu, Huazhen Zhong, Qunshu Lin, Haotong Wei, Xiaolong Sun, Zixuan Yu, Minghao Liu, Zibin Zheng, Liang Chen

    Abstract: With the remarkable generative capabilities of large language models (LLMs), using LLM-generated data to train downstream models has emerged as a promising approach to mitigate data scarcity in specific domains and reduce time-consuming annotations. However, recent studies have highlighted a critical issue: iterative training on self-generated data results in model collapse, where model performanc…

    Submitted 24 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: Ongoing work

  3. arXiv:2506.09940  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability

    Authors: Jiachen Hu, Rui Ai, Han Zhong, Xiaoyu Chen, Liwei Wang, Zhaoran Wang, Zhuoran Yang

    Abstract: Information asymmetry is a pervasive feature of multi-agent systems, especially evident in economics and social sciences. In these settings, agents tailor their actions based on private information to maximize their rewards. These strategic behaviors often introduce complexities due to confounding variables. Simultaneously, knowledge transportability poses another significant challenge, arising fr…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted at ICML 2025

  4. arXiv:2505.23793  [pdf, ps, other]

    cs.CR cs.AI

    USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models

    Authors: Baolin Zheng, Guanlin Chen, Hongqiong Zhong, Qingyang Teng, Yingshui Tan, Zhendong Liu, Weixun Wang, Jiaheng Liu, Jian Yang, Huiyun Jing, Jincheng Wei, Wenbo Su, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang

    Abstract: Despite their remarkable achievements and widespread adoption, Multimodal Large Language Models (MLLMs) have revealed significant security vulnerabilities, highlighting the urgent need for robust safety evaluation benchmarks. Existing MLLM safety benchmarks, however, fall short in terms of data quality and coverage, and modal risk combinations, resulting in inflated and contradictory evaluation res…

    Submitted 26 May, 2025; originally announced May 2025.

  5. arXiv:2505.21457  [pdf, ps, other]

    cs.CV cs.AI

    Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

    Authors: Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, Mingyu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen

    Abstract: Active vision, also known as active perception, refers to the process of actively selecting where and how to look in order to gather task-relevant information. It is a critical component of efficient perception and decision-making in humans and advanced embodied agents. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Project Page: https://aim-uofa.github.io/ACTIVE-o3

  6. arXiv:2505.21154  [pdf, other]

    cs.MA cs.AI cs.CY

    GGBond: Growing Graph-Based AI-Agent Society for Socially-Aware Recommender Simulation

    Authors: Hailin Zhong, Hanlin Wang, Yujun Ye, Meiyi Zhang, Shengxin Zhu

    Abstract: Current personalized recommender systems predominantly rely on static offline data for algorithm design and evaluation, significantly limiting their ability to capture long-term user preference evolution and social influence dynamics in real-world scenarios. To address this fundamental challenge, we propose a high-fidelity social simulation platform integrating human-like cognitive agents and dyna…

    Submitted 27 May, 2025; originally announced May 2025.

  7. arXiv:2505.20256  [pdf, ps, other]

    cs.CV

    Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

    Authors: Hao Zhong, Muzhi Zhu, Zongze Du, Zheng Huang, Canyu Zhao, Mingyu Liu, Wen Wang, Hao Chen, Chunhua Shen

    Abstract: Long-horizon video-audio reasoning and fine-grained pixel understanding impose conflicting requirements on omnimodal models: dense temporal coverage demands many low-resolution frames, whereas precise grounding calls for high-resolution inputs. We tackle this trade-off with a two-system architecture: a Global Reasoning System selects informative keyframes and rewrites the task at low spatial cost,…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Project page: https://aim-uofa.github.io/OmniR1

  8. arXiv:2505.18933  [pdf, ps, other]

    cs.AI cs.CL

    REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing

    Authors: Haitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu, Shu Wu, Zhe Zhao, Liang Wang, Tieniu Tan

    Abstract: Large language model editing methods frequently suffer from overfitting, wherein factual updates can propagate beyond their intended scope, overemphasizing the edited target even when it's contextually inappropriate. To address this challenge, we introduce REACT (Representation Extraction And Controllable Tuning), a unified two-phase framework designed for precise and controllable knowledge editin…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 15 pages, 4 figures

  9. arXiv:2505.18493  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Statistical Inference under Performativity

    Authors: Xiang Li, Yunai Li, Huiying Zhong, Lihua Lei, Zhun Deng

    Abstract: Performativity of predictions refers to the phenomenon that prediction-informed decisions may influence the target they aim to predict, which is widely observed in policy-making in social sciences and economics. In this paper, we initiate the study of statistical inference under performativity. Our contribution is two-fold. First, we build a central limit theorem for estimation and inference under…

    Submitted 18 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  10. arXiv:2505.12878  [pdf, ps, other]

    cs.PL cs.SE

    NEAT: QCP: A Practical Separation Logic-based C Program Verification Tool

    Authors: Xiwei Wu, Yueyang Feng, Xiaoyang Lu, Tianchuan Lin, Kan Liu, Zhiyi Wang, Shushu Wu, Lihan Xie, Chengxi Yang, Hongyi Zhong, Naijun Zhan, Zhenjiang Hu, Qinxiang Cao

    Abstract: As software systems increase in size and complexity dramatically, ensuring their correctness, security, and reliability becomes an increasingly formidable challenge. Despite significant advancements in verification techniques and tools, there still remain substantial difficulties when applying these tools to complex, real-world scenarios. To address these d…

    Submitted 19 May, 2025; originally announced May 2025.

  11. arXiv:2505.12648  [pdf, ps, other]

    cs.RO

    SafeMove-RL: A Certifiable Reinforcement Learning Framework for Dynamic Motion Constraints in Trajectory Planning

    Authors: Tengfei Liu, Haoyang Zhong, Jiazheng Hu, Tan Zhang

    Abstract: This study presents a dynamic safety margin-based reinforcement learning framework for local motion planning in dynamic and uncertain environments. The proposed planner integrates real-time trajectory optimization with adaptive gap analysis, enabling effective feasibility assessment under partial observability constraints. To address safety-critical computations in unknown scenarios, an enhanced o…

    Submitted 18 May, 2025; originally announced May 2025.

  12. arXiv:2505.12250  [pdf, ps, other]

    cs.CL cs.AI

    Not All Documents Are What You Need for Extracting Instruction Tuning Data

    Authors: Chi Zhang, Huaping Zhong, Hongtao Li, Chengliang Chai, Jiawei Hong, Yuhao Deng, Jiacheng Wang, Tian Tan, Yizhou Yan, Jiantao Qiu, Ye Yuan, Guoren Wang, Conghui He, Lei Cao

    Abstract: Instruction tuning improves the performance of large language models (LLMs), but it heavily relies on high-quality training data. Recently, LLMs have been used to synthesize instruction data using seed question-answer (QA) pairs. However, these synthesized instructions often lack diversity and tend to be similar to the input seeds, limiting their applicability in real-world scenarios. To address t…

    Submitted 18 May, 2025; originally announced May 2025.

  13. arXiv:2505.07855  [pdf, ps, other]

    cs.RO

    A Physics-informed End-to-End Occupancy Framework for Motion Planning of Autonomous Vehicles

    Authors: Shuqi Shen, Junjie Yang, Hongliang Lu, Hui Zhong, Qiming Zhang, Xinhu Zheng

    Abstract: Accurate and interpretable motion planning is essential for autonomous vehicles (AVs) navigating complex and uncertain environments. While recent end-to-end occupancy prediction methods have improved environmental understanding, they typically lack explicit physical constraints, limiting safety and generalization. In this paper, we propose a unified end-to-end framework that integrates verifiable…

    Submitted 6 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  14. arXiv:2505.03775  [pdf, other]

    cs.LG

    Hierarchical Multi-Label Generation with Probabilistic Level-Constraint

    Authors: Linqing Chen, Weilei Wang, Wentao Wu, Hanmeng Zhong

    Abstract: Hierarchical Extreme Multi-Label Classification poses greater difficulties compared to traditional multi-label classification because of the intricate hierarchical connections of labels within a domain-specific taxonomy and the substantial number of labels. Some of the prior research endeavors centered on classifying text through several ancillary stages such as the cluster algorithm and multiphas…

    Submitted 30 April, 2025; originally announced May 2025.

  15. arXiv:2505.01538  [pdf, other]

    cs.DB cs.CR cs.IR cs.LG

    HoneyBee: Efficient Role-based Access Control for Vector Databases via Dynamic Partitioning

    Authors: Hongbin Zhong, Matthew Lentz, Nina Narodytska, Adriana Szekeres, Kexin Rong

    Abstract: As vector databases gain traction in enterprise applications, robust access control has become critical to safeguard sensitive data. Access control in these systems is often implemented through hybrid vector queries, which combine nearest neighbor search on vector data with relational predicates based on user permissions. However, existing approaches face significant trade-offs: creating dedicated…

    Submitted 2 May, 2025; originally announced May 2025.

    ACM Class: H.2.4; H.3.3; D.4.6

  16. arXiv:2504.18383  [pdf, other]

    cs.IR cs.AI

    Bridge the Domains: Large Language Models Enhanced Cross-domain Sequential Recommendation

    Authors: Qidong Liu, Xiangyu Zhao, Yejing Wang, Zijian Zhang, Howard Zhong, Chong Chen, Xiang Li, Wei Huang, Feng Tian

    Abstract: Cross-domain Sequential Recommendation (CDSR) aims to extract the preference from the user's historical interactions across various domains. Despite some progress in CDSR, two problems set the barrier for further advancements, i.e., overlap dilemma and transition complexity. The former means existing CDSR methods severely rely on users who own interactions on all domains to learn cross-domain item…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: accepted by SIGIR'25

  17. arXiv:2504.15295  [pdf, other]

    cs.DC

    High-Efficiency Split Computing for Cooperative Edge Systems: A Novel Compressed Sensing Bottleneck

    Authors: Hailin Zhong, Donglong Chen

    Abstract: The advent of big data and AI has precipitated a demand for computational frameworks that ensure real-time performance, accuracy, and privacy. While edge computing mitigates latency and privacy concerns, its scalability is constrained by the resources of edge devices, thus prompting the adoption of split computing (SC) to address these limitations. However, SC faces challenges in (1) efficient data…

    Submitted 15 April, 2025; originally announced April 2025.

  18. arXiv:2504.12342  [pdf, other]

    cs.CL

    Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation

    Authors: Hanmeng Zhong, Linqing Chen, Weilei Wang, Wentao Wu

    Abstract: Recently, the application of the retrieval-augmented Large Language Models (LLMs) in specific domains has gained significant attention, especially in biopharmaceuticals. However, in this context, there is no benchmark specifically designed for biopharmaceuticals to evaluate LLMs. In this paper, we introduce the Biopharmaceuticals Retrieval-Augmented Generation Evaluation (BRAGE), the first benchm…

    Submitted 15 April, 2025; originally announced April 2025.

  19. arXiv:2504.12341  [pdf, other]

    cs.CL

    Streamlining Biomedical Research with Specialized LLMs

    Authors: Linqing Chen, Weilei Wang, Yubin Xia, Wentao Wu, Peng Xu, Zilong Bai, Jie Fang, Chaobo Xu, Ran Hu, Licong Xu, Haoran Hua, Jing Sun, Hanmeng Zhong, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yong Gu, Tao Shi, Chaochao Wang, Jianping Lu, Cheng Sun, Yixin Wang, et al. (8 additional authors not shown)

    Abstract: In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, imag…

    Submitted 15 April, 2025; originally announced April 2025.

    Journal ref: Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations, pp. 9–19, 2025

  20. arXiv:2504.00490  [pdf]

    cs.CV

    SCFANet: Style Distribution Constraint Feature Alignment Network For Pathological Staining Translation

    Authors: Zetong Chen, Yuzhuo Chen, Hai Zhong, Xu Qiao

    Abstract: Immunohistochemical (IHC) staining serves as a valuable technique for detecting specific antigens or proteins through antibody-mediated visualization. However, the IHC staining process is both time-consuming and costly. To address these limitations, the application of deep learning models for direct translation of cost-effective Hematoxylin and Eosin (H&E) stained images into IHC stained images ha…

    Submitted 1 April, 2025; originally announced April 2025.

  21. arXiv:2503.18421  [pdf, other]

    cs.CV eess.IV

    4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video

    Authors: Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang

    Abstract: 3D Gaussian Splatting (3DGS) has substantial potential for enabling photorealistic Free-Viewpoint Video (FVV) experiences. However, the vast number of Gaussians and their associated attributes poses significant challenges for storage and transmission. Existing methods typically handle dynamic 3DGS representation and compression separately, neglecting motion information and the rate-distortion (RD)…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: CVPR2025

  22. arXiv:2503.17750  [pdf, other]

    cs.CV cs.MM

    Serial Low-rank Adaptation of Vision Transformer

    Authors: Houqiang Zhong, Shaocheng Shen, Ke Cai, Zhenglong Wu, Jiangchao Yao, Yuan Cheng, Xuefei Li, Xiaoyun Zhang, Li Song, Qiang Hu

    Abstract: Fine-tuning large pre-trained vision foundation models in a parameter-efficient manner is critical for downstream vision tasks, considering the practical constraints of computational and storage costs. Low-rank adaptation (LoRA) is a well-established technique in this domain, achieving impressive efficiency by reducing the parameter space to a low-rank form. However, developing more advanced low-r…

    Submitted 22 March, 2025; originally announced March 2025.

  23. arXiv:2503.13383  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning

    Authors: Mengyao Lyu, Yan Li, Huasong Zhong, Wenhao Yang, Hui Chen, Jungong Han, Guiguang Ding, Zhenheng Yang

    Abstract: The hypothesis that pretrained large language models (LLMs) necessitate only minimal supervision during the fine-tuning (SFT) stage (Zhou et al., 2024) has been substantiated by recent advancements in data curation and selection research. However, their stability and generalizability are compromised due to the vulnerability to experimental setups and validation protocols, falling short of surpassi…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: update comparison with sota and analysis

  24. arXiv:2503.04766  [pdf]

    cs.CY

    Global AI Governance: Where the Challenge is the Solution- An Interdisciplinary, Multilateral, and Vertically Coordinated Approach

    Authors: Huixin Zhong, Thao Do, Ynagliu Jie, Rostam J. Neuwirth, Hong Shen

    Abstract: Current global AI governance frameworks struggle with fragmented disciplinary collaboration, ineffective multilateral coordination, and disconnects between policy design and grassroots implementation. This study, guided by Integration and Implementation Science (IIS) initiated a structured interdisciplinary dialogue at the UN Science Summit, convening legal, NGO, and HCI experts to tackle those ch…

    Submitted 12 February, 2025; originally announced March 2025.

  25. arXiv:2503.04376  [pdf, other]

    cs.CV

    MIDAS: Modeling Ground-Truth Distributions with Dark Knowledge for Domain Generalized Stereo Matching

    Authors: Peng Xu, Zhiyu Xiang, Jingyun Fu, Tianyu Pu, Hanzhi Zhong, Eryun Liu

    Abstract: Despite the significant advances in domain generalized stereo matching, existing methods still exhibit domain-specific preferences when transferring from synthetic to real domains, hindering their practical applications in complex and diverse scenarios. The probability distributions predicted by the stereo network naturally encode rich similarity and uncertainty information. Inspired by this obser…

    Submitted 6 March, 2025; originally announced March 2025.

  26. arXiv:2502.19971  [pdf, ps, other]

    quant-ph cs.AI

    Efficient and Universal Neural-Network Decoder for Stabilizer-Based Quantum Error Correction

    Authors: Gengyuan Hu, Wanli Ouyang, Chao-Yang Lu, Chen Lin, Han-Sen Zhong

    Abstract: Scaling quantum computing to practical applications necessitates reliable quantum error correction. Although numerous correction codes have been proposed, the overall correction efficiency is critically limited by the decoding algorithms. We introduce GraphQEC, a code-agnostic decoder leveraging machine-learning on the graph structure of stabilizer codes with linear time complexity. GraphQEC demonstrat…

    Submitted 3 June, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  27. arXiv:2502.18957  [pdf, other]

    cs.IT

    A Dynamic UAVs Cooperative Suppressive Jamming Method with Joint Task Assignment and Bandwidth Allocation

    Authors: Ruiqing Han, Tianxian Zhang, Han Zhong, Yuanhang Wang

    Abstract: The low detectability and low cost of unmanned aerial vehicles (UAVs) allow them to swarm near the radar network for effective jamming. The key to jamming is the reasonable task assignment and resource allocation of UAVs. However, the existing allocation model is somewhat ideal, weakly adaptive to the dynamic environment, and rarely considers frequency matching, which cannot suppress the frequency…

    Submitted 26 February, 2025; originally announced February 2025.

  28. arXiv:2502.15610  [pdf, ps, other]

    cs.LG cs.AI

    A general language model for peptide identification

    Authors: Jixiu Zhai, Tianchi Lu, Haitian Zhong, Ziyang Xu, Yuhuan Liu, Shengrui Xu, Jingwan Wang, Dan Huang

    Abstract: Accurate identification of bioactive peptides (BPs) and protein post-translational modifications (PTMs) is essential for understanding protein function and advancing therapeutic discovery. However, most computational methods remain limited in their generalizability across diverse peptide functions. Here, we present PDeepPP, a unified deep learning framework that integrates pretrained protein langu…

    Submitted 30 June, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 24 pages, 9 figures, 4 tables, submitted to arXiv

    MSC Class: 92C40; 68T07

    ACM Class: I.2.6; J.3

  29. arXiv:2502.15246  [pdf, other]

    cs.SE

    An approach for API synthesis using large language models

    Authors: Hua Zhong, Shan Jiang, Sarfraz Khurshid

    Abstract: APIs play a pivotal role in modern software development by enabling seamless communication and integration between various systems, applications, and services. Component-based API synthesis is a form of program synthesis that constructs an API by assembling predefined components from a library. Existing API synthesis techniques typically implement dedicated search strategies over bounded spaces of…

    Submitted 21 February, 2025; originally announced February 2025.

  30. arXiv:2502.14560  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Less is More: Improving LLM Alignment via Preference Data Selection

    Authors: Xun Deng, Han Zhong, Rui Ai, Fuli Feng, Zheng Wang, Xiangnan He

    Abstract: Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. While prior work mainly extends DPO from the aspect of the objective function, we instead improve DPO from the largely overlooked but critical aspect of data selection. Specifically, we address the issue of parameter shrinkage caused by noisy data by proposing a novel…

    Submitted 14 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  31. arXiv:2502.13923  [pdf, other]

    cs.CV cs.CL

    Qwen2.5-VL Technical Report

    Authors: Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang , et al. (2 additional authors not shown)

    Abstract: We introduce Qwen2.5-VL, the latest flagship model of Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities. Qwen2.5-VL achieves a major leap forward in understanding and interacting with the world through enhanced visual recognition, precise object localization, robust document parsing, and long-video comprehensio…

    Submitted 19 February, 2025; originally announced February 2025.

  32. arXiv:2502.08985  [pdf, other]

    cs.LG cs.AI cs.MA

    Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning

    Authors: Xun Wang, Zhuoran Li, Hai Zhong, Longbo Huang

    Abstract: As a data-driven approach, offline MARL learns superior policies solely from offline datasets, ideal for domains rich in historical data but with high interaction costs and risks. However, most existing methods are task-specific, requiring retraining for new tasks, leading to redundancy and inefficiency. To address this issue, in this paper, we propose a task-efficient multi-task offline MARL algo…

    Submitted 13 February, 2025; originally announced February 2025.

  33. arXiv:2502.08659  [pdf, other]

    cs.RO

    Deployment-friendly Lane-changing Intention Prediction Powered by Brain-inspired Spiking Neural Networks

    Authors: Shuqi Shen, Junjie Yang, Hui Zhong, Hongliang Lu, Xinhu Zheng, Hai Yang

    Abstract: Accurate and real-time prediction of surrounding vehicles' lane-changing intentions is a critical challenge in deploying safe and efficient autonomous driving systems in open-world scenarios. Existing high-performing methods remain hard to deploy due to their high computational cost, long training times, and excessive memory requirements. Here, we propose an efficient lane-changing intention predi…

    Submitted 8 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

  34. arXiv:2502.06777  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    Learning an Optimal Assortment Policy under Observational Data

    Authors: Yuxuan Han, Han Zhong, Miao Lu, Jose Blanchet, Zhengyuan Zhou

    Abstract: We study the fundamental problem of offline assortment optimization under the Multinomial Logit (MNL) model, where sellers must determine the optimal subset of the products to offer based solely on historical customer choice data. While most existing approaches to learning-based assortment optimization focus on the online learning of the optimal assortment through repeated interactions with custom…

    Submitted 15 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  35. arXiv:2501.18858  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

    Authors: Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping…

    Submitted 6 June, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: ICML 2025

  36. arXiv:2501.13630  [pdf, other]

    cs.MM

    VARFVV: View-Adaptive Real-Time Interactive Free-View Video Streaming with Edge Computing

    Authors: Qiang Hu, Qihan He, Houqiang Zhong, Guo Lu, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang

    Abstract: Free-view video (FVV) allows users to explore immersive video content from multiple views. However, delivering FVV poses significant challenges due to the uncertainty in view switching, combined with the substantial bandwidth and computational resources required to transmit and decode multiple video streams, which may result in frequent playback interruptions. Existing approaches, either client-ba…

    Submitted 23 January, 2025; originally announced January 2025.

  37. arXiv:2501.12942  [pdf, other]

    cs.AI

    Offline Critic-Guided Diffusion Policy for Multi-User Delay-Constrained Scheduling

    Authors: Zhuoran Li, Ruishuo Chen, Hai Zhong, Longbo Huang

    Abstract: Effective multi-user delay-constrained scheduling is crucial in various real-world applications, such as instant messaging, live streaming, and data center management. In these scenarios, schedulers must make real-time decisions to satisfy both delay and resource constraints without prior knowledge of system dynamics, which are often time-varying and challenging to estimate. Current learning-based…

    Submitted 22 January, 2025; originally announced January 2025.

  38. arXiv:2501.06884  [pdf, other]

    cs.CV

    Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning

    Authors: Hanwen Zhong, Jiaxin Chen, Yutong Zhang, Di Huang, Yunhong Wang

    Abstract: Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously. Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning. However, their rigid combination hampers both the optimization of MoE and the effectiv…

    Submitted 12 January, 2025; originally announced January 2025.

    Comments: Accepted by the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

  39. arXiv:2412.14571  [pdf, other]

    cs.CV cs.AI eess.IV

    SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection

    Authors: Ruoyu Xu, Zhiyu Xiang, Chenwei Zhang, Hanzhi Zhong, Xijun Zhao, Ruina Dang, Peng Xu, Tianyu Pu, Eryun Liu

    Abstract: 3D object detection is one of the fundamental perception tasks for autonomous vehicles. Fulfilling such a task with a 4D millimeter-wave radar is very attractive since the sensor is able to acquire 3D point clouds similar to Lidar while maintaining robust measurements under adverse weather. However, due to the high sparsity and noise associated with the radar point clouds, the performance of the e…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  40. arXiv:2412.12387  [pdf, other]

    quant-ph cs.DC

    Differential Privacy Preserving Distributed Quantum Computing

    Authors: Hui Zhong, Keyi Ju, Jiachen Shen, Xinyue Zhang, Xiaoqi Qin, Tomoaki Ohtsuki, Miao Pan, Zhu Han

    Abstract: Existing quantum computers can only operate with hundreds of qubits in the Noisy Intermediate-Scale Quantum (NISQ) state, while quantum distributed computing (QDC) is regarded as a reliable way to address this limitation, allowing quantum computers to achieve their full computational potential. However, similar to classical distributed computing, QDC also faces the problem of privacy leakage. Exis…

    Submitted 6 January, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

  41. arXiv:2412.11362  [pdf, other]

    eess.IV cs.CV

    VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression

    Authors: Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, Yanfeng Wang

    Abstract: Neural Radiance Field (NeRF)-based volumetric video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences that provide audiences with unprecedented immersion and interactivity. However, the substantial data volumes pose significant challenges for storage and transmission. Existing solutions typically optimize NeRF representation and compression indepen…

    Submitted 15 December, 2024; originally announced December 2024.

  42. arXiv:2412.11014  [pdf, ps, other]

    cs.LG cs.AI cs.AR cs.PL cs.SE

    CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation

    Authors: Zhendong Mi, Renming Zheng, Haowen Zhong, Yue Sun, Seth Kneeland, Sayan Moitra, Ken Kutzer, Zhaozhuo Xu, Shaoyi Huang

    Abstract: Recent advances in agentic LLMs have demonstrated great capabilities in Verilog code generation. However, existing approaches either use LLM-assisted single-agent prompting or cooperation-only multi-agent learning, which will lead to: (i) Degeneration issue for single-agent learning: characterized by diminished error detection and correction capabilities; (ii) Error propagation in cooperation-only…

    Submitted 5 June, 2025; v1 submitted 14 December, 2024; originally announced December 2024.

  43. arXiv:2412.10838  [pdf, other]

    cond-mat.mtrl-sci cs.AI physics.app-ph

    Deep Learning Models for Colloidal Nanocrystal Synthesis

    Authors: Kai Gu, Yingping Liang, Jiaming Su, Peihan Sun, Jia Peng, Naihua Miao, Zhimei Sun, Ying Fu, Haizheng Zhong, Jun Zhang

    Abstract: Colloidal synthesis of nanocrystals usually includes complex chemical reactions and multi-step crystallization processes. Despite the great success in the past 30 years, it remains challenging to clarify the correlations between synthetic parameters of chemical reaction and physical properties of nanocrystals. Here, we developed a deep learning-based nanocrystal synthesis model that correlates syn…

    Submitted 14 December, 2024; originally announced December 2024.

  44. arXiv:2412.02210  [pdf, other]

    cs.CV

    CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

    Authors: Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, Jianqiang Wan, Humen Zhong, Xuejing Liu, Mingkun Yang, Peng Wang, Shuai Bai, LianWen Jin, Junyang Lin

    Abstract: Large Multimodal Models (LMMs) have demonstrated impressive performance in recognizing document images with natural language instructions. However, it remains unclear to what extent these capabilities extend to literacy with rich structure and fine-grained visual challenges. The current landscape lacks a comprehensive benchmark to effectively measure the literate capabilities of LMMs. Existing benchmarks are o…

    Submitted 10 December, 2024; v1 submitted 3 December, 2024; originally announced December 2024.

    Comments: 23 pages, 14 figures; The code will be released soon

  45. arXiv:2411.16763  [pdf, other]

    cs.CR cs.AI cs.LG

    Hide in Plain Sight: Clean-Label Backdoor for Auditing Membership Inference

    Authors: Depeng Chen, Hao Chen, Hulin Jin, Jie Cui, Hong Zhong

    Abstract: Membership inference attacks (MIAs) are critical tools for assessing privacy risks and ensuring compliance with regulations like the General Data Protection Regulation (GDPR). However, their potential for auditing unauthorized use of data remains underexplored. To bridge this gap, we propose a novel clean-label backdoor-based approach for MIAs, designed specifically for robust and stealthy data a…

    Submitted 24 November, 2024; originally announced November 2024.

  46. arXiv:2411.11144  [pdf, other]

    cs.LG cs.AI cs.CR

    CLMIA: Membership Inference Attacks via Unsupervised Contrastive Learning

    Authors: Depeng Chen, Xiao Liu, Jie Cui, Hong Zhong

    Abstract: Since a machine learning model is often trained on a limited data set, the model is trained multiple times on the same data sample, which causes the model to memorize most of the training set data. Membership Inference Attacks (MIAs) exploit this feature to determine whether a data sample is used for training a machine learning model. However, in realistic scenarios, it is difficult for the adversar…

    Submitted 17 November, 2024; originally announced November 2024.

  47. arXiv:2410.19775  [pdf, other]

    cs.CY cs.AI

    Gender Bias of LLM in Economics: An Existentialism Perspective

    Authors: Hui Zhong, Songsheng Chen, Mian Liang

    Abstract: Large Language Models (LLMs), such as GPT-4 and BERT, have rapidly gained traction in natural language processing (NLP) and are now integral to financial decision-making. However, their deployment introduces critical challenges, particularly in perpetuating gender biases that can distort decision-making outcomes in high-stakes economic environments. This paper investigates gender bias in LLMs thro…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: Gender Bias, Large Language Models, Decision-Making

  48. arXiv:2410.19450  [pdf, other]

    cs.AI

    Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

    Authors: Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

    Abstract: Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MAR…

    Submitted 25 October, 2024; originally announced October 2024.

  49. arXiv:2410.13713  [pdf, other]

    cs.LG

    CrystalX: Ultra-Precision Crystal Structure Resolution and Error Correction Using Deep Learning

    Authors: Kaipeng Zheng, Weiran Huang, Wanli Ouyang, Han-Sen Zhong, Yuqiang Li

    Abstract: Atomic structure analysis of crystalline materials is a paramount endeavor in both chemical and material sciences. This sophisticated technique necessitates not only a solid foundation in crystallography but also a profound comprehension of the intricacies of the accompanying software, posing a significant challenge in meeting the rigorous daily demands. For the first time, we confront this challe…

    Submitted 17 October, 2024; originally announced October 2024.

  50. arXiv:2410.11180  [pdf, other]

    cs.LG eess.SY

    Reinforcement Learning Based Bidding Framework with High-dimensional Bids in Power Markets

    Authors: Jinyu Liu, Hongye Guo, Yun Li, Qinghu Tang, Fuquan Huang, Tunan Chen, Haiwang Zhong, Qixin Chen

    Abstract: Over the past decade, bidding in power markets has attracted widespread attention. Reinforcement Learning (RL) has been widely used for power market bidding as a powerful AI tool to make decisions under real-world uncertainties. However, current RL methods mostly employ low dimensional bids, which significantly diverge from the N price-power pairs commonly used in the current power markets. The N-…

    Submitted 14 October, 2024; originally announced October 2024.