Showing 1–50 of 216 results for author: Niu, Z

Searching in archive cs.
  1. arXiv:2506.07581  [pdf, ps, other]

    cs.LG cs.AI cs.DC

    FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning

    Authors: Tan Chen, Jintao Yan, Yuxuan Sun, Sheng Zhou, Zhisheng Niu

    Abstract: Federated learning (FL) is a promising paradigm for multiple devices to cooperatively train a model. When applied in wireless networks, two issues consistently affect the performance of FL, i.e., data heterogeneity of devices and limited bandwidth. Many papers have investigated device scheduling strategies considering the two issues. However, most of them recognize data heterogeneity as a property…

    Submitted 9 June, 2025; originally announced June 2025.

  2. arXiv:2506.07328  [pdf, ps, other]

    cs.LG

    Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification

    Authors: Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

    Abstract: Asynchronous Federated Learning (AFL) enables distributed model training across multiple mobile devices, allowing each device to independently update its local model without waiting for others. However, device mobility introduces intermittent connectivity, which necessitates gradient sparsification and leads to model staleness, jointly affecting AFL convergence. This paper develops a theoretical m…

    Submitted 8 June, 2025; originally announced June 2025.

  3. arXiv:2506.03167  [pdf, ps, other]

    cs.NI cs.ET cs.IT cs.LG

    Distributionally Robust Wireless Semantic Communication with Large AI Models

    Authors: Long Tan Le, Senura Hansaja Wanasekara, Zerun Niu, Yansong Shi, Nguyen H. Tran, Phuong Vo, Walid Saad, Dusit Niyato, Zhu Han, Choong Seon Hong, H. Vincent Poor

    Abstract: 6G wireless systems are expected to support massive volumes of data with ultra-low latency. However, conventional bit-level transmission strategies cannot support the efficiency and adaptability required by modern, data-intensive applications. The concept of semantic communication (SemCom) addresses this limitation by focusing on transmitting task-relevant semantic information instead of raw data.…

    Submitted 28 May, 2025; originally announced June 2025.

    Comments: Under Review

  4. arXiv:2506.01748  [pdf, ps, other]

    cs.CL

    Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning

    Authors: Yihong Tang, Kehai Chen, Muyun Yang, Zhengyu Niu, Jing Li, Tiejun Zhao, Min Zhang

    Abstract: The advancement of Large Language Models (LLMs) has spurred significant interest in Role-Playing Agents (RPAs) for applications such as emotional companionship and virtual interaction. However, recent RPAs are often built on explicit dialogue data, lacking deep, human-like internal thought processes, resulting in superficial knowledge and style expression. While Large Reasoning Models (LRMs) can b…

    Submitted 2 June, 2025; originally announced June 2025.

  5. arXiv:2505.19931  [pdf, ps, other]

    eess.AS cs.SD

    Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

    Authors: Qixi Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, Kai Yu, Xie Chen

    Abstract: Flow-matching-based text-to-speech (TTS) models, such as Voicebox, E2 TTS, and F5-TTS, have attracted significant attention in recent years. These models require multiple sampling steps to reconstruct speech from noise, making inference speed a key challenge. Reducing the number of sampling steps can greatly improve inference efficiency. To this end, we introduce Fast F5-TTS, a training-free appro…

    Submitted 4 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  6. arXiv:2505.19595  [pdf, ps, other]

    eess.AS cs.SD

    Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

    Authors: Jeongsoo Choi, Zhikang Niu, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen

    Abstract: The goal of this paper is to optimize the training process of diffusion-based text-to-speech models. While recent studies have achieved remarkable advancements, their training demands substantial time and computational costs, largely due to the implicit guidance of diffusion models in learning complex intermediate representations. To address this, we propose A-DMA, an effective strategy for Accele…

    Submitted 30 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Interspeech 2025

  7. arXiv:2505.13032  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

    Authors: Ziyang Ma, Yinghao Ma, Yanqiao Zhu, Chen Yang, Yi-Wen Chao, Ruiyang Xu, Wenxi Chen, Yuanzhe Chen, Zhuo Chen, Jian Cong, Kai Li, Keliang Li, Siyou Li, Xinfeng Li, Xiquan Li, Zheng Lian, Yuzhe Liang, Minghao Liu, Zhikang Niu, Tianrui Wang, Yuping Wang, Yuxuan Wang, Yihao Wu, Guanrou Yang, Jianwei Yu, et al. (9 additional authors not shown)

    Abstract: We introduce MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs) across massive multi-disciplinary tasks. MMAR comprises 1,000 meticulously curated audio-question-answer triplets, collected from real-world internet videos and refined through iterative error corrections and quality checks to ensure high quality. Unlike existing benchmarks that…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Open-source at https://github.com/ddlBoJack/MMAR

  8. arXiv:2505.02118  [pdf, ps, other]

    cs.AI

    Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

    Authors: Wei Liu, Zhongyu Niu, Lang Gao, Zhiying Deng, Jun Wang, Haozhao Wang, Ruixuan Li

    Abstract: This study investigates the self-rationalization framework constructed with a cooperative game, where a generator initially extracts the most informative segment from raw input, and a subsequent predictor utilizes the selected subset for its input. The generator and predictor are trained collaboratively to maximize prediction accuracy. In this paper, we first uncover a potential caveat: such a coo…

    Submitted 28 June, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  9. arXiv:2504.20624  [pdf, other]

    cs.AI

    PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval

    Authors: Zihan Niu, Zheyong Xie, Shaosheng Cao, Chonggang Lu, Zheyu Ye, Tong Xu, Zuozhu Liu, Yan Gao, Jia Chen, Zhe Xu, Yi Wu, Yao Hu

    Abstract: Social chatbots have become essential intelligent companions in daily scenarios ranging from emotional support to personal interaction. However, conventional chatbots with passive response mechanisms usually rely on users to initiate or sustain dialogues by bringing up new topics, resulting in diminished engagement and shortened dialogue duration. In this paper, we present PaRT, a novel framework…

    Submitted 29 April, 2025; originally announced April 2025.

  10. arXiv:2504.14611  [pdf, other]

    cs.DC

    Joint Optimization of Offloading, Batching and DVFS for Multiuser Co-Inference

    Authors: Yaodan Xu, Sheng Zhou, Zhisheng Niu

    Abstract: With the growing integration of artificial intelligence in mobile applications, a substantial number of deep neural network (DNN) inference requests are generated daily by mobile devices. Serving these requests presents significant challenges due to limited device resources and strict latency requirements. Therefore, edge-device co-inference has emerged as an effective paradigm to address these is…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Accepted by 2025 IEEE International Conference on Communications (ICC)

  11. arXiv:2504.12867  [pdf, other]

    eess.AS cs.AI cs.CL

    EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

    Authors: Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, ShiLiang Zhang, Xie Chen

    Abstract: Human speech goes beyond the mere transfer of information; it is a profound exchange of emotions and a connection between individuals. While Text-to-Speech (TTS) models have made huge progress, they still face challenges in controlling the emotional expression in the generated speech. In this work, we propose EmoVoice, a novel emotion-controllable TTS model that exploits large language models (LLM…

    Submitted 21 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  12. arXiv:2503.21476  [pdf, ps, other]

    cs.DC cs.IT cs.LG

    Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time

    Authors: Zhaojun Nan, Yunchu Han, Sheng Zhou, Zhisheng Niu

    Abstract: In edge intelligence systems, deep neural network (DNN) partitioning and data offloading can provide real-time task inference for resource-constrained mobile devices. However, the inference time of DNNs is typically uncertain and cannot be precisely determined in advance, presenting significant challenges in ensuring timely task processing within deadlines. To address the uncertain inference time,…

    Submitted 27 March, 2025; originally announced March 2025.

  13. arXiv:2503.06202  [pdf, other]

    cs.AI cs.LG

    Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization

    Authors: Wei Liu, Zhiying Deng, Zhongyu Niu, Jun Wang, Haozhao Wang, Zhigang Zeng, Ruixuan Li

    Abstract: Extracting a small subset of crucial rationales from the full input is a key problem in explainability research. The most widely used fundamental criterion for rationale extraction is the maximum mutual information (MMI) criterion. In this paper, we first demonstrate that MMI suffers from diminishing marginal returns. Once part of the rationale has been identified, finding the remaining portions c…

    Submitted 8 March, 2025; originally announced March 2025.

  14. arXiv:2502.20757  [pdf, other]

    cs.CL

    The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents

    Authors: Yihong Tang, Kehai Chen, Xuefeng Bai, Zhengyu Niu, Bo Wang, Jie Liu, Min Zhang

    Abstract: Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations. However, it remains challenging for these agents to balance character portrayal utility with content safety because this essential character simulation often comes with the risk of generating unsafe content. To address this issue, we first conduct a syst…

    Submitted 28 February, 2025; originally announced February 2025.

  15. arXiv:2502.17810  [pdf, other]

    cs.CL eess.AS

    URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

    Authors: Ruiqi Yan, Xiquan Li, Wenxi Chen, Zhikang Niu, Chen Yang, Ziyang Ma, Kai Yu, Xie Chen

    Abstract: In recent years, with advances in large language models (LLMs), end-to-end spoken dialogue models (SDMs) have made significant strides. Compared to text-based LLMs, the evaluation of SDMs needs to take speech-related aspects into account, such as paralinguistic information and speech quality. However, there is still a lack of comprehensive evaluations for SDMs in speech-to-speech (S2S) scenarios.…

    Submitted 1 March, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

  16. arXiv:2502.15721  [pdf]

    cs.IR cs.AI cs.DL

    iTRI-QA: a Toolset for Customized Question-Answer Dataset Generation Using Language Models for Enhanced Scientific Research

    Authors: Qiming Liu, Zhongzheng Niu, Siting Liu, Mao Tian

    Abstract: The exponential growth of AI in science necessitates efficient and scalable solutions for retrieving and preserving research information. Here, we present a tool for the development of a customized question-answer (QA) dataset, called Interactive Trained Research Innovator (iTRI) - QA, tailored for the needs of researchers leveraging language models (LMs) to retrieve scientific knowledge in a QA f…

    Submitted 27 January, 2025; originally announced February 2025.

    Comments: 13 pages, 3 figures

  17. arXiv:2502.06295  [pdf, ps, other]

    cs.LG cs.NI

    DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis

    Authors: Yunchu Han, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

    Abstract: The rapid development of deep neural networks (DNNs) is inherently accompanied by the problem of high computational costs. To tackle this challenge, dynamic voltage frequency scaling (DVFS) is emerging as a promising technology for balancing the latency and energy consumption of DNN inference by adjusting the computing frequency of processors. However, most existing models of DNN inference time ar…

    Submitted 20 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

  18. arXiv:2501.17216  [pdf, other]

    cs.LG

    Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting

    Authors: Jingru Fei, Kun Yi, Wei Fan, Qi Zhang, Zhendong Niu

    Abstract: We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy r…

    Submitted 22 February, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI 2025

  19. arXiv:2501.10638  [pdf, other]

    cs.CV cs.IR

    A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval

    Authors: Weihang Zhang, Jihao Li, Shuoke Li, Ziqing Niu, Jialiang Chen, Wenkai Zhang

    Abstract: Remote sensing text–image retrieval (RSTIR) aims to retrieve the matched remote sensing (RS) images from the database according to the descriptive text. Recently, the rapid development of large visual-language pre-training models provides new insights for RSTIR. Nevertheless, as the complexity of models grows in RSTIR, the previous studies suffer from suboptimal resource efficiency during transfe…

    Submitted 17 January, 2025; originally announced January 2025.

  20. arXiv:2501.08615   

    cs.LG

    Towards Aligned Data Forgetting via Twin Machine Unlearning

    Authors: Zhenxing Niu, Haoxuan Ji, Yuyao Sun, Zheng Lin, Fei Gao, Yuhang Wang, Haichao Gao

    Abstract: Modern privacy regulations have spurred the evolution of machine unlearning, a technique enabling a trained model to efficiently forget specific training data. In prior unlearning methods, the concept of "data forgetting" is often interpreted and implemented as achieving zero classification accuracy on such data. Nevertheless, the authentic aim of machine unlearning is to achieve alignment between…

    Submitted 23 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: This paper is withdrawn as the updated version will be published to arXiv:2408.11433. We apologize for the miscommunication earlier

  21. arXiv:2501.06869  [pdf, other]

    cs.AI cs.CV cs.HC cs.LG

    A Foundational Generative Model for Breast Ultrasound Image Analysis

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Haotian Ye, Siyu He, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, James Zou, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Foundational models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential development to breast ultrasound analysis remains untapped. In this paper, we present BUSGen, the first foundational generative model specifically designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired ex…

    Submitted 12 January, 2025; originally announced January 2025.

    Comments: Peking University; Stanford University; Peking University Cancer Hospital & Institute; Peking Union Medical College Hospital; Cancer Hospital, Chinese Academy of Medical Sciences

  22. arXiv:2501.02181  [pdf, other]

    cs.DC cs.LG eess.SY

    SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services

    Authors: Yaodan Xu, Sheng Zhou, Zhisheng Niu

    Abstract: For servers incorporating parallel computing resources, batching is a pivotal technique for providing efficient and economical services at scale. Parallel computing resources exhibit heightened computational and energy efficiency when operating with larger batch sizes. However, in the realm of online services, the adoption of a larger batch size may lead to longer response times. This paper aims t…

    Submitted 3 January, 2025; originally announced January 2025.

    Comments: Accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS)

  23. arXiv:2412.13565  [pdf, other]

    cs.CV cs.AI

    CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing

    Authors: Xiaole Xian, Xilin He, Zenghao Niu, Junliang Zhang, Weicheng Xie, Siyang Song, Zitong Yu, Linlin Shen

    Abstract: For efficient and high-fidelity local facial attribute editing, most existing editing methods either require additional fine-tuning for different editing effects or tend to affect beyond the editing regions. Alternatively, inpainting methods can edit the target image region while preserving external areas. However, current inpainting methods still suffer from the generation misalignment with facia…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI

  24. Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images

    Authors: Zheng Gong, Zhuo Deng, Weihao Gao, Wenda Zhou, Yuhang Yang, Hanqing Zhao, Zhiyuan Niu, Lei Shao, Wenbin Wei, Lan Ma

    Abstract: Cataract is one of the most common blinding eye diseases and can be treated by surgery. However, because cataract patients may also suffer from other blinding eye diseases, ophthalmologists must diagnose them before surgery. The cloudy lens of cataract patients forms a hazy degeneration in the fundus images, making it challenging to observe the patient's fundus vessels, which brings difficulties t…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 12 pages, 8 figures

  25. arXiv:2411.12273  [pdf, other]

    eess.IV cs.CV

    Acquire Precise and Comparable Fundus Image Quality Score: FTHNet and FQS Dataset

    Authors: Zheng Gong, Zhuo Deng, Run Gan, Zhiyuan Niu, Lu Chen, Canfeng Huang, Jia Liang, Weihao Gao, Fang Li, Shaochong Zhang, Lan Ma

    Abstract: The retinal fundus images are utilized extensively in the diagnosis, and their quality can directly affect the diagnosis results. However, due to the insufficient dataset and algorithm application, current fundus image quality assessment (FIQA) methods are not powerful enough to meet ophthalmologists' demands. In this paper, we address the limitations of datasets and algorithms in FIQA. First, we…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 11 pages, 7 figures

  26. arXiv:2411.04511  [pdf, other]

    eess.SP cs.LG

    Improve the Fitting Accuracy of Deep Learning for the Nonlinear Schrödinger Equation Using Linear Feature Decoupling Method

    Authors: Yunfan Zhang, Zekun Niu, Minghui Shi, Weisheng Hu, Lilin Yi

    Abstract: We utilize the Feature Decoupling Distributed (FDD) method to enhance the capability of deep learning to fit the Nonlinear Schrödinger Equation (NLSE), significantly reducing the NLSE loss compared to a non-decoupling model.

    Submitted 7 November, 2024; originally announced November 2024.

  27. arXiv:2411.01603  [pdf, other]

    cs.RO

    An Aerial Transport System in Marine GNSS-Denied Environment

    Authors: Jianjun Sun, Zhenwei Niu, Yihao Dong, Fenglin Zhang, Muhayy Ud Din, Lakmal Seneviratne, Defu Lin, Irfan Hussain, Shaoming He

    Abstract: This paper presents an autonomous aerial system specifically engineered for operation in challenging marine GNSS-denied environments, aimed at transporting small cargo from a target vessel. In these environments, characterized by weakly textured sea surfaces with few feature points, chaotic deck oscillations due to waves, and significant wind gusts, conventional navigation methods often prove inad…

    Submitted 3 November, 2024; originally announced November 2024.

  28. arXiv:2410.13486  [pdf, other]

    cs.CV

    SemSim: Revisiting Weak-to-Strong Consistency from a Semantic Similarity Perspective for Semi-supervised Medical Image Segmentation

    Authors: Shiao Xie, Hongyi Wang, Ziwei Niu, Hao Sun, Shuyi Ouyang, Yen-Wei Chen, Lanfen Lin

    Abstract: Semi-supervised learning (SSL) for medical image segmentation is a challenging yet highly practical task, which reduces reliance on large-scale labeled dataset by leveraging unlabeled samples. Among SSL techniques, the weak-to-strong consistency framework, popularized by FixMatch, has emerged as a state-of-the-art method in classification tasks. Notably, such a simple pipeline has also shown compe…

    Submitted 17 October, 2024; originally announced October 2024.

  29. arXiv:2410.06885  [pdf, other]

    eess.AS cs.SD

    F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

    Authors: Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen

    Abstract: This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as duration model, text encoder, and phoneme alignment, the text input is simply padded with filler tokens to the same length as input speech, and then the denoising is performed for speech generation, which was originally pr…

    Submitted 20 May, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: 17 pages, 9 tables, 3 figures

  30. arXiv:2410.06003  [pdf, other]

    cs.LG

    Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization

    Authors: Wei Liu, Zhiying Deng, Zhongyu Niu, Jun Wang, Haozhao Wang, YuanKai Zhang, Ruixuan Li

    Abstract: An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets, there are spurious features non-causally correlated with the label and also get high mutual information, complicating the loss lan…

    Submitted 21 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted at NeurIPS 2024. arXiv admin note: text overlap with arXiv:2309.13391

  31. arXiv:2409.19592  [pdf, other]

    cs.CV cs.LG cs.MA

    DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model

    Authors: Ruiqing Mao, Haotian Wu, Yukuan Jia, Zhaojun Nan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu

    Abstract: Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: 7 pages, 4 figures

  32. arXiv:2409.12717  [pdf, other]

    eess.AS cs.SD

    NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization

    Authors: Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu

    Abstract: Built upon vector quantization (VQ), discrete audio codec models have achieved great success in audio compression and auto-regressive audio generation. However, existing models face substantial challenges in perceptual quality and signal distortion, especially when operating in extremely low bandwidth, rooted in the sensitivity of the VQ codebook to noise. This degradation poses significant challe…

    Submitted 19 September, 2024; originally announced September 2024.

  33. arXiv:2409.10925  [pdf, other]

    cs.CV

    HGSLoc: 3DGS-based Heuristic Camera Pose Refinement

    Authors: Zhongyan Niu, Zhen Tan, Jinpu Zhang, Xueliang Yang, Dewen Hu

    Abstract: Visual localization refers to the process of determining camera poses and orientation within a known scene representation. This task is often complicated by factors such as illumination changes and variations in viewing angles. In this paper, we propose HGSLoc, a novel lightweight, plug-and-play pose optimization framework, which integrates 3D reconstruction with a heuristic refinement strategy to…

    Submitted 20 September, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

  34. arXiv:2409.09362  [pdf, other]

    cs.CL

    Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM

    Authors: Yuanjie Lyu, Tong Xu, Zihan Niu, Bo Peng, Jing Ke, Enhong Chen

    Abstract: The prosperity of social media platforms has raised the urgent demand for semantic-rich services, e.g., event and storyline attribution. However, most existing research focuses on clip-level event understanding, primarily through basic captioning tasks, without analyzing the causes of events across an entire movie. This is a significant challenge, as even advanced multimodal large language models…

    Submitted 14 September, 2024; originally announced September 2024.

  35. arXiv:2408.11433  [pdf, other]

    cs.LG cs.AI

    Towards Aligned Data Removal via Twin Machine Unlearning

    Authors: Haoxuan Ji, Zheng Lin, Yuyao Sun, Gao Fei, Yuhang Wang, Haichang Gao, Zhenxing Niu

    Abstract: Modern privacy regulations have spurred the evolution of machine unlearning, a technique that enables the removal of data from an already trained ML model without requiring retraining from scratch. Previous unlearning methods tend to induce the model to achieve lowest classification accuracy on the removal data. Nonetheless, the authentic objective of machine unlearning is to align the unlearned m…

    Submitted 2 May, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

  36. arXiv:2407.20505  [pdf, other]

    cs.CV

    Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate

    Authors: Zheng Lin, Zhenxing Niu, Zhibin Wang, Yinghui Xu

    Abstract: MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination. Previous methods focus on determining whether a generated output is hallucinated, without identifying which image region leads to the hallucination or interpreting why such hallucinations occur. In this paper, we argue that hallucination in MLLMs is partially due to a lack of slow-thinki…

    Submitted 29 July, 2024; originally announced July 2024.

  37. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  38. arXiv:2407.16244  [pdf, other]

    cs.CV cs.AI cs.MM

    HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification

    Authors: Shuyi Ouyang, Hongyi Wang, Ziwei Niu, Zhenjia Bai, Shiao Xie, Yingying Xu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin

    Abstract: The task of multi-label image classification involves recognizing multiple objects within a single image. Considering both valuable semantic information contained in the labels and essential visual features presented in the image, tight visual-linguistic interactions play a vital role in improving classification performance. Moreover, given the potential variance in object size and appearance with…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 10 pages, 6 figures

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia. 2023: 4768-4777

  39. arXiv:2407.00412  [pdf, other]

    cs.RO cs.IT cs.MA cs.NI

    C-MASS: Combinatorial Mobility-Aware Sensor Scheduling for Collaborative Perception with Second-Order Topology Approximation

    Authors: Yukuan Jia, Yuxuan Sun, Ruiqing Mao, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

    Abstract: Collaborative Perception (CP) has been a promising solution to address occlusions in the traffic environment by sharing sensor data among collaborative vehicles (CoV) via vehicle-to-everything (V2X) network. With limited wireless bandwidth, CP necessitates task-oriented and receiver-aware sensor scheduling to prioritize important and complementary sensor data. However, due to vehicular mobility, i…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  40. arXiv:2406.17470  [pdf, ps, other]

    cs.LG cs.AI cs.DC cs.IT

    Dynamic Scheduling for Vehicle-to-Vehicle Communications Enhanced Federated Learning

    Authors: Jintao Yan, Tan Chen, Yuxuan Sun, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

    Abstract: Leveraging the computing and sensing capabilities of vehicles, vehicular federated learning (VFL) has been applied to edge training for connected vehicles. The dynamic and interconnected nature of vehicular networks presents unique opportunities to harness direct vehicle-to-vehicle (V2V) communications, enhancing VFL training efficiency. In this paper, we formulate a stochastic optimization proble…

    Submitted 8 June, 2025; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by the IEEE Transactions on Wireless Communications

  41. arXiv:2406.14979  [pdf, other]

    cs.CL

    Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

    Authors: Yuanjie Lyu, Zihan Niu, Zheyong Xie, Chao Zhang, Tong Xu, Yang Wang, Enhong Chen

    Abstract: Despite the significant progress of large language models (LLMs) in various tasks, they often produce factual errors due to their limited internal knowledge. Retrieval-Augmented Generation (RAG), which enhances LLMs with external knowledge sources, offers a promising solution. However, these methods can be misled by irrelevant paragraphs in retrieved documents. Due to the inherent uncertainty in L…

    Submitted 8 October, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  42. arXiv:2406.03086  [pdf, other]

    cs.MA cs.IT cs.LG

    Task-Oriented Wireless Communications for Collaborative Perception in Intelligent Unmanned Systems

    Authors: Sheng Zhou, Yukuan Jia, Ruiqing Mao, Zhaojun Nan, Yuxuan Sun, Zhisheng Niu

    Abstract: Collaborative Perception (CP) has shown great potential to achieve more holistic and reliable environmental perception in intelligent unmanned systems (IUSs). However, implementing CP still faces key challenges due to the characteristics of the CP task and the dynamics of wireless channels. In this article, a task-oriented wireless communication framework is proposed to jointly optimize the commun…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Network Magazine

  43. arXiv:2405.20015  [pdf, other]

    cs.AI cs.CL

    Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak

    Authors: Zhenxing Niu, Yuyao Sun, Haoxuan Ji, Zheng Lin, Haichang Gao, Xinbo Gao, Gang Hua, Rong Jin

    Abstract: This paper focuses on jailbreaking attacks against large language models (LLMs), eliciting them to generate objectionable content in response to harmful user queries. Unlike previous LLM-jailbreak methods that directly orient to LLMs, our approach begins by constructing a multimodal large language model (MLLM) built upon the target LLM. Subsequently, we perform an efficient MLLM jailbreak and obta…

    Submitted 16 May, 2025; v1 submitted 30 May, 2024; originally announced May 2024.

  44. arXiv:2405.17929  [pdf, other]

    cs.CV

    Towards Unified Robustness Against Both Backdoor and Adversarial Attacks

    Authors: Zhenxing Niu, Yuyao Sun, Qiguang Miao, Rong Jin, Gang Hua

    Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct robustness problems and solved separately, since they belong to training-time and inference-time attacks respectively. However, this paper revealed that there is an intriguing connection between them: (1) planting a backdoor…

    Submitted 28 May, 2024; originally announced May 2024.

  45. Performance Prediction of On-NIC Network Functions with Multi-Resource Contention and Traffic Awareness

    Authors: Shaofeng Wu, Qiang Su, Zhixiong Niu, Hong Xu

    Abstract: Network function (NF) offloading on SmartNICs has been widely used in modern data centers, offering benefits in host resource saving and programmability. Co-running NFs on the same SmartNICs can cause performance interference due to contention of onboard resources. To meet performance SLAs while ensuring efficient resource management, operators need mechanisms to predict NF performance under such…

    Submitted 9 February, 2025; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: New version

  46. arXiv:2405.00980  [pdf, other]

    cs.CL cs.CV

    A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News

    Authors: Zhe Niu, Ronglai Zuo, Brian Mak, Fangyun Wei

    Abstract: This paper introduces TVB-HKSL-News, a new Hong Kong sign language (HKSL) dataset collected from a TV news program over a period of 7 months. The dataset is collected to enrich resources for HKSL and support research in large-vocabulary continuous sign language recognition (SLR) and translation (SLT). It consists of 16.07 hours of sign videos of two signers with a vocabulary of 6,515 glosses (for…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by LREC-COLING 2024

  47. arXiv:2404.03707  [pdf, other]

    cs.LG cs.AI cs.IR

    Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study

    Authors: Zechun Niu, Jiaxin Mao, Qingyao Ai, Ji-Rong Wen

    Abstract: Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models. While the CLTR models can be theoretically unbiased when the user behavior assumption is correct and the propensity estimation is accurate, their effectiveness is usually empirically evaluated via simulation-based exp…

    Submitted 4 April, 2024; originally announced April 2024.

  48. arXiv:2403.15156  [pdf, other]

    cs.RO cs.CV eess.SY

    Infrastructure-Assisted Collaborative Perception in Automated Valet Parking: A Safety Perspective

    Authors: Yukuan Jia, Jiawen Zhang, Shimeng Lu, Baokang Fan, Ruiqing Mao, Sheng Zhou, Zhisheng Niu

    Abstract: Environmental perception in Automated Valet Parking (AVP) has been a challenging task due to severe occlusions in parking garages. Although Collaborative Perception (CP) can be applied to broaden the field of view of connected vehicles, the limited bandwidth of vehicular communications restricts its application. In this work, we propose a BEV feature-based CP network architecture for infrastructur…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 7 figures, 4 tables, accepted by IEEE VTC2024-Spring

  49. arXiv:2403.13850  [pdf, other]

    cs.LG cs.AI physics.flu-dyn

    Spatio-Temporal Fluid Dynamics Modeling via Physical-Awareness and Parameter Diffusion Guidance

    Authors: Hao Wu, Fan Xu, Yifan Duan, Ziwei Niu, Weiyan Wang, Gaofeng Lu, Kun Wang, Yuxuan Liang, Yang Wang

    Abstract: This paper proposes a two-stage framework named ST-PAD for spatio-temporal fluid dynamics modeling in the field of earth sciences, aiming to achieve high-precision simulation and prediction of fluid dynamics through spatio-temporal physics awareness and parameter diffusion guidance. In the upstream stage, we design a vector quantization reconstruction module with temporal evolution characteristics…

    Submitted 18 March, 2024; originally announced March 2024.

  50. FedSPU: Personalized Federated Learning for Resource-constrained Devices with Stochastic Parameter Update

    Authors: Ziru Niu, Hai Dong, A. K. Qin

    Abstract: Personalized Federated Learning (PFL) is widely employed in IoT applications to handle high-volume, non-iid client data while ensuring data privacy. However, heterogeneous edge devices owned by clients may impose varying degrees of resource constraints, causing computation and communication bottlenecks for PFL. Federated Dropout has emerged as a popular strategy to address this challenge, wherein…

    Submitted 20 January, 2025; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: AAAI 2025 Oral

    MSC Class: 68U35 ACM Class: C.2.4; I.2.11

    Journal ref: AAAI 2025