
Showing 1–50 of 235 results for author: Ling, Z

Searching in archive cs.
  1. arXiv:2506.23235  [pdf, ps, other]

    cs.CL

    Generalist Reward Models: Found Inside Large Language Models

    Authors: Yi-Chen Li, Tian Xu, Yang Yu, Xuqin Zhang, Xiong-Hui Chen, Zhongxiang Ling, Ningjing Chao, Lei Yuan, Zhi-Hua Zhou

    Abstract: The alignment of Large Language Models (LLMs) is critically dependent on reward models trained on costly human preference data. While recent work explores bypassing this cost with AI feedback, these methods often lack a rigorous theoretical foundation. In this paper, we discover that a powerful generalist reward model is already latently present within any LLM trained via standard next-token predi…

    Submitted 29 June, 2025; originally announced June 2025.

  2. arXiv:2506.18656  [pdf, ps, other]

    stat.ML cs.LG math.ST

    A Random Matrix Analysis of In-context Memorization for Nonlinear Attention

    Authors: Zhenyu Liao, Jiaqing Liu, TianQi Hou, Difan Zou, Zenan Ling

    Abstract: Attention mechanisms have revolutionized machine learning (ML) by enabling efficient modeling of global dependencies across inputs. Their inherently parallelizable structures allow for efficient scaling with the exponentially increasing size of both pretrained data and model parameters. Yet, despite their central role as the computational backbone of modern large language models (LLMs), the theore…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 40 pages, 7 pages

  3. arXiv:2506.10446  [pdf, ps, other]

    cs.CL

    Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty

    Authors: Zehui Ling, Deshu Chen, Hongwei Zhang, Yifeng Jiao, Xin Guo, Yuan Cheng

    Abstract: Large language models (LLMs) have demonstrated significant advancements in reasoning capabilities, performing well on various challenging benchmarks. Techniques like Chain-of-Thought prompting have been introduced to further improve reasoning. However, these approaches frequently generate longer outputs, which in turn increase computational latency. Although some methods use reinforcement learning…

    Submitted 12 June, 2025; originally announced June 2025.

  4. arXiv:2506.04810  [pdf, ps, other]

    cs.CL cs.AI cs.LO

    Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study

    Authors: Yujun Zhou, Jiayi Ye, Zipeng Ling, Yufei Han, Yue Huang, Haomin Zhuang, Zhenwen Liang, Kehan Guo, Taicheng Guo, Xiangqi Wang, Xiangliang Zhang

    Abstract: Logical reasoning is a core capability for many applications of large language models (LLMs), yet existing benchmarks often rely solely on final-answer accuracy, failing to capture the quality and structure of the reasoning process. We propose FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions: overall benchmark accuracy, stepwise soundness, and…

    Submitted 5 June, 2025; originally announced June 2025.

  5. arXiv:2506.01972  [pdf, ps, other]

    cs.DC cs.IT eess.SP

    Distributionally Robust Optimization for Aerial Multi-access Edge Computing via Cooperation of UAVs and HAPs

    Authors: Ziye Jia, Can Cui, Chao Dong, Qihui Wu, Zhuang Ling, Dusit Niyato, Zhu Han

    Abstract: With an extensive increment of computation demands, the aerial multi-access edge computing (MEC), mainly based on unmanned aerial vehicles (UAVs) and high altitude platforms (HAPs), plays significant roles in future network scenarios. In detail, UAVs can be flexibly deployed, while HAPs are characterized by large capacity and stability. Hence, in this paper, we provide a hierarchical model compo…

    Submitted 15 May, 2025; originally announced June 2025.

  6. arXiv:2506.01455  [pdf, ps, other]

    cs.SD eess.AS

    Universal Preference-Score-based Pairwise Speech Quality Assessment

    Authors: Yu-Fei Shi, Yang Ai, Zhen-Hua Ling

    Abstract: To compare the performance of two speech generation systems, one of the most effective approaches is estimating the preference score between their generated speech. This paper proposes a novel universal preference-score-based pairwise speech quality assessment (UPPSQA) model, aimed at predicting the preference score between paired speech samples to determine which one has better quality. The model…

    Submitted 2 June, 2025; originally announced June 2025.

  7. arXiv:2505.23379  [pdf, ps, other]

    eess.AS cs.SD

    Vision-Integrated High-Quality Neural Speech Coding

    Authors: Yao Guo, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Zhen-Hua Ling

    Abstract: This paper proposes a novel vision-integrated neural speech codec (VNSC), which aims to enhance speech coding quality by leveraging visual modality information. In VNSC, the image analysis-synthesis module extracts visual features from lip images, while the feature fusion module facilitates interaction between the image analysis-synthesis module and the speech coding module, transmitting visual in…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  8. arXiv:2505.21940  [pdf, ps, other]

    cs.CL

    RISE: Reasoning Enhancement via Iterative Self-Exploration in Multi-hop Question Answering

    Authors: Bolei He, Xinran He, Mengke Chen, Xianwei Xue, Ying Zhu, Zhenhua Ling

    Abstract: Large Language Models (LLMs) excel in many areas but continue to face challenges with complex reasoning tasks, such as Multi-Hop Question Answering (MHQA). MHQA requires integrating evidence from diverse sources while managing intricate logical dependencies, which often leads to errors in reasoning. Retrieval-Augmented Generation (RAG), widely employed in MHQA tasks, faces challenges in effectively filt…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: ACL 2025 Findings

  9. arXiv:2505.19626  [pdf, ps, other]

    cs.SD eess.AS

    Decoding Speaker-Normalized Pitch from EEG for Mandarin Perception

    Authors: Jiaxin Chen, Yiming Wang, Ziyu Zhang, Jiayang Han, Yin-Long Liu, Rui Feng, Xiuyuan Liang, Zhen-Hua Ling, Jiahong Yuan

    Abstract: The same speech content produced by different speakers exhibits significant differences in pitch contour, yet listeners' semantic perception remains unaffected. This phenomenon may stem from the brain's perception of pitch contours being independent of individual speakers' pitch ranges. In this work, we recorded electroencephalogram (EEG) while participants listened to Mandarin monosyllables with…

    Submitted 26 May, 2025; originally announced May 2025.

  10. arXiv:2505.13830  [pdf, ps, other]

    eess.AS cs.SD

    Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising

    Authors: Ye-Xin Lu, Hui-Peng Du, Fei Liu, Yang Ai, Zhen-Hua Ling

    Abstract: Large language model (LLM) based zero-shot text-to-speech (TTS) methods tend to preserve the acoustic environment of the audio prompt, leading to degradation in synthesized speech quality when the audio prompt contains noise. In this paper, we propose a novel neural codec-based speech denoiser and integrate it with the advanced LLM-based TTS model, LauraTTS, to achieve noise-robust zero-shot TTS.…

    Submitted 22 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  11. arXiv:2505.10599  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech

    Authors: Jiaxuan Liu, Zhenhua Ling

    Abstract: Recent neural codec language models have made great progress in the field of text-to-speech (TTS), but controllable emotional TTS still faces many challenges. Traditional methods rely on predefined discrete emotion labels to control emotion categories and intensities, which can't capture the complexity and continuity of human emotional perception and expression. The lack of large-scale emotional s…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Under review

  12. arXiv:2505.09661  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Introducing voice timbre attribute detection

    Authors: Jinghao He, Zhengyan Sheng, Liping Chen, Kong Aik Lee, Zhen-Hua Ling

    Abstract: This paper focuses on explaining the timbre conveyed by speech signals and introduces a task termed voice timbre attribute detection (vTAD). In this task, voice timbre is explained with a set of sensory attributes describing its human perception. A pair of speech utterances is processed, and their intensity is compared in a designated timbre descriptor. Moreover, a framework is proposed, which is…

    Submitted 22 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2505.09382

  13. arXiv:2505.09382  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan

    Authors: Zhengyan Sheng, Jinghao He, Liping Chen, Kong Aik Lee, Zhen-Hua Ling

    Abstract: Voice timbre refers to the unique quality or character of a person's voice that distinguishes it from others as perceived by human hearing. The Voice Timbre Attribute Detection (VtaD) 2025 challenge focuses on explaining the voice timbre attribute in a comparative manner. In this challenge, the human impression of voice timbre is verbalized with a set of sensory descriptors, including bright, coar…

    Submitted 22 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  14. arXiv:2505.06557  [pdf, ps, other]

    cs.CV

    Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining

    Authors: Lu Dong, Haiyu Zhang, Hongjie Zhang, Yifei Huang, Zhen-Hua Ling, Yu Qiao, Limin Wang, Yali Wang

    Abstract: The task of weakly supervised temporal sentence grounding (WSTSG) aims to detect temporal intervals corresponding to a language description from untrimmed videos with only video-level video-language correspondence. For an anchor sample, most existing approaches generate negative samples either from other videos or within the same video for contrastive learning. However, some training samples are h…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: TCSVT 2025, doi at https://ieeexplore.ieee.org/document/10970001

  15. arXiv:2504.19486  [pdf, other]

    cs.CR

    The Cost of Performance: Breaking ThreadX with Kernel Object Masquerading Attacks

    Authors: Xinhui Shao, Zhen Ling, Yue Zhang, Huaiyu Yan, Yumeng Wei, Lan Luo, Zixia Liu, Junzhou Luo, Xinwen Fu

    Abstract: Microcontroller-based IoT devices often use embedded real-time operating systems (RTOSs). Vulnerabilities in these embedded RTOSs can lead to compromises of those IoT devices. Despite the significance of security protections, the absence of standardized security guidelines results in various levels of security risk across RTOS implementations. Our initial analysis reveals that popular RTOSs such a…

    Submitted 28 April, 2025; originally announced April 2025.

  16. arXiv:2504.16116  [pdf, other]

    cs.CR cs.AI

    DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain

    Authors: Enhao Huang, Pengyu Sun, Zixin Lin, Alex Chen, Joey Ouyang, Hobert Wang, Dong Dong, Gang Zhao, James Yi, Frank Li, Ziang Ling, Lowes Yang

    Abstract: Large Language Models (LLMs) have achieved impressive performance in diverse natural language processing tasks, but specialized domains such as Web3 present new challenges and require more tailored evaluation. Despite the significant user base and capital flows in Web3, encompassing smart contracts, decentralized finance (DeFi), non-fungible tokens (NFTs), decentralized autonomous organizations (D…

    Submitted 16 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  17. arXiv:2504.10074  [pdf, other]

    cs.AI

    MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework

    Authors: Zihan Ling, Zhiyao Guo, Yixuan Huang, Yi An, Shuai Xiao, Jinsong Lan, Xiaoyong Zhu, Bo Zheng

    Abstract: Recent advancements in large language models (LLMs) and multi-modal LLMs have been remarkable. However, these models still rely solely on their parametric knowledge, which limits their ability to generate up-to-date information and increases the risk of producing erroneous content. Retrieval-Augmented Generation (RAG) partially mitigates these challenges by incorporating external data sources, yet…

    Submitted 20 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  18. arXiv:2504.06561  [pdf, other]

    cs.SD

    A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication

    Authors: Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Zhen-Hua Ling

    Abstract: This paper proposes StreamCodec, a streamable neural audio codec designed for real-time communication. StreamCodec adopts a fully causal, symmetric encoder-decoder structure and operates in the modified discrete cosine transform (MDCT) domain, aiming for low-latency inference and real-time efficient generation. To improve codebook utilization efficiency and compensate for the audio quality loss ca…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE Signal Processing Letters

  19. arXiv:2504.04237  [pdf, other]

    cs.IR

    Short Video Segment-level User Dynamic Interests Modeling in Personalized Recommendation

    Authors: Zhiyu He, Zhixin Ling, Jiayu Li, Zhiqiang Guo, Weizhi Ma, Xinchen Luo, Min Zhang, Guorui Zhou

    Abstract: The rapid growth of short videos has necessitated effective recommender systems to match users with content tailored to their evolving preferences. Current video recommendation models primarily treat each video as a whole, overlooking the dynamic nature of user preferences with specific video segments. In contrast, our research focuses on segment-level user interest modeling, which is crucial for…

    Submitted 5 May, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted by SIGIR 2025

  20. arXiv:2503.22180  [pdf, other]

    cs.CV

    Knowledge Rectification for Camouflaged Object Detection: Unlocking Insights from Low-Quality Data

    Authors: Juwei Guan, Xiaolin Fang, Donghyun Kim, Haotian Gong, Tongxin Zhu, Zhen Ling, Ming Yang

    Abstract: Low-quality data often suffer from insufficient image details, introducing an extra implicit aspect of camouflage that complicates camouflaged object detection (COD). Existing COD methods focus primarily on high-quality data, overlooking the challenges posed by low-quality data, which leads to significant performance degradation. Therefore, we propose KRNet, the first framework explicitly designed…

    Submitted 28 March, 2025; originally announced March 2025.

  21. arXiv:2503.14231  [pdf, other]

    cs.CV cs.LG

    Multi-task Learning for Identification of Porcelain in Song and Yuan Dynasties

    Authors: Ziyao Ling, Giovanni Delnevo, Paola Salomoni, Silvia Mirri

    Abstract: Chinese porcelain holds immense historical and cultural value, making its accurate classification essential for archaeological research and cultural heritage preservation. Traditional classification methods rely heavily on expert analysis, which is time-consuming, subjective, and difficult to scale. This paper explores the application of DL and transfer learning techniques to automate the classifi…

    Submitted 18 March, 2025; originally announced March 2025.

  22. arXiv:2503.09499  [pdf, other]

    cs.CV cs.AI cs.CL

    MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?

    Authors: Zhe Xu, Daoyuan Chen, Zhenqing Ling, Yaliang Li, Ying Shen

    Abstract: Large foundation models face challenges in acquiring transferable, structured thinking abilities, especially when supervised with rigid templates or crowd-annotated instruction datasets. Unlike prior approaches, we focus on a thinking-centric data synthesis paradigm that enables models to evolve through self-generated, cognitively guided data. We propose MindGYM, a structured and scalable framewor…

    Submitted 22 May, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: 22 pages, 7 tables

  23. arXiv:2503.00325  [pdf, other]

    cs.CV

    CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging

    Authors: Zhiwei Ling, Yachen Chang, Hailiang Zhao, Xinkui Zhao, Kingsum Chow, Shuiguang Deng

    Abstract: Deep neural networks (DNNs) have been widely criticized for their overconfidence when dealing with out-of-distribution (OOD) samples, highlighting the critical need for effective OOD detection to ensure the safe deployment of DNNs in real-world settings. Existing post-hoc OOD detection methods primarily enhance the discriminative power of logit-based approaches by reshaping sample features, yet th…

    Submitted 13 March, 2025; v1 submitted 28 February, 2025; originally announced March 2025.

    Comments: This paper has been accepted by CVPR 2025

  24. arXiv:2503.00035  [pdf, other]

    cs.CL cs.AI cs.LG

    Constraining Sequential Model Editing with Editing Anchor Compression

    Authors: Hao-Xiang Xu, Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu

    Abstract: Large language models (LLMs) struggle with hallucinations due to false or outdated knowledge. Given the high resource demands of retraining these models, there is an increasing focus on developing model editing. However, the general abilities of LLMs across downstream tasks are prone to significant degradation during sequential editing. This paper statistically observes that the parameter matrix a…

    Submitted 24 February, 2025; originally announced March 2025.

  25. arXiv:2502.12514  [pdf]

    cs.RO

    Memory-updated-based Framework for 100% Reliable Flexible Flat Cables Insertion

    Authors: Zhengrong Ling, Xiong Yang, Dong Guo, Hongyuan Chang, Tieshan Zhang, Ruijia Zhang, Yajing Shen

    Abstract: Automatic assembly lines have increasingly replaced human labor in various tasks; however, the automation of Flexible Flat Cable (FFC) insertion remains unrealized due to its high requirement for effective feedback and dynamic operation, limiting approximately 11% of global industrial capacity. Despite lots of approaches, like vision-based tactile sensors and reinforcement learning, having been pr…

    Submitted 17 February, 2025; originally announced February 2025.

  26. arXiv:2502.11094  [pdf, other]

    cs.SD cs.AI

    SyncSpeech: Low-Latency and Efficient Dual-Stream Text-to-Speech based on Temporal Masked Transformer

    Authors: Zhengyan Sheng, Zhihao Du, Shiliang Zhang, Zhijie Yan, Yexin Yang, Zhenhua Ling

    Abstract: This paper presents a dual-stream text-to-speech (TTS) model, SyncSpeech, capable of receiving streaming text input from upstream models while simultaneously generating streaming speech, facilitating seamless interaction with large language models. SyncSpeech has the following advantages: Low latency, as it begins generating streaming speech upon receiving the second text token; High efficiency, a…

    Submitted 16 February, 2025; originally announced February 2025.

  27. arXiv:2502.09933  [pdf, other]

    cs.AI cs.CL cs.LG

    MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?

    Authors: Kai Yan, Zhan Ling, Kang Liu, Yifan Yang, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen

    Abstract: The ability to recognize patterns from examples and apply them to new ones is a primal ability for general intelligence, and is widely studied by psychology and AI researchers. Many benchmarks have been proposed to measure such ability for Large Language Models (LLMs); however, they focus on few-shot (usually <10) setting and lack evaluation for aggregating many pieces of information from long con…

    Submitted 16 May, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 36 pages, 11 figures. The latest version adds more experiments and renames the work to better summarize it

  28. arXiv:2502.05766  [pdf, other]

    eess.AS cs.SD

    Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models

    Authors: Jing-Xuan Zhang, Genshun Wan, Jianqing Gao, Zhen-Hua Ling

    Abstract: Audio-visual representation learning is crucial for advancing multimodal speech processing tasks, such as lipreading and audio-visual speech recognition. Recently, speech foundation models (SFMs) have shown remarkable generalization capabilities across various speech-related tasks. Building on this progress, we propose an audio-visual representation learning model that leverages cross-modal knowle…

    Submitted 8 February, 2025; originally announced February 2025.

    Comments: Accepted to Pattern Recognition

  29. arXiv:2502.04380  [pdf, other]

    cs.CL cs.AI cs.LG

    Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data

    Authors: Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Qianli Shen, Yaliang Li, Ying Shen

    Abstract: Fine-tuning large language models (LLMs) using diverse datasets is crucial for enhancing their overall performance across various domains. In practical scenarios, existing methods based on modeling the mixture proportions of data composition often struggle with data whose domain labels are missing, imprecise or non-normalized, while methods based on data selection usually encounter difficulties in…

    Submitted 22 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 33 pages, 20 figures, 21 tables

  30. arXiv:2501.16784  [pdf, other]

    cs.CR

    TORCHLIGHT: Shedding LIGHT on Real-World Attacks on Cloudless IoT Devices Concealed within the Tor Network

    Authors: Yumingzhi Pan, Zhen Ling, Yue Zhang, Hongze Wang, Guangchi Liu, Junzhou Luo, Xinwen Fu

    Abstract: The rapidly expanding Internet of Things (IoT) landscape is shifting toward cloudless architectures, removing reliance on centralized cloud services but exposing devices directly to the internet and increasing their vulnerability to cyberattacks. Our research revealed an unexpected pattern of substantial Tor network traffic targeting cloudless IoT devices, suggesting that attackers are using Tor t…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: 27 pages, 14 figures, 9 tables

  31. arXiv:2501.15089  [pdf, other]

    cs.CL

    LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion

    Authors: Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen

    Abstract: Large language models (LLMs) have demonstrated remarkable progress in understanding long-context inputs. However, benchmarks for evaluating the long-context reasoning abilities of LLMs fall behind the pace. Existing benchmarks often focus on a narrow range of tasks or those that do not demand complex reasoning. To address this gap and enable a more comprehensive evaluation of the long-context reas…

    Submitted 28 February, 2025; v1 submitted 25 January, 2025; originally announced January 2025.

  32. arXiv:2501.15005  [pdf, other]

    cs.LG

    Towards Distributed Backdoor Attacks with Network Detection in Decentralized Federated Learning

    Authors: Bohan Liu, Yang Xiao, Ruimeng Ye, Zinan Ling, Xiaolong Ma, Bo Hui

    Abstract: Distributed backdoor attacks (DBA) have shown a higher attack success rate than centralized attacks in centralized federated learning (FL). However, it has not been investigated in the decentralized FL. In this paper, we experimentally demonstrate that, while directly applying DBA to decentralized FL, the attack success rate depends on the distribution of attackers in the network architecture. Con…

    Submitted 24 January, 2025; originally announced January 2025.

  33. arXiv:2501.13726  [pdf, other]

    cs.CL

    RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation

    Authors: Shi-Qi Yan, Zhen-Hua Ling

    Abstract: While Retrieval-Augmented Generation (RAG) has exhibited promise in utilizing external knowledge, its generation process heavily depends on the quality and accuracy of the retrieved context. Large language models (LLMs) struggle to evaluate the correctness of non-parametric knowledge retrieved externally when it differs from internal memorization, leading to knowledge conflicts during response gen…

    Submitted 23 January, 2025; originally announced January 2025.

  34. arXiv:2501.10711  [pdf, other]

    cs.SE cs.AI cs.CL

    How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs

    Authors: Jialun Cao, Yuk-Kit Chan, Zixuan Ling, Wenxuan Wang, Shuqing Li, Mingwei Liu, Ruixi Qiao, Yuting Han, Chaozheng Wang, Boxi Yu, Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, Shing-Chi Cheung

    Abstract: Various benchmarks have been proposed to assess the performance of large language models (LLMs) in different coding scenarios. We refer to them as code-related benchmarks. However, there are no systematic guidelines by which such a benchmark should be developed to ensure its quality, reliability, and reproducibility. We propose How2Bench, which is comprised of a 55-criteria checklist as a set of g…

    Submitted 17 February, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

    Comments: 42 pages

  35. arXiv:2501.06394  [pdf, other]

    cs.SD cs.AI eess.AS

    Unispeaker: A Unified Approach for Multimodality-driven Speaker Generation

    Authors: Zhengyan Sheng, Zhihao Du, Heng Lu, Shiliang Zhang, Zhen-Hua Ling

    Abstract: Recent advancements in personalized speech generation have brought synthetic speech increasingly close to the realism of target speakers' recordings, yet multimodal speaker generation remains on the rise. This paper introduces UniSpeaker, a unified approach for multimodality-driven speaker generation. Specifically, we propose a unified voice aggregator based on KV-Former, applying soft contrastive…

    Submitted 10 January, 2025; originally announced January 2025.

  36. arXiv:2501.03279  [pdf, other]

    cs.CR cs.AI cs.LG

    Revolutionizing Encrypted Traffic Classification with MH-Net: A Multi-View Heterogeneous Graph Model

    Authors: Haozhen Zhang, Haodong Yue, Xi Xiao, Le Yu, Qing Li, Zhen Ling, Ye Zhang

    Abstract: With the growing significance of network security, the classification of encrypted traffic has emerged as an urgent challenge. Traditional byte-based traffic analysis methods are constrained by the rigid granularity of information and fail to fully exploit the diverse correlations between bytes. To address these limitations, this paper introduces MH-Net, a novel approach for classifying network tr…

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: Accepted by AAAI 2025. The code is available at https://github.com/ViktorAxelsen/MH-Net. arXiv admin note: text overlap with arXiv:2402.07501

  37. arXiv:2412.20379  [pdf, other]

    cs.DC

    NeutronTP: Load-Balanced Distributed Full-Graph GNN Training with Tensor Parallelism

    Authors: Xin Ai, Hao Yuan, Zeyu Ling, Qiange Wang, Yanfeng Zhang, Zhenbo Fu, Chaoyi Chen, Yu Gu, Ge Yu

    Abstract: Graph neural networks (GNNs) have emerged as a promising direction. Training large-scale graphs that relies on distributed computing power poses new challenges. Existing distributed GNN systems leverage data parallelism by partitioning the input graph and distributing it to multiple workers. However, due to the irregular nature of the graph structure, existing distributed approaches suffer from un…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: 14 pages, 16 figures, VLDB 2025

  38. arXiv:2412.19507  [pdf, other]

    cs.AI

    Hybrid Local Causal Discovery

    Authors: Zhaolong Ling, Honghui Peng, Yiwen Zhang, Debo Cheng, Xingyu Wu, Peng Zhou, Kui Yu

    Abstract: Local causal discovery aims to learn and distinguish the direct causes and effects of a target variable from observed data. Existing constraint-based local causal discovery methods use AND or OR rules in constructing the local causal skeleton, but using either rule alone is prone to produce cascading errors in the learned local causal skeleton, and thus impacting the inference of local causal rela…

    Submitted 12 May, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

    Comments: This paper has been accepted for publication in the Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025)

  39. arXiv:2412.09195  [pdf, other]

    cs.SD cs.LG eess.AS

    On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection

    Authors: Chenyang Guo, Liping Chen, Zhuhai Li, Kong Aik Lee, Zhen-Hua Ling, Wu Guo

    Abstract: Neural networks are commonly known to be vulnerable to adversarial attacks mounted through subtle perturbation on the input data. Recent development in voice-privacy protection has shown the positive use cases of the same technique to conceal speaker's voice attribute with additive perturbation signal generated by an adversarial network. This paper examines the reversibility property where an enti…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: 6 pages, 3 figures, published in IEEE SLT Workshop 2024

    Journal ref: 2024 IEEE Spoken Language Technology Workshop (SLT), 2024, pp. 1197-1202

  40. arXiv:2412.06259  [pdf, other]

    eess.AS cs.SD

    Leveraging Prompt Learning and Pause Encoding for Alzheimer's Disease Detection

    Authors: Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

    Abstract: Compared to other clinical screening techniques, speech-and-language-based automated Alzheimer's disease (AD) detection methods are characterized by their non-invasiveness, cost-effectiveness, and convenience. Previous studies have demonstrated the efficacy of fine-tuning pre-trained language models (PLMs) for AD detection. However, the objective of this traditional fine-tuning method, which invol…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Accepted by ISCSLP 2024

  41. arXiv:2412.03388  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles

    Authors: Jiaxuan Liu, Zhaoci Liu, Yajun Hu, Yingying Gao, Shilei Zhang, Zhenhua Ling

    Abstract: Human speech exhibits rich and flexible prosodic variations. To address the one-to-many mapping problem from text to prosody in a reasonable and flexible manner, we propose DiffStyleTTS, a multi-speaker acoustic model based on a conditional diffusion module and an improved classifier-free guidance, which hierarchically models speech prosodic features, and controls different prosodic styles to guid…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: COLING 2025

  42. arXiv:2411.17335  [pdf, other]

    cs.CV

    VersatileMotion: A Unified Framework for Motion Synthesis and Comprehension

    Authors: Zeyu Ling, Bo Han, Shiyang Li, Jikang Cheng, Hongdeng Shen, Changqing Zou

    Abstract: Large language models (LLMs) are, by design, inherently capable of multi-task learning: through a unified next-token prediction paradigm, they can naturally address a wide variety of downstream tasks. Prior work in the motion domain has demonstrated some generality by adapting LLMs via a Motion Tokenizer coupled with an autoregressive Transformer to generate and understand human motion. However, t…

    Submitted 26 May, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

  43. arXiv:2411.11581  [pdf, other]

    cs.CL

    OASIS: Open Agent Social Interaction Simulations with One Million Agents

    Authors: Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Chaochao Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao

    Abstract: There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (e.g., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a result, several LLM-based ABMs have been proposed in the past year. While they hold promise, each simulator is specifically designed to study a parti…

    Submitted 23 March, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  44. arXiv:2411.11258  [pdf, other]

    cs.SD eess.AS

    ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram

    Authors: Xiao-Hang Jiang, Hui-Peng Du, Yang Ai, Ye-Xin Lu, Zhen-Hua Ling

    Abstract: This paper proposes ESTVocoder, a novel excitation-spectral-transformed neural vocoder within the framework of source-filter theory. The ESTVocoder transforms the amplitude and phase spectra of the excitation into the corresponding speech amplitude and phase spectra using a neural filter whose backbone is ConvNeXt v2 blocks. Finally, the speech waveform is reconstructed through the inverse short-t…

    Submitted 17 November, 2024; originally announced November 2024.

    Comments: Accepted by NCMMSC2024

  45. arXiv:2411.11232  [pdf, other]

    cs.SD eess.AS

    SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features

    Authors: Yu-Fei Shi, Yang Ai, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling

    Abstract: Assessing the naturalness of speech using mean opinion score (MOS) prediction models has positive implications for the automatic evaluation of speech synthesis systems. Early MOS prediction models took the raw waveform or amplitude spectrum of speech as input, whereas more advanced methods employed self-supervised-learning (SSL) based models to extract semantic representations from speech for MOS…

    Submitted 17 November, 2024; originally announced November 2024.

  46. arXiv:2411.11123  [pdf, other]

    cs.SD eess.AS

    Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion

    Authors: Yu-Fei Shi, Yang Ai, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling

    Abstract: We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured the first place among all participating teams, excluding the official baseline. In this paper, we further improve our submission and propose a novel Pitch-and-Spectrum-aware Singing Quality Assessment (PS-SQA) method. The PS-SQA is designed based…

    Submitted 23 December, 2024; v1 submitted 17 November, 2024; originally announced November 2024.

  47. arXiv:2411.07381  [pdf, other]

    cs.CL

    MaLei at the PLABA Track of TREC 2024: RoBERTa for Term Replacement -- LLaMA3.1 and GPT-4o for Complete Abstract Adaptation

    Authors: Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic

    Abstract: This report is the system description of the MaLei team (Manchester and Leiden) for the shared task Plain Language Adaptation of Biomedical Abstracts (PLABA) 2024 (formerly named BeeManc, as in last year's edition), affiliated with TREC2024 (33rd Text REtrieval Conference https://ir.nist.gov/evalbase/conf/trec-2024). This report contains two sections corresponding to the two sub-tasks in PLABA-20…

    Submitted 17 February, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

    Comments: ongoing work - system report for PLABA2024 with TREC-2024

  48. arXiv:2411.04491  [pdf, other]

    cs.LG cs.AI

    Series-to-Series Diffusion Bridge Model

    Authors: Hao Yang, Zhanbo Feng, Feng Zhou, Robert C Qiu, Zenan Ling

    Abstract: Diffusion models have risen to prominence in time series forecasting, showcasing their robust capability to model complex data distributions. However, their effectiveness in deterministic predictions is often constrained by instability arising from their inherent stochasticity. In this paper, we revisit time series diffusion models and present a comprehensive framework that encompasses most existi…

    Submitted 7 November, 2024; originally announced November 2024.

  49. arXiv:2411.04410  [pdf, other]

    cs.AR

    The Survey of Chiplet-based Integrated Architecture: An EDA perspective

    Authors: Shixin Chen, Hengyuan Zhang, Zichao Ling, Jianwang Zhai, Bei Yu

    Abstract: Enhancing performance while reducing costs is the fundamental design philosophy of integrated circuits (ICs). With advancements in packaging technology, interposer-based chiplet architecture has emerged as a promising solution. Chiplet integration, often referred to as 2.5D IC, offers significant benefits, including cost-effectiveness, reusability, and improved performance. However, realizing thes…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: 9 pages, ASPDAC2025

  50. arXiv:2411.00850  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    GWQ: Gradient-Aware Weight Quantization for Large Language Models

    Authors: Yihua Shao, Yan Gu, Siyu Chen, Haiyang Liu, Zixian Zhu, Zijian Ling, Minxi Yan, Ziyang Yan, Chenyu Zhang, Michele Magno, Haotong Qin, Yan Wang, Jingcai Guo, Ling Shao, Hao Tang

    Abstract: Large language models (LLMs) show impressive performance in solving complex language tasks. However, their large number of parameters presents significant challenges for deployment. Compressing LLMs to low bits can enable their deployment on resource-constrained devices. To address this problem, we propose gradient-aware weight quantization (GWQ), the first quantization approach for low-bit weight…

    Submitted 29 May, 2025; v1 submitted 30 October, 2024; originally announced November 2024.