Showing 1–50 of 164 results for author: Ling, Z

Searching in archive cs.
  1. arXiv:2405.16821  [pdf, other]

    cs.CL

    Perturbation-Restrained Sequential Model Editing

    Authors: Jun-Yu Ma, Hong Wang, Hao-Xiang Xu, Zhen-Hua Ling, Jia-Chen Gu

    Abstract: Model editing is an emerging field that focuses on updating the knowledge embedded within large language models (LLMs) without extensive retraining. However, current model editing methods significantly compromise the general abilities of LLMs as the number of edits increases, and this trade-off poses a substantial challenge to the continual learning of LLMs. In this paper, we first theoretically a…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.11541  [pdf, other]

    cs.IT eess.SP

    R-NeRF: Neural Radiance Fields for Modeling RIS-enabled Wireless Environments

    Authors: Huiying Yang, Zihan Jin, Chenhao Wu, Rujing Xiong, Robert Caiming Qiu, Zenan Ling

    Abstract: Recently, ray tracing has gained renewed interest with the advent of Reflective Intelligent Surfaces (RIS) technology, a key enabler of 6G wireless communications due to its capability of intelligent manipulation of electromagnetic waves. However, accurately modeling RIS-enabled wireless environments poses significant challenges due to the complex variations caused by various environmental factors…

    Submitted 19 May, 2024; originally announced May 2024.

  3. arXiv:2404.12886  [pdf, other]

    cs.CV cs.LG

    MCM: Multi-condition Motion Synthesis Framework

    Authors: Zeyu Ling, Bo Han, Yongkang Wongkan, Han Lin, Mohan Kankanhalli, Weidong Geng

    Abstract: Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily focused on single conditions, the multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framewor…

    Submitted 19 April, 2024; originally announced April 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  4. arXiv:2404.08857  [pdf, other]

    cs.SD cs.AI eess.AS

    Voice Attribute Editing with Text Prompt

    Authors: Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

    Abstract: Despite recent advancements in speech generation with text prompt providing control over speech style, voice attributes in synthesized speech remain elusive and challenging to control. This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt. To solve this t…

    Submitted 12 April, 2024; originally announced April 2024.

  5. arXiv:2403.17378  [pdf, other]

    cs.SD eess.AS

    Low-Latency Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks

    Authors: Yang Ai, Zhen-Hua Ling

    Abstract: This paper presents a novel neural speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is a core module for direct wrapped phase prediction. This architecture consists of two parallel linear convolutional la…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing. arXiv admin note: substantial text overlap with arXiv:2211.15974

  6. arXiv:2403.11183  [pdf, other]

    cs.CL

    Decoding Continuous Character-based Language from Non-invasive Brain Recordings

    Authors: Cenyuan Zhang, Xiaoqing Zheng, Ruicheng Yin, Shujie Geng, Jianhan Xu, Xuan Gao, Changze Lv, Zixuan Ling, Xuanjing Huang, Miao Cao, Jianfeng Feng

    Abstract: Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to…

    Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  7. arXiv:2403.10146  [pdf, other]

    cs.SD cs.IR eess.AS

    Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

    Authors: Qian Wang, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention. Existing methods typically aggregate information from each modality into a single vector for matching, but this sacrifices local details and can hardly capture intricate relationships within and between modalities. Furthermore, current ATR d…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 5 pages, accepted to ICASSP2024

  8. arXiv:2403.09718  [pdf]

    cs.CL cs.AI

    Comprehensive Implementation of TextCNN for Enhanced Collaboration between Natural Language Processing and System Recommendation

    Authors: Xiaonan Xu, Zheng Xu, Zhipeng Ling, Zhengyu Jin, ShuQian Du

    Abstract: Natural Language Processing (NLP) is an important branch of artificial intelligence that studies how to enable computers to understand, process, and generate human language. Text classification is a fundamental task in NLP, which aims to classify text into different predefined categories. Text classification is the most basic and classic task in natural language processing, and most of the tasks i…

    Submitted 12 March, 2024; originally announced March 2024.

  9. arXiv:2402.15179  [pdf, other]

    cs.LG cs.CL

    Advancing Parameter Efficiency in Fine-tuning via Representation Editing

    Authors: Muling Wu, Wenhao Liu, Xiaohua Wang, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Parameter Efficient Fine-Tuning (PEFT) has gained significant attention for its ability to achieve competitive results while updating only a small subset of trainable parameters. Despite the promising performance of current PEFT methods, they present challenges in hyperparameter selection, such as determining the rank of LoRA or Adapter, or specifying the length of soft prompts. In addressing thes…

    Submitted 28 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  10. arXiv:2402.10533  [pdf, other]

    cs.SD eess.AS

    APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding

    Authors: Yang Ai, Xiao-Hang Jiang, Ye-Xin Lu, Hui-Peng Du, Zhen-Hua Ling

    Abstract: This paper introduces a novel neural audio codec targeting high waveform sampling rates and low bitrates named APCodec, which seamlessly integrates the strengths of parametric codecs and waveform codecs. The APCodec revolutionizes the process of audio encoding and decoding by concurrently handling the amplitude and phase spectra as audio parametric characteristics like parametric codecs. It is com…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  11. arXiv:2402.07501  [pdf, other]

    cs.LG cs.AI

    One Train for Two Tasks: An Encrypted Traffic Classification Framework Using Supervised Contrastive Learning

    Authors: Haozhen Zhang, Xi Xiao, Le Yu, Qing Li, Zhen Ling, Ye Zhang

    Abstract: As network security receives widespread attention, encrypted traffic classification has become the current research focus. However, existing methods conduct traffic classification without sufficiently considering the common characteristics between data samples, leading to suboptimal performance. Moreover, they train the packet-level and flow-level classification tasks independently, which is redun…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: The code is available at https://github.com/ViktorAxelsen/CLE-TFE

  12. arXiv:2402.05926  [pdf, other]

    cs.LG cs.CL

    On the Convergence of Zeroth-Order Federated Tuning for Large Language Models

    Authors: Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Yaliang Li, Ying Shen

    Abstract: The confluence of Federated Learning (FL) and Large Language Models (LLMs) is ushering in a new era in privacy-preserving natural language processing. However, the intensive memory requirements for fine-tuning LLMs pose significant challenges, especially when deploying on clients with limited computational resources. To circumvent this, we explore the novel integration of Memory-efficient Zeroth-O…

    Submitted 20 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 19 pages, 10 figures

  13. arXiv:2402.02697  [pdf, ps, other]

    cs.LG stat.ML

    Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

    Authors: Zenan Ling, Longbo Li, Zhanbo Feng, Yixuan Zhang, Feng Zhou, Robert C. Qiu, Zhenyu Liao

    Abstract: Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of…

    Submitted 19 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  14. arXiv:2401.17623  [pdf, other]

    cs.CL

    Neighboring Perturbations of Knowledge Editing on Large Language Models

    Authors: Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu

    Abstract: Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge. Given the resource-intensive nature of retraining LLMs, there has been a notable increase in the development of knowledge editing. However, current approaches and evaluations rarely explore the perturbation of editing on neighboring knowledge. This paper…

    Submitted 26 May, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by ICML 2024

  15. arXiv:2401.15884  [pdf, other]

    cs.CL

    Corrective Retrieval Augmented Generation

    Authors: Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling

    Abstract: Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we…

    Submitted 16 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  16. arXiv:2401.11857  [pdf, other]

    eess.AS cs.SD

    Adversarial speech for voice privacy protection from Personalized Speech generation

    Authors: Shihao Chen, Liping Chen, Jie Zhang, KongAik Lee, Zhenhua Ling, Lirong Dai

    Abstract: The rapid progress in personalized speech generation technology, including personalized text-to-speech (TTS) and voice conversion (VC), poses a challenge in distinguishing between generated and real speech for human listeners, resulting in an urgent demand in protecting speakers' voices from malicious misuse. In this regard, we propose a speaker protection method based on adversarial attacks. The…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  17. arXiv:2401.06387  [pdf, other]

    eess.AS cs.SD eess.SP

    Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

    Authors: Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

    Abstract: Speech bandwidth extension (BWE) refers to widening the frequency bandwidth range of speech signals, enhancing the speech quality towards brighter and fuller. This paper proposes a generative adversarial network (GAN) based BWE model with parallel prediction of Amplitude and Phase spectra, named AP-BWE, which achieves both high-quality and efficient wideband speech waveform generation. The propose…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  18. arXiv:2401.04700  [pdf, other]

    cs.CL

    Model Editing Can Hurt General Abilities of Large Language Models

    Authors: Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng

    Abstract: One critical challenge that has emerged is the presence of hallucinations in the output of large language models (LLMs) due to false or outdated knowledge. Since retraining LLMs with updated information is resource-intensive, there has been a growing interest in model editing. However, current model editing methods, while effective in improving editing performance in various scenarios, often overl…

    Submitted 4 February, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Add new results on LLaMA-2 (7B)

  19. arXiv:2312.15997  [pdf, other]

    cs.CL

    Aligning Large Language Models with Human Preferences through Representation Engineering

    Authors: Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often involves employing reinforcement learning from human feedback (RLHF) to fine-tune LLMs based on human labels assessing the relative quality of model responses. Nevert…

    Submitted 26 December, 2023; originally announced December 2023.

  20. arXiv:2312.15946  [pdf, other]

    cs.SD cs.GR eess.AS

    EnchantDance: Unveiling the Potential of Music-Driven Dance Movement

    Authors: Bo Han, Yi Ren, Hao Peng, Teng Zhang, Zeyu Ling, Xiang Yin, Feilin Han

    Abstract: The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make…

    Submitted 26 December, 2023; originally announced December 2023.

  21. arXiv:2312.08749  [pdf, other]

    cs.LG cs.CY

    Mitigating Label Bias in Machine Learning: Fairness through Confident Learning

    Authors: Yixuan Zhang, Boyu Li, Zenan Ling, Feng Zhou

    Abstract: Discrimination can occur when the underlying unbiased labels are overwritten by an agent with potential bias, resulting in biased datasets that unfairly harm specific groups and cause classifiers to inherit these biases. In this paper, we demonstrate that despite only having access to the biased labels, it is possible to eliminate bias by filtering the fairest instances within the framework of con…

    Submitted 24 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  22. arXiv:2312.04817  [pdf, other]

    cs.CV

    MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding

    Authors: Hongjie Zhang, Yi Liu, Lu Dong, Yifei Huang, Zhen-Hua Ling, Yali Wang, Limin Wang, Yu Qiao

    Abstract: While several long-form VideoQA datasets have been introduced, the length of both videos used to curate questions and sub-clips of clues leveraged to answer those questions have not yet reached the criteria for genuine long-form video understanding. Moreover, their QAs are unduly narrow and modality-biased, lacking a wider view of understanding long-term video content with rich dynamics and comple…

    Submitted 7 December, 2023; originally announced December 2023.

  23. arXiv:2311.00694  [pdf, other]

    cs.AI cs.CL

    Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving

    Authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Tongzhou Mu, Mingu Lee, Reza Pourreza, Roland Memisevic, Hao Su

    Abstract: Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed and low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it challenging for correct solutions to stand out in the huge solution space.…

    Submitted 5 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  24. arXiv:2310.16582  [pdf, other]

    cs.CL

    Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons

    Authors: Tianlong Li, Shihan Dou, Changze Lv, Wenhao Liu, Jianhan Xu, Muling Wu, Zixuan Ling, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Personality plays a pivotal role in shaping human expression patterns, thus regulating the personality of large language models (LLMs) holds significant potential in enhancing the user experience of LLMs. Previous methods either relied on fine-tuning LLMs on specific corpora or necessitated manually crafted prompts to elicit specific personalities from LLMs. However, the former approach is ineffic…

    Submitted 6 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Work in progress

  25. arXiv:2310.16301  [pdf, other]

    cs.CL

    Is ChatGPT a Good Multi-Party Conversation Solver?

    Authors: Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted. In this paper, we delve into the potential of generative LLMs such as ChatGPT and…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by Findings of EMNLP 2023

  26. arXiv:2310.11595  [pdf, other]

    cs.CV cs.AI

    WaveAttack: Asymmetric Frequency Obfuscation-based Backdoor Attacks Against Deep Neural Networks

    Authors: Jun Xia, Zhihao Yue, Yingbo Zhou, Zhiwei Ling, Xian Wei, Mingsong Chen

    Abstract: Due to the popularity of Artificial Intelligence (AI) technology, numerous backdoor attacks are designed by adversaries to mislead deep neural network predictions by manipulating training samples and training processes. Although backdoor attacks are effective in various real scenarios, they still suffer from the problems of both low fidelity of poisoned samples and non-negligible transfer in laten…

    Submitted 19 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

  27. arXiv:2310.10379  [pdf, other]

    cs.LG stat.ML

    Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification

    Authors: Tianjun Ke, Haoqun Cao, Zenan Ling, Feng Zhou

    Abstract: Meta-learning has demonstrated promising results in few-shot classification (FSC) by learning to solve new problems using prior knowledge. Bayesian methods are effective at characterizing uncertainty in FSC, which is crucial in high-risk fields. In this context, the logistic-softmax likelihood is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classificat…

    Submitted 16 October, 2023; originally announced October 2023.

  28. arXiv:2310.10322  [pdf, other]

    cs.CL

    Untying the Reversal Curse via Bidirectional Language Model Editing

    Authors: Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu

    Abstract: Recent studies have demonstrated that large language models (LLMs) store massive factual knowledge within their parameters. But existing LLMs are prone to hallucinate unintended text due to false or outdated knowledge. Since retraining LLMs is resource intensive, there has been a growing interest in the concept of model editing. Despite the emergence of benchmarks and approaches, these unidirectio…

    Submitted 16 October, 2023; originally announced October 2023.

  29. arXiv:2310.04185  [pdf, other]

    cs.NI

    Cross-Edge Orchestration of Serverless Functions with Probabilistic Caching

    Authors: Chen Chen, Manuel Herrera, Ge Zheng, Liqiao Xia, Zhengyang Ling, Jiangtao Wang

    Abstract: Serverless edge computing adopts an event-based paradigm that provides back-end services on an as-used basis, resulting in efficient resource utilization. To improve the end-to-end latency and revenue, service providers need to optimize the number and placement of serverless containers while considering the system cost incurred by the provisioning. The particular reason for this circumstance is th…

    Submitted 6 October, 2023; originally announced October 2023.

  30. arXiv:2309.10455  [pdf, other]

    eess.AS cs.SD

    Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

    Authors: Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

    Abstract: Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper proposes the incorporation of ultrasound tongue images to improve the performance of lip-based AV-SE systems further. To address the challenge of acquiring ultrasound tongue images duri…

    Submitted 20 November, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. arXiv admin note: text overlap with arXiv:2305.14933

  31. arXiv:2309.09470  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

    Authors: Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

    Abstract: This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely relying on a single face image of the target speaker. To address this task, we propose a face-voice memory-based zero-shot FaceVC method. This method leverages a memo…

    Submitted 18 September, 2023; originally announced September 2023.

  32. arXiv:2309.03031  [pdf, other]

    cs.CV

    MCM: Multi-condition Motion Synthesis Framework for Multi-scenario

    Authors: Zeyu Ling, Bo Han, Yongkang Wong, Mohan Kangkanhalli, Weidong Geng

    Abstract: The objective of the multi-condition human motion synthesis task is to incorporate diverse conditional inputs, encompassing various forms like text, music, speech, and more. This endows the task with the capability to adapt across multiple scenarios, ranging from text-to-motion and music-to-dance, among others. While existing research has primarily focused on single conditions, the multi-condition…

    Submitted 6 September, 2023; originally announced September 2023.

  33. arXiv:2308.16425  [pdf, other]

    cs.LG stat.ML

    On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

    Authors: Zenan Ling, Zhenyu Liao, Robert C. Qiu

    Abstract: Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this,…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted by Workshop on High-dimensional Learning Dynamics, ICML 2023, Honolulu, Hawaii

  34. arXiv:2308.15854  [pdf, other]

    cs.CV cs.AI

    Zero-shot Inversion Process for Image Attribute Editing with Diffusion Models

    Authors: Zhanbo Feng, Zenan Ling, Ci Gong, Feng Zhou, Jie Li, Robert C. Qiu

    Abstract: Denoising diffusion models have shown outstanding performance in image editing. Existing works tend to use either image-guided methods, which provide a visual reference but lack control over semantic coherence, or text-guided methods, which ensure faithfulness to text guidance but lack visual quality. To address the problem, we propose the Zero-shot Inversion Process (ZIP), a framework that inject…

    Submitted 10 October, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  35. arXiv:2308.15122  [pdf, other]

    cs.CL

    SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation

    Authors: Changze Lv, Tianlong Li, Jianhan Xu, Chenxi Gu, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way. However, the network architectures of existing SNNs for language tasks are still simplistic and relatively shallow, and deep architectures have not been fully explored, resulting in a significant performance gap compared to mainstream transformer-based networks such as BERT. To…

    Submitted 21 February, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  36. arXiv:2308.14726  [pdf, other]

    cs.CV cs.AI

    PanoSwin: a Pano-style Swin Transformer for Panorama Understanding

    Authors: Zhixin Ling, Zhen Xing, Xiangdong Zhou, Manliang Cao, Guichun Zhou

    Abstract: In panorama understanding, the widely used equirectangular projection (ERP) entails boundary discontinuity and spatial distortion. It severely deteriorates the conventional CNNs and vision Transformers on panoramas. In this paper, we propose a simple yet effective architecture named PanoSwin to learn panorama representations with ERP. To deal with the challenges brought by equirectangular projecti…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: CVPR 2023

  37. arXiv:2308.08926  [pdf, other]

    eess.AS cs.SD

    Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

    Authors: Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

    Abstract: Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome the above issue, in this paper, we proposed MP-SENet, a novel Speech En…

    Submitted 1 April, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Submitted to IEEE Transactions on Audio, Speech and Language Processing

  38. arXiv:2308.08850  [pdf, other]

    cs.SD eess.AS

    Long-frame-shift Neural Speech Phase Prediction with Spectral Continuity Enhancement and Interpolation Error Compensation

    Authors: Yang Ai, Ye-Xin Lu, Zhen-Hua Ling

    Abstract: Speech phase prediction, which is a significant research focus in the field of signal processing, aims to recover speech phase spectra from amplitude-related features. However, existing speech phase prediction methods are constrained to recovering phase spectra with short frame shifts, which are considerably smaller than the theoretical upper bound required for exact waveform reconstruction of sho…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Published at IEEE Signal Processing Letters

  39. arXiv:2307.10710  [pdf, other]

    cs.LG

    Reparameterized Policy Learning for Multimodal Trajectory Optimization

    Authors: Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

    Abstract: We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.…

    Submitted 20 July, 2023; originally announced July 2023.

  40. arXiv:2307.03135  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

    Authors: Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su

    Abstract: Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the dist…

    Submitted 11 October, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Published at International Conference on Computer Vision (ICCV) 2023. Poster at https://xuanlinli17.github.io/pdfs/iccv23_large_vlm_distillation_poster.pdf

  41. arXiv:2306.10336  [pdf, other]

    cs.LG

    Fair Causal Feature Selection

    Authors: Zhaolong Ling, Enqi Xu, Peng Zhou, Liang Du, Kui Yu, Xindong Wu

    Abstract: Fair feature selection for classification decision tasks has recently garnered significant attention from researchers. However, existing fair feature selection algorithms fall short of providing a full explanation of the causal relationship between features and sensitive attributes, potentially impacting the accuracy of fair feature identification. To address this issue, we propose a Fair Causal F…

    Submitted 18 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

  42. arXiv:2306.06799  [pdf, other]

    cs.RO cs.AI cs.LG

    On the Efficacy of 3D Point Cloud Reinforcement Learning

    Authors: Zhan Ling, Yunchao Yao, Xuanlin Li, Hao Su

    Abstract: Recent studies on visual reinforcement learning (visual RL) have explored the use of 3D visual representations. However, none of these work has systematically compared the efficacy of 3D representations with 2D representations across different tasks, nor have they analyzed 3D representations from the perspective of agent-object / object-object relationship reasoning. In this work, we seek answers…

    Submitted 11 June, 2023; originally announced June 2023.

  43. arXiv:2306.03872  [pdf, other]

    cs.CL cs.AI cs.LG

    Deductive Verification of Chain-of-Thought Reasoning

    Authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Zhiao Huang, Mingu Lee, Roland Memisevic, Hao Su

    Abstract: Large Language Models (LLMs) significantly benefit from Chain-of-Thought (CoT) prompting in performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors, thereby limiting models' ability to solve complex reasoning tasks. Inspired by how hu…

    Submitted 3 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023

  44. arXiv:2306.00544  [pdf, other]

    cs.IT eess.SP

    Codebook Configuration for RIS-aided Systems via Implicit Neural Representations

    Authors: Huiying Yang, Rujing Xiong, Yao Xiao, Zhijie Fan, Tiebin Mi, Robert Caiming Qiu, Zenan Ling

    Abstract: Reconfigurable Intelligent Surface (RIS) is envisioned to be an enabling technique in 6G wireless communications. By configuring the reflection beamforming codebook, RIS focuses signals on target receivers to enhance signal strength. In this paper, we investigate the codebook configuration for RIS-aided communication systems. We formulate an implicit relationship between user's coordinates informa…

    Submitted 28 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  45. Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation

    Authors: Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

    Abstract: Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper proposes further incorporating ultrasound tongue images to improve lip-based AV-SE systems' performance. Knowledge distillation is employed at the training stage to address the challeng…

    Submitted 20 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published in InterSpeech 2023

    Journal ref: Proc. INTERSPEECH 2023, 844-848 (2023)

  46. arXiv:2305.14359  [pdf, other]

    cs.MM cs.AI cs.CV cs.SD eess.AS

    Zero-shot personalized lip-to-speech synthesis with face image based voice control

    Authors: Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling

    Abstract: Lip-to-Speech (Lip2Speech) synthesis, which predicts corresponding speech from talking face images, has witnessed significant progress with various models and training strategies in a series of independent studies. However, existing studies cannot achieve voice control under zero-shot conditions, because extra speaker embeddings need to be extracted from natural reference speech and are unavailabl…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: ICASSP 2023

  47. arXiv:2305.12733  [pdf, other]

    cs.CL

    MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation

    Authors: Jia-Chen Gu, Chao-Hong Tan, Caiyuan Chu, Zhen-Hua Ling, Chongyang Tao, Quan Liu, Cong Liu

    Abstract: Modeling multi-party conversations (MPCs) with graph neural networks has been proven effective at capturing complicated and graphical information flows. However, existing methods rely heavily on the necessary addressee labels and can only be applied to an ideal setting where each utterance must be tagged with an addressee label. To study the scarcity of addressee labels which is a common issue in…

    Submitted 17 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP 2023. arXiv admin note: text overlap with arXiv:2203.08500

  48. arXiv:2305.12389  [pdf, other]

    cs.CL

    SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction

    Authors: Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

    Abstract: Zero-shot cross-lingual information extraction (IE) aims at constructing an IE model for some low-resource target languages, given annotations exclusively in some rich-resource languages. Recent studies based on language-universal features have shown their effectiveness and are attracting increasing attention. However, prior work has neither explored the potential of establishing interactions betwe…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: 15 pages

  49. arXiv:2305.11517  [pdf, other]

    cs.CL cs.AI

    DiffuSIA: A Spiral Interaction Architecture for Encoder-Decoder Text Diffusion

    Authors: Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Diffusion models have emerged as the new state-of-the-art family of deep generative models, and their promising potential for text generation has recently attracted increasing attention. Existing studies mostly adopt a single encoder architecture with partially noising processes for conditional text generation, but its degree of flexibility for conditional modeling is limited. In fact, the encod…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Work in Progress

  50. arXiv:2305.10730  [pdf, other]

    cs.LG

    FedMR: Federated Learning via Model Recombination

    Authors: Ming Hu, Zhihao Yue, Zhiwei Ling, Yihao Huang, Cheng Chen, Xian Wei, Yang Liu, Mingsong Chen

    Abstract: Although Federated Learning (FL) enables global model training across clients without compromising their raw data, existing Federated Averaging (FedAvg)-based methods suffer from the problem of low inference performance, especially for unevenly distributed data among clients. This is mainly because i) FedAvg initializes client models with the same global models, which makes the local training hard…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2208.07677