Showing 1–50 of 270 results for author: Liang, Z

  1. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

    Authors: Yitong Jin, Zhiping Qiu, Yi Shi, Shuangpeng Sun, Chongwu Wang, Donghao Pan, Jiachen Zhao, Zhenghao Liang, Yuan Wang, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different v…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH2024

  2. arXiv:2405.00734  [pdf, other]

    eess.SP cs.AI cs.LG

    EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

    Authors: Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

    Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Mani…

    Submitted 29 April, 2024; originally announced May 2024.

  3. arXiv:2404.19557  [pdf, other]

    stat.ML cs.LG

    Neural Dynamic Data Valuation

    Authors: Zhangyong Liang, Huanhuan Gao, Ji Zhang

    Abstract: Data constitute the foundational component of the data economy and its marketplaces. Efficient and fair data valuation has emerged as a topic of significant interest. Many approaches based on marginal contribution have shown promising results in various downstream tasks. However, they are well known to be computationally expensive as they require training a large number of utility functions, whic…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 43 pages, 19 figures

  4. arXiv:2404.19316  [pdf, other]

    cs.CL

    QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering

    Authors: Sheng Ouyang, Jianzong Wang, Yong Zhang, Zhitao Li, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Extractive Question Answering (EQA) in Machine Reading Comprehension (MRC) often faces the challenge of dealing with semantically identical but format-variant inputs. Our work introduces a novel approach, called the "Query Latent Semantic Calibrator (QLSC)", designed as an auxiliary module for existing MRC models. We propose a unique scaling strategy to capture latent semantic center features of…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  5. arXiv:2404.19214  [pdf, other]

    cs.SD eess.AS

    EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

    Authors: Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules: Shared…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  6. arXiv:2404.19212  [pdf, other]

    cs.SD eess.AS

    EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning

    Authors: Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, Jing Xiao

    Abstract: Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot research topic. Existing works generally take into account disentangling speech components through human-crafted bottleneck features which cannot achieve sufficient information disentangling, while pitch and rhythm may still be mixed together. There is a risk of informat…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  7. arXiv:2404.16271  [pdf]

    cs.CR cond-mat.mtrl-sci

    True random number generation using metastable 1T' molybdenum ditelluride

    Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

    Abstract: True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizatio…

    Submitted 24 April, 2024; originally announced April 2024.

  8. A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation

    Authors: Yifan Zhao, Zhenyu Liang, Zhichao Lu, Ran Cheng

    Abstract: As one of the emerging challenges in Automated Machine Learning, the Hardware-aware Neural Architecture Search (HW-NAS) tasks can be treated as black-box multi-objective optimization problems (MOPs). An important application of HW-NAS is real-time semantic segmentation, which plays a pivotal role in autonomous driving scenarios. The HW-NAS for real-time semantic segmentation inherently needs to ba…

    Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: GECCO 2024

  9. arXiv:2404.15615  [pdf, other]

    cs.HC cs.LG

    MDDD: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition

    Authors: Ting Luo, Jing Zhang, Yingwei Qiu, Li Zhang, Yaohua Hu, Zhuliang Yu, Zhen Liang

    Abstract: Emotion decoding using Electroencephalography (EEG)-based affective brain-computer interfaces represents a significant area within the field of affective computing. In the present study, we propose a novel non-deep transfer learning method, termed as Manifold-based Domain adaptation with Dynamic Distribution (MDDD). The proposed MDDD includes four main modules: manifold feature transformation, dyn…

    Submitted 23 April, 2024; originally announced April 2024.

  10. arXiv:2404.11107  [pdf, other]

    cs.CR cs.SE

    KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities

    Authors: Bonan Ruan, Jiahao Liu, Chuqi Zhang, Zhenkai Liang

    Abstract: Linux kernel vulnerability reproduction is a critical task in system security. To reproduce a kernel vulnerability, the vulnerable environment and the Proof of Concept (PoC) program are needed. Most existing research focuses on the generation of PoC, while the construction of environment is overlooked. However, establishing an effective vulnerable environment to trigger a vulnerability is challeng…

    Submitted 27 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  11. arXiv:2404.10838  [pdf, other]

    cs.CV cs.CL cs.MM

    Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning

    Authors: Zhengyang Liang, Meiyu Liang, Wei Huang, Yawen Li, Zhe Xue

    Abstract: In recent years, pre-trained multimodal large models have attracted widespread attention due to their outstanding performance in various multimodal applications. Nonetheless, the extensive computational resources and vast datasets required for their training present significant hurdles for deployment in environments with limited computational resources. To address this challenge, we propose a nove…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 10 pages

  12. arXiv:2404.10210  [pdf, other]

    cs.CV

    MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition

    Authors: Naichuan Zheng, Hailun Xia, Zeyu Liang

    Abstract: In recent years, skeleton-based action recognition, leveraging multimodal Graph Convolutional Networks (GCN), has achieved remarkable results. However, due to their deep structure and reliance on continuous floating-point operations, GCN-based methods are energy-intensive. To address this issue, we propose an innovative Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Disti…

    Submitted 15 April, 2024; originally announced April 2024.

  13. arXiv:2404.09559  [pdf, other]

    cs.HC cs.AI

    Joint Contrastive Learning with Feature Alignment for Cross-Corpus EEG-based Emotion Recognition

    Authors: Qile Liu, Zhihao Zhou, Jiyuan Wang, Zhen Liang

    Abstract: The integration of human emotions into multimedia applications shows great potential for enriching user experiences and enhancing engagement across various digital platforms. Unlike traditional methods such as questionnaires, facial expressions, and voice analysis, brain signals offer a more direct and objective understanding of emotional states. However, in the field of electroencephalography (EE…

    Submitted 15 April, 2024; originally announced April 2024.

  14. arXiv:2404.05057  [pdf, other]

    cs.LG cs.DB

    TimeCSL: Unsupervised Contrastive Learning of General Shapelets for Explorable Time Series Analysis

    Authors: Zhiyu Liang, Chen Liang, Zheng Liang, Hongzhi Wang, Bo Zheng

    Abstract: Unsupervised (a.k.a. Self-supervised) representation learning (URL) has emerged as a new paradigm for time series analysis, because it has the ability to learn generalizable time series representation beneficial for many downstream tasks without using labels that are usually difficult to obtain. Considering that existing approaches have limitations in the design of the representation encoder and t…

    Submitted 7 April, 2024; originally announced April 2024.

  15. arXiv:2404.01789  [pdf]

    cs.SE

    A Feature Dataset of Microservices-based Systems

    Authors: Weipan Yang, Yongchao Xing, Yiming Lyu, Zhihao Liang, Zhiying Tu

    Abstract: Microservice architecture has become a dominant architectural style in the service-oriented software industry. Poor practices in the design and development of microservices are called microservice bad smells. In microservice bad smells research, the detection of these bad smells relies on feature data from microservices. However, there is a lack of an appropriate open-source microservice feature d…

    Submitted 2 April, 2024; originally announced April 2024.

  16. GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA

    Authors: Zhenyu Liang, Tao Jiang, Kebin Sun, Ran Cheng

    Abstract: Evolutionary multiobjective optimization has witnessed remarkable progress during the past decades. However, existing algorithms often encounter computational challenges in large-scale scenarios, primarily attributed to the absence of hardware acceleration. In response, we introduce a Tensorized Reference Vector Guided Evolutionary Algorithm (TensorRVEA) for harnessing the advancements of GPU acce…

    Submitted 11 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

  17. arXiv:2403.19098  [pdf, other]

    cs.CV

    GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving

    Authors: Yunpeng Zhang, Deheng Qian, Ding Li, Yifeng Pan, Yong Chen, Zhenbao Liang, Zhiyao Zhang, Shurui Zhang, Hongxu Li, Maolei Fu, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du

    Abstract: Modeling complicated interactions among the ego-vehicle, road agents, and map elements has been a crucial part for safety-critical autonomous driving. Previous works on end-to-end autonomous driving rely on the attention mechanism for handling heterogeneous interactions, which fails to capture the geometric priors and is also computationally intensive. In this paper, we propose the Interaction Sce…

    Submitted 6 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: project page: https://github.com/zhangyp15/GraphAD

  18. arXiv:2403.18480  [pdf, other]

    cs.IR

    Enhanced Generative Recommendation via Content and Collaboration Integration

    Authors: Yidan Wang, Zhaochun Ren, Weiwei Sun, Jiyuan Yang, Zhixiang Liang, Xin Chen, Ruobing Xie, Su Yan, Xu Zhang, Pengjie Ren, Zhumin Chen, Xin Xin

    Abstract: Generative recommendation has emerged as a promising paradigm aimed at augmenting recommender systems with recent advancements in generative artificial intelligence. This task has been formulated as a sequence-to-sequence generation process, wherein the input sequence encompasses data pertaining to the user's previously interacted items, and the output sequence denotes the generative identifier fo…

    Submitted 27 March, 2024; originally announced March 2024.

  19. arXiv:2403.17610  [pdf, other]

    cs.CV

    MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors

    Authors: He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao

    Abstract: Foot contact is an important cue for human motion capture, understanding, and generation. Existing datasets tend to annotate dense foot contact using visual matching with thresholding or incorporating pressure signals. However, these approaches either suffer from low accuracy or are only designed for small-range and slow motion. There is still a lack of a vision-pressure multimodal dataset with la…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  20. arXiv:2403.16513  [pdf, other]

    cs.CV cs.CR

    Let Real Images be as a Judger, Spotting Fake Images Synthesized with Generative Models

    Authors: Ziyou Liang, Run Wang, Weifeng Liu, Yuyang Zhang, Wenyuan Yang, Lina Wang, Xingkai Wang

    Abstract: In the last few years, generative models have shown their powerful capabilities in synthesizing realistic images in both quality and diversity (i.e., facial images, and natural subjects). Unfortunately, the artifact patterns in fake images synthesized by different generative models are inconsistent, leading to the failure of previous research that relied on spotting subtle differences between real…

    Submitted 25 March, 2024; originally announced March 2024.

  21. arXiv:2403.11056  [pdf, other]

    cs.CV

    Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration

    Authors: Zhihao Liang, Qi Zhang, Wenbo Hu, Ying Feng, Lei Zhu, Kui Jia

    Abstract: The 3D Gaussian Splatting (3DGS) gained its popularity recently by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single po…

    Submitted 3 April, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: 29 pages

  22. arXiv:2403.10211  [pdf, other]

    cs.CV

    BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution

    Authors: Feng Li, Yixuan Wu, Zichao Liang, Runmin Cong, Huihui Bai, Yao Zhao, Meng Wang

    Abstract: Diffusion models (DM) have achieved remarkable promise in image super-resolution (SR). However, most of them are tailored to solving non-blind inverse problems with fixed known degradation settings, limiting their adaptability to real-world applications that involve complex unknown degradations. In this work, we propose BlindDiff, a DM-based blind SR method to tackle the blind degradation settings…

    Submitted 15 March, 2024; originally announced March 2024.

  23. arXiv:2403.09622  [pdf, other]

    cs.CV

    Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

    Authors: Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan

    Abstract: Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoder, Glyph-ByT5, by fine-tuning the ch…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: technical report, 18 pages, 19 figures

  24. Digital Twin-assisted Reinforcement Learning for Resource-aware Microservice Offloading in Edge Computing

    Authors: Xiangchun Chen, Jiannong Cao, Zhixuan Liang, Yuvraj Sahni, Mingjin Zhang

    Abstract: Collaborative edge computing (CEC) has emerged as a promising paradigm, enabling edge nodes to collaborate and execute microservices from end devices. Microservice offloading, a fundamentally important problem, decides when and where microservices are executed upon the arrival of services. However, the dynamic nature of the real-world CEC environment often leads to inefficient microservice offload…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 9 pages, 5 figures

    Journal ref: 2023 IEEE 20th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Toronto, ON, Canada, 2023, pp. 28-36

  25. arXiv:2403.08164  [pdf, other]

    cs.SD cs.LG eess.AS

    EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

    Authors: Ziqi Liang, Haoxiang Shi, Jiawei Wang, Keda Lu

    Abstract: Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks have become a standard modeling technique for sequential data in TTS systems and are widely used. However, training a TTS model which includes RNN components requires powerful GPU performance and takes a long time. In contrast, CNN-based sequence synthesis techn…

    Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by the 27th IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2024). arXiv admin note: substantial text overlap with arXiv:2211.01948

  26. arXiv:2403.03310  [pdf, other]

    quant-ph cs.LG

    Graph Learning for Parameter Prediction of Quantum Approximate Optimization Algorithm

    Authors: Zhiding Liang, Gang Liu, Zheyuan Liu, Jinglei Cheng, Tianyi Hao, Kecheng Liu, Hang Ren, Zhixin Song, Ji Liu, Fanny Ye, Yiyu Shi

    Abstract: In recent years, quantum computing has emerged as a transformative force in the field of combinatorial optimization, offering novel approaches to tackling complex problems that have long challenged classical computational methods. Among these, the Quantum Approximate Optimization Algorithm (QAOA) stands out for its potential to efficiently solve the Max-Cut problem, a quintessential example of com…

    Submitted 5 March, 2024; originally announced March 2024.

  27. arXiv:2403.02691  [pdf, other]

    cs.CL cs.CR

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

    Authors: Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang

    Abstract: Recent work has embodied LLMs as agents, allowing them to access tools, perform actions, and interact with external content (e.g., emails or websites). However, external content introduces the risk of indirect prompt injection (IPI) attacks, where malicious instructions are embedded within the content processed by LLMs, aiming to manipulate these agents into executing detrimental actions against u…

    Submitted 25 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 28 pages, 5 figures, 9 tables

  28. arXiv:2402.18146  [pdf, ps, other]

    cs.CV

    3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling

    Authors: Chaokang Jiang, Guangming Wang, Jiuming Liu, Hesheng Wang, Zhuang Ma, Zhenqiang Liu, Zhujin Liang, Yi Shan, Dalong Du

    Abstract: Learning 3D scene flow from LiDAR point clouds presents significant difficulties, including poor generalization from synthetic datasets to real scenes, scarcity of real-world 3D labels, and poor performance on real sparse LiDAR point clouds. We present a novel approach from the perspective of auto-labelling, aiming to generate a large number of 3D scene flow pseudo labels for real-world LiDAR poin…

    Submitted 29 February, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR2024! 10 pages, 6 figures

  29. arXiv:2402.16117  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

    Authors: Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

    Abstract: Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various…

    Submitted 25 February, 2024; originally announced February 2024.

  30. arXiv:2402.14623  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

    Authors: Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

    Abstract: Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, relatively little effort on ensuring the deployability of generated code on real robots, and other fundamental c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 10 pages of main paper, 4 pages of appendix; 10 figures in main paper, 3 figures in appendix

    ACM Class: I.2.7; I.2.8; I.2.9; I.2.10

  31. arXiv:2402.14598  [pdf, other]

    cs.NE cs.LG

    Brain-inspired Distributed Memorization Learning for Efficient Feature-free Unsupervised Domain Adaptation

    Authors: Jianming Lv, Depin Liang, Zequan Liang, Yaobin Zhang, Sijun Xia

    Abstract: Compared with gradient based artificial neural networks, biological neural networks usually show a more powerful generalization ability to quickly adapt to unknown environments without using any gradient back-propagation procedure. Inspired by the distributed memory mechanism of human brains, we propose a novel gradient-free Distributed Memorization Learning mechanism, namely DML, to support quick…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 15 pages,15 figures

  32. arXiv:2402.13148  [pdf, other]

    cs.LG cs.CR

    Defending Jailbreak Prompts via In-Context Adversarial Game

    Authors: Yujun Zhou, Yufei Han, Haomin Zhuang, Taicheng Guo, Kehan Guo, Zhenwen Liang, Hongyan Bao, Xiangliang Zhang

    Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities across diverse applications. However, concerns regarding their security, particularly the vulnerability to jailbreak attacks, persist. Drawing inspiration from adversarial training in deep learning and LLM agent learning processes, we introduce the In-Context Adversarial Game (ICAG) for defending against jailbreaks without the need f…

    Submitted 20 February, 2024; originally announced February 2024.

  33. arXiv:2402.10855  [pdf, other]

    cs.CV

    Control Color: Multimodal Diffusion-based Interactive Image Colorization

    Authors: Zhexin Liang, Zhaochen Li, Shangchen Zhou, Chongyi Li, Chen Change Loy

    Abstract: Despite the existence of numerous colorization methods, several limitations still exist, such as lack of user interaction, inflexibility in local colorization, unnatural color rendering, insufficient color variation, and color overflow. To solve these issues, we introduce Control Color (CtrlColor), a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model, offeri…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Project Page: https://zhexinliang.github.io/Control_Color/; Demo Video: https://youtu.be/tSCwA-srl8Q

  34. arXiv:2402.07386  [pdf, other]

    cs.CL

    Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples

    Authors: Qingkai Zeng, Yuyang Bai, Zhaoxuan Tan, Shangbin Feng, Zhenwen Liang, Zhihan Zhang, Meng Jiang

    Abstract: Automatic taxonomy induction is crucial for web search, recommendation systems, and question answering. Manual curation of taxonomies is expensive in terms of human effort, making automatic taxonomy construction highly desirable. In this work, we introduce Chain-of-Layer which is an in-context learning framework designed to induct taxonomies from a given set of entities. Chain-of-Layer breaks down…

    Submitted 11 February, 2024; originally announced February 2024.

  35. arXiv:2402.05138  [pdf, other]

    cs.AI cs.CL

    SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark

    Authors: Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang

    Abstract: The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. It addresses a critical educational phase often overlooked in existing benchmarks, spanning high school to pre-college levels. SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology. It features a blend of multiple-choice and free-respon…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Work in progress

  36. arXiv:2402.02953  [pdf, other]

    cs.CR cs.LG

    Unraveling the Key of Machine Learning Solutions for Android Malware Detection

    Authors: Jiahao Liu, Jun Zeng, Fabio Pierazzi, Lorenzo Cavallaro, Zhenkai Liang

    Abstract: Android malware detection serves as the front line against malicious apps. With the rapid advancement of machine learning (ML), ML-based Android malware detection has attracted increasing attention due to its capability of automatically capturing malicious patterns from Android APKs. These learning-driven methods have reported promising results in detecting malware. However, the absence of an in-d…

    Submitted 5 February, 2024; originally announced February 2024.

  37. arXiv:2401.17865  [pdf, other]

    cs.LG cs.AI

    Manipulating Predictions over Discrete Inputs in Machine Teaching

    Authors: Xiaodong Wu, Yufei Han, Hayssam Dahrouj, Jianbing Ni, Zhenwen Liang, Xiangliang Zhang

    Abstract: Machine teaching often involves the creation of an optimal (typically minimal) dataset to help a model (referred to as the 'student') achieve specific goals given by a teacher. While abundant in the continuous domain, the studies on the effectiveness of machine teaching in the discrete domain are relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on man…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 8 pages, 2 figures

    ACM Class: I.2.6

  38. arXiv:2401.17807  [pdf, other]

    cs.CV cs.GR

    Advances in 3D Generation: A Survey

    Authors: Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, Yan-Pei Cao, Ying Shan

    Abstract: Generating 3D models lies at the core of computer graphics and has been the focus of decades of research. With the emergence of advanced neural representations and generative models, the field of 3D content generation is developing rapidly, enabling the creation of increasingly high-quality and diverse 3D models. The rapid growth of this field makes it difficult to stay abreast of all recent devel…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 33 pages, 12 figures

  39. arXiv:2401.15668  [pdf, other]

    cs.CV

    Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes

    Authors: Weifeng Liu, Tianyi She, Jiawei Liu, Run Wang, Dongyu Yao, Ziyou Liang

    Abstract: In recent years, DeepFake technology has achieved unprecedented success in high-quality video synthesis, whereas these methods also pose potential and severe security threats to humanity. DeepFake can be bifurcated into entertainment applications like face swapping and illicit uses such as lip-syncing fraud. However, lip-forgery videos, which neither change identity nor have discernible visual art…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: The first two authors contributed equally to this work

  40. arXiv:2401.14634  [pdf, other]

    cs.IT

    Semantic Huffman Coding using Synonymous Mapping

    Authors: Jin Xu, Kai Niu, Zijian Liang, Ping Zhang

    Abstract: Semantic communication stands out as a highly promising avenue for future developments in communications. Theoretically, source compression coding based on semantics can achieve lower rates than Shannon entropy. This paper introduces a semantic Huffman coding built upon semantic information theory. By incorporating synonymous mapping and synonymous sets, semantic Huffman coding can achieve shorter…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures, this paper is submitted to the 2024 IEEE International Symposium on Information Theory (ISIT 2024)

  41. arXiv:2401.14633  [pdf, other]

    cs.IT

    Semantic Arithmetic Coding using Synonymous Mappings

    Authors: Zijian Liang, Kai Niu, Jin Xu, Ping Zhang

    Abstract: Recent semantic communication methods explore effective ways to expand the communication paradigm and improve the system performance of the communication systems. Nonetheless, the common problem of these methods is that the essence of semantics is not explicitly pointed out and directly utilized. A new epistemology suggests that synonymy, which is revealed as the fundamental feature of semantics,…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 6 pages, 4 figures. This paper is submitted to the 2024 IEEE International Symposium on Information Theory (ISIT 2024)

  42. arXiv:2401.12550  [pdf, other]

    cs.AI cs.LG

    UR4NNV: Neural Network Verification, Under-approximation Reachability Works!

    Authors: Zhen Liang, Taoran Wu, Ran Zhao, Bai Xue, Ji Wang, Wenjing Yang, Shaojun Deng, Wanwei Liu

    Abstract: Recently, formal verification of deep neural networks (DNNs) has garnered considerable attention, and over-approximation based methods have become popular due to their effectiveness and efficiency. However, these strategies face challenges in addressing the "unknown dilemma" concerning whether the exact output region or the introduced approximation error violates the property in question. To addre…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 11 pages, 4 figures

    MSC Class: 68Q60; 68T07 ACM Class: D.2.4; I.2.0

  43. arXiv:2401.03704  [pdf, other]

    cs.CV

    Sur2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images

    Authors: Zhangjin Huang, Zhihao Liang, Haojie Zhang, Yangkai Lin, Kui Jia

    Abstract: Multi-view surface reconstruction is an ill-posed, inverse problem in 3D vision research. It involves modeling the geometry and appearance with appropriate surface representations. Most of the existing methods rely either on explicit meshes, using surface rendering of meshes for reconstruction, or on implicit field functions, using volume rendering of the fields for reconstruction. The two types o…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: 18 pages, 16 figures

  44. arXiv:2312.15276  [pdf, other]

    cs.HC quant-ph

    VIOLET: Visual Analytics for Explainable Quantum Neural Networks

    Authors: Shaolun Ruan, Zhiding Liang, Qiang Guan, Paul Griffin, Xiaolin Wen, Yanna Lin, Yong Wang

    Abstract: With the rapid development of Quantum Machine Learning, quantum neural networks (QNN) have experienced great advancement in the past few years, harnessing the advantages of quantum computing to significantly speed up classical machine learning tasks. Despite their increasing popularity, the quantum neural network is quite counter-intuitive and difficult to understand, due to their unique quantum-s…

    Submitted 23 December, 2023; originally announced December 2023.

  45. arXiv:2312.13628  [pdf, other]

    cs.LG

    Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

    Authors: Ruichu Cai, Yuxuan Zhu, Jie Qiao, Zefeng Liang, Furui Liu, Zhifeng Hao

    Abstract: Deep neural networks (DNNs) have been demonstrated to be vulnerable to well-crafted adversarial examples, which are generated through either well-conceived $\mathcal{L}_p$-norm restricted or unrestricted attacks. Nevertheless, the majority of those approaches assume that adversaries can modify any features as they wish, and neglect the causal generating process of the data, which is unreaso…

    Submitted 26 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI-2024

  46. arXiv:2312.11598  [pdf, other]

    cs.RO cs.CV cs.LG

    SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

    Authors: Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo

    Abstract: Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end hierarchical planning framework integrating interpretable skill learning with conditional diffusion p…

    Submitted 28 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024. Camera ready version. Project page: https://skilldiffuser.github.io/

  47. arXiv:2312.08862  [pdf, other]

    cs.IT eess.SP

    Semantics-Division Duplexing: A Novel Full-Duplex Paradigm

    Authors: Kai Niu, Zijian Liang, Chao Dong, Jincheng Dai, Zhongwei Si, Ping Zhang

    Abstract: In-band full-duplex (IBFD) is a theoretically effective solution to increase the overall throughput for the future wireless communications system by enabling transmission and reception over the same time-frequency resources. However, reliable source reconstruction remains a great challenge in the practical IBFD systems due to the non-ideal elimination of the self-interference and the inherent limi…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 9 pages, 5 figures, submitted to IEEE Wireless Communications Magazine

  48. arXiv:2312.05698  [pdf, other]

    cs.LG

    Unsupervised Multi-modal Feature Alignment for Time Series Representation Learning

    Authors: Chen Liang, Donghua Yang, Zhiyu Liang, Hongzhi Wang, Zheng Liang, Xiyang Zhang, Jianfeng Huang

    Abstract: In recent times, the field of unsupervised representation learning (URL) for time series data has garnered significant interest due to its remarkable adaptability across diverse downstream applications. Unsupervised learning goals differ from downstream tasks, making it tricky to ensure downstream task utility by focusing only on temporal feature characterization. Researchers have proposed multipl…

    Submitted 9 December, 2023; originally announced December 2023.

  49. arXiv:2311.16473  [pdf, other]

    cs.CV

    GS-IR: 3D Gaussian Splatting for Inverse Rendering

    Authors: Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia

    Abstract: We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g. NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-p…

    Submitted 28 March, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

  50. arXiv:2311.16035  [pdf, other]

    quant-ph cs.AI cs.AR cs.LG

    RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

    Authors: Hanrui Wang, Yilian Liu, Pengyu Liu, Jiaqi Gu, Zirui Li, Zhiding Liang, Jinglei Cheng, Yongshan Ding, Xuehai Qian, Yiyu Shi, David Z. Pan, Frederic T. Chong, Song Han

    Abstract: Quantum state preparation, a crucial subroutine in quantum computing, involves generating a target quantum state from initialized qubits. Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP). AD employs a predefined procedure to decompose the target state into a series of gates, whereas VQSP iterativel…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to FASTML @ ICCAD 2023. 14 pages, 20 figures