
Showing 1–50 of 584 results for author: Gao, Z

Searching in archive cs.
  1. arXiv:2405.04476  [pdf, other]

    eess.AS cs.SD

    BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

    Authors: Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

    Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. The current RAPs and RPPs estimation methods either fall short of covering broad real-world acoustic environments in the context of…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  2. arXiv:2405.04377  [pdf, other]

    cs.CV

    Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

    Authors: Boqiang Zhang, Hongtao Xie, Zuan Gao, Yuxin Wang

    Abstract: Scene text images contain not only style information (font, background) but also content information (character, texture). Different scene text tasks need different information, but previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance. We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling th…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  3. Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

    Authors: Yubo Mai, Zhipeng Gao, Xing Hu, Lingfeng Bao, Yu Liu, Jianling Sun

    Abstract: Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manual crafting rules and can be easily deployed on personal computers without relying on other external tools. Sp…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2405.03355  [pdf, other]

    cs.LG cs.CV

    On the Theory of Cross-Modality Distillation with Contrastive Learning

    Authors: Hangyu Lin, Chen Liu, Chengming Xu, Zhengqi Gao, Yanwei Fu, Yuan Yao

    Abstract: Cross-modality distillation arises as an important topic for data modalities containing limited knowledge such as depth maps and high-quality sketches. Such techniques are of great importance, especially for memory and privacy-restricted scenarios where labeled training data is generally unavailable. To solve the problem, existing label-free methods leverage a few pairwise unlabeled data to distil…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2405.03066  [pdf]

    cs.ET

    A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)

    Authors: Lingyao Li, Jiayan Zhou, Zhenxiang Gao, Wenyue Hua, Lizhou Fan, Huizi Yu, Loni Hagen, Yonfeng Zhang, Themistocles L. Assimes, Libby Hemphill, Siyuan Ma

    Abstract: Electronic Health Records (EHRs) play an important role in the healthcare system. However, their complexity and vast volume pose significant challenges to data interpretation and analysis. Recent advancements in Artificial Intelligence (AI), particularly the development of Large Language Models (LLMs), open up new opportunities for researchers in this domain. Although prior studies have demonstrat…

    Submitted 5 May, 2024; originally announced May 2024.

  6. arXiv:2405.03003  [pdf, other]

    cs.LG cs.AI cs.CL

    Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

    Authors: Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li

    Abstract: Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $ΔW=BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to fur…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  7. Easy over Hard: A Simple Baseline for Test Failures Causes Prediction

    Authors: Zhipeng Gao, Zhipeng Xue, Xing Hu, Weiyi Shang, Xin Xia

    Abstract: The test failure causes analysis is critical since it determines the subsequent way of handling different types of bugs, which is the prerequisite to get the bugs properly analyzed and fixed. After a test case fails, software testers have to inspect the test execution logs line by line to identify its root cause. However, manual root cause determination is often tedious and time-consuming, which c…

    Submitted 5 May, 2024; originally announced May 2024.

  8. arXiv:2405.02918  [pdf, other]

    cs.CV

    MERIT: Multi-view Evidential learning for Reliable and Interpretable liver fibrosis sTaging

    Authors: Yuanye Liu, Zheyao Gao, Nannan Shi, Fuping Wu, Yuxin Shi, Qingchao Chen, Xiahai Zhuang

    Abstract: Accurate staging of liver fibrosis from magnetic resonance imaging (MRI) is crucial in clinical practice. While conventional methods often focus on a specific sub-region, multi-view learning captures more information by analyzing multiple patches simultaneously. However, previous multi-view approaches could not typically calculate uncertainty by nature, and they generally integrate features from d…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Submitted to Medical Image Analysis

    MSC Class: 68U10 ACM Class: I.4.6

  9. arXiv:2405.02823  [pdf, other]

    cs.IT eess.SP

    Reconfigurable Massive MIMO: Precoding Design and Channel Estimation in the Electromagnetic Domain

    Authors: Keke Ying, Zhen Gao, Yu Su, Tong Qin, Michail Matthaiou, Robert Schober

    Abstract: Reconfigurable massive multiple-input multiple-output (RmMIMO) technology offers increased flexibility for future communication systems by exploiting previously untapped degrees of freedom in the electromagnetic (EM) domain. The representation of the traditional spatial domain channel state information (sCSI) limits the insights into the potential of EM domain channel properties, constraining the…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: This work is being submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  10. arXiv:2405.02299  [pdf, other]

    cs.CE cs.LG

    Deep Reinforcement Learning for Modelling Protein Complexes

    Authors: Ziqi Gao, Tao Feng, Jiaxuan You, Chenyi Zi, Yan Zhou, Chen Zhang, Jia Li

    Abstract: AlphaFold can be used for both single-chain and multi-chain protein structure prediction, while the latter becomes extremely challenging as the number of chains increases. In this work, by taking each chain as a node and assembly actions as edges, we show that an acyclic undirected connected graph can be used to predict the structure of multi-chain protein complexes (a.k.a., protein complex modell…

    Submitted 6 May, 2024; v1 submitted 11 March, 2024; originally announced May 2024.

    Comments: International Conference on Learning Representations (ICLR 2024)

  11. arXiv:2405.00311  [pdf]

    cs.LG

    Three-layer deep learning network random trees for fault diagnosis in chemical production process

    Authors: Ming Lu, Zhen Gao, Ying Zou, Zuguo Chen, Pei Li

    Abstract: With the development of technology, the chemical production process is becoming increasingly complex and large-scale, making fault diagnosis particularly important. However, current diagnostic methods struggle to address the complexities of large-scale production processes. In this paper, we integrate the strengths of deep learning and machine learning technologies, combining the advantages of bid…

    Submitted 1 May, 2024; originally announced May 2024.

  12. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  13. arXiv:2404.16767  [pdf, other]

    cs.LG cs.CL cs.CV

    REBEL: Reinforcement Learning via Regressing Relative Rewards

    Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

    Abstract: While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping) and is notorious for its sensitivity to the precise implement…

    Submitted 25 April, 2024; originally announced April 2024.

  14. arXiv:2404.12587  [pdf, other]

    cs.AI

    Reinforcement Learning Approach for Integrating Compressed Contexts into Knowledge Graphs

    Authors: Ngoc Quach, Qi Wang, Zijun Gao, Qifeng Sun, Bo Guan, Lillian Floyd

    Abstract: The widespread use of knowledge graphs in various fields has brought about a challenge in effectively integrating and updating information within them. When it comes to incorporating contexts, conventional methods often rely on rules or basic machine learning models, which may not fully grasp the complexity and fluidity of context information. This research suggests an approach based on reinforcem…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Machine Learning and Neural Networks (MLNN 2024)

  15. arXiv:2404.11108  [pdf, other]

    cs.CV

    LADDER: An Efficient Framework for Video Frame Interpolation

    Authors: Tong Shen, Dong Li, Ziheng Gao, Lu Tian, Emad Barsoum

    Abstract: Video Frame Interpolation (VFI) is a crucial technique in various applications such as slow-motion generation, frame rate conversion, and video frame restoration. This paper introduces an efficient video frame interpolation framework that aims to strike a favorable balance between efficiency and quality. Our framework follows a general paradigm consisting of a flow estimator and a refinement modul…

    Submitted 17 April, 2024; originally announced April 2024.

  16. arXiv:2404.09842  [pdf, other]

    cs.CV

    STMixer: A One-Stage Sparse Action Detector

    Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

    Abstract: Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and the feature sampling is constrained inside the box, failing to effectively leverage richer context inf…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Extended version of the paper arXiv:2303.15879 presented at CVPR 2023. Accepted by TPAMI 2024

  17. arXiv:2404.07572  [pdf, other]

    cs.CR cs.AI

    Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing

    Authors: ZhenZhe Gao, Zhenjun Tang, Zhaoxia Yin, Baoyuan Wu, Yue Lu

    Abstract: Neural networks have increasingly influenced people's lives. Ensuring the faithful deployment of neural networks as designed by their model owners is crucial, as they may be susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions. T…

    Submitted 23 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: The article has been accepted by IEEE International Conference on Multimedia and Expo 2024

  18. arXiv:2404.06974  [pdf]

    cs.RO

    Deep Reinforcement Learning for Mobile Robot Path Planning

    Authors: Hao Liu, Yi Shen, Shuangjiang Yu, Zijun Gao, Tong Wu

    Abstract: Path planning is an important problem with applications in many areas, such as video games and robotics. This paper proposes a novel method to address the problem of Deep Reinforcement Learning (DRL) based path planning for a mobile robot. We design DRL-based algorithms, including reward functions, and parameter optimization, to avoid time-consuming work in a 2D environment. We also des…

    Submitted 10 April, 2024; originally announced April 2024.

  19. arXiv:2404.06065  [pdf, other]

    cs.CV

    Unified Entropy Optimization for Open-Set Test-Time Adaptation

    Authors: Zhengqing Gao, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: Test-time adaptation (TTA) aims at adapting a model pre-trained on the labeled source domain to the unlabeled target domain. Existing methods usually focus on improving TTA performance under covariate shifts, while neglecting semantic shifts. In this paper, we delve into a realistic open-set TTA setting where the target domain may contain samples from unknown classes. Many state-of-the-art closed-…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  20. arXiv:2404.03652  [pdf, other]

    cs.CV

    The More You See in 2D, the More You Perceive in 3D

    Authors: Xinyang Han, Zelin Gao, Angjoo Kanazawa, Shubham Goel, Yossi Gandelsman

    Abstract: Humans can infer 3D structure from 2D images of an object based on past experience and improve their 3D understanding as they see more images. Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images. Given a few unposed images of an object, we adapt a pre-trained view-conditioned diffusion model together with…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project page: https://sap3d.github.io/

  21. arXiv:2403.19079  [pdf, other]

    cs.CV

    A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement

    Authors: Junjie Wen, Jinqiang Cui, Benyun Zhao, Bingxin Han, Xuchen Liu, Zhi Gao, Ben M. Chen

    Abstract: In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) in Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. It may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevit…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: accepted by ICRA24

  22. arXiv:2403.17256  [pdf, other]

    cs.IT eess.SP

    Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models

    Authors: Li Qiao, Mahdi Boloursaz Mashhadi, Zhen Gao, Chuan Heng Foh, Pei Xiao, Mehdi Bennis

    Abstract: Generative foundation AI models have recently shown great success in synthesizing natural signals with high perceptual quality using only textual prompts and conditioning signals to guide the generation process. This enables semantic communications at extremely low data rates in future wireless networks. In this paper, we develop a latency-aware semantic communications framework with pre-trained g…

    Submitted 25 March, 2024; originally announced March 2024.

  23. arXiv:2403.16428  [pdf, other]

    cs.CV

    Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

    Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

    Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the…

    Submitted 25 March, 2024; originally announced March 2024.

  24. arXiv:2403.16393  [pdf, other]

    cs.AI cs.CL cs.LG

    Concurrent Linguistic Error Detection (CLED) for Large Language Models

    Authors: Jinhua Zhu, Javier Conde, Zhen Gao, Pedro Reviriego, Shanshan Liu, Fabrizio Lombardi

    Abstract: The wide adoption of Large language models (LLMs) makes their dependability a pressing concern. Detection of errors is the first step to mitigating their impact on a system and thus, efficient error detection for LLMs is an important issue. In many settings, the LLM is considered as a black box with no access to the internal nodes; this prevents the use of many error detection schemes that need ac…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures, 30 references

  25. arXiv:2403.14843  [pdf, other]

    cs.LG cs.AI

    Local Causal Discovery with Linear non-Gaussian Cyclic Models

    Authors: Haoyue Dai, Ignavier Ng, Yujia Zheng, Zhengqing Gao, Kun Zhang

    Abstract: Local causal discovery is of great practical significance, as there are often situations where the discovery of the global causal structure is unnecessary, and the interest lies solely on a single target variable. Most existing local methods utilize conditional independence relations, providing only a partially directed graph, and assume acyclicity for the ground-truth structure, even though real-…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Appears at AISTATS 2024

  26. arXiv:2403.14583  [pdf, other]

    cs.RO cs.LG cs.MA

    Co-Optimization of Environment and Policies for Decentralized Multi-Agent Navigation

    Authors: Zhan Gao, Guang Yang, Amanda Prorok

    Abstract: This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent…

    Submitted 21 March, 2024; originally announced March 2024.

  27. arXiv:2403.12813  [pdf, other]

    cs.IT eess.SP

    Knowledge and Data Dual-Driven Channel Estimation and Feedback for Ultra-Massive MIMO Systems under Hybrid Field Beam Squint Effect

    Authors: Kuiyu Wang, Zhen Gao, Sheng Chen, Boyu Ning, Gaojie Chen, Yu Su, Zhaocheng Wang, H. Vincent Poor

    Abstract: Acquiring accurate channel state information (CSI) at an access point (AP) is challenging for wideband millimeter wave (mmWave) ultra-massive multiple-input and multiple-output (UMMIMO) systems, due to the high-dimensional channel matrices, hybrid near- and far-field channel feature, beam squint effects, and imperfect hardware constraints, such as low-resolution analog-to-digital converters, and…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 17 pages, 22 figures, 3 tables

  28. arXiv:2403.11809  [pdf, other]

    cs.IT eess.SP

    Sensing-Enhanced Channel Estimation for Near-Field XL-MIMO Systems

    Authors: Shicong Liu, Xianghao Yu, Zhen Gao, Jie Xu, Derrick Wing Kwan Ng, Shuguang Cui

    Abstract: Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. The spherical wavefront characteristics in the near field introduce additional degrees of freedom (DoFs), namely distance and angle, into the channel model, which leads to unique challenges in channe…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 14 pages, 9 figures

  29. arXiv:2403.11519  [pdf, other]

    cs.CR

    Efficient and Privacy-Preserving Federated Learning based on Full Homomorphic Encryption

    Authors: Yuqi Guo, Lin Li, Zhongxiang Zheng, Hanrui Yun, Ruoyan Zhang, Xiaolin Chang, Zhixuan Gao

    Abstract: Since the first theoretically feasible full homomorphic encryption (FHE) scheme was proposed in 2009, great progress has been achieved. These improvements have made FHE schemes come off the paper and become quite useful in solving some practical problems. In this paper, we propose a set of novel Federated Learning Schemes by utilizing the latest homomorphic encryption technologies, so as to improv…

    Submitted 18 March, 2024; originally announced March 2024.

  30. arXiv:2403.11481  [pdf, other]

    cs.CV

    VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

    Authors: Yue Fan, Xiaojian Ma, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li

    Abstract: We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relations in lengthy videos. In particular, the proposed multimodal agent VideoAgent: 1) constructs a structured memory to store both the generic temporal e…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: videoagent.github.io; First two authors contributed equally

  31. arXiv:2403.10927  [pdf, ps, other]

    cs.IT cs.LG

    Distributed Multi-Objective Dynamic Offloading Scheduling for Air-Ground Cooperative MEC

    Authors: Yang Huang, Miaomiao Dong, Yijie Mao, Wenqiang Liu, Zhen Gao

    Abstract: Utilizing unmanned aerial vehicles (UAVs) with edge server to assist terrestrial mobile edge computing (MEC) has attracted tremendous attention. Nevertheless, state-of-the-art schemes based on deterministic optimizations or single-objective reinforcement learning (RL) cannot reduce the backlog of task bits and simultaneously improve energy efficiency in highly dynamic network environments, where t…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for publication in the IEEE Transactions on Vehicular Technology

  32. arXiv:2403.10301  [pdf, other]

    cs.CL cs.CV

    Uni-SMART: Universal Science Multimodal Analysis and Research Transformer

    Authors: Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang, Hongshuai Wang, Yongge Li, Mujie Lin, Yaqi Li, Yuqi Yin, Linfeng Zhang, Guolin Ke

    Abstract: In scientific research and its application, scientific literature analysis is crucial as it allows researchers to build on the work of others. However, the fast growth of scientific knowledge has led to a massive increase in scholarly articles, making in-depth literature analysis increasingly challenging and time-consuming. The emergence of Large Language Models (LLMs) has offered a new way to add…

    Submitted 15 March, 2024; originally announced March 2024.

  33. arXiv:2403.09673  [pdf, other]

    q-bio.BM cs.AI cs.LG

    FoldToken: Learning Protein Language via Vector Quantization and Beyond

    Authors: Zhangyang Gao, Cheng Tan, Jue Wang, Yufei Huang, Lirong Wu, Stan Z. Li

    Abstract: Is there a foreign language describing protein sequences and structures simultaneously? Protein structures, represented by continuous 3D points, have long posed a challenge due to the contrasting modeling paradigms of discrete sequences. We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols. This innovative approach involves projecting residue types and st…

    Submitted 19 March, 2024; v1 submitted 4 February, 2024; originally announced March 2024.

  34. arXiv:2403.08568  [pdf, other]

    cs.CV cs.LG

    Consistent Prompting for Rehearsal-Free Continual Learning

    Authors: Zhanxin Gao, Jun Cen, Xiaobin Chang

    Abstract: Continual learning empowers models to adapt autonomously to the ever-changing environment or data streams without forgetting old knowledge. Prompt-based approaches are built on frozen pre-trained models to learn the task-specific prompts and classifiers efficiently. Existing prompt-based methods are inconsistent between training and testing, limiting their effectiveness. Two types of inconsistency…

    Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  35. arXiv:2403.05256  [pdf, other]

    eess.IV cs.CV cs.LG

    DuDoUniNeXt: Dual-domain unified hybrid model for single and multi-contrast undersampled MRI reconstruction

    Authors: Ziqi Gao, Yue Zhang, Xinwen Liu, Kaiyan Li, S. Kevin Zhou

    Abstract: Multi-contrast (MC) Magnetic Resonance Imaging (MRI) reconstruction aims to incorporate a reference image of auxiliary modality to guide the reconstruction process of the target modality. Known MC reconstruction methods perform well with a fully sampled reference image, but usually exhibit inferior performance, compared to single-contrast (SC) methods, when the reference image is missing or of low…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures, 2 tables

  36. arXiv:2403.03483  [pdf, other]

    cs.LG

    A Teacher-Free Graph Knowledge Distillation Framework with Dual Self-Distillation

    Authors: Lirong Wu, Haitao Lin, Zhangyang Gao, Guojiang Zhao, Stan Z. Li

    Abstract: Recent years have witnessed great success in handling graph-related tasks with Graph Neural Networks (GNNs). Despite their great academic success, Multi-Layer Perceptrons (MLPs) remain the primary workhorse for practical industrial applications. One reason for such an academic-industry gap is the neighborhood-fetching latency incurred by data dependency in GNNs. To reduce their gaps, Graph Knowled…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.02097

  37. arXiv:2403.02265  [pdf, other]

    cs.CV cs.GR

    DaReNeRF: Direction-aware Representation for Dynamic Scenes

    Authors: Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu

    Abstract: Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training time issues associated with methods like Neural Radiance Fields (NeRF) and implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024. Paper + supplementary material

  38. arXiv:2403.02138  [pdf, other]

    cs.CV

    Self-Supervised Facial Representation Learning with Facial Region Awareness

    Authors: Zheng Gao, Ioannis Patras

    Abstract: Self-supervised pre-training has been proved to be effective in learning transferable representations that benefit various visual tasks. This paper asks this question: can self-supervised pre-training learn general facial representations for various facial analysis tasks? Recent efforts toward this goal are limited to treating each face image as a whole, i.e., learning consistent facial representa…

    Submitted 4 March, 2024; originally announced March 2024.

  39. arXiv:2403.01976  [pdf, other]

    cs.CL

    SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis

    Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Hongshuai Wang, Yongge Li, Mujie Lin, Shuwen Yang, Jiankun Wang, Yuqi Yin, Yaqi Li, Linfeng Zhang, Guolin Ke

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have revolutionized natural language understanding and generation, igniting a surge of interest in leveraging these technologies in the field of scientific literature analysis. Existing benchmarks, however, inadequately evaluate the proficiency of LLMs in scientific literature analysis, especially in scenarios involving complex comprehension and…

    Submitted 15 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  40. arXiv:2403.01400  [pdf, other]

    cs.LG cs.AI

    Decoupling Weighing and Selecting for Integrating Multiple Graph Pre-training Tasks

    Authors: Tianyu Fan, Lirong Wu, Yufei Huang, Haitao Lin, Cheng Tan, Zhangyang Gao, Stan Z. Li

    Abstract: Recent years have witnessed the great success of graph pre-training for graph representation learning. With hundreds of graph pre-training tasks proposed, integrating knowledge acquired from multiple pre-training tasks has become a popular research topic. In this paper, we identify two important collaborative processes for this topic: (1) select: how to select an optimal task combination from a gi…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at ICLR 2024

  41. arXiv:2403.00632  [pdf, other]

    cs.HC cs.AI cs.CL cs.CY

    Metamorpheus: Interactive, Affective, and Creative Dream Narration Through Metaphorical Visual Storytelling

    Authors: Qian Wan, Xin Feng, Yining Bei, Zhiqi Gao, Zhicong Lu

    Abstract: Human emotions are essentially molded by lived experiences, from which we construct personalised meaning. The engagement in such meaning-making process has been practiced as an intervention in various psychotherapies to promote wellness. Nevertheless, how to support recollecting and recounting lived experiences in everyday life remains underexplored in HCI. It also remains unknown how technologies su…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted by CHI 2024

  42. arXiv:2402.18818  [pdf, other]

    cs.SE cs.CR

    CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection

    Authors: Hao Wang, Zeyu Gao, Chao Zhang, Mingyang Sun, Yuchen Zhou, Han Qiu, Xi Xiao

    Abstract: Binary code similarity detection (BCSD) is a fundamental technique for various applications. Many BCSD solutions have been proposed recently, most of which are embedding-based, but they have shown limited accuracy and efficiency, especially when the volume of target binaries to search is large. To address this issue, we propose a cost-effective BCSD framework, CEBin, which fuses embedding-based and compar…

    Submitted 28 February, 2024; originally announced February 2024.

  43. arXiv:2402.18813  [pdf, other]

    cs.CE

    Protein Multimer Structure Prediction via Prompt Learning

    Authors: Ziqi Gao, Xiangguo Sun, Zijing Liu, Yu Li, Hong Cheng, Jia Li

    Abstract: Understanding the 3D structures of protein multimers is crucial, as they play a vital role in regulating various cellular processes. It has been empirically confirmed that the multimer structure prediction (MSP) can be well handled in a step-wise assembly fashion using provided dimer structures and predicted protein-protein interactions (PPIs). However, due to the biological gap in the formation o…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: International Conference on Learning Representations (ICLR 2024)

  44. arXiv:2402.16928  [pdf, other]

    cs.SE cs.AI

    CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision

    Authors: Hao Wang, Zeyu Gao, Chao Zhang, Zihan Sha, Mingyang Sun, Yuchen Zhou, Wenyu Zhu, Wenju Sun, Han Qiu, Xi Xiao

    Abstract: Binary code representation learning has shown significant performance in binary analysis tasks. However, existing solutions often have poor transferability, particularly in few-shot and zero-shot scenarios where few or no training samples are available for the tasks. To address this problem, we present CLAP (Contrastive Language-Assembly Pre-training), which employs natural language supervision to lear…

    Submitted 26 February, 2024; originally announced February 2024.

  45. arXiv:2402.11459  [pdf, other]

    q-bio.BM cs.AI cs.LG physics.chem-ph

    Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge

    Authors: Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li, Stan Z. Li

    Abstract: Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging. While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation pre…

    Submitted 21 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  46. arXiv:2402.10886  [pdf, other]

    cs.CL

    Reviewer2: Optimizing Review Generation Through Prompt Generation

    Authors: Zhaolin Gao, Kianté Brantley, Thorsten Joachims

    Abstract: Recent developments in LLMs offer new opportunities for assisting authors in improving their work. In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft. While initial methods for automated review generation already exist, these methods tend to produce reviews that lack detail, and they do not cover the range of opinions…

    Submitted 16 February, 2024; originally announced February 2024.

  47. arXiv:2402.10609  [pdf, other]

    eess.IV cs.CV cs.LG

    U$^2$MRPD: Unsupervised undersampled MRI reconstruction by prompting a large latent diffusion model

    Authors: Ziqi Gao, S. Kevin Zhou

    Abstract: Implicit visual knowledge in a large latent diffusion model (LLDM) pre-trained on natural images is rich and hypothetically universal to natural and medical images. To test this hypothesis, we introduce a novel framework for Unsupervised Undersampled MRI Reconstruction by Prompting a pre-trained large latent Diffusion model (U$^2$MRPD). Existing data-driven, supervised undersampled MRI reconstruc…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 17 pages, 6 figures, 5 tables, 2 pseudocodes

  48. arXiv:2402.08846  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

    Authors: Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

    Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning f…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Work in progress; will be open-sourced soon

  49. arXiv:2402.08198  [pdf, other]

    q-bio.BM cs.AI cs.LG

    PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction

    Authors: Lirong Wu, Yufei Huang, Cheng Tan, Zhangyang Gao, Bozhen Hu, Haitao Lin, Zicheng Liu, Stan Z. Li

    Abstract: Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery. Existing deep learning-based methods utilize only the single modality of protein sequences or structures and lack the co-modeling of the joint distribution of the two modalities, which may lead to significant performance drops in complex real-world sc…

    Submitted 12 February, 2024; originally announced February 2024.

  50. arXiv:2402.07814  [pdf, other]

    cs.CV cs.AI

    PBADet: A One-Stage Anchor-Free Approach for Part-Body Association

    Authors: Zhongpai Gao, Huayi Zhou, Abhishek Sharma, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu

    Abstract: The detection of human parts (e.g., hands, face) and their correct association with individuals is an essential task, e.g., for ubiquitous human-machine interfaces and action recognition. Traditional methods often employ multi-stage processes, rely on cumbersome anchor-based systems, or do not scale well to larger part sets. This paper presents PBADet, a novel one-stage, anchor-free approach for p…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024