Skip to main content

Showing 1–50 of 1,781 results for author: Zhang, R

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.03501  [pdf, other

    cs.LG cs.AI cs.CV

    Boosting Single Positive Multi-label Classification with Generalized Robust Loss

    Authors: Yanxi Chen, Chunxiao Li, Xinyang Dai, Jinhuan Li, Weiyu Sun, Yiming Wang, Renyuan Zhang, Tinghe Zhang, Bo Wang

    Abstract: Multi-label learning (MLL) requires comprehensive multi-semantic annotations that is hard to fully obtain, thus often resulting in missing labels scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods only focus on designing losses using mechanisms such as hard pseudo-labeling and ro… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 6 tables

  2. arXiv:2405.02778  [pdf, other

    cs.IR

    Improve Temporal Awareness of LLMs for Sequential Recommendation

    Authors: Zhendong Chu, Zichao Wang, Ruiyi Zhang, Yangfeng Ji, Hongning Wang, Tong Sun

    Abstract: Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks. However, it is empirically found that LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data, such as sequential recommendation. In this paper, we aim to improve temporal awar… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages

  3. arXiv:2405.01215  [pdf, ps, other

    cs.IT eess.SP

    Movable Antenna Enhanced Wireless Sensing Via Antenna Position Optimization

    Authors: Wenyan Ma, Lipeng Zhu, Rui Zhang

    Abstract: In this paper, we propose a new wireless sensing system equipped with the movable-antenna (MA) array, which can flexibly adjust the positions of antenna elements for improving the sensing performance over conventional antenna arrays with fixed-position antennas (FPAs). First, we show that the angle estimation performance in wireless sensing is fundamentally determined by the array geometry, where… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures. We propose a new wireless sensing system equipped with the movable-antenna (MA) array, which can flexibly adjust the positions of antenna elements for improving the sensing performance over conventional antenna arrays with fixed-position antennas (FPAs)

  4. arXiv:2405.00465  [pdf, other

    cs.CL

    BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine

    Authors: Mingchen Li, Halil Kilicoglu, Hua Xu, Rui Zhang

    Abstract: Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains; however, these models encounter issues such as generating inaccurate information or hallucinations. Retrieval-augmented generation provided a solution for these models to update knowledge and enhance their performance. In contrast to previous retrieval-augmented… ▽ More

    Submitted 2 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  5. arXiv:2404.19227  [pdf, other

    cs.CV cs.CR

    Espresso: Robust Concept Filtering in Text-to-Image Models

    Authors: Anudeep Das, Vasisht Duddu, Rui Zhang, N. Asokan

    Abstract: Diffusion-based text-to-image (T2I) models generate high-fidelity images for given textual prompts. They are trained on large datasets scraped from the Internet, potentially containing unacceptable concepts (e.g., copyright infringing or unsafe). Retraining T2I models after filtering out unacceptable concepts in the training data is inefficient and degrades utility. Hence, there is a need for conc… ▽ More

    Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.18407  [pdf, other

    cs.CR cs.AR

    ICMarks: A Robust Watermarking Framework for Integrated Circuit Physical Design IP Protection

    Authors: Ruisi Zhang, Rachel Selina Rajarathnam, David Z. Pan, Farinaz Koushanfar

    Abstract: Physical design watermarking on contemporary integrated circuit (IC) layout encodes signatures without considering the dense connections and design constraints, which could lead to performance degradation on the watermarked products. This paper presents ICMarks, a quality-preserving and robust watermarking framework for modern IC physical design. ICMarks embeds unique watermark signatures during t… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  7. arXiv:2404.18255  [pdf, other

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang , et al. (2 additional authors not shown)

    Abstract: In recent years, large language models(LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language process tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro… ▽ More

    Submitted 7 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  8. arXiv:2404.18077  [pdf, other

    cs.NI cs.LG

    Generative AI for Low-Carbon Artificial Intelligence of Things

    Authors: Jinbo Wen, Ruichen Zhang, Dusit Niyato, Jiawen Kang, Hongyang Du, Yang Zhang, Zhu Han

    Abstract: By integrating Artificial Intelligence (AI) with the Internet of Things (IoT), Artificial Intelligence of Things (AIoT) has revolutionized many fields. However, AIoT is facing the challenges of energy consumption and carbon emissions due to the continuous advancement of mobile technology. Fortunately, Generative AI (GAI) holds immense potential to reduce carbon emissions of AIoT due to its excelle… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  9. arXiv:2404.17789  [pdf, other

    cs.LG math.NA math.OC

    BiLO: Bilevel Local Operator Learning for PDE inverse problems

    Authors: Ray Zirui Zhang, Xiaohui Xie, John Lowengrub

    Abstract: We propose a new neural network based method for solving inverse problems for partial differential equations (PDEs) by formulating the PDE inverse problem as a bilevel optimization problem. At the upper level, we minimize the data loss with respect to the PDE parameters. At the lower level, we train a neural network to locally approximate the PDE solution operator in the neighborhood of a given se… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  10. arXiv:2404.16790  [pdf, other

    cs.CV

    SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

    Authors: Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan

    Abstract: Comprehending text-rich visual content is paramount for the practical application of Multimodal Large Language Models (MLLMs), since text-rich scenarios are ubiquitous in the real world, which are characterized by the presence of extensive texts embedded within images. Recently, the advent of MLLMs with impressive versatility has raised the bar for what we can expect from MLLMs. However, their pro… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.16029  [pdf, other

    cs.CV

    Editable Image Elements for Controllable Synthesis

    Authors: Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park

    Abstract: Diffusion models have made significant advances in text-guided synthesis tasks. However, editing user-provided images remains challenging, as the high dimensional noise input space of diffusion models is not naturally suited for image inversion or spatial editing. In this work, we propose an image representation that promotes spatial editing of input images using a diffusion model. Concretely, we… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Project page: https://jitengmu.github.io/Editable_Image_Elements/

  12. arXiv:2404.16006  [pdf, other

    cs.CV

    MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 77 pages, 41 figures

  13. arXiv:2404.15756  [pdf, other

    cs.IT cs.NI

    Convolutional Coded Poisson Receivers

    Authors: Cheng-En Lee, Kuo-Yu Liao, Hsiao-Wen Yu, Ruhui Zhang, Cheng-Shang Chang, Duan-Shin Lee

    Abstract: In this paper, we present a framework for convolutional coded Poisson receivers (CCPRs) that incorporates spatially coupled methods into the architecture of coded Poisson receivers (CPRs). We use density evolution equations to track the packet decoding process with the successive interference cancellation (SIC) technique. We derive outer bounds for the stability region of CPRs when the underlying… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Part of this work was presented in 2023 IEEE International Symposium on Information Theory (ISIT) [1] and 2024 IEEE International Symposium on Information Theory (ISIT) [2]

  14. arXiv:2404.15643  [pdf, ps, other

    cs.IT eess.SP

    Dynamic Beam Coverage for Satellite Communications Aided by Movable-Antenna Array

    Authors: Lipeng Zhu, Xiangyu Pi, Wenyan Ma, Zhenyu Xiao, Rui Zhang

    Abstract: Due to the ultra-dense constellation, efficient beam coverage and interference mitigation are crucial to low-earth orbit (LEO) satellite communication systems, while the conventional directional antennas and fixed-position antenna (FPA) arrays both have limited degrees of freedom (DoFs) in beamforming to adapt to the time-varying coverage requirement of terrestrial users. To address this challenge… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  15. arXiv:2404.15271  [pdf, other

    cs.CV cs.AI cs.CL

    Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

    Authors: Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun

    Abstract: Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability. In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources. In this work, we introduce a novel multimodal instruction-following framework for layout planning, allowing use… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  16. arXiv:2404.14795  [pdf, other

    cs.CL cs.CR cs.LG

    Talk Too Much: Poisoning Large Language Models under Token Limit

    Authors: Jiaming He, Wenbo Jiang, Guanyu Hou, Wenshu Fan, Rui Zhang, Hongwei Li

    Abstract: Mainstream poisoning attacks on large language models (LLMs) typically set a fixed trigger in the input instance and specific responses for triggered queries. However, the fixed trigger setting (e.g., unusual words) may be easily detected by human detection, limiting the effectiveness and practicality in real-world scenarios. To enhance the stealthiness of the trigger, we present a poisoning attac… ▽ More

    Submitted 23 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  17. arXiv:2404.14778  [pdf, other

    cs.IT eess.SP

    Channel Estimation for Optical Intelligent Reflecting Surface-Assisted VLC System: A Joint Space-Time Sampling Approach

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) has attracted increasing attention due to its capability of overcoming signal blockages in visible light communication (VLC), an emerging technology for the next-generation advanced transceivers. However, current works on OIRS predominantly assume known channel state information (CSI), which is essential to practical OIRS configuration. To bridge such… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  18. arXiv:2404.14706  [pdf, other

    cs.IT eess.SP

    Channel Estimation for Optical IRS-Assisted VLC System via Spatial Coherence

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) has been considered a promising technology for visible light communication (VLC) by constructing visual line-of-sight propagation paths to address the signal blockage issue. However, the existing works on OIRSs are mostly based on perfect channel state information (CSI), whose acquisition appears to be challenging due to the passive nature of the OIRS.… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  19. arXiv:2404.13924  [pdf, other

    cs.HC cs.ET

    ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Waves Around the Body

    Authors: Saif Mahmud, Vineet Parikh, Qikang Liang, Ke Li, Ruidong Zhang, Ashwin Ajit, Vipin Gunda, Devansh Agarwal, François Guimbretière, Cheng Zhang

    Abstract: We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body with a time resolution of one second. It only needs a pair of miniature speakers and microphones mounted on each hinge of eyeglasses to emit ultrasonic waves… ▽ More

    Submitted 8 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 29 pages, 11 figures

  20. arXiv:2404.13611  [pdf, other

    cs.CV cs.CL

    Video sentence grounding with temporally global textual knowledge

    Authors: Cai Chen, Runzhong Zhang, Jianjun Gao, Kejun Wu, Kim-Hui Yap, Yi Wang

    Abstract: Temporal sentence grounding involves the retrieval of a video moment with a natural language query. Many existing works directly incorporate the given video and temporally localized query for temporal grounding, overlooking the inherent domain gap between different modalities. In this paper, we utilize pseudo-query features containing extensive temporally global textual knowledge sourced from the… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  21. arXiv:2404.13066  [pdf, other

    cs.CL cs.AI

    Leveraging Large Language Model as Simulated Patients for Clinical Education

    Authors: Yanzeng Li, Cheng Zeng, Jialun Zhong, Ruoyu Zhang, Minhao Zhang, Lei Zou

    Abstract: Simulated Patients (SPs) play a crucial role in clinical medical education by providing realistic scenarios for student practice. However, the high cost of training and hiring qualified SPs, along with the heavy workload and potential risks they face in consistently portraying actual patients, limit students' access to this type of clinical training. Consequently, the integration of computer progr… ▽ More

    Submitted 24 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  22. arXiv:2404.12980  [pdf, other

    cs.HC

    Ring-a-Pose: A Ring for Continuous Hand Pose Tracking

    Authors: Tianhong Catherine Yu, Guilin Hu, Ruidong Zhang, Hyunchul Lim, Saif Mahmud, Chi-Jung Lee, Ke Li, Devansh Agarwal, Shuyang Nie, Jinseok Oh, François Guimbretière, Cheng Zhang

    Abstract: We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three use… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  23. arXiv:2404.12861  [pdf, other

    cs.CV

    Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation

    Authors: Yilong Chen, Zongyi Xu, xiaoshui Huang, Ruicheng Zhang, Xinqi Jiang, Xinbo Gao

    Abstract: Current point cloud semantic segmentation has achieved great advances when given sufficient labels. However, the dense annotation of LiDAR point clouds remains prohibitively expensive and time-consuming, unable to keep up with the continuously growing volume of data. In this paper, we propose annotating images with scattered points, followed by utilizing SAM (a Foundation model) to generate semant… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  24. arXiv:2404.12388  [pdf, other

    cs.CV

    VideoGigaGAN: Towards Detail-rich Video Super-Resolution

    Authors: Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu

    Abstract: Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as they are limited in their generative capability. This raises a fundamental question: can we extend the success of a generative image upsampler to the VSR task while preserving the temporal consistency? W… ▽ More

    Submitted 1 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: project page: https://videogigagan.github.io/

  25. arXiv:2404.12386  [pdf, other

    cs.CV cs.LG

    SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

    Authors: Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

    Abstract: Open-world entity segmentation, as an emerging computer vision task, aims at segmenting entities in images without being restricted by pre-defined classes, offering impressive generalization capabilities on unseen images and concepts. Despite its promise, existing entity segmentation methods like Segment Anything Model (SAM) rely heavily on costly expert annotators. This work presents Self-supervi… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  26. arXiv:2404.12382  [pdf, other

    cs.CV cs.AI cs.GR

    Lazy Diffusion Transformer for Interactive Image Editing

    Authors: Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, Michaël Gharbi

    Abstract: We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the curr… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  27. arXiv:2404.12333  [pdf, other

    cs.CV

    Customizing Text-to-Image Diffusion with Camera Viewpoint Control

    Authors: Nupur Kumari, Grace Su, Richard Zhang, Taesung Park, Eli Shechtman, Jun-Yan Zhu

    Abstract: Model customization introduces new concepts to existing text-to-image models, enabling the generation of the new concept in novel contexts. However, such methods lack accurate camera view control w.r.t the object, and users must resort to prompt engineering (e.g., adding "top-view") to achieve coarse view control. In this work, we introduce a new task -- enabling explicit control of camera viewpoi… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: project page: https://customdiffusion360.github.io

  28. arXiv:2404.11879  [pdf, other

    cs.DS

    Public Event Scheduling with Busy Agents

    Authors: Bo Li, Lijun Li, Minming Li, Ruilong Zhang

    Abstract: We study a public event scheduling problem, where multiple public events are scheduled to coordinate the availability of multiple agents. The availability of each agent is determined by solving a separate flexible interval job scheduling problem, where the jobs are required to be preemptively processed. The agents want to attend as many events as possible, and their agreements are considered to be… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: To appear in IJCAI 2024

  29. arXiv:2404.11536  [pdf, other

    cs.LG cs.AI

    FedPFT: Federated Proxy Fine-Tuning of Foundation Models

    Authors: Zhaopeng Peng, Xiaoliang Fan, Yufan Chen, Zheng Wang, Shirui Pan, Chenglu Wen, Ruisheng Zhang, Cheng Wang

    Abstract: Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) emerges a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune FM by allocating sub-FM to clients in FL, however, leading to suboptimal performance due to insufficient tuning and inevitable error accumulations of gradients. In this paper, we propose Federated Proxy Fine-Tuni… ▽ More

    Submitted 28 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI'24

  30. arXiv:2404.11317  [pdf, other

    cs.CV cs.AI

    Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

    Authors: Zhangchi Feng, Richong Zhang, Zhijie Nie

    Abstract: The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modified text. Advanced methods often utilize contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, the triplet for CIR incurs high manual annotation costs, resulting in limited positive examples. Fur… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 12 pages, 11 figures

  31. arXiv:2404.11206  [pdf, other

    cs.CL

    Prompt-tuning for Clickbait Detection via Text Summarization

    Authors: Haoxiang Deng, Yi Zhu, Ye Wang, Jipeng Qiang, Yunhao Yuan, Yun Li, Runmei Zhang

    Abstract: Clickbaits are surprising social posts or deceptive news headlines that attempt to lure users for more clicks, which have posted at unprecedented rates for more profit or commercial revenue. The spread of clickbait has significant negative impacts on the users, which brings users misleading or even click-jacking attacks. Different from fake news, the crucial problem in clickbait detection is deter… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  32. arXiv:2404.10378  [pdf, other

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw , et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  33. arXiv:2404.09738  [pdf

    q-bio.BM cs.AI q-bio.QM

    AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides

    Authors: Kewei Li, Yuqian Wu, Yutong Guo, Yinheng Li, Yusi Fan, Ruochi Zhang, Lan Huang, Fengfeng Zhou

    Abstract: Activity cliff (AC) is a phenomenon that a pair of similar molecules differ by a small structural alternation but exhibit a large difference in their biochemical activities. The AC of small molecules has been extensively investigated but limited knowledge is accumulated about the AC phenomenon in peptides with canonical amino acids. This study introduces a quantitative definition and benchmarking… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  34. arXiv:2404.09540  [pdf, other

    cs.CV

    Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement

    Authors: Chi Wang, Junming Huang, Rong Zhang, Qi Wang, Haotian Yang, Haibin Huang, Chongyang Ma, Weiwei Xu

    Abstract: Automatic 3D facial texture generation has gained significant interest recently. Existing approaches may not support the traditional physically based rendering pipeline or rely on 3D data captured by Light Stage. Our key contribution is a progressive latent space refinement approach that can bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images to generate high… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  35. arXiv:2404.09481  [pdf, other

    cs.CR cs.LG

    SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection

    Authors: Yekai Li, Rufan Zhang, Wenxin Rong, Xianghang Mi

    Abstract: In this study, we introduce SpamDam, a SMS spam detection framework designed to overcome key challenges in detecting and understanding SMS spam, such as the lack of public SMS spam datasets, increasing privacy concerns of collecting SMS data, and the need for adversary-resistant detection models. SpamDam comprises four innovative modules: an SMS spam radar that identifies spam messages from online… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  36. arXiv:2404.09134  [pdf, ps, other

    cs.NI cs.LG

    Interactive Generative AI Agents for Satellite Networks through a Mixture of Experts Transmission

    Authors: Ruichen Zhang, Hongyang Du, Yinqiu Liu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Dong In Kim

    Abstract: In response to the needs of 6G global communications, satellite communication networks have emerged as a key solution. However, the large-scale development of satellite communication networks is constrained by the complex system models, whose modeling is challenging for massive users. Moreover, transmission interference between satellites and users seriously affects communication performance. To s… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 13 pages, 9 figures

  37. arXiv:2404.08985  [pdf, other

    cs.LG cs.AI

    Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

    Authors: Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications, ranging from content generation to interactive entertainment, and artistic creation. However, the diversity of downstream tasks in multitask scenarios presents substantial adaptation challenges for LLMs. While traditional methods often succumb to knowledge confusion on thei… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  38. arXiv:2404.08979  [pdf, other

    cs.CV cs.LG

    BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection

    Authors: Jian Zhang, Ruiteng Zhang, Xinyue Yan, Xiting Zhuang, Ruicheng Cao

    Abstract: Degraded underwater images decrease the accuracy of underwater object detection. However, existing methods for underwater image enhancement mainly focus on improving the indicators in visual aspects, which may not benefit the tasks of underwater image detection, and may lead to serious degradation in performance. To alleviate this problem, we proposed a bidirectional-guided method for underwater o… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures, 4 tables

    MSC Class: 68T07; 68T45 ACM Class: I.4.3; I.4.8; I.4.9; I.4.10; I.2.10

  39. arXiv:2404.08878  [pdf, other

    cs.NI cs.IT cs.LG eess.SP

    Generative AI Agent for Next-Generation MIMO Design: Fundamentals, Challenges, and Vision

    Authors: Zhe Wang, Jiayi Zhang, Hongyang Du, Ruichen Zhang, Dusit Niyato, Bo Ai, Khaled B. Letaief

    Abstract: Next-generation multiple input multiple output (MIMO) is expected to be intelligent and scalable. In this paper, we study generative artificial intelligence (AI) agent-enabled next-generation MIMO design. Firstly, we provide an overview of the development, fundamentals, and challenges of the next-generation MIMO. Then, we propose the concept of the generative AI agent, which is capable of generati… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures, 2 tables

  40. arXiv:2404.08677  [pdf, other

    cs.IR cs.AI cs.CL

    PMG : Personalized Multimodal Generation with Large Language Models

    Authors: Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao

    Abstract: The emergence of large language models (LLMs) has revolutionized the capabilities of text comprehension and generation. Multi-modal generation attracts great attention from both the industry and academia, but there is little work on personalized generation, which has important applications such as recommender systems. This paper proposes the first method for personalized multimodal generation usin… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

  41. arXiv:2404.07942  [pdf, other

    cs.SE cs.AI

    On Unified Prompt Tuning for Request Quality Assurance in Public Code Review

    Authors: Xinyu Chen, Lin Li, Rui Zhang, Peng Liang

    Abstract: Public Code Review (PCR) can be implemented through a Software Question Answering (SQA) community, which facilitates high knowledge dissemination. Current methods mainly focus on the reviewer's perspective, including finding a capable reviewer, predicting comment quality, and recommending/generating review comments. Our intuition is that satisfying review necessity requests can increase their visi… ▽ More

    Submitted 17 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: The 29th International Conference on Database Systems for Advanced Applications (DASFAA)

  42. arXiv:2404.05892  [pdf, other

    cs.CL cs.AI

    Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

    Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao , et al. (3 additional authors not shown)

    Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokeni… ▽ More

    Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  43. arXiv:2404.05868  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

    Authors: Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

    Abstract: Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks. Several practical methods have recently been proposed for LLM unlearning, mostly based on gradient ascent (GA) on the loss of undesirable data. Ho… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  44. arXiv:2404.05522  [pdf, other

    cs.MM

    3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

    Authors: Qingyuan Zhou, Weidong Yang, Ben Fei, Jingyi Xu, Rui Zhang, Keyi Liu, Yeqi Luo, Ying He

    Abstract: Noise is an inevitable aspect of point cloud acquisition, necessitating filtering as a fundamental task within the realm of 3D vision. Existing learning-based filtering methods have shown promising capabilities on small-scale synthetic or real-world datasets. Nonetheless, the effectiveness of these methods is constrained when dealing with a substantial quantity of point clouds. This limitation pri… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  45. arXiv:2404.04485  [pdf, other

    cs.HC

    Majority Voting of Doctors Improves Appropriateness of AI Reliance in Pathology

    Authors: Hongyan Gu, Chunxu Yang, Shino Magaki, Neda Zarrin-Khameh, Nelli S. Lakis, Inma Cobos, Negar Khanlou, Xinhai R. Zhang, Jasmeet Assi, Joshua T. Byers, Ameer Hamza, Karam Han, Anders Meyer, Hilda Mirbaha, Carrie A. Mohila, Todd M. Stevens, Sara L. Stone, Wenzhong Yan, Mohammad Haeri, Xiang 'Anthony' Chen

    Abstract: As Artificial Intelligence (AI) making advancements in medical decision-making, there is a growing need to ensure doctors develop appropriate reliance on AI to avoid adverse outcomes. However, existing methods in enabling appropriate AI reliance might encounter challenges while being applied in the medical domain. With this regard, this work employs and provides the validation of an alternative ap… ▽ More

    Submitted 11 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: 44 pages, 11 figures

  46. arXiv:2404.04050  [pdf, other

    cs.CV

    No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

    Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

    Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' c… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR Highlight. Code is available at https://github.com/yangyangyang127/Seg-NN. arXiv admin note: text overlap with arXiv:2308.12961

  47. arXiv:2404.03962  [pdf, other

    cs.CV

    RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications

    Authors: Xingyu Liu, Chenyangguang Zhang, Gu Wang, Ruida Zhang, Xiangyang Ji

    Abstract: In robotic vision, a de-facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim). In particular, high-fidelity depth data i… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: accepted by ICRA'24

  48. arXiv:2404.03885  [pdf, ps, other

    cs.IT cs.DS eess.SP math.ST

    The ESPRIT algorithm under high noise: Optimal error scaling and noisy super-resolution

    Authors: Zhiyan Ding, Ethan N. Epperly, Lin Lin, Ruizhe Zhang

    Abstract: Subspace-based signal processing techniques, such as the Estimation of Signal Parameters via Rotational Invariant Techniques (ESPRIT) algorithm, are popular methods for spectral estimation. These algorithms can achieve the so-called super-resolution scaling under low noise conditions, surpassing the well-known Nyquist limit. However, the performance of these algorithms under high-noise conditions… ▽ More

    Submitted 22 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  49. arXiv:2404.03764  [pdf, other

    cs.LG stat.ME stat.ML

    CONCERT: Covariate-Elaborated Robust Local Information Transfer with Conditional Spike-and-Slab Prior

    Authors: Ruqian Zhang, Yijiao Zhang, Annie Qu, Zhongyi Zhu, Juan Shen

    Abstract: The popularity of transfer learning stems from the fact that it can borrow information from useful auxiliary datasets. Existing statistical transfer learning methods usually adopt a global similarity measure between the source data and the target data, which may lead to inefficiency when only local information is shared. In this paper, we propose a novel Bayesian transfer learning method named "CO… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 31 pages, 22 figures

  50. arXiv:2404.03653  [pdf, other

    cs.CV cs.AI cs.CL

    CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

    Authors: Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

    Abstract: Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively investigated. We observe that the misalignment is caused by inadequate token attention activation. We further attribute this phenomenon to the diffu… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://caraj7.github.io/comat