Skip to main content

Showing 1–50 of 1,145 results for author: Chen, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.04061  [pdf, other

    cs.LG cs.AI

    Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications

    Authors: Mingfei Lu, Shujian Yu, Robert Jenssen, Badong Chen

    Abstract: Divergence measures play a central role in machine learning and become increasingly essential in deep learning. However, valid and computationally efficient divergence measures for multiple (more than two) distributions are scarcely investigated. This becomes particularly crucial in areas where the simultaneous management of multiple distributions is both unavoidable and essential. Examples includ… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  2. arXiv:2405.02945  [pdf, other

    cs.CV

    Invertible Residual Rescaling Models

    Authors: Jinmin Li, Tao Dai, Yaohua Zha, Yilu Luo, Longfei Lu, Bin Chen, Zhi Wang, Shu-Tao Xia, Jingyun Zhang

    Abstract: Invertible Rescaling Networks (IRNs) and their variants have witnessed remarkable achievements in various image processing tasks like image rescaling. However, we observe that IRNs with deeper networks are difficult to train, thus hindering the representational ability of IRNs. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling by learning a bijection… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  3. arXiv:2405.02903  [pdf, other

    cs.CE cs.LG math.NA

    Predicting Open-Hole Laminates Failure Using Support Vector Machines With Classical and Quantum Kernels

    Authors: Giorgio Tosti Balducci, Boyang Chen, Matthias Möller, Marc Gerritsma, Roeland De Breuker

    Abstract: Modeling open hole failure of composites is a complex task, consisting in a highly nonlinear response with interacting failure modes. Numerical modeling of this phenomenon has traditionally been based on the finite element method, but requires to tradeoff between high fidelity and computational cost. To mitigate this shortcoming, recent work has leveraged machine learning to predict the strength o… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  4. arXiv:2405.00865  [pdf, other

    cs.CR

    Hiding Sensitive Information Using PDF Steganography

    Authors: Ryan Klemm, Bo Chen

    Abstract: The use of steganography to transmit secret data is becoming increasingly common in security products and malware today. Despite being extremely popular, PDF files are not often the focus of steganography research, as most applications utilize digital image, audio, and video files as their cover data. However, the PDF file format is promising for usage in medium-capacity steganography applications… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  5. arXiv:2404.19287  [pdf, other

    cs.CV

    Revisiting the Adversarial Robustness of Vision Language Models: a Multimodal Perspective

    Authors: Wanqi Zhou, Shuanghao Bai, Qibin Zhao, Badong Chen

    Abstract: Pretrained vision-language models (VLMs) like CLIP have shown impressive generalization performance across various downstream tasks, yet they remain vulnerable to adversarial attacks. While prior research has primarily concentrated on improving the adversarial robustness of image encoders to guard against attacks on images, the exploration of text-based and multimodal attacks has largely been over… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 16 pages, 14 figures

  6. arXiv:2404.19286  [pdf, other

    cs.CV

    Soft Prompt Generation for Domain Generalization

    Authors: Shuanghao Bai, Yuedi Zhang, Wanqi Zhou, Zhirong Luan, Badong Chen

    Abstract: Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompt, which are not optimal for specific domains. To further adapt VLMs to downstream tasks, soft prompt is proposed to replace manually designed prompt, which acts as a learning vector that undergoes fine-tuning based on specific domain data. Prior prompt learning m… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 23 pages, 4 figures

  7. arXiv:2404.19040  [pdf, other

    cs.CV

    GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting

    Authors: Bo Chen, Shoukang Hu, Qi Chen, Chenpeng Du, Ran Yi, Yanmin Qian, Xie Chen

    Abstract: We present GStalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3$\sim$5 minute video for training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Ga… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  8. arXiv:2404.18895  [pdf, other

    cs.CV

    RSCaMa: Remote Sensing Image Change Captioning with State Space Model

    Authors: Chenyang Liu, Keyan Chen, Bowen Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi

    Abstract: Remote Sensing Image Change Captioning (RSICC) aims to describe surface changes between multi-temporal remote sensing images in language, including the changed object categories, locations, and dynamics of changing objects (e.g., added or disappeared). This poses challenges to spatial and temporal modeling of bi-temporal features. Despite previous methods progressing in the spatial change percepti… ▽ More

    Submitted 2 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  9. arXiv:2404.18304  [pdf, other

    cs.IR cs.AI

    Retrieval-Oriented Knowledge for Click-Through Rate Prediction

    Authors: Huanshuo Liu, Bo Chen, Menghui Zhu, Jianghao Lin, Jiarui Qin, Yang Yang, Hao Zhang, Ruiming Tang

    Abstract: Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  10. arXiv:2404.16825  [pdf, other

    cs.CV eess.IV

    ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

    Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

    Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.16710  [pdf, other

    cs.CL cs.AI cs.LG

    LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

    Authors: Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu

    Abstract: We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exi… ▽ More

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Code open sourcing is in progress

  12. arXiv:2404.16687  [pdf, other

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  13. arXiv:2404.15734  [pdf, other

    cs.CV

    Fine-grained Spatial-temporal MLP Architecture for Metro Origin-Destination Prediction

    Authors: Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin

    Abstract: Accurate prediction of metro traffic is crucial for optimizing metro scheduling and enhancing overall transport efficiency. Analyzing fine-grained and comprehensive relations among stations effectively is imperative for metro Origin-Destination (OD) prediction. However, existing metro OD models either mix information from multiple OD pairs from the station's perspective or exclusively focus on a s… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  14. arXiv:2404.15309  [pdf, other

    eess.SP cs.LG q-bio.NC

    Sparse Bayesian Correntropy Learning for Robust Muscle Activity Reconstruction from Noisy Brain Recordings

    Authors: Yuanhao Li, Badong Chen, Natsue Yoshimura, Yasuharu Koike, Okito Yamashita

    Abstract: Sparse Bayesian learning has promoted many effective frameworks for brain activity decoding, especially for the reconstruction of muscle activity. However, existing sparse Bayesian learning mainly employs Gaussian distribution as error assumption in the reconstruction task, which is not necessarily the truth in the real-world application. On the other hand, brain recording is known to be highly no… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  15. arXiv:2404.11912  [pdf, other

    cs.CL cs.LG

    TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

    Authors: Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen

    Abstract: With large language models (LLMs) widely deployed in long content generation recently, there has emerged an increasing demand for efficient long-sequence inference support. However, key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck by growing linearly in size with the sequence length. Due to the auto-regressive nature of LLMs, the entire KV cache w… ▽ More

    Submitted 22 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  16. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  17. arXiv:2404.10225  [pdf

    cs.SE cs.AI

    Rethinking Software Engineering in the Foundation Model Era: From Task-Driven AI Copilots to Goal-Driven AI Pair Programmers

    Authors: Ahmed E. Hassan, Gustavo A. Oliva, Dayi Lin, Boyuan Chen, Zhen Ming, Jiang

    Abstract: The advent of Foundation Models (FMs) and AI-powered copilots has transformed the landscape of software development, offering unprecedented code completion capabilities and enhancing developer productivity. However, the current task-driven nature of these copilots falls short in addressing the broader goals and complexities inherent in software engineering (SE). In this paper, we propose a paradig… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  18. arXiv:2404.09889  [pdf, other

    cs.IR cs.AI cs.CL

    Is Table Retrieval a Solved Problem? Join-Aware Multi-Table Retrieval

    Authors: Peter Baile Chen, Yi Zhang, Dan Roth

    Abstract: Retrieving relevant tables containing the necessary information to accurately answer a given question over tables is critical to open-domain question-answering (QA) systems. Previous methods assume the answer to such a question can be found either in a single table or multiple tables identified through question decomposition or rewriting. However, neither of these approaches is sufficient, as many… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  19. arXiv:2404.08801  [pdf, other

    cs.LG cs.CL

    Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

    Authors: Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

    Abstract: The quadratic complexity and weak length extrapolation of Transformers limits their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited co… ▽ More

    Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures and 8 tables

  20. BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting

    Authors: Yuqing Cheng, Bo Chen, Fanjin Zhang, Jie Tang

    Abstract: From-scratch name disambiguation is an essential task for establishing a reliable foundation for academic platforms. It involves partitioning documents authored by identically named individuals into groups representing distinct real-life experts. Canonically, the process is divided into two decoupled tasks: locally estimating the pairwise similarities between documents followed by globally groupin… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: TheWebConf 2024 (WWW '24)

    ACM Class: H.3.7; H.3.3

    Journal ref: Proceedings of TheWebConf 2024 (WWW '24), May 13--17, 2024, Singapore

  21. arXiv:2404.07575  [pdf

    cs.SD cs.AI eess.AS

    An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

    Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

    Abstract: Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distri… ▽ More

    Submitted 11 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Findings

  22. arXiv:2404.06429  [pdf, other

    cs.CV cs.AI

    Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion

    Authors: Fan Yang, Jianfeng Zhang, Yichun Shi, Bowen Chen, Chenxu Zhang, Huichao Zhang, Xiaofeng Yang, Jiashi Feng, Guosheng Lin

    Abstract: Benefiting from the rapid development of 2D diffusion models, 3D content creation has made significant progress recently. One promising solution involves the fine-tuning of pre-trained 2D diffusion models to harness their capacity for producing multi-view images, which are then lifted into accurate 3D models via methods like fast-NeRFs or large reconstruction models. However, as inconsistency stil… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  23. arXiv:2404.04998  [pdf, other

    cs.CV cs.AI cs.IR

    Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval

    Authors: Jinpeng Wang, Bin Chen, Qiang Zhang, Zaiqiao Meng, Shangsong Liang, Shu-Tao Xia

    Abstract: Deep quantization methods have shown high efficiency on large-scale image retrieval. However, current models heavily rely on ground-truth information, hindering the application of quantization in label-hungry scenarios. A more realistic demand is to learn from inexhaustible uploaded images that are associated with informal tags provided by amateur users. Though such sketchy tags do not obviously r… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: In proceedings of AAAI 2021. Code and data are available

  24. arXiv:2404.04833  [pdf, other

    cs.CV

    ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model

    Authors: Binghui Chen, Wenyu Li, Yifeng Geng, Xuansong Xie, Wangmeng Zuo

    Abstract: With the development of the large-scale diffusion model, Artificial Intelligence Generated Content (AIGC) techniques are popular recently. However, how to truly make it serve our daily lives remains an open question. To this end, in this paper, we focus on employing AIGC techniques in one filed of E-commerce marketing, i.e., generating hyper-realistic advertising images for displaying user-specifi… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 16 pages

  25. arXiv:2404.04828  [pdf, other

    cs.CV

    Strictly-ID-Preserved and Controllable Accessory Advertising Image Generation

    Authors: Youze Xue, Binghui Chen, Yifeng Geng, Xuansong Xie, Jiansheng Chen, Hongbing Ma

    Abstract: Customized generative text-to-image models have the ability to produce images that closely resemble a given subject. However, in the context of generating advertising images for e-commerce scenarios, it is crucial that the generated subject's identity aligns perfectly with the product being advertised. In order to address the need for strictly-ID preserved advertising image generation, we have dev… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 22 pages

  26. arXiv:2404.03900  [pdf, other

    stat.ML cs.AI cs.LG cs.NE

    Nonparametric Modern Hopfield Models

    Authors: Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

    Abstract: We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known resul… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 59 pages; Code available at https://github.com/MAGICS-LAB/NonparametricHopfield

  27. arXiv:2404.03634  [pdf, other

    cs.RO cs.CV

    PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

    Authors: Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Yixin Zhu, Guyue Zhou, Hao Dong, Hao Zhao

    Abstract: Robotic manipulation of ungraspable objects with two-finger grippers presents significant challenges due to the paucity of graspable features, while traditional pre-grasping techniques, which rely on repositioning objects and leveraging external aids like table edges, lack the adaptability across object categories and scenes. Addressing this, we introduce PreAfford, a novel pre-grasping planning f… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://air-discover.github.io/PreAfford/

  28. arXiv:2404.03295  [pdf, other

    quant-ph cs.CR

    The power of a single Haar random state: constructing and separating quantum pseudorandomness

    Authors: Boyang Chen, Andrea Coladangelo, Or Sattath

    Abstract: In this work, we focus on the following question: what are the cryptographic implications of having access to an oracle that provides a single Haar random quantum state? We show, perhaps surprisingly, that such an oracle is sufficient to construct quantum pseudorandomness. Pseudorandom states (PRS) are a family of states for which it is hard to distinguish between polynomially many copies of eit… ▽ More

    Submitted 6 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  29. arXiv:2404.01365  [pdf, other

    cs.LG cs.AI cs.CL

    Prompt-prompted Mixture of Experts for Efficient LLM Generation

    Authors: Harry Dong, Beidi Chen, Yuejie Chi

    Abstract: With the development of transformer-based large language models (LLMs), they have been applied to many fields due to their remarkable utility, but this comes at a considerable computational cost at deployment. Fortunately, some methods such as pruning or constructing a mixture of experts (MoE) aim at exploiting sparsity in transformer feedforward (FF) blocks to gain boosts in speed and reduction i… ▽ More

    Submitted 5 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Revision 1: Updated abstract with code link; re-ran top-k + sampling rows in Table 4, conclusions unchanged

  30. arXiv:2404.00443  [pdf, ps, other

    cs.RO

    UDE-based Dynamic Motion Force Control of Mobile Manipulators

    Authors: Songqun Gao, Wendi Ding, Qinyuan Ren, Ben M. Chen

    Abstract: Mobile manipulators are known for their superior mobility over manipulators on fixed bases, offering promising applications in smart industry and housekeeping scenarios. However, the dynamic coupling nature between the mobile base and the manipulator presents challenges for the physical interactive tasks of the mobile manipulator. Current methods suffer from complex modeling processes and poor tra… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  31. arXiv:2404.00424  [pdf, other

    q-fin.MF cs.AI cs.CE

    From attention to profit: quantitative trading strategy based on transformer

    Authors: Zhaofeng Zhang, Banghao Chen, Shengxin Zhu, Nicolas Langrené

    Abstract: In traditional quantitative trading practice, navigating the complicated and dynamic financial market presents a persistent challenge. Former machine learning approaches have struggled to fully capture various market variables, often ignore long-term information and fail to catch up with essential signals that may lead the profit. This paper introduces an enhanced transformer architecture and desi… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    ACM Class: G.3; J.2

  32. arXiv:2403.20327  [pdf, other

    cs.CL cs.AI

    Gecko: Versatile Text Embeddings Distilled from Large Language Models

    Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

    Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 18 pages

  33. arXiv:2403.19980  [pdf, other

    cs.CV

    A Parallel Attention Network for Cattle Face Recognition

    Authors: Jiayu Li, Xuechao Zou, Shiying Wang, Ben Chen, Junliang Xing, Pin Tao

    Abstract: Cattle face recognition holds paramount significance in domains such as animal husbandry and behavioral research. Despite significant progress in confined environments, applying these accomplishments in wild settings remains challenging. Thus, we create the first large-scale cattle face recognition dataset, ICRWE, for wild environments. It encompasses 483 cattle and 9,816 high-resolution image sam… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  34. arXiv:2403.19654  [pdf, other

    cs.CV

    RSMamba: Remote Sensing Image Classification with State Space Model

    Authors: Keyan Chen, Bowen Chen, Chenyang Liu, Wenyuan Li, Zhengxia Zou, Zhenwei Shi

    Abstract: Remote sensing image classification forms the foundation of various understanding tasks, serving a crucial function in remote sensing image interpretation. The recent advancements of Convolutional Neural Networks (CNNs) and Transformers have markedly enhanced classification accuracy. Nonetheless, remote sensing scene classification remains a significant challenge, especially given the complexity a… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  35. arXiv:2403.19213  [pdf, other

    cs.CV

    Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting

    Authors: Weihao Jiang, Zhaozhi Xie, Yuxiang Lu, Longjie Qi, Jingyong Cai, Hiroyuki Uchiyama, Bin Chen, Yue Ding, Hongtao Lu

    Abstract: Mask-guided matting networks have achieved significant improvements and have shown great potential in practical applications in recent years. However, simply learning matting representation from synthetic and lack-of-real-world-diversity matting data, these approaches tend to overfit low-level details in wrong regions, lack generalization to objects with complex structures and real-world scenes su… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  36. arXiv:2403.19079  [pdf, other

    cs.CV

    A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement

    Authors: Junjie Wen, Jinqiang Cui, Benyun Zhao, Bingxin Han, Xuchen Liu, Zhi Gao, Ben M. Chen

    Abstract: In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) in Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. It may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevit… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: accepted by ICRA24

  37. arXiv:2403.19067  [pdf, other

    cs.CV

    Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

    Authors: Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

    Abstract: Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  38. arXiv:2403.18393  [pdf, other

    cs.LG

    Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering

    Authors: Long Shi, Lei Cao, Yunshan Ye, Yu Zhao, Badong Chen

    Abstract: In the context of multi-view clustering, graph learning is recognized as a crucial technique, which generally involves constructing an adaptive neighbor graph based on probabilistic neighbors, and then learning a consensus graph to for clustering. However, they are confronted with two limitations. Firstly, they often rely on Euclidean distance to measure similarity when constructing the adaptive n… ▽ More

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  39. arXiv:2403.17645  [pdf

    cs.CL

    DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition

    Authors: Yi-Cheng Wang, Hsin-Wei Wang, Bi-Cheng Yan, Chi-Han Lin, Berlin Chen

    Abstract: End-to-end automatic speech recognition (E2E ASR) systems often suffer from mistranscription of domain-specific phrases, such as named entities, sometimes leading to catastrophic failures in downstream tasks. A family of fast and lightweight named entity correction (NEC) models for ASR have recently been proposed, which normally build on phonetic-level edit distance algorithms and have shown impre… ▽ More

    Submitted 11 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  40. arXiv:2403.17236  [pdf, other

    cs.LG

    Neural Image Compression with Quantization Rectifier

    Authors: Wei Luo, Bo Chen

    Abstract: Neural image compression has been shown to outperform traditional image codecs in terms of rate-distortion performance. However, quantization introduces errors in the compression process, which can degrade the quality of the compressed image. Existing approaches address the train-test mismatch problem incurred during quantization, the random impact of quantization on the expressiveness of image fe… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Published at International Conference on Machine Learning (ICML) Neural Compression Workshop 2023, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the authors

  41. arXiv:2403.17165  [pdf, other

    cs.HC

    Building an Open-Source Community to Enhance Autonomic Nervous System Signal Analysis: DBDP-Autonomic

    Authors: Jessilyn Dunn, Varun Mishra, Md Mobashir Hasan Shandhi, Hayoung Jeong, Natasha Yamane, Yuna Watanabe, Bill Chen, Matthew S. Goodwin

    Abstract: Smartphones and wearable sensors offer an unprecedented ability to collect peripheral psychophysiological signals across diverse timescales, settings, populations, and modalities. However, open-source software development has yet to keep pace with rapid advancements in hardware technology and availability, creating an analytical barrier that limits the scientific usefulness of acquired data. We pr… ▽ More

    Submitted 29 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  42. arXiv:2403.17006  [pdf, other

    cs.CV

    Invertible Diffusion Models for Compressed Sensing

    Authors: Bin Chen, Zhenyu Zhang, Weiqi Li, Chen Zhao, Jiwen Yu, Shijie Zhao, Jie Chen, Jian Zhang

    Abstract: While deep neural networks (NN) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch constrains their effectiveness and hampers rapid deployment. Although recent methods utilize pre-trained diffusion models for image reconstruction, they struggle with slow inference and restricted adaptability to CS. To tackl… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  43. arXiv:2403.16378  [pdf, other

    cs.IR

    Play to Your Strengths: Collaborative Intelligence of Conventional Recommender Models and Large Language Models

    Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Chuhan Wu, Bo Chen, Ruiming Tang, Weinan Zhang, Yong Yu

    Abstract: The rise of large language models (LLMs) has opened new opportunities in Recommender Systems (RSs) by enhancing user behavior modeling and content understanding. However, current approaches that integrate LLMs into RSs solely utilize either LLM or conventional recommender model (CRM) to generate final recommendations, without considering which data segments LLM or CRM excel in. To fill in this gap… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  44. arXiv:2403.16131  [pdf, other

    cs.CV

    Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

    Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen

    Abstract: DETR-like methods have significantly increased detection performance in an end-to-end manner. The mainstream two-stage frameworks of them perform dense self-attention and select a fraction of queries for sparse cross-attention, which is proven effective for improving performance but also introduces a heavy computational burden and high dependence on stable query selection. This paper demonstrates… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  45. arXiv:2403.16015  [pdf, other

    cs.RO

    MQE: Unleashing the Power of Interaction with Multi-agent Quadruped Environment

    Authors: Ziyan Xiong, Bo Chen, Shiyu Huang, Wei-Wei Tu, Zhaofeng He, Yang Gao

    Abstract: The advent of deep reinforcement learning (DRL) has significantly advanced the field of robotics, particularly in the control and coordination of quadruped robots. However, the complexity of real-world tasks often necessitates the deployment of multi-robot systems capable of sophisticated interaction and collaboration. To address this need, we introduce the Multi-agent Quadruped Environment (MQE),… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Open-source code is available at https://github.com/ziyanx02/multiagent-quadruped-environment

  46. arXiv:2403.15069  [pdf, other

    cs.AR

    Allspark: Workload Orchestration for Visual Transformers on Processing In-Memory Systems

    Authors: Mengke Ge, Junpeng Wang, Binhan Chen, Yingjian Zhong, Haitao Du, Song Chen, Yi Kang

    Abstract: The advent of Transformers has revolutionized computer vision, offering a powerful alternative to convolutional neural networks (CNNs), especially with the local attention mechanism that excels at capturing local structures within the input and achieve state-of-the-art performance. Processing in-memory (PIM) architecture offers extensive parallelism, low data movement costs, and scalable memory ba… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: The article is currently under review by IEEE Transactions on Computers, and has been submitted to HPCA'2024 and ISCA'2024

  47. arXiv:2403.14268  [pdf

    eess.AS cs.SD

    Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

    Authors: PeiYing Lee, HauYun Guo, Berlin Chen

    Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Tra… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

    Report number: TAAI2023-Domestic-131

  48. arXiv:2403.13352  [pdf, other

    cs.CV

    AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation

    Authors: Jingkun An, Yinghao Zhu, Zongjian Li, Haoran Feng, Bohua Chen, Yemin Shi, Chengwei Pan

    Abstract: Text-to-Image (T2I) diffusion models have achieved remarkable success in image generation. Despite their progress, challenges remain in both prompt-following ability, image quality and lack of high-quality datasets, which are essential for refining these models. As acquiring labeled data is costly, we introduce AGFSync, a framework that enhances T2I diffusion models through Direct Preference Optim… ▽ More

    Submitted 3 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  49. arXiv:2403.13338  [pdf

    cs.CV

    Adaptive Critical Subgraph Mining for Cognitive Impairment Conversion Prediction with T1-MRI-based Brain Network

    Authors: Yilin Leng, Wenju Cui, Bai Chen, Xi Jiang, Shuangqing Chen, Jian Zheng

    Abstract: Prediction the conversion to early-stage dementia is critical for mitigating its progression but remains challenging due to subtle cognitive impairments and structural brain changes. Traditional T1-weighted magnetic resonance imaging (T1-MRI) research focus on identifying brain atrophy regions but often fails to address the intricate connectivity between them. This limitation underscores the neces… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 11 pages

  50. arXiv:2403.13208  [pdf, other

    cs.RO

    CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios using Real-World Trajectories

    Authors: Peide Huang, Wenhao Ding, Jonathan Francis, Bingqing Chen, Ding Zhao

    Abstract: Simulation is an indispensable tool in the development and testing of autonomous vehicles (AVs), offering an efficient and safe alternative to road testing by allowing the exploration of a wide range of scenarios. Despite its advantages, a significant challenge within simulation-based testing is the generation of safety-critical scenarios, which are essential to ensure that AVs can handle rare but… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.