Showing 1–50 of 122 results for author: Jiang, K

  1. arXiv:2405.04964  [pdf, other]

    cs.CV

    Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

    Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Yuzeng Chen, Qiang Zhang, Chia-Wen Lin

    Abstract: Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs in large-sca…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution

  2. A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

    Authors: Kaiwen Jiang, Yang Fu, Mukund Varma T, Yash Belhe, Xiaolong Wang, Hao Su, Ravi Ramamoorthi

    Abstract: Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation.…

    Submitted 6 May, 2024; originally announced May 2024.

  3. arXiv:2405.02008  [pdf, other]

    cs.CV

    DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model

    Authors: Peijin Jia, Tuopu Wen, Ziang Luo, Mengmeng Yang, Kun Jiang, Zhiquan Lei, Xuewei Tang, Ziyuan Liu, Le Cui, Kehua Sheng, Bo Zhang, Diange Yang

    Abstract: Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited…

    Submitted 3 May, 2024; originally announced May 2024.

  4. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  5. MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

    Authors: Zheng Ning, Zheng Zhang, Jerrick Ban, Kaiwen Jiang, Ruohong Gan, Yapeng Tian, Toby Jia-Jun Li

    Abstract: Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio is often expensive and requires specialized equipment and skills, posing a high barrier for amateur video creators. We present MIMOSA, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only mon…

    Submitted 23 April, 2024; originally announced April 2024.

  6. arXiv:2404.13342  [pdf, other]

    cs.CV cs.LG

    Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior

    Authors: Yidan Liu, Weiying Xie, Kai Jiang, Jiaqing Zhang, Yunsong Li, Leyuan Fang

    Abstract: The majority of existing hyperspectral anomaly detection (HAD) methods use the low-rank representation (LRR) model to separate the background and anomaly components, where the anomaly component is optimized by handcrafted sparse priors (e.g., $\ell_{2,1}$-norm). However, this may not be ideal since they overlook the spatial structure present in anomalies and make the detection result largely depen…

    Submitted 20 April, 2024; originally announced April 2024.

  7. arXiv:2404.00992  [pdf, other]

    cs.CV

    SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance

    Authors: Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji

    Abstract: Neural Radiance Field (NeRF) technology has made significant strides in creating novel viewpoints. However, its effectiveness is hampered when working with sparsely available views, often leading to performance dips due to overfitting. FreeNeRF attempts to overcome this limitation by integrating implicit geometry regularization, which incrementally improves both geometry and textures. Nonetheless,…

    Submitted 1 April, 2024; originally announced April 2024.

  8. arXiv:2404.00260  [pdf, other]

    cs.CV eess.IV

    Exploiting Self-Supervised Constraints in Image Super-Resolution

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

    Abstract: Recent advances in self-supervised learning, predominantly studied in high-level visual tasks, have been explored in low-level image processing. This paper introduces a novel self-supervised constraint for single image super-resolution, termed SSC-SR. SSC-SR uniquely addresses the divergence in image complexity by employing a dual asymmetric paradigm and a target model updated via exponential movi…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: ICME 2024

  9. arXiv:2403.16336  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Predictive Inference in Multi-environment Scenarios

    Authors: John C. Duchi, Suyash Gupta, Kuanhao Jiang, Pragya Sur

    Abstract: We address the challenge of constructing valid confidence intervals and sets in problems of prediction across multiple environments. We investigate two types of coverage suitable for these problems, extending the jackknife and split-conformal methods to show how to obtain distribution-free coverage in such non-traditional, hierarchical data-generating scenarios. Our contributions also include exte…

    Submitted 24 March, 2024; originally announced March 2024.

  10. arXiv:2403.10883  [pdf, other]

    cs.CV cs.CR cs.MM

    Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal Interaction

    Authors: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Jiafeng Wang, Shuyong Gao, Wenqiang Zhang

    Abstract: Despite the substantial advancements in Vision-Language Pre-training (VLP) models, their susceptibility to adversarial attacks poses a significant challenge. Existing work rarely studies the transferability of attacks on VLP models, resulting in a substantial performance gap from white-box attacks. We observe that prior work overlooks the interaction mechanisms between modalities, which play a cr…

    Submitted 16 March, 2024; originally announced March 2024.

  11. arXiv:2403.09634  [pdf, other]

    cs.CV

    OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

    Authors: Lingyi Hong, Shilin Yan, Renrui Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kaixun Jiang, Yiting Chen, Jinglun Li, Zhaoyu Chen, Wenqiang Zhang

    Abstract: Visual object tracking aims to localize the target object of each frame based on its initial appearance in the first frame. Depending on the input modality, tracking tasks can be divided into RGB tracking and RGB+X (e.g., RGB+N and RGB+D) tracking. Despite the different input modalities, the core aspect of tracking is the temporal matching. Based on this common ground, we present a general framewo…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  12. arXiv:2403.07939  [pdf, other]

    cs.CV

    Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification

    Authors: Tingting Zheng, Kui Jiang, Hongxun Yao

    Abstract: Multi-Instance Learning (MIL) has shown impressive performance for histopathology whole slide image (WSI) analysis using bags or pseudo-bags. It involves instance sampling, feature representation, and decision-making. However, existing MIL-based technologies suffer from at least one of the following problems: 1) requiring high storage and intensive pre-processing for numerous instances (sa…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024; Project page: https://vilab.hit.edu.cn/projects/pamil

  13. arXiv:2402.18786  [pdf, other]

    cs.CV

    OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition

    Authors: Yuchen Pan, Junjun Jiang, Kui Jiang, Zhihao Wu, Keyuan Yu, Xianming Liu

    Abstract: Depression Recognition (DR) poses a considerable challenge, especially in the context of the growing concerns surrounding privacy. Traditional automatic DR diagnosis technology necessitates the use of facial images, which undoubtedly exposes patient identity features and poses privacy risks. In order to mitigate the potential risks associated with the inappropriate disclosure of patient facial ima…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024

  14. arXiv:2402.13393  [pdf, other]

    cs.LG cs.CY

    Fairness Risks for Group-conditionally Missing Demographics

    Authors: Kaiqi Jiang, Wenzhe Fan, Mao Li, Xinhua Zhang

    Abstract: Fairness-aware classification models have gained increasing attention in recent years as concerns grow about discrimination against some demographic groups. Most existing models require full knowledge of the sensitive features, which can be impractical due to privacy, legal issues, and an individual's fear of discrimination. The key challenge we will address is the group dependency of the unavailabil…

    Submitted 20 February, 2024; originally announced February 2024.

  15. arXiv:2402.12319  [pdf, other]

    cs.LG cs.AI cs.CY

    Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness

    Authors: Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Feng Chen

    Abstract: The fairness-aware online learning framework has emerged as a potent tool within the context of continuous lifelong learning. In this scenario, the learner's objective is to progressively acquire new tasks as they arrive over time, while also guaranteeing statistical parity among various protected sub-populations, such as race and gender, when it comes to the newly introduced tasks. A significant…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted by TKDD, extended from KDD 2022. arXiv admin note: substantial text overlap with arXiv:2205.11264

  16. arXiv:2402.11826  [pdf, other]

    cs.CV

    Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

    Authors: Jialei Xu, Xianming Liu, Junjun Jiang, Kui Jiang, Rui Li, Kai Cheng, Xiangyang Ji

    Abstract: Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such challenging conditions, they are inherently low-resolution, lacking rich texture and semantics as delivered by the RGB image. Current methods foc…

    Submitted 18 February, 2024; originally announced February 2024.

  17. arXiv:2402.07300  [pdf, other]

    cs.HC cs.MM

    SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

    Authors: Zheng Ning, Brianna L. Wimer, Kaiwen Jiang, Keyi Chen, Jerrick Ban, Yapeng Tian, Yuhang Zhao, Toby Jia-Jun Li

    Abstract: Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video…

    Submitted 26 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  18. arXiv:2402.05569  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Hypergraph Node Classification With Graph Neural Networks

    Authors: Bohan Tang, Zexi Liu, Keyue Jiang, Siheng Chen, Xiaowen Dong

    Abstract: Hypergraphs, with hyperedges connecting more than two nodes, are key for modelling higher-order interactions in real-world data. The success of graph neural networks (GNNs) reveals the capability of neural networks to process data with pairwise interactions. This inspires the usage of neural networks for data with higher-order interactions, thereby leading to the development of hypergraph neural n…

    Submitted 8 February, 2024; originally announced February 2024.

  19. arXiv:2402.02656  [pdf, other]

    cs.CL q-bio.QM

    RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews

    Authors: Satpreet Harcharan Singh, Kevin Jiang, Kanchan Bhasin, Ashutosh Sabharwal, Nidal Moukaddam, Ankit B Patel

    Abstract: Semi-structured interviews (SSIs) are a commonly employed data-collection method in healthcare research, offering in-depth qualitative insights into subject experiences. Despite their value, the manual analysis of SSIs is notoriously time-consuming and labor-intensive, in part due to the difficulty of extracting and categorizing emotional responses, and challenges in scaling human evaluation for l…

    Submitted 4 February, 2024; originally announced February 2024.

  20. arXiv:2402.01874  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models

    Authors: Moschoula Pternea, Prerna Singh, Abir Chakraborty, Yagna Oruganti, Mirco Milletari, Sayli Bapat, Kebei Jiang

    Abstract: In this work, we review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs), two areas that owe their momentum to the development of deep neural networks. We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other. The first class, RL4LLM, includes studies where RL is leveraged to improve the performan…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 30 pages (including bibliography), 1 figure, 7 tables

  21. arXiv:2402.01220  [pdf, other]

    cs.CV cs.CR

    Delving into Decision-based Black-box Attacks on Semantic Segmentation

    Authors: Zhaoyu Chen, Zhengyang Shan, Jingwen Chang, Kaixun Jiang, Dingkang Yang, Yiting Cheng, Wenqiang Zhang

    Abstract: Semantic segmentation is a fundamental visual task that finds extensive deployment in applications with security-sensitive considerations. Nonetheless, recent work illustrates the adversarial vulnerability of semantic segmentation models to white-box attacks. However, its adversarial robustness against black-box attacks has not been fully explored. In this paper, we present the first exploration o…

    Submitted 2 February, 2024; originally announced February 2024.

  22. arXiv:2402.00290  [pdf, other]

    cs.CV

    MEIA: Towards Realistic Multimodal Interaction and Manipulation for Embodied Robots

    Authors: Yang Liu, Xinshuai Song, Kaixuan Jiang, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin

    Abstract: With the surge in the development of large language models, embodied intelligence has attracted increasing attention. Nevertheless, prior works on embodied intelligence typically encode scene or historical memory in a unimodal manner, either visual or linguistic, which complicates the alignment of the model's action planning with embodied control. To overcome this limitation, we introduce the Mul…

    Submitted 26 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Codes will be available at https://github.com/HCPLab-SYSU/CausalVLR

  23. arXiv:2401.08687  [pdf, other]

    cs.CV

    DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. However, most existing studies were conducted under a supervised setup which cannot scale well while handling various new data. Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, is far under-explored. In this work, we design DA-BEV, the first dom…

    Submitted 12 January, 2024; originally announced January 2024.

  24. arXiv:2401.06969  [pdf, other]

    cs.CV

    Domain Adaptation for Large-Vocabulary Object Detectors

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Large-vocabulary object detectors (LVDs) aim to detect objects of many categories, which learn super objectness features and can locate objects accurately when applied to various downstream data. However, LVDs often struggle in recognizing the located objects due to domain discrepancy in data distribution and object vocabulary. At the other end, recent vision-language foundation models such as CL…

    Submitted 12 January, 2024; originally announced January 2024.

  25. arXiv:2401.04651  [pdf, other]

    cs.CV

    Learning to Prompt Segment Anything Models

    Authors: Jiaxing Huang, Kai Jiang, Jingyi Zhang, Han Qiu, Lewei Lu, Shijian Lu, Eric Xing

    Abstract: Segment Anything Models (SAMs) like SEEM and SAM have demonstrated great potential in learning to segment anything. The core design of SAMs lies with Promptable Segmentation, which takes a handcrafted prompt as input and returns the expected segmentation mask. SAMs work with two types of prompts including spatial prompts (e.g., points) and semantic prompts (e.g., texts), which work together to pro…

    Submitted 9 January, 2024; originally announced January 2024.

  26. arXiv:2401.03182  [pdf, other]

    cs.CV

    Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

    Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Kai Jiang, Mingxiang Cao, Yunsong Li

    Abstract: Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the nov…

    Submitted 6 January, 2024; originally announced January 2024.

  27. arXiv:2312.16602  [pdf, other]

    cs.CV

    Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

    Authors: Jiaxing Huang, Jingyi Zhang, Kai Jiang, Han Qiu, Shijian Lu

    Abstract: Traditional computer vision generally solves each single task independently by a dedicated model with the task instruction implicitly designed in the model architecture, giving rise to two limitations: (1) it leads to task-specific models, which require multiple models for different tasks and restrict the potential synergies from diverse tasks; (2) it leads to a pre-defined and fixed model interface that…

    Submitted 27 December, 2023; originally announced December 2023.

  28. arXiv:2312.16455  [pdf, other]

    eess.IV cs.CV cs.LG

    Learn From Orientation Prior for Radiograph Super-Resolution: Orientation Operator Transformer

    Authors: Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Kaiyuan Jiang, Zhengmi Tang, Shinichiro Omachi

    Abstract: Background and objective: High-resolution radiographic images play a pivotal role in the early diagnosis and treatment of skeletal muscle-related diseases. It is promising to enhance image quality by introducing single-image super-resolution (SISR) models into the radiology image field. However, the conventional image pipeline, which can learn a mixed mapping between SR and denoising from the color…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: Accepted by Computer Methods and Programs in Biomedicine

  29. arXiv:2312.16248  [pdf, other]

    cs.LG cs.AI cs.DL

    XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library

    Authors: Wenzhang Liu, Wenzhe Cai, Kun Jiang, Guangran Cheng, Yuanda Wang, Jiawei Wang, Jingyu Cao, Lele Xu, Chaoxu Mu, Changyin Sun

    Abstract: In this paper, we present XuanCe, a comprehensive and unified deep reinforcement learning (DRL) library designed to be compatible with PyTorch, TensorFlow, and MindSpore. XuanCe offers a wide range of functionalities, including over 40 classical DRL and multi-agent DRL algorithms, with the flexibility to easily incorporate new algorithms and environments. It is a versatile DRL library that support…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 16 pages, 4 figures, 32 conferences

  30. arXiv:2312.16108  [pdf, other]

    cs.CV

    LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving

    Authors: Tianyu Li, Peijin Jia, Bangjun Wang, Li Chen, Kun Jiang, Junchi Yan, Hongyang Li

    Abstract: A map, as crucial information for downstream applications of an autonomous driving system, is usually represented in lanelines or centerlines. However, existing literature on map learning primarily focuses on either detecting geometry-based lanelines or perceiving topology relationships of centerlines. Both of these methods ignore the intrinsic relationship of lanelines and centerlines, that lanel…

    Submitted 26 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted in ICLR 2024

  31. arXiv:2312.11084  [pdf, other]

    cs.RO cs.MA

    Multi-Agent Reinforcement Learning for Connected and Automated Vehicles Control: Recent Advancements and Future Prospects

    Authors: Min Hua, Dong Chen, Xinda Qi, Kun Jiang, Zemin Eitan Liu, Quan Zhou, Hongming Xu

    Abstract: Connected and automated vehicles (CAVs) have emerged as a potential solution to the future challenges of developing safe, efficient, and eco-friendly transportation systems. However, CAV control presents significant challenges, given the complexity of interconnectivity and coordination required among the vehicles. To address this, multi-agent reinforcement learning (MARL), with its notable advance…

    Submitted 16 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  32. arXiv:2312.06999  [pdf, other]

    cs.CV

    DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement

    Authors: Jingchun Zhou, Zongxin He, Qiuping Jiang, Kui Jiang, Xianping Fu, Xuelong Li

    Abstract: Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments. To solve this issue, previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features, limiting the generalization and adaptability of the model. Previous methods use the reference gradient…

    Submitted 8 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  33. arXiv:2311.15643  [pdf, other]

    cs.RO

    A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation

    Authors: Jinyu Miao, Kun Jiang, Tuopu Wen, Yunlong Wang, Peijing Jia, Xuhe Zhao, Qian Cheng, Zhongyang Xiao, Jin Huang, Zhihua Zhong, Diange Yang

    Abstract: Monocular Re-Localization (MRL) is a critical component in autonomous applications, estimating 6 degree-of-freedom ego poses w.r.t. the scene map based on monocular images. In recent decades, significant progress has been made in the development of MRL techniques. Numerous algorithms have accomplished extraordinary success in terms of localization accuracy and robustness. In MRL, scene maps are re…

    Submitted 12 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 33 pages, 10 tables, 16 figures, under review

  34. arXiv:2311.13816  [pdf, other]

    cs.LG cs.AI cs.CY

    Fairness-Aware Domain Generalization under Covariate and Dependence Shifts

    Authors: Chen Zhao, Kai Jiang, Xintao Wu, Haoliang Wang, Latifur Khan, Christan Grant, Feng Chen

    Abstract: Achieving the generalization of an invariant classifier from source domains to shifted target domains while simultaneously considering model fairness is a substantial and complex challenge in machine learning. Existing domain generalization research typically attributes domain shifts to concept shift, which relates to alterations in class labels, and covariate shift, which pertains to variations i…

    Submitted 23 November, 2023; originally announced November 2023.

  35. arXiv:2311.05147  [pdf]

    cs.CV

    Dynamic Association Learning of Self-Attention and Convolution in Image Restoration

    Authors: Kui Jiang, Xuemei Jia, Wenxin Huang, Wenbin Wang, Zheng Wang, Junjun Jiang

    Abstract: CNNs and self-attention have achieved great success in multimedia applications for dynamic association learning of self-attention and convolution in image restoration. However, CNNs have at least two shortcomings: 1) limited receptive field; 2) static weight of sliding window at inference, unable to cope with the content diversity. In view of the advantages and disadvantages of CNNs and self-attent…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: in Chinese language, Journal of Image and Graphics. arXiv admin note: substantial text overlap with arXiv:2207.10455

  36. arXiv:2310.19288  [pdf, other]

    eess.IV cs.CV

    EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

    Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Jiang He, Xianyu Jin, Liangpei Zhang

    Abstract: Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing the regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smooth issues. Generative adversarial networks have the potential to infer intricate details, but they are easy to…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Submitted to IEEE TGRS

  37. arXiv:2310.12017  [pdf, other]

    cs.CV cs.CY

    Exploring Decision-based Black-box Attacks on Face Forgery Detection

    Authors: Zhaoyu Chen, Bo Li, Kaixun Jiang, Shuang Wu, Shouhong Ding, Wenqiang Zhang

    Abstract: Face forgery generation technologies generate vivid faces, which have raised public concerns about security and privacy. Many intelligent systems, such as electronic payment and identity verification, rely on face forgery detection. Although face forgery detection has successfully distinguished fake faces, recent studies have demonstrated that face forgery detectors are very vulnerable to adversar…

    Submitted 18 October, 2023; originally announced October 2023.

  38. arXiv:2310.09122  [pdf]

    cs.CV

    Equirectangular image construction method for standard CNNs for Semantic Segmentation

    Authors: Haoqian Chen, Jian Liu, Minghe Li, Kaiwen Jiang, Ziheng Xu, Rencheng Sun, Yi Sui

    Abstract: 360° spherical images have the advantage of a wide field of view, and are typically projected onto a plane for processing, yielding what is known as an equirectangular image. The object shape in equirectangular images can be distorted and lack translation invariance. In addition, there are few publicly available datasets of equirectangular images with labels, which presents a challenge for standard CNN models to process…

    Submitted 13 October, 2023; originally announced October 2023.

  39. arXiv:2309.10209  [pdf, other]

    cs.AI

    Towards Effective Semantic OOD Detection in Unseen Domains: A Domain Generalization Perspective

    Authors: Haoliang Wang, Chen Zhao, Yunhui Guo, Kai Jiang, Feng Chen

    Abstract: Two prevalent types of distributional shifts in machine learning are the covariate shift (as observed across different domains) and the semantic shift (as seen across different classes). Traditional OOD detection techniques typically address only one of these shifts. However, real-world testing environments often present a combination of both covariate and semantic shifts. In this study, we introd…

    Submitted 18 September, 2023; originally announced September 2023.

  40. arXiv:2309.06023  [pdf, other]

    cs.CV

    Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

    Abstract: Contrastive learning has emerged as a prevailing paradigm for high-level vision tasks, which, by introducing proper negative samples, has also been exploited for low-level vision tasks to achieve a compact optimization space to account for their ill-posed nature. However, existing methods rely on manually predefined and task-oriented negatives, which often exhibit pronounced task-specific biases…

    Submitted 25 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Camera Ready Version. Accepted to The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  41. arXiv:2308.06235  [pdf, other]

    cs.CL

    KETM: A Knowledge-Enhanced Text Matching method

    Authors: Kexin Jiang, Yahui Zhao, Guozhe Jin, Zhenguo Zhang, Rongyi Cui

    Abstract: Text matching is the task of matching two texts and determining the relationship between them, which has extensive applications in natural language processing tasks such as reading comprehension and question-answering systems. The mainstream approach is to compute text representations or to interact with the text through an attention mechanism, which is effective in text matching tasks. However, the…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted to IJCNN 2023

  42. arXiv:2307.16783  [pdf, other]

    cs.CV

    From Generation to Suppression: Towards Effective Irregular Glow Removal for Nighttime Visibility Enhancement

    Authors: Wanyu Wu, Wei Wang, Zheng Wang, Kui Jiang, Xin Xu

    Abstract: Most existing Low-Light Image Enhancement (LLIE) methods are primarily designed to improve brightness in dark regions, which suffer from severe degradation in nighttime images. However, these methods have paid limited attention to another major source of visibility degradation, the glow effects in real night scenes. Glow effects are inevitable in the presence of artificial light sources and cause further diffused b…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: IJCAI2023

  43. Fully $1\times1$ Convolutional Network for Lightweight Image Super-Resolution

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

    Abstract: Deep models have achieved significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernels ($3\times3$ or more). However, the heavy computational footprint of such models prevents their deployment in real-time, resource-constrained environments. Conversely, $1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregati…

    Submitted 12 March, 2024; v1 submitted 30 July, 2023; originally announced July 2023.

    Comments: Accepted by Machine Intelligence Research, DOI: 10.1007/s11633-024-1401-z

  44. arXiv:2306.10577  [pdf, other]

    cs.LG stat.ML

    OpenDataVal: a Unified Benchmark for Data Valuation

    Authors: Kevin Fu Jiang, Weixin Liang, James Zou, Yongchan Kwon

    Abstract: Assessing the quality and impact of individual data points is critical for improving model performance and mitigating undesirable biases within the training dataset. Several data valuation algorithms have been proposed to quantify data quality; however, there is no systematic and standardized benchmarking system for data valuation. In this paper, we introduce OpenDataVal, an easy-to-use and unifie…

    Submitted 13 October, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: 25 pages, NeurIPS 2023 Track on Datasets and Benchmarks

  45. arXiv:2306.01007  [pdf, other]

    cs.LG cs.AI

    Towards Fair Disentangled Online Learning for Changing Environments

    Authors: Chen Zhao, Feng Mi, Xintao Wu, Kai Jiang, Latifur Khan, Christan Grant, Feng Chen

    Abstract: In the problem of online learning for changing environments, data are sequentially received one after another over time, and their distribution assumptions may vary frequently. Although existing methods demonstrate the effectiveness of their learning algorithms by providing a tight bound on either dynamic regret or adaptive regret, most of them completely ignore learning with model fairness, defin…

    Submitted 16 July, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: Accepted by KDD 2023

  46. arXiv:2305.10665  [pdf, other]

    cs.CV cs.AI cs.CR

    Content-based Unrestricted Adversarial Attack

    Authors: Zhaoyu Chen, Bo Li, Shuang Wu, Kaixun Jiang, Shouhong Ding, Wenqiang Zhang

    Abstract: Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic, demonstrating their ability to deceive human perception and deep neural networks with stealth and success. However, current works usually sacrifice unrestricted degrees and subjectively select some image content t…

    Submitted 28 November, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  47. arXiv:2305.07497  [pdf, other]

    cs.RO cs.AI

    Dynamically Conservative Self-Driving Planner for Long-Tail Cases

    Authors: Weitao Zhou, Zhong Cao, Nanshan Deng, Xiaoyu Liu, Kun Jiang, Diange Yang

    Abstract: Self-driving vehicles (SDVs) are becoming a reality but still suffer from "long-tail" challenges during natural driving: the SDVs will continually encounter rare, safety-critical cases that may not be included in the dataset they were trained on. Some safety-assurance planners solve this problem by being conservative in all possible cases, which may significantly affect driving mobility. To this end, t…

    Submitted 12 May, 2023; originally announced May 2023.

  48. arXiv:2305.07487  [pdf, other]

    cs.AI cs.LG cs.RO

    Identify, Estimate and Bound the Uncertainty of Reinforcement Learning for Autonomous Driving

    Authors: Weitao Zhou, Zhong Cao, Nanshan Deng, Kun Jiang, Diange Yang

    Abstract: Deep reinforcement learning (DRL) has emerged as a promising approach for developing more intelligent autonomous vehicles (AVs). A typical DRL application on AVs is to train a neural network-based driving policy. However, the black-box nature of neural networks can result in unpredictable decision failures, making such AVs unreliable. To this end, this work proposes a method to identify and protec…

    Submitted 12 May, 2023; originally announced May 2023.

  49. arXiv:2305.04298  [pdf, other]

    cs.RO cs.CV

    Poses as Queries: Image-to-LiDAR Map Localization with Transformers

    Authors: Jinyu Miao, Kun Jiang, Yunlong Wang, Tuopu Wen, Zhongyang Xiao, Zheng Fu, Mengmeng Yang, Maolin Liu, Diange Yang

    Abstract: High-precision vehicle localization with commercial setups is a crucial technique for high-level autonomous driving tasks. Localization with a monocular camera in a LiDAR map is a newly emerged approach that achieves a promising balance between cost and accuracy, but estimating pose by finding correspondences between such cross-modal sensor data is challenging, thereby damaging the localization accura…

    Submitted 7 May, 2023; originally announced May 2023.

    Comments: 8 pages, 3 figures, 4 tables

  50. Local-Global Temporal Difference Learning for Satellite Video Super-Resolution

    Authors: Yi Xiao, Qiangqiang Yuan, Kui Jiang, Xianyu Jin, Jiang He, Liangpei Zhang, Chia-wen Lin

    Abstract: Optical-flow-based and kernel-based approaches have been extensively explored for temporal compensation in satellite Video Super-Resolution (VSR). However, these techniques are less generalized in large-scale or complex scenarios, especially in satellite videos. In this paper, we propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully uti…

    Submitted 30 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE TCSVT

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2023