Showing 1–50 of 306 results for author: Xie, W

Searching in archive cs.
  1. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.03913  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process

    Authors: Fuqiang Cheng, Wei Xie, Hua Zheng

    Abstract: Biomanufacturing innovation relies on an efficient design of experiments (DoE) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach that can guide a sequential DoEs for digital twin model calibrat…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 12 pages, 5 figures

  3. arXiv:2405.03101  [pdf, ps, other]

    cs.IT

    Double Self-Sustainable Reconfigurable Intelligent Surfaces Aided Wireless Communications

    Authors: Ji Wang, Suhong Luo, Yixuan Li, Wenwu Xie, Xingwang Li, Arumugam Nallanathan

    Abstract: A double self-sustainable reconfigurable intelligent surfaces (RISs) assisted multi-user multiple input multiple output (MIMO) system is investigated. Two RISs are equipped with energy harvesting circuit to achieve self-sustainable transmission. The aim is to minimize the transmission power at the base station (BS), while guaranteeing the quality of service (QoS) requirements of the users and meet…

    Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  4. arXiv:2405.02783  [pdf, other]

    stat.ML cs.LG

    Linear Noise Approximation Assisted Bayesian Inference on Mechanistic Model of Partially Observed Stochastic Reaction Network

    Authors: Wandi Xu, Wei Xie

    Abstract: To support mechanism online learning and facilitate digital twin development for biomanufacturing processes, this paper develops an efficient Bayesian inference approach for partially observed enzymatic stochastic reaction network (SRN), a fundamental building block of multi-scale bioprocess mechanistic model. To tackle the critical challenges brought by the nonlinear stochastic differential equat…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 11 pages, 2 figures

  5. arXiv:2404.17774  [pdf, other]

    cs.CV cs.GR

    High-quality Surface Reconstruction using Gaussian Surfels

    Authors: Pinxuan Dai, Jiamin Xu, Wenxiang Xie, Xinguo Liu, Huamin Wang, Weiwei Xu

    Abstract: We propose a novel point-based representation, Gaussian surfels, to combine the advantages of the flexible optimization procedure in 3D Gaussian points and the surface alignment property of surfels. This is achieved by directly setting the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D ellipsoid into a 2D ellipse. Such a design provides clear guidance to the optimizer.…

    Submitted 29 April, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Results added and improved

  6. arXiv:2404.16828  [pdf, other]

    cs.CV cs.LG

    Made to Order: Discovering monotonic temporal changes via self-supervised video ordering

    Authors: Charig Yang, Weidi Xie, Andrew Zisserman

    Abstract: Our objective is to discover and localize monotonic temporal changes in a sequence of images. To achieve this, we exploit a simple proxy task of ordering a shuffled image sequence, with `time' serving as a supervisory signal since only changes that are monotonic with time can give rise to the correct ordering. We also introduce a flexible transformer-based model for general-purpose ordering of ima…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Project page: https://charigyang.github.io/order/

  7. arXiv:2404.16754  [pdf, other]

    cs.CV

    RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

    Authors: Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Developing generalist foundation model has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirements on developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In thi…

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.14412  [pdf, other]

    cs.CV

    AutoAD III: The Prequel -- Back to the Pixels

    Authors: Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

    Abstract: Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is hampered by using performance measures not specialized to the AD domain. In this paper, we make three c…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: CVPR2024. Project page: https://www.robots.ox.ac.uk/~vgg/research/autoad/

  9. arXiv:2404.13342  [pdf, other]

    cs.CV cs.LG

    Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior

    Authors: Yidan Liu, Weiying Xie, Kai Jiang, Jiaqing Zhang, Yunsong Li, Leyuan Fang

    Abstract: The majority of existing hyperspectral anomaly detection (HAD) methods use the low-rank representation (LRR) model to separate the background and anomaly components, where the anomaly component is optimized by handcrafted sparse priors (e.g., $\ell_{2,1}$-norm). However, this may not be ideal since they overlook the spatial structure present in anomalies and make the detection result largely depen…

    Submitted 20 April, 2024; originally announced April 2024.

  10. arXiv:2404.12389  [pdf, other]

    cs.CV

    Moving Object Segmentation: All You Need Is SAM (and Flow)

    Authors: Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman

    Abstract: The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video. This is a much studied area with numerous careful, and sometimes complex, approaches and training schemes including: self-supervised learning, learning from synthetic datasets, object-centric representations, amodal representations, and many more. Our interest in this paper is to determin…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project Page: https://www.robots.ox.ac.uk/~vgg/research/flowsam/

  11. arXiv:2404.10556  [pdf, other]

    cs.NI eess.SP

    Generative AI for Advanced UAV Networking

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Hongyang Du, Jiawen Kang, Jing Wu, Sumei Sun, Ping Zhang

    Abstract: With the impressive achievements of ChatGPT and Sora, generative artificial intelligence (GAI) has received increasing attention. Not limited to the field of content generation, GAI is also widely used to solve the problems in wireless communication scenarios due to its powerful learning and generalization capabilities. Therefore, we discuss key applications of GAI in improving unmanned aerial veh…

    Submitted 16 April, 2024; originally announced April 2024.

  12. arXiv:2404.09942  [pdf, other]

    cs.CV

    Knowledge-enhanced Visual-Language Pretraining for Computational Pathology

    Authors: Xiao Zhou, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, Yanfeng Wang

    Abstract: In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain specific knowledge in pathology. Specifically, we make the following contributions: (i) We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring…

    Submitted 15 April, 2024; originally announced April 2024.

  13. arXiv:2404.08926  [pdf, other]

    cs.CV

    Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives

    Authors: Yidan Liu, Jun Yue, Shaobo Xia, Pedram Ghamisi, Weiying Xie, Leyuan Fang

    Abstract: As a newly emerging advance in deep generative models, diffusion models have achieved state-of-the-art results in many fields, including computer vision, natural language processing, and molecule design. The remote sensing community has also noticed the powerful ability of diffusion models and quickly applied them to a variety of tasks for image processing. Given the rapid increase in research on…

    Submitted 17 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  14. arXiv:2404.06443  [pdf, other]

    cs.CV

    Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

    Authors: Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen

    Abstract: Human facial action units (AUs) are mutually related in a hierarchical manner, as not only are they associated with each other in both spatial and temporal domains but also AUs located in the same/close facial regions show stronger relationships than those of different facial regions. While no existing approach thoroughly models such hierarchical inter-dependencies among AUs, this paper propos…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  15. arXiv:2403.18762  [pdf, other]

    cs.CV cs.AI cs.RO

    ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

    Authors: Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen

    Abstract: Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages, 11 figures, conference

  16. arXiv:2403.15027  [pdf, other]

    cs.LG cs.AI

    Grey-informed neural network for time-series forecasting

    Authors: Wanli Xie, Ruibin Zhao, Zhenguo Xu, Tingting Liang

    Abstract: Neural network models have shown outstanding performance and successful resolutions to complex problems in various fields. However, the majority of these models are viewed as black-box, requiring a significant amount of data for development. Consequently, in situations with limited data, constructing appropriate models becomes challenging due to the lack of transparency and scarcity of data. To ta…

    Submitted 3 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  17. arXiv:2403.11558  [pdf, other]

    cs.CL cs.AI

    Reinforcement Learning with Token-level Feedback for Controllable Text Generation

    Authors: Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

    Abstract: To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation while most existing methods suffer from overfitting issues (finetuning-based methods) or semantic collapse (post-processing methods). However, current RL methods are generally…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 Findings

  18. arXiv:2403.09323  [pdf, other]

    cs.CV

    EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection

    Authors: Jiaqing Zhang, Mingxiang Cao, Xue Yang, Weiying Xie, Jie Lei, Daixun Li, Geng Yang, Wenbo Huang, Yunsong Li

    Abstract: Multimodal image fusion and object detection play a vital role in autonomous driving. Current joint learning methods have made significant progress in the multimodal fusion detection task combining the texture detail and objective semantic information. However, the tedious training steps have limited its applications to wider real-world industrial deployment. To address this limitation, we propose…

    Submitted 14 March, 2024; originally announced March 2024.

  19. arXiv:2403.07832  [pdf, other]

    cs.RO

    DeliGrasp: Inferring Object Properties with LLMs for Adaptive Grasp Policies

    Authors: William Xie, Jensen Lavering, Nikolaus Correll

    Abstract: Large language models (LLMs) can provide rich physical descriptions of most worldly objects, allowing robots to achieve more informed and capable grasping. We leverage LLMs' common sense physical reasoning and code-writing abilities to infer an object's physical characteristics--mass $m$, friction coefficient $μ$, and spring constant $k$--from a semantic description, and then translate those chara…

    Submitted 30 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  20. arXiv:2403.04697  [pdf, other]

    cs.CV cs.AI

    AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

    Authors: Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang

    Abstract: Facial Action Units (AU) is a vital concept in the realm of affective computing, and AU detection has always been a hot research topic. Existing methods suffer from overfitting issues due to the utilization of a large number of learnable parameters on scarce AU-annotated datasets or heavy reliance on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a prom…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 19 pages, 6 figures

  21. arXiv:2403.04652  [pdf, other]

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  22. arXiv:2403.00841  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Offline Fictitious Self-Play for Competitive Games

    Authors: Jingxiao Chen, Weiji Xie, Weinan Zhang, Yong yu, Ying Wen

    Abstract: Offline Reinforcement Learning (RL) has received significant interest due to its ability to improve policies in previously collected datasets without online interactions. Despite its success in the single-agent setting, offline multi-agent RL remains a challenge, especially in competitive games. Firstly, unaware of the game structure, it is impossible to interact with the opponents and conduct a m…

    Submitted 29 February, 2024; originally announced March 2024.

  23. arXiv:2402.15690  [pdf, other]

    cs.CL cs.AI

    Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology

    Authors: Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen

    Abstract: Large Language Models (LLMs) have gradually become the gateway for people to acquire new knowledge. However, attackers can break the model's security protection ("jail") to access restricted information, which is called "jailbreaking." Previous studies have shown the weakness of current LLMs when confronted with such jailbreaking attacks. Nevertheless, comprehension of the intrinsic decision-makin…

    Submitted 23 February, 2024; originally announced February 2024.

  24. arXiv:2402.13963  [pdf, other]

    cs.CL

    Towards Building Multilingual Language Model for Medicine

    Authors: Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this paper, we aim to develop an open-source, multilingual language model for medicine, that benefits a wider, linguistically diverse audience from different regions. In general, we present the contribution from the following aspects: first, for multilingual medical-specific adaptation, we construct a new multilingual medical corpus, that contains approximately 25.5B tokens encompassing 6 m…

    Submitted 26 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  25. arXiv:2402.13088  [pdf, other]

    cs.CV

    Slot-VLM: SlowFast Slots for Video-Language Modeling

    Authors: Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

    Abstract: Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs. In this work, we introduce Slot-VLM, a novel framework designed to generate semantically decomposed video token…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 16 pages, 10 figures

  26. arXiv:2402.05937  [pdf, other]

    cs.CV

    InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

    Authors: Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma

    Abstract: In this paper, we present a novel paradigm to enhance the ability of object detector, e.g., expanding categories or improving detection performance, by training on synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability of localising instances in the generated image…

    Submitted 8 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: CVPR2024

  27. arXiv:2402.03951  [pdf, other]

    cs.CV cs.AI

    Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping

    Authors: Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song

    Abstract: Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they show poor performances in attacking systems having different model genera from the surrogate model. In this paper, we propos…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: AAAI 2024

  28. arXiv:2402.00740  [pdf, other]

    cs.CV

    DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras

    Authors: Weixing Xie, Xiao Dong, Yong Yang, Qiqin Lin, Jingze Chen, Junfeng Yao, Xiaohu Guo

    Abstract: With the popularity of monocular videos generated by video sharing and live broadcasting applications, reconstructing and editing dynamic scenes in stationary monocular cameras has become a special but anticipated technology. In contrast to scene reconstructions that exploit multi-view observations, the problem of modeling a dynamic scene from a single view is significantly more under-constrained…

    Submitted 1 February, 2024; originally announced February 2024.

  29. arXiv:2401.16423  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Synchformer: Efficient Synchronization from Sparse Cues

    Authors: Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

    Abstract: Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Extended version of the ICASSP 24 paper. Project page: https://www.robots.ox.ac.uk/~vgg/research/synchformer/ Code: https://github.com/v-iashin/Synchformer

  30. arXiv:2401.11141  [pdf, other]

    cs.IT eess.SP

    Wideband Beamforming for RIS Assisted Near-Field Communications

    Authors: Ji Wang, Jian Xiao, Yixuan Zou, Wenwu Xie, Yuanwei Liu

    Abstract: A near-field wideband beamforming scheme is investigated for reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) systems, in which a deep learning-based end-to-end (E2E) optimization framework is proposed to maximize the system spectral efficiency. To deal with the near-field double beam split effect, the base station is equipped with frequency-dependent hybrid…

    Submitted 20 January, 2024; originally announced January 2024.

  31. arXiv:2401.08695  [pdf, other]

    cs.AI cs.CV cs.HC

    Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by Integrating Expert Knowledge and Interpretable Data-driven Intelligence

    Authors: Zhengqing Fang, Shuowen Zhou, Zhouhang Yuan, Yuxuan Si, Mengze Li, Jinxu Li, Yesheng Xu, Wenjia Xie, Kun Kuang, Yingming Li, Fei Wu, Yu-Feng Yao

    Abstract: Although data-driven artificial intelligence (AI) in medical image diagnosis has shown impressive performance in silico, the lack of interpretability makes it difficult to incorporate the "black box" into clinicians' workflows. To make the diagnostic patterns learned from data understandable by clinicians, we develop an interpretable model, knowledge-guided diagnosis model (KGDM), that provides a…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: 33 pages

  32. arXiv:2401.08687  [pdf, other]

    cs.CV

    DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. However, most existing studies were conducted under a supervised setup which cannot scale well while handling various new data. Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, is far under-explored. In this work, we design DA-BEV, the first dom…

    Submitted 12 January, 2024; originally announced January 2024.

  33. arXiv:2401.06969  [pdf, other]

    cs.CV

    Domain Adaptation for Large-Vocabulary Object Detectors

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Large-vocabulary object detectors (LVDs) aim to detect objects of many categories, which learn super objectness features and can locate objects accurately while applied to various downstream data. However, LVDs often struggle in recognizing the located objects due to domain discrepancy in data distribution and object vocabulary. At the other end, recent vision-language foundation models such as CL…

    Submitted 12 January, 2024; originally announced January 2024.

  34. arXiv:2401.05093  [pdf, other]

    cs.CV

    SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

    Authors: Jiayuan Tian, Jie Lei, Jiaqing Zhang, Weiying Xie, Yunsong Li

    Abstract: With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. Firstly, CL o…

    Submitted 10 January, 2024; originally announced January 2024.

  35. arXiv:2401.03182  [pdf, other]

    cs.CV

    Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image

    Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Kai Jiang, Mingxiang Cao, Yunsong Li

    Abstract: Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the nov…

    Submitted 6 January, 2024; originally announced January 2024.

  36. arXiv:2401.03179  [pdf, other]

    cs.CV

    Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification

    Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Geng Yang, Daixun Li, Yunsong Li

    Abstract: In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative ViT (MIVit), a system with an innovative information aggregate-distributing mechanism. This approach redefines redundanc…

    Submitted 23 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

  37. arXiv:2401.02433  [pdf, other]

    cs.CV cs.AI cs.LG

    FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients

    Authors: DaiXun Li, Weiying Xie, ZiXuan Wang, YiBing Lu, Yunsong Li, Leyuan Fang

    Abstract: With the rapid development of imaging sensor technology in the field of remote sensing, multi-modal remote sensing data fusion has emerged as a crucial research direction for land cover classification tasks. While diffusion models have made great progress in generative models and image classification tasks, existing models primarily focus on single-modality and single-client control, that is, the…

    Submitted 15 November, 2023; originally announced January 2024.

  38. arXiv:2401.02309  [pdf, other]

    cs.CV cs.MM

    TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection

    Authors: Hao Sun, Mingyao Zhou, Wenjing Chen, Wei Xie

    Abstract: Video moment retrieval (MR) and highlight detection (HD) based on natural language queries are two highly related tasks, which aim to obtain relevant moments within videos and highlight scores of each video clip. Recently, several methods have been devoted to building DETR-based networks to solve both MR and HD jointly. These methods simply add two separate task heads after multi-modal feature ext…

    Submitted 4 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI-24

  39. arXiv:2401.02212  [pdf, other]

    cs.CL cs.AI

    Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge Graph

    Authors: Rikui Huang, Wei Wei, Xiaoye Qu, Wenfeng Xie, Xianling Mao, Dangyang Chen

    Abstract: Temporal Knowledge Graph (TKG) is an extension of regular knowledge graph by attaching the time scope. Existing temporal knowledge graph question answering (TKGQA) models solely approach simple questions, owing to the prior assumption that each question only contains a single temporal fact with explicit/implicit temporal constraints. Hence, they perform poorly on questions which own multiple tempo…

    Submitted 4 January, 2024; originally announced January 2024.

  40. arXiv:2401.01093  [pdf, other]

    cs.CV

    Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector

    Authors: Jitao Ma, Weiying Xie, Yunsong Li

    Abstract: Hyperspectral anomaly detection (HAD) aims to localize pixel points whose spectral features differ from the background. HAD is essential in scenarios of unknown or camouflaged target features, such as water quality monitoring, crop growth monitoring and camouflaged target detection, where prior information of targets is difficult to obtain. Existing HAD methods aim to objectively detect and distin…

    Submitted 2 January, 2024; originally announced January 2024.

  41. arXiv:2401.00789  [pdf, other]

    cs.CV

    Retrieval-Augmented Egocentric Video Captioning

    Authors: Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

    Abstract: Understanding human actions from videos of first-person view poses significant challenges. Most prior approaches explore representation learning on egocentric videos only, while overlooking the potential benefit of exploiting existing large-scale third-person videos. In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantic…

    Submitted 3 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  42. arXiv:2312.17530  [pdf, other]

    cs.CV

    RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

    Authors: Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li

    Abstract: Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the high communication costs of sending model updates among multiple nodes are a significant bottleneck for scalable distributed learning. Gradient sparsification ha…

    Submitted 29 December, 2023; originally announced December 2023.

  43. arXiv:2312.17267  [pdf, other]

    cs.CL cs.AI

    Improving Low-resource Prompt-based Relation Representation with Multi-view Decoupling Learning

    Authors: Chenghao Fan, Wei Wei, Xiaoye Qu, Zhenyi Lu, Wenfeng Xie, Yu Cheng, Dangyang Chen

    Abstract: Recently, prompt-tuning with pre-trained language models (PLMs) has demonstrated the significantly enhancing ability of relation extraction (RE) tasks. However, in low-resource scenarios, where the available training data is scarce, previous prompt-based methods may still perform poorly for prompt-based representation learning due to a superficial understanding of the relation. To this end, we hig…

    Submitted 27 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  44. arXiv:2312.17247  [pdf, other]

    cs.CV

    Amodal Ground Truth and Completion in the Wild

    Authors: Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

    Abstract: This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually predicted by manual annotation and thus is subjective. In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for part…

    Submitted 29 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  45. arXiv:2312.17183  [pdf, other]

    eess.IV cs.CV

    One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

    Authors: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this study, we focus on building up a model that aims to Segment Anything in medical scenarios, driven by Text prompts, termed as SAT. Our main contributions are three folds: (i) for dataset construction, we combine multiple knowledge sources to construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; Then we build up the largest and most compreh…

    Submitted 1 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 53 pages

  46. arXiv:2312.16943  [pdf, other]

    cs.CV

    SAR-Net: Multi-scale Direction-aware SAR Network via Global Information Fusion

    Authors: Mingxiang Cao, Jie Lei, Weiying Xie, Jiaqing Zhang, Daixun Li, Yunsong Li

    Abstract: Deep learning has driven significant progress in object detection using Synthetic Aperture Radar (SAR) imagery. Existing methods, while achieving promising results, often struggle to effectively integrate local and global information, particularly direction-aware features. This paper proposes SAR-Net, a novel framework specifically designed for global fusion of direction-aware information in SAR o…

    Submitted 27 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  47. arXiv:2312.16151  [pdf, other]

    cs.CV

    Large-scale Long-tailed Disease Diagnosis on Radiology Images

    Authors: Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this study, we aim to investigate the problem of large-scale, large-vocabulary disease classification for radiologic images, which can be formulated as a multi-modal, multi-anatomy, multi-label, long-tailed classification. Our main contributions are three folds: (i), on dataset construction, we build up an academically accessible, large-scale diagnostic dataset that encompasses 5568 disorders l…

    Submitted 28 December, 2023; v1 submitted 26 December, 2023; originally announced December 2023.

  48. arXiv:2312.16012  [pdf, other]

    cs.CV cs.AI

    Detection-based Intermediate Supervision for Visual Question Answering

    Authors: Yuhang Liu, Daowan Peng, Wei Wei, Yuanyuan Fu, Wenfeng Xie, Dangyang Chen

    Abstract: Recently, neural module networks (NMNs) have yielded ongoing success in answering compositional visual questions, especially those involving multi-hop visual and logical reasoning. NMNs decompose the complex question into several sub-tasks using instance-modules from the reasoning paths of that question and then exploit intermediate supervisions to guide answer prediction, thereby improving infere…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  49. arXiv:2312.14055  [pdf, other]

    cs.CV

    A Strong Baseline for Temporal Video-Text Alignment

    Authors: Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this paper, we consider the problem of temporally aligning the video and texts from instructional videos, specifically, given a long-term video, and associated text sentences, our goal is to determine their corresponding timestamps in the video. To this end, we establish a simple, yet strong model that adopts a Transformer-based architecture with all texts as queries, iteratively attending to t…

    Submitted 21 December, 2023; originally announced December 2023.

  50. arXiv:2312.13173  [pdf, other]

    cs.LG math.OC

    Learning Fair Policies for Multi-stage Selection Problems from Observational Data

    Authors: Zhuangzhuang Jia, Grani A. Hanasusanto, Phebe Vayanos, Weijun Xie

    Abstract: We consider the problem of learning fair policies for multi-stage selection problems from observational data. This problem arises in several high-stakes domains such as company hiring, loan approval, or bail decisions where outcomes (e.g., career success, loan repayment, recidivism) are only observed for those selected. We propose a multi-stage framework that can be augmented with various fairness…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 38th Annual AAAI Conference on Artificial Intelligence, 2024