Showing 1–50 of 137 results for author: Xia, R

  1. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  2. arXiv:2405.13049  [pdf, other]

    cs.CL cs.AI cs.MM

    SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations

    Authors: Fanfan Wang, Heqing Ma, Jianfei Yu, Rui Xia, Erik Cambria

    Abstract: The ability to understand emotions is an essential component of human-like artificial intelligence, as emotions greatly influence human cognition, decision making, and social interactions. In addition to emotion recognition in conversations, the task of identifying the potential causes behind an individual's emotional state in conversations, is of great importance in many application scenarios. We…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, 4 Tables

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  3. arXiv:2405.04645  [pdf, other]

    cs.HC cs.CY

    Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences

    Authors: John Stamper, Ruiwei Xiao, Xinying Hou

    Abstract: The field of Artificial Intelligence in Education (AIED) focuses on the intersection of technology, education, and psychology, placing a strong emphasis on supporting learners' needs with compassion and understanding. The growing prominence of Large Language Models (LLMs) has led to the development of scalable solutions within educational settings, including generating different types of feedback…

    Submitted 11 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to 25th International Conference on Artificial Intelligence in Education (AIED 2024) BlueSky special track

  4. arXiv:2405.00313  [pdf, other]

    cs.CV

    Streamlining Image Editing with Layered Diffusion Brushes

    Authors: Peyman Gholami, Robert Xiao

    Abstract: Denoising diffusion models have recently gained prominence as powerful tools for a variety of image generation and manipulation tasks. Building on this, we propose a novel tool for real-time editing of images that provides users with fine-grained region-targeted supervision in addition to existing prompt-based controls. Our novel editing technique, termed Layered Diffusion Brushes, leverages promp…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.00219

  5. arXiv:2404.15675  [pdf, other]

    cs.IR

    Hi-Gen: Generative Retrieval For Large-Scale Personalized E-commerce Search

    Authors: Yanjing Wu, Yinfu Feng, Jian Wang, Wenji Zhou, Yunan Ye, Rong Xiao

    Abstract: Leveraging generative retrieval (GR) techniques to enhance search systems is an emerging methodology that has shown promising results in recent years. In GR, a text-to-text model maps string queries directly to relevant document identifiers (docIDs), so it dramatically simplifies the whole retrieval process. However, when applying most GR models in large-scale E-commerce for personalized item sear…

    Submitted 24 April, 2024; originally announced April 2024.

  6. arXiv:2404.15353  [pdf, other]

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  7. arXiv:2404.11889  [pdf, other]

    eess.IV cs.CV

    Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

    Authors: Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

    Abstract: X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume d…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures

  8. arXiv:2404.02213  [pdf, other]

    cs.HC cs.AI cs.CY

    Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

    Authors: Ruiwei Xiao, Xinying Hou, John Stamper

    Abstract: Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to one single hint type. To investigate whether and how different levels of hints can support students' problem-sol…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted CHI 2024 LBW - 10 pages

  9. arXiv:2403.15901  [pdf, other]

    cs.AI cs.CV

    MatchSeg: Towards Better Segmentation via Reference Image Matching

    Authors: Ruiqiang Xiao, Jiayu Huo, Haotian Zheng, Yang Liu, Sebastien Ourselin, Rachel Sparks

    Abstract: Recently, automated medical image segmentation methods based on deep learning have achieved great success. However, they heavily rely on large annotated datasets, which are costly and time-consuming to acquire. Few-shot learning aims to overcome the need for annotated data by using a small labeled dataset, known as a support set, to guide predicting labels for new, unlabeled images, known as the q…

    Submitted 23 March, 2024; originally announced March 2024.

  10. arXiv:2403.15835  [pdf, other]

    cs.CV

    Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

    Authors: Hancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo Zhang

    Abstract: Recent Vision Transformer Compression (VTC) works mainly follow a two-stage scheme, where the importance score of each model unit is first evaluated or preset in each submodule, followed by the sparsity score evaluation according to the target sparsity constraint. Such a separate evaluation process induces the gap between importance and sparsity score distributions, thus causing high search costs…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Our code will be available at www.github.com/HankYe/Once-for-Both

  11. arXiv:2403.02799  [pdf, other]

    cs.CL cs.AI

    DPPA: Pruning Method for Large Language Model to Model Merging

    Authors: Yaochen Zhu, Rui Xia, Jiajun Zhang

    Abstract: Model merging combines fine-tuned models derived from multiple domains, with the intent of enhancing the model's proficiency across various domains. The principal concern is the resolution of parameter conflicts. A substantial amount of existing research remedies this issue during the merging stage, with the latest study focusing on resolving this issue throughout the pruning stage. The DARE ap…

    Submitted 5 March, 2024; originally announced March 2024.

  12. arXiv:2402.17213  [pdf, other]

    cs.CV cs.AI

    VCD: Knowledge Base Guided Visual Commonsense Discovery in Images

    Authors: Xiangqing Shen, Yurun Song, Siwei Wu, Rui Xia

    Abstract: Visual commonsense contains knowledge about object properties, relationships, and behaviors in visual data. Discovering visual commonsense can provide a more comprehensive and richer understanding of images, and enhance the reasoning and decision-making capabilities of computer vision systems. However, the visual commonsense defined in existing visual commonsense discovery studies is coarse-graine…

    Submitted 27 February, 2024; originally announced February 2024.

  13. arXiv:2402.12185  [pdf, other]

    cs.CV

    ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

    Authors: Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao

    Abstract: Recently, many versatile Multi-modal Large Language Models (MLLMs) have emerged continuously. However, their capacity to query information depicted in visual charts and engage in reasoning based on the queried contents remains under-explored. In this paper, to comprehensively and rigorously benchmark the ability of the off-the-shelf MLLMs in the chart domain, we construct ChartX, a multi-modal eva…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Code and dataset are available for downloading at: https://github.com/UniModal4Reasoning/ChartVLM 22 pages, 15 figures

  14. arXiv:2402.11809  [pdf, other]

    cs.CL cs.AI cs.LG

    Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding

    Authors: Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao

    Abstract: This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables…

    Submitted 19 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings

  15. arXiv:2402.07913  [pdf, other]

    cs.CL cs.AI cs.HC

    QACP: An Annotated Question Answering Dataset for Assisting Chinese Python Programming Learners

    Authors: Rui Xiao, Lu Han, Xiaoying Zhou, Jiong Wang, Na Zong, Pengyu Zhang

    Abstract: In online learning platforms, particularly in rapidly growing computer programming courses, addressing the thousands of students' learning queries requires considerable human cost. The creation of intelligent assistant large language models (LLMs) tailored for programming education necessitates distinct data support. However, in real application scenarios, the data resources for training such LLMs…

    Submitted 22 February, 2024; v1 submitted 30 January, 2024; originally announced February 2024.

  16. arXiv:2401.13588  [pdf]

    cs.CL cs.AI cs.SE

    Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes

    Authors: Darren Liu, Cheng Ding, Delgersuren Bold, Monique Bouvier, Jiaying Lu, Benjamin Shickel, Craig S. Jabaley, Wenhui Zhang, Soojin Park, Michael J. Young, Mark S. Wainwright, Gilles Clermont, Parisa Rashidi, Eric S. Rosenthal, Laurie Dimisko, Ran Xiao, Joo Heung Yoon, Carl Yang, Xiao Hu

    Abstract: The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in r…

    Submitted 24 January, 2024; originally announced January 2024.

  17. arXiv:2401.12522  [pdf, other]

    cs.CL cs.AI cs.LG

    BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models

    Authors: Feng Lin, Hanling Yi, Hongbin Li, Yifan Yang, Xiaotian Yu, Guangming Lu, Rong Xiao

    Abstract: Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present Bi-directional Tuning for lossless Acceleration (BiTA), an innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. Inspired by the concept of pro…

    Submitted 25 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: An appendix has been included. Source code at https://github.com/linfeng93/BiTA

  18. arXiv:2401.02847  [pdf, other]

    cs.CV cs.GR cs.LG

    Generating Non-Stationary Textures using Self-Rectification

    Authors: Yang Zhou, Rongjun Xiao, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture, while fa…

    Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Project page: https://github.com/xiaorongjun000/Self-Rectification

  19. arXiv:2312.17120  [pdf, other]

    cs.CL cs.AI cs.LG

    Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math

    Authors: Zengzhi Wang, Rui Xia, Pengfei Liu

    Abstract: High-quality, large-scale corpora are the cornerstone of building foundation models. In this work, we introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. Throughout its creation, we adhered to the principle of "less is more", firmly believing in the supremacy of data quality over quantity, even in the pre-training phase. Our met…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 37 pages. Working in Progress. https://github.com/GAIR-NLP/MathPile/

  20. arXiv:2312.08718  [pdf, other]

    cs.RO

    Trajectory Planning and Tracking of Hybrid Flying-Crawling Quadrotors

    Authors: Dongnan Hu, Ruihao Xia, Xin Jin, Yang Tang

    Abstract: Hybrid Flying-Crawling Quadrotors (HyFCQs) are transformable robots with the ability of terrestrial and aerial hybrid motion. This article presents a trajectory planning and tracking framework designed for HyFCQs. In this framework, a terrestrial-aerial path-searching method with the crawling limitation of HyFCQs is proposed to guarantee the dynamical feasibility of trajectories. Additionally, a t…

    Submitted 14 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  21. arXiv:2312.07075  [pdf, other]

    cs.RO

    Motion Planning and Control of A Morphing Quadrotor in Restricted Scenarios

    Authors: Guiyang Cui, Ruihao Xia, Xin Jin, Yang Tang

    Abstract: Morphing quadrotors with four external actuators can adapt to different restricted scenarios by changing their geometric structure. However, previous works mainly focus on the improvements in structures and controllers, and existing planning algorithms don't consider the morphological modifications, which leads to safety and dynamic feasibility issues. In this paper, we propose a unified planning…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 8 pages, 9 figures

  22. arXiv:2312.02300  [pdf]

    cs.LG eess.SP

    Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Fadi B Nahab, Xiao Hu

    Abstract: This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart stu…

    Submitted 4 December, 2023; originally announced December 2023.

  23. arXiv:2311.18399  [pdf, other]

    eess.AS cs.SD

    Audio Prompt Tuning for Universal Sound Separation

    Authors: Yuzhuo Liu, Xubo Liu, Yan Zhao, Yuanyuan Wang, Rui Xia, Pingchuan Tain, Yuxuan Wang

    Abstract: Universal sound separation (USS) is a task to separate arbitrary sounds from an audio mixture. Existing USS systems are capable of separating arbitrary sources, given a few examples of the target sources as queries. However, separating arbitrary sounds with a single system is challenging, and the robustness is not always guaranteed. In this work, we propose audio prompt tuning (APT), a simple yet…

    Submitted 30 November, 2023; originally announced November 2023.

  24. arXiv:2311.15614  [pdf, other]

    cs.CL

    FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models

    Authors: Ruixuan Xiao, Yiwen Dong, Junbo Zhao, Runze Wu, Minmin Lin, Gang Chen, Haobo Wang

    Abstract: Collecting high-quality labeled data for model training is notoriously time-consuming and labor-intensive for various NLP tasks. While copious solutions, such as active learning for small language models (SLMs) and prevalent in-context learning in the era of large language models (LLMs), have been proposed and alleviate the labeling burden to some extent, their performances are still subject to hu…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 (Main conference)

  25. In-Context Learning for Knowledge Base Question Answering for Unmanned Systems based on Large Language Models

    Authors: Yunlong Chen, Yaming Zhang, Jianfei Yu, Li Yang, Rui Xia

    Abstract: Knowledge Base Question Answering (KBQA) aims to answer factoid questions based on knowledge bases. However, generating the most appropriate knowledge base query code based on Natural Language Questions (NLQ) poses a significant challenge in KBQA. In this work, we focus on the CCKS2023 Competition of Question Answering with Knowledge Graph Inference for Unmanned Systems. Inspired by the recent suc…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Runner up of the CCKS 2023 question answering with knowledge graph inference for unmanned systems evaluation task, accepted as an evaluation paper

    ACM Class: I.2.7

  26. arXiv:2310.10219  [pdf, other]

    cs.CV cs.AI

    Using Global Land Cover Product as Prompt for Cropland Mapping via Visual Foundation Model

    Authors: Chao Tao, Aoran Hu, Rong Xiao, Haifeng Li, Yuze Wang

    Abstract: Data-driven deep learning methods have shown great potential in cropland mapping. However, due to multiple factors such as attributes of cropland (topography, climate, crop type) and imaging conditions (viewing angle, illumination, scale), croplands under different scenes demonstrate a great domain gap. This makes it difficult for models trained in the specific scenes to directly generalize to oth…

    Submitted 16 October, 2023; originally announced October 2023.

  27. arXiv:2310.06594  [pdf, other]

    cs.CV

    On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets

    Authors: Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan

    Abstract: There is an emerging line of research on multimodal instruction tuning, and a line of benchmarks has been proposed for evaluating these models recently. Instead of evaluating the models directly, in this paper, we try to evaluate the Vision-Language Instruction-Tuning (VLIT) datasets. Also, we seek the way of building a dataset for developing an all-powerful VLIT model, which we believe could also…

    Submitted 29 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  28. arXiv:2310.06502  [pdf, other]

    cs.CL

    The Limits of ChatGPT in Extracting Aspect-Category-Opinion-Sentiment Quadruples: A Comparative Analysis

    Authors: Xiancai Xu, Jia-Dong Zhang, Rongchang Xiao, Lei Xiong

    Abstract: Recently, ChatGPT has attracted great attention from both industry and academia due to its surprising abilities in natural language understanding and generation. We are particularly curious about whether it can achieve promising performance on one of the most complex tasks in aspect-based sentiment analysis, i.e., extracting aspect-category-opinion-sentiment quadruples from texts. To this end, in…

    Submitted 10 October, 2023; originally announced October 2023.

  29. arXiv:2310.03293  [pdf, other]

    cs.CL

    A New Dialogue Response Generation Agent for Large Language Models by Asking Questions to Detect User's Intentions

    Authors: Siwei Wu, Xiangqing Shen, Rui Xia

    Abstract: Large Language Models (LLMs), such as ChatGPT, have recently been applied to various NLP tasks due to their open-domain generation capabilities. However, there are two issues with applying LLMs to dialogue tasks. 1. During the dialogue process, users may have implicit intentions that might be overlooked by LLMs. Consequently, generated responses may not align with the user's intentions. 2. It is un…

    Submitted 4 October, 2023; originally announced October 2023.

  30. arXiv:2310.02174  [pdf, other]

    cs.CL cs.AI cs.LG

    Ask Again, Then Fail: Large Language Models' Vacillations in Judgement

    Authors: Qiming Xie, Zengzhi Wang, Yi Feng, Rui Xia

    Abstract: We observe that current conversational language models often waver in their judgements when faced with follow-up questions, even if the original judgement was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a Follow-up Questioning Mechanism along with two metrics to quanti…

    Submitted 27 February, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Update abstract and mitigation results of fine-tuning the model on synthesized high-quality preference data with DPO algorithm

  31. arXiv:2309.11883  [pdf]

    cs.CV cs.RO

    On-the-Fly SfM: What you capture is What you get

    Authors: Zongqian Zhan, Rui Xia, Yifei Yu, Yibo Xu, Xin Wang

    Abstract: Over the last decades, ample achievements have been made on Structure from motion (SfM). However, the vast majority of them basically work in an offline manner, i.e., images are firstly captured and then fed together into a SfM pipeline for obtaining poses and sparse point cloud. In this work, on the contrary, we present an on-the-fly SfM: running online SfM while image capturing, the newly taken…

    Submitted 13 February, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  32. arXiv:2309.11268  [pdf, other]

    cs.CV

    StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

    Authors: Renqiu Xia, Bo Zhang, Haoyang Peng, Hancheng Ye, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan

    Abstract: Charts are common in literature across different scientific fields, conveying rich information easily accessible to readers. Current chart-related tasks focus on either chart perception which refers to extracting information from the visual charts, or performing reasoning given the extracted data, e.g. in a tabular form. In this paper, we aim to establish a unified and label-efficient learning par…

    Submitted 18 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: SimChart9K is available for downloading at: https://github.com/UniModal4Reasoning/SimChart9K 26 pages, 15 figures

  33. arXiv:2309.07408  [pdf, other]

    cs.RO

    An Explicit Method for Fast Monocular Depth Recovery in Corridor Environments

    Authors: Yehao Liu, Ruoyan Xia, Xiaosu Xu, Zijian Wang, Yiqing Ya, Mingze Fan

    Abstract: Monocular cameras are extensively employed in indoor robotics, but their performance is limited in visual odometry, depth estimation, and related applications due to the absence of scale information. Depth estimation refers to the process of estimating a dense depth map from the corresponding input image. Existing researchers mostly address this issue through deep learning-based approaches, yet the…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures. arXiv admin note: text overlap with arXiv:2111.08600 by other authors

  34. arXiv:2309.05527  [pdf, other]

    cs.CV

    ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

    Authors: Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao

    Abstract: Domain shifts such as sensor type changes and geographical situation variations are prevalent in Autonomous Driving (AD), which poses a challenge since AD model relying on the previous domain knowledge can be hardly directly deployed to a new domain without additional costs. In this paper, we provide a new perspective and approach of alleviating the domain shifts, by proposing a Reconstruction-Sim…

    Submitted 25 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted by ICLR 2024. Code and simulated points are available at https://github.com/PJLab-ADG/3DTrans#resimad

  35. arXiv:2308.08345  [pdf, other]

    eess.IV cs.CV

    GAEI-UNet: Global Attention and Elastic Interaction U-Net for Vessel Image Segmentation

    Authors: Ruiqiang Xiao, Zhuoyue Wan

    Abstract: Vessel image segmentation plays a pivotal role in medical diagnostics, aiding in the early detection and treatment of vascular diseases. While segmentation based on deep learning has shown promising results, effectively segmenting small structures and maintaining connectivity between them remains challenging. To address these limitations, we propose GAEI-UNet, a novel model that combines global at…

    Submitted 22 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: text overlap with arXiv:2004.03696 by other authors

  36. arXiv:2308.07723  [pdf, other]

    cs.RO cs.MA

    Extended Preintegration for Relative State Estimation of Leader-Follower Platform

    Authors: Ruican Xia, Hailong Pei

    Abstract: Relative state estimation using exteroceptive sensors suffers from limitations of the field of view (FOV) and false detection, that the proprioceptive sensor (IMU) data are usually engaged to compensate. Recently ego-motion constraint obtained by Inertial measurement unit (IMU) preintegration has been extensively used in simultaneous localization and mapping (SLAM) to alleviate the computation bur…

    Submitted 15 August, 2023; originally announced August 2023.

  37. arXiv:2308.06948  [pdf, other]

    cs.CV

    MixBCT: Towards Self-Adapting Backward-Compatible Training

    Authors: Yu Liang, Yufeng Zhang, Shiliang Zhang, Yaowei Wang, Sheng Xiao, Rong Xiao, Xiaoyu Wang

    Abstract: Backward-compatible training circumvents the need for expensive updates to the old gallery database when deploying an advanced new model in the retrieval system. Previous methods achieved backward compatibility by aligning prototypes of the new model with the old one, yet they often overlooked the distribution of old features, limiting their effectiveness when the low quality of the old model resu…

    Submitted 26 May, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  38. arXiv:2308.05037  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Separate Anything You Describe

    Authors: Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instr…

    Submitted 27 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Code, benchmark and pre-trained models: https://github.com/Audio-AGI/AudioSep

  39. arXiv:2307.15942  [pdf, other]

    cs.CV

    CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation

    Authors: Ruihao Xia, Chaoqiang Zhao, Meng Zheng, Ziyan Wu, Qiyu Sun, Yang Tang

    Abstract: Most nighttime semantic segmentation studies are based on domain adaptation approaches and image input. However, limited by the low dynamic range of conventional cameras, images fail to capture structural details and boundary information in low-light conditions. Event cameras, as a new form of vision sensors, are complementary to conventional cameras with their high dynamic range. To this end, we…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023

  40. arXiv:2307.13259  [pdf, other]

    cs.CV

    GaitFormer: Revisiting Intrinsic Periodicity for Gait Recognition

    Authors: Qian Wu, Ruixuan Xiao, Kaixin Xu, Jingcheng Ni, Boxun Li, Ziyao Xu

    Abstract: Gait recognition aims to distinguish different walking patterns by analyzing video-level human silhouettes, rather than relying on appearance information. Previous research on gait recognition has primarily focused on extracting local or global spatial-temporal representations, while overlooking the intrinsic periodic features of gait sequences, which, when fully utilized, can significantly enhanc…

    Submitted 25 July, 2023; originally announced July 2023.

  41. arXiv:2307.10580  [pdf, other]

    cs.LG physics.ao-ph

    Intelligent model for offshore China sea fog forecasting

    Authors: Yanfei Xiang, Qinghong Zhang, Mingqing Wang, Ruixue Xia, Yang Kong, Xiaomeng Huang

    Abstract: Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods are often proven inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model using t…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 19 pages, 9 figures

  42. arXiv:2307.02733  [pdf, other]

    cs.CV

    MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection

    Authors: Ruiyang Xia, Decheng Liu, Jie Li, Lin Yuan, Nannan Wang, Xinbo Gao

    Abstract: Advanced manipulation techniques have provided criminals with opportunities to make social panic or gain illicit profits through the generation of deceptive media, such as forged face images. In response, various deepfake detection methods have been proposed to assess image authenticity. Sequential deepfake detection, which is an extension of deepfake detection, aims to identify forged facial regi…

    Submitted 5 July, 2023; originally announced July 2023.

  43. arXiv:2307.00370  [pdf, other]

    cs.IR cs.CL

    Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model

    Authors: Jiong Cai, Yong Jiang, Yue Zhang, Chengyue Jiang, Ke Yu, Jianhui Ji, Rong Xiao, Haihong Tang, Tao Wang, Zhongqiang Huang, Pengjun Xie, Fei Huang, Kewei Tu

    Abstract: Discovering the intended items of user queries from a massive repository of items is one of the main goals of an e-commerce search system. Relevance prediction is essential to the search system since it helps improve performance. When online serving a relevance model, the model is required to perform fast and accurate inference. Currently, the widely used models such as Bi-encoder and Cross-encode…

    Submitted 19 July, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

  44. arXiv:2306.16956  [pdf, other]

    cs.CL cs.AI

    MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis

    Authors: Hongjie Cai, Nan Song, Zengzhi Wang, Qiming Xie, Qiankun Zhao, Ke Li, Siwei Wu, Shijie Liu, Jianfei Yu, Rui Xia

    Abstract: Aspect-based sentiment analysis is a long-standing research interest in the field of opinion mining, and in recent years, researchers have gradually shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA tasks. However, the datasets currently used in the research are limited to individual elements of specific tasks, usually focusing on in-domain settings, ignoring implicit…

    Submitted 29 June, 2023; originally announced June 2023.

  45. arXiv:2306.10792  [pdf, other]

    cs.LG cs.CV

    NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning

    Authors: Yun Yi, Haokui Zhang, Rong Xiao, Nannan Wang, Xiaoyu Wang

    Abstract: As more deep learning models are being applied in real-world applications, there is a growing need for modeling and learning the representations of neural networks themselves. An efficient representation can be used to predict target attributes of networks without the need for actual training and deployment procedures, facilitating efficient network deployment and design. Recently, inspired by the…

    Submitted 16 October, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: 9 pages, 2 figures, 6 tables. Code is available at https://github.com/yuny220/NAR-Former-V2

  46. arXiv:2306.08299  [pdf, other]

    cs.LG

    SaDI: A Self-adaptive Decomposed Interpretable Framework for Electric Load Forecasting under Extreme Events

    Authors: Hengbo Liu, Ziqing Ma, Linxiao Yang, Tian Zhou, Rui Xia, Yi Wang, Qingsong Wen, Liang Sun

    Abstract: Accurate prediction of electric load is crucial in power grid planning and management. In this paper, we solve the electric load forecasting problem under extreme events such as scorching heats. One challenge for accurate forecasting is the lack of training samples under extreme conditions. Also load usually changes dramatically in these extreme conditions, which calls for interpretable model to m…

    Submitted 14 June, 2023; originally announced June 2023.

  47. arXiv:2306.07306  [pdf, other]

    cs.CV cs.AI

    Active Globally Explainable Learning for Medical Images via Class Association Embedding and Cyclic Adversarial Generation

    Authors: Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi Pan, Yunpeng Cai

    Abstract: Explainability poses a major challenge to artificial intelligence (AI) techniques. Current studies on explainable AI (XAI) lack the efficiency of extracting global knowledge about the learning task, thus suffer deficiencies such as imprecise saliency, context-aware absence and vague meaning. In this paper, we propose the class association embedding (CAE) approach to address these issues. We employ…

    Submitted 12 June, 2023; originally announced June 2023.

  48. arXiv:2306.00219  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images

    Authors: Peyman Gholami, Robert Xiao

    Abstract: Text-to-image generative models have made remarkable advancements in generating high-quality images. However, generated images often contain undesirable artifacts or other errors due to model limitations. Existing techniques to fine-tune generated images are time-consuming (manual editing), produce poorly-integrated results (inpainting), or result in unexpected changes across the entire image (var…

    Submitted 26 October, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  49. arXiv:2305.17019  [pdf, other]

    cs.CL

    Commonsense Knowledge Graph Completion Via Contrastive Pretraining and Node Clustering

    Authors: Siwei Wu, Xiangqing Shen, Rui Xia

    Abstract: The nodes in the commonsense knowledge graph (CSKG) are normally represented by free-form short text (e.g., word or phrase). Different nodes may represent the same concept. This leads to the problems of edge sparsity and node redundancy, which challenges CSKG representation and completion. On the one hand, edge sparsity limits the performance of graph representation learning; On the other hand, no…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Findings of ACL 2023

  50. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.