Showing 1–50 of 1,245 results for author: Zhu, X

Searching in archive cs.
  1. arXiv:2405.05763  [pdf]

    cs.CV cs.AI

    DP-MDM: Detail-Preserving MR Reconstruction via Multiple Diffusion Models

    Authors: Mengxiao Geng, Jiahao Zhu, Xiaolin Zhu, Qiqing Liu, Dong Liang, Qiegen Liu

    Abstract: Detail features of magnetic resonance images play a crucial role in accurate medical diagnosis and treatment, as they capture subtle changes that pose challenges for doctors when performing precise judgments. However, the widely utilized naive diffusion model has limitations, as it fails to accurately capture more intricate details. To enhance the quality of MRI reconstruction, we propose a com…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.04756  [pdf, other]

    cs.CL cs.LG

    BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

    Authors: Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak

    Abstract: Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. The language model can also learn social biases, which has a significant potential for societal harm. There have been many mitigation strategies proposed for LLM safety, but it is unclear how effective the…

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2405.04285  [pdf, other]

    cs.AI eess.SP

    On the Foundations of Earth and Climate Foundation Models

    Authors: Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

    Abstract: Foundation models have enormous potential in advancing Earth and climate sciences, however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an en…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2405.03280  [pdf, other]

    cs.CV cs.AI

    Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

    Authors: Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He

    Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of nat…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2405.03235  [pdf]

    cs.CV cs.LG

    Cross-Modal Domain Adaptation in Brain Disease Diagnosis: Maximum Mean Discrepancy-based Convolutional Neural Networks

    Authors: Xuran Zhu

    Abstract: Brain disorders are a major challenge to global health, causing millions of deaths each year. Accurate diagnosis of these diseases relies heavily on advanced medical imaging techniques such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). However, the scarcity of annotated data poses a significant challenge in deploying machine learning models for medical diagnosis. To address thi…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.01663  [pdf, ps, other]

    cs.LG cs.AI

    ATNPA: A Unified View of Oversmoothing Alleviation in Graph Neural Networks

    Authors: Yufei Jin, Xingquan Zhu

    Abstract: Oversmoothing is a commonly observed challenge in graph neural network (GNN) learning, where, as layers increase, embedding features learned from GNNs quickly become similar/indistinguishable, making them incapable of differentiating network proximity. A GNN with shallow layer architectures can only learn short-term relation or localized structure information, limiting its power of learning long-t…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 pages

  7. arXiv:2405.01217  [pdf, other]

    cs.CV

    CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation

    Authors: Chenying Liu, Conrad Albrecht, Yi Wang, Xiao Xiang Zhu

    Abstract: We study the potential of noisy labels y to pretrain semantic segmentation models in a multi-modal learning framework for geospatial applications. Specifically, we propose a novel Cross-modal Sample Selection method (CromSS) that utilizes the class distributions P^{(d)}(x,c) over pixels x and classes c modelled by multiple sensors/modalities d of a given geospatial scene. Consistency of prediction…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted as an oral presentation by ICLR 2024 ML4RS workshop

  8. arXiv:2405.00358  [pdf, other]

    cs.AI cs.LG

    Arbitrary Time Information Modeling via Polynomial Approximation for Temporal Knowledge Graph Embedding

    Authors: Zhiyu Fang, Jingyan Qin, Xiaobin Zhu, Chun Yang, Xu-Cheng Yin

    Abstract: Distinguished from traditional knowledge graphs (KGs), temporal knowledge graphs (TKGs) must explore and reason over temporally evolving facts adequately. However, existing TKG approaches still face two main challenges, i.e., the limited capability to model arbitrary timestamps continuously and the lack of rich inference patterns under temporal constraints. In this paper, we propose an innovative…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by LREC-COLING 2024 (long paper, camera-ready version)

  9. Transformer-based Reasoning for Learning Evolutionary Chain of Events on Temporal Knowledge Graph

    Authors: Zhiyu Fang, Shuai-Long Lei, Xiaobin Zhu, Chun Yang, Shi-Xue Zhang, Xu-Cheng Yin, Jingyan Qin

    Abstract: Temporal Knowledge Graph (TKG) reasoning often involves completing missing factual elements along the timeline. Although existing methods can learn good embeddings for each factual element in quadruples by integrating temporal information, they often fail to infer the evolution of temporal facts. This is mainly because of (1) insufficiently exploring the internal structure and semantic relationshi…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGIR 2024 (the Full paper track, camera ready version)

  10. arXiv:2405.00168  [pdf, other]

    cs.CV

    Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method

    Authors: Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

    Abstract: RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing benchmarks predominantly consist of videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quali…

    Submitted 30 April, 2024; originally announced May 2024.

  11. arXiv:2404.19243  [pdf, other]

    cs.DB

    Co-occurrence order-preserving pattern mining

    Authors: Youxi Wu, Zhen Wang, Yan Li, Yingchun Guo, He Jiang, Xingquan Zhu, Xindong Wu

    Abstract: Recently, order-preserving pattern (OPP) mining has been proposed to discover some patterns, which can be seen as trend changes in time series. Although existing OPP mining algorithms have achieved satisfactory performance, they discover all frequent patterns. However, in some cases, users focus on a particular trend and its associated trends. To efficiently discover trend information related to a…

    Submitted 30 April, 2024; originally announced April 2024.

  12. arXiv:2404.17875  [pdf, other]

    cs.LG

    Noisy Node Classification by Bi-level Optimization based Multi-teacher Distillation

    Authors: Yujing Liu, Zongqian Wu, Zhengyu Lu, Ci Nie, Guoqiu Wen, Ping Hu, Xiaofeng Zhu

    Abstract: Previous graph neural networks (GNNs) usually assume that the graph data is with clean labels for representation learning, but it is not true in real applications. In this paper, we propose a new multi-teacher distillation method based on bi-level optimization (namely BO-NNC), to conduct noisy node classification on the graph data. Specifically, we first employ multiple self-supervised learning me…

    Submitted 8 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

  13. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  14. arXiv:2404.16304  [pdf, other]

    cs.CV

    BezierFormer: A Unified Architecture for 2D and 3D Lane Detection

    Authors: Zhiwei Dong, Xi Zhu, Xiya Cao, Ran Ding, Wei Li, Caifa Zhou, Yongliang Wang, Qiangbo Liu

    Abstract: Lane detection has made significant progress in recent years, but there is not a unified architecture for its two sub-tasks: 2D lane detection and 3D lane detection. To fill this gap, we introduce BézierFormer, a unified 2D and 3D lane detection architecture based on Bézier curve lane representation. BézierFormer formulate queries as Bézier control points and incorporate a novel Bézier curve atten…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: ICME 2024, 11 pages, 8 figures

  15. arXiv:2404.16037  [pdf, other]

    cs.CV cs.LG physics.ao-ph

    VN-Net: Vision-Numerical Fusion Graph Convolutional Network for Sparse Spatio-Temporal Meteorological Forecasting

    Authors: Yutong Xiong, Xun Zhu, Ming Wu, Weiqing Li, Fanbin Mo, Chuang Zhang, Bin Zhang

    Abstract: Sparse meteorological forecasting is indispensable for fine-grained weather forecasting and deserves extensive attention. Recent studies have highlighted the potential of spatio-temporal graph convolutional networks (ST-GCNs) in predicting numerical data from ground weather stations. However, as one of the highest fidelity and lowest latency data, the application of the vision data from satellites…

    Submitted 26 January, 2024; originally announced April 2024.

  16. arXiv:2404.15790  [pdf, other]

    cs.CV

    Leveraging Large Language Models for Multimodal Search

    Authors: Oriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua

    Abstract: Multimodal search has become increasingly important in providing users with a natural and effective way to express their search intentions. Images offer fine-grained details of the desired products, while text allows for easily incorporating search modifications. However, some existing multimodal search systems are unreliable and fail to address simple queries. The problem becomes harder with the…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Published at CVPRW 2024

  17. arXiv:2404.13911  [pdf]

    cs.CV

    Global OpenBuildingMap -- Unveiling the Mystery of Global Buildings

    Authors: Xiao Xiang Zhu, Qingyu Li, Yilei Shi, Yuanyuan Wang, Adam Stewart, Jonathan Prexl

    Abstract: Understanding how buildings are distributed globally is crucial to revealing the human footprint on our home planet. This built environment affects local climate, land surface albedo, resource distribution, and many other key factors that influence well-being and human health. Despite this, quantitative and comprehensive data on the distribution and properties of buildings worldwide is lacking. To…

    Submitted 22 April, 2024; originally announced April 2024.

  18. arXiv:2404.13878  [pdf, other]

    cs.IR

    Multi-Level Sequence Denoising with Cross-Signal Contrastive Learning for Sequential Recommendation

    Authors: Xiaofei Zhu, Liang Li, Stefan Dietze, Xin Luo

    Abstract: Sequential recommender systems (SRSs) aim to suggest next item for a user based on her historical interaction sequences. Recently, many research efforts have been devoted to attenuate the influence of noisy items in sequences by either assigning them with lower attention weights or discarding them directly. The major limitation of these methods is that the former would still prone to overfit noisy…

    Submitted 22 April, 2024; originally announced April 2024.

  19. arXiv:2404.12349  [pdf, other]

    cs.AI cs.HC

    Evaluating AI for Law: Bridging the Gap with Open-Source Solutions

    Authors: Rohan Bhambhoria, Samuel Dahan, Jonathan Li, Xiaodan Zhu

    Abstract: This study evaluates the performance of general-purpose AI, like ChatGPT, in legal question-answering tasks, highlighting significant risks to legal professionals and clients. It suggests leveraging foundational models enhanced by domain-specific knowledge to overcome these issues. The paper advocates for creating open-source legal AI systems to improve accuracy, transparency, and narrative divers…

    Submitted 18 April, 2024; originally announced April 2024.

  20. arXiv:2404.12081  [pdf, other]

    cs.CV

    MaskCD: A Remote Sensing Change Detection Network Based on Mask Classification

    Authors: Weikang Yu, Xiaokang Zhang, Samiran Das, Xiao Xiang Zhu, Pedram Ghamisi

    Abstract: Change detection (CD) from remote sensing (RS) images using deep learning has been widely investigated in the literature. It is typically regarded as a pixel-wise labeling task that aims to classify each pixel as changed or unchanged. Although per-pixel classification networks in encoder-decoder structures have shown dominance, they still suffer from imprecise boundaries and incomplete object deli…

    Submitted 18 April, 2024; originally announced April 2024.

  21. arXiv:2404.10378  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  22. arXiv:2404.09768  [pdf, other]

    cs.CV

    Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes

    Authors: Ivica Obadic, Alex Levering, Lars Pennig, Dario Oliveira, Diego Marcos, Xiaoxiang Zhu

    Abstract: Predicting socioeconomic indicators from satellite imagery with deep learning has become an increasingly popular research direction. Post-hoc concept-based explanations can be an important step towards broader adoption of these models in policy-making as they enable the interpretation of socioeconomic outcomes based on visual concepts that are intuitive to humans. In this paper, we study the inter…

    Submitted 15 April, 2024; originally announced April 2024.

  23. Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset

    Authors: Xiaomeng Zhu, Talha Bilal, Pär Mårtensson, Lars Hanson, Mårten Björkman, Atsuto Maki

    Abstract: This paper is about effectively utilizing synthetic data for training deep neural networks for industrial parts classification, in particular, by taking into account the domain gap against real-world images. To this end, we introduce a synthetic dataset that may serve as a preliminary testbed for the Sim-to-Real challenge; it contains 17 objects of six industrial use cases, including isolated and…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

    Journal ref: 2023 IEEE/CVF CVPRW, pp. 4454-4463

  24. arXiv:2404.07932  [pdf, other]

    cs.CV eess.IV

    FusionMamba: Efficient Image Fusion with State Space Model

    Authors: Siran Peng, Xiangyu Zhu, Haoyu Deng, Zhen Lei, Liang-Jian Deng

    Abstract: Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fiel…

    Submitted 11 April, 2024; originally announced April 2024.

  25. Adaptive Fair Representation Learning for Personalized Fairness in Recommendations via Information Alignment

    Authors: Xinyu Zhu, Lilin Zhang, Ning Yang

    Abstract: Personalized fairness in recommendations has been attracting increasing attention from researchers. The existing works often treat a fairness requirement, represented as a collection of sensitive attributes, as a hyper-parameter, and pursue extreme fairness by completely removing information of sensitive attributes from the learned fair embedding, which suffer from two challenges: huge training co…

    Submitted 12 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by SIGIR '24

  26. arXiv:2404.04943  [pdf]

    cs.LG cs.AI cs.AR

    Chiplet Placement Order Exploration Based on Learning to Rank with Graph Representation

    Authors: Zhihui Deng, Yuanyuan Duan, Leilai Shao, Xiaolei Zhu

    Abstract: Chiplet-based systems, integrating various silicon dies manufactured at different integrated circuit technology nodes on a carrier interposer, have garnered significant attention in recent years due to their cost-effectiveness and competitive performance. The widespread adoption of reinforcement learning as a sequential placement method has introduced a new challenge in determining the optimal pla…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures and 6 tables, accepted by the Conference ISEDA

  27. arXiv:2404.04886  [pdf, other]

    cs.CR cs.AI

    PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer

    Authors: Xingyu Su, Xiaojie Zhu, Yang Li, Yong Li, Chi Chen, Paulo Esteves-Veríssimo

    Abstract: Amidst the surge in deep learning-based password guessing models, challenges of generating high-quality passwords and reducing duplicate passwords persist. To address these challenges, we present PagPassGPT, a password guessing model constructed on Generative Pretrained Transformer (GPT). It can perform pattern guided guessing by incorporating pattern structure information as background knowledge,…

    Submitted 7 April, 2024; originally announced April 2024.

  28. arXiv:2404.04050  [pdf, other]

    cs.CV

    No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

    Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

    Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' c…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR Highlight. Code is available at https://github.com/yangyangyang127/Seg-NN. arXiv admin note: text overlap with arXiv:2308.12961

  29. arXiv:2404.03180  [pdf, other]

    cs.LG cs.CR

    Goldfish: An Efficient Federated Unlearning Framework

    Authors: Houzhe Wang, Xiaojie Zhu, Chi Chen, Paulo Esteves-Veríssimo

    Abstract: With recent legislation on the right to be forgotten, machine unlearning has emerged as a crucial research area. It facilitates the removal of a user's data from federated trained machine learning models without the necessity for retraining from scratch. However, current machine unlearning algorithms are confronted with challenges of efficiency and validity. To address the above issues, we propose…

    Submitted 23 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  30. arXiv:2404.01579  [pdf]

    cs.CV

    Diffusion Deepfake

    Authors: Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu

    Abstract: Recent progress in generative AI, primarily through diffusion models, presents significant challenges for real-world deepfake detection. The increased realism in image details, diverse content, and widespread accessibility to the general public complicates the identification of these sophisticated deepfakes. Acknowledging the urgency to address the vulnerability of current deepfake detectors to th…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 28 pages including Supplementary material

  31. arXiv:2404.00938  [pdf, ps, other]

    cs.HC cs.CL cs.CV cs.RO

    How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey

    Authors: Zhonghao Shi, Ellen Landrum, Amy O'Connell, Mina Kian, Leticia Pinto-Alva, Kaleen Shrestha, Xiaoyuan Zhu, Maja J Matarić

    Abstract: Socially assistive robots (SARs) have shown great success in providing personalized cognitive-affective support for user populations with special needs such as older adults, children with autism spectrum disorder (ASD), and individuals with mental health challenges. The large body of work on SAR demonstrates its potential to provide at-home support that complements clinic-based interventions deliv…

    Submitted 5 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 2 pages, accepted to the Proceedings of the AAAI Symposium Series, 2024

  32. arXiv:2404.00872  [pdf, other]

    cs.IT eess.SP

    Performance Evaluation of RIS-Assisted Spatial Modulation for Downlink Transmission

    Authors: Xusheng Zhu, Qingqing Wu, Wen Chen

    Abstract: This paper explores the performance of reconfigurable intelligent surface (RIS) assisted spatial modulation (SM) downlink communication systems, focusing on the average bit error probability (ABEP). Notably, in scenarios with a large number of reflecting units, the composite channel can be approximated by a Gaussian distribution using the central limit theorem. The receiver utilizes a maximum like…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.02893

  33. arXiv:2403.20031  [pdf, other]

    cs.CV

    A Unified Framework for Human-centric Point Cloud Video Understanding

    Authors: Yiteng Xu, Kecheng Ye, Xiao Han, Yiming Ren, Xinge Zhu, Yuexin Ma

    Abstract: Human-centric Point Cloud Video Understanding (PVU) is an emerging field focused on extracting and interpreting human-related features from sequences of human point clouds, further advancing downstream human-centric tasks and applications. Previous works usually focus on tackling one specific task and rely on huge labeled data, which has poor generalization capability. Considering that human has s…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  34. arXiv:2403.20001  [pdf, other]

    cs.RO

    Adaptive Energy Regularization for Autonomous Gait Transition and Energy-Efficient Quadruped Locomotion

    Authors: Boyuan Liang, Lingfeng Sun, Xinghao Zhu, Bike Zhang, Ziyin Xiong, Chenran Li, Koushil Sreenath, Masayoshi Tomizuka

    Abstract: In reinforcement learning for legged robot locomotion, crafting effective reward strategies is crucial. Pre-defined gait patterns and complex reward systems are widely used to stabilize policy training. Drawing from the natural locomotion behaviors of humans and animals, which adapt their gaits to minimize energy consumption, we propose a simplified, energy-centric reward strategy to foster the de…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures

  35. arXiv:2403.19457  [pdf, other]

    cs.IT eess.SP

    Transmissive RIS Transmitter Enabled Spatial Modulation for MIMO Systems

    Authors: Xusheng Zhu, Qingqing Wu, Wen Chen

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface (TRIS) transmitter-enabled spatial modulation (SM) multiple-input multiple-output (MIMO) system. In the transmission phase, a column-wise activation strategy is implemented for the TRIS panel, where the specific column elements are activated per time slot. Concurrently, the receiver employs the maximum likelihood det…

    Submitted 28 March, 2024; originally announced March 2024.

  36. arXiv:2403.18846  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    The Blind Normalized Stein Variational Gradient Descent-Based Detection for Intelligent Massive Random Access

    Authors: Xin Zhu, Ahmet Enis Cetin

    Abstract: The lack of an efficient preamble detection algorithm remains a challenge for solving preamble collision problems in intelligent massive random access (RA) in practical communication scenarios. To solve this problem, we present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model at the first step of the grant-based RA procedure. A novel blind normalized Ste…

    Submitted 7 March, 2024; originally announced March 2024.

  37. arXiv:2403.17751  [pdf, other]

    cs.IT eess.SP

    Robust Analysis of Full-Duplex Two-Way Space Shift Keying With RIS Systems

    Authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Wen Fang, Chaoying Huang, Jun Li

    Abstract: Reconfigurable intelligent surface (RIS)-assisted index modulation system schemes are considered a promising technology for sixth-generation (6G) wireless communication systems, which can enhance various system capabilities such as coverage and reliability. However, obtaining perfect channel state information (CSI) is challenging due to the lack of a radio frequency chain in RIS. In this paper, we…

    Submitted 26 March, 2024; originally announced March 2024.

  38. arXiv:2403.17369  [pdf, other]

    cs.CV

    CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning

    Authors: Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xiangwei Zhu, Zhenming Ji

    Abstract: Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods fail to perform well due to the lack of instructions, leading their models to overlook discrepancies within all adverse scenes. To tackle this, we propose CoDA which instructs models to distinguish, focus, and learn from these disc…

    Submitted 4 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  39. arXiv:2403.17025  [pdf, other]

    cs.CV

    Boosting Few-Shot Learning via Attentive Feature Regularization

    Authors: Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He

    Abstract: Few-shot learning (FSL) based on manifold regularization aims to improve the recognition capacity of novel objects with limited training samples by mixing two samples from different categories with a blending factor. However, this mixing operation weakens the feature representation due to the linear interpolation and the overlooking of the importance of specific channels. To solve these issues, th…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted to AAAI 2024

  40. arXiv:2403.16242  [pdf, other]

    cs.CV

    Adversarially Masked Video Consistency for Unsupervised Domain Adaptation

    Authors: Xiaoyu Zhu, Junwei Liang, Po-Yao Huang, Alex Hauptmann

    Abstract: We study the problem of unsupervised domain adaptation for egocentric videos. We propose a transformer-based model to learn class-discriminative and domain-invariant feature representations. It consists of two novel designs. The first module is called Generative Adversarial Domain Alignment Network with the aim of learning domain-invariant representations. It simultaneously learns a mask generator…

    Submitted 24 March, 2024; originally announced March 2024.

  41. arXiv:2403.15356  [pdf, other]

    cs.CV

    Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

    Authors: Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, Xiao Xiang Zhu

    Abstract: The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data. Traditional models have been siloed, tailored to specific sensors or data types like optical, radar, and hyperspectral, each with its own unique characteristics. This specialization hinders the potential for a holistic analysis that could benefit from the combine…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 33 pages, 10 figures

  42. arXiv:2403.15257  [pdf, other]

    cs.SI cs.AI

    Hierarchical Information Enhancement Network for Cascade Prediction in Social Networks

    Authors: Fanrui Zhang, Jiawei Liu, Qiang Zhang, Xiaoling Zhu, Zheng-Jun Zha

    Abstract: Understanding information cascades in networks is a fundamental issue in numerous applications. Current researches often sample cascade information into several independent paths or subgraphs to learn a simple cascade representation. However, these approaches fail to exploit the hierarchical semantic associations between different modalities, limiting their predictive performance. In this work, we…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 2 figures

  43. arXiv:2403.15235  [pdf, other]

    cs.SI cs.AI

    Multi-perspective Memory Enhanced Network for Identifying Key Nodes in Social Networks

    Authors: Qiang Zhang, Jiawei Liu, Fanrui Zhang, Xiaoling Zhu, Zheng-Jun Zha

    Abstract: Identifying key nodes in social networks plays a crucial role in the timely blocking of false information. Existing key node identification methods usually consider node influence only from the propagation structure perspective and generalize insufficiently to unknown scenarios. In this paper, we propose a novel Multi-perspective Memory Enhanced Network (MMEN) for identifying key nodes in…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 1 figure

  44. arXiv:2403.14203  [pdf, other]

    cs.CV cs.AI

    Unsupervised Audio-Visual Segmentation with Modality Alignment

    Authors: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiangkang Deng, Xiatian Zhu

    Abstract: Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound. Current AVS methods rely on costly fine-grained annotations of mask-audio pairs, making them impractical for scalability. To address this, we introduce unsupervised AVS, eliminating the need for such expensive annotation. To tackle this more challenging problem, we propos…

    Submitted 21 March, 2024; originally announced March 2024.

  45. arXiv:2403.13433  [pdf, other]

    cs.AI cs.CL cs.CY

    AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoran Guo, Lin Zhang, Yin Cai, Hao Shen, Jiangjie Chen, Zheyu Ye, Yifei Dai, Yan Gao, Yao Hu, Hongwei Feng, Yanghua Xiao

    Abstract: Language significantly influences the formation and evolution of human emergent behavior, which is crucial in understanding collective intelligence within human societies. Considering that the study of how language affects human behavior needs to place it in the dynamic scenarios in which it is used, we introduce AgentGroupChat in this paper, a simulation that delves into the complex role of langu…

    Submitted 4 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  46. arXiv:2403.13307  [pdf, other]

    cs.CV

    LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment

    Authors: Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma

    Abstract: Language-guided scene-aware human motion generation has great significance for entertainment and robotics. In response to the limitations of existing datasets, we introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research. LaserHuman stands out with its inclusion of genuine human motions within 3D environments, unbounded free-form natural language descript…

    Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  47. arXiv:2403.12686  [pdf, other]

    cs.CV cs.MM cs.RO

    WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar

    Authors: Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, Xiaohui Zhu, Eng Gee Lim, Jeremy Smith, Ka Lok Man, Xuming Hu, Yutao Yue

    Abstract: The perception of waterways based on human intent is significant for autonomous navigation and operations of Unmanned Surface Vehicles (USVs) in water environments. Inspired by visual grounding, we introduce WaterVG, the first visual grounding dataset designed for USV-based waterway perception based on human prompts. WaterVG encompasses prompts describing multiple targets, with annotations at the…

    Submitted 4 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 10 pages, 10 figures

  48. arXiv:2403.12676  [pdf, other]

    cs.RO

    In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing

    Authors: Mingrui Yu, Boyuan Liang, Xiang Zhang, Xinghao Zhu, Xiang Li, Masayoshi Tomizuka

    Abstract: Most research on deformable linear object (DLO) manipulation assumes rigid grasping. However, beyond rigid grasping and re-grasping, in-hand following is also an essential skill that humans use to dexterously manipulate DLOs, which requires continuously changing the grasp point by in-hand sliding while holding the DLO to prevent it from falling. Achieving such a skill is very challenging for robot…

    Submitted 19 March, 2024; originally announced March 2024.

  49. arXiv:2403.12226  [pdf, other]

    cs.LG cs.CV physics.flu-dyn

    Large-scale flood modeling and forecasting with FloodCast

    Authors: Qingsong Xu, Yilei Shi, Jonathan Bamber, Chaojun Ouyang, Xiao Xiang Zhu

    Abstract: Large-scale hydrodynamic models generally rely on fixed-resolution spatial grids and model parameters and incur a high computational cost. This limits their ability to accurately forecast flood crests and issue time-critical hazard warnings. In this work, we build a fast, stable, accurate, resolution-invariant, and geometry-adaptive flood modeling and forecasting framework that can pe…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 40 pages, 16 figures, under review

  50. Uncertainty-Aware Pseudo-Label Filtering for Source-Free Unsupervised Domain Adaptation

    Authors: Xi Chen, Haosen Yang, Huicong Zhang, Hongxun Yao, Xiatian Zhu

    Abstract: Source-free unsupervised domain adaptation (SFUDA) aims to enable the utilization of a pre-trained source model in an unlabeled target domain without access to source data. Self-training is a way to solve SFUDA, where confident target samples are iteratively selected as pseudo-labeled samples to guide target model learning. However, prior heuristic noisy pseudo-label filtering methods all involve…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Neurocomputing 2024