
Showing 1–50 of 625 results for author: Yang, G

  1. arXiv:2405.18897  [pdf, other]

    cs.CV

    MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning

    Authors: Junjie Wang, Guangjing Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Qicheng Lao

    Abstract: In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely in…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Tech report

  2. arXiv:2405.18353  [pdf, other]

    cs.LG stat.ML

    Simulating infinite-dimensional nonlinear diffusion bridges

    Authors: Gefan Yang, Elizabeth Louise Baker, Michael L. Severinsen, Christy Anna Hipsley, Stefan Sommer

    Abstract: The diffusion bridge is a type of diffusion process that conditions on hitting a specific state within a finite time period. It has broad applications in fields such as Bayesian inference, financial mathematics, control theory, and shape analysis. However, simulating the diffusion bridge for natural data can be challenging due to both the intractability of the drift term and continuous representat…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2405.17659  [pdf, other]

    eess.IV cs.CV

    Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba

    Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yinzhe Wu, Yang Nan, Weiwen Wu, Chengyan Wang, Kuangyu Shi, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

    Abstract: Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has sh…

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.15241  [pdf, other]

    eess.IV cs.CV

    Blaze3DM: Marry Triplane Representation with Diffusion for 3D Medical Inverse Problem Solving

    Authors: Jia He, Bonan Li, Ge Yang, Ziwen Liu

    Abstract: Solving 3D medical inverse problems such as image restoration and reconstruction is crucial in the modern medical field. However, the curse of dimensionality in 3D medical data leads mainstream volume-wise methods to suffer from high resource consumption and challenges models to successfully capture the natural distribution, resulting in inevitable volume inconsistency and artifacts. Some recent works…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2405.09597  [pdf]

    cs.LG cs.AI

    When AI Eats Itself: On the Caveats of Data Pollution in the Era of Generative AI

    Authors: Xiaodan Xing, Fadong Shi, Jiahao Huang, Yinzhe Wu, Yang Nan, Sheng Zhang, Yingying Fang, Mike Roberts, Carola-Bibiane Schönlieb, Javier Del Ser, Guang Yang

    Abstract: Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effe…

    Submitted 15 May, 2024; originally announced May 2024.

  6. arXiv:2405.09443  [pdf, other]

    cs.IT eess.SP

    Low-Complexity Joint Azimuth-Range-Velocity Estimation for Integrated Sensing and Communication with OFDM Waveform

    Authors: Jun Zhang, Gang Yang, Qibin Ye, Yixuan Huang, Su Hu

    Abstract: Integrated sensing and communication (ISAC) is a main application scenario of the sixth-generation mobile communication systems. Due to the fast-growing number of antennas and subcarriers in cellular systems, the computational complexity of joint azimuth-range-velocity estimation (JARVE) in ISAC systems is extremely high. This paper studies the JARVE problem for a monostatic ISAC system with ortho…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 12 figures, submitted to IEEE journal

  7. arXiv:2405.09024  [pdf, other]

    cs.CV

    Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels

    Authors: Guozhang Liu, Ting Liu, Mengke Yuan, Tao Pang, Guangxing Yang, Hao Fu, Tao Wang, Tongkui Liao

    Abstract: The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to noisy annotations in the category labels of detection datasets. However, the effects and treatment of label noise are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method t…

    Submitted 14 May, 2024; originally announced May 2024.

  8. arXiv:2405.07411  [pdf, other]

    cs.CV cs.AI

    MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks

    Authors: Haijiang Tian, Jingkun Yue, Xiaohong Liu, Guoxing Yang, Zeyu Jiang, Guangyu Wang

    Abstract: Medical images are often more difficult to acquire than natural images because of the specialized equipment and technology involved, which leads to fewer medical image datasets. It is therefore hard to train a strong pretrained medical vision model. How to make the best of natural pretrained vision models and adapt them to the medical domain remains an open question. For image classification, a popular method is linear probe (LP). How…

    Submitted 12 May, 2024; originally announced May 2024.

  9. arXiv:2405.07409  [pdf, other]

    cs.NI

    ZBanner: Fast Stateless Scanning Capable of Obtaining Responses over TCP

    Authors: Chiyu Chen, Yuliang Lu, Guozheng Yang, Yi Xie, Shasha Guo

    Abstract: Fast large-scale network scanning is an important way to understand internet service configurations and security in real time, among which stateless scan is representative. Existing stateless scanners can perform single-packet scans for internet-wide network measurements but are limited to host discovery or port scanning. To obtain further information over TCP, slower stateful scanners must be use…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: The paper has been submitted and the code will be published later

  10. arXiv:2405.06964  [pdf, other]

    cs.RO cs.AI

    ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots

    Authors: Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianyi Chen, Zhouliang Yu, Lin Shao

    Abstract: To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to dev…

    Submitted 11 May, 2024; originally announced May 2024.

  11. arXiv:2405.04753  [pdf, other]

    cs.CR cs.AI

    AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models

    Authors: Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, Ee-Chien Chang

    Abstract: Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types as well as requirement of ex…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 20 pages, 5 figures

  12. arXiv:2405.03851  [pdf, other]

    cs.DB cs.DS

    Upper Bounds for Complexity of Asymptotically Optimal Learned Indexes

    Authors: Luis Croquevielle, Guang Yang, Liang Lian, Ali Hadian, Thomas Heinis

    Abstract: Learned indexes leverage machine learning models to accelerate query answering in databases, showing impressive practical performance. However, theoretical understanding of these methods remains incomplete. Existing research suggests that learned indexes have superior asymptotic complexity compared to their non-learned counterparts, but these findings have been established under restrictive probab…

    Submitted 6 May, 2024; originally announced May 2024.

  13. arXiv:2404.18209  [pdf, other]

    cs.LG cs.DB

    4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs

    Authors: Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, Zheng Zhang

    Abstract: Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and eva…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Under review

  14. arXiv:2404.17672  [pdf, other]

    cs.CV cs.GR

    BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

    Authors: Ian Huang, Guandao Yang, Leonidas Guibas

    Abstract: Graphics design is important for various applications, including movie production and game design. To create a high-quality scene, designers usually need to spend hours in software like Blender, in which they might need to interleave and repeat operations, such as connecting material nodes, hundreds of times. Moreover, slightly different design goals may require completely different sequences, mak…

    Submitted 22 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  15. arXiv:2404.12670  [pdf, other]

    cs.IR cs.CL cs.HC

    Towards Human-centered Proactive Conversational Agents

    Authors: Yang Deng, Lizi Liao, Zhonghua Zheng, Grace Hui Yang, Tat-Seng Chua

    Abstract: Recent research on proactive conversational agents (PCAs) mainly focuses on improving the system's capabilities in anticipating and planning action sequences to accomplish tasks and achieve goals before users articulate their requests. This perspectives paper highlights the importance of moving towards building human-centered PCAs that emphasize human needs and expectations, and that considers eth…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024 (Perspectives Track)

  16. arXiv:2404.10253  [pdf, other]

    cs.DC

    Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

    Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, et al. (16 additional authors not shown)

    Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries t…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 18 pages, 13 figures

  17. arXiv:2404.04421  [pdf, other]

    cs.GR cs.CV

    PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

    Authors: Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

    Abstract: Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along w…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Project Page: https://qingqing-zhao.github.io/PhysAvatar

  18. arXiv:2404.01223  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing

    Authors: Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang

    Abstract: Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Project website: https://feature-splatting.github.io/

  19. arXiv:2404.00598  [pdf, other]

    cs.IT eess.SP

    Robust Beamforming Design and Antenna Selection for Dynamic HRIS-aided Massive MIMO Systems

    Authors: Jintao Wang, Binggui Zhou, Chengzhi Ma, Shiqi Gong, Guanghua Yang, Shaodan Ma

    Abstract: In this paper, a dynamic hybrid active-passive reconfigurable intelligent surface (HRIS) is proposed to further enhance the massive multiple-input-multiple-output (MIMO) system, since it supports the dynamic placement of active and passive elements. Specifically, considering the impact of the hardware impairments (HWIs), we investigate the channel-aware configuration of the receive antennas at the…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 5 pages, 2 figures

  20. arXiv:2403.16520  [pdf, other]

    cs.CV

    CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification

    Authors: Guangqian Yang, Kangrui Du, Zhihan Yang, Ye Du, Yongping Zheng, Shujun Wang

    Abstract: Alzheimer's disease (AD) is an incurable neurodegenerative condition leading to cognitive and functional deterioration. Given the lack of a cure, prompt and precise AD diagnosis is vital, a complex process dependent on multiple factors and multi-modal data. While successful efforts have been made to integrate multi-modal representation learning into medical datasets, scant attention has been given…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 11 pages, 1 figure

  21. arXiv:2403.14583  [pdf, other]

    cs.RO cs.LG cs.MA

    Co-Optimization of Environment and Policies for Decentralized Multi-Agent Navigation

    Authors: Zhan Gao, Guang Yang, Amanda Prorok

    Abstract: This work views the multi-agent system and its surrounding environment as a co-evolving system, where the behavior of one affects the other. The goal is to take both agent actions and environment configurations as decision variables, and optimize these two components in a coordinated manner to improve some measure of interest. Towards this end, we consider the problem of decentralized multi-agent…

    Submitted 21 March, 2024; originally announced March 2024.

  22. arXiv:2403.11004  [pdf, other]

    cs.LG cs.SI

    Forward Learning of Graph Neural Networks

    Authors: Namyong Park, Xing Wang, Antoine Simoulin, Shuai Yang, Grey Yang, Ryan Rossi, Puja Trivedi, Nesreen Ahmed

    Abstract: Graph neural networks (GNNs) have achieved remarkable success across a wide range of applications, such as recommendation, drug discovery, and question answering. Behind the success of GNNs lies the backpropagation (BP) algorithm, which is the de facto standard for training deep neural networks (NNs). However, despite its effectiveness, BP imposes several constraints, which are not only biological…

    Submitted 12 April, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  23. arXiv:2403.10860  [pdf, other]

    cs.CV cs.AI

    Efficient Domain Adaptation for Endoscopic Visual Odometry

    Authors: Junyang Wu, Yun Gu, Guang-Zhong Yang

    Abstract: Visual odometry plays a crucial role in endoscopic imaging, yet the scarcity of realistic images with ground truth poses poses a significant challenge. Therefore, domain adaptation offers a promising approach to bridge the pre-operative planning domain with the intra-operative real domain for learning odometry information. However, existing methodologies suffer from inefficiencies in the training…

    Submitted 16 March, 2024; originally announced March 2024.

  24. arXiv:2403.08157  [pdf]

    cs.CV

    Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks

    Authors: Fuzhi Wu, Jiasong Wu, Youyong Kong, Chunfeng Yang, Guanyu Yang, Huazhong Shu, Guy Carrault, Lotfi Senhadji

    Abstract: Deep learning and Convolutional Neural Networks (CNNs) have driven major transformations in diverse research areas. However, their limitations in handling low-frequency information present obstacles in certain tasks like interpreting global structures or managing smooth transition images. Despite the promising performance of transformer structures in numerous tasks, their intricate optimization co…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 9 pages, 10 figures, 6 tables. AAAI 2024 conference

  25. arXiv:2403.07563  [pdf, other]

    cs.RO cs.CV cs.LG

    Learning Generalizable Feature Fields for Mobile Manipulation

    Authors: Ri-Zhao Qiu, Yafei Hu, Ge Yang, Yuchen Song, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang

    Abstract: An open problem in mobile manipulation is how to represent objects and scenes in a unified manner, so that robots can use it both for navigating in the environment and manipulating objects. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent to an expansive physical scale. In this work, we present…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Preprint. Project website is at: https://geff-b1.github.io/

  26. arXiv:2403.06122  [pdf, other]

    cs.CV

    Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning

    Authors: Woo-Jin Ahn, Geun-Yeong Yang, Hyun-Duck Choi, Myo-Taeg Lim

    Abstract: Deep learning models for semantic segmentation often experience performance degradation when deployed to unseen target domains unidentified during the training phase. This is mainly due to variations in image texture (i.e., style) from different data sources. To tackle this challenge, existing domain generalized semantic segmentation (DGSS) methods attempt to remove style variations from the feature…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  27. arXiv:2403.05280  [pdf, other]

    cs.CV

    ContrastDiagnosis: Enhancing Interpretability in Lung Nodule Diagnosis Using Contrastive Learning

    Authors: Chenglong Wang, Yinqiao Yi, Yida Wang, Chengxiu Zhang, Yun Liu, Kensaku Mori, Mei Yuan, Guang Yang

    Abstract: With the ongoing development of deep learning, an increasing number of AI models have surpassed the performance levels of human clinical practitioners. However, the prevalence of AI diagnostic products in actual clinical practice remains significantly lower than desired. One crucial reason for this gap is the so-called 'black box' nature of AI models. Clinicians' distrust of black box models has d…

    Submitted 8 March, 2024; originally announced March 2024.

  28. arXiv:2403.03677  [pdf, other]

    cs.SE

    Automatic Bi-modal Question Title Generation for Stack Overflow with Prompt Learning

    Authors: Shaoyu Yang, Xiang Chen, Ke Liu, Guang Yang, Chi Yu

    Abstract: When drafting question posts for Stack Overflow, developers may not accurately summarize the core problems in the question titles, which can cause these questions to not get timely help. Therefore, improving the quality of question titles has attracted the wide attention of researchers. An initial study aimed to automatically generate the titles by only analyzing the code snippets in the question…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by Empirical Software Engineering 2024 (EMSE)

  29. arXiv:2403.01246  [pdf, other]

    cs.CV

    Dual Graph Attention based Disentanglement Multiple Instance Learning for Brain Age Estimation

    Authors: Fanzhe Yan, Gang Yang, Yu Li, Aiping Liu, Xun Chen

    Abstract: Deep learning techniques have demonstrated great potential for accurately estimating brain age by analyzing Magnetic Resonance Imaging (MRI) data from healthy individuals. However, current methods for brain age estimation often directly utilize whole input images, overlooking two important considerations: 1) the heterogeneous nature of brain aging, where different brain regions may degenerate at d…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 12 pages, 9 figures

  30. arXiv:2402.18975  [pdf, other]

    cs.CV cs.AI

    Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

    Authors: Zi-Kai Xiao, Guo-Ye Yang, Xue Yang, Tai-Jiang Mu, Junchi Yan, Shi-min Hu

    Abstract: Considerable efforts have been devoted to Oriented Object Detection (OOD). However, one lasting issue regarding the discontinuity in Oriented Bounding Box (OBB) representation remains unresolved, which is an inherent bottleneck for extant OOD methods. This paper endeavors to completely solve this issue in a theoretically guaranteed manner and puts an end to the ad-hoc efforts in this direction. Pr…

    Submitted 16 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 17 pages, 12 tables, 8 figures. Accepted by CVPR'24. Code: https://github.com/514flowey/JDet-COBB

  31. arXiv:2402.18451  [pdf, other]

    eess.IV cs.CV

    MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation

    Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yinzhe Wu, Yang Nan, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

    Abstract: The recent Mamba model has shown remarkable adaptability for visual representation learning, including in medical imaging tasks. This study introduces MambaMIR, a Mamba-based model for medical image reconstruction, as well as its Generative Adversarial Network-based variant, MambaMIR-GAN. Our proposed MambaMIR inherits several advantages, such as linear complexity, global receptive fields, and dyn…

    Submitted 19 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  32. arXiv:2402.16870  [pdf, other]

    cs.NI

    Pioneering Deterministic Scheduling and Network Structure Optimization for Time-Critical Computing Tasks in Industrial IoT

    Authors: Yujiao Hu, Yining Zhu, Huayu Zhang, Yan Pan, Qingmin Jia, Renchao Xie, Gang Yang, F. Richard Yu

    Abstract: The Industrial Internet of Things (IIoT) has become a critical technology to accelerate the process of digital and intelligent transformation of industries. As the cooperative relationship between smart devices in IIoT becomes more complex, getting deterministic responses of IIoT periodic time-critical computing tasks becomes a crucial and nontrivial problem. However, few current works in cloud/ed…

    Submitted 23 January, 2024; originally announced February 2024.

    Comments: Under Review

  33. arXiv:2402.16796  [pdf, other]

    cs.RO cs.LG

    Expressive Whole-Body Control for Humanoid Robots

    Authors: Xuxin Cheng, Yandong Ji, Junming Chen, Ruihan Yang, Ge Yang, Xiaolong Wang

    Abstract: Can we enable humanoid robots to generate rich, diverse, and expressive motions in the real world? We propose to learn a whole-body control policy on a human-sized robot to mimic human motions as realistically as possible. To train such a policy, we leverage the large-scale human motion capture data from the graphics community in a Reinforcement Learning framework. However, directly performing imitati…

    Submitted 5 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Website: https://expressive-humanoid.github.io

  34. arXiv:2402.16674  [pdf, other]

    cs.CV

    ConSept: Continual Semantic Segmentation via Adapter-based Vision Transformer

    Authors: Bowen Dong, Guanglei Yang, Wangmeng Zuo, Lei Zhang

    Abstract: In this paper, we delve into the realm of vision transformers for continual semantic segmentation, a problem that has not been sufficiently explored in previous literature. Empirical investigations on the adaptation of existing frameworks to vanilla ViT reveal that incorporating visual adapters into ViTs or fine-tuning ViTs with distillation terms is advantageous for enhancing the segmentation cap…

    Submitted 26 February, 2024; originally announced February 2024.

  35. arXiv:2402.15939  [pdf]

    eess.IV cs.LG

    Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

    Authors: Zi Wang, Min Xiao, Yirong Zhou, Chengyan Wang, Naiming Wu, Yi Li, Yiwen Gong, Shufu Chang, Yinyin Chen, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Di Guo, Guang Yang, Xiaobo Qu

    Abstract: Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled, but the image reconstruction poses a great challenge of high-dimensional processing. This challenge necessitates extensive training data in many deep learning reconstruction methods. This work proposes a novel and efficient approach, levera…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures, 3 tables

  36. arXiv:2402.12100  [pdf, other]

    cs.CL cs.AI cs.CR cs.SE

    Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation

    Authors: Yi Liu, Guowei Yang, Gelei Deng, Feiyue Chen, Yuqi Chen, Ling Shi, Tianwei Zhang, Yang Liu

    Abstract: With the prevalence of text-to-image generative models, their safety becomes a critical concern. Adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rate and inefficiency. We introduce Groot, the first automated framework leveraging tre…

    Submitted 19 February, 2024; originally announced February 2024.

  37. arXiv:2402.12076  [pdf, other]

    cs.GR

    Periodic Implicit Representation, Design and Optimization of Porous Structures Using Periodic B-splines

    Authors: Gao Depeng, Gao Yang, Lin Hongwei

    Abstract: Porous structures are intricate solid materials with numerous small pores, extensively used in fields like medicine, chemical engineering, and aerospace. However, the design of such structures using computer-aided tools is a time-consuming and tedious process. In this study, we propose a novel representation method and design approach for porous units that can be infinitely spliced to form a porous…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 38 pages, 12 figures

  38. In-Vivo Hyperspectral Human Brain Image Database for Brain Cancer Detection

    Authors: H. Fabelo, S. Ortega, A. Szolna, D. Bulters, J. F. Pineiro, S. Kabwama, A. Shanahan, H. Bulstrode, S. Bisshopp, B. R. Kiran, D. Ravi, R. Lazcano, D. Madronal, C. Sosa, C. Espino, M. Marquez, M. De la Luz Plaza, R. Camacho, D. Carrera, M. Hernandez, G. M. Callico, J. Morera, B. Stanciulescu, G. Z. Yang, R. Salvador, et al. (3 additional authors not shown)

    Abstract: The use of hyperspectral imaging for medical applications has become more common in recent years. One of the main obstacles that researchers find when developing hyperspectral algorithms for medical applications is the lack of specific, publicly available, and hyperspectral medical data. The work described in this paper was developed within the framework of the European project HELICoiD (HypErspe…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 19 pages, 12 figures

    Journal ref: IEEE Access, 2019, 7, pp. 39098–39116

  39. arXiv:2402.08846  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

    Authors: Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

    Abstract: In this paper, we focus on solving one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning f…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Work in progress; will be open-sourced soon

  40. arXiv:2402.07403  [pdf]

    cs.CV eess.IV

    Make it more specific: A novel uncertainty based airway segmentation application on 3D U-Net and its variants

    Authors: Shiyi Wang, Yang Nan, Felder Federico N, Sheng Zhang, Walsh Simon L F, Guang Yang

    Abstract: Each medical segmentation task should be considered with a specific AI algorithm based on its scenario so that the most accurate prediction model can be obtained. The most popular algorithms in medical segmentation, 3D U-Net and its variants, can directly implement the task of lung trachea segmentation, but its failure to consider the special tree-like structure of the trachea suggests that there…

    Submitted 11 February, 2024; originally announced February 2024.

  41. Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations

    Authors: H. Fabelo, S. Ortega, D. Ravi, B. R. Kiran, C. Sosa, D. Bulters, G. M. Callico, H. Bulstrode, A. Szolna, J. F. Pineiro, S. Kabwama, D. Madronal, R. Lazcano, A. J. OShanahan, S. Bisshopp, M. Hernandez, A. Baez-Quevedo, G. Z. Yang, B. Stanciulescu, R. Salvador, E. Juarez, R. Sarmiento

    Abstract: Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries du…

    Submitted 11 February, 2024; originally announced February 2024.

  42. arXiv:2402.03541  [pdf, other]

    cs.LG math.NA

    HAMLET: Graph Transformer Neural Operator for Partial Differential Equations

    Authors: Andrey Bryutkin, Jiahao Huang, Zhongying Deng, Guang Yang, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

    Abstract: We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 17 pages, 7 figures, 6 tables

  43. arXiv:2402.03473  [pdf, other]

    eess.IV cs.CV

    Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images

    Authors: Xiaodan Xing, Huiyu Zhou, Yingying Fang, Guang Yang

    Abstract: AI-generated medical images are gaining growing popularity due to their potential to address the data scarcity challenge in the real world. However, the issue of accurate identification of these synthetic images, particularly when they exhibit remarkable realism with their real copies, remains a concern. To mitigate this challenge, image generators such as DALLE and Imagen, have integrated digital…

    Submitted 21 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 5 pages

    Journal ref: ISBI 2024

  44. arXiv:2402.01434  [pdf, other]

    stat.ML cs.LG stat.CO

    Conditioning non-linear and infinite-dimensional diffusion processes

    Authors: Elizabeth Louise Baker, Gefan Yang, Michael L. Severinsen, Christy Anna Hipsley, Stefan Sommer

    Abstract: Generative diffusion models and many stochastic models in science and engineering naturally live in infinite dimensions before discretisation. To incorporate observed data for statistical and learning tasks, one needs to condition on observations. While recent work has treated conditioning linear processes in infinite dimensions, conditioning non-linear processes in infinite dimensions has not bee…

    Submitted 2 February, 2024; originally announced February 2024.

  45. arXiv:2401.13564  [pdf, ps, other]

    cs.IT eess.SP

    RIS Empowered Near-Field Covert Communications

    Authors: Jun Liu, Gang Yang, Yuanwei Liu, Xiangyun Zhou

    Abstract: This paper studies an extremely large-scale reconfigurable intelligent surface (XL-RIS) empowered covert communication system in the near-field region. Alice covertly transmits messages to Bob with the assistance of the XL-RIS, while evading detection by Willie. To enhance the covert communication performance, we maximize the achievable covert rate by jointly optimizing the hybrid analog and digit…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures, submitted to IEEE journal

  46. arXiv:2401.08079  [pdf, other]

    cs.CV

    Adversarial Masking Contrastive Learning for vein recognition

    Authors: Huafeng Qin, Yiquan Wu, Mounim A. El-Yacoubi, Jun Wang, Guangxiang Yang

    Abstract: Vein recognition has received increasing attention due to its high security and privacy. Recently, deep neural networks such as Convolutional neural networks (CNN) and Transformers have been introduced for vein recognition and achieved state-of-the-art performance. Despite the recent advances, however, existing solutions for finger-vein feature extraction are still not optimal due to scarce traini…

    Submitted 15 January, 2024; originally announced January 2024.

  47. arXiv:2401.04092  [pdf, other]

    cs.CV

    GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

    Authors: Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

    Abstract: Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as how well the asset aligned with the input text. These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences. Conducting user preference studies is…

    Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Project page: https://gpteval3d.github.io/ ; Code: https://github.com/3DTopia/GPTEval3D

  48. arXiv:2401.03179  [pdf, other]

    cs.CV

    Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification

    Authors: Jiaqing Zhang, Jie Lei, Weiying Xie, Geng Yang, Daixun Li, Yunsong Li

    Abstract: In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative Vit (MIVit), a system with an innovative information aggregate-distributing mechanism. This approach redefines redundanc…

    Submitted 23 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

  49. arXiv:2401.01369  [pdf, other]

    cs.IR cs.AI cs.LG

    RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems

    Authors: Jiahong Zhou, Shunhui Mao, Guoliang Yang, Bo Tang, Qianlong Xie, Lebin Lin, Xingxing Wang, Dong Wang

    Abstract: Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increases. Under the limitation of computation resources (CRs), how to make a trade-off between computation cost and business revenue becomes an essential question. The existing studies focus…

    Submitted 27 December, 2023; originally announced January 2024.

    Comments: 11 pages, 7 figures, published to Proceedings of the ACM Web Conference 2023

  50. arXiv:2312.17051  [pdf, other]

    cs.CV

    FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models

    Authors: Wan Xu, Tianyu Huang, Tianyu Qu, Guanglei Yang, Yiwen Guo, Wangmeng Zuo

    Abstract: Few-shot class-incremental learning (FSCIL) aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. While the Contrastive Vision-Language Pre-Training (CLIP) model has been effective in addressing 2D few/zero-shot learning tasks, its direct application to 3D FSCIL faces limitations. These limitations arise from feature space misalignment and signif…

    Submitted 28 December, 2023; originally announced December 2023.