Skip to main content

Showing 1–50 of 683 results for author: Xiao, X

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.04861  [pdf, other

    cs.SE

    Insights into Deep Learning Refactoring: Bridging the Gap Between Practices and Expectations

    Authors: SiQi Wang, Xing Hu, Bei Wang, WenXin Yao, Xin Xia, XingYu Wang

    Abstract: With the rapid development of deep learning, the implementation of intricate algorithms and substantial data processing have become standard elements of deep learning projects. As a result, the code has become progressively complex as the software evolves, which is difficult to maintain and understand. Existing studies have investigated the impact of refactoring on software quality within traditio… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 24 pages, 18 figures

  2. arXiv:2405.04065  [pdf, other

    cs.CL

    FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference

    Authors: Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu

    Abstract: Retrieval-Augmented Language Modeling (RALM) by integrating large language models (LLM) with relevant documents from an external corpus is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work using utilizing retrieved content by simply prepending retrieved contents to the input poses a high runtime issue, which degrades the inferen… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 14 pages

  3. arXiv:2405.03924  [pdf, other

    cs.DB cs.AI cs.LG

    NeurDB: An AI-powered Autonomous Data System

    Authors: Beng Chin Ooi, Shaofeng Cai, Gang Chen, Kian Lee Tan, Yuncheng Wu, Xiaokui Xiao, Naili Xing, Cong Yue, Lingze Zeng, Meihui Zhang, Zhanhao Zhao

    Abstract: In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, sel… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  4. Easy over Hard: A Simple Baseline for Test Failures Causes Prediction

    Authors: Zhipeng Gao, Zhipeng Xue, Xing Hu, Weiyi Shang, Xin Xia

    Abstract: The test failure causes analysis is critical since it determines the subsequent way of handling different types of bugs, which is the prerequisite to get the bugs properly analyzed and fixed. After a test case fails, software testers have to inspect the test execution logs line by line to identify its root cause. However, manual root cause determination is often tedious and time-consuming, which c… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  5. arXiv:2404.18891  [pdf, other

    cs.CV cs.AI cs.LG

    IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation

    Authors: Kebin Wu, Wenbin Li, Xiaofei Xiao

    Abstract: The scarcity of labeled data in real-world scenarios is a critical bottleneck of deep learning's effectiveness. Semi-supervised semantic segmentation has been a typical solution to achieve a desirable tradeoff between annotation cost and segmentation performance. However, previous approaches, whether based on consistency regularization or self-training, tend to neglect the contextual knowledge emb… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 7 pages, 2 figures

  6. arXiv:2404.17964  [pdf, other

    cs.SE

    Automating Zero-Shot Patch Porting for Hard Forks

    Authors: Shengyi Pan, You Wang, Zhongxin Liu, Xing Hu, Xin Xia, Shanping Li

    Abstract: Forking is a typical way of code reuse, which provides a simple way for developers to create a variant software (denoted as hard fork) by copying and modifying an existing codebase. Despite of the benefits, forking also leads to duplicate efforts in software maintenance. Developers need to port patches across the hard forks to address similar bugs or implement similar features. Due to the divergen… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted by ISSTA 2024

  7. arXiv:2404.15909  [pdf, other

    cs.CV

    Learning Long-form Video Prior via Generative Pre-Training

    Authors: Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou

    Abstract: Concepts involved in long-form videos such as people, objects, and their interactions, can be viewed as following an implicit prior. They are notably complex and continue to pose challenges to be comprehensively learned. In recent years, generative pre-training (GPT) has exhibited versatile capacities in modeling any kind of text content even visual locations. Can this manner work for learning lon… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  8. arXiv:2404.15449  [pdf, other

    cs.CV cs.AI

    ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

    Authors: Weifeng Chen, Jiacheng Zhang, Jie Wu, Hefeng Wu, Xuefeng Xiao, Liang Lin

    Abstract: The rapid development of diffusion models has triggered diverse applications. Identity-preserving text-to-image generation (ID-T2I) particularly has received significant attention due to its wide range of application scenarios like AI portrait and advertising. While existing ID-T2I methods have demonstrated impressive results, several key challenges remain: (1) It is hard to maintain the identity… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  9. arXiv:2404.14649  [pdf, other

    cs.RO

    Bi-CL: A Reinforcement Learning Framework for Robots Coordination Through Bi-level Optimization

    Authors: Zechen Hu, Daigo Shishika, Xuesu Xiao, Xuan Wang

    Abstract: In multi-robot systems, achieving coordinated missions remains a significant challenge due to the coupled nature of coordination behaviors and the lack of global information for individual robots. To mitigate these challenges, this paper introduces a novel approach, Bi-level Coordination Learning (Bi-CL), that leverages a bi-level optimization structure within a centralized training and decentrali… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2404.13686  [pdf, other

    cs.CV

    Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

    Authors: Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao

    Abstract: Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degra… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  11. arXiv:2404.10004  [pdf

    cs.LG physics.soc-ph stat.AP

    A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

    Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

    Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 20 pages, 9 figures

  12. arXiv:2404.09734  [pdf, other

    cs.IT eess.SP

    Weighted Sum-Rate Maximization for Movable Antenna-Enhanced Wireless Networks

    Authors: Biqian Feng, Yongpeng Wu, Xiang-Gen Xia, Chengshan Xiao

    Abstract: This letter investigates the weighted sum rate maximization problem in movable antenna (MA)-enhanced systems. To reduce the computational complexity, we transform it into a more tractable weighted minimum mean square error (WMMSE) problem well-suited for MA. We then adopt the WMMSE algorithm and majorization-minimization algorithm to optimize the beamforming and antenna positions, respectively. Mo… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Wireless Communications Letters

  13. arXiv:2404.09403  [pdf, other

    cs.LG

    Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

    Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

    Abstract: Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Different from most tra… ▽ More

    Submitted 22 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted by ICLR 2024. Camera Ready Version

  14. arXiv:2404.08677  [pdf, other

    cs.IR cs.AI cs.CL

    PMG : Personalized Multimodal Generation with Large Language Models

    Authors: Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao

    Abstract: The emergence of large language models (LLMs) has revolutionized the capabilities of text comprehension and generation. Multi-modal generation attracts great attention from both the industry and academia, but there is little work on personalized generation, which has important applications such as recommender systems. This paper proposes the first method for personalized multimodal generation usin… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

  15. arXiv:2404.07987  [pdf, other

    cs.CV cs.AI cs.LG

    ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

    Authors: Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

    Abstract: To enhance the controllability of text-to-image diffusion models, existing efforts like ControlNet incorporated image-based conditional controls. In this paper, we reveal that existing methods still face significant challenges in generating images that align with the image conditional controls. To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicit… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Project Page: https://liming-ai.github.io/ControlNet_Plus_Plus

  16. arXiv:2404.07671  [pdf

    cs.CV

    Deep learning-driven pulmonary arteries and veins segmentation reveals demography-associated pulmonary vasculature anatomy

    Authors: Yuetan Chu, Gongning Luo, Longxi Zhou, Shaodong Cao, Guolin Ma, Xianglin Meng, Juexiao Zhou, Changchun Yang, Dexuan Xie, Ricardo Henao, Xigang Xiao, Lianming Wu, Zhaowen Qiu, Xin Gao

    Abstract: Pulmonary artery-vein segmentation is crucial for diagnosing pulmonary diseases and surgical planning, and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns regarding adverse health effects from contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins using non-contrast CT, a conventional and low-… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  17. arXiv:2404.07425  [pdf, ps, other

    eess.SP cs.IT

    Precoder Design for User-Centric Network Massive MIMO with Matrix Manifold Optimization

    Authors: Rui Sun, Li You, An-An Lu, Chen Sun, Xiqi Gao, Xiang-Gen Xia

    Abstract: In this paper, we investigate the precoder design for user-centric network (UCN) massive multiple-input multiple-output (mMIMO) downlink with matrix manifold optimization. In UCN mMIMO systems, each user terminal (UT) is served by a subset of base stations (BSs) instead of all the BSs, facilitating the implementation of the system and lowering the dimension of the precoders to be designed. By prov… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 13 pages, 9 figures, journal

  18. arXiv:2404.05595  [pdf, other

    cs.CV

    UniFL: Improve Stable Diffusion via Unified Feedback Learning

    Authors: Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Weilin Huang, Min Zheng, Lean Fu, Guanbin Li

    Abstract: Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancements, the current competitive solutions still suffer from several limitations, including inferior visual quality, a lack of aesthetic appeal, and inefficient inference, without a comprehensive solutio… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  19. arXiv:2404.04860  [pdf, other

    cs.CV

    ByteEdit: Boost, Comply and Accelerate Generative Image Editing

    Authors: Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu

    Abstract: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including: i) inferior quality; ii) poor consistency; iii) insufficient instrcution adherence; iv) suboptimal generation efficiency. To address these obstacles, we p… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  20. arXiv:2404.01625  [pdf

    cs.CR

    AAA: an Adaptive Mechanism for Locally Differential Private Mean Estimation

    Authors: Fei Wei, Ergute Bao, Xiaokui Xiao, Yin Yang, Bolin Ding

    Abstract: Local differential privacy (LDP) is a strong privacy standard that has been adopted by popular software systems. The main idea is that each individual perturbs their own data locally, and only submits the resulting noisy version to a data aggregator. Although much effort has been devoted to computing various types of aggregates and building machine learning applications under LDP, research on fund… ▽ More

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  21. arXiv:2404.01291  [pdf, other

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Evaluating Text-to-Visual Generation with Image-to-Text Generation

    Authors: Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

    Abstract: Despite significant progress in generative AI, comprehensive evaluation remains challenging because of the lack of effective metrics and standardized benchmarks. For instance, the widely-used CLIPScore measures the alignment between a (generated) image and text prompt, but it fails to produce reliable scores for complex prompts involving compositions of objects, attributes, and relations. One reas… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: We open-source our data, model, and code at: https://github.com/linzhiqiu/t2v_metrics ; Project page: https://linzhiqiu.github.io/papers/vqascore

  22. arXiv:2404.00514  [pdf, other

    cs.RO

    Human-Robot Co-Transportation with Human Uncertainty-Aware MPC and Pose Optimization

    Authors: Al Jaber Mahmud, Amir Hossain Raj, Duc M. Nguyen, Xuesu Xiao, Xuan Wang

    Abstract: This paper proposes a new control algorithm for human-robot co-transportation based on a robot manipulator equipped with a mobile base and a robotic arm. The primary focus is to adapt to human uncertainties through the robot's whole-body dynamics and pose optimization. We introduce an augmented Model Predictive Control (MPC) formulation that explicitly models human uncertainties and contains extra… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures

  23. arXiv:2404.00210  [pdf, other

    cs.RO

    Socially Aware Robot Navigation through Scoring Using Vision-Language Models

    Authors: Daeun Song, Jing Liang, Amirreza Payandeh, Xuesu Xiao, Dinesh Manocha

    Abstract: We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's trajectory in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot beha… ▽ More

    Submitted 29 March, 2024; originally announced April 2024.

  24. arXiv:2403.19248  [pdf, other

    cs.CR cs.NI

    Genos: General In-Network Unsupervised Intrusion Detection by Rule Extraction

    Authors: Ruoyu Li, Qing Li, Yu Zhang, Dan Zhao, Xi Xiao, Yong Jiang

    Abstract: Anomaly-based network intrusion detection systems (A-NIDS) use unsupervised models to detect unforeseen attacks. However, existing A-NIDS solutions suffer from low throughput, lack of interpretability, and high maintenance costs. Recent in-network intelligence (INI) exploits programmable switches to offer line-rate deployment of NIDS. Nevertheless, current in-network NIDS are either model-specific… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: accepted by IEEE International Conference on Computer Communications (INFOCOM 2024)

  25. arXiv:2403.17231  [pdf, other

    cs.RO cs.LG

    Dyna-LfLH: Learning Agile Navigation in Dynamic Environments from Learned Hallucination

    Authors: Saad Abdul Ghani, Zizhao Wang, Peter Stone, Xuesu Xiao

    Abstract: This paper presents a self-supervised learning method to safely learn a motion planner for ground robots to navigate environments with dense and dynamic obstacles. When facing highly-cluttered, fast-moving, hard-to-predict obstacles, classical motion planners may not be able to keep up with limited onboard computation. For learning-based planners, high-quality demonstrations are difficult to acqui… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Submitted to International Conference on Intelligent Robots and Systems (IROS) 2024

  26. arXiv:2403.16437  [pdf, other

    cs.SE cs.CL

    Evaluating Large Language Models with Runtime Behavior of Program Execution

    Authors: Junkai Chen, Zhiyuan Pan, Xing Hu, Zhenhao Li, Ge Li, Xin Xia

    Abstract: Large language models for code (i.e., code LLMs) have shown strong code understanding and generation capabilities. To evaluate the capabilities of code LLMs in various aspects, many benchmarks have been proposed (e.g., HumanEval and ClassEval). Code reasoning is one of the most essential abilities of code LLMs, but existing benchmarks for code reasoning are not sufficient. Typically, they focus on… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  27. arXiv:2403.16419  [pdf, other

    cs.RO

    Terrain-Attentive Learning for Efficient 6-DoF Kinodynamic Modeling on Vertically Challenging Terrain

    Authors: Aniket Datar, Chenhui Pan, Mohammad Nazeri, Anuj Pokhrel, Xuesu Xiao

    Abstract: Wheeled robots have recently demonstrated superior mechanical capability to traverse vertically challenging terrain (e.g., extremely rugged boulders comparable in size to the vehicles themselves). Negotiating such terrain introduces significant variations of vehicle pose in all six Degrees-of-Freedom (DoFs), leading to imbalanced contact forces, varying momentum, and chassis deformation due to non… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  28. arXiv:2403.16034  [pdf, other

    cs.CV

    V2X-Real: a Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma

    Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to facilitate the real V2X cooperative perception research -- existing datasets either only support Vehicle-to-Infrastructure cooperation or Vehicle-to-Vehicle c… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  29. arXiv:2403.15946  [pdf, other

    cs.MA

    Team Coordination on Graphs: Problem, Analysis, and Algorithms

    Authors: Manshi Limbu, Yanlin Zhou, Gregory Stein, Xuan Wang, Daigo Shishika, Xuesu Xiao

    Abstract: Team Coordination on Graphs with Risky Edges (TCGRE) is a recently emerged problem, in which a robot team collectively reduces graph traversal cost through support from one robot to another when the latter traverses a risky edge. Resembling the traditional Multi-Agent Path Finding (MAPF) problem, both classical and learning-based methods have been proposed to solve TCGRE, however, they lacked eith… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  30. arXiv:2403.14774  [pdf, other

    cs.CV cs.CL cs.CR cs.LG

    Few-Shot Adversarial Prompt Learning on Vision-Language Models

    Authors: Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, Tongliang Liu

    Abstract: The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention. Inspired by the success of vision-language foundation models, previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision. However, in practice, they are still unsatisfactory due to several issues, including heavy adap… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 25 pages, 13 tables, 8 figures

  31. arXiv:2403.14144  [pdf, other

    cs.IR

    Understanding the Ranking Loss for Recommendation with Sparse User Feedback

    Authors: Zhutian Lin, Junwei Pan, Shangyu Zhang, Ximei Wang, Xi Xiao, Shudong Huang, Lei Xiao, Jie Jiang

    Abstract: Click-through rate (CTR) prediction holds significant importance in the realm of online advertising. While many existing approaches treat it as a binary classification problem and utilize binary cross entropy (BCE) as the optimization objective, recent advancements have indicated that combining BCE loss with ranking loss yields substantial performance improvements. However, the full efficacy of th… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  32. arXiv:2403.13241  [pdf, other

    cs.LG

    Tackling Noisy Labels with Network Parameter Additive Decomposition

    Authors: Jingyi Wang, Xiaobo Xia, Long Lan, Xinghao Wu, Jun Yu, Wenjing Yang, Bo Han, Tongliang Liu

    Abstract: Given data with noisy labels, over-parameterized deep networks suffer overfitting mislabeled data, resulting in poor generalization. The memorization effect of deep networks shows that although the networks have the ability to memorize all noisy data, they would first memorize clean training data, and then gradually memorize mislabeled training data. A simple and effective method that exploits the… ▽ More

    Submitted 28 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE T-PAMI

  33. arXiv:2403.12544  [pdf, other

    cs.LG

    AffineQuant: Affine Transformation Quantization for Large Language Models

    Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

    Abstract: The significant resource requirements associated with Large-scale Language Models (LLMs) have generated considerable interest in the development of techniques aimed at compressing and accelerating neural networks. Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the cont… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  34. arXiv:2403.12327  [pdf, other

    cs.CV cs.LG

    GT-Rain Single Image Deraining Challenge Report

    Authors: Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Luping Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng

    Abstract: This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained o… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  35. arXiv:2403.11656  [pdf, other

    cs.CV

    LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model

    Authors: Yuxin Cao, Jinghao Li, Xi Xiao, Derui Wang, Minhui Xue, Hao Ge, Wei Liu, Guangwu Hu

    Abstract: Previous work has shown that well-crafted adversarial perturbations can threaten the security of video recognition systems. Attackers can invade such models with a low query budget when the perturbations are semantic-invariant, such as StyleFool. Despite the query efficiency, the naturalness of the minutia areas still requires amelioration, since StyleFool leverages style transfer to all pixels in… ▽ More

    Submitted 27 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to 2024 IEEE Security and Privacy Workshops (SPW)

  36. arXiv:2403.11423  [pdf, other

    cs.CV

    VmambaIR: Visual State Space Model for Image Restoration

    Authors: Yuan Shi, Bin Xia, Xiaoyu Jin, Xing Wang, Tianyu Zhao, Xin Xia, Xuefeng Xiao, Wenming Yang

    Abstract: Image restoration is a critical task in low-level computer vision, aiming to restore high-quality images from degraded inputs. Various models, such as convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, and diffusion models (DMs), have been employed to address this problem with significant impact. However, CNNs have limitations in capturing long-range depend… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 23 pages

  37. arXiv:2403.09900  [pdf, other

    cs.RO

    DTG : Diffusion-based Trajectory Generation for Mapless Global Navigation

    Authors: Jing Liang, Amirreza Payandeh, Daeun Song, Xuesu Xiao, Dinesh Manocha

    Abstract: We present a novel end-to-end diffusion-based trajectory generation method, DTG, for mapless global navigation in challenging outdoor scenarios with occlusions and unstructured off-road features like grass, buildings, bushes, etc. Given a distant goal, our approach computes a trajectory that satisfies the following goals: (1) minimize the travel distance to the goal; (2) maximize the traversabilit… ▽ More

    Submitted 24 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 10 pages

  38. Unsupervised Modality-Transferable Video Highlight Detection with Representation Activation Sequence Learning

    Authors: Tingtian Li, Zixun Sun, Xinyu Xiao

    Abstract: Identifying highlight moments of raw video materials is crucial for improving the efficiency of editing videos that are pervasive on internet platforms. However, the extensive work of manually labeling footage has created obstacles to applying supervised methods to videos of unseen categories. The absence of an audio modality that contains valuable cues for highlight detection in many videos also… ▽ More

    Submitted 18 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Transactions on Image Processing, 2024

  39. arXiv:2403.08109  [pdf, other

    cs.RO cs.CV

    VANP: Learning Where to See for Navigation with Self-Supervised Vision-Action Pre-Training

    Authors: Mohammad Nazeri, Junzhe Wang, Amirreza Payandeh, Xuesu Xiao

    Abstract: Humans excel at efficiently navigating through crowds without collision by focusing on specific visual regions relevant to navigation. However, most robotic visual navigation methods rely on deep learning models pre-trained on vision tasks, which prioritize salient objects -- not necessarily relevant to navigation and potentially misleading. Alternative approaches train specialized navigation mode… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  40. arXiv:2403.07954  [pdf, other

    cs.LG eess.SP

    Optimizing Polynomial Graph Filters: A Novel Adaptive Krylov Subspace Approach

    Authors: Keke Huang, Wencai Cao, Hoang Ta, Xiaokui Xiao, Pietro Liò

    Abstract: Graph Neural Networks (GNNs), known as spectral graph filters, find a wide range of applications in web networks. To bypass eigendecomposition, polynomial graph filters are proposed to approximate graph filters by leveraging various polynomial bases for filter training. However, no existing studies have explored the diverse polynomial graph filters from a unified perspective for optimization. In… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  41. arXiv:2403.05852  [pdf, other

    cs.CV

    SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking

    Authors: Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du, Jing Tian

    Abstract: Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction, resulting in limited exploration of spectral information and difficul… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

  42. arXiv:2403.05787  [pdf, other

    cs.RO

    Scaling Team Coordination on Graphs with Reinforcement Learning

    Authors: Manshi Limbu, Zechen Hu, Xuan Wang, Daigo Shishika, Xuesu Xiao

    Abstract: This paper studies Reinforcement Learning (RL) techniques to enable team coordination behaviors in graph environments with support actions among teammates to reduce the costs of traversing certain risky edges in a centralized manner. While classical approaches can solve this non-standard multi-agent path planning problem by converting the original Environment Graph (EG) into a Joint State Graph (J… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  43. arXiv:2403.03848  [pdf, other

    cs.RO cs.LG

    Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning

    Authors: Zifan Xu, Amir Hossain Raj, Xuesu Xiao, Peter Stone

    Abstract: Recent advances of locomotion controllers utilizing deep reinforcement learning (RL) have yielded impressive results in terms of achieving rapid and robust locomotion across challenging terrain, such as rugged rocks, non-rigid ground, and slippery surfaces. However, while these controllers primarily address challenges underneath the robot, relatively little research has investigated legged mobilit… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  44. arXiv:2403.02867  [pdf, other

    cs.SI cs.LG

    Scalable Continuous-time Diffusion Framework for Network Inference and Influence Estimation

    Authors: Keke Huang, Ruize Gao, Bogdan Cautis, Xiaokui Xiao

    Abstract: The study of continuous-time information diffusion has been an important area of research for many applications in recent years. When only the diffusion traces (cascades) are accessible, cascade-based network inference and influence estimation are two essential problems to explore. Alas, existing methods exhibit limited capability to infer and process networks with more than a few thousand nodes,… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  45. arXiv:2403.02713  [pdf, other

    cs.CL cs.CV cs.HC cs.LG

    Android in the Zoo: Chain-of-Action-Thought for GUI Agents

    Authors: Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

    Abstract: Large language model (LLM) leads to a surge of autonomous GUI agents for smartphone, which completes a task triggered by natural language through predicting a sequence of actions of API. Even though the task highly relies on past actions and visual observations, existing studies typical consider little semantic information carried out by intermediate screenshots and screen operations. To address t… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Dataset could be found in https://github.com/IMNearth/CoAT

  46. arXiv:2403.02084  [pdf, other

    cs.CV

    ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

    Authors: Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Min Zheng, Lean Fu

    Abstract: Recent advancement in text-to-image models (e.g., Stable Diffusion) and corresponding personalized technologies (e.g., DreamBooth and LoRA) enables individuals to generate high-quality and imaginative images. However, they often suffer from limitations when generating images with resolutions outside of their trained domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 21 pages, 16 figures

  47. arXiv:2403.01942  [pdf, other

    cs.LG

    Mitigating Label Noise on Graph via Topological Sample Selection

    Authors: Yuhao Wu, Jiangchao Yao, Xiaobo Xia, Jun Yu, Ruxin Wang, Bo Han, Tongliang Liu

    Abstract: Despite the success of the carefully-annotated benchmarks, the effectiveness of existing graph neural networks (GNNs) can be considerably impaired in practice when the real-world graph data is noisily labeled. Previous explorations in sample selection have been demonstrated as an effective way for robust learning with noisy labels, however, the conventional studies focus on i.i.d data, and when mo… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  48. arXiv:2402.18818  [pdf, other

    cs.SE cs.CR

    CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection

    Authors: Hao Wang, Zeyu Gao, Chao Zhang, Mingyang Sun, Yuchen Zhou, Han Qiu, Xi Xiao

    Abstract: Binary code similarity detection (BCSD) is a fundamental technique for various application. Many BCSD solutions have been proposed recently, which mostly are embedding-based, but have shown limited accuracy and efficiency especially when the volume of target binaries to search is large. To address this issue, we propose a cost-effective BCSD framework, CEBin, which fuses embedding-based and compar… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

  49. arXiv:2402.18607  [pdf, other

    cs.LG cs.AI cs.CR

    Exploring Privacy and Fairness Risks in Sharing Diffusion Models: An Adversarial Perspective

    Authors: Xinjian Luo, Yangfan Jiang, Fei Wei, Yuncheng Wu, Xiaokui Xiao, Beng Chin Ooi

    Abstract: Diffusion models have recently gained significant attention in both academia and industry due to their impressive generative performance in terms of both sampling quality and distribution coverage. Accordingly, proposals are made for sharing pre-trained diffusion models across different organizations, as a way of improving data utilization while enhancing privacy protection by avoiding sharing pri… ▽ More

    Submitted 3 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  50. arXiv:2402.16928  [pdf, other

    cs.SE cs.AI

    CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision

    Authors: Hao Wang, Zeyu Gao, Chao Zhang, Zihan Sha, Mingyang Sun, Yuchen Zhou, Wenyu Zhu, Wenju Sun, Han Qiu, Xi Xiao

    Abstract: Binary code representation learning has shown significant performance in binary analysis tasks. But existing solutions often have poor transferability, particularly in few-shot and zero-shot scenarios where few or no training samples are available for the tasks. To address this problem, we present CLAP (Contrastive Language-Assembly Pre-training), which employs natural language supervision to lear… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.