
Showing 1–50 of 320 results for author: Wu, G

Searching in archive cs.
  1. arXiv:2405.03873  [pdf, other]

    cs.AI cs.HC

    Investigating Personalized Driving Behaviors in Dilemma Zones: Analysis and Prediction of Stop-or-Go Decisions

    Authors: Ziye Qin, Siyan Li, Guoyuan Wu, Matthew J. Barth, Amr Abdelraouf, Rohit Gupta, Kyungtae Han

    Abstract: Dilemma zones at signalized intersections present a commonly occurring but unsolved challenge for both drivers and traffic operators. Onsets of the yellow lights prompt varied responses from different drivers: some may brake abruptly, compromising the ride comfort, while others may accelerate, increasing the risk of red-light violations and potential safety hazards. Such diversity in drivers' stop…

    Submitted 6 May, 2024; originally announced May 2024.

  2. arXiv:2405.03103  [pdf, other]

    cs.LG cs.CV

    Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

    Authors: Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang

    Abstract: Large language models (LLMs) have recently achieved state-of-the-art performance across various tasks, yet due to their large computational requirements, they struggle with strict latency and power demands. Deep neural network (DNN) quantization has traditionally addressed these limitations by converting models to low-precision integer formats. Yet recently alternative formats, such as Normal Floa…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  3. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  4. arXiv:2404.11214  [pdf, other]

    cs.CV cs.AI

    Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions

    Authors: Chuheng Wei, Guoyuan Wu, Matthew J. Barth

    Abstract: A significant challenge in the field of object detection lies in the system's performance under non-ideal imaging conditions, such as rain, fog, low illumination, or raw Bayer images that lack ISP processing. Our study introduces "Feature Corrective Transfer Learning", a novel approach that leverages transfer learning and a bespoke loss function to facilitate the end-to-end detection of objects in…

    Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 2024 CVPR UG2+ Workshop

  5. arXiv:2404.11181  [pdf, other]

    cs.LG cs.AI cs.RO

    KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections

    Authors: Chuheng Wei, Guoyuan Wu, Matthew J. Barth, Amr Abdelraouf, Rohit Gupta, Kyungtae Han

    Abstract: Reliable prediction of vehicle trajectories at signalized intersections is crucial to urban traffic management and autonomous driving systems. However, it presents unique challenges, due to the complex roadway layout at intersections, involvement of traffic signal controls, and interactions among different types of road users. To address these issues, we present in this paper a novel model called…

    Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 2024 CVPR AICity Workshop

  6. arXiv:2404.09842  [pdf, other]

    cs.CV

    STMixer: A One-Stage Sparse Action Detector

    Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

    Abstract: Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and the feature sampling is constrained inside the box, failing to effectively leverage richer context inf…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Extended version of the paper arXiv:2303.15879 presented at CVPR 2023. Accepted by TPAMI 2024

  7. arXiv:2404.06692  [pdf, other]

    cs.CV

    Perception-Oriented Video Frame Interpolation via Asymmetric Blending

    Authors: Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng

    Abstract: Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably the manifestation of blur and ghosting effects. These issues can be traced back to two pivotal factors: unavoidable motion errors and misalignment in supervision. In practice, motion estimates often prove to be error-prone, resulting in misaligned features. Furthermore, the reconstruction loss tends to bring…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  8. arXiv:2404.04565  [pdf, other]

    cs.CV

    SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

    Authors: Tao Wu, Runyu He, Gangshan Wu, Limin Wang

    Abstract: Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder the progress of research in this area. First, they do not explore complex human-human interactions in multi-person scenarios. Second, the relation types of existin…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  9. arXiv:2404.00653  [pdf, other]

    cs.CV

    Dual DETRs for Multi-Label Temporal Action Detection

    Authors: Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, Limin Wang

    Abstract: Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task. However, these approaches primarily followed DETR to predict actions at the instance level (i.e., identify each action by its center point), leading…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  10. arXiv:2404.00260  [pdf, other]

    cs.CV eess.IV

    Exploiting Self-Supervised Constraints in Image Super-Resolution

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

    Abstract: Recent advances in self-supervised learning, predominantly studied in high-level visual tasks, have been explored in low-level image processing. This paper introduces a novel self-supervised constraint for single image super-resolution, termed SSC-SR. SSC-SR uniquely addresses the divergence in image complexity by employing a dual asymmetric paradigm and a target model updated via exponential movi…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: ICME 2024

  11. arXiv:2404.00246  [pdf, other]

    cs.CL cs.AI cs.HC

    Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World

    Authors: Guande Wu, Chen Zhao, Claudio Silva, He He

    Abstract: Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and c…

    Submitted 30 March, 2024; originally announced April 2024.

  12. arXiv:2403.19586  [pdf, other]

    cs.CV cs.GR

    TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering

    Authors: Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu

    Abstract: Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status…

    Submitted 28 March, 2024; originally announced March 2024.

  13. arXiv:2403.15576  [pdf, other]

    cs.LG cs.CV

    Data-centric Prediction Explanation via Kernelized Stein Discrepancy

    Authors: Mahtab Sarvmaili, Hassan Sajjad, Ga Wu

    Abstract: Existing example-based prediction explanation methods often bridge test and training data points through the model's parameters or latent representations. While these methods offer clues to the causes of model predictions, they often exhibit innate shortcomings, such as incurring significant computational overhead or producing coarse-grained explanations. This paper presents a Highly-precise and D…

    Submitted 22 March, 2024; originally announced March 2024.

  14. arXiv:2403.06452  [pdf, other]

    cs.CV

    Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

    Authors: Guangyang Wu, Xiaohong Liu, Jun Jia, Xuehao Cui, Guangtao Zhai

    Abstract: In the digital era, QR codes serve as a linchpin connecting virtual and physical realms. Their pervasive integration across various applications highlights the demand for aesthetically pleasing codes without compromised scannability. However, prevailing methods grapple with the intrinsic challenge of balancing customization and scannability. Notably, stable-diffusion models have ushered in an epoc…

    Submitted 12 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  15. arXiv:2403.05304  [pdf, other]

    cs.RO

    Spatiotemporal Predictive Pre-training for Robotic Motor Control

    Authors: Jiange Yang, Bei Liu, Jianlong Fu, Bocheng Pan, Gangshan Wu, Limin Wang

    Abstract: Robotic motor control necessitates the ability to predict the dynamics of environments and interaction objects. However, advanced self-supervised pre-trained visual representations (PVRs) in robotic motor control, leveraging large-scale egocentric videos, often focus solely on learning the static content features of sampled image frames. This neglects the crucial temporal motion clues in human vid…

    Submitted 14 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 25 pages, 6 figures, 11 tables

  16. arXiv:2403.03425  [pdf, other]

    cs.LG physics.chem-ph q-bio.BM

    Sculpting Molecules in 3D: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

    Authors: Kaiwei Zhang, Yange Lin, Guangcheng Wu, Yuxiang Ren, Xuecang Zhang, Bo wang, Xiaoyu Zhang, Weitao Du

    Abstract: The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, the challenge of designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, achieving a…

    Submitted 5 March, 2024; originally announced March 2024.

  17. arXiv:2403.01501  [pdf, other]

    cs.LG cs.CR

    Applying Self-supervised Learning to Network Intrusion Detection for Network Flows with Graph Neural Network

    Authors: Renjie Xu, Guangwei Wu, Weiping Wang, Xing Gao, An He, Zhengpeng Zhang

    Abstract: Graph Neural Networks (GNNs) have garnered intensive attention for Network Intrusion Detection System (NIDS) due to their suitability for representing the network traffic flows. However, most present GNN-based methods for NIDS are supervised or semi-supervised. Network flows need to be manually annotated as supervisory labels, a process that is time-consuming or even impossible, making NIDS diffic…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 15 pages, 8 figures

  18. arXiv:2403.00170  [pdf, other]

    cs.SE cs.PL

    AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph

    Authors: Guanxuan Wu, Allison Sullivan

    Abstract: Writing declarative models has numerous benefits, ranging from automated reasoning and correction of design-level properties before systems are built to automated testing and debugging of their implementations after they are built. Unfortunately, the model itself needs to be correct to gain these benefits. Alloy is a commonly used modeling language that has several existing efforts to repair fault…

    Submitted 4 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: 12 pages

  19. ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality

    Authors: Guande Wu, Jing Qian, Sonia Castelo, Shaoyu Chen, Joao Rulff, Claudio Silva

    Abstract: Text presented in augmented reality provides in-situ, real-time information for users. However, this content can be challenging to apprehend quickly when engaging in cognitively demanding AR tasks, especially when it is presented on a head-mounted display. We propose ARTiST, an automatic text simplification system that uses a few-shot prompt and GPT-3 models to specifically optimize the text lengt…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Conditionally accepted by CHI '24

    ACM Class: H.1.2; I.2.7

  20. arXiv:2402.10834  [pdf, other]

    stat.AP cs.CY

    Agent-based Simulation Evaluation of CBD Tolling: A Case Study from New York City

    Authors: Qingnan Liang, Ruili Yao, Ruixuan Zhang, Zhibin Chen, Guoyuan Wu

    Abstract: Congestion tollings have been widely developed and adopted as an effective tool to mitigate urban traffic congestion and enhance transportation system sustainability. Nevertheless, these tolling schemes are often tailored on a city-by-city or even area-by-area basis, and the cost of conducting field experiments often makes the design and evaluation process challenging. In this work, we leverage MA…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted by 2024 IEEE Forum on Integrated and Sustainable Transportation Systems

  21. arXiv:2402.08085  [pdf, other]

    cs.LG cs.AI cs.CG

    Message Detouring: A Simple Yet Effective Cycle Representation for Expressive Graph Learning

    Authors: Ziquan Wei, Tingting Dan, Guorong Wu

    Abstract: Graph learning is crucial in the fields of bioinformatics, social networks, and chemicals. Although high-order graphlets, such as cycles, are critical to achieving an informative graph representation for node classification, edge prediction, and graph recognition, modeling high-order topological characteristics poses significant computational challenges, restricting its widespread applications in…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 16 pages, 5 figures

  22. arXiv:2402.02968  [pdf, other]

    cs.CV cs.LG

    Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

    Authors: Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, Ling Shao, Yi Yang, Bojun Gao, Qun Li, Guobin Wu

    Abstract: Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilitie…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 24 pages, 9 figures, 1 table

  23. arXiv:2401.16433  [pdf, other]

    cs.IR cs.LG

    Within-basket Recommendation via Neural Pattern Associator

    Authors: Kai Luo, Tianshu Shen, Lan Yao, Ga Wu, Aaron Liblong, Istvan Fehervari, Ruijian An, Jawad Ahmed, Harshit Mishra, Charu Pujari

    Abstract: Within-basket recommendation (WBR) refers to the task of recommending items to the end of completing a non-empty shopping basket during a shopping session. While the latest innovations in this space demonstrate remarkable performance improvement on benchmark datasets, they often overlook the complexity of user behaviors in practice, such as 1) co-existence of multiple shopping intentions, 2) multi…

    Submitted 14 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 13 pages, 9 figures

  24. arXiv:2401.14729  [pdf, other]

    cs.CV

    Sketch and Refine: Towards Fast and Accurate Lane Detection

    Authors: Chao Chen, Jie Liu, Chang Zhou, Jie Tang, Gangshan Wu

    Abstract: Lane detection is to determine the precise location and shape of lanes on the road. Despite efforts made by current methods, it remains a challenging task due to the complexity of real-world scenarios. Existing approaches, whether proposal-based or keypoint-based, suffer from depicting lanes effectively and efficiently. Proposal-based methods detect lanes by distinguishing and regressing a collect…

    Submitted 26 January, 2024; originally announced January 2024.

  25. arXiv:2401.11840  [pdf, other]

    cs.LG cs.AI

    Learning to Approximate Adaptive Kernel Convolution on Graphs

    Authors: Jaeyoon Sim, Sooyeon Jeon, InJun Choi, Guorong Wu, Won Hwa Kim

    Abstract: Various Graph Neural Networks (GNNs) have been successful in analyzing data in non-Euclidean spaces, however, they have limitations such as oversmoothing, i.e., information becomes excessively averaged as the number of hidden layers increases. The issue stems from the intrinsic formulation of conventional graph convolution where the nodal features are aggregated from a direct neighborhood per laye…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 15 pages, Accepted to AAAI 2024

  26. arXiv:2401.10636  [pdf, other]

    cs.SE

    Catch the Butterfly: Peeking into the Terms and Conflicts among SPDX Licenses

    Authors: Tao Liu, Chengwei Liu, Tianwei Liu, He Wang, Gaofei Wu, Yang Liu, Yuqing Zhang

    Abstract: The widespread adoption of third-party libraries (TPLs) in software development has accelerated the creation of modern software. However, this convenience comes with potential legal risks. Developers may inadvertently violate the licenses of TPLs, leading to legal issues. While existing studies have explored software licenses and potential incompatibilities, these studies often focus on a limited…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 10 pages, 6 figures, accepted by SANER2024

  27. arXiv:2401.06052  [pdf, other]

    cs.CV cs.GR

    Fast High Dynamic Range Radiance Fields for Dynamic Scenes

    Authors: Guanjun Wu, Taoran Yi, Jiemin Fang, Wenyu Liu, Xinggang Wang

    Abstract: Neural Radiance Fields (NeRF) and their extensions have shown great success in representing 3D scenes and synthesizing novel-view images. However, most NeRF methods take in low-dynamic-range (LDR) images, which may lose details, especially with nonuniform illumination. Some previous NeRF methods attempt to introduce high-dynamic-range (HDR) techniques but mainly target static scenes. To extend HD…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 3DV 2024. Project page: https://guanjunwu.github.io/HDR-HexPlane

  28. arXiv:2401.05633  [pdf, other]

    cs.CV eess.IV

    Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach

    Authors: Gang Wu, Junjun Jiang, Junpeng Jiang, Xianming Liu

    Abstract: Recent progress in single-image super-resolution (SISR) has achieved remarkable performance, yet the computational costs of these methods remain a challenge for deployment on resource-constrained devices. Especially for transformer-based methods, the self-attention mechanism in such models brings great breakthroughs while incurring substantial computational costs. To tackle this issue, we introduc…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: submitting to TIP

  29. arXiv:2312.16954  [pdf, other]

    cs.CR

    Blockchain-based Privacy-Preserving Public Key Searchable Encryption with Strong Traceability

    Authors: Yue Han, Jinguang Han, Weizhi Meng, Jianchang Lai, Ge Wu

    Abstract: Public key searchable encryption (PKSE) scheme allows data users to search over encrypted data. To identify illegal users, many traceable PKSE schemes have been proposed. However, existing schemes cannot trace the keywords which illegal users searched and protect users' privacy simultaneously. In some practical applications, tracing both illegal users' identities and the keywords which they search…

    Submitted 28 December, 2023; originally announced December 2023.

  30. arXiv:2312.06348  [pdf, other]

    cs.LG

    DiffAIL: Diffusion Adversarial Imitation Learning

    Authors: Bingzheng Wang, Guoqiang Wu, Teng Pang, Yan Zhang, Yilong Yin

    Abstract: Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks. The current popular approach is the Adversarial Imitation Learning (AIL) framework, which matches expert state-action occupancy measures to obtain a surrogate reward for forward reinforcement learning. However, the traditional discriminator is a simple binary classifier and doesn't learn…

    Submitted 11 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  31. arXiv:2312.02310  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding

    Authors: Yizhou Wang, Ruiyi Zhang, Haoliang Wang, Uttaran Bhattacharya, Yun Fu, Gang Wu

    Abstract: Recent advancements in language-model-based video understanding have been progressing at a remarkable pace, spurred by the introduction of Large Language Models (LLMs). However, the focus of prior research has been predominantly on devising a projection layer that maps video features to tokens, an approach that is both rudimentary and inefficient. In our study, we introduce a cutting-edge framewor…

    Submitted 4 December, 2023; originally announced December 2023.

  32. arXiv:2311.13254  [pdf, other]

    cs.CV cs.AI eess.IV

    DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

    Authors: Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, Dacheng Tao, Tianyou Chai

    Abstract: Video semantic segmentation is a pivotal aspect of video representation learning. However, significant domain shifts present a challenge in effectively learning invariant spatio-temporal features across the labeled source domain and unlabeled target domain for video semantic segmentation. To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, whic…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 18 pages, 9 figures

  33. arXiv:2311.12341  [pdf, other]

    cs.GT

    Game Theoretic Application to Intersection Management: A Literature Review

    Authors: Ziye Qin, Ang Ji, Zhanbo Sun, Guoyuan Wu, Peng Hao, Xishun Liao

    Abstract: The emergence of vehicle-to-everything (V2X) technology offers new insights into intersection management. This, however, has also presented new challenges, such as the need to understand and model the interactions of traffic participants, including their competition and cooperation behaviors. Game theory has been widely adopted to study rationally selfish or cooperative behaviors during interactio…

    Submitted 20 November, 2023; originally announced November 2023.

  34. arXiv:2311.11509  [pdf, other]

    cs.CL cs.LG

    Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

    Authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan

    Abstract: In recent years, Large Language Models (LLM) have emerged as pivotal tools in various applications. However, these models are susceptible to adversarial prompt attacks, where attackers can carefully curate input strings that mislead LLMs into generating incorrect or undesired outputs. Previous work has revealed that with relatively simple yet effective attacks based on discrete optimization, it is…

    Submitted 18 February, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

  35. arXiv:2311.03149  [pdf, other]

    cs.CV

    Asymmetric Masked Distillation for Pre-Training Small Foundation Models

    Authors: Zhiyu Zhao, Bingkun Huang, Sen Xing, Gangshan Wu, Yu Qiao, Limin Wang

    Abstract: Self-supervised foundation models have shown great potential in computer vision thanks to the pre-training paradigm of masked autoencoding. Scale is a primary factor influencing the performance of these foundation models. However, these large foundation models often result in high computational cost. This paper focuses on pre-training relatively small vision transformer models that could be effici…

    Submitted 1 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  36. arXiv:2310.15140  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models

    Authors: Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun

    Abstract: Safety alignment of Large Language Models (LLMs) can be compromised with manual jailbreak attacks and (automatic) adversarial attacks. Recent studies suggest that defending against these attacks is possible: adversarial attacks generate unlimited but unreadable gibberish prompts, detectable by perplexity-based filters; manual jailbreak attacks craft readable prompts, but their limited number due t…

    Submitted 14 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Version 2 updates: Added comparison of three more evaluation methods and their reliability check using human labeling. Added results for jailbreaking Llama2 (individual behavior) and included complexity and hyperparameter analysis. Revised objectives for prompt leaking. Other minor changes made

  37. arXiv:2310.12531  [pdf, other]

    cs.CL

    ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding

    Authors: Guojun Wu

    Abstract: Most multilingual vision-and-language (V&L) research aims to accomplish multilingual and multimodal capabilities within one model. However, the scarcity of multilingual captions for images has hindered the development. To overcome this obstacle, we propose ICU, Image Caption Understanding, which divides a V&L task into two stages: a V&L model performs image captioning in English, and a multilingua…

    Submitted 5 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 (Findings)

  38. arXiv:2310.08529  [pdf, other]

    cs.CV cs.GR

    GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

    Authors: Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang

    Abstract: In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can help generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine ge…

    Submitted 5 December, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Project page: https://taoranyi.com/gaussiandreamer/

  39. arXiv:2310.08528  [pdf, other]

    cs.CV cs.GR

    4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

    Authors: Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang

    Abstract: Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to guarantee. To achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency, we propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes rather than applying 3D-GS…

    Submitted 7 December, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Project page: https://guanjunwu.github.io/4dgs/

  40. arXiv:2310.07756  [pdf, other]

    cs.LG

    Self-supervised Representation Learning From Random Data Projectors

    Authors: Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs

    Abstract: Self-supervised representation learning (SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. While augmentation-based SSRL algorithms push the boundaries of performance in computer vision and natural language processing, they are often not directly applicable to other data modalities, and can conflict with applicati…

    Submitted 20 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper of ICLR 2024. https://openreview.net/pdf?id=EpYnZpDpsQ

  41. arXiv:2309.16715  [pdf, other]

    cs.CV cs.AI

    MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving

    Authors: Yibo Liu, Kelly Zhu, Guile Wu, Yuan Ren, Bingbing Liu, Yang Liu, Jinjun Shan

    Abstract: Reconstructing 3D vehicles from noisy and sparse partial point clouds is of great significance to autonomous driving. Most existing 3D reconstruction methods cannot be directly applied to this problem because they are elaborately designed to deal with dense inputs with trivial noise. In this work, we propose a novel framework, dubbed MV-DeepSDF, which estimates the optimal Signed Distance Function…

    Submitted 21 August, 2023; originally announced September 2023.

  42. arXiv:2309.16110  [pdf, other

    cs.CV

    Learning Effective NeRFs and SDFs Representations with 3D Generative Adversarial Networks for 3D Object Generation: Technical Report for ICCV 2023 OmniObject3D Challenge

    Authors: Zheyuan Yang, Yibo Liu, Guile Wu, Tongtong Cao, Yuan Ren, Yang Liu, Bingbing Liu

    Abstract: In this technical report, we present a solution for 3D object generation in the ICCV 2023 OmniObject3D Challenge. In recent years, 3D object generation has made great progress and achieved promising results, but it remains a challenging task due to the difficulty of generating complex, textured and high-fidelity results. To resolve this problem, we study learning effective NeRFs and SDFs representation… ▽ More

    Submitted 27 September, 2023; originally announced September 2023.

  43. arXiv:2309.08095  [pdf, other

    cs.RO

    RELAX: Reinforcement Learning Enabled 2D-LiDAR Autonomous System for Parsimonious UAVs

    Authors: Guanlin Wu, Zhuokai Zhao, Yutao He

    Abstract: Unmanned Aerial Vehicles (UAVs) have become increasingly prominent in recent years, finding applications in surveillance and package delivery, among many others. Despite considerable efforts in developing algorithms that enable UAVs to navigate through complex unknown environments autonomously, they often require expensive hardware and sensors, such as RGB-D cameras and 3D-LiDAR, leading to a persis… ▽ More

    Submitted 2 March, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  44. arXiv:2309.06023  [pdf, other

    cs.CV

    Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

    Abstract: Contrastive learning has emerged as a prevailing paradigm for high-level vision tasks, which, by introducing proper negative samples, has also been exploited for low-level vision tasks to achieve a compact optimization space to account for their ill-posed nature. However, existing methods rely on manually predefined and task-oriented negatives, which often exhibit pronounced task-specific biases… ▽ More

    Submitted 25 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Camera Ready Version. Accepted to The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  45. arXiv:2309.05115  [pdf, other

    eess.SY cs.HC

    Real-time Learning of Driving Gap Preference for Personalized Adaptive Cruise Control

    Authors: Zhouqiao Zhao, Xishun Liao, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Matthew J. Barth, Guoyuan Wu

    Abstract: Advanced Driver Assistance Systems (ADAS) are increasingly important in improving driving safety and comfort, with Adaptive Cruise Control (ACC) being one of the most widely used. However, pre-defined ACC settings may not always align with drivers' preferences and habits, leading to discomfort and potential safety issues. Personalized ACC (P-ACC) has been proposed to address this problem, but most… ▽ More

    Submitted 10 September, 2023; originally announced September 2023.

  46. arXiv:2309.01113  [pdf, other

    cs.CV

    Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion

    Authors: Guanyao Wu, Hongming Fu, Jinyuan Liu, Long Ma, Xin Fan, Risheng Liu

    Abstract: Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels. Despite its advancements, the field grapples with challenges, notably the reliance on manual designs for network structures and loss functions, and the constraints of utilizing simulated reference images as ground truths. Consequently, current… ▽ More

    Submitted 3 September, 2023; originally announced September 2023.

  47. arXiv:2308.13505  [pdf, other

    cs.CV

    Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation

    Authors: Jiaming Zhang, Yutao Cui, Gangshan Wu, Limin Wang

    Abstract: Current prevailing Video Object Segmentation (VOS) methods usually perform dense matching between the current and reference frames after extracting their features. On one hand, the decoupled modeling restricts the propagation of target information to the high-level feature space. On the other hand, the pixel-wise matching leads to a lack of holistic understanding of the targets. To overcome these i… ▽ More

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: 9 pages, 8 figures

  48. arXiv:2308.13133  [pdf, other

    cs.CV

    AccFlow: Backward Accumulation for Long-Range Optical Flow

    Authors: Guangyang Wu, Xiaohong Liu, Kunming Luo, Xi Liu, Qingqing Zheng, Shuaicheng Liu, Xinyang Jiang, Guangtao Zhai, Wenyi Wang

    Abstract: Recent deep learning-based optical flow estimators have exhibited impressive performance in generating local flows between consecutive frames. However, the estimation of long-range flows between distant frames, particularly under complex object deformation and large motion occlusion, remains a challenging task. One promising solution is to accumulate local flows explicitly or implicitly to obtain… ▽ More

    Submitted 24 August, 2023; originally announced August 2023.

  49. arXiv:2308.10307  [pdf, other

    cs.RO cs.AI

    UAV 3-D path planning based on MOEA/D with adaptive areal weight adjustment

    Authors: Yougang Xiao, Hao Yang, Huan Liu, Keyu Wu, Guohua Wu

    Abstract: Unmanned aerial vehicles (UAVs) are desirable platforms for time-efficient and cost-effective task execution. 3-D path planning is a key challenge for task decision-making. This paper proposes an improved multi-objective evolutionary algorithm based on decomposition (MOEA/D) with an adaptive areal weight adjustment (AAWA) strategy to make a tradeoff between the total flight path length and the ter… ▽ More

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 23 pages, 11 figures

  50. arXiv:2308.10061  [pdf, other

    cs.CV

    DPL: Decoupled Prompt Learning for Vision-Language Models

    Authors: Chen Xu, Yuhan Zhu, Guozhen Zhang, Haocheng Shen, Yixuan Liao, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks. However, current methods tend to overfit to seen categories, thereby limiting their generalization ability for unseen classes. In this paper, we propose a new method, Decoupled Prompt Learning (DPL), which reformulates the attention in prompt lea… ▽ More

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, 8 tables