Showing 1–50 of 1,793 results for author: Xu, Z

Searching in archive cs.
  1. arXiv:2405.05579  [pdf]

    cs.HC eess.SY

    Intelligent EC Rearview Mirror: Enhancing Driver Safety with Dynamic Glare Mitigation via Cloud Edge Collaboration

    Authors: Junyi Yang, Zefei Xu, Huayi Lai, Hongjian Chen, Sifan Kong, Yutong Wu, Huan Yang

    Abstract: Sudden glare from trailing vehicles significantly increases driving safety risks. Existing anti-glare technologies, such as electronic, manually adjusted, and electrochromic rearview mirrors, are expensive and lack effective adaptability in different lighting conditions. To address these issues, our research introduces an intelligent rearview mirror system utilizing novel all-liquid electrochromic…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05409  [pdf, other]

    cs.LG

    Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

    Authors: Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu

    Abstract: Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks using anchor functions. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solu…

    Submitted 8 May, 2024; originally announced May 2024.

  3. Masked Graph Transformer for Large-Scale Recommendation

    Authors: Huiyuan Chen, Zhe Xu, Chin-Chia Michael Yeh, Vivian Lai, Yan Zheng, Minghua Xu, Hanghang Tong

    Abstract: Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, the quadratic space and time complexity hinders the scalability of Graph Transformers, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturi…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2405.03917  [pdf, other]

    cs.LG

    KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

    Authors: Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava

    Abstract: Efficient deployment of Large Language Models (LLMs) requires batching multiple requests together to improve throughput. As the batch size, context length, or model size increases, the size of the key and value (KV) cache can quickly become the main contributor to GPU memory usage and the bottleneck of inference latency. Quantization has emerged as an effective technique for KV cache compression,…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2405.03095  [pdf, other]

    cs.LG math-ph

    Loss Jump During Loss Switch in Solving PDEs with Neural Networks

    Authors: Zhiwei Wang, Lulu Zhang, Zhongwang Zhang, Zhi-Qin John Xu

    Abstract: Using neural networks to solve partial differential equations (PDEs) is gaining popularity as an alternative approach in the scientific computing community. Neural networks can integrate different types of information into the loss function. These include observation data, governing equations, and variational forms. These loss functions can be broadly categorized into two types: observation d…

    Submitted 5 May, 2024; originally announced May 2024.

  6. Easy over Hard: A Simple Baseline for Test Failures Causes Prediction

    Authors: Zhipeng Gao, Zhipeng Xue, Xing Hu, Weiyi Shang, Xin Xia

    Abstract: Test failure cause analysis is critical since it determines the subsequent way of handling different types of bugs, which is the prerequisite for getting the bugs properly analyzed and fixed. After a test case fails, software testers have to inspect the test execution logs line by line to identify its root cause. However, manual root cause determination is often tedious and time-consuming, which c…

    Submitted 5 May, 2024; originally announced May 2024.

  7. arXiv:2405.02544  [pdf, ps, other]

    cs.CR

    A Novel Endorsement Protocol to Secure BFT-Based Consensus in Permissionless Blockchain

    Authors: Ziqiang Xu, Ahmad Salehi Shahraki, Naveen Chilamkurti

    Abstract: Permissionless blockchain technology offers numerous potential benefits for decentralised applications, such as security, transparency, and openness. BFT-based consensus mechanisms are widely adopted in the permissioned blockchain to meet the high scalability requirements of the network. Sybil attacks are one of the main potential threats when applying BFT-based consensus mechanisms in permissionl…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted at IEEE Wireless Communications and Networking Conference (WCNC), 2024

  8. arXiv:2405.02341  [pdf, other]

    cs.CR cs.LG

    Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

    Authors: Wei-Ning Chen, Berivan Isik, Peter Kairouz, Albert No, Sewoong Oh, Zheng Xu

    Abstract: We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square…

    Submitted 1 May, 2024; originally announced May 2024.

  9. arXiv:2405.01615  [pdf, other]

    cs.NE cs.LG

    Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning

    Authors: Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu

    Abstract: Evolution Strategies (ES) have emerged as a competitive alternative for model-free reinforcement learning, showcasing exemplary performance in tasks like MuJoCo and Atari. Notably, they shine in scenarios with imperfect reward functions, making them invaluable for real-world applications where dense reward signals may be elusive. Yet, an inherent assumption in ES, that all input features are task-…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 16 pages, including proofs in the appendix

  10. arXiv:2405.01607  [pdf, other]

    cs.LG cs.CV

    Wildfire Risk Prediction: A Review

    Authors: Zhengsen Xu, Jonathan Li, Linlin Xu

    Abstract: Wildfires have significant impacts on global vegetation, wildlife, and humans. They destroy plant communities and wildlife habitats and contribute to increased emissions of carbon dioxide, nitrogen oxides, methane, and other pollutants. The prediction of wildfires relies on various independent variables combined with regression or machine learning methods. In this technical review, we describe the…

    Submitted 2 May, 2024; originally announced May 2024.

  11. arXiv:2405.01319  [pdf, other]

    cs.LG cs.CE

    Data Scoping: Effectively Learning the Evolution of Generic Transport PDEs

    Authors: Jiangce Chen, Wenzhuo Xu, Zeda Xu, Noelia Grande Gutiérrez, Sneha Prabha Narra, Christopher McComb

    Abstract: Transport phenomena (e.g., fluid flows) are governed by time-dependent partial differential equations (PDEs) describing mass, momentum, and energy conservation, and are ubiquitous in many engineering applications. However, deep learning architectures are fundamentally incompatible with the simulation of these PDEs. This paper clearly articulates and then solves this incompatibility. The local-depe…

    Submitted 2 May, 2024; originally announced May 2024.

  12. arXiv:2405.01041  [pdf, other]

    cs.LG

    Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation

    Authors: Tianyi Chen, Zhi-Qin John Xu

    Abstract: Neural networks have been extensively applied to a variety of tasks, achieving astounding results. Applying neural networks in the scientific field is an important research direction that is gaining increasing attention. In scientific applications, the scale of neural networks is generally moderate, mainly to ensure the speed of inference during application. Additionally, comparing neural net…

    Submitted 2 May, 2024; originally announced May 2024.

  13. arXiv:2405.00987  [pdf, other]

    cs.LG

    S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

    Authors: Safa Messaoud, Billel Mokeddem, Zhenghai Xue, Linsey Pang, Bo An, Haipeng Chen, Sanjay Chawla

    Abstract: Learning expressive stochastic policies instead of deterministic ones has been proposed to achieve better stability, sample complexity, and robustness. Notably, in Maximum Entropy Reinforcement Learning (MaxEnt RL), the policy is modeled as an expressive Energy-Based Model (EBM) over the Q-values. However, this formulation requires the estimation of the entropy of such EBMs, which is an open probl…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at ICLR 2024

  14. arXiv:2404.19702  [pdf, other]

    cs.CV

    GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

    Authors: Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu

    Abstract: We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on a single A100 GPU. Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian para…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project webpage: https://sai-bi.github.io/project/gs-lrm/

  15. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  16. arXiv:2404.19403  [pdf, other]

    cs.RO cs.AI

    Transformer-Enhanced Motion Planner: Attention-Guided Sampling for State-Specific Decision Making

    Authors: Lei Zhuang, Jingdong Zhao, Yuntao Li, Zichun Xu, Liangliang Zhao, Hong Liu

    Abstract: Sampling-based motion planning (SBMP) algorithms are renowned for their robust global search capabilities. However, the inherent randomness in their sampling mechanisms often results in inconsistent path quality and limited search efficiency. In response to these challenges, this work proposes a novel deep learning-based motion planning framework, named Transformer-Enhanced Motion Planner (TEMP), w…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  17. arXiv:2404.19097  [pdf, other]

    cs.HC

    Exploring the Capability of LLMs in Performing Low-Level Visual Analytic Tasks on SVG Data Visualizations

    Authors: Zhongzheng Xu, Emily Wall

    Abstract: Data visualizations help extract insights from datasets, but reaching these insights requires decomposing high-level goals into low-level analytic tasks that can be complex due to varying degrees of data literacy and visualization experience. Recent advancements in large language models (LLMs) have shown promise for lowering barriers for users to achieve tasks such as writing code and may likewise…

    Submitted 30 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  18. arXiv:2404.18886  [pdf, other]

    cs.LG cs.AI

    A Survey on Diffusion Models for Time Series and Spatio-Temporal Data

    Authors: Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, Jiang Bian, Shirui Pan, Qingsong Wen

    Abstract: The study of time series data is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Ongoing work; 27 pages, 8 figures, 2 tables; Github Repo: https://github.com/yyysjz1997/Awesome-TimeSeries-SpatioTemporal-Diffusion-Model

  19. arXiv:2404.18814  [pdf, ps, other]

    cs.CR

    Belt and Brace: When Federated Learning Meets Differential Privacy

    Authors: Xuebin Ren, Shusen Yang, Cong Zhao, Julie McCann, Zongben Xu

    Abstract: Federated learning (FL) has great potential for large-scale machine learning (ML) without exposing raw data. Differential privacy (DP) is the de facto standard of privacy protection with provable guarantees. Advances in ML suggest that DP would be a perfect fit for FL with comprehensive privacy preservation. Hence, extensive efforts have been devoted to achieving practically usable FL with DP, which…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, accepted by and to appear in Communications of the ACM (CACM)

  20. arXiv:2404.18533  [pdf, other]

    cs.AI cs.HC

    Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

    Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang

    Abstract: Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic a…

    Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  21. arXiv:2404.18337  [pdf, ps, other]

    cs.DS

    Additive Spanner Lower Bounds with Optimal Inner Graph Structure

    Authors: Greg Bodwin, Gary Hoppenworth, Virginia Vassilevska Williams, Nicole Wein, Zixuan Xu

    Abstract: We construct $n$-node graphs on which any $O(n)$-size spanner has additive error at least $+Ω(n^{3/17})$, improving on the previous best lower bound of $Ω(n^{1/7})$ [Bodwin-Hoppenworth FOCS '22]. Our construction completes the first two steps of a particular three-step research program, introduced in prior work and overviewed here, aimed at producing tight bounds for the problem by aligning aspect…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: ICALP 2024

  22. arXiv:2404.17809  [pdf, other]

    cs.CL cs.AI

    Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction

    Authors: Guozheng Li, Peng Wang, Wenjun Ke, Yikai Guo, Ke Ji, Ziyu Shang, Jiajun Liu, Zijie Xu

    Abstract: Relation extraction (RE) aims to identify relations between entities mentioned in texts. Although large language models (LLMs) have demonstrated impressive in-context learning (ICL) abilities in various tasks, they still suffer from poor performance compared to most supervised fine-tuned RE methods. Utilizing ICL for RE with LLMs encounters two challenges: (1) retrieving good demonstrations from…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: IJCAI 2024

  23. arXiv:2404.17807  [pdf, other]

    cs.CL cs.AI

    Meta In-Context Learning Makes Large Language Models Better Zero and Few-Shot Relation Extractors

    Authors: Guozheng Li, Peng Wang, Jiajun Liu, Yikai Guo, Ke Ji, Ziyu Shang, Zijie Xu

    Abstract: Relation extraction (RE) is an important task that aims to identify the relationships between entities in texts. While large language models (LLMs) have revealed remarkable in-context learning (ICL) capability for general zero and few-shot learning, recent studies indicate that current LLMs still struggle with zero and few-shot RE. Previous studies are mainly dedicated to designing prompt formats and…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: IJCAI 2024

  24. arXiv:2404.17802  [pdf, other]

    cs.CL cs.AI

    Empirical Analysis of Dialogue Relation Extraction with Large Language Models

    Authors: Guozheng Li, Zijie Xu, Ziyu Shang, Jiajun Liu, Ke Ji, Yikai Guo

    Abstract: Dialogue relation extraction (DRE) aims to extract relations between two arguments within a dialogue, which is more challenging than standard RE due to the higher person pronoun frequency and lower information density in dialogues. However, existing DRE methods still suffer from two serious issues: (1) hard to capture long and sparse multi-turn information, and (2) struggle to extract golden relat…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: IJCAI 2024

  25. arXiv:2404.17780  [pdf, other]

    cs.MA cs.AI

    Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning

    Authors: Dapeng Li, Hang Dong, Lu Wang, Bo Qiao, Si Qin, Qingwei Lin, Dongmei Zhang, Qi Zhang, Zhiwei Xu, Bin Zhang, Guoliang Fan

    Abstract: In recent years, multi-agent reinforcement learning algorithms have made significant advancements in diverse gaming environments, leading to increased interest in the broader application of such techniques. To address the prevalent challenge of partial observability, communication-based algorithms have improved cooperative performance through the sharing of numerical embeddings between agents. Howe…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures

  26. arXiv:2404.17723  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering

    Authors: Zhentao Xu, Mark Jerome Cruz, Matthew Guevara, Tie Wang, Manasi Deshpande, Xiaofeng Wang, Zheng Li

    Abstract: In customer service technical support, swiftly and accurately retrieving relevant past issues is critical for efficiently resolving customer inquiries. The conventional retrieval methods in retrieval-augmented generation (RAG) for large language models (LLMs) treat a large corpus of past issue tracking tickets as plain text, ignoring the crucial intra-issue structure and inter-issue relations, whi…

    Submitted 6 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    ACM Class: I.2

  27. arXiv:2404.17571  [pdf, other]

    cs.CV

    Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos

    Authors: Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao

    Abstract: Video try-on is a challenging task and has not been well tackled in previous works. The main obstacle lies in preserving the details of the clothing and modeling the coherent motions simultaneously. Faced with those difficulties, we address video try-on by proposing a diffusion-based framework named "Tunnel Try-on." The core idea is excavating a "focus tunnel" in the input video that gives close-u…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Project Page: https://mengtingchen.github.io/tunnel-try-on-page/

  28. arXiv:2404.17275  [pdf, other]

    cs.CV cs.LG

    Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation

    Authors: Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu

    Abstract: The practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with $α$-Power Maximization (ARPM), for PDA where the source domain contains private classes absent in the target domain. In ARPM, we propos…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: To appear in IJCV

  29. arXiv:2404.16824  [pdf, other]

    cs.CV

    V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

    Authors: Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

    Abstract: AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the line between reality and fiction, posing challenges in multimedia forensics. To solve this urgent issue, V2A-Mark is proposed to address the limitations of current video tampering forensics, such as poor generalizabili…

    Submitted 25 April, 2024; originally announced April 2024.

  30. arXiv:2404.16789  [pdf, other]

    cs.LG cs.AI cs.CL

    Continual Learning of Large Language Models: A Comprehensive Survey

    Authors: Haizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Hao Wang

    Abstract: The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 57 pages, 2 figures, 4 tables. Work in progress

  31. arXiv:2404.16349  [pdf, ps, other]

    cs.DS cs.CC

    More Asymmetry Yields Faster Matrix Multiplication

    Authors: Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, Renfei Zhou

    Abstract: We present a new improvement on the laser method for designing fast matrix multiplication algorithms. The new method further develops the recent advances by [Duan, Wu, Zhou FOCS 2023] and [Vassilevska Williams, Xu, Xu, Zhou SODA 2024]. Surprisingly, the new improvement is achieved by incorporating more asymmetry in the analysis, circumventing a fundamental tool of prior work that requires two of th…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 44 pages. arXiv admin note: text overlap with arXiv:2307.07970

  32. arXiv:2404.14963  [pdf, other]

    cs.CL cs.AI

    Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Reasoners

    Authors: Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, Dacheng Tao

    Abstract: The Chain of Thought prompting strategy has enhanced the performance of Large Language Models (LLMs) across various NLP tasks. However, it still has shortcomings when dealing with complex reasoning tasks, including understanding errors, calculation errors and process errors (e.g., missing-step and hallucinations). Subsequently, our in-depth analyses among various error types show that deeply understan…

    Submitted 28 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Work in progress

  33. arXiv:2404.13599  [pdf, other]

    cs.CL

    "A good pun is its own reword": Can Large Language Models Understand Puns?

    Authors: Zhijun Xu, Siyu Yuan, Lingjie Chen, Deqing Yang

    Abstract: Puns play a vital role in academic research due to their distinct structure and clear definition, which aid in the comprehensive analysis of linguistic humor. However, the understanding of puns in large language models (LLMs) has not been thoroughly examined, limiting their use in creative writing and humor creation. In this paper, we leverage three popular tasks, i.e., pun recognition, explanatio…

    Submitted 21 April, 2024; originally announced April 2024.

  34. arXiv:2404.13445  [pdf, other]

    cs.CV cs.GR

    DMesh: A Differentiable Representation for General Meshes

    Authors: Sanghyun Son, Matheus Gadelha, Yang Zhou, Zexiang Xu, Ming C. Lin, Yi Zhou

    Abstract: We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and formulate probability of faces to exist on our desired mesh in a differentiable manner based on…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 17 pages, 9 figures

  35. arXiv:2404.12861  [pdf, other]

    cs.CV

    Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation

    Authors: Yilong Chen, Zongyi Xu, Xiaoshui Huang, Ruicheng Zhang, Xinqi Jiang, Xinbo Gao

    Abstract: Current point cloud semantic segmentation has achieved great advances when given sufficient labels. However, the dense annotation of LiDAR point clouds remains prohibitively expensive and time-consuming, unable to keep up with the continuously growing volume of data. In this paper, we propose annotating images with scattered points, followed by utilizing SAM (a Foundation model) to generate semant…

    Submitted 19 April, 2024; originally announced April 2024.

  36. arXiv:2404.12524  [pdf, other]

    cs.CV cs.LG cs.RO

    DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects

    Authors: Dominik Bauer, Zhenjia Xu, Shuran Song

    Abstract: Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. The ability to accurately predict these topological changes that a specific action might incur is critical for planning interactions with elastoplastic objects. We present DoughNet, a Transformer-based architecture for handling these challenges, consisting of two components. First, a…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Under review. 17 pages, 14 figures

  37. arXiv:2404.12385  [pdf, other]

    cs.CV cs.GR

    MeshLRM: Large Reconstruction Model for High-Quality Mesh

    Authors: Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, Zexiang Xu

    Abstract: We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a p…

    Submitted 18 April, 2024; originally announced April 2024.

  38. arXiv:2404.12242  [pdf, other]

    cs.CL

    CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News

    Authors: Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang

    Abstract: Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE,…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, accepted to LREC-COLING 2024

  39. arXiv:2404.12038  [pdf, other]

    cs.CL

    Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector

    Authors: Zhihao Xu, Ruixuan Huang, Xiting Wang, Fangzhao Wu, Jing Yao, Xing Xie

    Abstract: Current open-source large language models (LLMs) often undergo careful safety alignment before public release. Some attack methods have also been proposed that help check for safety vulnerabilities in LLMs to ensure alignment robustness. However, many of these methods have moderate attack success rates. Even when successful, the harmfulness of their outputs cannot be guaranteed, leading to s…

    Submitted 18 April, 2024; originally announced April 2024.

  40. arXiv:2404.10838  [pdf, other]

    cs.CV cs.CL cs.MM

    Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning

    Authors: Zhengyang Liang, Meiyu Liang, Wei Huang, Yawen Li, Zhe Xue

    Abstract: In recent years, pre-trained multimodal large models have attracted widespread attention due to their outstanding performance in various multimodal applications. Nonetheless, the extensive computational resources and vast datasets required for their training present significant hurdles for deployment in environments with limited computational resources. To address this challenge, we propose a nove…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 10 pages

  41. arXiv:2404.10760  [pdf, other]

    cs.CV

    Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark

    Authors: Jiangning Zhang, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Zhucun Xue, Yong Liu, Guansong Pang, Dacheng Tao

    Abstract: Anomaly detection (AD) is often focused on detecting anomaly areas for industrial quality inspection and medical lesion examination. However, due to the specific scenario targets, the data scale for AD is relatively small, and evaluation metrics are still deficient compared to classic vision tasks, such as object detection and semantic segmentation. To fill these gaps, this work first constructs a…

    Submitted 16 April, 2024; originally announced April 2024.

  42. arXiv:2404.09872  [pdf, other]

    cs.CV

    Conditional Prototype Rectification Prompt Learning

    Authors: Haoxing Chen, Yaohui Li, Zizheng Huang, Yan Hong, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

    Abstract: Pre-trained large-scale vision-language models (VLMs) have acquired a profound understanding of general visual concepts. Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning VLMs within the scenario of limited data, introducing only a few parameters to harness task-specific insights from VLMs. Despite significant progress, current leading ETL methods…

    Submitted 15 April, 2024; originally announced April 2024.

  43. arXiv:2404.09494  [pdf, ps, other]

    cs.LG

    On the Necessity of Collaboration in Online Model Selection with Decentralized Data

    Authors: Junfan Li, Zenglin Xu, Zheshun Wu, Irwin King

    Abstract: We consider online model selection with decentralized data over $M$ clients, and study a fundamental problem: the necessity of collaboration. Previous work gave a negative answer from the perspective of worst-case regret minimization, while we give a different answer from the perspective of regret-computational cost trade-off. We separately propose a federated algorithm with and without communicat…

    Submitted 15 April, 2024; originally announced April 2024.

  44. arXiv:2404.09336  [pdf, other]

    cs.CL cs.AI

    Self-Selected Attention Span for Accelerating Large Language Model Inference

    Authors: Tian Jin, Wanzin Yazar, Zifei Xu, Sayeh Sharify, Xin Wang

    Abstract: Large language models (LLMs) can solve challenging tasks. However, their inference computation on modern GPUs is highly inefficient due to the increasing number of tokens they must attend to as they generate new ones. To address this inefficiency, we capitalize on LLMs' problem-solving capabilities to optimize their own inference-time efficiency. We demonstrate with two specific tasks: (a) evaluat…

    Submitted 14 April, 2024; originally announced April 2024.

  45. arXiv:2404.07515  [pdf, ps, other]

    cs.IT math.FA math.NA

    Stability in Phase Retrieval: Characterizing Condition Numbers and the Optimal Vector Set

    Authors: Yu Xia, Zhiqiang Xu, Zili Xu

    Abstract: In this paper, we primarily focus on analyzing the stability property of phase retrieval by examining the bi-Lipschitz property of the map $Φ_{\boldsymbol{A}}(\boldsymbol{x})=|\boldsymbol{A}\boldsymbol{x}|\in \mathbb{R}_+^m$, where $\boldsymbol{x}\in \mathbb{H}^d$ and $\boldsymbol{A}\in \mathbb{H}^{m\times d}$ is the measurement matrix for $\mathbb{H}\in\{\mathbb{R},\mathbb{C}\}$. We define the co…

    Submitted 16 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  46. arXiv:2404.07399  [pdf, other]

    cs.CV

    Post-hurricane building damage assessment using street-view imagery and structured data: A multi-modal deep learning approach

    Authors: Zhuoqun Xue, Xiaojian Zhang, David O. Prevatt, Jennifer Bridge, Susu Xu, Xilei Zhao

    Abstract: Accurately assessing building damage is critical for disaster response and recovery. However, many existing models for detecting building damage have poor prediction accuracy due to their limited capabilities of identifying detailed, comprehensive structural and/or non-structural damage from the street-view image. Additionally, these models mainly rely on the imagery data for damage classification…

    Submitted 10 April, 2024; originally announced April 2024.

  47. arXiv:2404.06892  [pdf, other]

    cs.CV

    SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

    Authors: Diankun Zhang, Guoan Wang, Runwen Zhu, Jianbo Zhao, Xiwu Chen, Siyu Zhang, Jiahao Gong, Qibin Zhou, Wenyuan Zhang, Ningzi Wang, Feiyang Tan, Hangning Zhou, Ziyao Xu, Haotian Yao, Chi Zhang, Xiaojun Liu, Xiaoguang Di, Bin Li

    Abstract: End-to-End paradigms use a unified framework to implement multi-tasks in an autonomous driving system. Despite simplicity and clarity, the performance of end-to-end autonomous driving methods on sub-tasks is still far behind the single-task methods. Meanwhile, the widely used dense BEV features in previous end-to-end methods make it costly to extend to more modalities or tasks. In this paper, we p…

    Submitted 10 April, 2024; originally announced April 2024.

  48. arXiv:2404.06227  [pdf]

    cs.HC

    Multimodal Road Network Generation Based on Large Language Model

    Authors: Jiajing Chen, Weihang Xu, Haiming Cao, Zihuan Xu, Yu Zhang, Zhao Zhang, Siyao Zhang

    Abstract: With the increasing popularity of ChatGPT, large language models (LLMs) have demonstrated their capabilities in communication and reasoning, promising for transportation sector intelligentization. However, they still face challenges in domain-specific knowledge. This paper aims to leverage LLMs' reasoning and recognition abilities to replace traditional user interfaces and create an "intelligent o…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 17 pages, 8 figures

  49. arXiv:2404.05384  [pdf, other]

    cs.CV cs.AI

    Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

    Authors: Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu

    Abstract: Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space. However, we argue that a global CFG scale results in spatial inconsistency on varying semantic strengths and suboptimal image quality. To address this problem, we present a novel approach, Semantic-aware Classifi…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: accepted by CVPR-2024

  50. arXiv:2404.05136  [pdf, other]

    cs.CV cs.AI

    Self-Supervised Multi-Object Tracking with Path Consistency

    Authors: Zijia Lu, Bing Shuai, Yanbei Chen, Zhenlin Xu, Davide Modolo

    Abstract: In this paper, we propose a novel concept of path consistency to learn robust object matching without using manual object identity supervision. Our key idea is that, to track an object through frames, we can obtain multiple different association results from a model by varying the frames it can observe, i.e., skipping frames in observation. As the differences in observations do not alter the identi…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024