Skip to main content

Showing 1–50 of 2,221 results for author: Li, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.20295  [pdf, other

    quant-ph cs.CR

    How (not) to Build Quantum PKE in Minicrypt

    Authors: Longcheng Li, Qian Li, Xingjian Li, Qipeng Liu

    Abstract: The seminal work by Impagliazzo and Rudich (STOC'89) demonstrated the impossibility of constructing classical public key encryption (PKE) from one-way functions (OWF) in a black-box manner. However, the question remains: can quantum PKE (QPKE) be constructed from quantumly secure OWF? A recent line of work has shown that it is indeed possible to build QPKE from OWF, but with one caveat -- they rel… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2405.20031  [pdf, other

    cs.RO cs.CV

    Structure Gaussian SLAM with Manhattan World Hypothesis

    Authors: Shuhong Liu, Heng Zhou, Liuzhuozheng Li, Yun Liu, Tianchen Deng, Yiming Zhou, Mingrui Li

    Abstract: Gaussian SLAM systems have made significant advancements in improving the efficiency and fidelity of real-time reconstructions. However, these systems often encounter incomplete reconstructions in complex indoor environments, characterized by substantial holes due to unobserved geometry caused by obstacles or limited view angles. To address this challenge, we present Manhattan Gaussian SLAM (MG-SL… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2405.19779  [pdf, other

    cs.NE cs.GR cs.LG

    Automatic Graph Topology-Aware Transformer

    Authors: Chao Wang, Jiaxuan Zhao, Lingling Li, Licheng Jiao, Fang Liu, Shuyuan Yang

    Abstract: Existing efforts are dedicated to designing many topologies and graph-aware strategies for the graph Transformer, which greatly improve the model's representation capabilities. However, manually determining the suitable Transformer architecture for a specific graph dataset or task requires extensive expert knowledge and laborious trials. This paper proposes an evolutionary graph Transformer archit… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE (Under Second Review). Copyright may be transferred without notice, after which this version may no longer be accessible

  4. arXiv:2405.19315  [pdf, other

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  5. arXiv:2405.19295  [pdf, other

    cs.CV

    3D Neural Edge Reconstruction

    Authors: Lei Li, Songyou Peng, Zehao Yu, Shaohui Liu, Rémi Pautrat, Xiaochuan Yin, Marc Pollefeys

    Abstract: Real-world objects and environments are predominantly composed of edge features, including straight lines and curves. Such edges are crucial elements for various applications, such as CAD modeling, surface meshing, lane mapping, etc. However, existing traditional methods only prioritize lines over curves for simplicity in geometric modeling. To this end, we introduce EMAP, a new method for learnin… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project page: https://neural-edge-map.github.io

  6. arXiv:2405.18860  [pdf, other

    cs.RO

    Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks

    Authors: Tianle Zhang, Dongjiang Li, Yihang Li, Zecui Zeng, Lin Zhao, Lei Sun, Yue Chen, Xuelong Wei, Yibing Zhan, Lusong Li, Xiaodong He

    Abstract: As Embodied AI advances, it increasingly enables robots to handle the complexity of household manipulation tasks more effectively. However, the application of robots in these settings remains limited due to the scarcity of bimanual-mobile robot manipulation datasets. Existing datasets either focus solely on simple grasping tasks using single-arm robots without mobility, or collect sensor data limi… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.18729  [pdf, other

    cs.LG cs.AI

    Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

    Authors: Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He

    Abstract: Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach opt… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  8. arXiv:2405.17976  [pdf

    cs.AI cs.CL

    Yuan 2.0-M32: Mixture of Experts with Attention Router

    Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

    Abstract: Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the… ▽ More

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages,3 figures, 7 tables

  9. arXiv:2405.17386  [pdf, other

    cs.CL cs.AI

    MindMerger: Efficient Boosting LLM Reasoning in non-English Languages

    Authors: Zixian Huang, Wenhao Zhu, Gong Cheng, Lei Li, Fei Yuan

    Abstract: Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs such as English translation text to circumvent the challenge of LLM understand… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  10. arXiv:2405.17061  [pdf, ps, other

    cs.LG

    Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation

    Authors: Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

    Abstract: We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space. Despite its benefits, introducing non-linear function approximation raises significant challenges in both computational and statistical efficiency. The best-known method of Hwang and Oh [2023] has achieved an… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.16854  [pdf, other

    cs.MA

    Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

    Authors: Zhihao Liu, Xianliang Yang, Zichuan Liu, Yifan Xia, Wei Jiang, Yuanyu Zhang, Lijuan Li, Guoliang Fan, Lei Song, Bian Jiang

    Abstract: Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn to adopt cooperative or competitive strategies within complex environments. However, the linear increase in the number of agents leads to a combinatorial explosion of the action space, which may result in algorithmic instability, difficulty in convergence, or entrapment in local optima. While research… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  12. arXiv:2405.16765  [pdf, ps, other

    cs.LG eess.SP

    Study of Robust Direction Finding Based on Joint Sparse Representation

    Authors: Y. Li, W. Xiao, L. Zhao, Z. Huang, Q. Li, L. Li, R. C. de Lamare

    Abstract: Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences,… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  13. arXiv:2405.16449  [pdf, other

    cs.LG math.OC q-fin.MF

    Reinforcement Learning for Jump-Diffusions

    Authors: Xuefeng Gao, Lingfei Li, Xun Yu Zhou

    Abstract: We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the explora… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  14. arXiv:2405.16380  [pdf, other

    cs.LG quant-ph

    Dynamic Inhomogeneous Quantum Resource Scheduling with Reinforcement Learning

    Authors: Linsen Li, Pratyush Anand, Kaiming He, Dirk Englund

    Abstract: A central challenge in quantum information science and technology is achieving real-time estimation and feedforward control of quantum systems. This challenge is compounded by the inherent inhomogeneity of quantum resources, such as qubit properties and controls, and their intrinsically probabilistic nature. This leads to stochastic challenges in error detection and probabilistic outcomes in proce… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  15. arXiv:2405.16047  [pdf, ps, other

    cs.IT eess.SP

    Unified Timing Analysis for Closed-Loop Goal-Oriented Wireless Communication

    Authors: Lintao Li, Anders E. Kalør, Petar Popovski, Wei Chen

    Abstract: Goal-oriented communication has become one of the focal concepts in sixth-generation communication systems owing to its potential to provide intelligent, immersive, and real-time mobile services. The emerging paradigms of goal-oriented communication constitute closed loops integrating communication, computation, and sensing. However, challenges arise for closed-loop timing analysis due to multiple… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: submitted to IEEE Trans. Wireless Commun

  16. arXiv:2405.15414  [pdf, other

    cs.AI

    Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

    Authors: Yuxuan Guo, Shaohui Peng, Jiaming Guo, Di Huang, Xishan Zhang, Rui Zhang, Yifan Hao, Ling Li, Zikang Tian, Mingju Gao, Yutai Li, Yiming Gan, Shuai Liang, Zihao Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: Building open agents has always been the ultimate goal in AI research, and creative agents are the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., `mine diamonds' in Minecraft). However, they encounter difficulties on creative tasks with open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  17. arXiv:2405.15124  [pdf, other

    cs.LG cs.AI

    Scaling Law for Time Series Forecasting

    Authors: Jingzhe Shi, Qinwei Ma, Huan Ma, Lei Li

    Abstract: Scaling law that rewards large datasets, complex models and enhanced data granularity has been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and longer input… ▽ More

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 20 pages

  18. arXiv:2405.14870  [pdf, other

    cs.CV cs.RO

    An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

    Authors: Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

    Abstract: In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient tra… ▽ More

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 4 figures, 7 tables; Code at https://github.com/open-mmlab/mmdetection3d

  19. arXiv:2405.14516  [pdf, other

    cs.CV cs.AI

    Towards Realistic Long-tailed Semi-supervised Learning in an Open World

    Authors: Yuanpeng He, Lijian Li

    Abstract: Open-world long-tailed semi-supervised learning (OLSSL) has increasingly attracted attention. However, existing OLSSL algorithms generally assume that the distributions between known and novel categories are nearly identical. Against this backdrop, we construct a more \emph{Realistic Open-world Long-tailed Semi-supervised Learning} (\textbf{ROLSSL}) setting where there is no premise on the distrib… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  20. arXiv:2405.14075  [pdf, other

    cs.CL cs.AI cs.LG

    $T^2$ of Thoughts: Temperature Tree Elicits Reasoning in Large Language Models

    Authors: Chengkun Cai, Xu Zhao, Yucheng Du, Haoliang Liu, Lei Li

    Abstract: Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, especially in complex decision-making scenarios, but their static problem-solving strategies often limit their adaptability to dynamic environments. We explore the enhancement of reasoning capabilities in LLMs through Temperature Tree ($T^2$) prompting via Particle Swarm Optimization, termed as $T^2$ of Thought… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures

  21. arXiv:2405.12493  [pdf, other

    cs.LG

    Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks

    Authors: Xin-Chun Li, Lan Li, De-Chuan Zhan

    Abstract: The loss landscape of deep neural networks (DNNs) is commonly considered complex and wildly fluctuated. However, an interesting observation is that the loss surfaces plotted along Gaussian noise directions are almost v-basin ones with the perturbed model lying on the basin. This motivates us to rethink whether the 1D or 2D subspace could cover more complex local geometry structures, and how to min… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  22. arXiv:2405.12489  [pdf, other

    cs.LG cs.AI

    Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

    Authors: Xin-Chun Li, Jin-Lin Tang, Bo Zhang, Lan Li, De-Chuan Zhan

    Abstract: Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work suggests an additional asymmetry of the valley beyond the flat and sharp ones, yet without thoroughly examining its causes or implications. Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initial… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  23. arXiv:2405.12398  [pdf, other

    cs.LG

    ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference

    Authors: Jason Chun Lok Li, Steven Tin Sui Luo, Le Xu, Ngai Wong

    Abstract: Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals (such as images and videos) with the benefits of a compact neural representation. While numerous methods have been proposed to increase the encoding capabilities of an INR, an often overlooked aspect is the inference efficiency, usually measured in multiply-accumulate (MAC) count. This… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: ICLR 2024 (v3: 21 pages, 11 figures, Project Page: https://github.com/stevolopolis/asmr.git)

  24. arXiv:2405.12001  [pdf, other

    cs.LG cs.AI

    Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning

    Authors: Hai Zhang, Boyuan Zheng, Anqi Guo, Tianying Ji, Pheng-Ann Heng, Junqiao Zhao, Lanqing Li

    Abstract: Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that maximizing the mutual information between the task and the task representation ($I(Z;M)$) can lead to performance impro… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  25. arXiv:2405.11930  [pdf, other

    cs.LG

    Data Contamination Calibration for Black-box LLMs

    Authors: Wentao Ye, Jiaqi Hu, Liyao Li, Haobo Wang, Gang Chen, Junbo Zhao

    Abstract: The rapid advancements of Large Language Models (LLMs) tightly associate with the expansion of the training data size. However, the unchecked ultra-large-scale training sets introduce a series of potential risks like data contamination, i.e. the benchmark data is used for training. In this work, we propose a holistic method named Polarized Augment Calibration (PAC) along with a new to-be-released… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  26. arXiv:2405.10531  [pdf, other

    cs.LG cs.CV

    Nonparametric Teaching of Implicit Neural Representations

    Authors: Chen Zhang, Steven Tin Sui Luo, Jason Chun Lok Li, Yik-Chung Wu, Ngai Wong

    Abstract: We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of I… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: ICML 2024 (24 pages, 13 figures)

  27. arXiv:2405.09713  [pdf, other

    cs.CV cs.AI cs.CL

    SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

    Authors: Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

    Abstract: Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically withi… ▽ More

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: CVPR

  28. arXiv:2405.09342  [pdf, other

    cs.CV

    Progressive Depth Decoupling and Modulating for Flexible Depth Completion

    Authors: Zhiwen Yang, Jiehua Zhang, Liang Li, Chenggang Yan, Yaoqi Sun, Haibing Yin

    Abstract: Image-guided depth completion aims at generating a dense depth map from sparse LiDAR data and RGB image. Recent methods have shown promising performance by reformulating it as a classification problem with two sub-tasks: depth discretization and probability prediction. They divide the depth range into several discrete depth values as depth categories, serving as priors for scene depth distribution… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: The article is accepted by IEEE Transactions on Instrumentation & Measurement

  29. arXiv:2405.08205  [pdf, other

    cs.LG

    Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

    Authors: Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li

    Abstract: Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and their three-dimensional (3D) coordinates based on functionally important… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  30. arXiv:2405.06693  [pdf, other

    q-bio.BM cs.LG

    SurfPro: Functional Protein Design Based on Continuous Surface

    Authors: Zhenqiao Song, Tinglin Huang, Lei Li, Wengong Jin

    Abstract: How can we design proteins with desired functions? We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  31. arXiv:2405.06312  [pdf, other

    cs.LG cs.DC

    FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization

    Authors: Zhiyuan Ning, Chunlin Tian, Meng Xiao, Wei Fan, Pengyang Wang, Li Li, Pengfei Wang, Yuanchun Zhou

    Abstract: Federated Learning faces significant challenges in statistical and system heterogeneity, along with high energy consumption, necessitating efficient client selection strategies. Traditional approaches, including heuristic and learning-based methods, fall short of addressing these complexities holistically. In response, we propose FedGCS, a novel generative client selection framework that innovativ… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI-2024

  32. arXiv:2405.06241  [pdf, other

    cs.CV cs.RO

    MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization

    Authors: Pengcheng Zhu, Yaoming Zhuang, Baoquan Chen, Li Li, Chengdong Wu, Zhanlin Liu

    Abstract: This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Recently Gaussian Splatting-based SLAM has yielded promising results, but rely on RGB-D input and is weak in tracking. To address these limitations, we uniquely integrates advanced sparse visual odometry with a dense Gaussian Splatting scene representation for the fi… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  33. arXiv:2405.05004  [pdf, other

    cs.CV

    TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking

    Authors: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler

    Abstract: There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera that is particularly informative about the scene motion. However, existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models, which have been optimised for RGB only tracking, without adapting it for the intri… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  34. arXiv:2405.04496  [pdf, other

    cs.CV

    Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

    Authors: Yi Zuo, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Shuyuan Yang, Yuwei Guo

    Abstract: Existing diffusion-based video editing methods have achieved impressive results in motion editing. Most of the existing methods focus on the motion alignment between the edited video and the reference video. However, these methods do not constrain the background and object content of the video to remain unchanged, which makes it possible for users to generate unexpected videos. In this paper, we p… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  35. arXiv:2405.04167  [pdf, other

    cs.CV eess.IV

    Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment

    Authors: Aobo Li, Jinjian Wu, Yongxu Liu, Leida Li

    Abstract: The annotation of blind image quality assessment (BIQA) is labor-intensive and time-consuming, especially for authentic images. Training on synthetic data is expected to be beneficial, but synthetically trained models often suffer from poor generalization in real domains due to domain gaps. In this work, we make a key observation that introducing more distortion types in the synthetic dataset may… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  36. arXiv:2405.04160  [pdf, other

    cs.CL

    A Causal Explainable Guardrails for Large Language Models

    Authors: Zhixuan Chu, Yan Wang, Longfei Li, Zhibo Wang, Zhan Qin, Kui Ren

    Abstract: Large Language Models (LLMs) have shown impressive performance in natural language tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for steering LLMs towards desired attributes often assume unbiased representations and rely solely on steering prompts. However, the representations learned from pre-training can introduce semantic biases that influence the steer… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 23 pages

  37. arXiv:2405.04122  [pdf, other

    cs.LG cs.DC

    Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning

    Authors: Chunlin Tian, Zhan Shi, Xinpeng Qin, Li Li, Chengzhong Xu

    Abstract: Federated Learning (FL) enables multiple devices to collaboratively train a shared model while ensuring data privacy. The selection of participating devices in each training round critically affects both the model performance and training efficiency, especially given the vast heterogeneity in training capabilities and data distribution across devices. To address these challenges, we introduce a no… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  38. arXiv:2405.04044  [pdf, other

    cs.CV

    DMOFC: Discrimination Metric-Optimized Feature Compression

    Authors: Changsheng Gao, Yiheng Jiang, Li Li, Dong Liu, Feng Wu

    Abstract: Feature compression, as an important branch of video coding for machines (VCM), has attracted significant attention and exploration. However, the existing methods mainly focus on intra-feature similarity, such as the Mean Squared Error (MSE) between the reconstructed and original features, while neglecting the importance of inter-feature relationships. In this paper, we analyze the inter-feature r… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  39. arXiv:2405.03644  [pdf, other

    cs.CR cs.AI

    When LLMs Meet Cybersecurity: A Systematic Literature Review

    Authors: Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu

    Abstract: The rapid advancements in large language models (LLMs) have opened new avenues across various fields, including cybersecurity, which faces an ever-evolving threat landscape and need for innovative technologies. Despite initial explorations into the application of LLMs in cybersecurity, there is a lack of a comprehensive overview of this research area. This paper bridge this gap by providing a syst… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 36 pages, 7 figures

  40. arXiv:2405.03066  [pdf

    cs.ET

    A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)

    Authors: Lingyao Li, Jiayan Zhou, Zhenxiang Gao, Wenyue Hua, Lizhou Fan, Huizi Yu, Loni Hagen, Yongfeng Zhang, Themistocles L. Assimes, Libby Hemphill, Siyuan Ma

    Abstract: Electronic Health Records (EHRs) play an important role in the healthcare system. However, their complexity and vast volume pose significant challenges to data interpretation and analysis. Recent advancements in Artificial Intelligence (AI), particularly the development of Large Language Models (LLMs), open up new opportunities for researchers in this domain. Although prior studies have demonstrat… ▽ More

    Submitted 22 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  41. arXiv:2405.02287  [pdf, other

    cs.CL cs.AI cs.CV

    Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

    Authors: Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay

    Abstract: We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing a… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

  42. arXiv:2405.02132  [pdf, other

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu… ▽ More

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  43. arXiv:2405.01055  [pdf

    cs.LG cs.AI cs.ET

    Leverage Multi-source Traffic Demand Data Fusion with Transformer Model for Urban Parking Prediction

    Authors: Yin Huang, Yongqi Dong, Youhua Tang, Li Li

    Abstract: The escalation in urban private car ownership has worsened the urban parking predicament, necessitating effective parking availability prediction for urban planning and management. However, the existing prediction methods suffer from low prediction accuracy with the lack of spatial-temporal correlation features related to parking volume, and neglect of flow patterns and correlations between simila… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 7 pages, 5 figures, under review by the 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)

  44. arXiv:2405.01044  [pdf, other

    cs.RO

    Differentiable Particles for General-Purpose Deformable Object Manipulation

    Authors: Siwei Chen, Yiqing Xu, Cunjun Yu, Linfeng Li, David Hsu

    Abstract: Deformable object manipulation is a long-standing challenge in robotics. While existing approaches often focus narrowly on a specific type of object, we seek a general-purpose algorithm, capable of manipulating many different types of objects: beans, rope, cloth, liquid, . . . . One key difficulty is a suitable representation, rich enough to capture object shape, dynamics for manipulation and yet… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  45. arXiv:2405.01030  [pdf

    cs.CR cs.SE

    Towards Trust Proof for Secure Confidential Virtual Machines

    Authors: Jingkai Mao, Haoran Zhu, Junchao Fan, Lin Li, Xiaolin Chang

    Abstract: The Virtual Machine (VM)-based Trusted-Execution-Environment (TEE) technology, like AMD Secure-Encrypted-Virtualization (SEV), enables the establishment of Confidential VMs (CVMs) to protect data privacy. But CVM lacks ways to provide the trust proof of its running state, degrading the user confidence of using CVM. The technology of virtual Trusted Platform Module (vTPM) can be used to generate tr… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  46. arXiv:2405.00885  [pdf, other

    cs.LG cs.NI eess.IV

    WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

    Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Xin Fu, Miao Pan

    Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training b… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  47. arXiv:2405.00700  [pdf

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr… ▽ More

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages,4 figures

  48. arXiv:2405.00696  [pdf, other

    cs.RO

    Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process

    Authors: Jingwei Ge, Pengbo Wang, Cheng Chang, Yi Zhang, Danya Yao, Li Li

    Abstract: Sampling critical testing scenarios is an essential step in intelligence testing for Automated Vehicles (AVs). However, due to the lack of prior knowledge on the distribution of critical scenarios in sampling space, we can hardly efficiently find the critical scenarios or accurately evaluate the intelligence of AVs. To solve this problem, we formulate the testing as a continuous optimization proce… ▽ More

    Submitted 28 March, 2024; originally announced May 2024.

  49. Scenarios Engineering driven Autonomous Transportation in Open-Pit Mines

    Authors: Siyu Teng, Xuan Li, Yuchen Li, Lingxi Li, Yunfeng Ai, Long Chen

    Abstract: One critical bottleneck that impedes the development and deployment of autonomous transportation in open-pit mines is guaranteed robustness and trustworthiness in prohibitively extreme scenarios. In this research, a novel scenarios engineering (SE) methodology for the autonomous mining truck is proposed for open-pit mines. SE increases the trustworthiness and robustness of autonomous trucks from f… ▽ More

    Submitted 14 March, 2024; originally announced May 2024.

  50. arXiv:2404.19245  [pdf, other

    cs.CL cs.AI

    HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

    Authors: Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, Chengzhong Xu

    Abstract: Adapting Large Language Models (LLMs) to new tasks through fine-tuning has been made more efficient by the introduction of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. However, these methods often underperform compared to full fine-tuning, particularly in scenarios involving complex datasets. This issue becomes even more pronounced in complex domains, highlighting the need for… ▽ More

    Submitted 23 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures