Showing 1–50 of 3,441 results for author: Wang, L

Searching in archive cs.
  1. arXiv:2405.05525  [pdf, other]

    cs.CR cs.LG

    Ditto: Quantization-aware Secure Inference of Transformers upon MPC

    Authors: Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Yinggui Wang, Lei Wang

    Abstract: Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: to be published in ICML 2024

  2. arXiv:2405.05189  [pdf, other]

    cs.CL cs.AI

    MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning

    Authors: Inderjeet Nair, Lu Wang

    Abstract: We study the task of conducting structured reasoning as generating a reasoning graph from natural language input using large language models (LLMs). Previous approaches have explored various prompting schemes, yet they suffer from error propagation due to the autoregressive nature and single-pass-based decoding, which lack error correction capability. Additionally, relying solely on a single sampl…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Under review at ACL 2024

  3. arXiv:2405.05187  [pdf, other]

    math.NA cs.LG

    A score-based particle method for homogeneous Landau equation

    Authors: Yan Huang, Li Wang

    Abstract: We propose a novel score-based particle method for solving the Landau equation in plasmas, that seamlessly integrates learning with structure-preserving particle methods [arXiv:1910.03080]. Building upon the Lagrangian viewpoint of the Landau equation, a central challenge stems from the nonlinear dependence of the velocity field on the density. Our primary innovation lies in recognizing that this…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.04950  [pdf, other]

    cs.CV cs.AI cs.CL

    VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

    Authors: Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang

    Abstract: Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context. Yet, a challenging type of visual math lies in the multimodal graph theory problem, which demands that LMMs understand the graphical structures accurately and perform multi-step reasoning on the visual graph. Addi…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 17 pages; Accepted by ICML 2024

  5. arXiv:2405.04476  [pdf, other]

    eess.AS cs.SD

    BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals

    Authors: Lijun Wang, Yixian Lu, Ziyan Gao, Kai Li, Jianqiang Huang, Yuntao Kong, Shogo Okada

    Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RAC) of a sound field around a listener's local environment, offering comprehensive indications for various applications. The current RAPs and RPPs estimation methods either fall short of covering broad real-world acoustic environments in the context of…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  6. arXiv:2405.04066  [pdf, other]

    cs.SI eess.SY

    Characterizing Regional Importance in Cities with Human Mobility Motifs in Metro Networks

    Authors: Shuyang Shi, Ding Lyu, Lin Wang, Xiaofan Wang, Guanrong Chen

    Abstract: Uncovering higher-order spatiotemporal dependencies within human mobility networks offers valuable insights into the analysis of urban structures. In most existing studies, human mobility networks are typically constructed by aggregating all trips without distinguishing who takes which specific trip. Instead, we claim individual mobility motifs, higher-order structures generated by daily trips of…

    Submitted 7 May, 2024; originally announced May 2024.

  7. arXiv:2405.03481  [pdf, other]

    cs.LG

    AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers

    Authors: Wenhao Zhu, Guojie Song, Liang Wang, Shaoguo Liu

    Abstract: Graph Transformers (GTs) have significantly advanced the field of graph representation learning by overcoming the limitations of message-passing graph neural networks (GNNs) and demonstrating promising performance and expressive power. However, the quadratic complexity of the self-attention mechanism in GTs has limited their scalability, and previous approaches to address this issue often suffer from…

    Submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2405.03393  [pdf, other]

    cs.RO eess.SY

    On-site scale factor linearity calibration of MEMS triaxial gyroscopes

    Authors: Yaqi Li, Li Wang, Zhitao Wang, Xiangqing Li, Jiaojiao Li, Steven Weidong Su

    Abstract: The calibration of MEMS triaxial gyroscopes is crucial for achieving precise attitude estimation for various wearable health monitoring applications. However, gyroscope calibration poses greater challenges compared to accelerometers and magnetometers. This paper introduces an efficient method for calibrating MEMS triaxial gyroscopes via only a servo motor, making it well-suited for field environme…

    Submitted 6 May, 2024; originally announced May 2024.

  9. arXiv:2405.03177  [pdf, other]

    cs.CV

    Transformer-based RGB-T Tracking with Channel and Spatial Feature Fusion

    Authors: Yunfeng Li, Bo Wang, Ye Li, Zhiwen Yu, Liang Wang

    Abstract: Complementary RGB and TIR modalities enable RGB-T tracking to achieve competitive performance in challenging scenarios. Therefore, how to better fuse cross-modal features is the core issue of RGB-T tracking. Some previous methods either insufficiently fuse RGB and TIR features, or depend on intermediaries containing information from both modalities to achieve cross-modal information interaction. T…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  10. arXiv:2405.03170  [pdf, other]

    cs.CL

    Oracle-Checker Scheme for Evaluating a Generative Large Language Model

    Authors: Yueling Jenny Zeng, Li-C. Wang, Thomas Ibbetson

    Abstract: This work presents a novel approach, called the oracle-checker scheme, for evaluating the answer given by a generative large language model (LLM). Two types of checkers are presented. The first type of checker follows the idea of property testing. The second type of checker follows the idea of program checking. Their applications are demonstrated in two separate contexts, entity extraction and paraphras…

    Submitted 6 May, 2024; originally announced May 2024.

  11. arXiv:2405.02673  [pdf, other]

    cs.CL

    On the Information Redundancy in Non-Autoregressive Translation

    Authors: Zhihao Wang, Longyue Wang, Jinsong Su, Junfeng Yao, Zhaopeng Tu

    Abstract: Token repetition is a typical form of multi-modal problem in fully non-autoregressive translation (NAT). In this work, we revisit the multi-modal problem in recently proposed NAT models. Our study reveals that these advanced models have introduced other types of information redundancy errors, which cannot be measured by the conventional metric - the continuous repetition ratio. By manually annotat…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 tables

  12. arXiv:2405.02364  [pdf, other]

    cs.LG cs.DC

    A Survey on Contribution Evaluation in Vertical Federated Learning

    Authors: Yue Cui, Chung-ju Huang, Yuzhu Zhang, Leye Wang, Lixin Fan, Xiaofang Zhou, Qiang Yang

    Abstract: Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns associated with centralized data storage and processing. VFL facilitates collaboration among multiple entities with distinct feature sets on the same user population, enabling the joint training of predictive models without direct data sharing. A key aspect of VFL is the fair and ac…

    Submitted 3 May, 2024; originally announced May 2024.

  13. arXiv:2405.02238  [pdf, other]

    cs.CR

    Secure and Efficient General Matrix Multiplication On Cloud Using Homomorphic Encryption

    Authors: Yang Gao, Gang Quan, Soamar Homsi, Wujie Wen, Liqiang Wang

    Abstract: Despite the cloud's enormous technical and financial advantages, security and privacy have always been the primary concern in adopting cloud computing facilities, especially for government agencies and commercial sectors with high-security requirements. Homomorphic Encryption (HE) has recently emerged as an effective tool in assuring privacy and security for sensitive applications by allowing computi…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 4 tables

  14. arXiv:2405.02068  [pdf, other]

    cs.CV

    Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

    Authors: Canhui Tang, Sanping Zhou, Yizhe Li, Yonghao Dong, Le Wang

    Abstract: With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has witnessed a significant achievement in the past few years. The success of knowledge distillation mainly relies on how to keep the feature discrepancy between the teacher and student model, in which it assumes that: (1) the teacher model c…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: The paper is under review

  15. arXiv:2405.01796  [pdf, other]

    cs.CL cs.DL cs.IR

    TOPICAL: TOPIC Pages AutomagicaLly

    Authors: John Giorgi, Amanpreet Singh, Doug Downey, Sergey Feldman, Lucy Lu Wang

    Abstract: Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated pr…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 2 tables, NAACL System Demonstrations 2024

  16. arXiv:2405.01677  [pdf, other]

    cs.LG cs.AI

    Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation

    Authors: Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, Alois Knoll

    Abstract: Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the t…

    Submitted 2 May, 2024; originally announced May 2024.

  17. arXiv:2405.01461  [pdf, other]

    cs.CV

    SATO: Stable Text-to-Motion Framework

    Authors: Wenshuo Chen, Hongru Xiao, Erhang Zhang, Lijie Hu, Lei Wang, Mengyuan Liu, Chen Chen

    Abstract: Is the Text to Motion model robust? Recent advancements in Text to Motion models primarily stem from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research has uncovered a significant issue with the text-to-motion model: its predictions often exhibit inconsistent outputs, re…

    Submitted 3 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  18. arXiv:2405.01228  [pdf, other]

    cs.CV

    RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation

    Authors: Heng Li, Haojin Li, Jianyu Chen, Zhongxi Qiu, Huazhu Fu, Lidai Wang, Yan Hu, Jiang Liu

    Abstract: Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinical settings due to the scarcity of annotated data resulting from the professional and private nature of medical data. Despite the existence of decent solutions, many of them are hindered in clinical settings du…

    Submitted 2 May, 2024; originally announced May 2024.

  19. arXiv:2405.01115  [pdf]

    cs.RO eess.SY

    A New Self-Alignment Method without Solving Wahba Problem for SINS in Autonomous Vehicles

    Authors: Hongliang Zhang, Yilan Zhou, Lei Wang, Tengchao Huang

    Abstract: Initial alignment is one of the key technologies in strapdown inertial navigation systems (SINS), providing initial state information for vehicle attitude and navigation. In some situations, such as the attitude heading reference system, the position is not necessarily required or even available, so self-alignment that does not rely on any external aid becomes necessary. This study pres…

    Submitted 2 May, 2024; originally announced May 2024.

  20. arXiv:2405.00753  [pdf, other]

    q-bio.QM cs.AI

    HMAMP: Hypervolume-Driven Multi-Objective Antimicrobial Peptides Design

    Authors: Li Wang, Yiping Li, Xiangzheng Fu, Xiucai Ye, Junfeng Shi, Gary G. Yen, Xiangxiang Zeng

    Abstract: Antimicrobial peptides (AMPs) have exhibited unprecedented potential as biomaterials in combating multidrug-resistant bacteria. Despite the increasing adoption of artificial intelligence for novel AMP design, challenges pertaining to conflicting attributes such as activity, hemolysis, and toxicity have significantly impeded the progress of researchers. This paper introduces a paradigm shift by con…

    Submitted 1 May, 2024; originally announced May 2024.

  21. arXiv:2405.00393  [pdf, other]

    cs.CR

    Inferring State Machine from the Protocol Implementation via Large Language Model

    Authors: Haiyang Wei, Zhengjie Du, Haohui Huang, Yue Liu, Guang Cheng, Linzhang Wang, Bing Mao

    Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analysis to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex c…

    Submitted 1 May, 2024; originally announced May 2024.

  22. arXiv:2405.00351  [pdf, other]

    cs.HC cs.AI cs.CV cs.MM

    Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality

    Authors: Zidong Cao, Zhan Wang, Yexin Liu, Yan-Pei Cao, Ying Shan, Wei Zeng, Lin Wang

    Abstract: Viewing omnidirectional images (ODIs) in virtual reality (VR) represents a novel form of media that provides immersive experiences for users to navigate and interact with digital content. Nonetheless, this sense of immersion can be greatly compromised by a blur effect that masks details and hampers the user's ability to engage with objects of interest. In this paper, we present a novel system, cal…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 11 pages

  23. arXiv:2405.00301  [pdf, other]

    cs.CL

    LITO: Learnable Intervention for Truthfulness Optimization

    Authors: Farima Fatahi Bayat, Xin Liu, H. V. Jagadish, Lu Wang

    Abstract: Large language models (LLMs) can generate long-form and coherent text, but they still frequently hallucinate facts, thus limiting their reliability. To address this issue, inference-time methods that elicit truthful responses have been proposed by shifting LLM representations towards learned "truthful directions". However, applying the truthful directions with the same intensity fails to generaliz…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures

  24. arXiv:2405.00201  [pdf, other]

    cs.CL cs.AI

    SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models

    Authors: Samir Arora, Liangliang Wang

    Abstract: Full fine-tuning is a popular approach to adapt Transformer-based pre-trained large language models to a specific downstream task. However, the substantial requirements for computational power and storage have discouraged its widespread use. Moreover, increasing evidence of catastrophic forgetting and overparameterization in the Transformer architecture has motivated researchers to seek more effic…

    Submitted 30 April, 2024; originally announced May 2024.

  25. arXiv:2404.19666  [pdf, other]

    cs.CV eess.IV

    Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity

    Authors: Lei Wang, Desen Yuan

    Abstract: Image quality assessment often relies on raw opinion scores provided by subjects in subjective experiments, which can be noisy and unreliable. To address this issue, postprocessing procedures such as ITU-R BT.500, ITU-T P.910, and ITU-T P.913 have been standardized to clean up the original opinion scores. These methods use annotator-based statistical priors, but they do not take into account exten…

    Submitted 30 April, 2024; originally announced April 2024.

  26. arXiv:2404.19595  [pdf, other]

    cs.CV eess.IV

    Perceptual Constancy Constrained Single Opinion Score Calibration for Image Quality Assessment

    Authors: Lei Wang, Desen Yuan

    Abstract: In this paper, we propose a highly efficient method to estimate an image's mean opinion score (MOS) from a single opinion score (SOS). Assuming that each SOS is the observed sample of a normal distribution and the MOS is its unknown expectation, the MOS inference is formulated as a maximum likelihood estimation problem, where the perceptual correlation of pairwise images is considered in modeling…

    Submitted 30 April, 2024; originally announced April 2024.

  27. arXiv:2404.19567  [pdf, other]

    cs.CV eess.IV

    Causal Perception Inspired Representation Learning for Trustworthy Image Quality Assessment

    Authors: Lei Wang, Desen Yuan

    Abstract: Despite great success in modeling visual perception, deep neural network based image quality assessment (IQA) still remains unreliable in real-world applications due to its vulnerability to adversarial perturbations and the inexplicit black-box structure. In this paper, we propose to build a trustworthy IQA model via Causal Perception inspired Representation Learning (CPRL), and a score reflection…

    Submitted 30 April, 2024; originally announced April 2024.

  28. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  29. arXiv:2404.19518  [pdf, other]

    cs.MA cs.AI cs.RO

    MGCBS: An Optimal and Efficient Algorithm for Solving Multi-Goal Multi-Agent Path Finding Problem

    Authors: Mingkai Tang, Yuanhang Li, Hongji Liu, Yingbing Chen, Ming Liu, Lujia Wang

    Abstract: With the expansion of the scale of robotics applications, the multi-goal multi-agent pathfinding (MG-MAPF) problem began to gain widespread attention. This problem requires each agent to visit pre-assigned multiple goal points at least once without conflict. Some previous methods have been proposed to solve the MG-MAPF problem based on Decoupling the goal Vertex visiting order search and the Singl…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: to be published in IJCAI2024

  30. Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning

    Authors: Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang, Xuelong Li

    Abstract: Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sha…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by Artificial Intelligence (AIJ)

  31. arXiv:2404.19335  [pdf, other]

    cs.CL

    StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

    Authors: Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

    Abstract: Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. This property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Submitted to ACL 2024

  32. arXiv:2404.18922  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    DPO Meets PPO: Reinforced Token Optimization for RLHF

    Authors: Han Zhong, Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian, Liwei Wang

    Abstract: In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards -- a challenging scenario in traditional deep reinforcement learning. Despite the great successes of PPO in the alignment of state-of-the-art closed-source large language models (LLMs), its open-source implementation is still larg…

    Submitted 29 April, 2024; originally announced April 2024.

  33. arXiv:2404.18758  [pdf, other]

    cs.CV cs.LG

    Transitive Vision-Language Prompt Learning for Domain Generalization

    Authors: Liyuan Wang, Yan Jin, Zhen Chen, Jinlin Wu, Mengke Li, Yang Lu, Hanzi Wang

    Abstract: The vision-language pre-training has enabled deep models to make a huge step forward in generalizing across unseen domains. The recent learning method based on the vision-language pre-training model is a great tool for domain generalization and can solve this problem to a large extent. However, there are still some issues that an advancement still suffers from trading-off between domain invariance…

    Submitted 29 April, 2024; originally announced April 2024.

  34. arXiv:2404.18669  [pdf, other]

    cs.GR cs.AI cs.CV

    Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting

    Authors: Yifei Gao, Jie Ou, Lei Wang, Jun Cheng

    Abstract: Recent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly de…

    Submitted 29 April, 2024; originally announced April 2024.

    MSC Class: I.4.8

  35. arXiv:2404.17894  [pdf, ps, other]

    cs.CV

    Unpaired Multi-view Clustering via Reliable View Guidance

    Authors: Like Xin, Wanqi Yang, Lei Wang, Ming Yang

    Abstract: This paper focuses on unpaired multi-view clustering (UMC), a challenging problem where paired observed samples are unavailable across multiple views. The goal is to perform effective joint clustering using the unpaired observed samples in all views. In incomplete multi-view clustering, existing methods typically rely on sample pairing between views to capture their complementarity. However, that is…

    Submitted 27 April, 2024; originally announced April 2024.

  36. arXiv:2404.17845  [pdf, other]

    cs.CV

    Instance-free Text to Point Cloud Localization with Relative Position Awareness

    Authors: Lichao Wang, Zhihao Yuan, Jinke Ren, Shuguang Cui, Zhen Li

    Abstract: Text-to-point-cloud cross-modal localization is an emerging vision-language task critical for future robot-human collaboration. It seeks to localize a position from a city-scale point cloud scene based on a few natural language instructions. In this paper, we address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relati…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 12 pages, 10 figures, conference

  37. arXiv:2404.17780  [pdf, other]

    cs.MA cs.AI

    Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning

    Authors: Dapeng Li, Hang Dong, Lu Wang, Bo Qiao, Si Qin, Qingwei Lin, Dongmei Zhang, Qi Zhang, Zhiwei Xu, Bin Zhang, Guoliang Fan

    Abstract: In recent years, multi-agent reinforcement learning algorithms have made significant advancements in diverse gaming environments, leading to increased interest in the broader application of such techniques. To address the prevalent challenge of partial observability, communication-based algorithms have improved cooperative performance through the sharing of numerical embedding between agents. Howe…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures

  38. arXiv:2404.17778  [pdf, other]

    cs.CL cs.AI

    MRScore: Evaluating Radiology Report Generation with LLM-based Reward System

    Authors: Yunyi Liu, Zhanyu Wang, Yingshu Li, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou

    Abstract: In recent years, automated radiology report generation has experienced significant growth. This paper introduces MRScore, an automatic evaluation metric tailored for radiology report generation by leveraging Large Language Models (LLMs). Conventional NLG (natural language generation) metrics like BLEU are inadequate for accurately assessing the generated radiology reports, as systematically demons…

    Submitted 27 April, 2024; originally announced April 2024.

  39. arXiv:2404.17462  [pdf, other]

    cs.NI

    Integrated Sensing and Communication Channel Modeling: A Survey

    Authors: Zhiqing Wei, Jinzhu Jia, Yangyang Niu, Lin Wang, Huici Wu, Heng Yang, Zhiyong Feng

    Abstract: Integrated sensing and communication (ISAC) is expected to play a crucial role in the sixth-generation (6G) mobile communication systems, offering potential applications in the scenarios of intelligent transportation, smart factories, etc. The performance of radar sensing in ISAC systems is closely related to the characteristics of radar sensing and communication channels. Therefore, ISAC channel…

    Submitted 26 April, 2024; originally announced April 2024.

  40. arXiv:2404.17140  [pdf, other]

    cs.CL

    Small Language Models Need Strong Verifiers to Self-Correct Reasoning

    Authors: Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang

    Abstract: Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs), where LLMs refine their solutions using self-generated critiques that pinpoint the errors. This work explores whether smaller-size (<= 13B) language models (LMs) have the ability of self-correction on reasoning tasks with minimal inputs from stronger LMs. We propose a novel pipel…

    Submitted 25 April, 2024; originally announced April 2024.

  41. PASGAL: Parallel And Scalable Graph Algorithm Library

    Authors: Xiaojun Dong, Yan Gu, Yihan Sun, Letong Wang

    Abstract: In this paper, we introduce PASGAL (Parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graph sizes. One special focus of PASGAL is the efficiency on large-diameter graphs, which is a common challenge for many existing parallel graph processing systems: many existing graph processing systems can be ev…

    Submitted 25 April, 2024; originally announced April 2024.

  42. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  43. arXiv:2404.16501  [pdf, other]

    cs.CV

    360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes

    Authors: Xu Zheng, Pengyuan Zhou, Athanasios V. Vasilakos, Lin Wang

    Abstract: In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2)…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.12505

  44. arXiv:2404.16416  [pdf, other]

    cs.CV

    Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition

    Authors: Yu Wang, Sanping Zhou, Kun Xia, Le Wang

    Abstract: Semi-supervised action recognition aims to improve spatio-temporal reasoning ability with a few labeled data in conjunction with a large amount of unlabeled data. Albeit recent advancements, existing powerful methods are still prone to making ambiguous predictions under scarce labeled data, embodied as the limitation of distinguishing different actions with similar spatio-temporal information. In…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures, 6 tables, 56 conferences

    MSC Class: 68U10; 68T45; ACM Class: I.2.10

  45. arXiv:2404.16375  [pdf, other]

    cs.CV cs.AI cs.CL

    List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

    Authors: An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

    Abstract: Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image. These tags, marked with alphanumerics, can be indexed via text tokens for easy reference. Despite the extraordinary performance from GPT-4V, we observe that other Multimodal Large Language Models (MLLMs) struggle to understand these vis…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Preprint

  46. arXiv:2404.16362  [pdf, other]

    cs.CR

    Feature graph construction with static features for malware detection

    Authors: Binghui Zou, Chunjie Cao, Longjuan Wang, Yinan Cheng, Jingzhang Sun

    Abstract: Malware can greatly compromise the integrity and trustworthiness of information and is in a constant state of evolution. Existing feature fusion-based detection methods generally overlook the correlation between features. And mere concatenation of features will reduce the model's characterization ability, lead to low detection accuracy. Moreover, these methods are susceptible to concept drift and…

    Submitted 25 April, 2024; originally announced April 2024.

  47. arXiv:2404.16339  [pdf, other]

    cs.CV cs.AI

    Training-Free Unsupervised Prompt for Vision-Language Models

    Authors: Sifan Long, Linbin Wang, Zhen Zhao, Zichang Tan, Yiming Wu, Shengsheng Wang, Jingdong Wang

    Abstract: Prompt learning has become the most effective paradigm for adapting large pre-trained vision-language models (VLMs) to downstream tasks. Recently, unsupervised prompt tuning methods, such as UPL and POUF, directly leverage pseudo-labels as supervisory information to fine-tune additional adaptation modules on unlabeled data. However, inaccurate pseudo labels easily misguide the tuning process and r…

    Submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.16296  [pdf]

    cs.CV cs.AI

    Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics

    Authors: Ao Xiang, Jingyu Zhang, Qin Yang, Liyang Wang, Yu Cheng

    Abstract: With the development and widespread application of digital image processing technology, image splicing has become a common method of image manipulation, raising numerous security and legal issues. This paper introduces a new splicing image detection algorithm based on the statistical characteristics of natural images, aimed at improving the accuracy and efficiency of splicing image detection. By a…

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  49. arXiv:2404.16076  [pdf, other]

    cs.SI cs.AI cs.CL cs.LG

    Semantic Evolvement Enhanced Graph Autoencoder for Rumor Detection

    Authors: Xiang Tao, Liang Wang, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Due to the rapid spread of rumors on social media, rumor detection has become an extremely important challenge. Recently, numerous rumor detection models which utilize textual information and the propagation structure of events have been proposed. However, these methods overlook the importance of semantic evolvement information of event in propagation process, which is often challenging to be trul…

    Submitted 24 April, 2024; originally announced April 2024.

  50. arXiv:2404.16038  [pdf, other]

    cs.CV cs.AI cs.MM

    A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

    Authors: Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju

    Abstract: This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming. It highlights the innovative use of these technologies in producing highly realistic videos, a significant leap in…

    Submitted 30 January, 2024; originally announced April 2024.

    Comments: 16 pages, 10 figures, 4 tables