Showing 1–50 of 1,955 results for author: Li, W

Searching in archive cs.
  1. arXiv:2405.05916  [pdf]

    cs.HC

    Leveraging Artificial Intelligence to Promote Awareness in Augmented Reality Systems

    Authors: Wangfan Li, Rohit Mallick, Carlos Toxtli-Hernandez, Christopher Flathmann, Nathan J. McNeese

    Abstract: Recent developments in artificial intelligence (AI) have permeated through an array of different immersive environments, including virtual, augmented, and mixed realities. AI brings a wealth of potential that centers on its ability to critically analyze environments, identify relevant artifacts to a goal or action, and then autonomously execute decision-making strategies to optimize the reward-to-…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  2. arXiv:2405.05811  [pdf, other]

    cs.CV

    Parallel Cross Strip Attention Network for Single Image Dehazing

    Authors: Lihan Tong, Yun Liu, Tian Ye, Weijia Li, Liyuan Chen, Erkang Chen

    Abstract: The objective of single image dehazing is to restore hazy images and produce clear, high-quality visuals. Traditional convolutional models struggle with long-range dependencies due to their limited receptive field size. While Transformers excel at capturing such dependencies, their quadratic computational complexity in relation to feature map resolution makes them less suitable for pixel-to-pixel…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, CTISC'24

    Report number: C052

  3. arXiv:2405.05170  [pdf, other]

    cs.MM cs.CV eess.IV

    Picking watermarks from noise (PWFN): an improved robust watermarking model against intensive distortions

    Authors: Sijing Xie, Chengxin Zhao, Nan Sun, Wei Li, Hefei Ling

    Abstract: Digital watermarking is the process of embedding secret information by altering images in a way that is undetectable to the human eye. To increase the robustness of the model, many deep learning-based watermarking methods use the encoder-decoder architecture by adding different noises to the noise layer. The decoder then extracts the watermarked information from the distorted image. However, this…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME2024

  4. arXiv:2405.05126  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Exploring Speech Pattern Disorders in Autism using Machine Learning

    Authors: Chuanbo Hu, Jacob Thrasher, Wenqi Li, Mindi Ruan, Xiangxu Yu, Lynn K Paul, Shuo Wang, Xin Li

    Abstract: Diagnosing autism spectrum disorder (ASD) by identifying abnormal speech patterns from examiner-patient dialogues presents significant challenges due to the subtle and diverse manifestations of speech-related symptoms in affected individuals. This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues. Utilizing a dataset…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2405.04416  [pdf, other]

    cs.CV

    DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid

    Authors: Sidun Liu, Peng Qiao, Zongxin Ye, Wenyu Li, Yong Dou

    Abstract: Neural Radiance Field (NeRF) achieves extremely high quality in object-scaled and indoor scene reconstruction. However, there exist some challenges when reconstructing large-scale scenes. MLP-based NeRFs suffer from limited network capacity, while volume-based NeRFs are heavily memory-consuming when the scene resolution increases. Recent approaches propose to geographically partition the scene and…

    Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Originally submitted to Siggraph Asia 2023

  6. arXiv:2405.04100  [pdf, other]

    cs.CV cs.LG

    ESP: Extro-Spective Prediction for Long-term Behavior Reasoning in Emergency Scenarios

    Authors: Dingrui Wang, Zheyuan Lai, Yuda Li, Yi Wu, Yuexin Ma, Johannes Betz, Ruigang Yang, Wei Li

    Abstract: Emergent-scene safety is the key milestone for fully autonomous driving, and reliable on-time prediction is essential to maintain safety in emergency scenarios. However, these emergency scenarios are long-tailed and hard to collect, which restricts the system from getting reliable predictions. In this paper, we build a new dataset, which aims at the long-term prediction with the inconspicuous stat…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICRA 2024 as Oral Presentation

  7. arXiv:2405.03971  [pdf, other]

    cs.CV cs.MA

    Unified End-to-End V2X Cooperative Autonomous Driving

    Authors: Zhiwei Li, Bozhen Zhang, Lei Yang, Tianyu Shen, Nuo Xu, Ruosen Hao, Weiting Li, Tao Yan, Huaping Liu

    Abstract: V2X cooperation, through the integration of sensor data from both vehicles and infrastructure, is considered a pivotal approach to advancing autonomous driving technology. Current research primarily focuses on enhancing perception accuracy, often overlooking the systematic improvement of accident prediction accuracy through end-to-end learning, leading to insufficient attention to the safety issue…

    Submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2405.02958  [pdf, ps, other]

    cs.CV

    Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction

    Authors: Xiaoyu Qiao, Weisheng Li, Yuping Huang, Lijian Yang

    Abstract: Score matching with Langevin dynamics (SMLD) method has been successfully applied to accelerated MRI. However, the hyperparameters in the sampling process require subtle tuning, otherwise the results can be severely corrupted by hallucination artifacts, particularly with out-of-distribution test data. In this study, we propose a novel workflow in which SMLD results are regarded as additional prior…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  9. arXiv:2405.02957  [pdf, other]

    cs.AI

    Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents

    Authors: Junkai Li, Siyu Wang, Meng Zhang, Weitao Li, Yunghwei Lai, Xinhui Kang, Weizhi Ma, Yang Liu

    Abstract: In this paper, we introduce a simulacrum of hospital called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can s…

    Submitted 5 May, 2024; originally announced May 2024.

  10. arXiv:2405.01971  [pdf, other]

    cs.RO cs.CV

    A Sonar-based AUV Positioning System for Underwater Environments with Low Infrastructure Density

    Authors: Emilio Olivastri, Daniel Fusaro, Wanmeng Li, Simone Mosco, Alberto Pretto

    Abstract: The increasing demand for underwater vehicles highlights the necessity for robust localization solutions in inspection missions. In this work, we present a novel real-time sonar-based underwater global positioning algorithm for AUVs (Autonomous Underwater Vehicles) designed for environments with a sparse distribution of human-made assets. Our approach exploits two synergistic data interpretation f…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

    Journal ref: IEEE ICRA Workshop on Field Robotics 2024

  11. arXiv:2405.01920  [pdf]

    cs.CV

    Lightweight Change Detection in Heterogeneous Remote Sensing Images with Online All-Integer Pruning Training

    Authors: Chengyang Zhang, Weiming Li, Gang Li, Huina Song, Zhaohui Song, Xueqian Wang, Antonio Plaza

    Abstract: Detection of changes in heterogeneous remote sensing images is vital, especially in response to emergencies like earthquakes and floods. Current homogenous transformation-based change detection (CD) methods often suffer from high computation and memory costs, which are not friendly to edge-computation devices like onboard CD devices at satellites. To address this issue, this paper proposes a new l…

    Submitted 3 May, 2024; originally announced May 2024.

  12. arXiv:2405.01799  [pdf, other]

    cs.CL cs.AI

    Exploiting ChatGPT for Diagnosing Autism-Associated Language Disorders and Identifying Distinct Features

    Authors: Chuanbo Hu, Wenqi Li, Mindi Ruan, Xiangxu Yu, Lynn K. Paul, Shuo Wang, Xin Li

    Abstract: Diagnosing language disorders associated with autism is a complex and nuanced challenge, often hindered by the subjective nature and variability of traditional assessment methods. Traditional diagnostic methods not only require intensive human effort but also often result in delayed interventions due to their lack of speed and specificity. In this study, we explored the application of ChatGPT, a s…

    Submitted 2 May, 2024; originally announced May 2024.

  13. arXiv:2405.00770  [pdf, other]

    quant-ph cs.CC cs.LG

    Quantum-Classical Separations in Shallow-Circuit-Based Learning with and without Noises

    Authors: Zhihan Zhang, Weiyuan Gong, Weikang Li, Dong-Ling Deng

    Abstract: We study quantum-classical separations between classical and quantum supervised learning models based on constant depth (i.e., shallow) circuits, in scenarios with and without noises. We construct a classification problem defined by a noiseless shallow quantum circuit and rigorously prove that any classical neural network with bounded connectivity requires logarithmic depth to output correctly wit…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures

  14. arXiv:2405.00566  [pdf, other]

    cs.CE cs.CL q-fin.GN

    NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance

    Authors: Huan-Yi Su, Ke Wu, Yu-Hao Huang, Wu-Jun Li

    Abstract: Recently, many works have proposed various financial large language models (FinLLMs) by pre-training from scratch or fine-tuning open-sourced LLMs on financial corpora. However, existing FinLLMs exhibit unsatisfactory performance in understanding financial text when numeric variables are involved in questions. In this paper, we propose a novel LLM, called numeric-sensitive large language model (Nu…

    Submitted 1 May, 2024; originally announced May 2024.

  15. arXiv:2405.00542  [pdf, other]

    eess.IV cs.CV

    UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement

    Authors: Ruiquan Ge, Zhaojie Fang, Pengxue Wei, Zhanghao Chen, Hongyang Jiang, Ahmed Elazab, Wangting Li, Xiang Wan, Shaochong Zhang, Changmiao Wang

    Abstract: Fundus photography, in combination with the ultra-wide-angle fundus (UWF) techniques, becomes an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow unlike UWF scanning laser ophthalmoscopy (UWF-SLO…

    Submitted 1 May, 2024; originally announced May 2024.

  16. arXiv:2404.19043  [pdf, other]

    cs.CV

    Improving Interpretability of Deep Active Learning for Flood Inundation Mapping Through Class Ambiguity Indices Using Multi-spectral Satellite Imagery

    Authors: Hyunho Lee, Wenwen Li

    Abstract: Flood inundation mapping is a critical task for responding to the increasing risk of flooding linked to global warming. Significant advancements of deep learning in recent years have triggered its extensive applications, including flood inundation mapping. To cope with the time-consuming and labor-intensive data labeling process in supervised learning, deep active learning strategies are one of th…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 46 pages, 11 figures, 5 tables

  17. arXiv:2404.18891  [pdf, other]

    cs.CV cs.AI cs.LG

    IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation

    Authors: Kebin Wu, Wenbin Li, Xiaofei Xiao

    Abstract: The scarcity of labeled data in real-world scenarios is a critical bottleneck of deep learning's effectiveness. Semi-supervised semantic segmentation has been a typical solution to achieve a desirable tradeoff between annotation cost and segmentation performance. However, previous approaches, whether based on consistency regularization or self-training, tend to neglect the contextual knowledge emb…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 7 pages, 2 figures

  18. arXiv:2404.18359  [pdf, other]

    cs.CL cs.AI

    FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

    Authors: Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

    Abstract: In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choi…

    Submitted 28 April, 2024; originally announced April 2024.

  19. arXiv:2404.16924  [pdf, other]

    cs.IR cs.CL

    A Survey of Generative Search and Recommendation in the Era of Large Language Models

    Authors: Yongqi Li, Xinyu Lin, Wenjie Wang, Fuli Feng, Liang Pang, Wenjie Li, Liqiang Nie, Xiangnan He, Tat-Seng Chua

    Abstract: With the information explosion on the Web, search and recommendation are foundational infrastructures to satisfying users' information needs. As the two sides of the same coin, both revolve around the same core research problem, matching queries with documents or users with items. In the recent few decades, search and recommendation have experienced synchronous technological paradigm shifts, inclu…

    Submitted 25 April, 2024; originally announced April 2024.

  20. arXiv:2404.16825  [pdf, other]

    cs.CV eess.IV

    ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

    Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

    Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content…

    Submitted 25 April, 2024; originally announced April 2024.

  21. arXiv:2404.16824  [pdf, other]

    cs.CV

    V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

    Authors: Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

    Abstract: AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the line between reality and fiction, posing challenges in multimedia forensics. To solve this urgent issue, V2A-Mark is proposed to address the limitations of current video tampering forensics, such as poor generalizabili…

    Submitted 25 April, 2024; originally announced April 2024.

  22. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  23. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  24. arXiv:2404.16444  [pdf, other]

    cs.LG math.DS stat.AP stat.ML

    Automating the Discovery of Partial Differential Equations in Dynamical Systems

    Authors: Weizhen Li, Rui Carvalho

    Abstract: Identifying partial differential equations (PDEs) from data is crucial for understanding the governing mechanisms of natural phenomena, yet it remains a challenging task. We present an extension to the ARGOS framework, ARGOS-RAL, which leverages sparse regression with the recurrent adaptive lasso to identify PDEs from limited prior knowledge automatically. Our method automates calculating partial…

    Submitted 2 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 18 pages, 6 figures, 1 table

  25. arXiv:2404.16322  [pdf, other]

    cs.DB

    Bridging Speed and Accuracy to Approximate $K$-Nearest Neighbor Search

    Authors: Mingyu Yang, Jiabao Jin, Xiangyu Wang, Zhitao Shen, Wei Jia, Wentao Li, Wei Wang

    Abstract: Approximate K-Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. The efficiency of AKNN search largely depends on the computation of distances, a process that significantly affects the runtime. To improve computational efficiency, existing work often opts for estimating approximate distances rather than computing exact distances, at the cost of reduced…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages

  26. arXiv:2404.16304  [pdf, other]

    cs.CV

    BezierFormer: A Unified Architecture for 2D and 3D Lane Detection

    Authors: Zhiwei Dong, Xi Zhu, Xiya Cao, Ran Ding, Wei Li, Caifa Zhou, Yongliang Wang, Qiangbo Liu

    Abstract: Lane detection has made significant progress in recent years, but there is not a unified architecture for its two sub-tasks: 2D lane detection and 3D lane detection. To fill this gap, we introduce BézierFormer, a unified 2D and 3D lane detection architecture based on Bézier curve lane representation. BézierFormer formulates queries as Bézier control points and incorporates a novel Bézier curve atten…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: ICME 2024, 11 pages, 8 figures

  27. arXiv:2404.16037  [pdf, other]

    cs.CV cs.LG physics.ao-ph

    VN-Net: Vision-Numerical Fusion Graph Convolutional Network for Sparse Spatio-Temporal Meteorological Forecasting

    Authors: Yutong Xiong, Xun Zhu, Ming Wu, Weiqing Li, Fanbin Mo, Chuang Zhang, Bin Zhang

    Abstract: Sparse meteorological forecasting is indispensable for fine-grained weather forecasting and deserves extensive attention. Recent studies have highlighted the potential of spatio-temporal graph convolutional networks (ST-GCNs) in predicting numerical data from ground weather stations. However, as one of the highest fidelity and lowest latency data, the application of the vision data from satellites…

    Submitted 26 January, 2024; originally announced April 2024.

  28. arXiv:2404.14815  [pdf, other]

    cs.LG

    Time-aware Heterogeneous Graph Transformer with Adaptive Attention Merging for Health Event Prediction

    Authors: Shibo Li, Hengliang Cheng, Runze Li, Weihua Li

    Abstract: The widespread application of Electronic Health Records (EHR) data in the medical field has led to early successes in disease risk prediction using deep learning methods. These methods typically require extensive data for training due to their large parameter sets. However, existing works do not exploit the full potential of EHR data. A significant challenge arises from the infrequent occurrence o…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 38 pages, 7 figures, 5 tables

  29. arXiv:2404.14720  [pdf, other]

    cs.CR

    Incorporating Gradients to Rules: Towards Lightweight, Adaptive Provenance-based Intrusion Detection

    Authors: Lingzhi Wang, Xiangmin Shen, Weijian Li, Zhenyuan Li, R. Sekar, Han Liu, Yan Chen

    Abstract: As cyber-attacks become increasingly sophisticated and stealthy, it becomes more imperative and challenging to detect intrusion from normal behaviors. Through fine-grained causality analysis, provenance-based intrusion detection systems (PIDS) demonstrated a promising capacity to distinguish benign and malicious behaviors, attracting widespread attention from both industry and academia. Among dive…

    Submitted 22 April, 2024; originally announced April 2024.

  30. arXiv:2404.13983  [pdf, other]

    cs.CV

    Structure-Aware Human Body Reshaping with Adaptive Affinity-Graph Network

    Authors: Qiwen Deng, Yangcen Liu, Wen Li, Guoqing Wang

    Abstract: Given a source portrait, the automatic human body reshaping task aims at editing it to an aesthetic body shape. As the technology has been widely used in media, several methods have been proposed mainly focusing on generating optical flow to warp the body shape. However, those previous works only consider the local transformation of different body parts (arms, torso, and legs), ignoring the global…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 11 pages

  31. arXiv:2404.13647  [pdf, other]

    cs.LG

    Mean Aggregator Is More Robust Than Robust Aggregators Under Label Poisoning Attacks

    Authors: Jie Peng, Weiyu Li, Qing Ling

    Abstract: Robustness to malicious attacks is of paramount importance for distributed learning. Existing works often consider the classical Byzantine attacks model, which assumes that some workers can send arbitrarily malicious messages to the server and disturb the aggregation steps of the distributed learning process. To defend against such worst-case Byzantine attacks, various robust aggregators have been…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  32. arXiv:2404.13311  [pdf, other]

    cs.CV

    STAT: Towards Generalizable Temporal Action Localization

    Authors: Yangcen Liu, Ziyi Liu, Yuanhao Zhai, Wen Li, David Doerman, Junsong Yuan

    Abstract: Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action instances with only video-level labels. Despite the significant progress, existing methods suffer from severe performance degradation when transferring to different distributions and thus may hardly adapt to real-world scenarios. To address this problem, we propose the Generalizable Temporal Action Localiz…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 14 pages, LaTeX

  33. arXiv:2404.11960  [pdf, other]

    cs.IR cs.AI

    Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers

    Authors: Fang Guo, Wenyu Li, Honglei Zhuang, Yun Luo, Yafu Li, Le Yan, Yue Zhang

    Abstract: The most recent pointwise Large Language Model (LLM) rankers have achieved remarkable ranking results. However, these rankers are hindered by two major drawbacks: (1) they fail to follow a standardized comparison guidance during the ranking process, and (2) they struggle with comprehensive considerations when dealing with complicated passages. To address these shortcomings, we propose to build a r…

    Submitted 18 April, 2024; originally announced April 2024.

  34. arXiv:2404.11958  [pdf, other]

    cs.CV cs.RO

    Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation

    Authors: Song Wang, Jiawei Yu, Wentong Li, Wenyu Liu, Xiaolu Liu, Junbo Chen, Jianke Zhu

    Abstract: Semantic scene completion, also known as semantic occupancy prediction, can provide dense geometric and semantic information for autonomous vehicles, which attracts the increasing attention of both academia and industry. Unfortunately, existing methods usually formulate this task as a voxel-wise classification problem and treat each voxel equally in 3D space during training. As the hard voxels hav…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  35. arXiv:2404.11903  [pdf, other]

    cs.CV

    Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition

    Authors: Xunsong Li, Pengzhan Sun, Yangcen Liu, Lixin Duan, Wen Li

    Abstract: The interactions between human and objects are important for recognizing object-centric actions. Existing methods usually adopt a two-stage pipeline, where object proposals are first detected using a pretrained detector, and then are fed to an action recognition model for extracting video features and learning the object relations for action recognition. However, since the action prior is unknown…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE Transactions on Multimedia

  36. arXiv:2404.10662  [pdf, other]

    cs.LG cs.AI

    Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay

    Authors: Jinmei Liu, Wenbin Li, Xiangyu Yue, Shilin Zhang, Chunlin Chen, Zhi Wang

    Abstract: We study continual offline reinforcement learning, a practical paradigm that facilitates forward transfer and mitigates catastrophic forgetting to tackle sequential offline tasks. We propose a dual generative replay framework that retains previous knowledge by concurrent replay of generated pseudo-data. First, we decouple the continual learning policy into a diffusion-based generative behavior mod…

    Submitted 18 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  37. arXiv:2404.10584  [pdf, other]

    cs.CV

    ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig

    Authors: Chunli Peng, Xuan Dong, Tiantian Cao, Zhengqing Li, Kun Dong, Weixin Li

    Abstract: The fusion of images from dual camera systems featuring a wide-angle and a telephoto camera has become a hotspot problem recently. By integrating simultaneously captured wide-angle and telephoto images from these systems, the resulting fused image achieves a wide field of view (FOV) coupled with high-definition quality. Existing approaches are mostly deep learning methods, and predominantly rely o…

    Submitted 29 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  38. arXiv:2404.10484  [pdf, other]

    cs.CV

    AbsGS: Recovering Fine Details for 3D Gaussian Splatting

    Authors: Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, Yong Dou

    Abstract: 3D Gaussian Splatting (3D-GS) technique couples 3D Gaussian primitives with differentiable rasterization to achieve high-quality novel view synthesis results while providing advanced real-time rendering performance. However, due to the flaw of its adaptive density control strategy in 3D-GS, it frequently suffers from over-reconstruction issue in intricate scenes containing high-frequency details,…

    Submitted 16 April, 2024; originally announced April 2024.

  39. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  40. arXiv:2404.10312  [pdf, other]

    cs.CV eess.IV

    OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

    Authors: Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

    Abstract: Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of generated images and a lack of effective out-of-domain generalization capabilities in training methods. Image generation method…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  41. arXiv:2404.10108  [pdf, other]

    cs.CV cs.LG

    GeoAI Reproducibility and Replicability: a computational and spatial perspective

    Authors: Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Peter Kedron

    Abstract: GeoAI has emerged as an exciting interdisciplinary research area that combines spatial theories and data with cutting-edge AI models to address geospatial problems in a novel, data-driven manner. While GeoAI research has flourished in the GIScience literature, its reproducibility and replicability (R&R), fundamental principles that determine the reusability, reliability, and scientific rigor of re…

    Submitted 22 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by Annals of the American Association of Geographers

  42. arXiv:2404.08016  [pdf, other]

    cs.LG

    ONNXPruner: ONNX-Based General Model Pruning Adapter

    Authors: Dongdong Ren, Wenbin Li, Tianyu Ding, Lei Wang, Qi Fan, Jing Huo, Hongbing Pan, Yang Gao

    Abstract: Recent advancements in model pruning have focused on developing new algorithms and improving upon benchmarks. However, the practical application of these algorithms across various models and platforms remains a significant challenge. To address this challenge, we propose ONNXPruner, a versatile pruning adapter designed for the ONNX format models. ONNXPruner streamlines the adaptation process acros…

    Submitted 10 April, 2024; originally announced April 2024.

  43. arXiv:2404.07919  [pdf, other]

    cs.LG cs.AI

    Low-rank Adaptation for Spatio-Temporal Forecasting

    Authors: Weilin Ruan, Wei Chen, Xilin Dang, Jianxiang Zhou, Weichuang Li, Xu Liu, Yuxuan Liang

    Abstract: Spatio-temporal forecasting is crucial in real-world dynamic systems, predicting future changes using historical data from diverse locations. Existing methods often prioritize the development of intricate neural networks to capture the complex dependencies of the data, yet their accuracy fails to show sustained improvement. Besides, these methods also overlook node heterogeneity, hindering customi…

    Submitted 11 April, 2024; originally announced April 2024.

  44. arXiv:2404.06835  [pdf, other]

    cs.CV

    Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer

    Authors: Yanqi Ge, Jiaqi Liu, Qingnan Fan, Xi Jiang, Ye Huang, Shuai Qin, Hong Gu, Wen Li, Lixin Duan

    Abstract: In this work, we target the task of text-driven style transfer in the context of text-to-image (T2I) diffusion models. The main challenge is consistent structure preservation while enabling effective style transfer effects. The past approaches in this field directly concatenate the content and style prompts for a prompt-level style injection, leading to unavoidable structure distortions. In this w…

    Submitted 10 April, 2024; originally announced April 2024.

  45. arXiv:2404.06812  [pdf, other]

    cs.CL

    Emotion-cause pair extraction method based on multi-granularity information and multi-module interaction

    Authors: Mingrui Fu, Weijiang Li

    Abstract: The purpose of emotion-cause pair extraction is to extract the pair of emotion clauses and cause clauses. On the one hand, the existing methods do not take fully into account the relationship between the emotion extraction of two auxiliary tasks. On the other hand, the existing two-stage model has the problem of error propagation. In addition, existing models do not adequately address the emotion…

    Submitted 10 April, 2024; originally announced April 2024.

  46. arXiv:2404.06787  [pdf, other]

    cs.LG cs.AI

    Private Wasserstein Distance with Random Noises

    Authors: Wenqian Li, Haozhi Wang, Zhe Huang, Yan Pang

    Abstract: Wasserstein distance is a principal measure of data divergence from a distributional standpoint. However, its application becomes challenging in the context of data privacy, where sharing raw data is restricted. Prior attempts have employed techniques like Differential Privacy or Federated optimization to approximate Wasserstein distance. Nevertheless, these approaches often lack accuracy and robu…

    Submitted 10 April, 2024; originally announced April 2024.

  47. arXiv:2404.06524  [pdf, other]

    cs.NE cs.AI

    An Enhanced Grey Wolf Optimizer with Elite Inheritance and Balance Search Mechanisms

    Authors: Jianhua Jiang, Ziying Zhao, Weihua Li, Keqin Li

    Abstract: The Grey Wolf Optimizer (GWO) is recognized as a novel meta-heuristic algorithm inspired by the social leadership hierarchy and hunting mechanism of grey wolves. It is well-known for its simple parameter setting, fast convergence speed, and strong optimization capability. In the original GWO, there are two significant design flaws in its fundamental optimization mechanisms. Problem (1): the algori…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 51 pages, 21 tables, 16 figures, journal

  48. arXiv:2404.06512  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

    Authors: Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution. Recent efforts have aimed to enhance the high-resolution understanding capabilities of LVLMs, yet they remain capped at approximately 1500 x 1500 pixels and constrained to a relatively narrow reso…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Code and models are publicly available at https://github.com/InternLM/InternLM-XComposer

  49. arXiv:2404.06330  [pdf, other]

    cs.LG cs.AI

    Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning

    Authors: Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jingyi Liu, Wenqiang Li, Meilan Hao, Shu Wei, Yusong Deng

    Abstract: The mathematical formula is the human language to describe nature and is the essence of scientific research. Finding mathematical formulas from observational data is a major demand of scientific research and a major challenge of artificial intelligence. This area is called symbolic regression. Originally symbolic regression was often formulated as a combinatorial optimization problem and solved us…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 21 pages

  50. arXiv:2404.06075  [pdf, other]

    cs.CV

    LIPT: Latency-aware Image Processing Transformer

    Authors: Junbo Qiao, Wei Li, Haizhen Xie, Hanting Chen, Yunshuai Zhou, Zhijun Tu, Jie Hu, Shaohui Lin

    Abstract: Transformer is leading a trend in the field of image processing. Despite the great success that existing lightweight image processing transformers have achieved, they are tailored to FLOPs or parameters reduction, rather than practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise the low-latency proportion LIPT block that su…

    Submitted 28 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.