
Showing 1–50 of 974 results for author: Li, R

Searching in archive cs.
  1. arXiv:2405.05512  [pdf, other]

    cs.LG cs.AI math.NA math.ST

    Characteristic Learning for Provable One Step Generation

    Authors: Zhao Ding, Chenguang Duan, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang, Pingwen Zhang

    Abstract: We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field t…

    Submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2405.04918  [pdf, other]

    cs.CV cs.AI

    Delve into Base-Novel Confusion: Redundancy Exploration for Few-Shot Class-Incremental Learning

    Authors: Haichen Zhou, Yixiong Zou, Ruixuan Li, Yuhua Li, Kui Xiao

    Abstract: Few-shot class-incremental learning (FSCIL) aims to acquire knowledge from novel classes with limited samples while retaining information about base classes. Existing methods address catastrophic forgetting and overfitting by freezing the feature extractor during novel-class learning. However, these methods usually tend to cause the confusion between base and novel classes, i.e., classifying novel…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.04823  [pdf, other]

    cs.DS

    Counting Cohesive Subgraphs with Hereditary Properties

    Authors: Rong-Hua Li, Xiaowei Ye, Fusheng Jin, Yu-Ping Wang, Ye Yuan, Guoren Wang

    Abstract: Counting small cohesive subgraphs in a graph is a fundamental operation with numerous applications in graph analysis. Previous studies on cohesive subgraph counting are mainly based on the clique model, which aim to count the number of $k$-cliques in a graph with a small $k$. However, the clique model often proves too restrictive for practical use. To address this issue, we investigate a new probl…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.04103  [pdf, other]

    cs.CV

    COM3D: Leveraging Cross-View Correspondence and Cross-Modal Mining for 3D Retrieval

    Authors: Hao Wu, Ruochong LI, Hao Wang, Hui Xiong

    Abstract: In this paper, we investigate an open research task of cross-modal retrieval between 3D shapes and textual descriptions. Previous approaches mainly rely on point cloud encoders for feature extraction, which may ignore key inherent features of 3D shapes, including depth, spatial hierarchy, geometric continuity, etc. To address this issue, we propose COM3D, making the first attempt to exploit the cr…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024 oral

  5. arXiv:2405.03712  [pdf, other]

    cs.LG cs.AI cs.CR cs.NE

    Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition

    Authors: Xiaoyan Su, Yinghao Zhu, Run Li

    Abstract: In the past, research on a single low dimensional activation function in networks has led to internal covariate shift and gradient deviation problems. A relatively small research area is how to use function combinations to provide property completion for a single activation function application. We propose a network adversarial method to address the aforementioned challenges. This is the first met…

    Submitted 4 May, 2024; originally announced May 2024.

  6. arXiv:2405.03372  [pdf, other]

    cs.NI cs.AI

    Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G

    Authors: Xiaoxue Yu, Xingfu Yi, Rongpeng Li, Fei Wang, Chenghui Peng, Zhifeng Zhao, Honggang Zhang

    Abstract: In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks like Federated Learning and Split Learning often struggle with significant challenges in dynamic network environments including high synchronization demands, cos…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures

  7. arXiv:2405.03205  [pdf, other]

    cs.CL cs.AI cs.LG

    Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions

    Authors: Ruizhe Li, Yanjun Gao

    Abstract: Large Language Models (LLMs), such as the GPT-4 and LLaMA families, have demonstrated considerable success across diverse tasks, including multiple-choice questions (MCQs). However, these models exhibit a positional bias, particularly an even worse anchored bias in the GPT-2 family, where they consistently favour the first choice 'A' in MCQs during inference. This anchored bias challenges the inte…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Work in process

  8. arXiv:2405.02357  [pdf, other]

    cs.LG

    Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks

    Authors: Zijian Zhang, Yujie Sun, Zepu Wang, Yuqi Nie, Xiaobo Ma, Peng Sun, Ruolin Li

    Abstract: Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban plan…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 9 pages

  9. arXiv:2404.17590  [pdf, other]

    cs.IR cs.AI

    Leveraging Intra-modal and Inter-modal Interaction for Multi-Modal Entity Alignment

    Authors: Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan

    Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entity pairs across different multi-modal knowledge graphs (MMKGs). Existing approaches focus on how to better encode and aggregate information from different modalities. However, it is not trivial to leverage multi-modal knowledge in entity alignment due to the modal heterogeneity. In this paper, we propose a Multi-Grained Interactio…

    Submitted 19 April, 2024; originally announced April 2024.

  10. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  11. arXiv:2404.16824  [pdf, other]

    cs.CV

    V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

    Authors: Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

    Abstract: AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the line between reality and fiction, posing challenges in multimedia forensics. To solve this urgent issue, V2A-Mark is proposed to address the limitations of current video tampering forensics, such as poor generalizabili…

    Submitted 25 April, 2024; originally announced April 2024.

  12. arXiv:2404.16221  [pdf, other]

    cs.CV cs.DC cs.GR

    NeRF-XL: Scaling NeRFs with Multiple GPUs

    Authors: Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams

    Abstract: We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improve…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Webpage: https://research.nvidia.com/labs/toronto-ai/nerfxl/

  13. arXiv:2404.15238  [pdf, other]

    cs.CL cs.AI

    CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies

    Authors: Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Chunhua yu, Raya Horesh, Rogério Abreu de Paula, Diyi Yang

    Abstract: To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 32 pages, 7 figures, preprint

  14. arXiv:2404.14835  [pdf, other]

    cs.CV

    Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking

    Authors: Kexin Meng, Ruirui Li, Daguang Jiang

    Abstract: Human pose estimation is a fundamental and challenging task in computer vision. Larger-scale and more accurate keypoint annotations, while helpful for improving the accuracy of supervised pose estimation, are often expensive and difficult to obtain. Semi-supervised pose estimation tries to leverage a large amount of unlabeled data to improve model performance, which can alleviate the problem of in…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: China Multimedia 2023

  15. arXiv:2404.14815  [pdf, other]

    cs.LG

    Time-aware Heterogeneous Graph Transformer with Adaptive Attention Merging for Health Event Prediction

    Authors: Shibo Li, Hengliang Cheng, Runze Li, Weihua Li

    Abstract: The widespread application of Electronic Health Records (EHR) data in the medical field has led to early successes in disease risk prediction using deep learning methods. These methods typically require extensive data for training due to their large parameter sets. However, existing works do not exploit the full potential of EHR data. A significant challenge arises from the infrequent occurrence o…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 38 pages, 7 figures, 5 tables

  16. arXiv:2404.14061  [pdf, other]

    cs.LG cs.AI cs.DB cs.SI

    FedTAD: Topology-aware Data-free Knowledge Distillation for Subgraph Federated Learning

    Authors: Yinlin Zhu, Xunkai Li, Zhengyu Wu, Di Wu, Miao Hu, Rong-Hua Li

    Abstract: Subgraph federated learning (subgraph-FL) is a new distributed paradigm that facilitates the collaborative training of graph neural networks (GNNs) by multi-client subgraphs. Unfortunately, a significant challenge of subgraph-FL arises from subgraph heterogeneity, which stems from node and topology variation, causing the impaired performance of the global GNN. Despite various studies, they have no…

    Submitted 25 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  17. arXiv:2404.13501  [pdf, other]

    cs.AI

    A Survey on the Memory Mechanism of Large Language Model based Agents

    Authors: Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen

    Abstract: Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the m…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 39 pages, 5 figures, 4 tables

  18. arXiv:2404.12739  [pdf, other]

    cs.CV

    The Solution for the CVPR2024 NICE Image Captioning Challenge

    Authors: Longfei Huang, Shupeng Zhong, Xiangyu Wu, Ruoxuan Li

    Abstract: This report introduces a solution to the Topic 1 Zero-shot Image Captioning of 2024 NICE : New frontiers for zero-shot Image Captioning Evaluation. In contrast to NICE 2023 datasets, this challenge involves new annotations by humans with significant differences in caption style and content. Therefore, we enhance image captions effectively through retrieval augmentation and caption grading methods.…

    Submitted 29 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  19. arXiv:2404.12738  [pdf, other]

    cs.NI cs.CR

    DeviceRadar: Online IoT Device Fingerprinting in ISPs using Programmable Switches

    Authors: Ruoyu Li, Qing Li, Tao Lin, Qingsong Zou, Dan Zhao, Yucheng Huang, Gareth Tyson, Guorui Xie, Yong Jiang

    Abstract: Device fingerprinting can be used by Internet Service Providers (ISPs) to identify vulnerable IoT devices for early prevention of threats. However, due to the wide deployment of middleboxes in ISP networks, some important data, e.g., 5-tuples and flow statistics, are often obscured, rendering many existing approaches invalid. It is further challenged by the high-speed traffic of hundreds of teraby…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE/ACM Transactions on Networking (ToN)

  20. arXiv:2404.12693  [pdf, other]

    cs.CV cs.LG

    Improving Chinese Character Representation with Formation Tree

    Authors: Yang Hong, Yinfei Li, Xiaojun Qiao, Rui Li, Junsong Zhang

    Abstract: Learning effective representations for Chinese characters presents unique challenges, primarily due to the vast number of characters and their continuous growth, which requires models to handle an expanding category space. Additionally, the inherent sparsity of character usage complicates the generalization of learned representations. Prior research has explored radical-based sequences to overcome…

    Submitted 19 April, 2024; originally announced April 2024.

  21. arXiv:2404.11797  [pdf, other]

    cs.CV cs.AI cs.LG

    When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery

    Authors: Yiqun Xie, Zhihao Wang, Weiye Chen, Zhili Li, Xiaowei Jia, Yanhua Li, Ruichen Wang, Kangyang Chai, Ruohan Li, Sergii Skakun

    Abstract: Foundation models, i.e., very large deep learning models, have demonstrated impressive performances in various language and vision tasks that are otherwise difficult to reach using smaller-size models. The major success of GPT-type of language models is particularly exciting and raises expectations on the potential of foundation models in other domains including satellite remote sensing. In this c…

    Submitted 17 April, 2024; originally announced April 2024.

  22. arXiv:2404.11068  [pdf, other]

    cs.LG cs.AI cs.DC q-bio.QM

    ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

    Authors: Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch

    Abstract: AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training code. OpenFold is the first trainable public reimplementation of AlphaFold. AlphaFold training procedure is prohibitively time-consuming, and gets diminishing benefits from scaling to more compute res…

    Submitted 17 April, 2024; originally announced April 2024.

  23. arXiv:2404.10354  [pdf]

    q-bio.QM cs.CE cs.LG

    Physical formula enhanced multi-task learning for pharmacokinetics prediction

    Authors: Ruifeng Li, Dongzhan Zhou, Ancheng Shen, Ao Zhang, Mao Su, Mingqian Li, Hongyang Chen, Gang Chen, Yin Zhang, Shufei Zhang, Yuqiang Li, Wanli Ouyang

    Abstract: Artificial intelligence (AI) technology has demonstrated remarkable potential in drug discovery, where pharmacokinetics plays a crucial role in determining the dosage, safety, and efficacy of new drugs. A major challenge for AI-driven drug discovery (AIDD) is the scarcity of high-quality data, which often requires extensive wet-lab work. A typical example of this is pharmacokinetic experiments. I…

    Submitted 16 April, 2024; originally announced April 2024.

  24. arXiv:2404.10312  [pdf, other]

    cs.CV eess.IV

    OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

    Authors: Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

    Abstract: Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of generated images and a lack of effective out-of-domain generalization capabilities in training methods. Image generation method…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  25. arXiv:2404.09848  [pdf, other]

    cs.AI cs.LG

    HyperMono: A Monotonicity-aware Approach to Hyper-Relational Knowledge Representation

    Authors: Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan

    Abstract: In a hyper-relational knowledge graph (HKG), each fact is composed of a main triple associated with attribute-value qualifiers, which express additional factual knowledge. The hyper-relational knowledge graph completion (HKGC) task aims at inferring plausible missing links in a HKG. Most existing approaches to HKGC focus on enhancing the communication between qualifier pairs and main triples, whil…

    Submitted 15 April, 2024; originally announced April 2024.

  26. arXiv:2404.09506  [pdf, other]

    cs.IT eess.SP

    Performance analysis of satellite-terrestrial integrated radio access networks based on stochastic geometry

    Authors: Yaohua Sun, Ruiwen Li

    Abstract: To enhance coverage and improve service continuity, satellite-terrestrial integrated radio access network (STIRAN) has been seen as an essential trend in the development of 6G. However, there is still a lack of theoretical analysis on its coverage performance. To fill this gap, we first establish a system model to characterize a typical scenario where low-earth-orbit (LEO) satellites and terrestri…

    Submitted 15 April, 2024; originally announced April 2024.

  27. arXiv:2404.09447  [pdf, other]

    cs.CV cs.LG

    kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies

    Authors: Zhongrui Gui, Shuyang Sun, Runjia Li, Jianhao Yuan, Zhaochong An, Karsten Roth, Ameya Prabhu, Philip Torr

    Abstract: Rapid advancements in continual segmentation have yet to bridge the gap of scaling to large continually expanding vocabularies under compute-constrained scenarios. We discover that traditional continual training leads to catastrophic forgetting under compute constraints, unable to outperform zero-shot segmentation methods. We introduce a novel strategy for semantic and panoptic segmentation with z…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 10 pages, 3 figures

  28. arXiv:2404.09313  [pdf, other]

    eess.AS cs.AI

    Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

    Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

    Abstract: A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention was paid to explore song synthesis. In this work, we propose a novel task called text-to-song synthesis which incorporating both vocals and accompaniments generation. We develop Melodist, a two-stage text-to-song method that consi…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  29. arXiv:2404.07992  [pdf, other]

    cs.CV

    GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

    Authors: Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang

    Abstract: Matching cost aggregation plays a fundamental role in learning-based multi-view stereo networks. However, directly aggregating adjacent costs can lead to suboptimal results due to local geometric inconsistency. Related methods either seek selective aggregation or improve aggregated depth in the 2D space, both are unable to handle geometric inconsistency in the cost volume effectively. In this pape…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page: https://wuuu3511.github.io/gomvs/ Code: https://github.com/Wuuu3511/GoMVS

  30. arXiv:2404.07696  [pdf, other]

    cs.LG cs.CV

    Flatness Improves Backbone Generalisation in Few-shot Classification

    Authors: Rui Li, Martin Trapp, Marcus Klasson, Arno Solin

    Abstract: Deployment of deep neural networks in real-world settings typically requires adaptation to new tasks with few examples. Few-shot classification (FSC) provides a solution to this problem by leveraging pre-trained backbones for fast adaptation to new classes. Surprisingly, most efforts have only focused on developing architectures for easing the adaptation to the target domain without considering th…

    Submitted 11 April, 2024; originally announced April 2024.

  31. arXiv:2404.07473  [pdf]

    eess.IV cs.CV cs.LG

    LUCF-Net: Lightweight U-shaped Cascade Fusion Network for Medical Image Segmentation

    Authors: Songkai Sun, Qingshan She, Yuliang Ma, Rihui Li, Yingchun Zhang

    Abstract: In this study, the performance of existing U-shaped neural network architectures was enhanced for medical image segmentation by adding Transformer. Although Transformer architectures are powerful at extracting global information, its ability to capture local information is limited due to its high complexity. To address this challenge, we proposed a new lightweight U-shaped cascade fusion network (…

    Submitted 11 April, 2024; originally announced April 2024.

  32. arXiv:2404.06753  [pdf, other]

    cs.CV

    MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB Views

    Authors: Runfa Li, Upal Mahbub, Vasudev Bhaskaran, Truong Nguyen

    Abstract: Current monocular 3D scene reconstruction (3DR) works are either fully-supervised, or not generalizable, or implicit in 3D representation. We propose a novel framework - MonoSelfRecon that for the first time achieves explicit 3D mesh reconstruction for generalizable indoor scenes with monocular RGB views by purely self-supervision on voxel-SDF (signed distance function). MonoSelfRecon follows an A…

    Submitted 10 April, 2024; originally announced April 2024.

  33. arXiv:2404.06325  [pdf, other]

    cs.AI

    Automatically Learning HTN Methods from Landmarks

    Authors: Ruoxi Li, Dana Nau, Mark Roberts, Morgan Fine-Morris

    Abstract: Hierarchical Task Network (HTN) planning usually requires a domain engineer to provide manual input about how to decompose a planning problem. Even HTN-MAKER, a well-known method-learning algorithm, requires a domain engineer to annotate the tasks with information about what to learn. We introduce CURRICULAMA, an HTN method learning algorithm that completely automates the learning process. It uses…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to FLAIRS-24

  34. arXiv:2404.04997  [pdf, other]

    cs.LG cs.AI cs.CL

    Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

    Authors: Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd

    Abstract: The rapid advancement of Large Language Models (LLMs) has inaugurated a transformative epoch in natural language processing, fostering unprecedented proficiency in text generation, comprehension, and contextual scrutiny. Nevertheless, effectively handling extensive contexts, crucial for myriad applications, poses a formidable obstacle owing to the intrinsic constraints of the models' context windo…

    Submitted 18 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Image Processing and Computer Applications (IPCA 2024)

  35. arXiv:2404.04800  [pdf, other]

    cs.LG cs.CV stat.ML

    Coordinated Sparse Recovery of Label Noise

    Authors: Yukun Yang, Naihao Wang, Haixin Yang, Ruirui Li

    Abstract: Label noise is a common issue in real-world datasets that inevitably impacts the generalization of models. This study focuses on robust classification tasks where the label noise is instance-dependent. Estimating the transition matrix accurately in this task is challenging, and methods based on sample selection often exhibit confirmation bias to varying degrees. Sparse over-parameterized training…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Pre-print prior to submission to journal

  36. arXiv:2404.04579  [pdf, other]

    cs.HC

    TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion

    Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong

    Abstract: Telepresence robots can be used to support users to navigate an environment remotely and share the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, this does not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduc…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 33 pages, 12 figures

    MSC Class: H.5.2

    Journal ref: IMUWT 2024

  37. arXiv:2404.03658  [pdf, other]

    cs.CV

    Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

    Authors: Rui Li, Tobias Fischer, Mattia Segu, Marc Pollefeys, Luc Van Gool, Federico Tombari

    Abstract: Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page: https://ruili3.github.io/kyn

  38. arXiv:2404.01780  [pdf, other]

    astro-ph.IM astro-ph.GA cs.CV

    CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

    Authors: Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban, Yuedong Fang, Qi Guo, Dezi Liu, Guoliang Li, Lin Lin, Ming Li, Ran Li, Xiaobo Li, Yu Luo, Xianmin Meng, Jundan Nie, Zhaoxiang Qi, Yisheng Qiu, Li Shao, Hao Tian, et al. (7 additional authors not shown)

    Abstract: Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to extract them from billions of astronomical objects. The existing mainstream lens-finding tools are based on machine learning algorithms and applied to…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: The paper is accepted by the AJ. The complete code could be downloaded with DOI of: 10.12149/101393. Comments are welcome

  39. arXiv:2404.01642  [pdf, ps, other]

    cs.LG cs.CR

    ADVREPAIR:Provable Repair of Adversarial Attack

    Authors: Zhiming Chi, Jianan Ma, Pengfei Yang, Cheng-Chao Huang, Renjue Li, Xiaowei Huang, Lijun Zhang

    Abstract: Deep neural networks (DNNs) are increasingly deployed in safety-critical domains, but their vulnerability to adversarial attacks poses serious safety risks. Existing neuron-level methods using limited data lack efficacy in fixing adversaries due to the inherent complexity of adversarial attack mechanisms, while adversarial training, leveraging a large number of adversarial samples to enhance robus…

    Submitted 2 April, 2024; originally announced April 2024.

  40. arXiv:2404.01153  [pdf, other]

    stat.ML cs.DC cs.LG math.ST stat.ME

    TransFusion: Covariate-Shift Robust Transfer Learning for High-Dimensional Regression

    Authors: Zelin He, Ying Sun, Jingyuan Liu, Runze Li

    Abstract: The main challenge that sets transfer learning apart from traditional supervised learning is the distribution shift, reflected as the shift between the source and target models and that between the marginal covariate distributions. In this work, we tackle model shifts in the presence of covariate shifts in the high-dimensional regression setting. Specifically, we propose a two-step method with a n…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

  41. arXiv:2404.00909  [pdf, other]

    cs.CV

    Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

    Authors: Rongjie Li, Yu Wu, Xuming He

    Abstract: Generative vision-language models (VLMs) have shown impressive performance in zero-shot vision-language tasks like image captioning and visual question answering. However, improving their zero-shot reasoning typically requires second-stage instruction tuning, which relies heavily on human-labeled or large language model-generated annotation, incurring high labeling costs. To tackle this challenge,…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  42. arXiv:2404.00906  [pdf, other]

    cs.CV

    From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

    Authors: Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He

    Abstract: Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks. Despite recent advancements, existing methods struggle to generate scene graphs with novel visual relation concepts. To address this challenge, we introduce a new open-vocabulary SGG framework based on sequence generation. Our framework leverages vision-language pre-t…

    Submitted 24 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  43. arXiv:2403.20168  [pdf, other]

    eess.IV cs.CV

    Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation

    Authors: Chuan Huang, Jia Wei, Rui Li

    Abstract: Multi-modal brain images from MRI scans are widely used in clinical diagnosis to provide complementary information from different modalities. However, obtaining fully paired multi-modal images in practice is challenging due to various factors, such as time, cost, and artifacts, resulting in modality-missing brain images. To address this problem, unsupervised multi-modal brain image translation has…

    Submitted 24 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures. It has been provisionally accepted for IJCNN 2024

  44. arXiv:2403.19248  [pdf, other]

    cs.CR cs.NI

    Genos: General In-Network Unsupervised Intrusion Detection by Rule Extraction

    Authors: Ruoyu Li, Qing Li, Yu Zhang, Dan Zhao, Xi Xiao, Yong Jiang

    Abstract: Anomaly-based network intrusion detection systems (A-NIDS) use unsupervised models to detect unforeseen attacks. However, existing A-NIDS solutions suffer from low throughput, lack of interpretability, and high maintenance costs. Recent in-network intelligence (INI) exploits programmable switches to offer line-rate deployment of NIDS. Nevertheless, current in-network NIDS are either model-specific…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE International Conference on Computer Communications (INFOCOM 2024)

  45. arXiv:2403.18341  [pdf, other]

    cs.CL

    IterAlign: Iterative Constitutional Alignment of Large Language Models

    Authors: Xiusi Chen, Hongzhi Wen, Sreyashi Nag, Chen Luo, Qingyu Yin, Ruirui Li, Zheng Li, Wei Wang

    Abstract: With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning with human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are l…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  46. arXiv:2403.18253  [pdf, other]

    cs.CL

    Enhancing Metaphor Detection through Soft Labels and Target Word Prediction

    Authors: Kaidi Jia, Rongsheng Li

    Abstract: Metaphors play a significant role in our everyday communication, yet detecting them presents a challenge. Traditional methods often struggle with improper application of language rules and a tendency to overlook data sparsity. To address these issues, we integrate knowledge distillation and prompt learning into metaphor detection. Our approach revolves around a tailored prompt learning framework s…

    Submitted 8 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  47. arXiv:2403.16204  [pdf, other]

    cs.CL cs.DB cs.HC

    SQL-Encoder: Improving NL2SQL In-Context Learning Through a Context-Aware Encoder

    Authors: Mohammadreza Pourreza, Davood Rafiei, Yuxi Feng, Raymond Li, Zhenan Fan, Weiwei Zhang

    Abstract: Detecting structural similarity between queries is essential for selecting examples in in-context learning models. However, assessing structural similarity based solely on the natural language expressions of queries, without considering SQL queries, presents a significant challenge. This paper explores the significance of this similarity metric and proposes a model for accurately estimating it. To…

    Submitted 24 March, 2024; originally announced March 2024.

  48. arXiv:2403.16137  [pdf, other]

    cs.LG cs.SI

    A Survey on Self-Supervised Pre-Training of Graph Foundation Models: A Knowledge-Based Perspective

    Authors: Ziwen Zhao, Yuhua Li, Yixiong Zou, Ruixuan Li, Rui Zhang

    Abstract: Graph self-supervised learning is now a go-to method for pre-training graph foundation models, including graph neural networks, graph transformers, and more recent large language model (LLM)-based graph models. There is a wide variety of knowledge patterns embedded in the structure and properties of graphs which may be used for pre-training, but we lack a systematic overview of self-supervised pre…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Work in progress

  49. arXiv:2403.15770  [pdf, other]

    eess.IV cs.CV

    Graph Image Prior for Unsupervised Dynamic MRI Reconstruction

    Authors: Zhongsen Li, Wenxuan Chen, Shuai Wang, Chuyu Liu, Rui Li

    Abstract: The inductive bias of the convolutional neural network (CNN) can act as a strong prior for image restoration, which is known as the Deep Image Prior (DIP). In recent years, DIP has been utilized in unsupervised dynamic MRI reconstruction, which adopts a generative model from the latent space to the image space. However, existing methods usually utilize a single pyramid-shaped CNN architecture to p…

    Submitted 23 March, 2024; originally announced March 2024.

  50. arXiv:2403.15382  [pdf, other]

    cs.CV

    DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

    Authors: Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi

    Abstract: We introduce DragAPart, a method that, given an image and a set of drags as input, can generate a new image of the same object in a new state, compatible with the action of the drags. Differently from prior works that focused on repositioning objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. We study this problem as a proxy for learning a generalist motion…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://dragapart.github.io/