Showing 1–50 of 1,305 results for author: Sun, J

Searching in archive cs.
  1. arXiv:2405.05587  [pdf, other]

    cs.CV cs.LG

    Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse

    Authors: Yining Wang, Junjie Sun, Chenyue Wang, Mi Zhang, Min Yang

    Abstract: Recent studies have noted an intriguing phenomenon termed Neural Collapse, that is, when the neural networks establish the right correlation between feature spaces and the training targets, their last-layer features, together with the classifier weights, will collapse into a stable and symmetric structure. In this paper, we extend the investigation of Neural Collapse to the biased datasets with im…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: CVPR 2024 Highlight

  2. arXiv:2405.05160  [pdf, other]

    cs.LG cs.AI cs.CV

    Selective Classification Under Distribution Shifts

    Authors: Hengyue Liang, Le Peng, Ju Sun

    Abstract: In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers -- imperfect either due to intrinsic statistical noise of data or for robustness issue of the classifier or beyond -- in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Total 25 pages (14 pages for main body); preprint for journal submission

  3. arXiv:2405.04662  [pdf, other]

    cs.CV

    Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar

    Authors: David Borts, Erich Liang, Tim Brödermann, Andrea Ramazzina, Stefanie Walz, Edoardo Palladin, Jipeng Sun, David Bruggemann, Christos Sakaridis, Luc Van Gool, Mario Bijelic, Felix Heide

    Abstract: Neural fields have been broadly investigated as scene representations for the reproduction and novel generation of diverse outdoor scenes, including those autonomous vehicles and robots must handle. While successful approaches for RGB and LiDAR data exist, neural reconstruction methods for radar as a sensing modality have been largely unexplored. Operating at millimeter wavelengths, radar sensors…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, to be published in SIGGRAPH 2024

  4. Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

    Authors: Yubo Mai, Zhipeng Gao, Xing Hu, Lingfeng Bao, Yu Liu, Jianling Sun

    Abstract: Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manual crafting rules and can be easily deployed on personal computers without relying on other external tools. Sp…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2405.03273  [pdf, other]

    cs.RO

    Evaluation of Drivers' Interaction Ability at Social Scenarios: A Process-Based Framework

    Authors: Jiaqi Liu, Peng Hang, Xiangwang Hu, Jian Sun

    Abstract: Assessing drivers' interaction capabilities is crucial for understanding human driving behavior and enhancing the interactive abilities of autonomous vehicles. In scenarios involving strong interaction, existing metrics focused on interaction outcomes struggle to capture the evolutionary process of drivers' interactive behaviors, making it challenging for autonomous vehicles to dynamically assess…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.03185  [pdf, other]

    cs.LG

    Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner

    Authors: Tong Nie, Guoyang Qin, Wei Ma, Jian Sun

    Abstract: Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system. Existing methods aim to reconstruct STTD using low-dimensional models. However, they are limited to data-specific dimensions or source-dependent patterns, restricting them from unifying representations. Here, we present a novel paradigm to address the STTD learning problem by parame…

    Submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2405.02843  [pdf, other]

    cs.CV

    Residual-Conditioned Optimal Transport: Towards Structure-preserving Unpaired and Paired Image Restoration

    Authors: Xiaole Tang, Xin Hu, Xiang Gu, Jian Sun

    Abstract: Deep learning-based image restoration methods have achieved promising performance. However, how to faithfully preserve the structure of the original image remains challenging. To address this challenge, we propose a novel Residual-Conditioned Optimal Transport (RCOT) approach, which models the image restoration as an optimal transport (OT) problem for both unpaired and paired settings, integrating…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  8. arXiv:2405.01216  [pdf, other]

    cs.CL cs.AI

    DMON: A Simple yet Effective Approach for Argument Structure Learning

    Authors: Wei Sun, Mingxiao Li, Jingyuan Sun, Jesse Davis, Marie-Francine Moens

    Abstract: Argument structure learning (ASL) entails predicting relations between arguments. Because it can structure a document to facilitate its understanding, it has been widely applied in many fields (medical, commercial, and scientific domains). Despite its broad utilization, ASL remains a challenging task because it involves examining the complex relationships between the sentences in a potentially uns…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: COLING 2024

  9. arXiv:2405.00797  [pdf, other]

    cs.RO cs.CV

    ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

    Authors: Jiahui Li, Tianle Shen, Zekai Gu, Jiawei Sun, Chengran Yuan, Yuhang Han, Shuo Sun, Marcelo H. Ang Jr

    Abstract: Motion prediction is a challenging problem in autonomous driving as it demands the system to comprehend stochastic dynamics and the multi-modal nature of real-world agent interactions. Diffusion models have recently risen to prominence, and have proven particularly effective in pedestrian motion prediction tasks. However, the significant time consumption and sensitivity to noise have limited the r…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  10. arXiv:2405.00715  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation

    Authors: Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Korsapati, Jimeng Sun

    Abstract: Large Language Models (LLMs) have shown promising capabilities in handling clinical text summarization tasks. In this study, we demonstrate that a small open-source LLM can be effectively trained to generate high-quality clinical notes from outpatient patient-doctor dialogues. We achieve this through a comprehensive domain- and task-specific adaptation process for the LLaMA-2 13 billion parameter…

    Submitted 25 April, 2024; originally announced May 2024.

  11. arXiv:2404.18534  [pdf, other]

    cs.CL cs.AI cs.CR cs.SE

    Evaluating and Mitigating Linguistic Discrimination in Large Language Models

    Authors: Guoliang Dong, Haoyu Wang, Jun Sun, Xinyu Wang

    Abstract: By training on text in various languages, large language models (LLMs) typically possess multilingual support and demonstrate remarkable capabilities in solving tasks described in different languages. However, LLMs can exhibit linguistic discrimination due to the uneven distribution of training data across languages. That is, LLMs are hard to keep the consistency of responses when faced with the s…

    Submitted 29 April, 2024; originally announced April 2024.

  12. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language process tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 7 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  13. arXiv:2404.18191  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG math.OC

    Exploring the Robustness of In-Context Learning with Noisy Labels

    Authors: Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei

    Abstract: Recently, the mysterious In-Context Learning (ICL) ability exhibited by Transformer architectures, especially in large language models (LLMs), has sparked significant research interest. However, the resilience of Transformers' in-context learning capabilities in the presence of noisy samples, prevalent in both training corpora and prompt demonstrations, remains underexplored. In this paper, inspir…

    Submitted 1 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models

  14. arXiv:2404.18166  [pdf, other]

    cs.IR

    Behavior-Contextualized Item Preference Modeling for Multi-Behavior Recommendation

    Authors: Mingshi Yan, Fan Liu, Jing Sun, Fuming Sun, Zhiyong Cheng, Yahong Han

    Abstract: In recommender systems, multi-behavior methods have demonstrated their effectiveness in mitigating issues like data sparsity, a common challenge in traditional single-behavior recommendation approaches. These methods typically infer user preferences from various auxiliary behaviors and apply them to the target behavior for recommendations. However, this direct transfer can introduce noise to the t…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by SIGIR 2024

  15. arXiv:2404.17275  [pdf, other]

    cs.CV cs.LG

    Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation

    Authors: Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu

    Abstract: The practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with $α$-Power Maximization (ARPM), for PDA where the source domain contains private classes absent in target domain. In ARPM, we propos…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: To appear in IJCV

  16. arXiv:2404.16887  [pdf, other]

    cs.LG cs.AI

    Anomaly Detection for Incident Response at Scale

    Authors: Hanzhang Wang, Gowtham Kumar Tangirala, Gilkara Pranav Naidu, Charles Mayville, Arighna Roy, Joanne Sun, Ramesh Babu Mandava

    Abstract: We present a machine learning-based anomaly detection product, AI Detect and Respond (AIDR), that monitors Walmart's business and system health in real-time. During the validation over 3 months, the product served predictions from over 3000 models to more than 25 application, platform, and operation teams, covering 63% of major incidents and reducing the mean-time-to-detect (MTTD) by more than 7…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: ASPLOS 2024 AIOps workshop

  17. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  18. arXiv:2404.16561  [pdf]

    cs.CV

    Research on geometric figure classification algorithm based on Deep Learning

    Authors: Ruiyang Wang, Haonan Wang, Junfeng Sun, Mingjia Zhao, Meng Liu

    Abstract: In recent years, with the rapid development of computer information technology, the development of artificial intelligence has been accelerating. The traditional geometry recognition technology is relatively backward and the recognition rate is low. In the face of massive information database, the traditional algorithm model inevitably has the problems of low recognition accuracy and poor performa…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 6 pages, 9 figures

    Report number: ISSN: 2664-9640

    Journal ref: Scientific Journal of Intelligent Systems Research, Volume 4, Issue 6, 2022

  19. arXiv:2404.16362  [pdf, other]

    cs.CR

    Feature graph construction with static features for malware detection

    Authors: Binghui Zou, Chunjie Cao, Longjuan Wang, Yinan Cheng, Jingzhang Sun

    Abstract: Malware can greatly compromise the integrity and trustworthiness of information and is in a constant state of evolution. Existing feature fusion-based detection methods generally overlook the correlation between features. And mere concatenation of features will reduce the model's characterization ability, lead to low detection accuracy. Moreover, these methods are susceptible to concept drift and…

    Submitted 25 April, 2024; originally announced April 2024.

  20. arXiv:2404.15696  [pdf, other]

    cs.RO

    Delay-Aware Multi-Agent Reinforcement Learning for Cooperative Adaptive Cruise Control with Model-based Stability Enhancement

    Authors: Jiaqi Liu, Ziran Wang, Peng Hang, Jian Sun

    Abstract: Cooperative Adaptive Cruise Control (CACC) represents a quintessential control strategy for orchestrating vehicular platoon movement within Connected and Automated Vehicle (CAV) systems, significantly enhancing traffic efficiency and reducing energy consumption. In recent years, the data-driven methods, such as reinforcement learning (RL), have been employed to address this task due to their signi…

    Submitted 24 April, 2024; originally announced April 2024.

  21. arXiv:2404.15341  [pdf, other]

    eess.SP cs.LG

    Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise

    Authors: Jing-Xiao Liao, Chao He, Jipu Li, Jinwei Sun, Shiping Zhang, Xiaoge Zhang

    Abstract: Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional…

    Submitted 10 April, 2024; originally announced April 2024.

  22. arXiv:2404.14719  [pdf, other]

    cs.CR

    Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs

    Authors: Ruitong Liu, Yanbin Wang, Haitao Xu, Bin Liu, Jianguo Sun, Zhenhao Guo, Wenrui Ma

    Abstract: Currently, deep learning successfully applies to code vulnerability detection by learning from code sequences or property graphs. However, sequence-based methods often overlook essential code attributes such as syntax, control flow, and data dependencies, whereas graph-based approaches might underestimate the semantics of code and face challenges in capturing long-distance contextual information.…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

  23. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  24. arXiv:2404.13896  [pdf, other]

    cs.CV

    CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory

    Authors: Yunlong Ran, Yanxu Li, Qi Ye, Yuchi Huo, Zechun Bai, Jiahao Sun, Jiming Chen

    Abstract: Neural radiance field (NeRF) has achieved impressive results in high-quality 3D scene reconstruction. However, NeRF heavily relies on precise camera poses. While recent works like BARF have introduced camera pose optimization within NeRF, their applicability is limited to simple trajectory scenes. Existing methods struggle while tackling complex trajectories involving large rotations. To address t…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  25. arXiv:2404.13752  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR math.OC

    Towards General Conceptual Model Editing via Adversarial Representation Engineering

    Authors: Yihao Zhang, Zeming Wei, Jun Sun, Meng Sun

    Abstract: Recent research has introduced Representation Engineering (RepE) as a promising approach for understanding complex inner workings of large-scale models like Large Language Models (LLMs). However, finding practical and efficient methods to apply these representations for general and flexible model editing remains an open problem. Inspired by the Generative Adversarial Network (GAN) framework, we in…

    Submitted 21 April, 2024; originally announced April 2024.

  26. arXiv:2404.13584  [pdf, other]

    cs.CV cs.LG

    Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning

    Authors: Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo

    Abstract: Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the q…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by CVIU

  27. arXiv:2404.12355  [pdf, other]

    cs.LG math.NA

    Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation

    Authors: Jingmin Sun, Yuxuan Liu, Zecheng Zhang, Hayden Schaeffer

    Abstract: Foundation models, such as large language models, have demonstrated success in addressing various language and image processing tasks. In this work, we introduce a multi-modal foundation model for scientific problems, named PROSE-PDE. Our model, designed for bi-modality to bi-modality learning, is a multi-operator learning approach which can predict future states of spatiotemporal systems while co…

    Submitted 19 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  28. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  29. arXiv:2404.11052  [pdf, other]

    cs.CV cs.LG

    Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification

    Authors: Mohammad Shiri, Monalika Padma Reddy, Jiangwen Sun

    Abstract: Invasive ductal carcinoma (IDC) is the most prevalent form of breast cancer. Breast tissue histopathological examination is critical in diagnosing and classifying breast cancer. Although existing methods have shown promising results, there is still room for improvement in the classification accuracy and generalization of IDC using histopathology images. We present a novel approach, Supervised Cont…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 8 pages, 7 figures

  30. arXiv:2404.10295  [pdf, other]

    cs.RO

    ControlMTR: Control-Guided Motion Transformer with Scene-Compliant Intention Points for Feasible Motion Prediction

    Authors: Jiawei Sun, Chengran Yuan, Shuo Sun, Shanze Wang, Yuhang Han, Shuailei Ma, Zefan Huang, Anthony Wong, Keng Peng Tee, Marcelo H. Ang Jr

    Abstract: The ability to accurately predict feasible multimodal future trajectories of surrounding traffic participants is crucial for behavior planning in autonomous vehicles. The Motion Transformer (MTR), a state-of-the-art motion prediction method, alleviated mode collapse and instability during training and enhanced overall prediction performance by replacing conventional dense future endpoints with a s…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  31. arXiv:2404.10292  [pdf, other]

    cs.CV cs.MM

    From Data Deluge to Data Curation: A Filtering-WoRA Paradigm for Efficient Text-based Person Search

    Authors: Jintao Sun, Zhedong Zheng, Gangyi Ding

    Abstract: In text-based person search endeavors, data generation has emerged as a prevailing practice, addressing concerns over privacy preservation and the arduous task of manual annotation. Although the number of synthesized data can be infinite in theory, the scientific conundrum persists that how much generated data optimally fuels subsequent model training. We observe that only a subset of the data in…

    Submitted 16 April, 2024; originally announced April 2024.

  32. arXiv:2404.10218  [pdf, other]

    cs.RO cs.AI

    Autonomous Implicit Indoor Scene Reconstruction with Frontier Exploration

    Authors: Jing Zeng, Yanxu Li, Jiahao Sun, Qi Ye, Yunlong Ran, Jiming Chen

    Abstract: Implicit neural representations have demonstrated significant promise for 3D scene reconstruction. Recent works have extended their applications to autonomous implicit reconstruction through the Next Best View (NBV) based method. However, the NBV method cannot guarantee complete scene coverage and often necessitates extensive viewpoint sampling, particularly in complex scenes. In the paper, we pro…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 7 pages

    Journal ref: IEEE International Conference on Robotics and Automation (ICRA 2024)

  33. arXiv:2404.09987  [pdf, other]

    cs.CV

    OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

    Authors: Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Similar to popular LVLMs, OneChart incorpo…

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures and 6 tables

  34. arXiv:2404.09939  [pdf, other]

    cs.AI

    A Survey on Deep Learning for Theorem Proving

    Authors: Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si

    Abstract: Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in mathematical language to rigorous derivations in formal systems. In recent years, the advancement of deep learning, especially the emergence of large language models, has sparked a notable surge of research exploring these techniques to enhance the process of theorem proving. This paper presents a pioneerin…

    Submitted 15 April, 2024; originally announced April 2024.

  35. arXiv:2404.09897  [pdf, other]

    cs.AI cs.CL cs.LG

    Progressive Knowledge Graph Completion

    Authors: Jiayi Li, Ruilin Luo, Jiaqi Sun, Jing Xiao, Yujiu Yang

    Abstract: Knowledge Graph Completion (KGC) has emerged as a promising solution to address the issue of incompleteness within Knowledge Graphs (KGs). Traditional KGC research primarily centers on triple classification and link prediction. Nevertheless, we contend that these tasks do not align well with real-world scenarios and merely serve as surrogate benchmarks. In this paper, we investigate three crucial…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 10 figures

  36. arXiv:2404.09412  [pdf, other]

    cs.CV

    DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading

    Authors: Tong Wu, Jia-Mu Sun, Yu-Kun Lai, Yuewen Ma, Leif Kobbelt, Lin Gao

    Abstract: Reconstructing and editing 3D objects and scenes both play crucial roles in computer graphics and computer vision. Neural radiance fields (NeRFs) can achieve realistic reconstruction and editing results but suffer from inefficiency in rendering. Gaussian splatting significantly accelerates rendering by rasterizing Gaussian ellipsoids. However, Gaussian splatting utilizes a single Spherical Harmoni…

    Submitted 14 April, 2024; originally announced April 2024.

  37. arXiv:2404.08273  [pdf, other]

    cs.CV cs.CR

    Struggle with Adversarial Defense? Try Diffusion

    Authors: Yujie Li, Yanbin Wang, Haitao Xu, Bin Liu, Jianguo Sun, Zhenhao Guo, Wenrui Ma

    Abstract: Adversarial attacks induce misclassification by introducing subtle perturbations. Recently, diffusion models are applied to the image classifiers to improve adversarial robustness through adversarial training or by purifying adversarial noise. However, diffusion-based adversarial training often encounters convergence challenges and high computational expenses. Additionally, diffusion-based purific…

    Submitted 18 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  38. arXiv:2404.07992  [pdf, other]

    cs.CV

    GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

    Authors: Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang

    Abstract: Matching cost aggregation plays a fundamental role in learning-based multi-view stereo networks. However, directly aggregating adjacent costs can lead to suboptimal results due to local geometric inconsistency. Related methods either seek selective aggregation or improve aggregated depth in the 2D space, both are unable to handle geometric inconsistency in the cost volume effectively. In this pape…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page: https://wuuu3511.github.io/gomvs/ Code: https://github.com/Wuuu3511/GoMVS

  39. arXiv:2404.07950  [pdf, other]

    cs.CV cs.AI cs.LG

    Reinforcement Learning with Generalizable Gaussian Splatting

    Authors: Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu

    Abstract: An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. Ho…

    Submitted 18 March, 2024; originally announced April 2024.

    Comments: 7 pages, 2 figures

  40. arXiv:2404.05689  [pdf, other]

    cs.LG cs.AI

    Automated discovery of symbolic laws governing skill acquisition from naturally occurring data

    Authors: Sannyuya Liu, Qing Li, Xiaoxuan Shen, Jianwen Sun, Zongkai Yang

    Abstract: Skill acquisition is a key area of research in cognitive psychology as it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to unearth the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to tackle the issues of unobservable cognitive states and al…

    Submitted 8 April, 2024; originally announced April 2024.

  41. arXiv:2404.02936  [pdf, other]

    cs.CL cs.LG

    Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models

    Authors: Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Yang, Hai Li

    Abstract: The problem of pre-training data detection for large language models (LLMs) has received growing attention due to its implications in critical issues like copyright violation and test data contamination. The current state-of-the-art approach, Min-K%, measures the raw token probability which we argue may not be the most informative signal. Instead, we propose Min-K%++ to normalize the token probabi…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Work in progress; project page is available at https://zjysteven.github.io/mink-plus-plus/

  42. arXiv:2403.20213  [pdf, other]

    cs.CV

    H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

    Authors: Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

    Abstract: Generic large Vision-Language Models (VLMs) are developing rapidly but still perform poorly in the Remote Sensing (RS) domain, owing to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs. Existing Remote Sensing specific Vision Language Models (RSVLMs) still have considerable potential for improvement, primarily owing to the lack…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Equal contribution: Chao Pang, Jiang Wu; Corresponding author: Gui-Song Xia, Conghui He

  43. arXiv:2403.17516  [pdf, other]

    cs.CL cs.AI

    MapGuide: A Simple yet Effective Method to Reconstruct Continuous Language from Brain Activities

    Authors: Xinpei Zhao, Jingyuan Sun, Shaonan Wang, Jing Ye, Xiaohan Zhang, Chengqing Zong

    Abstract: Decoding continuous language from brain activity is a formidable yet promising field of research. It is particularly significant for aiding people with speech disabilities to communicate through brain signals. This field addresses the complex task of mapping brain signals to text. The previous best attempt reverse-engineered this process in an indirect way: it began by learning to encode brain act…

    Submitted 2 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 main conference

  44. arXiv:2403.17301  [pdf, other]

    cs.CV cs.CR

    Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving

    Authors: Junhao Zheng, Chenhao Lin, Jiahao Sun, Zhengyu Zhao, Qian Li, Chao Shen

    Abstract: Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks. Previous physical attacks against MDE models rely on 2D adversarial patches, so they only affect a small, localized region in the MDE map but fail under various viewpoints. To address these limitations, we propose 3D Depth Fool (3D$^2$Fool), the first 3…

    Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  45. arXiv:2403.15448  [pdf, other]

    eess.SP cs.LG

    What is Wrong with End-to-End Learning for Phase Retrieval?

    Authors: Wenjie Zhang, Yuxiang Wan, Zhong Zhuang, Ju Sun

    Abstract: For nonlinear inverse problems that are prevalent in imaging science, symmetries in the forward model are common. When data-driven deep learning approaches are used to solve such problems, these intrinsic symmetries can cause substantial learning difficulties. In this paper, we explain how such difficulties arise and, more importantly, how to overcome them by preprocessing the training set before…

    Submitted 17 March, 2024; originally announced March 2024.

  46. arXiv:2403.14983  [pdf, other]

    physics.soc-ph cs.SI

    Reconstructing the evolution history of networked complex systems

    Authors: Junya Wang, Yi-Jiao Zhang, Cong Xu, Jiaze Li, Jiachen Sun, Jiarong Xie, Ling Feng, Tianshou Zhou, Yanqing Hu

    Abstract: The evolution processes of complex systems carry key information about the systems' functional properties. Applying machine learning algorithms, we demonstrate that the historical formation process of various networked complex systems can be extracted, including protein-protein interaction, ecology, and social network systems. The recovered evolution process has demonstrations of immense scientific v…

    Submitted 22 March, 2024; originally announced March 2024.

  47. arXiv:2403.14112  [pdf, other]

    cs.CL

    Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

    Authors: Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He

    Abstract: We introduce CHARM, the first benchmark for comprehensive, in-depth evaluation of the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. We evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5 representative prompt strategies for improving LLMs' reasoning ability, such as Chain-of-Thought.…

    Submitted 19 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Equal contribution: Jiaxing Sun, Weiquan Huang, Jiang Wu; Corresponding author: Conghui He

  48. arXiv:2403.13368  [pdf, other]

    cs.CL cs.AI

    Computational Models to Study Language Processing in the Human Brain: A Survey

    Authors: Shaonan Wang, Jingyuan Sun, Yunhao Zhang, Nan Lin, Marie-Francine Moens, Chengqing Zong

    Abstract: Despite differing from the human language processing mechanism in implementation and algorithms, current language models demonstrate remarkable human-like or surpassing language capabilities. Should computational language models be employed in studying the brain, and if so, when and how? To delve into this topic, this paper reviews efforts in using computational models for brain research, highligh…

    Submitted 20 March, 2024; originally announced March 2024.

  49. arXiv:2403.12728  [pdf, other]

    cs.CV

    Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation

    Authors: Jingtao Sun, Yaonan Wang, Mingtao Feng, Chao Ding, Mike Zheng Shou, Ajmal Saeed Mian

    Abstract: Fully-supervised category-level pose estimation aims to determine the 6-DoF poses of unseen instances from known categories, requiring expensive manual labeling. Recently, various self-supervised category-level pose estimation methods have been proposed to reduce the requirement for annotated datasets. However, most methods rely on synthetic data or 3D CAD models for self-supervised train…

    Submitted 19 March, 2024; originally announced March 2024.

  50. arXiv:2403.11838  [pdf, other]

    cs.CL cs.AI

    Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models

    Authors: Yi Luo, Zhenghao Lin, Yuhao Zhang, Jiashuo Sun, Chen Lin, Chengjin Xu, Xiangdong Su, Yelong Shen, Jian Guo, Yeyun Gong

    Abstract: Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. One of the current alignment techniques includes principle-driven integration, but it faces challenges arising from the imprecision of manually crafted rules and inadequate risk perception in models without safety training. To address these, we introduce Guide-A…

    Submitted 23 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 main conference