Showing 1–50 of 188 results for author: Zheng, N

Searching in archive cs.
  1. arXiv:2405.03152  [pdf, other]

    eess.AS cs.SD

    MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

    Authors: Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLM), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. H…

    Submitted 6 May, 2024; originally announced May 2024.

  2. arXiv:2405.00751  [pdf, other]

    q-bio.QM cs.AI cs.LG

    F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching

    Authors: Shaoning Li, Yusong Wang, Mingyu Li, Jian Zhang, Bin Shao, Nanning Zheng, Jian Tang

    Abstract: Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties. To address exploration inefficiency, emerging enhanced sampling approaches like coarse-graining (CG) and generative models have been employed. In this work, we propose a Frame-to-Frame genera…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by ICLR 2024 GEM workshop

  3. arXiv:2404.17683  [pdf, other]

    math.OC cs.GT cs.LG eess.SY

    Energy Storage Arbitrage in Two-settlement Markets: A Transformer-Based Approach

    Authors: Saud Alghumayjan, Jiajun Han, Ningkun Zheng, Ming Yi, Bolun Xu

    Abstract: This paper presents an integrated model for bidding energy storage in day-ahead and real-time markets to maximize profits. We show that in integrated two-stage bidding, the real-time bids are independent of day-ahead settlements, while the day-ahead bids should be based on predicted real-time prices. We utilize a transformer-based model for real-time price prediction, which captures complex dynami…

    Submitted 26 April, 2024; originally announced April 2024.

  4. arXiv:2404.16811  [pdf, other]

    cs.CL cs.AI

    Make Your LLM Fully Utilize the Context

    Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

    Abstract: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on t…

    Submitted 26 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures, 3 tables, 9 examples

  5. arXiv:2404.12804  [pdf, other]

    cs.CV eess.IV

    Linearly-evolved Transformer for Pan-sharpening

    Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

    Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages

  6. arXiv:2404.10499  [pdf, other]

    cs.CV cs.AI

    Robust Noisy Label Learning via Two-Stream Sample Distillation

    Authors: Sihan Bai, Sanping Zhou, Zheng Qin, Le Wang, Nanning Zheng

    Abstract: Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning. Existing work either conducts sample selection or label correction to deal with noisy labels during the model training process. In this paper, we design a simple yet effective sample selection framework, termed Two-Stream Sample Distillation (TSSD), for noisy labe…

    Submitted 16 April, 2024; originally announced April 2024.

  7. arXiv:2404.10210  [pdf, other]

    cs.CV

    MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition

    Authors: Naichuan Zheng, Hailun Xia, Zeyu Liang

    Abstract: In recent years, skeleton-based action recognition, leveraging multimodal Graph Convolutional Networks (GCN), has achieved remarkable results. However, due to their deep structure and reliance on continuous floating-point operations, GCN-based methods are energy-intensive. To address this issue, we propose an innovative Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Disti…

    Submitted 15 April, 2024; originally announced April 2024.

  8. arXiv:2404.04880  [pdf, other]

    cs.CV

    GauU-Scene V2: Assessing the Reliability of Image-Based Metrics with Expansive Lidar Image Dataset Using 3DGS and NeRF

    Authors: Butian Xiong, Nanjun Zheng, Junhua Liu, Zhen Li

    Abstract: We introduce a novel, multimodal large-scale scene reconstruction benchmark that utilizes newly developed 3D representation approaches: Gaussian Splatting and Neural Radiance Fields (NeRF). Our expansive U-Scene dataset surpasses any previously existing real large-scale outdoor LiDAR and image dataset in both area and point count. GauU-Scene encompasses over 6.5 square kilometers and features a co…

    Submitted 13 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  9. arXiv:2404.02187  [pdf]

    cs.LG cs.AI

    A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data

    Authors: Junlan Chen, Ziyuan Pu, Nan Zheng, Xiao Wen, Hongliang Ding, Xiucheng Guo

    Abstract: Crash data is often greatly imbalanced, with the majority of crashes being non-fatal crashes, and only a small number being fatal crashes due to their rarity. Such data imbalance issue poses a challenge for crash severity modeling since it struggles to fit and interpret fatal crash outcomes with very limited samples. Usually, such data imbalance issues are addressed by data resampling methods, suc…

    Submitted 2 April, 2024; originally announced April 2024.

  10. arXiv:2403.09560  [pdf, other]

    cs.LG physics.chem-ph q-bio.BM

    Self-Consistency Training for Hamiltonian Prediction

    Authors: He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu

    Abstract: Hamiltonian prediction is a versatile formulation to leverage machine learning for solving molecular science problems. Yet, its applicability is limited by insufficient labeled data for training. In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose an exact training method that does not require labeled data. This merit addresses t…

    Submitted 14 March, 2024; originally announced March 2024.

  11. arXiv:2403.06361  [pdf, other]

    cs.CV cs.HC

    See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI

    Authors: Yulong Liu, Yongqiang Ma, Guibo Zhu, Haodong Jing, Nanning Zheng

    Abstract: Deciphering visual content from functional Magnetic Resonance Imaging (fMRI) helps illuminate the human vision system. However, the scarcity of fMRI data and noise hamper brain decoding model performance. Previous approaches primarily employ subject-specific models, sensitive to training sample size. In this paper, we explore a straightforward but overlooked solution to address data scarcity. We p…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: A versatile brain decoding method learning from cross-subject fMRI data

  12. arXiv:2403.06352  [pdf]

    cs.CV

    Exploring Hardware Friendly Bottleneck Architecture in CNN for Embedded Computing Systems

    Authors: Xing Lei, Longjun Liu, Zhiheng Zhou, Hongbin Sun, Nanning Zheng

    Abstract: In this paper, we explore how to design lightweight CNN architecture for embedded computing systems. We propose L-Mobilenet model for ZYNQ based hardware platform. L-Mobilenet can adapt well to the hardware computing and accelerating, and its network structure is inspired by the state-of-the-art work of Inception-ResnetV1 and MobilenetV2, which can effectively reduce parameters and delay while mai…

    Submitted 10 March, 2024; originally announced March 2024.

  13. arXiv:2403.04706  [pdf, other]

    cs.CL cs.AI

    Common 7B Language Models Already Possess Strong Math Capabilities

    Authors: Chen Li, Weiqi Wang, Jingcheng Hu, Yixuan Wei, Nanning Zheng, Han Hu, Zheng Zhang, Houwen Peng

    Abstract: Mathematical capabilities were previously believed to emerge in common language models only at a very large scale or require extensive math-related pre-training. This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities, as evidenced by its impressive accuracy of 97.7% and 72.0% on the GSM8K and MATH benchmarks, respectively, when selecting…

    Submitted 7 March, 2024; originally announced March 2024.

  14. arXiv:2403.01978  [pdf, other]

    cs.CV

    Leveraging Anchor-based LiDAR 3D Object Detection via Point Assisted Sample Selection

    Authors: Shitao Chen, Haolin Zhang, Nanning Zheng

    Abstract: 3D object detection based on LiDAR point cloud and prior anchor boxes is a critical technology for autonomous driving environment perception and understanding. Nevertheless, an overlooked practical issue in existing methods is the ambiguity in training sample allocation based on box Intersection over Union (IoU_box). This problem impedes further enhancements in the performance of anchor-based LiDA…

    Submitted 4 March, 2024; originally announced March 2024.

  15. arXiv:2403.01713  [pdf, other]

    cs.CV

    MCA: Moment Channel Attention Networks

    Authors: Yangbo Jiang, Zhiwei Jiang, Le Han, Zenan Huang, Nenggan Zheng

    Abstract: Channel attention mechanisms endeavor to recalibrate channel weights to enhance representation abilities of networks. However, mainstream methods often rely solely on global average pooling as the feature squeezer, which significantly limits the overall potential of models. In this paper, we investigate the statistical moments of feature maps within a neural network. Our findings highlight the cri…

    Submitted 3 March, 2024; originally announced March 2024.

  16. arXiv:2402.18157  [pdf, other]

    cs.AI cs.CL cs.CV

    From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs

    Authors: Yulong Liu, Yunlong Yuan, Chunwei Wang, Jianhua Han, Yongqiang Ma, Li Zhang, Nanning Zheng, Hang Xu

    Abstract: The distinction between humans and animals lies in the unique ability of humans to use and create tools. Tools empower humans to overcome physiological limitations, fostering the creation of magnificent civilizations. Similarly, enabling foundational models like Large Language Models (LLMs) with the capacity to learn external tool usage may serve as a pivotal step toward realizing artificial gener…

    Submitted 28 February, 2024; originally announced February 2024.

  17. arXiv:2402.17179  [pdf, other]

    cs.LG q-bio.BM

    Dual-Space Optimization: Improved Molecule Sequence Design by Latent Prompt Transformer

    Authors: Deqian Kong, Yuhao Huang, Jianwen Xie, Edouardo Honig, Ming Xu, Shuanghong Xue, Pei Lin, Sanping Zhou, Sheng Zhong, Nanning Zheng, Ying Nian Wu

    Abstract: Designing molecules with desirable properties, such as drug-likeliness and high binding affinities towards protein targets, is a challenging problem. In this paper, we propose the Dual-Space Optimization (DSO) method that integrates latent space sampling and data space selection to solve this problem. DSO iteratively updates a latent space generative model and a synthetic dataset in an optimizatio…

    Submitted 26 February, 2024; originally announced February 2024.

  18. arXiv:2402.09712  [pdf, other]

    cs.CV cs.AI

    Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

    Authors: Tao Yang, Cuiling Lan, Yan Lu, Nanning Zheng

    Abstract: Disentangled representation learning strives to extract the intrinsic factors within observed data. Factorizing these representations in an unsupervised manner is notably challenging and usually requires tailored loss functions or specific structural designs. In this paper, we introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerfu…

    Submitted 15 February, 2024; originally announced February 2024.

  19. DeepBranchTracer: A Generally-Applicable Approach to Curvilinear Structure Reconstruction Using Multi-Feature Learning

    Authors: Chao Liu, Ting Zhao, Nenggan Zheng

    Abstract: Curvilinear structures, which include line-like continuous objects, are fundamental geometrical elements in image-based applications. Reconstructing these structures from images constitutes a pivotal research area in computer vision. However, the complex topology and ambiguous image evidence render this process a challenging task. In this paper, we introduce DeepBranchTracer, a novel method that l…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 10 pages, 6 figures, AAAI 2024 accepted

  20. arXiv:2401.08613  [pdf, other]

    cs.NI

    Digital Infrastructure for Connected and Automated Vehicles

    Authors: Quang-Hung Luu, Thai M. Nguyen, Nan Zheng, Hai L. Vu

    Abstract: Connected and automated vehicles (CAV) are expected to deliver a much safer, more efficient, and eco-friendlier mobility. Being an indispensable component of the future transportation, their key driving features of CAVs include not only the automated functionality but also the cooperative capability. Despite the CAVs themselves are emerging and active research areas, there is a lack of a comprehen…

    Submitted 30 November, 2023; originally announced January 2024.

    Comments: 24 pages, 2 figures, 1 table

  21. arXiv:2401.07041  [pdf, other]

    eess.IV cs.CV

    An automated framework for brain vessel centerline extraction from CTA images

    Authors: Sijie Liu, Ruisheng Su, Jianghang Su, Jingmin Xin, Jiayi Wu, Wim van Zwam, Pieter Jan van Doormaal, Aad van der Lugt, Wiro J. Niessen, Nanning Zheng, Theo van Walsum

    Abstract: Accurate automated extraction of brain vessel centerlines from CTA images plays an important role in diagnosis and therapy of cerebrovascular diseases, such as stroke. However, this task remains challenging due to the complex cerebrovascular structure, the varying imaging quality, and vessel pathology effects. In this paper, we consider automatic lumen segmentation generation without additional an…

    Submitted 13 January, 2024; originally announced January 2024.

  22. arXiv:2312.12648  [pdf, other]

    cs.LG cs.CV

    IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance

    Authors: Hongyi He, Longjun Liu, Haonan Zhang, Nanning Zheng

    Abstract: Among existing Neural Architecture Search methods, DARTS is known for its efficiency and simplicity. This approach applies continuous relaxation of network representation to construct a weight-sharing supernet and enables the identification of excellent subnets in just a few GPU days. However, performance collapse in DARTS results in deteriorating architectures filled with parameter-free operation…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: accepted by AAAI2024, paper + supplementary, 11 pages

  23. arXiv:2312.10210  [pdf, other]

    cs.CL

    VK-G2T: Vision and Context Knowledge enhanced Gloss2Text

    Authors: Liqiang Jing, Xuemeng Song, Xinxing Zu, Na Zheng, Zhongzhou Zhao, Liqiang Nie

    Abstract: Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e. Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e. Gloss2Text). While previous studies have focused on boosting the performance of the Sign2Gloss stage, we emphasize the optimization of the Gloss2Text stage. Howe…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  24. arXiv:2312.07088  [pdf, other]

    cs.CL cs.AI

    BED: Bi-Encoder-Decoder Model for Canonical Relation Extraction

    Authors: Nantao Zheng, Siyu Long, Xinyu Dai

    Abstract: Canonical relation extraction aims to extract relational triples from sentences, where the triple elements (entity pairs and their relationship) are mapped to the knowledge base. Recently, methods based on the encoder-decoder architecture are proposed and achieve promising results. However, these methods cannot well utilize the entity information, which is merely used as augmented training data. M…

    Submitted 12 December, 2023; originally announced December 2023.

  25. arXiv:2312.02237  [pdf, other]

    cs.CV

    Singular Regularization with Information Bottleneck Improves Model's Adversarial Robustness

    Authors: Guanlin Li, Naishan Zheng, Man Zhou, Jie Zhang, Tianwei Zhang

    Abstract: Adversarial examples are one of the most severe threats to deep learning models. Numerous works have been proposed to study and defend adversarial examples. However, these works lack analysis of adversarial information or perturbation, which cannot reveal the mystery of adversarial examples and lose proper interpretation. In this paper, we aim to fill this gap by studying adversarial information a…

    Submitted 4 December, 2023; originally announced December 2023.

  26. arXiv:2311.10382  [pdf, other]

    cs.CV

    Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking

    Authors: Yizhe Li, Sanping Zhou, Zheng Qin, Le Wang, Jinjun Wang, Nanning Zheng

    Abstract: Multi-Object Tracking (MOT) remains a vital component of intelligent video analysis, which aims to locate targets and maintain a consistent identity for each target throughout a video sequence. Existing works usually learn a discriminative feature representation, such as motion and appearance, to associate the detections across frames, which are easily affected by mutual occlusion and background c…

    Submitted 17 November, 2023; originally announced November 2023.

  27. arXiv:2311.04512  [pdf, other]

    cs.CV cs.AI

    FFINet: Future Feedback Interaction Network for Motion Forecasting

    Authors: Miao Kang, Shengqi Wang, Sanping Zhou, Ke Ye, Jingjing Jiang, Nanning Zheng

    Abstract: Motion forecasting plays a crucial role in autonomous driving, with the aim of predicting the future reasonable motions of traffic agents. Most existing methods mainly model the historical interactions between agents and the environment, and predict multi-modal trajectories in a feedforward process, ignoring potential trajectory changes caused by future interactions between agents. In this paper,…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 11 pages, 8 figures, 12 tables

  28. arXiv:2310.20689  [pdf, other]

    cs.CL cs.AI

    Learning From Mistakes Makes LLM Better Reasoner

    Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen

    Abstract: Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve their reasoning capabilities, this work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process. Consider a human student who failed to solve a math problem, he will learn from what mistake he has made and how to correct it. Mimicking this…

    Submitted 29 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 23 pages, 13 figures, 6 tables

  29. arXiv:2310.17998  [pdf, ps, other]

    cs.LG math.OC

    Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity

    Authors: Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen

    Abstract: Recently, Arjevani et al. [1] established a lower bound of iteration complexity for the first-order optimization under an $L$-smooth condition and a bounded noise variance assumption. However, a thorough review of existing literature on Adam's convergence reveals a noticeable gap: none of them meet the above lower bound. In this paper, we close the gap by deriving a new convergence guarantee of Ad…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 Accept

  30. arXiv:2310.15422  [pdf]

    cs.CV

    G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data

    Authors: Haotian Wang, Meng Yang, Nanning Zheng

    Abstract: Monocular depth inference is a fundamental problem for scene perception of robots. Specific robots may be equipped with a camera plus an optional depth sensor of any type and located in various scenes of different scales, whereas recent advances derived multiple individual sub-tasks. It leads to additional burdens to fine-tune models for specific robots and thereby high-cost customization in large…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 18 pages, 16 figures

  31. arXiv:2310.05374  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

    Authors: Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

    Abstract: Training a high performance end-to-end speech (E2E) processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive for collection, compared to textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech proces…

    Submitted 24 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 15 pages, 8 figures, 8 tables, Accepted to EMNLP 2023 Findings

  32. arXiv:2310.05056  [pdf, other]

    cs.CV

    Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

    Authors: Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang

    Abstract: Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches. The former typically relies on laborious and time-consuming manual annotations, posing considerable challenges in expanding keypoint detection to a broader range of keypoint categories and animal species. The latter, though…

    Submitted 11 December, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  33. arXiv:2310.02648  [pdf, other]

    cs.RO

    Long-Term Dynamic Window Approach for Kinodynamic Local Planning in Static and Crowd Environments

    Authors: Zhiqiang Jian, Songyi Zhang, Lingfeng Sun, Wei Zhan, Nanning Zheng, Masayoshi Tomizuka

    Abstract: Local planning for a differential wheeled robot is designed to generate kinodynamic feasible actions that guide the robot to a goal position along the navigation path while avoiding obstacles. Reactive, predictive, and learning-based methods are widely used in local planning. However, few of them can fit static and crowd environments while satisfying kinodynamic constraints simultaneously. To solv…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 9 pages, 7 figures

    Journal ref: 2023 IEEE RA-L

  34. arXiv:2310.02629  [pdf, other]

    cs.SD eess.AS

    BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

    Authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these dr…

    Submitted 7 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  35. arXiv:2310.02625  [pdf, other]

    cs.RO

    Adaptive Spatio-Temporal Voxels Based Trajectory Planning for Autonomous Driving in Highway Traffic Flow

    Authors: Zhiqiang Jian, Songyi Zhang, Lingfeng Sun, Wei Zhan, Masayoshi Tomizuka, Nanning Zheng

    Abstract: Trajectory planning is crucial for the safe driving of autonomous vehicles in highway traffic flow. Currently, some advanced trajectory planning methods utilize spatio-temporal voxels to construct feasible regions and then convert trajectory planning into optimization problem solving based on the feasible regions. However, these feasible region construction methods cannot adapt to the changes in d…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 8 pages, 5 figures

    Journal ref: IEEE ITSC 2023

  36. arXiv:2309.16578  [pdf, other]

    stat.ML cs.LG physics.chem-ph

    Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

    Authors: He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao

    Abstract: Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT…

    Submitted 9 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published in Nature Computational Science, March 2024. Full paper with supplementary information

  37. arXiv:2309.06054  [pdf, other]

    cs.LG cs.CL cs.CV

    Breaking through the learning plateaus of in-context learning in Transformer

    Authors: Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng

    Abstract: In-context learning, i.e., learning from context examples, is an impressive ability of Transformer. Training Transformers to possess this in-context learning skill is computationally intensive due to the occurrence of learning plateaus, which are periods within the training process where there is minimal or no enhancement in the model's in-context learning capability. To study the mechanism behind…

    Submitted 29 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

  38. arXiv:2309.05077  [pdf, ps, other]

    cs.LG stat.ML

    Generalization error bounds for iterative learning algorithms with bounded updates

    Authors: Jingwen Fu, Nanning Zheng

    Abstract: This paper explores the generalization characteristics of iterative learning algorithms with bounded updates for non-convex loss functions, employing information-theoretic techniques. Our key contribution is a novel bound for the generalization error of these algorithms with bounded updates. Our approach introduces two main novelties: 1) we reformulate the mutual information as the uncertainty of…

    Submitted 14 October, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

  39. arXiv:2309.03475  [pdf, other]

    cs.RO cs.AI

    InteractionNet: Joint Planning and Prediction for Autonomous Driving with Transformers

    Authors: Jiawei Fu, Yanqing Shen, Zhiqiang Jian, Shitao Chen, Jingmin Xin, Nanning Zheng

    Abstract: Planning and prediction are two important modules of autonomous driving and have experienced tremendous advancement recently. Nevertheless, most existing methods regard planning and prediction as independent and ignore the correlation between them, leading to the lack of consideration for interaction and dynamic changes of traffic scenarios. To address this challenge, we propose InteractionNet, wh…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted to IROS 2023

  40. arXiv:2309.02752  [pdf, other]

    cs.LG cs.AI cs.CR

    SWAP: Exploiting Second-Ranked Logits for Adversarial Attacks on Time Series

    Authors: Chang George Dong, Liangwei Nathan Zheng, Weitong Chen, Wei Emma Zhang, Lin Yue

    Abstract: Time series classification (TSC) has emerged as a critical task in various domains, and deep neural models have shown superior performance in TSC tasks. However, these models are vulnerable to adversarial attacks, where subtle perturbations can significantly impact the prediction results. Existing adversarial methods often suffer from over-parameterization or random logit perturbation, hindering t…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 10 pages, 8 figures

    ACM Class: I.2.0

  41. arXiv:2309.01958  [pdf, other]

    cs.CV eess.IV

    Empowering Low-Light Image Enhancer through Customized Learnable Priors

    Authors: Naishan Zheng, Man Zhou, Yanmeng Dong, Xiangyu Rui, Jie Huang, Chongyi Li, Feng Zhao

    Abstract: Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issu…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  42. arXiv:2308.16083  [pdf, other]

    cs.CV eess.IV

    Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-Spectral Image Fusion

    Authors: Man Zhou, Jie Huang, Naishan Zheng, Chongyi Li

    Abstract: The success of deep neural networks for pan-sharpening is commonly in a form of black box, lacking transparency and interpretability. To alleviate this issue, we propose a novel model-driven deep unfolding framework with image reasoning prior tailored for the pan-sharpening task. Different from existing unfolding solutions that deliver the proximal operator networks as the uncertain and vague prio…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 10 pages; Accepted by ICCV 2023

  43. arXiv:2308.15427  [pdf, other]

    cs.CV cs.AI

    Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction

    Authors: Wenjie Gao, Jiawei Fu, Yanqing Shen, Haodong Jing, Shitao Chen, Nanning Zheng

    Abstract: High-definition (HD) maps play a crucial role in autonomous driving systems. Recent methods have attempted to construct HD maps in real-time using vehicle onboard sensors. Due to the inherent limitations of onboard sensors, which include sensitivity to detection range and susceptibility to occlusion by nearby vehicles, the performance of these methods significantly declines in complex scenarios an…

    Submitted 29 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted by ICRA 2024

  44. arXiv:2308.04409  [pdf, other]

    cs.CV

    V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

    Authors: Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo

    Abstract: We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address…

    Submitted 8 August, 2023; originally announced August 2023.

  45. arXiv:2308.01904  [pdf, other]

    cs.CV

    DETR Doesn't Need Multi-Scale or Locality Design

    Authors: Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

    Abstract: This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder. We show that two simple technologies are surprisingly effectiv…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: To be published in ICCV 2023

  46. arXiv:2307.14624  [pdf, other]

    cs.CV

    FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene

    Authors: Chengrui Wei, Meng Yang, Lei He, Nanning Zheng

    Abstract: It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes. We observe that it is essentially due to not only the scale-ambiguous problem but also the focal-ambiguous problem that decreases the generalization ability of monocular depth estimation. That is, images may be captured by cameras of different focal lengths in scenes of different…

    Submitted 27 July, 2023; originally announced July 2023.

  47. arXiv:2307.09155  [pdf, other]

    cs.CV

    MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection

    Authors: Zewei Lin, Yanqing Shen, Sanping Zhou, Shitao Chen, Nanning Zheng

    Abstract: In this paper, we propose a novel and effective Multi-Level Fusion network, named as MLF-DET, for high-performance cross-modal 3D object DETection, which integrates both the feature-level fusion and decision-level fusion to fully utilize the information in the image. For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel fea…

    Submitted 18 July, 2023; originally announced July 2023.

  48. arXiv:2307.02486  [pdf, other]

    cs.CL cs.LG

    LongNet: Scaling Transformers to 1,000,000,000 Tokens

    Authors: Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

    Abstract: Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. To address this issue, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance…

    Submitted 19 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Work in progress

  49. arXiv:2306.09000  [pdf, other]

    cs.LG cs.CV

    When and Why Momentum Accelerates SGD: An Empirical Study

    Authors: Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Nanning Zheng

    Abstract: Momentum has become a crucial component in deep learning optimizers, necessitating a comprehensive understanding of when and why it accelerates stochastic gradient descent (SGD). To address the question of ''when'', we establish a meaningful comparison framework that examines the performance of SGD with Momentum (SGDM) under the \emph{effective learning rates} $η_{ef}$, a notion unifying the influ…

    Submitted 15 June, 2023; originally announced June 2023.

  50. arXiv:2306.03763  [pdf, other]

    q-fin.ST cs.AI cs.CL cs.LG q-fin.CP

    ChatGPT Informed Graph Neural Network for Stock Movement Prediction

    Authors: Zihan Chen, Lei Nico Zheng, Cheng Lu, Jialu Yuan, Di Zhu

    Abstract: ChatGPT has demonstrated remarkable capabilities across various natural language processing (NLP) tasks. However, its potential for inferring dynamic network structures from temporal textual data, specifically financial news, remains an unexplored frontier. In this research, we introduce a novel framework that leverages ChatGPT's graph inference capabilities to enhance Graph Neural Networks (GNN).…

    Submitted 18 September, 2023; v1 submitted 28 May, 2023; originally announced June 2023.

    Comments: Dataset is available at [https://github.com/ZihanChen1995/ChatGPT-GNN-StockPredict]. Accepted for the oral presentation at SIGKDD 2023 Workshop on Robust NLP for Finance

    ACM Class: I.2.7; J.1