Search | arXiv e-print repository

OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

Authors: Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Abstract: Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem. These discriminative methods focus on learning the mapping between the inputs and occupancy map in a single step, lacking the ability to gradually refine the occupancy map and the reasonable scene imaginative capacity to complete the local regions somewhere.… ▽ More Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem. These discriminative methods focus on learning the mapping between the inputs and occupancy map in a single step, lacking the ability to gradually refine the occupancy map and the reasonable scene imaginative capacity to complete the local regions somewhere. In this paper, we introduce OccGen, a simple yet powerful generative perception model for the task of 3D semantic occupancy prediction. OccGen adopts a ''noise-to-occupancy'' generative paradigm, progressively inferring and refining the occupancy map by predicting and eliminating noise originating from a random Gaussian distribution. OccGen consists of two main components: a conditional encoder that is capable of processing multi-modal inputs, and a progressive refinement decoder that applies diffusion denoising using the multi-modal features as conditions. A key insight of this generative pipeline is that the diffusion denoising process is naturally able to model the coarse-to-fine refinement of the dense 3D occupancy map, therefore producing more detailed predictions. Extensive experiments on several occupancy benchmarks demonstrate the effectiveness of the proposed method compared to the state-of-the-art methods. For instance, OccGen relatively enhances the mIoU by 9.5%, 6.3%, and 13.3% on nuScenes-Occupancy dataset under the muli-modal, LiDAR-only, and camera-only settings, respectively. Moreover, as a generative perception model, OccGen exhibits desirable properties that discriminative models cannot achieve, such as providing uncertainty estimates alongside its multiple-step predictions. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.13026 [pdf, other]

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Authors: Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon Y. Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman

Abstract: Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these… ▽ More Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant challenge. Unlike unconditional or text-conditioned dynamics generation, action-conditioned dynamics requires perceiving the physical material properties of objects and grounding the 3D motion prediction on these properties, such as object stiffness. However, estimating physical material properties is an open problem due to the lack of material ground-truth data, as measuring these properties for real objects is highly difficult. We present PhysDreamer, a physics-based approach that endows static 3D objects with interactive dynamics by leveraging the object dynamics priors learned by video generation models. By distilling these priors, PhysDreamer enables the synthesis of realistic object responses to novel interactions, such as external forces or agent manipulations. We demonstrate our approach on diverse examples of elastic objects and evaluate the realism of the synthesized interactions through a user study. PhysDreamer takes a step towards more engaging and realistic virtual experiences by enabling static 3D objects to dynamically respond to interactive stimuli in a physically plausible manner. See our project page at https://physdreamer.github.io/. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: Project website at: https://physdreamer.github.io/

arXiv:2404.09734 [pdf, other]

Weighted Sum-Rate Maximization for Movable Antenna-Enhanced Wireless Networks

Authors: Biqian Feng, Yongpeng Wu, Xiang-Gen Xia, Chengshan Xiao

Abstract: This letter investigates the weighted sum rate maximization problem in movable antenna (MA)-enhanced systems. To reduce the computational complexity, we transform it into a more tractable weighted minimum mean square error (WMMSE) problem well-suited for MA. We then adopt the WMMSE algorithm and majorization-minimization algorithm to optimize the beamforming and antenna positions, respectively. Mo… ▽ More This letter investigates the weighted sum rate maximization problem in movable antenna (MA)-enhanced systems. To reduce the computational complexity, we transform it into a more tractable weighted minimum mean square error (WMMSE) problem well-suited for MA. We then adopt the WMMSE algorithm and majorization-minimization algorithm to optimize the beamforming and antenna positions, respectively. Moreover, we propose a planar movement mode, which constrains each MA to a specified area, we obtain a low-complexity closed-form solution. Numerical results demonstrate that the MA-enhanced system outperforms the conventional system. Besides, the computation time for the planar movement mode is reduced by approximately 30\% at a little performance expense. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE Wireless Communications Letters

arXiv:2404.09502 [pdf, other]

SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

Authors: Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

Abstract: Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied. However, operating on dense latent spaces introduces a cubic time and space complexity, which limits scalability in terms of perception range or spatial resolution. Existing approaches compress the dense representation using… ▽ More Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied. However, operating on dense latent spaces introduces a cubic time and space complexity, which limits scalability in terms of perception range or spatial resolution. Existing approaches compress the dense representation using projections like Bird's Eye View (BEV) or Tri-Perspective View (TPV). Although efficient, these projections result in information loss, especially for tasks like semantic occupancy prediction. To address this, we propose SparseOcc, an efficient occupancy network inspired by sparse point cloud processing. It utilizes a lossless sparse latent representation with three key innovations. Firstly, a 3D sparse diffuser performs latent completion using spatially decomposed 3D sparse convolutional kernels. Secondly, a feature pyramid and sparse interpolation enhance scales with information from others. Finally, the transformer head is redesigned as a sparse variant. SparseOcc achieves a remarkable 74.9% reduction on FLOPs over the dense baseline. Interestingly, it also improves accuracy, from 12.8% to 14.1% mIOU, which in part can be attributed to the sparse representation's ability to avoid hallucinations on empty voxels. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 10 pages, 4 figures, accepted by CVPR 2024

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

arXiv:2404.07985 [pdf, other]

WaveMo: Learning Wavefront Modulations to See Through Scattering

Authors: Mingyang Xie, Haiyun Guo, Brandon Y. Feng, Lingbo Jin, Ashok Veeraraghavan, Christopher A. Metzler

Abstract: Imaging through scattering media is a fundamental and pervasive challenge in fields ranging from medical diagnostics to astronomy. A promising strategy to overcome this challenge is wavefront modulation, which induces measurement diversity during image acquisition. Despite its importance, designing optimal wavefront modulations to image through scattering remains under-explored. This paper introdu… ▽ More Imaging through scattering media is a fundamental and pervasive challenge in fields ranging from medical diagnostics to astronomy. A promising strategy to overcome this challenge is wavefront modulation, which induces measurement diversity during image acquisition. Despite its importance, designing optimal wavefront modulations to image through scattering remains under-explored. This paper introduces a novel learning-based framework to address the gap. Our approach jointly optimizes wavefront modulations and a computationally lightweight feedforward "proxy" reconstruction network. This network is trained to recover scenes obscured by scattering, using measurements that are modified by these modulations. The learned modulations produced by our framework generalize effectively to unseen scattering scenarios and exhibit remarkable versatility. During deployment, the learned modulations can be decoupled from the proxy network to augment other more computationally expensive restoration algorithms. Through extensive experiments, we demonstrate our approach significantly advances the state of the art in imaging through scattering media. Our project webpage is at https://wavemo-2024.github.io/. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.00471 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447579

Score-Based Diffusion Models for Photoacoustic Tomography Image Reconstruction

Authors: Sreemanti Dey, Snigdha Saha, Berthy T. Feng, Manxiu Cui, Laure Delisle, Oscar Leong, Lihong V. Wang, Katherine L. Bouman

Abstract: Photoacoustic tomography (PAT) is a rapidly-evolving medical imaging modality that combines optical absorption contrast with ultrasound imaging depth. One challenge in PAT is image reconstruction with inadequate acoustic signals due to limited sensor coverage or due to the density of the transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use… ▽ More Photoacoustic tomography (PAT) is a rapidly-evolving medical imaging modality that combines optical absorption contrast with ultrasound imaging depth. One challenge in PAT is image reconstruction with inadequate acoustic signals due to limited sensor coverage or due to the density of the transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use score-based diffusion models to solve the inverse problem of reconstructing an image from limited PAT measurements. The proposed approach allows us to incorporate an expressive prior learned by a diffusion model on simulated vessel structures while still being robust to varying transducer sparsity conditions. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: 5 pages

Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 2470-2474

arXiv:2403.16095 [pdf, other]

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

Authors: Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui

Abstract: Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system… ▽ More Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available. Project page: https://zju3dv.github.io/cg-slam. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Project Page: https://zju3dv.github.io/cg-slam

arXiv:2403.13800 [pdf, other]

TimeRewind: Rewinding Time with Image-and-Events Video Diffusion

Authors: Jingxi Chen, Brandon Y. Feng, Haoming Cai, Mingyang Xie, Christopher Metzler, Cornelia Fermuller, Yiannis Aloimonos

Abstract: This paper addresses the novel challenge of ``rewinding'' time from a single captured image to recover the fleeting moments missed just before the shutter button is pressed. This problem poses a significant challenge in computer vision and computational photography, as it requires predicting plausible pre-capture motion from a single static frame, an inherently ill-posed task due to the high degre… ▽ More This paper addresses the novel challenge of ``rewinding'' time from a single captured image to recover the fleeting moments missed just before the shutter button is pressed. This problem poses a significant challenge in computer vision and computational photography, as it requires predicting plausible pre-capture motion from a single static frame, an inherently ill-posed task due to the high degree of freedom in potential pixel movements. We overcome this challenge by leveraging the emerging technology of neuromorphic event cameras, which capture motion information with high temporal resolution, and integrating this data with advanced image-to-video diffusion models. Our proposed framework introduces an event motion adaptor conditioned on event camera data, guiding the diffusion model to generate videos that are visually coherent and physically grounded in the captured events. Through extensive experimentation, we demonstrate the capability of our approach to synthesize high-quality videos that effectively ``rewind'' time, showcasing the potential of combining event camera technology with generative models. Our work opens new avenues for research at the intersection of computer vision, computational photography, and generative modeling, offering a forward-thinking solution to capturing missed moments and enhancing future consumer cameras and smartphones. Please see the project page at https://timerewind.github.io/ for video results and code release. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.11050 [pdf, other]

Endora: Video Generation Models as Endoscopy Simulators

Authors: Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan

Abstract: Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has largely remained untapped.This paper introduces \model, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We present a… ▽ More Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has largely remained untapped.This paper introduces \model, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We present a novel generative model design that integrates a meticulously crafted spatial-temporal video transformer with advanced 2D vision foundation model priors, explicitly modeling spatial-temporal dynamics during video generation. We also pioneer the first public benchmark for endoscopy simulation with video generation models, adapting existing state-of-the-art methods for this endeavor.Endora demonstrates exceptional visual quality in generating endoscopy videos, surpassing state-of-the-art methods in extensive testing. Moreover, we explore how this endoscopy simulator can empower downstream video analysis tasks and even generate 3D medical scenes with multi-view consistency. In a nutshell, Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research, setting a substantial stage for further advances in medical content generation. For more details, please visit our project page: https://endora-medvidgen.github.io/. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: Project page: https://endora-medvidgen.github.io/

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1320 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 2 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.04679 [pdf, other]

ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

Authors: Haoming Cai, Jingxi Chen, Brandon Y. Feng, Weiyun Jiang, Mingyang Xie, Kevin Zhang, Ashok Veeraraghavan, Christopher Metzler

Abstract: tmospheric turbulence presents a significant challenge in long-range imaging. Current restoration algorithms often struggle with temporal inconsistency, as well as limited generalization ability across varying turbulence levels and scene content different than the training data. To tackle these issues, we introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT)… ▽ More tmospheric turbulence presents a significant challenge in long-range imaging. Current restoration algorithms often struggle with temporal inconsistency, as well as limited generalization ability across varying turbulence levels and scene content different than the training data. To tackle these issues, we introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT) a test-time optimization method featuring a neural video representation designed to enhance temporal consistency in restoration. A key innovation of ConVRT is the integration of a pretrained vision-language model (CLIP) for semantic-oriented supervision, which steers the restoration towards sharp, photorealistic images in the CLIP latent space. We further develop a principled selection strategy of text prompts, based on their statistical correlation with a perceptual metric. ConVRT's test-time optimization allows it to adapt to a wide range of real-world turbulence conditions, effectively leveraging the insights gained from pre-trained models on simulated data. ConVRT offers a comprehensive and effective solution for mitigating real-world turbulence in dynamic videos. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: https://convrt-2024.github.io/

arXiv:2312.03788 [pdf, other]

SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM

Authors: Jiayi Pan, Chengcan Wang, Kaifu Zheng, Yangguang Li, Zhenyu Wang, Bin Feng

Abstract: Large language models (LLMs) have shown remarkable capabilities in various tasks. However their huge model size and the consequent demand for computational and memory resources also pose challenges to model deployment. Currently, 4-bit post-training quantization (PTQ) has achieved some success in LLMs, reducing the memory footprint by approximately 75% compared to FP16 models, albeit with some acc… ▽ More Large language models (LLMs) have shown remarkable capabilities in various tasks. However their huge model size and the consequent demand for computational and memory resources also pose challenges to model deployment. Currently, 4-bit post-training quantization (PTQ) has achieved some success in LLMs, reducing the memory footprint by approximately 75% compared to FP16 models, albeit with some accuracy loss. In this paper, we propose SmoothQuant+, an accurate and efficient 4-bit weight-only PTQ that requires no additional training, which enables lossless in accuracy for LLMs for the first time. Based on the fact that the loss of weight quantization is amplified by the activation outliers, SmoothQuant+ smoothes the activation outliers by channel before quantization, while adjusting the corresponding weights for mathematical equivalence, and then performs group-wise 4-bit weight quantization for linear layers. We have integrated SmoothQuant+ into the vLLM framework, an advanced high-throughput inference engine specially developed for LLMs, and equipped it with an efficient W4A16 CUDA kernels, so that vLLM can seamlessly support SmoothQuant+ 4-bit weight quantization. Our results show that, with SmoothQuant+, the Code Llama-34B model can be quantized and deployed on a A100 40GB GPU, achieving lossless accuracy and a throughput increase of 1.9 to 4.0 times compared to the FP16 model deployed on two A100 40GB GPUs. Moreover, the latency per token is only 68% of the FP16 model deployed on two A100 40GB GPUs. This is the state-of-the-art 4-bit weight quantization for LLMs as we know. △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.01195 [pdf, other]

AIM: Automatic Interrupt Modeling for Dynamic Firmware Analysis

Authors: Bo Feng, Meng Luo, Changming Liu, Long Lu, Engin Kirda

Abstract: The security of microcontrollers, which drive modern IoT and embedded devices, continues to raise major concerns. Within a microcontroller (MCU), the firmware is a monolithic piece of software that contains the whole software stack, whereas a variety of peripherals represent the hardware. As MCU firmware contains vulnerabilities, it is ideal to test firmware with off-the-shelf software testing tec… ▽ More The security of microcontrollers, which drive modern IoT and embedded devices, continues to raise major concerns. Within a microcontroller (MCU), the firmware is a monolithic piece of software that contains the whole software stack, whereas a variety of peripherals represent the hardware. As MCU firmware contains vulnerabilities, it is ideal to test firmware with off-the-shelf software testing techniques, such as dynamic symbolic execution and fuzzing. Nevertheless, no emulator can emulate the diverse MCU peripherals or execute/test the firmware. Specifically, the interrupt interface, among all I/O interfaces used by MCU peripherals, is extremely challenging to emulate. In this paper, we present AIM -- a generic, scalable, and hardware-independent dynamic firmware analysis framework that supports unemulated MCU peripherals by a novel interrupt modeling mechanism. AIM effectively and efficiently covers interrupt-dependent code in firmware by a novel, firmware-guided, Just-in-Time Interrupt Firing technique. We implemented our framework in angr and performed dynamic symbolic execution for eight real-world MCU firmware. According to testing results, our framework covered up to 11.2 times more interrupt-dependent code than state-of-the-art approaches while accomplishing several challenging goals not feasible previously. Finally, a comparison with a state-of-the-art firmware fuzzer demonstrates dynamic symbolic execution and fuzzing together can achieve better firmware testing coverage. △ Less

Submitted 2 December, 2023; originally announced December 2023.

Comments: This paper was accepted to IEEE Transactions on Dependable and Secure Computing at Oct 12, 2023

arXiv:2310.10835 [pdf, other]

Provable Probabilistic Imaging using Score-Based Generative Priors

Authors: Yu Sun, Zihui Wu, Yifan Chen, Berthy T. Feng, Katherine L. Bouman

Abstract: Estimating high-quality images while also quantifying their uncertainty are two desired features in an image reconstruction algorithm for solving ill-posed inverse problems. In this paper, we propose plug-and-play Monte Carlo (PMC) as a principled framework for characterizing the space of possible solutions to a general inverse problem. PMC is able to incorporate expressive score-based generative… ▽ More Estimating high-quality images while also quantifying their uncertainty are two desired features in an image reconstruction algorithm for solving ill-posed inverse problems. In this paper, we propose plug-and-play Monte Carlo (PMC) as a principled framework for characterizing the space of possible solutions to a general inverse problem. PMC is able to incorporate expressive score-based generative priors for high-quality image reconstruction while also performing uncertainty quantification via posterior sampling. In particular, we introduce two PMC algorithms which can be viewed as the sampling analogues of the traditional plug-and-play priors (PnP) and regularization by denoising (RED) algorithms. We also establish a theoretical analysis for characterizing the convergence of the PMC algorithms. Our analysis provides non-asymptotic stationarity guarantees for both algorithms, even in the presence of non-log-concave likelihoods and imperfect score networks. We demonstrate the performance of the PMC algorithms on multiple representative inverse problems with both linear and nonlinear forward models. Experimental results show that PMC significantly improves reconstruction quality and enables high-fidelity uncertainty quantification. △ Less

Submitted 29 December, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.06504 [pdf, other]

Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

Authors: Guanting Dong, Jinxu Zhao, Tingfeng Hui, Daichi Guo, Wenlong Wan, Boqi Feng, Yueyan Qiu, Zhuoma Gongque, Keqing He, Zechen Wang, Weiran Xu

Abstract: With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly-used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, w… ▽ More With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly-used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct a input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbation and four types of mixed perturbation data. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two ways of automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. The experiments have demonstrated that the current open-source LLMs generally achieve limited perturbation robustness performance. Based on these experimental observations, we make some forward-looking suggestions to fuel the research in this direction. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted at NLPCC 2023 (Oral Presentation)

arXiv:2310.03125 [pdf, other]

Shielding the Unseen: Privacy Protection through Poisoning NeRF with Spatial Deformation

Authors: Yihan Wu, Brandon Y. Feng, Heng Huang

Abstract: In this paper, we introduce an innovative method of safeguarding user privacy against the generative capabilities of Neural Radiance Fields (NeRF) models. Our novel poisoning attack method induces changes to observed views that are imperceptible to the human eye, yet potent enough to disrupt NeRF's ability to accurately reconstruct a 3D scene. To achieve this, we devise a bi-level optimization alg… ▽ More In this paper, we introduce an innovative method of safeguarding user privacy against the generative capabilities of Neural Radiance Fields (NeRF) models. Our novel poisoning attack method induces changes to observed views that are imperceptible to the human eye, yet potent enough to disrupt NeRF's ability to accurately reconstruct a 3D scene. To achieve this, we devise a bi-level optimization algorithm incorporating a Projected Gradient Descent (PGD)-based spatial deformation. We extensively test our approach on two common NeRF benchmark datasets consisting of 29 real-world scenes with high-quality images. Our results compellingly demonstrate that our privacy-preserving method significantly impairs NeRF's performance across these benchmark datasets. Additionally, we show that our method is adaptable and versatile, functioning across various perturbation strengths and NeRF architectures. This work offers valuable insights into NeRF's vulnerabilities and emphasizes the need to account for such potential privacy risks when developing robust 3D scene reconstruction algorithms. Our study contributes to the larger conversation surrounding responsible AI and generative machine learning, aiming to protect user privacy and respect creative ownership in the digital age. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.17293 [pdf, other]

doi 10.1007/s10773-023-05382-0

Quantum Privacy-preserving Two-party Circle Intersection Protocol Based on Phase-encoded Query

Authors: Zi-Xian Li, Qi Yang, Bao Feng, Wen-Jie Liu

Abstract: Privacy-preserving geometric intersection (PGI) is an important issue in Secure multiparty computation (SMC). The existing quantum PGI protocols are mainly based on grid coding, which requires a lot of computational complexity. The phase-encoded query method which has been used in some Quantum SMC protocols is suitable to solve the decision problem, but it needs to apply high dimensional Oracle op… ▽ More Privacy-preserving geometric intersection (PGI) is an important issue in Secure multiparty computation (SMC). The existing quantum PGI protocols are mainly based on grid coding, which requires a lot of computational complexity. The phase-encoded query method which has been used in some Quantum SMC protocols is suitable to solve the decision problem, but it needs to apply high dimensional Oracle operators. In this paper, we use the principle of phase-encoded query to solve an important PGI problem, namely privacy-preserving two-party circle intersection. We study the implementation of Oracle operator in detail, and achieve polynomial computational complexity by decompsing it into quantum arithmetic operations. Performance analysis shows that our protocol is correct and efficient, and can protect the privacy of all participants against internal and external attacks. △ Less

Submitted 29 September, 2023; originally announced September 2023.

Comments: 16 pages, 2 figures

Journal ref: International Journal of Theoretical Physics, 2023. 62(7): p. 138

arXiv:2309.14349 [pdf, other]

Corporate Credit Rating: A Survey

Authors: Bojing Feng, Xi Cheng, Dan Li, Zeyu Liu, Wenfang Xue

Abstract: Corporate credit rating (CCR) plays a very important role in the process of contemporary economic and social development. How to use credit rating methods for enterprises has always been a problem worthy of discussion. Through reading and studying the relevant literature at home and abroad, this paper makes a systematic survey of CCR. This paper combs the context of the development of CCR methods… ▽ More Corporate credit rating (CCR) plays a very important role in the process of contemporary economic and social development. How to use credit rating methods for enterprises has always been a problem worthy of discussion. Through reading and studying the relevant literature at home and abroad, this paper makes a systematic survey of CCR. This paper combs the context of the development of CCR methods from the three levels: statistical models, machine learning models and neural network models, summarizes the common databases of CCR, and deeply compares the advantages and disadvantages of the models. Finally, this paper summarizes the problems existing in the current research and prospects the future of CCR. Compared with the existing review of CCR, this paper expounds and analyzes the progress of neural network model in this field in recent years. △ Less

Submitted 18 September, 2023; originally announced September 2023.

Comments: 11 pages

arXiv:2309.11591 [pdf, other]

Continuous Levels of Detail for Light Field Networks

Authors: David Li, Brandon Y. Feng, Amitabh Varshney

Abstract: Recently, several approaches have emerged for generating neural representations with multiple levels of detail (LODs). LODs can improve the rendering by using lower resolutions and smaller model sizes when appropriate. However, existing methods generally focus on a few discrete LODs which suffer from aliasing and flicker artifacts as details are changed and limit their granularity for adapting to… ▽ More Recently, several approaches have emerged for generating neural representations with multiple levels of detail (LODs). LODs can improve the rendering by using lower resolutions and smaller model sizes when appropriate. However, existing methods generally focus on a few discrete LODs which suffer from aliasing and flicker artifacts as details are changed and limit their granularity for adapting to resource limitations. In this paper, we propose a method to encode light field networks with continuous LODs, allowing for finely tuned adaptations to rendering conditions. Our training procedure uses summed-area table filtering allowing efficient and continuous filtering at various LODs. Furthermore, we use saliency-based importance sampling which enables our light field networks to distribute their capacity, particularly limited at lower LODs, towards representing the details viewers are most likely to focus on. Incorporating continuous LODs into neural representations enables progressive streaming of neural representations, decreasing the latency and resource utilization for rendering. △ Less

Submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted to BMVC 2023. Webpage at https://augmentariumlab.github.io/continuous-lfn/

arXiv:2309.01949 [pdf, other]

Efficient Bayesian Computational Imaging with a Surrogate Score-Based Prior

Authors: Berthy T. Feng, Katherine L. Bouman

Abstract: We propose a surrogate function for efficient use of score-based priors for Bayesian inverse imaging. Recent work turned score-based diffusion models into probabilistic priors for solving ill-posed imaging problems by appealing to an ODE-based log-probability function. However, evaluating this function is computationally inefficient and inhibits posterior estimation of high-dimensional images. Our… ▽ More We propose a surrogate function for efficient use of score-based priors for Bayesian inverse imaging. Recent work turned score-based diffusion models into probabilistic priors for solving ill-posed imaging problems by appealing to an ODE-based log-probability function. However, evaluating this function is computationally inefficient and inhibits posterior estimation of high-dimensional images. Our proposed surrogate prior is based on the evidence lower-bound of a score-based diffusion model. We demonstrate the surrogate prior on variational inference for efficient approximate posterior sampling of large images. Compared to the exact prior in previous work, our surrogate prior accelerates optimization of the variational image distribution by at least two orders of magnitude. We also find that our principled approach achieves higher-fidelity images than non-Bayesian baselines that involve hyperparameter-tuning at inference. Our work establishes a practical path forward for using score-based diffusion models as general-purpose priors for imaging. △ Less

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.16861 [pdf, ps, other]

Facing Unknown: Open-World Encrypted Traffic Classification Based on Contrastive Pre-Training

Authors: Xiang Li, Beibei Feng, Tianning Zang, Shuyuan Zhao, Jingrun Ma

Abstract: Traditional Encrypted Traffic Classification (ETC) methods face a significant challenge in classifying large volumes of encrypted traffic in the open-world assumption, i.e., simultaneously classifying the known applications and detecting unknown applications. We propose a novel Open-World Contrastive Pre-training (OWCP) framework for this. OWCP performs contrastive pre-training to obtain a robust… ▽ More Traditional Encrypted Traffic Classification (ETC) methods face a significant challenge in classifying large volumes of encrypted traffic in the open-world assumption, i.e., simultaneously classifying the known applications and detecting unknown applications. We propose a novel Open-World Contrastive Pre-training (OWCP) framework for this. OWCP performs contrastive pre-training to obtain a robust feature representation. Based on this, we determine the spherical mapping space to find the marginal flows for each known class, which are used to train GANs to synthesize new flows similar to the known parts but do not belong to any class. These synthetic flows are assigned to Softmax's unknown node to modify the classifier, effectively enhancing sensitivity towards known flows and significantly suppressing unknown ones. Extensive experiments on three datasets show that OWCP significantly outperforms existing ETC and generic open-world classification methods. Furthermore, we conduct comprehensive ablation studies and sensitivity analyses to validate each integral component of OWCP. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by 2023 IEEE ISCC, 6 pages, 5 figures

arXiv:2308.06720 [pdf, other]

Joint Beamforming and Antenna Movement Design for Moveable Antenna Systems Based on Statistical CSI

Authors: Xintai Chen, Biqian Feng, Yongpeng Wu, Derrick Wing Kwan Ng, Robert Schober

Abstract: This paper studies a novel movable antenna (MA)-enhanced multiple-input multiple-output (MIMO) system to leverage the corresponding spatial degrees of freedom (DoFs) for improving the performance of wireless communications. We aim to maximize the achievable rate by jointly optimizing the MA positions and the transmit covariance matrix based on statistical channel state information (CSI). To solve… ▽ More This paper studies a novel movable antenna (MA)-enhanced multiple-input multiple-output (MIMO) system to leverage the corresponding spatial degrees of freedom (DoFs) for improving the performance of wireless communications. We aim to maximize the achievable rate by jointly optimizing the MA positions and the transmit covariance matrix based on statistical channel state information (CSI). To solve the resulting design problem, we develop a constrained stochastic successive convex approximation (CSSCA) algorithm applicable for the general movement mode. Furthermore, we propose two simplified antenna movement modes, namely the linear movement mode and the planar movement mode, to facilitate efficient antenna movement and reduce the computational complexity of the CSSCA algorithm. Numerical results show that the considered MA-enhanced system can significantly improve the achievable rate compared to conventional MIMO systems employing uniform planar arrays (UPAs) and that the proposed planar movement mode performs closely to the performance upper bound achieved by the general movement mode. △ Less

Submitted 18 August, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: Accepted by GLOBECOM 2023

arXiv:2308.06707 [pdf, other]

Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition

Authors: Xiaohu Huang, Xinggang Wang, Zhidianqiu Jin, Bo Yang, Botao He, Bin Feng, Wenyu Liu

Abstract: Graph convolutional networks have been widely applied in skeleton-based gait recognition. A key challenge in this task is to distinguish the individual walking styles of different subjects across various views. Existing state-of-the-art methods employ uniform convolutions to extract features from diverse sequences and ignore the effects of viewpoint changes. To overcome these limitations, we propo… ▽ More Graph convolutional networks have been widely applied in skeleton-based gait recognition. A key challenge in this task is to distinguish the individual walking styles of different subjects across various views. Existing state-of-the-art methods employ uniform convolutions to extract features from diverse sequences and ignore the effects of viewpoint changes. To overcome these limitations, we propose a condition-adaptive graph (CAG) convolution network that can dynamically adapt to the specific attributes of each skeleton sequence and the corresponding view angle. In contrast to using fixed weights for all joints and sequences, we introduce a joint-specific filter learning (JSFL) module in the CAG method, which produces sequence-adaptive filters at the joint level. The adaptive filters capture fine-grained patterns that are unique to each joint, enabling the extraction of diverse spatial-temporal information about body parts. Additionally, we design a view-adaptive topology learning (VATL) module that generates adaptive graph topologies. These graph topologies are used to correlate the joints adaptively according to the specific view conditions. Thus, CAG can simultaneously adjust to various walking styles and viewpoints. Experiments on the two most widely used datasets (i.e., CASIA-B and OU-MVLP) show that CAG surpasses all previous skeleton-based methods. Moreover, the recognition performance can be enhanced by simply combining CAG with appearance-based methods, demonstrating the ability of CAG to provide useful complementary information.The source code will be available at https://github.com/OliverHxh/CAG. △ Less

Submitted 13 August, 2023; originally announced August 2023.

Comments: Accepted by TIP journal

arXiv:2308.03757 [pdf, other]

3D Motion Magnification: Visualizing Subtle Motions with Time Varying Radiance Fields

Authors: Brandon Y. Feng, Hadi Alzayer, Michael Rubinstein, William T. Freeman, Jia-Bin Huang

Abstract: Motion magnification helps us visualize subtle, imperceptible motion. However, prior methods only work for 2D videos captured with a fixed camera. We present a 3D motion magnification method that can magnify subtle motions from scenes captured by a moving camera, while supporting novel view rendering. We represent the scene with time-varying radiance fields and leverage the Eulerian principle for… ▽ More Motion magnification helps us visualize subtle, imperceptible motion. However, prior methods only work for 2D videos captured with a fixed camera. We present a 3D motion magnification method that can magnify subtle motions from scenes captured by a moving camera, while supporting novel view rendering. We represent the scene with time-varying radiance fields and leverage the Eulerian principle for motion magnification to extract and amplify the variation of the embedding of a fixed point over time. We study and validate our proposed principle for 3D motion magnification using both implicit and tri-plane-based radiance fields as our underlying 3D scene representation. We evaluate the effectiveness of our method on both synthetic and real-world scenes captured under various camera setups. △ Less

Submitted 7 August, 2023; originally announced August 2023.

Comments: ICCV 2023. See the project page at https://3d-motion-magnification.github.io

arXiv:2306.09348 [pdf, other]

Seeing the World through Your Eyes

Authors: Hadi Alzayer, Kevin Zhang, Brandon Feng, Christopher Metzler, Jia-Bin Huang

Abstract: The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like. By imaging the eyes of a moving person, we can collect multiple views of a scene outside the camera's direct line of sight through the reflections in the eyes. In this paper, we reconstruct a 3D scene beyond the camera's line of sight using portrait images containing eye r… ▽ More The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like. By imaging the eyes of a moving person, we can collect multiple views of a scene outside the camera's direct line of sight through the reflections in the eyes. In this paper, we reconstruct a 3D scene beyond the camera's line of sight using portrait images containing eye reflections. This task is challenging due to 1) the difficulty of accurately estimating eye poses and 2) the entangled appearance of the eye iris and the scene reflections. Our method jointly refines the cornea poses, the radiance field depicting the scene, and the observer's eye iris texture. We further propose a simple regularization prior on the iris texture pattern to improve reconstruction quality. Through various experiments on synthetic and real-world captures featuring people with varied eye colors, we demonstrate the feasibility of our approach to recover 3D scenes using eye reflections. △ Less

Submitted 2 March, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: CVPR 2024. First two authors contributed equally. Project page: https://world-from-eyes.github.io/

arXiv:2306.07598 [pdf, other]

Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images

Authors: Panwang Pan, Zhiwen Fan, Brandon Y. Feng, Peihao Wang, Chenxin Li, Zhangyang Wang

Abstract: The accurate estimation of six degrees-of-freedom (6DoF) object poses is essential for many applications in robotics and augmented reality. However, existing methods for 6DoF pose estimation often depend on CAD templates or dense support views, restricting their usefulness in realworld situations. In this study, we present a new cascade framework named Cas6D for few-shot 6DoF pose estimation that… ▽ More The accurate estimation of six degrees-of-freedom (6DoF) object poses is essential for many applications in robotics and augmented reality. However, existing methods for 6DoF pose estimation often depend on CAD templates or dense support views, restricting their usefulness in realworld situations. In this study, we present a new cascade framework named Cas6D for few-shot 6DoF pose estimation that is generalizable and uses only RGB images. To address the false positives of target object detection in the extreme few-shot setting, our framework utilizes a selfsupervised pre-trained ViT to learn robust feature representations. Then, we initialize the nearest top-K pose candidates based on similarity score and refine the initial poses using feature pyramids to formulate and update the cascade warped feature volume, which encodes context at increasingly finer scales. By discretizing the pose search range using multiple pose bins and progressively narrowing the pose search range in each stage using predictions from the previous stage, Cas6D can overcome the large gap between pose candidates and ground truth poses, which is a common failure mode in sparse-view scenarios. Experimental results on the LINEMOD and GenMOP datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2% and 3.8% accuracy (Proj-5) under the 32-shot setting compared to OnePose++ and Gen6D. △ Less

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2306.05629 [pdf, other]

R-PMAC: A Robust Preamble Based MAC Mechanism Applied in Industrial Internet of Things

Authors: Kai Song, Biqian Feng, Yongpeng Wu, Zhen Gao, Wenjun Zhang

Abstract: This paper proposes a novel media access control (MAC) mechanism, called the robust preamble-based MAC mechanism (R-PMAC), which can be applied to power line communication (PLC) networks in the context of the Industrial Internet of Things (IIoT). Compared with other MAC mechanisms such as P-MAC and the MAC layer of IEEE1901.1, R-PMAC has higher networking speed. Besides, it supports whitelist auth… ▽ More This paper proposes a novel media access control (MAC) mechanism, called the robust preamble-based MAC mechanism (R-PMAC), which can be applied to power line communication (PLC) networks in the context of the Industrial Internet of Things (IIoT). Compared with other MAC mechanisms such as P-MAC and the MAC layer of IEEE1901.1, R-PMAC has higher networking speed. Besides, it supports whitelist authentication and functions properly in the presence of data frame loss. Firstly, we outline three basic mechanisms of R-PMAC, containing precise time difference calculation, preambles generation and short ID allocation. Secondly, we elaborate its networking process of single layer and multiple layers. Thirdly, we illustrate its robust mechanisms, including collision handling and data retransmission. Moreover, a low-cost hardware platform is established to measure the time of connecting hundreds of PLC nodes for the R-PMAC, P-MAC, and IEEE1901.1 mechanisms in a real power line environment. The experiment results show that R-PMAC outperforms the other mechanisms by achieving a 50% reduction in networking time. These findings indicate that the R-PMAC mechanism holds great potential for quickly and effectively building a PLC network in actual industrial scenarios. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: This paper has been accepted by IEEE Internet of Things Journal

arXiv:2305.19700 [pdf, other]

GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition

Authors: Haijun Xiong, Yunze Deng, Xiaohu Huang, Xinggang Wang, Wenyu Liu, Bin Feng

Abstract: Gait recognition is an emerging biological recognition technology that identifies and verifies individuals based on their walking patterns. However, many current methods are limited in their use of temporal information. In order to fully harness the potential of gait recognition, it is crucial to consider temporal features at various granularities and spans. Hence, in this paper, we propose a nove… ▽ More Gait recognition is an emerging biological recognition technology that identifies and verifies individuals based on their walking patterns. However, many current methods are limited in their use of temporal information. In order to fully harness the potential of gait recognition, it is crucial to consider temporal features at various granularities and spans. Hence, in this paper, we propose a novel framework named GaitGS, which aggregates temporal features in the granularity dimension and span dimension simultaneously. Specifically, Multi-Granularity Feature Extractor (MGFE) is proposed to focus on capturing the micro-motion and macro-motion information at the frame level and unit level respectively. Moreover, we present Multi-Span Feature Learning (MSFL) module to generate global and local temporal representations. On three popular gait datasets, extensive experiments demonstrate the state-of-the-art performance of our method. Our method achieves the Rank-1 accuracies of 92.9% (+0.5%), 52.0% (+1.4%), and 97.5% (+0.8%) on CASIA-B, GREW, and OU-MVLP respectively. The source code will be released soon. △ Less

Submitted 1 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: 14 pages, 6 figures

arXiv:2305.07584 [pdf, other]

Proactive Content Caching Scheme in Urban Vehicular Networks

Authors: Biqian Feng, Chenyuan Feng, Daquan Feng, Yongpeng Wu, Xiang-Gen Xia

Abstract: Stream media content caching is a key enabling technology to promote the value chain of future urban vehicular networks. Nevertheless, the high mobility of vehicles, intermittency of information transmissions, high dynamics of user requests, limited caching capacities and extreme complexity of business scenarios pose an enormous challenge to content caching and distribution in vehicular networks.… ▽ More Stream media content caching is a key enabling technology to promote the value chain of future urban vehicular networks. Nevertheless, the high mobility of vehicles, intermittency of information transmissions, high dynamics of user requests, limited caching capacities and extreme complexity of business scenarios pose an enormous challenge to content caching and distribution in vehicular networks. To tackle this problem, this paper aims to design a novel edge-computing-enabled hierarchical cooperative caching framework. Firstly, we profoundly analyze the spatio-temporal correlation between the historical vehicle trajectory of user requests and construct the system model to predict the vehicle trajectory and content popularity, which lays a foundation for mobility-aware content caching and dispatching. Meanwhile, we probe into privacy protection strategies to realize privacy-preserved prediction model. Furthermore, based on trajectory and popular content prediction results, content caching strategy is studied, and adaptive and dynamic resource management schemes are proposed for hierarchical cooperative caching networks. Finally, simulations are provided to verify the superiority of our proposed scheme and algorithms. It shows that the proposed algorithms effectively improve the performance of the considered system in terms of hit ratio and average delay, and narrow the gap to the optimal caching scheme comparing with the traditional schemes. △ Less

Submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted by IEEE Transactions on Communications

arXiv:2305.06233 [pdf, other]

View Correspondence Network for Implicit Light Field Representation

Authors: Süleyman Aslan, Brandon Yushan Feng, Amitabh Varshney

Abstract: We present a novel technique for implicit neural representation of light fields at continuously defined viewpoints with high quality and fidelity. Our implicit neural representation maps 4D coordinates defining two-plane parameterization of the light fields to the corresponding color values. We leverage periodic activations to achieve high expressivity and accurate reconstruction for complex data… ▽ More We present a novel technique for implicit neural representation of light fields at continuously defined viewpoints with high quality and fidelity. Our implicit neural representation maps 4D coordinates defining two-plane parameterization of the light fields to the corresponding color values. We leverage periodic activations to achieve high expressivity and accurate reconstruction for complex data manifolds while keeping low storage and inference time requirements. However, naïvely trained non-3D structured networks do not adequately satisfy the multi-view consistency; instead, they perform alpha blending of nearby viewpoints. In contrast, our View Correspondence Network, or VICON, leverages stereo matching, optimization by automatic differentiation with respect to the input space, and multi-view pixel correspondence to provide a novel implicit representation of the light fields faithful to the novel views that are unseen during the training. Experimental results show VICON superior to the state-of-the-art non-3D implicit light field representations both qualitatively and quantitatively. Moreover, our implicit representation captures a larger field of view (FoV), surpassing the extent of the observable scene by the cameras of the ground truth renderings. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: 10 pages, 7 figures

arXiv:2304.11751 [pdf, other]

Score-Based Diffusion Models as Principled Priors for Inverse Imaging

Authors: Berthy T. Feng, Jamie Smith, Michael Rubinstein, Huiwen Chang, Katherine L. Bouman, William T. Freeman

Abstract: Priors are essential for reconstructing images from noisy and/or incomplete measurements. The choice of the prior determines both the quality and uncertainty of recovered images. We propose turning score-based diffusion models into principled image priors ("score-based priors") for analyzing a posterior of images given measurements. Previously, probabilistic priors were limited to handcrafted regu… ▽ More Priors are essential for reconstructing images from noisy and/or incomplete measurements. The choice of the prior determines both the quality and uncertainty of recovered images. We propose turning score-based diffusion models into principled image priors ("score-based priors") for analyzing a posterior of images given measurements. Previously, probabilistic priors were limited to handcrafted regularizers and simple distributions. In this work, we empirically validate the theoretically-proven probability function of a score-based diffusion model. We show how to sample from resulting posteriors by using this probability function for variational inference. Our results, including experiments on denoising, deblurring, and interferometric imaging, suggest that score-based priors enable principled inference with a sophisticated, data-driven image prior. △ Less

Submitted 28 August, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

Comments: ICCV 2023

arXiv:2304.02214 [pdf, other]

LogoNet: a fine-grained network for instance-level logo sketch retrieval

Authors: Binbin Feng, Jun Li, Jianhua Xu

Abstract: Sketch-based image retrieval, which aims to use sketches as queries to retrieve images containing the same query instance, receives increasing attention in recent years. Although dramatic progress has been made in sketch retrieval, few efforts are devoted to logo sketch retrieval which is still hindered by the following challenges: Firstly, logo sketch retrieval is more difficult than typical sket… ▽ More Sketch-based image retrieval, which aims to use sketches as queries to retrieve images containing the same query instance, receives increasing attention in recent years. Although dramatic progress has been made in sketch retrieval, few efforts are devoted to logo sketch retrieval which is still hindered by the following challenges: Firstly, logo sketch retrieval is more difficult than typical sketch retrieval problem, since a logo sketch usually contains much less visual contents with only irregular strokes and lines. Secondly, instance-specific sketches demonstrate dramatic appearance variances, making them less identifiable when querying the same logo instance. Thirdly, there exist several sketch retrieval benchmarking datasets nowadays, whereas an instance-level logo sketch dataset is still publicly unavailable. To address the above-mentioned limitations, we make twofold contributions in this study for instance-level logo sketch retrieval. To begin with, we construct an instance-level logo sketch dataset containing 2k logo instances and exceeding 9k sketches. To our knowledge, this is the first publicly available instance-level logo sketch dataset. Next, we develop a fine-grained triple-branch CNN architecture based on hybrid attention mechanism termed LogoNet for accurate logo sketch retrieval. More specifically, we embed the hybrid attention mechanism into the triple-branch architecture for capturing the key query-specific information from the limited visual cues in the logo sketches. Experimental evaluations both on our assembled dataset and public benchmark datasets demonstrate the effectiveness of our proposed network. △ Less

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2303.16856 [pdf, other]

Robust Dancer: Long-term 3D Dance Synthesis Using Unpaired Data

Authors: Bin Feng, Tenglong Ao, Zequn Liu, Wei Ju, Libin Liu, Ming Zhang

Abstract: How to automatically synthesize natural-looking dance movements based on a piece of music is an incrementally popular yet challenging task. Most existing data-driven approaches require hard-to-get paired training data and fail to generate long sequences of motion due to error accumulation of autoregressive structure. We present a novel 3D dance synthesis system that only needs unpaired data for tr… ▽ More How to automatically synthesize natural-looking dance movements based on a piece of music is an incrementally popular yet challenging task. Most existing data-driven approaches require hard-to-get paired training data and fail to generate long sequences of motion due to error accumulation of autoregressive structure. We present a novel 3D dance synthesis system that only needs unpaired data for training and could generate realistic long-term motions at the same time. For the unpaired data training, we explore the disentanglement of beat and style, and propose a Transformer-based model free of reliance upon paired data. For the synthesis of long-term motions, we devise a new long-history attention strategy. It first queries the long-history embedding through an attention computation and then explicitly fuses this embedding into the generation pipeline via multimodal adaptation gate (MAG). Objective and subjective evaluations show that our results are comparable to strong baseline methods, despite not requiring paired training data, and are robust when inferring long-term music. To our best knowledge, we are the first to achieve unpaired data training - an ability that enables to alleviate data limitations effectively. Our code is released on https://github.com/BFeng14/RobustDancer △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: Preliminary video demo: https://youtu.be/gJbxG9QlcUU

arXiv:2303.02242 [pdf, other]

TrojText: Test-time Invisible Textual Trojan Insertion

Authors: Qian Lou, Yepeng Liu, Bo Feng

Abstract: In Natural Language Processing (NLP), intelligent neuron models can be susceptible to textual Trojan attacks. Such attacks occur when Trojan models behave normally for standard inputs but generate malicious output for inputs that contain a specific trigger. Syntactic-structure triggers, which are invisible, are becoming more popular for Trojan attacks because they are difficult to detect and defen… ▽ More In Natural Language Processing (NLP), intelligent neuron models can be susceptible to textual Trojan attacks. Such attacks occur when Trojan models behave normally for standard inputs but generate malicious output for inputs that contain a specific trigger. Syntactic-structure triggers, which are invisible, are becoming more popular for Trojan attacks because they are difficult to detect and defend against. However, these types of attacks require a large corpus of training data to generate poisoned samples with the necessary syntactic structures for Trojan insertion. Obtaining such data can be difficult for attackers, and the process of generating syntactic poisoned triggers and inserting Trojans can be time-consuming. This paper proposes a solution called TrojText, which aims to determine whether invisible textual Trojan attacks can be performed more efficiently and cost-effectively without training data. The proposed approach, called the Representation-Logit Trojan Insertion (RLI) algorithm, uses smaller sampled test data instead of large training data to achieve the desired attack. The paper also introduces two additional techniques, namely the accumulated gradient ranking (AGR) and Trojan Weights Pruning (TWP), to reduce the number of tuned parameters and the attack overhead. The TrojText approach was evaluated on three datasets (AG's News, SST-2, and OLID) using three NLP models (BERT, XLNet, and DeBERTa). The experiments demonstrated that the TrojText approach achieved a 98.35\% classification accuracy for test sentences in the target class on the BERT model for the AG's News dataset. The source code for TrojText is available at https://github.com/UCF-ML-Research/TrojText. △ Less

Submitted 21 August, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: In The Eleventh International Conference on Learning Representations. 2023 (ICLR 2023)

arXiv:2301.10900 [pdf, other]

Graph Contrastive Learning for Skeleton-based Action Recognition

Authors: Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng

Abstract: In the field of skeleton-based action recognition, current top-performing graph convolutional networks (GCNs) exploit intra-sequence context to construct adaptive graphs for feature aggregation. However, we argue that such context is still \textit{local} since the rich cross-sequence relations have not been explicitly investigated. In this paper, we propose a graph contrastive learning framework f… ▽ More In the field of skeleton-based action recognition, current top-performing graph convolutional networks (GCNs) exploit intra-sequence context to construct adaptive graphs for feature aggregation. However, we argue that such context is still \textit{local} since the rich cross-sequence relations have not been explicitly investigated. In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (\textit{SkeletonGCL}) to explore the \textit{global} context across all sequences. In specific, SkeletonGCL associates graph learning across sequences by enforcing graphs to be class-discriminative, \emph{i.e.,} intra-class compact and inter-class dispersed, which improves the GCN capacity to distinguish various action patterns. Besides, two memory banks are designed to enrich cross-sequence context from two complementary levels, \emph{i.e.,} instance and semantic levels, enabling graph contrastive learning in multiple context scales. Consequently, SkeletonGCL establishes a new training paradigm, and it can be seamlessly incorporated into current GCNs. Without loss of generality, we combine SkeletonGCL with three GCNs (2S-ACGN, CTR-GCN, and InfoGCN), and achieve consistent improvements on NTU60, NTU120, and NW-UCLA benchmarks. The source code will be available at \url{https://github.com/OliverHxh/SkeletonGCL}. △ Less

Submitted 10 June, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: Accepted by ICLR2023

arXiv:2212.01602 [pdf, other]

StegaNeRF: Embedding Invisible Information within Neural Radiance Fields

Authors: Chenxin Li, Brandon Y. Feng, Zhiwen Fan, Panwang Pan, Zhangyang Wang

Abstract: Recent advances in neural rendering imply a future of widespread visual data distributions through sharing NeRF model weights. However, while common visual data (images and videos) have standard approaches to embed ownership or copyright information explicitly or subtly, the problem remains unexplored for the emerging NeRF format. We present StegaNeRF, a method for steganographic information embed… ▽ More Recent advances in neural rendering imply a future of widespread visual data distributions through sharing NeRF model weights. However, while common visual data (images and videos) have standard approaches to embed ownership or copyright information explicitly or subtly, the problem remains unexplored for the emerging NeRF format. We present StegaNeRF, a method for steganographic information embedding in NeRF renderings. We design an optimization framework allowing accurate hidden information extractions from images rendered by NeRF, while preserving its original visual quality. We perform experimental evaluations of our method under several potential deployment scenarios, and we further discuss the insights discovered through our analysis. StegaNeRF signifies an initial exploration into the novel problem of instilling customizable, imperceptible, and recoverable information to NeRF renderings, with minimal impact to rendered images. Project page: https://xggnet.github.io/StegaNeRF/. △ Less

Submitted 3 December, 2022; originally announced December 2022.

Comments: Project page: https://xggnet.github.io/StegaNeRF/

arXiv:2211.00722 [pdf, other]

doi 10.1145/3550469.3555417

VIINTER: View Interpolation with Implicit Neural Representations of Images

Authors: Brandon Yushan Feng, Susmija Jabbireddy, Amitabh Varshney

Abstract: We present VIINTER, a method for view interpolation by interpolating the implicit neural representation (INR) of the captured images. We leverage the learned code vector associated with each image and interpolate between these codes to achieve viewpoint transitions. We propose several techniques that significantly enhance the interpolation quality. VIINTER signifies a new way to achieve view inter… ▽ More We present VIINTER, a method for view interpolation by interpolating the implicit neural representation (INR) of the captured images. We leverage the learned code vector associated with each image and interpolate between these codes to achieve viewpoint transitions. We propose several techniques that significantly enhance the interpolation quality. VIINTER signifies a new way to achieve view interpolation without constructing 3D structure, estimating camera poses, or computing pixel correspondence. We validate the effectiveness of VIINTER on several multi-view scenes with different types of camera layout and scene composition. As the development of INR of images (as opposed to surface or volume) has centered around tasks like image fitting and super-resolution, with VIINTER, we show its capability for view interpolation and offer a promising outlook on using INR for image manipulation tasks. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: SIGGRAPH Asia 2022

arXiv:2209.12708 [pdf, other]

Faith: An Efficient Framework for Transformer Verification on GPUs

Authors: Boyuan Feng, Tianqi Tang, Yuke Wang, Zhaodong Chen, Zheng Wang, Shu Yang, Yuan Xie, Yufei Ding

Abstract: Transformer verification draws increasing attention in machine learning research and industry. It formally verifies the robustness of transformers against adversarial attacks such as exchanging words in a sentence with synonyms. However, the performance of transformer verification is still not satisfactory due to bound-centric computation which is significantly different from standard neural netwo… ▽ More Transformer verification draws increasing attention in machine learning research and industry. It formally verifies the robustness of transformers against adversarial attacks such as exchanging words in a sentence with synonyms. However, the performance of transformer verification is still not satisfactory due to bound-centric computation which is significantly different from standard neural networks. In this paper, we propose Faith, an efficient framework for transformer verification on GPUs. We first propose a semantic-aware computation graph transformation to identify semantic information such as bound computation in transformer verification. We exploit such semantic information to enable efficient kernel fusion at the computation graph level. Second, we propose a verification-specialized kernel crafter to efficiently map transformer verification to modern GPUs. This crafter exploits a set of GPU hardware supports to accelerate verification specialized operations which are usually memory-intensive. Third, we propose an expert-guided autotuning to incorporate expert knowledge on GPU backends to facilitate large search space exploration. Extensive evaluations show that Faith achieves $2.1\times$ to $3.4\times$ ($2.6\times$ on average) speedup over state-of-the-art frameworks. △ Less

Submitted 23 September, 2022; originally announced September 2022.

Comments: Published in ATC'22

arXiv:2209.07936 [pdf, other]

PA-Boot: A Formally Verified Authentication Protocol for Multiprocessor Secure Boot

Authors: Zhuoruo Zhang, Chenyang Yu, Rui Chang, Mingshuai Chen, Bo Feng, He Huang, Qinming Dai, Wenbo Shen, Yongwang Zhao

Abstract: Hardware supply-chain attacks are raising significant security threats to the boot process of multiprocessor systems. This paper identifies a new, prevalent hardware supply-chain attack surface that can bypass multiprocessor secure boot due to the absence of processor-authentication mechanisms. To defend against such attacks, we present PA-Boot, the first formally verified processor-authentication… ▽ More Hardware supply-chain attacks are raising significant security threats to the boot process of multiprocessor systems. This paper identifies a new, prevalent hardware supply-chain attack surface that can bypass multiprocessor secure boot due to the absence of processor-authentication mechanisms. To defend against such attacks, we present PA-Boot, the first formally verified processor-authentication protocol for secure boot in multiprocessor systems. PA-Boot is proved functionally correct and is guaranteed to detect multiple adversarial behaviors, e.g., processor replacements, man-in-the-middle attacks, and tampering with certificates. The fine-grained formalization of PA-Boot and its fully mechanized security proofs are carried out in the Isabelle/HOL theorem prover with 306 lemmas/theorems and ~7,100 LoC. Experiments on a proof-of-concept implementation indicate that PA-Boot can effectively identify boot-process attacks with a considerably minor overhead and thereby improve the security of multiprocessor systems. △ Less

Submitted 24 April, 2024; v1 submitted 16 September, 2022; originally announced September 2022.

arXiv:2209.06800 [pdf, other]

MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms

Authors: Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, Yufei Ding

Abstract: The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the… ▽ More The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL, MGG-UVM, and ROC, respectively. △ Less

Submitted 26 June, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Paper is accepted to OSDI'23

arXiv:2208.12341

Variance Reduction based Experience Replay for Policy Optimization

Authors: Hua Zheng, Wei Xie, M. Ben Feng

Abstract: For reinforcement learning on complex stochastic systems where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay allows agents to remember by reusing historical observations. However, the uniform reuse strategy that t… ▽ More For reinforcement learning on complex stochastic systems where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay allows agents to remember by reusing historical observations. However, the uniform reuse strategy that treats all observations equally overlooks the relative importance of different samples. To overcome this limitation, we propose a general variance reduction based experience replay (VRER) framework that can selectively reuse the most relevant samples to improve policy gradient estimation. This selective mechanism can adaptively put more weight on past samples that are more likely to be generated by the current target distribution. Our theoretical and empirical studies show that the proposed VRER can accelerate the learning of optimal policy and enhance the performance of state-of-the-art policy optimization approaches. △ Less

Submitted 9 September, 2022; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: This work was intended as a replacement of arXiv:2110.08902 and any subsequent updates will appear there

arXiv:2208.06143 [pdf, other]

PRIF: Primary Ray-based Implicit Function

Authors: Brandon Yushan Feng, Yinda Zhang, Danhang Tang, Ruofei Du, Amitabh Varshney

Abstract: We introduce a new implicit shape representation called Primary Ray-based Implicit Function (PRIF). In contrast to most existing approaches based on the signed distance function (SDF) which handles spatial locations, our representation operates on oriented rays. Specifically, PRIF is formulated to directly produce the surface hit point of a given input ray, without the expensive sphere-tracing ope… ▽ More We introduce a new implicit shape representation called Primary Ray-based Implicit Function (PRIF). In contrast to most existing approaches based on the signed distance function (SDF) which handles spatial locations, our representation operates on oriented rays. Specifically, PRIF is formulated to directly produce the surface hit point of a given input ray, without the expensive sphere-tracing operations, hence enabling efficient shape extraction and differentiable rendering. We demonstrate that neural networks trained to encode PRIF achieve successes in various tasks including single shape representation, category-wise shape generation, shape completion from sparse or noisy observations, inverse rendering for camera pose estimation, and neural rendering with color. △ Less

Submitted 12 August, 2022; originally announced August 2022.

Comments: ECCV 2022. Project Page: https://augmentariumlab.github.io/PRIF/

arXiv:2208.02466 [pdf, other]

Linear MIMO Precoders Design for Finite Alphabet Inputs via Model-Free Training

Authors: Chen Cao, Biqian Feng, Yongpeng Wu, Derrick Wing Kwan Ng, Wenjun Zhang

Abstract: This paper investigates a novel method for designing linear precoders with finite alphabet inputs based on autoencoders (AE) without the knowledge of the channel model. By model-free training of the autoencoder in a multiple-input multiple-output (MIMO) system, the proposed method can effectively solve the optimization problem to design the precoders that maximize the mutual information between th… ▽ More This paper investigates a novel method for designing linear precoders with finite alphabet inputs based on autoencoders (AE) without the knowledge of the channel model. By model-free training of the autoencoder in a multiple-input multiple-output (MIMO) system, the proposed method can effectively solve the optimization problem to design the precoders that maximize the mutual information between the channel inputs and outputs, when only the input-output information of the channel can be observed. Specifically, the proposed method regards the receiver and the precoder as two independent parameterized functions in the AE and alternately trains them using the exact and approximated gradient, respectively. Compared with previous precoders design methods, it alleviates the limitation of requiring the explicit channel model to be known. Simulation results show that the proposed method works as well as those methods under known channel models in terms of maximizing the mutual information and reducing the bit error rate. △ Less

Submitted 4 August, 2022; originally announced August 2022.

Comments: Accepted by GLOBECOM 2022

arXiv:2206.08482 [pdf, other]

GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

Authors: Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Ang Li, Yufei Ding

Abstract: With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) raises the attention of various fields. However, DRL computation on the modern powerful GPU platform is still inefficient due to its heterogeneous workloads and interleaved execution paradigm. To this end, we propose GMI-DRL, a systematic design to accelerate multi-GPU DRL via… ▽ More With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) raises the attention of various fields. However, DRL computation on the modern powerful GPU platform is still inefficient due to its heterogeneous workloads and interleaved execution paradigm. To this end, we propose GMI-DRL, a systematic design to accelerate multi-GPU DRL via GPU spatial multiplexing. We introduce a novel design of resource-adjustable GPU multiplexing instances (GMIs) to match the actual needs of DRL tasks, an adaptive GMI management strategy to simultaneously achieve high GPU utilization and computation throughput, and a highly efficient inter-GMI communication support to meet the demands of various DRL communication patterns. Comprehensive experiments reveal that GMI-DRL outperforms state-of-the-art NVIDIA Isaac Gym with NCCL (up to 2.81X) and Horovod (up to 2.34X) support in training throughput on the latest DGX-A100 platform. Our work provides an initial user experience with GPU spatial multiplexing in processing heterogeneous workloads with a mixture of computation and communication. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2205.02410 [pdf, other]

Sequential Importance Sampling for Hybrid Model Bayesian Inference to Support Bioprocess Mechanism Learning and Robust Control

Authors: Wei Xie, Keqi Wang, Hua Zheng, Ben Feng

Abstract: Driven by the critical needs of biomanufacturing 4.0, we introduce a probabilistic knowledge graph hybrid model characterizing the risk- and science-based understanding of bioprocess mechanisms. It can faithfully capture the important properties, including nonlinear reactions, partially observed state, and nonstationary dynamics. Given very limited real process observations, we derive a posterior… ▽ More Driven by the critical needs of biomanufacturing 4.0, we introduce a probabilistic knowledge graph hybrid model characterizing the risk- and science-based understanding of bioprocess mechanisms. It can faithfully capture the important properties, including nonlinear reactions, partially observed state, and nonstationary dynamics. Given very limited real process observations, we derive a posterior distribution quantifying model estimation uncertainty. To avoid the evaluation of intractable likelihoods, Approximate Bayesian Computation sampling with Sequential Monte Carlo (ABC-SMC) is utilized to approximate the posterior distribution. Under high stochastic and model uncertainties, it is computationally expensive to match output trajectories. Therefore, we create a linear Gaussian dynamic Bayesian network (LG-DBN) auxiliary likelihood-based ABC-SMC approach. Through matching the summary statistics driven through LG-DBN likelihood that can capture critical interactions and variations, the proposed algorithm can accelerate hybrid model inference, support process monitoring, and facilitate mechanism learning and robust control. △ Less

Submitted 29 September, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

Comments: 11 pages, 2 figures

arXiv:2204.04932 [pdf]

doi 10.1109/CVCI56766.2022.9964574

Optimized SC-F-LOAM: Optimized Fast LiDAR Odometry and Mapping Using Scan Context

Authors: Lizhou Liao, Chunyun Fu, Binbin Feng, Tian Su

Abstract: LiDAR odometry can achieve accurate vehicle pose estimation for short driving range or in small-scale environments, but for long driving range or in large-scale environments, the accuracy deteriorates as a result of cumulative estimation errors. This drawback necessitates the inclusion of loop closure detection in a SLAM framework to suppress the adverse effects of cumulative errors. To improve th… ▽ More LiDAR odometry can achieve accurate vehicle pose estimation for short driving range or in small-scale environments, but for long driving range or in large-scale environments, the accuracy deteriorates as a result of cumulative estimation errors. This drawback necessitates the inclusion of loop closure detection in a SLAM framework to suppress the adverse effects of cumulative errors. To improve the accuracy of pose estimation, we propose a new LiDAR-based SLAM method which uses F-LOAM as LiDAR odometry, Scan Context for loop closure detection, and GTSAM for global optimization. In our approach, an adaptive distance threshold (instead of a fixed threshold) is employed for loop closure detection, which achieves more accurate loop closure detection results. Besides, a feature-based matching method is used in our approach to compute vehicle pose transformations between loop closure point cloud pairs, instead of using the raw point cloud obtained by the LiDAR sensor, which significantly reduces the computation time. The KITTI dataset is used for verifications of our method, and the experimental results demonstrate that the proposed method outperforms typical LiDAR odometry/SLAM methods in the literature. Our code is made publicly available for the benefit of the community. △ Less

Submitted 15 March, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

Journal ref: Proceedings of the 2022 6th CAA International Conference on Vehicular Control and Intelligence (CVCI), Nanjing, China, 28-30 October 2022

arXiv:2204.03270 [pdf, other]

Multi-scale Context-aware Network with Transformer for Gait Recognition

Authors: Duowang Zhu, Xiaohu Huang, Xinggang Wang, Bo Yang, Botao He, Wenyu Liu, Bin Feng

Abstract: Although gait recognition has drawn increasing research attention recently, since the silhouette differences are quite subtle in spatial domain, temporal feature representation is crucial for gait recognition. Inspired by the observation that humans can distinguish gaits of different subjects by adaptively focusing on clips of varying time scales, we propose a multi-scale context-aware network wit… ▽ More Although gait recognition has drawn increasing research attention recently, since the silhouette differences are quite subtle in spatial domain, temporal feature representation is crucial for gait recognition. Inspired by the observation that humans can distinguish gaits of different subjects by adaptively focusing on clips of varying time scales, we propose a multi-scale context-aware network with transformer (MCAT) for gait recognition. MCAT generates temporal features across three scales, and adaptively aggregates them using contextual information from both local and global perspectives. Specifically, MCAT contains an adaptive temporal aggregation (ATA) module that performs local relation modeling followed by global relation modeling to fuse the multi-scale features. Besides, in order to remedy the spatial feature corruption resulting from temporal operations, MCAT incorporates a salient spatial feature learning (SSFL) module to select groups of discriminative spatial features. Extensive experiments conducted on three datasets demonstrate the state-of-the-art performance. Concretely, we achieve rank-1 accuracies of 98.7%, 96.2% and 88.7% under normal walking, bag-carrying and coat-wearing conditions on CASIA-B, 97.5% on OU-MVLP and 50.6% on GREW. The source code will be available at https://github.com/zhuduowang/MCAT.git. △ Less

Submitted 25 September, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Extensions of CSTL

arXiv:2203.11404 [pdf, other]

Enhanced Preamble Based MAC Mechanism for IIoT-oriented PLC Network

Authors: Kai Song, Biqian Feng, Yongpeng Wu, Wenjun Zhang

Abstract: In this paper, we propose an enhanced preamble based media access control mechanism (E-PMAC), which can be applied in power line communication (PLC) network for Industrial Internet of Things (IIoT). We introduce detailed technologies used in E-PMAC, including delay calibration mechanism, preamble design, and slot allocation algorithm. With these technologies, E-PMAC is more robust than existing pr… ▽ More In this paper, we propose an enhanced preamble based media access control mechanism (E-PMAC), which can be applied in power line communication (PLC) network for Industrial Internet of Things (IIoT). We introduce detailed technologies used in E-PMAC, including delay calibration mechanism, preamble design, and slot allocation algorithm. With these technologies, E-PMAC is more robust than existing preamble based MAC mechanism (P-MAC). Besides, we analyze the disadvantage of P-MAC in multi-layer networking and design the networking process of E-PMAC to accelerate networking process. We analyze the complexity of networking process in P-MAC and E-PMAC and prove that E-PMAC has lower complexity than P-MAC. Finally, we simulate the single-layer networking and multi-layer networking of E-PMAC, P-MAC, and existing PLC protocol, i.e. , IEEE1901.1. The simulation results indicate that E-PMAC spends much less time in networking than IEEE1901.1 and P-MAC. Finally, with our work, a PLC network based on E-PMAC mechanism can be realized. △ Less

Submitted 21 March, 2022; originally announced March 2022.

Comments: 7 pages, 12 figures, to appeal in The 2022 IEEE 95th Vehicular Technology Conference (VTC2022-Spring)

arXiv:2203.06764 [pdf, other]

TurbuGAN: An Adversarial Learning Approach to Spatially-Varying Multiframe Blind Deconvolution with Applications to Imaging Through Turbulence

Authors: Brandon Yushan Feng, Mingyang Xie, Christopher A. Metzler

Abstract: We present a self-supervised and self-calibrating multi-shot approach to imaging through atmospheric turbulence, called TurbuGAN. Our approach requires no paired training data, adapts itself to the distribution of the turbulence, leverages domain-specific data priors, and can generalize from tens to thousands of measurements. We achieve such functionality through an adversarial sensing framework a… ▽ More We present a self-supervised and self-calibrating multi-shot approach to imaging through atmospheric turbulence, called TurbuGAN. Our approach requires no paired training data, adapts itself to the distribution of the turbulence, leverages domain-specific data priors, and can generalize from tens to thousands of measurements. We achieve such functionality through an adversarial sensing framework adapted from CryoGAN, which uses a discriminator network to match the distributions of captured and simulated measurements. Our framework builds on CryoGAN by (1) generalizing the forward measurement model to incorporate physically accurate and computationally efficient models for light propagation through anisoplanatic turbulence, (2) enabling adaptation to slightly misspecified forward models, and (3) leveraging domain-specific prior knowledge using pretrained generative networks, when available. We validate TurbuGAN on both computationally simulated and experimentally captured images distorted with anisoplanatic turbulence. △ Less

Submitted 2 January, 2023; v1 submitted 13 March, 2022; originally announced March 2022.

arXiv:2202.13566 [pdf]

doi 10.1109/MIS.2020.3026990

Learning Parameters for a Generalized Vidale-Wolfe Response Model with Flexible Ad Elasticity and Word-of-Mouth

Authors: Yanwu Yang, Baozhu Feng, Daniel Zeng

Abstract: In this research, we investigate a generalized form of Vidale-Wolfe (GVW) model. One key element of our modeling work is that the GVW model contains two useful indexes representing advertiser's elasticity and the word-of-mouth (WoM) effect, respectively. Moreover, we discuss some desirable properties of the GVW model, and present a deep neural network (DNN)-based estimation method to learn its par… ▽ More In this research, we investigate a generalized form of Vidale-Wolfe (GVW) model. One key element of our modeling work is that the GVW model contains two useful indexes representing advertiser's elasticity and the word-of-mouth (WoM) effect, respectively. Moreover, we discuss some desirable properties of the GVW model, and present a deep neural network (DNN)-based estimation method to learn its parameters. Furthermore, based on three realworld datasets, we conduct computational experiments to validate the GVW model and identified properties. In addition, we also discuss potential advantages of the GVW model over econometric models. The research outcome shows that both the ad elasticity index and the WoM index have significant influences on advertising responses, and the GVW model has potential advantages over econometric models of advertising, in terms of several interesting phenomena drawn from practical advertising situations. The GVW model and its deep learning-based estimation method provide a basis to support big data-driven advertising analytics and decision makings; in the meanwhile, identified properties and experimental findings of this research illuminate critical managerial insights for advertisers in various advertising forms. △ Less

Submitted 28 February, 2022; originally announced February 2022.

Comments: 20 pages, 8 figures, 1 table

MSC Class: 68Txx ACM Class: I.2.6

Journal ref: IEEE Intelligent Systems, 36(5), 69-79 (2021)

Showing 1–50 of 109 results for author: Feng, B