Search | arXiv e-print repository

Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint

Authors: Jun Li, Weiwei Zhang, Kang Wei, Guangji Chen, Long Shi, Wen Chen

Abstract: As an emerging artificial intelligence technology, graph neural networks (GNNs) have exhibited promising performance across a wide range of graph-related applications. However, information exchanges among neighbor nodes in GNN pose new challenges in the resource-constrained scenario, especially in wireless systems. In practical wireless systems, the communication links among nodes are usually unre… ▽ More As an emerging artificial intelligence technology, graph neural networks (GNNs) have exhibited promising performance across a wide range of graph-related applications. However, information exchanges among neighbor nodes in GNN pose new challenges in the resource-constrained scenario, especially in wireless systems. In practical wireless systems, the communication links among nodes are usually unreliable due to wireless fading and receiver noise, consequently resulting in performance degradation of GNNs. To improve the learning performance of GNNs, we aim to maximize the number of long-term average (LTA) communication links by the optimized power control under energy consumption constraints. Using the Lyapunov optimization method, we first transform the intractable long-term problem into a deterministic problem in each time slot by converting the long-term energy constraints into the objective function. In spite of this non-convex combinatorial optimization problem, we address this problem via equivalently solving a sequence of convex feasibility problems together with a greedy based solver. Simulation results demonstrate the superiority of our proposed scheme over the baselines. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: 5 pages,3 figures

arXiv:2405.05695 [pdf, other]

Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

Authors: Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma

Abstract: We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure fo… ▽ More We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure for the primary and auxiliary tasks, which produces different networks for training and inference. Specifically, starting from two single task networks/branches (each representing a task), we propose a novel method with evolving networks where only primary-to-auxiliary links exist as the cross-task connections after convergence. These connections can be removed during the primary task inference, resulting in a single-task inference cost. We achieve this by formulating a Neural Architecture Search (NAS) problem, where we initialize bi-directional connections in the search space and guide the NAS optimization converging to an architecture with only the single-side primary-to-auxiliary connections. Moreover, our method can be incorporated with optimization-based auxiliary learning approaches. Extensive experiments with six tasks on NYU v2, CityScapes, and Taskonomy datasets using VGG, ResNet, and ViT backbones validate the promising performance. The codes are available at https://github.com/ethanygao/Aux-NAS. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted to ICLR 2024

Journal ref: International Conference on Learning Representations (ICLR), 2024

arXiv:2405.05538 [pdf, other]

A Survey on Personalized Content Synthesis with Diffusion Models

Authors: Xulu Zhang, Xiao-Yong Wei, Wengyu Zhang, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

Abstract: Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image… ▽ More Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image generation, with few providing up-to-date summaries on PCS. This paper offers a comprehensive survey of PCS, with a particular focus on the diffusion models. Specifically, we introduce the generic frameworks of PCS research, which can be broadly classified into optimization-based and learning-based approaches. We further categorize and analyze these methodologies, discussing their strengths, limitations, and key techniques. Additionally, we delve into specialized tasks within the field, such as personalized object generation, face synthesis, and style personalization, highlighting their unique challenges and innovations. Despite encouraging progress, we also present an analysis of the challenges such as overfitting and the trade-off between subject fidelity and text alignment. Through this detailed overview and analysis, we propose future directions to advance the development of PCS. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05500 [pdf]

Research on the Tender Leaf Identification and Mechanically Perceptible Plucking Finger for High-quality Green Tea

Authors: Wei Zhang, Yong Chen, Qianqian Wang, Jun Chen

Abstract: BACKGROUND: Intelligent identification and precise plucking are the keys to intelligent tea harvesting robots, which are of increasing significance nowadays. Aiming at plucking tender leaves for high-quality green tea producing, in this paper, a tender leaf identification algorithm and a mechanically perceptible plucking finger have been proposed. RESULTS: Based on segmentation algorithm and color… ▽ More BACKGROUND: Intelligent identification and precise plucking are the keys to intelligent tea harvesting robots, which are of increasing significance nowadays. Aiming at plucking tender leaves for high-quality green tea producing, in this paper, a tender leaf identification algorithm and a mechanically perceptible plucking finger have been proposed. RESULTS: Based on segmentation algorithm and color features, the tender leaf identification algorithm shows an average identification accuracy of over 92.8%. The mechanically perceptible plucking finger plucks tender leaves in a way that a human hand does so as to remain high quality of tea products. Though finite element analysis, we determine the ideal size of grippers and the location of strain gauge attachment on a gripper to enable the employment of feedback control of desired gripping force. Revealed from our experiments, the success rate of tender leaf plucking reaches 92.5%, demonstrating the effectiveness of our design. CONCLUSION: The results show that the tender leaf identification algorithm and the mechanically perceptible plucking finger are effective for tender leaves identification and plucking, providing a foundation for the development of an intelligent tender leaf plucking robot. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.05258 [pdf, other]

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the effi… ▽ More Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance generating auxiliary supervisions using open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving the supervised-only baselines. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: Preprint; 17 pages, 6 figures, 8 tables; Code at https://github.com/ldkong1205/LaserMix

arXiv:2405.05109 [pdf, other]

QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs

Authors: Weijia Zhang, Vaishali Pal, Jia-Hong Huang, Evangelos Kanoulas, Maarten de Rijke

Abstract: Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing que… ▽ More Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model (LLM), utilizes textual queries and multiple tables to generate query-dependent table summaries tailored to users' information needs. To facilitate research in this area, we present a comprehensive dataset specifically tailored for this task, consisting of 4909 query-summary pairs, each associated with multiple tables. Through extensive experiments using our curated dataset, we demonstrate the effectiveness of our proposed method compared to baseline approaches. Our findings offer insights into the challenges of complex table reasoning for precise summarization, contributing to the advancement of research in query-focused multi-table summarization. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 16 pages, 3 figures

arXiv:2405.04936 [pdf, other]

SPSW: Database Watermarking Based on Fake Tuples and Sparse Priority Strategy

Authors: Zhiwen Ren, Zehua Ma, Weiming Zhang, Nenghai Yu

Abstract: Databases play a crucial role in storing and managing vast amounts of data in various organizations and industries. Yet the risk of database leakage poses a significant threat to data privacy and security. To trace the source of database leakage, researchers have proposed many database watermarking schemes. Among them, fake-tuples-based database watermarking shows great potential as it does not mo… ▽ More Databases play a crucial role in storing and managing vast amounts of data in various organizations and industries. Yet the risk of database leakage poses a significant threat to data privacy and security. To trace the source of database leakage, researchers have proposed many database watermarking schemes. Among them, fake-tuples-based database watermarking shows great potential as it does not modify the original data of the database, ensuring the seamless usability of the watermarked database. However, the existing fake-tuple-based database watermarking schemes need to insert a large number of fake tuples for the embedding of each watermark bit, resulting in low watermark transparency. Therefore, we propose a novel database watermarking scheme based on fake tuples and sparse priority strategy, named SPSW, which achieves the same watermark capacity with a lower number of inserted fake tuples compared to the existing embedding strategy. Specifically, for a database about to be watermarked, we prioritize embedding the sparsest watermark sequence, i.e., the sequence containing the most `0' bits among the currently available watermark sequences. For each bit in the sparse watermark sequence, when it is set to `1', SPSW will embed the corresponding set of fake tuples into the database. Otherwise, no modifications will be made to the database. Through theoretical analysis, the proposed sparse priority strategy not only improves transparency but also enhances the robustness of the watermark. The comparative experimental results with other database watermarking schemes further validate the superior performance of the proposed SPSW, aligning with the theoretical analysis. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04165 [pdf, other]

LingML: Linguistic-Informed Machine Learning for Enhanced Fake News Detection

Authors: Jasraj Singh, Fang Liu, Hong Xu, Bee Chin Ng, Wei Zhang

Abstract: Nowadays, Information spreads at an unprecedented pace in social media and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect with challenging problems like limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with… ▽ More Nowadays, Information spreads at an unprecedented pace in social media and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect with challenging problems like limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with linguistics input and we propose LingML, linguistic-informed ML, for fake news detection. We conducted an experimental study with a popular dataset on fake news during the pandemic. The experiment results show that our proposed solution is highly effective. There are fewer than two errors out of every ten attempts with only linguistic input used in ML and the knowledge is highly explainable. When linguistics input is integrated with advanced large-scale ML models for natural language processing, our solution outperforms existing ones with 1.8% average error rate. LingML creates a new path with linguistics to push the frontier of effective and efficient fake news detection. It also sheds light on real-world multi-disciplinary applications requiring both ML and domain expertise to achieve optimal performance. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: 7 pages

arXiv:2405.04121 [pdf, other]

ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation

Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

Abstract: Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, the \textit{weak teacher challenge} arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Mult… ▽ More Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, the \textit{weak teacher challenge} arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Multi-Stage Knowledge Distillation, transferring comprehensive knowledge from the Vision Foundation Model (VFM), extensively trained on diverse open-world images. This enables effective knowledge transfer to a lightweight student model across modalities. ELiTe employs Parameter-Efficient Fine-Tuning to strengthen the VFM teacher and expedite large-scale model training with minimal costs. Additionally, we introduce the Segment Anything Model based Pseudo-Label Generation approach to enhance low-quality image labels, facilitating robust semantic representations. Efficient knowledge transfer in ELiTe yields state-of-the-art results on the SemanticKITTI benchmark, outperforming real-time inference models. Our approach achieves this with significantly fewer parameters, confirming its effectiveness and efficiency. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: 9 pages, 6 figures, ICME 2024 oral

arXiv:2405.04114 [pdf, other]

Acceleration Algorithms in GNNs: A Survey

Authors: Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui

Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the resear… ▽ More Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the research community. In this paper, we present a systematic review of acceleration algorithms in GNNs, which can be categorized into three main topics based on their purpose: training acceleration, inference acceleration, and execution acceleration. Specifically, we summarize and categorize the existing approaches for each main topic, and provide detailed characterizations of the approaches within each category. Additionally, we review several libraries related to acceleration algorithms in GNNs and discuss our Scalable Graph Learning (SGL) library. Finally, we propose promising directions for future research. A complete summary is presented in our GitHub repository: https://github.com/PKU-DAIR/SGL/blob/main/Awsome-GNN-Acceleration.md. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: 9 pages,3 figures

arXiv:2405.03950 [pdf, other]

Relating-Up: Advancing Graph Neural Networks through Inter-Graph Relationships

Authors: Qi Zou, Na Yu, Daoliang Zhang, Wei Zhang, Rui Gao

Abstract: Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a p… ▽ More Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a plug-and-play module that enhances GNNs by exploiting inter-graph relationships. This module incorporates a relation-aware encoder and a feedback training strategy. The former enables GNNs to capture relationships across graphs, enriching relation-aware graph representation through collective context. The latter utilizes a feedback loop mechanism for the recursively refinement of these representations, leveraging insights from refining inter-graph dynamics to conduct feedback loop. The synergy between these two innovations results in a robust and versatile module. Relating-Up enhances the expressiveness of GNNs, enabling them to encapsulate a wider spectrum of graph relationships with greater precision. Our evaluations across 16 benchmark datasets demonstrate that integrating Relating-Up into GNN architectures substantially improves performance, positioning Relating-Up as a formidable choice for a broad spectrum of graph representation learning tasks. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 16 pages, 6 figures, 9 tables

arXiv:2405.03884 [pdf, other]

BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection

Authors: Saket S. Chaturvedi, Lan Zhang, Wenbin Zhang, Pan He, Xiaoyong Yuan

Abstract: 3D object detection plays an important role in autonomous driving; however, its vulnerability to backdoor attacks has become evident. By injecting ''triggers'' to poison the training dataset, backdoor attacks manipulate the detector's prediction for inputs containing these triggers. Existing backdoor attacks against 3D object detection primarily poison 3D LiDAR signals, where large-sized 3D trigge… ▽ More 3D object detection plays an important role in autonomous driving; however, its vulnerability to backdoor attacks has become evident. By injecting ''triggers'' to poison the training dataset, backdoor attacks manipulate the detector's prediction for inputs containing these triggers. Existing backdoor attacks against 3D object detection primarily poison 3D LiDAR signals, where large-sized 3D triggers are injected to ensure their visibility within the sparse 3D space, rendering them easy to detect and impractical in real-world scenarios. In this paper, we delve into the robustness of 3D object detection, exploring a new backdoor attack surface through 2D cameras. Given the prevalent adoption of camera and LiDAR signal fusion for high-fidelity 3D perception, we investigate the latent potential of camera signals to disrupt the process. Although the dense nature of camera signals enables the use of nearly imperceptible small-sized triggers to mislead 2D object detection, realizing 2D-oriented backdoor attacks against 3D object detection is non-trivial. The primary challenge emerges from the fusion process that transforms camera signals into a 3D space, compromising the association with the 2D trigger to the target output. To tackle this issue, we propose an innovative 2D-oriented backdoor attack against LiDAR-camera fusion methods for 3D object detection, named BadFusion, for preserving trigger effectiveness throughout the entire fusion process. The evaluation demonstrates the effectiveness of BadFusion, achieving a significantly higher attack success rate compared to existing 2D-oriented attacks. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted at IJCAI 2024 Conference

arXiv:2405.03614 [pdf, ps, other]

Repairing with Zero Skip Cost

Authors: Wenqin Zhang, Yeow Meng Chee, Son Hoang Dau, Tuvi Etzion, Han Mao Kiah, Yuan Luo

Abstract: To measure repair latency at helper nodes, we introduce a new metric called skip cost that quantifies the number of contiguous sections accessed on a disk. We provide explicit constructions of zigzag codes and fractional repetition codes that incur zero skip cost To measure repair latency at helper nodes, we introduce a new metric called skip cost that quantifies the number of contiguous sections accessed on a disk. We provide explicit constructions of zigzag codes and fractional repetition codes that incur zero skip cost △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03296 [pdf, ps, other]

Coefficient Decomposition for Spectral Graph Convolution

Authors: Feng Huang, Wen Zhang

Abstract: Spectral graph convolutional network (SGCN) is a kind of graph neural networks (GNN) based on graph signal filters, and has shown compelling expressivity for modeling graph-structured data. Most SGCNs adopt polynomial filters and learn the coefficients from the training data. Many of them focus on which polynomial basis leads to optimal expressive power and models' architecture is little discussed… ▽ More Spectral graph convolutional network (SGCN) is a kind of graph neural networks (GNN) based on graph signal filters, and has shown compelling expressivity for modeling graph-structured data. Most SGCNs adopt polynomial filters and learn the coefficients from the training data. Many of them focus on which polynomial basis leads to optimal expressive power and models' architecture is little discussed. In this paper, we propose a general form in terms of spectral graph convolution, where the coefficients of polynomial basis are stored in a third-order tensor. Then, we show that the convolution block in existing SGCNs can be derived by performing a certain coefficient decomposition operation on the coefficient tensor. Based on the generalized view, we develop novel spectral graph convolutions CoDeSGC-CP and -Tucker by tensor decomposition CP and Tucker on the coefficient tensor. Extensive experimental results demonstrate that the proposed convolutions achieve favorable performance improvements. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03125 [pdf, other]

MambaJSCC: Deep Joint Source-Channel Coding with Visual State Space Model

Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Wenjun Zhang, Ping Zhang

Abstract: Lightweight and efficient deep joint source-channel coding (JSCC) is a key technology for semantic communications. In this paper, we design a novel JSCC scheme named MambaJSCC, which utilizes a visual state space model with channel adaptation (VSSM-CA) block as its backbone for transmitting images over wireless channels. The VSSM-CA block utilizes VSSM to integrate two-dimensional images with the… ▽ More Lightweight and efficient deep joint source-channel coding (JSCC) is a key technology for semantic communications. In this paper, we design a novel JSCC scheme named MambaJSCC, which utilizes a visual state space model with channel adaptation (VSSM-CA) block as its backbone for transmitting images over wireless channels. The VSSM-CA block utilizes VSSM to integrate two-dimensional images with the state space, enabling feature extraction and encoding processes to operate with linear complexity. It also incorporates channel state information (CSI) via a newly proposed CSI embedding method. This method deploys a shared CSI encoding module within both the encoder and decoder to encode and inject the CSI into each VSSM-CA block, improving the adaptability of a single model to varying channel conditions. Experimental results show that MambaJSCC not only outperforms Swin Transformer based JSCC (SwinJSCC) but also significantly reduces parameter size, computational overhead, and inference delay (ID). For example, with employing an equal number of the VSSM-CA blocks and the Swin Transformer blocks, MambaJSCC achieves a 0.48 dB gain in peak-signal-to-noise ratio (PSNR) over SwinJSCC while requiring only 53.3% multiply-accumulate operations, 53.8% of the parameters, and 44.9% of ID. △ Less

Submitted 5 May, 2024; originally announced May 2024.

Comments: submitted to IEEE conference

arXiv:2405.03008 [pdf, other]

DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

Authors: Xiaoyan Lei, Wenlong ZHang, Weifeng Cao

Abstract: Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational comple… ▽ More Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight Image SR network that incorporates Vision Mamba and a distillation strategy. The network of DVMSR consists of three modules: feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several residual state space blocks (RSSB), each of which has several Vision Mamba Moudles(ViMM) together with a residual connection. To achieve efficiency improvement while maintaining comparable performance, we employ a distillation strategy to the vision Mamba network for superior performance. Specifically, we leverage the rich representation knowledge of teacher network as additional supervision for the output of lightweight student networks. Extensive experiments have demonstrated that our proposed DVMSR can outperform state-of-the-art efficient SR methods in terms of model parameters while maintaining the performance of both PSNR and SSIM. The source code is available at https://github.com/nathan66666/DVMSR.git △ Less

Submitted 5 May, 2024; originally announced May 2024.

Comments: 8 pages, 8 figures

arXiv:2405.02954 [pdf, other]

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

Authors: Wenyu Zhang, Li Shen, Chuan-Sheng Foo

Abstract: Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a large data (e.g. ImageNet) pre-trained feature extractor is used to initialize the sourc… ▽ More Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a large data (e.g. ImageNet) pre-trained feature extractor is used to initialize the source model at the start of source training, and subsequently discarded. Despite having diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target domain knowledge. Rather than discarding this valuable knowledge, we introduce an integrated framework to incorporate pre-trained networks into the target adaptation process. The proposed framework is flexible and allows us to plug modern pre-trained networks into the adaptation process to leverage their stronger representation learning capabilities. For adaptation, we propose the Co-learn algorithm to improve target pseudolabel quality collaboratively through the source model and a pre-trained feature extractor. Building on the recent success of the vision-language model CLIP in zero-shot image recognition, we present an extension Co-learn++ to further incorporate CLIP's zero-shot classification decisions. We evaluate on 3 benchmark datasets and include more challenging scenarios such as open-set, partial-set and open-partial SFDA. Experimental results demonstrate that our proposed strategy improves adaptation performance and can be successfully integrated with existing SFDA methods. △ Less

Submitted 5 May, 2024; originally announced May 2024.

Comments: Extension of ICCV paper arXiv:2212.07585, submitted to IJCV

arXiv:2405.02358 [pdf, other]

A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model

Authors: Jiexia Ye, Weiqi Zhang, Ke Yi, Yongzi Yu, Ziyue Li, Jia Li, Fugee Tsung

Abstract: Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explai… ▽ More Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explainability. This success has sparked interest in the exploration of foundation models to solve multiple time series challenges simultaneously. There are two main research lines, namely pre-training foundation models from scratch for time series and adapting large language foundation models for time series. They both contribute to the development of a unified model that is highly generalizable, versatile, and comprehensible for time series analysis. This survey offers a 3E analytical framework for comprehensive examination of related research. Specifically, we examine existing works from three dimensions, namely Effectiveness, Efficiency and Explainability. In each dimension, we focus on discussing how related works devise tailored solution by considering unique challenges in the realm of time series. Furthermore, we provide a domain taxonomy to help followers keep up with the domain-specific advancements. In addition, we introduce extensive resources to facilitate the field's development, including datasets, open-source, time series libraries. A GitHub repository is also maintained for resource updates (https://github.com/start2020/Awesome-TimeSeries-LLM-FM). △ Less

Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 5 figures, 6 tables, 41 pages

arXiv:2405.02355 [pdf, other]

CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation

Authors: Kounianhua Du, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

Abstract: Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary existing among natural language and different programming languages. In addition, programming languages are… ▽ More Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary existing among natural language and different programming languages. In addition, programming languages are inherently logical and complex, making them hard to be correctly generated. Existing methods rely on multiple prompts to the large language model to explore better solutions, which is expensive. In this paper, we propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks. CodeGRAG extracts and summarizes the control flow and data flow of code blocks to fill the gap between programming languages and natural language. The extracted external structural knowledge models the inherent flows of code blocks, which can facilitate LLMs for better understanding of code syntax and serve as a bridge among different programming languages. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation, e.g., C++ for Python. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.02301 [pdf, other]

TFCounter:Polishing Gems for Training-Free Object Counting

Authors: Pan Ting, Jianfeng Lin, Wenhao Yu, Wenlong Zhang, Xiaoying Chen, Jinlu Zhang, Binqiang Huang

Abstract: Object counting is a challenging task with broad application prospects in security surveillance, traffic management, and disease diagnosis. Existing object counting methods face a tri-fold challenge: achieving superior performance, maintaining high generalizability, and minimizing annotation costs. We develop a novel training-free class-agnostic object counter, TFCounter, which is prompt-context-a… ▽ More Object counting is a challenging task with broad application prospects in security surveillance, traffic management, and disease diagnosis. Existing object counting methods face a tri-fold challenge: achieving superior performance, maintaining high generalizability, and minimizing annotation costs. We develop a novel training-free class-agnostic object counter, TFCounter, which is prompt-context-aware via the cascade of the essential elements in large-scale foundation models. This approach employs an iterative counting framework with a dual prompt system to recognize a broader spectrum of objects varying in shape, appearance, and size. Besides, it introduces an innovative context-aware similarity module incorporating background context to enhance accuracy within messy scenes. To demonstrate cross-domain generalizability, we collect a novel counting dataset named BIKE-1000, including exclusive 1000 images of shared bicycles from Meituan. Extensive experiments on FSC-147, CARPK, and BIKE-1000 datasets demonstrate that TFCounter outperforms existing leading training-free methods and exhibits competitive results compared to trained counterparts. △ Less

Submitted 12 March, 2024; originally announced May 2024.

Comments: 14pages,11 figuers

MSC Class: 68

arXiv:2405.01394 [pdf, other]

Analysis of a Modular Autonomous Driving Architecture: The Top Submission to CARLA Leaderboard 2.0 Challenge

Authors: Weize Zhang, Mohammed Elmahgiubi, Kasra Rezaee, Behzad Khamidehi, Hamidreza Mirkhani, Fazel Arasteh, Chunlin Li, Muhammad Ahsan Kaleem, Eduardo R. Corral-Soto, Dhruv Sharma, Tongtong Cao

Abstract: In this paper we present the architecture of the Kyber-E2E submission to the map track of CARLA Leaderboard 2.0 Autonomous Driving (AD) challenge 2023, which achieved first place. We employed a modular architecture for our solution consists of five main components: sensing, localization, perception, tracking/prediction, and planning/control. Our solution leverages state-of-the-art language-assiste… ▽ More In this paper we present the architecture of the Kyber-E2E submission to the map track of CARLA Leaderboard 2.0 Autonomous Driving (AD) challenge 2023, which achieved first place. We employed a modular architecture for our solution consists of five main components: sensing, localization, perception, tracking/prediction, and planning/control. Our solution leverages state-of-the-art language-assisted perception models to help our planner perform more reliably in highly challenging traffic scenarios. We use open-source driving datasets in conjunction with Inverse Reinforcement Learning (IRL) to enhance the performance of our motion planner. We provide insight into our design choices and trade-offs made to achieve this solution. We also explore the impact of each component in the overall performance of our solution, with the intent of providing a guideline where allocation of resources can have the greatest impact. △ Less

Submitted 21 March, 2024; originally announced May 2024.

arXiv:2405.00616 [pdf, other]

An Expectation-Maximization Relaxed Method for Privacy Funnel

Authors: Lingyi Chen, Jiachuan Ye, Shitong Wu, Huihui Wu, Hao Wu, Wenyi Zhang

Abstract: The privacy funnel (PF) gives a framework of privacy-preserving data release, where the goal is to release useful data while also limiting the exposure of associated sensitive information. This framework has garnered significant interest due to its broad applications in characterization of the privacy-utility tradeoff. Hence, there is a strong motivation to develop numerical methods with high prec… ▽ More The privacy funnel (PF) gives a framework of privacy-preserving data release, where the goal is to release useful data while also limiting the exposure of associated sensitive information. This framework has garnered significant interest due to its broad applications in characterization of the privacy-utility tradeoff. Hence, there is a strong motivation to develop numerical methods with high precision and theoretical convergence guarantees. In this paper, we propose a novel relaxation variant based on Jensen's inequality of the objective function for the computation of the PF problem. This model is proved to be equivalent to the original in terms of optimal solutions and optimal values. Based on our proposed model, we develop an accurate algorithm which only involves closed-form iterations. The convergence of our algorithm is theoretically guaranteed through descent estimation and Pinsker's inequality. Numerical results demonstrate the effectiveness of our proposed algorithm. △ Less