-
Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models
Authors:
Zhe Ma,
Xuhong Zhang,
Qingming Li,
Tianyu Du,
Wenzhi Chen,
Zonghui Wang,
Shouling Ji
Abstract:
The past few years have witnessed substantial advancement in text-guided image generation powered by diffusion models. However, it was shown that text-to-image diffusion models are vulnerable to training image memorization, raising concerns on copyright infringement and privacy invasion. In this work, we perform practical analysis of memorization in text-to-image diffusion models. Targeting a set…
▽ More
The past few years have witnessed substantial advancement in text-guided image generation powered by diffusion models. However, it was shown that text-to-image diffusion models are vulnerable to training image memorization, raising concerns on copyright infringement and privacy invasion. In this work, we perform practical analysis of memorization in text-to-image diffusion models. Targeting a set of images to protect, we conduct quantitive analysis on them without need to collect any prompts. Specifically, we first formally define the memorization of image and identify three necessary conditions of memorization, respectively similarity, existence and probability. We then reveal the correlation between the model's prediction error and image replication. Based on the correlation, we propose to utilize inversion techniques to verify the safety of target images against memorization and measure the extent to which they are memorized. Model developers can utilize our analysis method to discover memorized images or reliably claim safety against memorization. Extensive experiments on the Stable Diffusion, a popular open-source text-to-image diffusion model, demonstrate the effectiveness of our analysis method.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Autonomous Robotic Ultrasound System for Liver Follow-up Diagnosis: Pilot Phantom Study
Authors:
Tianpeng Zhang,
Sekeun Kim,
Jerome Charton,
Haitong Ma,
Kyungsang Kim,
Na Li,
Quanzheng Li
Abstract:
The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) ta…
▽ More
The paper introduces a novel autonomous robot ultrasound (US) system targeting liver follow-up scans for outpatients in local communities. Given a computed tomography (CT) image with specific target regions of interest, the proposed system carries out the autonomous follow-up scan in three steps: (i) initial robot contact to surface, (ii) coordinate mapping between CT image and robot, and (iii) target US scan. Utilizing 3D US-CT registration and deep learning-based segmentation networks, we can achieve precise imaging of 3D hepatic veins, facilitating accurate coordinate mapping between CT and the robot. This enables the automatic localization of follow-up targets within the CT image, allowing the robot to navigate precisely to the target's surface. Evaluation of the ultrasound phantom confirms the quality of the US-CT registration and shows the robot reliably locates the targets in repeated trials. The proposed framework holds the potential to significantly reduce time and costs for healthcare providers, clinicians, and follow-up patients, thereby addressing the increasing healthcare burden associated with chronic disease in local communities.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
A Survey on Personalized Content Synthesis with Diffusion Models
Authors:
Xulu Zhang,
Xiao-Yong Wei,
Wengyu Zhang,
Jinlin Wu,
Zhaoxiang Zhang,
Zhen Lei,
Qing Li
Abstract:
Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image…
▽ More
Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image generation, with few providing up-to-date summaries on PCS. This paper offers a comprehensive survey of PCS, with a particular focus on the diffusion models. Specifically, we introduce the generic frameworks of PCS research, which can be broadly classified into optimization-based and learning-based approaches. We further categorize and analyze these methodologies, discussing their strengths, limitations, and key techniques. Additionally, we delve into specialized tasks within the field, such as personalized object generation, face synthesis, and style personalization, highlighting their unique challenges and innovations. Despite encouraging progress, we also present an analysis of the challenges such as overfitting and the trade-off between subject fidelity and text alignment. Through this detailed overview and analysis, we propose future directions to advance the development of PCS.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Differentially Private Post-Processing for Fair Regression
Authors:
Ruicheng Xian,
Qiaobo Li,
Gautam Kamath,
Han Zhao
Abstract:
This paper describes a differentially private post-processing algorithm for learning fair regressors satisfying statistical parity, addressing privacy concerns of machine learning models trained on sensitive data, as well as fairness concerns of their potential to propagate historical biases. Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outp…
▽ More
This paper describes a differentially private post-processing algorithm for learning fair regressors satisfying statistical parity, addressing privacy concerns of machine learning models trained on sensitive data, as well as fairness concerns of their potential to propagate historical biases. Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outputs. It consists of three steps: first, the output distributions are estimated privately via histogram density estimation and the Laplace mechanism, then their Wasserstein barycenter is computed, and the optimal transports to the barycenter are used for post-processing to satisfy fairness. We analyze the sample complexity of our algorithm and provide fairness guarantee, revealing a trade-off between the statistical bias and variance induced from the choice of the number of bins in the histogram, in which using less bins always favors fairness at the expense of error.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
ReinWiFi: A Reinforcement-Learning-Based Framework for the Application-Layer QoS Optimization of WiFi Networks
Authors:
Qianren Li,
Bojie Lv,
Yuncong Hong,
Rui Wang
Abstract:
In this paper, a reinforcement-learning-based scheduling framework is proposed and implemented to optimize the application-layer quality-of-service (QoS) of a practical wireless local area network (WLAN) suffering from unknown interference. Particularly, application-layer tasks of file delivery and delay-sensitive communication, e.g., screen projection, in a WLAN with enhanced distributed channel…
▽ More
In this paper, a reinforcement-learning-based scheduling framework is proposed and implemented to optimize the application-layer quality-of-service (QoS) of a practical wireless local area network (WLAN) suffering from unknown interference. Particularly, application-layer tasks of file delivery and delay-sensitive communication, e.g., screen projection, in a WLAN with enhanced distributed channel access (EDCA) mechanism, are jointly scheduled by adjusting the contention window sizes and application-layer throughput limitation, such that their QoS, including the throughput of file delivery and the round trip time of the delay-sensitive communication, can be optimized. Due to the unknown interference and vendor-dependent implementation of the network interface card, the relation between the scheduling policy and the system QoS is unknown. Hence, a reinforcement learning method is proposed, in which a novel Q-network is trained to map from the historical scheduling parameters and QoS observations to the current scheduling action. It is demonstrated on a testbed that the proposed framework can achieve a significantly better QoS than the conventional EDCA mechanism.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
StyleSeg V2: Towards Robust One-shot Segmentation of Brain Tissue via Optimization-free Registration Error Perception
Authors:
Zhiwei Wang,
Xiaoyu Zeng,
Chongwei Wu,
Jinxin lv,
Xu Zhang,
Wei Fang,
Qiang Li
Abstract:
One-shot segmentation of brain tissue requires training registration-segmentation (reg-seg) dual-model iteratively, where reg-model aims to provide pseudo masks of unlabeled images for seg-model by warping a carefully-labeled atlas. However, the imperfect reg-model induces image-mask misalignment, poisoning the seg-model subsequently. Recent StyleSeg bypasses this bottleneck by replacing the unlab…
▽ More
One-shot segmentation of brain tissue requires training registration-segmentation (reg-seg) dual-model iteratively, where reg-model aims to provide pseudo masks of unlabeled images for seg-model by warping a carefully-labeled atlas. However, the imperfect reg-model induces image-mask misalignment, poisoning the seg-model subsequently. Recent StyleSeg bypasses this bottleneck by replacing the unlabeled images with their warped copies of atlas, but needs to borrow the diverse image patterns via style transformation. Here, we present StyleSeg V2, inherited from StyleSeg but granted the ability of perceiving the registration errors. The motivation is that good registration behaves in a mirrored fashion for mirrored images. Therefore, almost at no cost, StyleSeg V2 can have reg-model itself "speak out" incorrectly-aligned regions by simply mirroring (symmetrically flipping the brain) its input, and the registration errors are symmetric inconsistencies between the outputs of original and mirrored inputs. Consequently, StyleSeg V2 allows the seg-model to make use of correctly-aligned regions of unlabeled images and also enhances the fidelity of style-transformed warped atlas image by weighting the local transformation strength according to registration errors. The experimental results on three public datasets demonstrate that our proposed StyleSeg V2 outperforms other state-of-the-arts by considerable margins, and exceeds StyleSeg by increasing the average Dice by at least 2.4%.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation
Authors:
Kaize Shi,
Xueyao Sun,
Qing Li,
Guandong Xu
Abstract:
Large Language Models (LLMs) have made significant strides in information acquisition. However, their overreliance on potentially flawed parametric knowledge leads to hallucinations and inaccuracies, particularly when handling long-tail, domain-specific queries. Retrieval Augmented Generation (RAG) addresses this limitation by incorporating external, non-parametric knowledge. Nevertheless, the ret…
▽ More
Large Language Models (LLMs) have made significant strides in information acquisition. However, their overreliance on potentially flawed parametric knowledge leads to hallucinations and inaccuracies, particularly when handling long-tail, domain-specific queries. Retrieval Augmented Generation (RAG) addresses this limitation by incorporating external, non-parametric knowledge. Nevertheless, the retrieved long-context documents often contain noisy, irrelevant information alongside vital knowledge, negatively diluting LLMs' attention. Inspired by the supportive role of essential concepts in individuals' reading comprehension, we propose a novel concept-based RAG framework with the Abstract Meaning Representation (AMR)-based concept distillation algorithm. The proposed algorithm compresses the cluttered raw retrieved documents into a compact set of crucial concepts distilled from the informative nodes of AMR by referring to reliable linguistic features. The concepts explicitly constrain LLMs to focus solely on vital information in the inference process. We conduct extensive experiments on open-domain question-answering datasets to empirically evaluate the proposed method's effectiveness. The results indicate that the concept-based RAG framework outperforms other baseline methods, particularly as the number of supporting documents increases, while also exhibiting robustness across various backbone LLMs. This emphasizes the distilled concepts are informative for augmenting the RAG process by filtering out interference information. To the best of our knowledge, this is the first work introducing AMR to enhance the RAG, presenting a potential solution to augment inference performance with semantic-based context compression.
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
Accelerating Legacy Numerical Solvers by Non-intrusive Gradient-based Meta-solving
Authors:
Sohei Arisaka,
Qianxiao Li
Abstract:
Scientific computing is an essential tool for scientific discovery and engineering design, and its computational cost is always a main concern in practice. To accelerate scientific computing, it is a promising approach to use machine learning (especially meta-learning) techniques for selecting hyperparameters of traditional numerical methods. There have been numerous proposals to this direction, b…
▽ More
Scientific computing is an essential tool for scientific discovery and engineering design, and its computational cost is always a main concern in practice. To accelerate scientific computing, it is a promising approach to use machine learning (especially meta-learning) techniques for selecting hyperparameters of traditional numerical methods. There have been numerous proposals to this direction, but many of them require automatic-differentiable numerical methods. However, in reality, many practical applications still depend on well-established but non-automatic-differentiable legacy codes, which prevents practitioners from applying the state-of-the-art research to their own problems. To resolve this problem, we propose a non-intrusive methodology with a novel gradient estimation technique to combine machine learning and legacy numerical codes without any modification. We theoretically and numerically show the advantage of the proposed method over other baselines and present applications of accelerating established non-automatic-differentiable numerical solvers implemented in PETSc, a widely used open-source numerical software library.
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis
Authors:
Haofei Song,
Xintian Mao,
Jing Yu,
Qingli Li,
Yan Wang
Abstract:
Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon that due to the mentioned nature of data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution fr…
▽ More
Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon that due to the mentioned nature of data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution from other views. Based on this observation, we propose an Inter-Intra-slice Interpolation Network (I$^3$Net), which fully explores information from high in-plane resolution and compensates for low through-plane resolution. The through-plane branch supplements the limited information contained in low through-plane resolution from high in-plane resolution and enables continual and diverse feature learning. In-plane branch transforms features to the frequency domain and enforces an equal learning opportunity for all frequency bands in a global context learning paradigm. We further propose a cross-view block to take advantage of the information from all three views online. Extensive experiments on two public datasets demonstrate the effectiveness of I$^3$Net, and noticeably outperforms state-of-the-art super-resolution, video frame interpolation and slice interpolation methods by a large margin. We achieve 43.90dB in PSNR, with at least 1.14dB improvement under the upscale factor of $\times$2 on MSD dataset with faster inference. Code is available at https://github.com/DeepMed-Lab-ECNU/Medical-Image-Reconstruction.
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
From Generalization Analysis to Optimization Designs for State Space Models
Authors:
Fusheng Liu,
Qianxiao Li
Abstract:
A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a \textit{data-dependent} generalization bound for SSMs, showing an i…
▽ More
A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a \textit{data-dependent} generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical results are conducted to validate our results.
△ Less
Submitted 4 May, 2024;
originally announced May 2024.
-
SlotGAT: Slot-based Message Passing for Heterogeneous Graph Neural Network
Authors:
Ziang Zhou,
Jieming Shi,
Renchi Yang,
Yuanhang Zou,
Qing Li
Abstract:
Heterogeneous graphs are ubiquitous to model complex data. There are urgent needs on powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node $v$ are forced to be transformed to the feature space of $v$ for aggregation, though…
▽ More
Heterogeneous graphs are ubiquitous to model complex data. There are urgent needs on powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node $v$ are forced to be transformed to the feature space of $v$ for aggregation, though the neighbors are in different types. That is, the semantics in different node types are entangled together into node $v$'s representation. To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT, to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. The superiority of SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.
△ Less
Submitted 3 May, 2024;
originally announced May 2024.
-
Ergodic Spectral Efficiency Analysis of Intelligent Omni-Surface Aided Systems Suffering From Imperfect CSI and Hardware Impairments
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Lajos Hanzo
Abstract:
In contrast to the conventional reconfigurable intelligent surfaces (RIS), intelligent omni-surfaces (IOS) are capable of full-space coverage of smart radio environments by simultaneously transmitting and reflecting the incident signals. In this paper, we investigate the ergodic spectral efficiency of IOS-aided systems for transmission over random channel links, while considering both realistic im…
▽ More
In contrast to the conventional reconfigurable intelligent surfaces (RIS), intelligent omni-surfaces (IOS) are capable of full-space coverage of smart radio environments by simultaneously transmitting and reflecting the incident signals. In this paper, we investigate the ergodic spectral efficiency of IOS-aided systems for transmission over random channel links, while considering both realistic imperfect channel state information (CSI) and transceiver hardware impairments (HWIs). Firstly, we formulate the linear minimum mean square error estimator of the equivalent channel spanning from the user equipments (UEs) to the access point (AP), where the transceiver HWIs are also considered. Then, we apply a two-timescale protocol for designing the beamformer of the IOS-aided system. Specifically, for the active AP beamformer, the minimum mean square error combining method is employed, which relies on the estimated equivalent channels, on the statistical information of the channel estimation error, on the inter-user interference as well as on the HWIs at the AP and UEs. By contrast, the passive IOS beamformer is designed based on the statistical CSI for maximizing the upper bound of the ergodic spectral efficiency. The theoretical analysis and simulation results show that the transceiver HWIs have a significant effect on the ergodic spectral efficiency, especially in the high transmit power region. Furthermore, we show that the HWIs at the AP can be effectively compensated by deploying more AP antennas.
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
Performance Analysis of Reconfigurable Holographic Surfaces in the Near-Field Scenario of Cell-Free Networks Under Hardware Impairments
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Yanshi Sun,
Lajos Hanzo
Abstract:
We propose a hybrid beamforming architecture for near-field reconfigurable holographic surfaces (RHS) harnessed in cell-free networks. Specifically, the holographic beamformer of each base station (BS) is designed for maximizing the channel gain based on the local channel state information (CSI). By contrast, the digital beamformer at the central processing unit is designed based on the minimum me…
▽ More
We propose a hybrid beamforming architecture for near-field reconfigurable holographic surfaces (RHS) harnessed in cell-free networks. Specifically, the holographic beamformer of each base station (BS) is designed for maximizing the channel gain based on the local channel state information (CSI). By contrast, the digital beamformer at the central processing unit is designed based on the minimum mean squared error criterion. Furthermore, the near-field spectral efficiency of the RHS in cell-free networks is derived theoretically by harnessing the popular stochastic geometry approach. We consider both the phase shift error (PSE) at the RHS elements and the hardware impairment (HWI) at the radio frequency (RF) chains of the transceivers. Furthermore, we theoretically derive the asymptotic capacity bound, when considering an infinite physical size for the RHS in the near-field channel model. The theoretical analysis and simulation results show that the PSE at the RHS elements and the HWI at the RF chains of transceivers limit the spectral efficiency in the high signal-to-noise ratio region. Moreover, we show that the PSE at the RHS elements and the HWI at the RF chains of BSs can be compensated by increasing the number of BSs. Finally, we also demonstrate that the ergodic spectral efficiency based on the near-field channel model is higher than that based on the far-field channel model assumption.
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
Energy-Efficient Reconfigurable Holographic Surfaces Operating in the Presence of Realistic Hardware Impairments
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Yanshi Sun,
Ibrahim Hemadeh,
Arman Shojaeifard,
Lajos Hanzo
Abstract:
Reconfigurable holographic surfaces (RHSs) constitute a promising technique of supporting energy-efficient communications. In this paper, we formulate the energy efficiency maximization problem of the switch-controlled RHS-aided beamforming architecture by alternately optimizing the holographic beamformer at the RHS, the digital beamformer, the total transmit power and the power sharing ratio of e…
▽ More
Reconfigurable holographic surfaces (RHSs) constitute a promising technique of supporting energy-efficient communications. In this paper, we formulate the energy efficiency maximization problem of the switch-controlled RHS-aided beamforming architecture by alternately optimizing the holographic beamformer at the RHS, the digital beamformer, the total transmit power and the power sharing ratio of each user. Specifically, to deal with this challenging non-convex optimization problem, we decouple it into three sub-problems. Firstly, the coefficients of RHS elements responsible for the holographic beamformer are optimized to maximize the sum of the eigen-channel gains of all users by our proposed low-complexity eigen-decomposition (ED) method. Then, the digital beamformer is designed by the singular value decomposition (SVD) method to support multi-user information transfer. Finally, the total transmit power and the power sharing ratio are alternately optimized, while considering the effect of transceiver hardware impairments (HWI). We theoretically derive the spectral efficiency and energy efficiency performance upper bound for the RHS-based beamforming architectures in the presence of HWIs. Our simulation results show that the switch-controlled RHS-aided beamforming architecture achieves higher energy efficiency than the conventional fully digital beamformer and the hybrid beamformer based on phase shift arrays (PSA). Moreover, considering the effect of HWI in the beamforming design can bring about further energy efficiency enhancements.
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
Achievable Rate Analysis of Intelligent Omni-Surface Assisted NOMA Holographic MIMO Systems
Authors:
Qingchao Li,
Mohammed El-Hajjar,
Yanshi Sun,
Ibrahim Hemadeh,
Yingming Tsai,
Arman Shojaeifard,
Lajos Hanzo
Abstract:
An intelligent omni-surface (IOS) assisted holographic multiple-input and multiple-output architecture is conceived for $360^\circ$ full-space coverage at a low energy consumption. The theoretical ergodic rate lower bound of our non-orthogonal multiple access (NOMA) scheme is derived based on the moment matching approximation method, while considering the signal distortion at transceivers imposed…
▽ More
An intelligent omni-surface (IOS) assisted holographic multiple-input and multiple-output architecture is conceived for $360^\circ$ full-space coverage at a low energy consumption. The theoretical ergodic rate lower bound of our non-orthogonal multiple access (NOMA) scheme is derived based on the moment matching approximation method, while considering the signal distortion at transceivers imposed by hardware impairments (HWIs). Furthermore, the asymptotically ergodic rate lower bound is derived both for an infinite number of IOS elements and for continuous aperture surfaces. Both the theoretical analysis and the simulation results show that the achievable rate of the NOMA scheme is higher than that of its orthogonal multiple access counterpart. Furthermore, owing to the HWIs at the transceivers, the achievable rate saturates at high signal-to-noise ratio region, instead of reaching its theoretical maximum.
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
JNI Global References Are Still Vulnerable: Attacks and Defenses
Authors:
Yi He,
Yuan Zhou,
Yacong Gu,
Purui Su,
Qi Li,
Yajin Zhou,
Yong Jiang
Abstract:
System services and resources in Android are accessed through IPC based mechanisms. Previous research has demonstrated that they are vulnerable to the denial-of-service attack (DoS attack). For instance, the JNI global reference (JGR), which is widely used by system services, can be exhausted to cause the system reboot (hence the name JGRE attack). Even though the Android team tries to fix the pro…
▽ More
System services and resources in Android are accessed through IPC based mechanisms. Previous research has demonstrated that they are vulnerable to the denial-of-service attack (DoS attack). For instance, the JNI global reference (JGR), which is widely used by system services, can be exhausted to cause the system reboot (hence the name JGRE attack). Even though the Android team tries to fix the problem by enforcing security checks, we find that it is still possible to construct a JGR exhaustion DoS attack in the latest Android system.
In this paper, we propose a new JGR exhaustion DoS attack, which is effective in different Android versions, including the latest one (i.e., Android 10). Specifically, we developed JGREAnalyzer, a tool that can systematically detect JGR vulnerable services APIs via a call graph analysis and a forwarding reachability analysis. We applied this tool to different Android versions and found multiple vulnerabilities. In particular, among 148 system services in Android 10, 12 of them have 21 vulnerabilities. Among them, 9 can be successfully exploited without any permissions. We further analyze the root cause of the vulnerabilities and propose a new defense to mitigate the JGRE attack by restricting resource consumption via global reference counting.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
STT: Stateful Tracking with Transformers for Autonomous Driving
Authors:
Longlong Jing,
Ruichi Yu,
Xu Chen,
Zhengli Zhao,
Shiwei Sheng,
Colin Graber,
Qi Chen,
Qinru Li,
Shangxuan Wu,
Han Deng,
Sangjin Lee,
Chris Sweeney,
Qiurui He,
Wei-Chih Hung,
Tong He,
Xingyi Zhou,
Farshid Moussavi,
Zijian Guo,
Yin Zhou,
Mingxing Tan,
Weilong Yang,
Congcong Li
Abstract:
Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying c…
▽ More
Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.
△ Less
Submitted 30 April, 2024;
originally announced May 2024.
-
Multi-modal Perception Dataset of In-water Objects for Autonomous Surface Vehicles
Authors:
Mingi Jeong,
Arihant Chadda,
Ziang Ren,
Luyang Zhao,
Haowen Liu,
Monika Roznere,
Aiwei Zhang,
Yitao Jiang,
Sabriel Achong,
Samuel Lensgraf,
Alberto Quattrini Li
Abstract:
This paper introduces the first publicly accessible multi-modal perception dataset for autonomous maritime navigation, focusing on in-water obstacles within the aquatic environment to enhance situational awareness for Autonomous Surface Vehicles (ASVs). This dataset, consisting of diverse objects encountered under varying environmental conditions, aims to bridge the research gap in marine robotics…
▽ More
This paper introduces the first publicly accessible multi-modal perception dataset for autonomous maritime navigation, focusing on in-water obstacles within the aquatic environment to enhance situational awareness for Autonomous Surface Vehicles (ASVs). This dataset, consisting of diverse objects encountered under varying environmental conditions, aims to bridge the research gap in marine robotics by providing a multi-modal, annotated, and ego-centric perception dataset, for object detection and classification. We also show the applicability of the proposed dataset's framework using deep learning-based open-source perception algorithms that have shown success. We expect that our dataset will contribute to development of the marine autonomy pipeline and marine (field) robotics. Please note this is a work-in-progress paper about our on-going research that we plan to release in full via future publication.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
A Survey of Deep Learning Library Testing Methods
Authors:
Xiaoyu Zhang,
Weipeng Jiang,
Chao Shen,
Qi Li,
Qian Wang,
Chenhao Lin,
Xiaohong Guan
Abstract:
In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Study…
▽ More
In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Studying the characteristics of DL libraries, their associated bugs, and the corresponding testing methods is crucial for enhancing the security of DL systems and advancing the widespread application of DL technology. This paper provides an overview of the testing research related to various DL libraries, discusses the strengths and weaknesses of existing methods, and provides guidance and reference for the application of the DL library. This paper first introduces the workflow of DL underlying libraries and the characteristics of three kinds of DL libraries involved, namely DL framework, DL compiler, and DL hardware library. It then provides definitions for DL underlying library bugs and testing. Additionally, this paper summarizes the existing testing methods and tools tailored to these DL libraries separately and analyzes their effectiveness and limitations. It also discusses the existing challenges of DL library testing and outlines potential directions for future research.
△ Less
Submitted 27 April, 2024;
originally announced April 2024.
-
Learning Visuotactile Skills with Two Multifingered Hands
Authors:
Toru Lin,
Yu Zhang,
Qiyang Li,
Haozhi Qi,
Brent Yi,
Sergey Levine,
Jitendra Malik
Abstract:
Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hard…
▽ More
Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. To tackle the first challenge, we develop HATO, a low-cost hands-arms teleoperation system that leverages off-the-shelf electronics, complemented with a software suite that enables efficient data collection; the comprehensive software suite also supports multimodal data processing, scalable policy learning, and smooth policy deployment. To tackle the latter challenge, we introduce a novel hardware adaptation by repurposing two prosthetic hands equipped with touch sensors for research. Using visuotactile data collected from our system, we learn skills to complete long-horizon, high-precision tasks which are difficult to achieve without multifingered dexterity and touch feedback. Furthermore, we empirically investigate the effects of dataset size, sensing modality, and visual input preprocessing on policy learning. Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data. Videos, code, and datasets can be found at https://toruowo.github.io/hato/ .
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images
Authors:
Zhiwei Wang,
Ying Zhou,
Shiquan He,
Ting Li,
Fan Huang,
Qiang Ding,
Xinxia Feng,
Mei Liu,
Qiang Li
Abstract:
Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth&pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts onl…
▽ More
Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth&pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts only mitigate this relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC to address the brightness inconsistency radically by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop consisting of two opposite forward-backward warping paths: from target to source and then back to target. Thus, the target image finally receives an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the source image's phase-frequency into the intermediate warped image to avoid structure lost, and also stabilizes the training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. The comprehensive and extensive experimental results on four endoscopic datasets demonstrate that our proposed MonoPCC shows a great robustness to the brightness inconsistency, and exceeds other state-of-the-arts by reducing the absolute relative error by at least 7.27%, 9.38%, 9.90% and 3.17%, respectively.
△ Less
Submitted 7 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation
Authors:
Zhaoyang Chu,
Yao Wan,
Qian Li,
Yang Wu,
Hongyu Zhang,
Yulei Sui,
Guandong Xu,
Hai Jin
Abstract:
Vulnerability detection is crucial for ensuring the security and reliability of software systems. Recently, Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection, owing to their ability to capture the underlying semantic structure of source code. However, GNNs face significant challenges in explainability due to their inherently black-box natu…
▽ More
Vulnerability detection is crucial for ensuring the security and reliability of software systems. Recently, Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection, owing to their ability to capture the underlying semantic structure of source code. However, GNNs face significant challenges in explainability due to their inherently black-box nature. To this end, several factual reasoning-based explainers have been proposed. These explainers provide explanations for the predictions made by GNNs by analyzing the key features that contribute to the outcomes. We argue that these factual reasoning-based explanations cannot answer critical what-if questions: What would happen to the GNN's decision if we were to alter the code graph into alternative structures? Inspired by advancements of counterfactual reasoning in artificial intelligence, we propose CFExplainer, a novel counterfactual explainer for GNN-based vulnerability detection. Unlike factual reasoning-based explainers, CFExplainer seeks the minimal perturbation to the input code graph that leads to a change in the prediction, thereby addressing the what-if questions for vulnerability detection. We term this perturbation a counterfactual explanation, which can pinpoint the root causes of the detected vulnerability and furnish valuable insights for developers to undertake appropriate actions for fixing the vulnerability. Extensive experiments on four GNN-based vulnerability detection models demonstrate the effectiveness of CFExplainer over existing state-of-the-art factual reasoning-based explainers.
△ Less
Submitted 24 April, 2024;
originally announced April 2024.
-
Graph Machine Learning in the Era of Large Language Models (LLMs)
Authors:
Wenqi Fan,
Shijie Wang,
Jiani Huang,
Zhikai Chen,
Yu Song,
Wenzhuo Tang,
Haitao Mao,
Hui Liu,
Xiaorui Liu,
Dawei Yin,
Qing Li
Abstract:
Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented ca…
▽ More
Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Learning H-Infinity Locomotion Control
Authors:
Junfeng Long,
Wenye Yu,
Quanyi Li,
Zirui Wang,
Dahua Lin,
Jiangmiao Pang
Abstract:
Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based policies only use basic domain randomization to improve the robustness of learned policies, which cannot guarantee that the robot has adequate disturbance resistance capabilities. In this paper, we propose to mod…
▽ More
Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based policies only use basic domain randomization to improve the robustness of learned policies, which cannot guarantee that the robot has adequate disturbance resistance capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the actor and a newly introduced disturber and ensure their optimization with $H_{\infty}$ constraint. In contrast to the actor that maximizes the discounted overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward and its oracle, i.e., "cost" in each iteration. To keep joint optimization between the actor and the disturber stable, our $H_{\infty}$ constraint mandates the bound of ratio between the cost to the intensity of the external forces. Through reciprocal interaction throughout the training phase, the actor can acquire the capability to navigate increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with Unitree Aliengo robot, and also a more challenging task with Unitree A1 robot, where the quadruped is expected to perform locomotion merely on its hind legs as if it is a bipedal robot. The simulated quantitative results show improvement against baselines, demonstrating the effectiveness of the method and each design choice. On the other hand, real-robot experiments qualitatively exhibit how robust the policy is when interfering with various disturbances on various terrains, including stairs, high platforms, slopes, and slippery terrains. All code, checkpoints, and real-world deployment guidance will be made public.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding
Authors:
Guibiao Liao,
Jiankun Li,
Zhenyu Bao,
Xiaoqing Ye,
Jingdong Wang,
Qing Li,
Kanglin Liu
Abstract:
The recent 3D Gaussian Splatting (GS) exhibits high-quality and real-time synthesis of novel views in 3D scenes. Currently, it primarily focuses on geometry and appearance modeling, while lacking the semantic understanding of scenes. To bridge this gap, we present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting to efficiently comprehe…
▽ More
The recent 3D Gaussian Splatting (GS) exhibits high-quality and real-time synthesis of novel views in 3D scenes. Currently, it primarily focuses on geometry and appearance modeling, while lacking the semantic understanding of scenes. To bridge this gap, we present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting to efficiently comprehend 3D environments without annotated semantic data. In specific, rather than straightforwardly learning and rendering high-dimensional semantic features of 3D Gaussians, which significantly diminishes the efficiency, we propose a Semantic Attribute Compactness (SAC) approach. SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians, enabling highly efficient rendering (>100 FPS). Additionally, to address the semantic ambiguity, caused by utilizing view-inconsistent 2D CLIP semantics to supervise Gaussians, we introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model. 3DCS imposes cross-view semantic consistency constraints by leveraging refined, self-predicted pseudo-labels derived from the trained 3D Gaussian model, thereby enhancing precise and view-consistent segmentation results. Extensive experiments demonstrate that our method remarkably outperforms existing state-of-the-art approaches, achieving improvements of 17.29% and 20.81% in mIoU metric on Replica and ScanNet datasets, respectively, while maintaining real-time rendering speed. Furthermore, our approach exhibits superior performance even with sparse input data, verifying the robustness of our method.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets
Authors:
Zeyu Li,
Ruitong Gan,
Chuanchen Luo,
Yuxi Wang,
Jiaheng Liu,
Ziwei Zhu Man Zhang,
Qing Li,
Xucheng Yin,
Zhaoxiang Zhang,
Junran Peng
Abstract:
Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting 2D generative prior to the 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture.…
▽ More
Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting 2D generative prior to the 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably involve spurious correlated components. The absence of precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of the object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework to infer underlying material from the 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of semantics prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
△ Less
Submitted 24 April, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Global OpenBuildingMap -- Unveiling the Mystery of Global Buildings
Authors:
Xiao Xiang Zhu,
Qingyu Li,
Yilei Shi,
Yuanyuan Wang,
Adam Stewart,
Jonathan Prexl
Abstract:
Understanding how buildings are distributed globally is crucial to revealing the human footprint on our home planet. This built environment affects local climate, land surface albedo, resource distribution, and many other key factors that influence well-being and human health. Despite this, quantitative and comprehensive data on the distribution and properties of buildings worldwide is lacking. To…
▽ More
Understanding how buildings are distributed globally is crucial to revealing the human footprint on our home planet. This built environment affects local climate, land surface albedo, resource distribution, and many other key factors that influence well-being and human health. Despite this, quantitative and comprehensive data on the distribution and properties of buildings worldwide is lacking. To this end, by using a big data analytics approach and nearly 800,000 satellite images, we generated the highest resolution and highest accuracy building map ever created: the Global OpenBuildingMap (Global OBM). A joint analysis of building maps and solar potentials indicates that rooftop solar energy can supply the global energy consumption need at a reasonable cost. Specifically, if solar panels were placed on the roofs of all buildings, they could supply 1.1-3.3 times -- depending on the efficiency of the solar device -- the global energy consumption in 2020, which is the year with the highest consumption on record. We also identified a clear geospatial correlation between building areas and key socioeconomic variables, which indicates our global building map can serve as an important input to modeling global socioeconomic needs and drivers.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
DeviceRadar: Online IoT Device Fingerprinting in ISPs using Programmable Switches
Authors:
Ruoyu Li,
Qing Li,
Tao Lin,
Qingsong Zou,
Dan Zhao,
Yucheng Huang,
Gareth Tyson,
Guorui Xie,
Yong Jiang
Abstract:
Device fingerprinting can be used by Internet Service Providers (ISPs) to identify vulnerable IoT devices for early prevention of threats. However, due to the wide deployment of middleboxes in ISP networks, some important data, e.g., 5-tuples and flow statistics, are often obscured, rendering many existing approaches invalid. It is further challenged by the high-speed traffic of hundreds of teraby…
▽ More
Device fingerprinting can be used by Internet Service Providers (ISPs) to identify vulnerable IoT devices for early prevention of threats. However, due to the wide deployment of middleboxes in ISP networks, some important data, e.g., 5-tuples and flow statistics, are often obscured, rendering many existing approaches invalid. It is further challenged by the high-speed traffic of hundreds of terabytes per day in ISP networks. This paper proposes DeviceRadar, an online IoT device fingerprinting framework that achieves accurate, real-time processing in ISPs using programmable switches. We innovatively exploit "key packets" as a basis of fingerprints only using packet sizes and directions, which appear periodically while exhibiting differences across different IoT devices. To utilize them, we propose a packet size embedding model to discover the spatial relationships between packets. Meanwhile, we design an algorithm to extract the "key packets" of each device, and propose an approach that jointly considers the spatial relationships and the key packets to produce a neighboring key packet distribution, which can serve as a feature vector for machine learning models for inference. Last, we design a model transformation method and a feature extraction process to deploy the model on a programmable data plane within its constrained arithmetic operations and memory to achieve line-speed processing. Our experiments show that DeviceRadar can achieve state-of-the-art accuracy across 77 IoT devices with 40 Gbps throughput, and requires only 1.3% of the processing time compared to GPU-accelerated approaches.
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
Advancing the Robustness of Large Language Models through Self-Denoised Smoothing
Authors:
Jiabao Ji,
Bairu Hou,
Zhen Zhang,
Guanhua Zhang,
Wenqi Fan,
Qing Li,
Yang Zhang,
Gaowen Liu,
Sijia Liu,
Shiyu Chang
Abstract:
Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised considerable concerns. However, the increasing size of these models and their limited access make improving their robustness a challenging task. Among various defense strategies, randomized smoothing has shown great potential for…
▽ More
Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised considerable concerns. However, the increasing size of these models and their limited access make improving their robustness a challenging task. Among various defense strategies, randomized smoothing has shown great potential for LLMs, as it does not require full access to the model's parameters or fine-tuning via adversarial training. However, randomized smoothing involves adding noise to the input before model prediction, and the final model's robustness largely depends on the model's performance on these noise corrupted data. Its effectiveness is often limited by the model's sub-optimal performance on noisy data. To address this issue, we propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions. We call this procedure self-denoised smoothing. Unlike previous denoised smoothing techniques in computer vision, which require training a separate model to enhance the robustness of LLMs, our method offers significantly better efficiency and flexibility. Our experimental results indicate that our method surpasses existing methods in both empirical and certified robustness in defending against adversarial attacks for both downstream tasks and human alignments (i.e., jailbreak attacks). Our code is publicly available at https://github.com/UCSB-NLP-Chang/SelfDenoise
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Improving the perception of visual fiducial markers in the field using Adaptive Active Exposure Control
Authors:
Ziang Ren,
Samuel Lensgraf,
Alberto Quattrini Li
Abstract:
Accurate localization is fundamental for autonomous underwater vehicles (AUVs) to carry out precise tasks, such as manipulation and construction. Vision-based solutions using fiducial marker are promising, but extremely challenging underwater because of harsh lighting condition underwater. This paper introduces a gradient-based active camera exposure control method to tackle sharp lighting variati…
▽ More
Accurate localization is fundamental for autonomous underwater vehicles (AUVs) to carry out precise tasks, such as manipulation and construction. Vision-based solutions using fiducial marker are promising, but extremely challenging underwater because of harsh lighting condition underwater. This paper introduces a gradient-based active camera exposure control method to tackle sharp lighting variations during image acquisition, which can establish better foundation for subsequent image enhancement procedures. Considering a typical scenario for underwater operations where visual tags are used, we proposed several experiments comparing our method with other state-of-the-art exposure control method including Active Exposure Control (AEC) and Gradient-based Exposure Control (GEC). Results show a significant improvement in the accuracy of robot localization. This method is an important component that can be used in visual-based state estimation pipeline to improve the overall localization accuracy.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Variational Multi-Modal Hypergraph Attention Network for Multi-Modal Relation Extraction
Authors:
Qian Li,
Cheng Ji,
Shu Guo,
Yong Zhao,
Qianren Mao,
Shangguang Wang,
Yuntao Wei,
Jianxin Li
Abstract:
Multi-modal relation extraction (MMRE) is a challenging task that aims to identify relations between entities in text leveraging image information. Existing methods are limited by their neglect of the multiple entity pairs in one sentence sharing very similar contextual information (ie, the same text and image), resulting in increased difficulty in the MMRE task. To address this limitation, we pro…
▽ More
Multi-modal relation extraction (MMRE) is a challenging task that aims to identify relations between entities in text leveraging image information. Existing methods are limited by their neglect of the multiple entity pairs in one sentence sharing very similar contextual information (ie, the same text and image), resulting in increased difficulty in the MMRE task. To address this limitation, we propose the Variational Multi-Modal Hypergraph Attention Network (VM-HAN) for multi-modal relation extraction. Specifically, we first construct a multi-modal hypergraph for each sentence with the corresponding image, to establish different high-order intra-/inter-modal correlations for different entity pairs in each sentence. We further design the Variational Hypergraph Attention Networks (V-HAN) to obtain representational diversity among different entity pairs using Gaussian distribution and learn a better hypergraph structure via variational attention. VM-HAN achieves state-of-the-art performance on the multi-modal relation extraction task, outperforming existing methods in terms of accuracy and efficiency.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems
Authors:
Xiaoyan Zhou,
Feiran Liang,
Zhaojie Xie,
Yang Lan,
Wenjia Niu,
Jiqiang Liu,
Haining Wang,
Qiang Li
Abstract:
Package managers such as NPM, Maven, and PyPI play a pivotal role in open-source software (OSS) ecosystems, streamlining the distribution and management of various freely available packages. The fine-grained details within software packages can unveil potential risks within existing OSS ecosystems, offering valuable insights for detecting malicious packages. In this study, we undertake a large-sca…
▽ More
Package managers such as NPM, Maven, and PyPI play a pivotal role in open-source software (OSS) ecosystems, streamlining the distribution and management of various freely available packages. The fine-grained details within software packages can unveil potential risks within existing OSS ecosystems, offering valuable insights for detecting malicious packages. In this study, we undertake a large-scale empirical analysis focusing on fine-grained information (FGI): the metadata, static, and dynamic functions. Specifically, we investigate the FGI usage across a diverse set of 50,000+ legitimate and 1,000+ malicious packages. Based on this diverse data collection, we conducted a comparative analysis between legitimate and malicious packages. Our findings reveal that (1) malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones; (2) malicious packages demonstrate a higher tendency to invoke HTTP/URL functions as opposed to other application services, such as FTP or SMTP; (3) FGI serves as a distinguishable indicator between legitimate and malicious packages; and (4) one dimension in FGI has sufficient distinguishable capability to detect malicious packages, and combining all dimensions in FGI cannot significantly improve overall performance.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment
Authors:
Qinfeng Li,
Zhiqiang Shen,
Zhenghan Qin,
Yangfan Xie,
Xuhong Zhang,
Tianyu Du,
Jianwei Yin
Abstract:
Proprietary large language models (LLMs) have been widely applied in various scenarios. Additionally, deploying LLMs on edge devices is trending for efficiency and privacy reasons. However, edge deployment of proprietary LLMs introduces new security challenges: edge-deployed models are exposed as white-box accessible to users, enabling adversaries to conduct effective model stealing (MS) attacks.…
▽ More
Proprietary large language models (LLMs) have been widely applied in various scenarios. Additionally, deploying LLMs on edge devices is trending for efficiency and privacy reasons. However, edge deployment of proprietary LLMs introduces new security challenges: edge-deployed models are exposed as white-box accessible to users, enabling adversaries to conduct effective model stealing (MS) attacks. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify four critical protection properties that existing methods fail to simultaneously satisfy: (1) maintaining protection after a model is physically copied; (2) authorizing model access at request level; (3) safeguarding runtime reverse engineering; (4) achieving high security with negligible runtime overhead. To address the above issues, we propose TransLinkGuard, a plug-and-play model protection approach against model stealing on edge devices. The core part of TransLinkGuard is a lightweight authorization module residing in a secure environment, e.g., TEE. The authorization module can freshly authorize each request based on its input. Extensive experiments show that TransLinkGuard achieves the same security protection as the black-box security guarantees with negligible overhead.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
Clipped SGD Algorithms for Privacy Preserving Performative Prediction: Bias Amplification and Remedies
Authors:
Qiang Li,
Michal Yemini,
Hoi-To Wai
Abstract:
Clipped stochastic gradient descent (SGD) algorithms are among the most popular algorithms for privacy preserving optimization that reduces the leakage of users' identity in model training. This paper studies the convergence properties of these algorithms in a performative prediction setting, where the data distribution may shift due to the deployed prediction model. For example, the latter is cau…
▽ More
Clipped stochastic gradient descent (SGD) algorithms are among the most popular algorithms for privacy preserving optimization that reduces the leakage of users' identity in model training. This paper studies the convergence properties of these algorithms in a performative prediction setting, where the data distribution may shift due to the deployed prediction model. For example, the latter is caused by strategical users during the training of loan policy for banks. Our contributions are two-fold. First, we show that the straightforward implementation of a projected clipped SGD (PCSGD) algorithm may converge to a biased solution compared to the performative stable solution. We quantify the lower and upper bound for the magnitude of the bias and demonstrate a bias amplification phenomenon where the bias grows with the sensitivity of the data distribution. Second, we suggest two remedies to the bias amplification effect. The first one utilizes an optimal step size design for PCSGD that takes the privacy guarantee into account. The second one uses the recently proposed DiceSGD algorithm [Zhang et al., 2024]. We show that the latter can successfully remove the bias and converge to the performative stable solution. Numerical experiments verify our analysis.
△ Less
Submitted 16 April, 2024;
originally announced April 2024.
-
Taming Latent Diffusion Model for Neural Radiance Field Inpainting
Authors:
Chieh Hubert Lin,
Changil Kim,
Jia-Bin Huang,
Qinbo Li,
Chih-Yao Ma,
Johannes Kopf,
Ming-Hsuan Yang,
Hung-Yu Tseng
Abstract:
Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite some recent work showing preliminary success in editing a reconstructed NeRF with diffusion prior, they remain struggling to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic contents from the diffusion model, which hinders the rad…
▽ More
Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite some recent work showing preliminary success in editing a reconstructed NeRF with diffusion prior, they remain struggling to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic contents from the diffusion model, which hinders the radiance field from converging to a crisp and deterministic geometry. Moreover, applying latent diffusion models on real data often yields a textural shift incoherent to the image condition due to auto-encoding errors. These two problems are further reinforced with the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. During the analyses, we also found the commonly used pixel and perceptual losses are harmful in the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page: https://hubert0527.github.io/MALD-NeRF
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
DIDLM:A Comprehensive Multi-Sensor Dataset with Infrared Cameras, Depth Cameras, LiDAR, and 4D Millimeter-Wave Radar in Challenging Scenarios for 3D Mapping
Authors:
WeiSheng Gong,
Chen He,
KaiJie Su,
QingYong Li
Abstract:
This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as…
▽ More
This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as rain, snow, and uneven road surfaces. The dataset also includes interactive robot data at different speeds indoors and outdoors, providing a realistic background environment. Slam comparisons between similar routes are conducted, analyzing the influence of different complex scenes on various sensors. Various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios. In summary, this dataset addresses the problem of data scarcity in special environments, fostering the development of perception and mapping algorithms for extreme conditions. Leveraging multi-sensor data including infrared, depth cameras, LiDAR, 4D millimeter-wave radar, and robot interactions, the dataset advances intelligent mapping and perception capabilities.Our dataset is available at https://github.com/GongWeiSheng/DIDLM.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
PrintListener: Uncovering the Vulnerability of Fingerprint Authentication via the Finger Friction Sound
Authors:
Man Zhou,
Shuao Su,
Qian Wang,
Qi Li,
Yuting Zhou,
Xiaojing Ma,
Zhengxiong Li
Abstract:
Fingerprint authentication has been extensively employed in contemporary identity verification systems owing to its rapidity and cost-effectiveness. Due to its widespread use, fingerprint leakage may cause sensitive information theft, enormous economic and personnel losses, and even a potential compromise of national security. As a fingerprint that can coincidentally match a specific proportion of…
▽ More
Fingerprint authentication has been extensively employed in contemporary identity verification systems owing to its rapidity and cost-effectiveness. Due to its widespread use, fingerprint leakage may cause sensitive information theft, enormous economic and personnel losses, and even a potential compromise of national security. As a fingerprint that can coincidentally match a specific proportion of the overall fingerprint population, MasterPrint rings the alarm bells for the security of fingerprint authentication. In this paper, we propose a new side-channel attack on the minutiae-based Automatic Fingerprint Identification System (AFIS), called PrintListener, which leverages users' fingertip swiping actions on the screen to extract fingerprint pattern features (the first-level features) and synthesizes a stronger targeted PatternMasterPrint with potential second-level features. The attack scenario of PrintListener is extensive and covert. It only needs to record users' fingertip friction sound and can be launched by leveraging a large number of social media platforms. Extensive experimental results in realworld scenarios show that Printlistener can significantly improve the attack potency of MasterPrint.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation
Authors:
Liu Yang,
Qiang Li,
Xiaoyang Ren,
Yi Fang,
Shafei Wang
Abstract:
Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in developing state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, whe…
▽ More
Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in developing state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, where the RFFI model is trained and deployed on different receivers. Due to altered receiver characteristics, direct deployment of RFFI model on a new receiver leads to significant performance degradation. To address this issue, we formulate the cross-receiver RFFI as a model adaptation problem, which adapts the trained model to unlabeled signals from a new receiver. We first develop a theoretical generalization error bound for the adaptation model. Motivated by the bound, we propose a novel method to solve the cross-receiver RFFI problem, which includes domain alignment and adaptive pseudo-labeling. The former aims at finding a feature space where both domains exhibit similar distributions, effectively reducing the domain discrepancy. Meanwhile, the latter employs a dynamic pseudo-labeling scheme to implicitly transfer the label information from the labeled receiver to the new receiver. Experimental results indicate that the proposed method can effectively mitigate the receiver impact and improve the cross-receiver RFFI performance.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Prompt-driven Universal Model for View-Agnostic Echocardiography Analysis
Authors:
Sekeun Kim,
Hui Ren,
Peng Guo,
Abder-Rahman Ali,
Patrick Zhang,
Kyungsang Kim,
Xiang Li,
Quanzheng Li
Abstract:
Echocardiography segmentation for cardiac analysis is time-consuming and resource-intensive due to the variability in image quality and the necessity to process scans from various standard views. While current automated segmentation methods in echocardiography show promising performance, they are trained on specific scan views to analyze corresponding data. However, this solution has a limitation…
▽ More
Echocardiography segmentation for cardiac analysis is time-consuming and resource-intensive due to the variability in image quality and the necessity to process scans from various standard views. While current automated segmentation methods in echocardiography show promising performance, they are trained on specific scan views to analyze corresponding data. However, this solution has a limitation as the number of required models increases with the number of standard views. To address this, in this paper, we present a prompt-driven universal method for view-agnostic echocardiography analysis. Considering the domain shift between standard views, we first introduce a method called prompt matching, aimed at learning prompts specific to different views by matching prompts and querying input embeddings using a pre-trained vision model. Then, we utilized a pre-trained medical language model to align textual information with pixel data for accurate segmentation. Extensive experiments on three standard views showed that our approach significantly outperforms the state-of-the-art universal methods and achieves comparable or even better performances over the segmentation model trained and tested on same views.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Automated discovery of symbolic laws governing skill acquisition from naturally occurring data
Authors:
Sannyuya Liu,
Qing Li,
Xiaoxuan Shen,
Jianwen Sun,
Zongkai Yang
Abstract:
Skill acquisition is a key area of research in cognitive psychology as it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to unearth the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to tackle the issues of unobservable cognitive states and al…
▽ More
Skill acquisition is a key area of research in cognitive psychology as it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to unearth the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to tackle the issues of unobservable cognitive states and algorithmic explosion in searching. Initially a deep learning model is employed to determine the learner's cognitive state and assess the feature importance. Subsequently, symbolic regression algorithms are utilized to parse the neural network model into algebraic equations. The experimental results of simulated data demonstrate that the proposed algorithm can accurately restore various preset laws within a certain range of noise, in continues feedback setting. Application of proposed method to Lumosity training data demonstrates superior performance compared to traditional and latest models in terms of fitness. The results indicate the discovery of two new forms of skill acquisition laws, while some previous findings have been reaffirmed.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Investigating the Impact of Quantization on Adversarial Robustness
Authors:
Qun Li,
Yuan Meng,
Chen Tang,
Jiacheng Jiang,
Zhi Wang
Abstract:
Quantization is a promising technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency, and thus becomes a fundamental step for deployment. In real-world scenarios, quantized models are often faced with adversarial attacks which cause the model to make incorrect inferences by introducing slight perturbations. However, recent studies have paid le…
▽ More
Quantization is a promising technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency, and thus becomes a fundamental step for deployment. In real-world scenarios, quantized models are often faced with adversarial attacks which cause the model to make incorrect inferences by introducing slight perturbations. However, recent studies have paid less attention to the impact of quantization on the model robustness. More surprisingly, existing studies on this topic even present inconsistent conclusions, which prompted our in-depth investigation. In this paper, we conduct a first-time analysis of the impact of the quantization pipeline components that can incorporate robust optimization under the settings of Post-Training Quantization and Quantization-Aware Training. Through our detailed analysis, we discovered that this inconsistency arises from the use of different pipelines in different studies, specifically regarding whether robust optimization is performed and at which quantization stage it occurs. Our research findings contribute insights into deploying more secure and robust quantized networks, assisting practitioners in reference for scenarios with high-security requirements and limited resources.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
A parameter-free clustering algorithm for missing datasets
Authors:
Qi Li,
Xianjun Zeng,
Shuliang Wang,
Wenhao Zhu,
Shijie Ruan,
Zhimeng Yuan
Abstract:
Missing datasets, in which some objects have missing values in certain dimensions, are prevalent in the Real-world. Existing clustering algorithms for missing datasets first impute the missing values and then perform clustering. However, both the imputation and clustering processes require input parameters. Too many input parameters inevitably increase the difficulty of obtaining accurate clusteri…
▽ More
Missing datasets, in which some objects have missing values in certain dimensions, are prevalent in the Real-world. Existing clustering algorithms for missing datasets first impute the missing values and then perform clustering. However, both the imputation and clustering processes require input parameters. Too many input parameters inevitably increase the difficulty of obtaining accurate clustering results. Although some studies have shown that decision graphs can replace the input parameters of clustering algorithms, current decision graphs require equivalent dimensions among objects and are therefore not suitable for missing datasets. To this end, we propose a Single-Dimensional Clustering algorithm, i.e., SDC. SDC, which removes the imputation process and adapts the decision graph to the missing datasets by splitting dimension and partition intersection fusion, can obtain valid clustering results on the missing datasets without input parameters. Experiments demonstrate that, across three evaluation metrics, SDC outperforms baseline algorithms by at least 13.7%(NMI), 23.8%(ARI), and 8.1%(Purity).
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging
Authors:
Tianshuo Cong,
Delong Ran,
Zesen Liu,
Xinlei He,
Jinyuan Liu,
Yichen Gong,
Qi Li,
Anyu Wang,
Xiaoyun Wang
Abstract:
Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data. Instead, it involves editing different upstream model parameters to absorb their downstream task capabilities. However, uncertified model merging can infringe upon the Intellectual Property (IP) rights of the origin…
▽ More
Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data. Instead, it involves editing different upstream model parameters to absorb their downstream task capabilities. However, uncertified model merging can infringe upon the Intellectual Property (IP) rights of the original upstream models. In this paper, we conduct the first study on the robustness of IP protection methods in model merging scenarios. We investigate two state-of-the-art IP protection techniques: Quantization Watermarking and Instructional Fingerprint, along with various advanced model merging technologies, such as Task Arithmetic, TIES-MERGING, and so on. Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive in the merged models, whereas model fingerprinting techniques can. Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques, thereby promoting the healthy development of the open-source LLM community.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
OSS Malicious Package Analysis in the Wild
Authors:
Xiaoyan Zhou,
Ying Zhang,
Wenjia Niu,
Jiqiang Liu,
Haining Wang,
Qiang Li
Abstract:
The open-source software (OSS) ecosystem suffers from various security threats and risks, and malicious packages play a central role in software supply chain (SSC) attacks. Although malware research has a history of over thirty years, less attention has been paid to OSS malware. Its existing research has three limitations: a lack of high-quality datasets, malware diversity, and attack campaign con…
▽ More
The open-source software (OSS) ecosystem suffers from various security threats and risks, and malicious packages play a central role in software supply chain (SSC) attacks. Although malware research has a history of over thirty years, less attention has been paid to OSS malware. Its existing research has three limitations: a lack of high-quality datasets, malware diversity, and attack campaign context. In this paper, we first build and curate the largest dataset of 23,425 malicious packages from scattered online sources. We then propose a knowledge graph to represent the OSS malware corpus and conduct malicious package analysis in the wild. Our main findings include (1) it is essential to collect malicious packages from various online sources because there is little data overlap between different sources; (2) despite the sheer volume of SSC attack campaigns, many malicious packages are similar, and unknown/sophisticated attack behaviors have yet to emerge or be detected; (3) OSS malicious package has its distinct life cycle, denoted as {changing->release->detection->removal}, and slightly changing the package (different name) is a widespread attack manner; (4) while malicious packages often lack context about how and who released them, security reports disclose the information about corresponding SSC attack campaigns.
△ Less
Submitted 21 April, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning
Authors:
Yu Lei,
Guoshuai Sheng,
Fangfang Li,
Quanxue Gao,
Cheng Deng,
Qin Li
Abstract:
Zero-shot learning(ZSL) aims to recognize new classes without prior exposure to their samples, relying on semantic knowledge from observed classes. However, current attention-based models may overlook the transferability of visual features and the distinctiveness of attribute localization when learning regional features in images. Additionally, they often overlook shared attributes among different…
▽ More
Zero-shot learning(ZSL) aims to recognize new classes without prior exposure to their samples, relying on semantic knowledge from observed classes. However, current attention-based models may overlook the transferability of visual features and the distinctiveness of attribute localization when learning regional features in images. Additionally, they often overlook shared attributes among different objects. Highly discriminative attribute features are crucial for identifying and distinguishing unseen classes. To address these issues, we propose an innovative approach called High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning (HDAFL). HDAFL optimizes visual features by learning attribute features to obtain discriminative visual embeddings. Specifically, HDAFL utilizes multiple convolutional kernels to automatically learn discriminative regions highly correlated with attributes in images, eliminating irrelevant interference in image features. Furthermore, we introduce a Transformer-based attribute discrimination encoder to enhance the discriminative capability among attributes. Simultaneously, the method employs contrastive loss to alleviate dataset biases and enhance the transferability of visual features, facilitating better semantic transfer between seen and unseen classes. Experimental results demonstrate the effectiveness of HDAFL across three widely used datasets.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
Optimizing Information Propagation for Blockchain-empowered Mobile AIGC: A Graph Attention Network Approach
Authors:
Jiana Liao,
Jinbo Wen,
Jiawen Kang,
Yang Zhang,
Jianbo Du,
Qihao Li,
Weiting Zhang,
Dong Yang
Abstract:
Artificial Intelligence-Generated Content (AIGC) is a rapidly evolving field that utilizes advanced AI algorithms to generate content. Through integration with mobile edge networks, mobile AIGC networks have gained significant attention, which can provide real-time customized and personalized AIGC services and products. Since blockchains can facilitate decentralized and transparent data management…
▽ More
Artificial Intelligence-Generated Content (AIGC) is a rapidly evolving field that utilizes advanced AI algorithms to generate content. Through integration with mobile edge networks, mobile AIGC networks have gained significant attention, which can provide real-time customized and personalized AIGC services and products. Since blockchains can facilitate decentralized and transparent data management, AIGC products can be securely managed by blockchain to avoid tampering and plagiarization. However, the evolution of blockchain-empowered mobile AIGC is still in its nascent phase, grappling with challenges such as improving information propagation efficiency to enable blockchain-empowered mobile AIGC. In this paper, we design a Graph Attention Network (GAT)-based information propagation optimization framework for blockchain-empowered mobile AIGC. We first innovatively apply age of information as a data-freshness metric to measure information propagation efficiency in public blockchains. Considering that GATs possess the excellent ability to process graph-structured data, we utilize the GAT to obtain the optimal information propagation trajectory. Numerical results demonstrate that the proposed scheme exhibits the most outstanding information propagation efficiency compared with traditional routing mechanisms.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
Msmsfnet: a multi-stream and multi-scale fusion net for edge detection
Authors:
Chenguang Liu,
Chisheng Wang,
Feifei Dong,
Xin Su,
Chuanhua Zhu,
Dejin Zhang,
Qingquan Li
Abstract:
Edge detection is a long standing problem in computer vision. Recent deep learning based algorithms achieve state of-the-art performance in publicly available datasets. Despite the efficiency of these algorithms, their performance, however, relies heavily on the pretrained weights of the backbone network on the ImageNet dataset. This limits heavily the design space of deep learning based edge dete…
▽ More
Edge detection is a long standing problem in computer vision. Recent deep learning based algorithms achieve state of-the-art performance in publicly available datasets. Despite the efficiency of these algorithms, their performance, however, relies heavily on the pretrained weights of the backbone network on the ImageNet dataset. This limits heavily the design space of deep learning based edge detectors. Whenever we want to devise a new model, we have to train this new model on the ImageNet dataset first, and then fine tune the model using the edge detection datasets. The comparison would be unfair otherwise. However, it is usually not feasible for many researchers to train a model on the ImageNet dataset due to the limited computation resources. In this work, we study the performance that can be achieved by state-of-the-art deep learning based edge detectors in publicly available datasets when they are trained from scratch, and devise a new network architecture, the multi-stream and multi scale fusion net (msmsfnet), for edge detection. We show in our experiments that by training all models from scratch to ensure the fairness of comparison, out model outperforms state-of-the art deep learning based edge detectors in three publicly available datasets.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
AlphaCrystal-II: Distance matrix based crystal structure prediction using deep learning
Authors:
Yuqi Song,
Rongzhi Dong,
Lai Wei,
Qin Li,
Jianjun Hu
Abstract:
Computational prediction of stable crystal structures has a profound impact on the large-scale discovery of novel functional materials. However, predicting the crystal structure solely from a material's composition or formula is a promising yet challenging task, as traditional ab initio crystal structure prediction (CSP) methods rely on time-consuming global searches and first-principles free ener…
▽ More
Computational prediction of stable crystal structures has a profound impact on the large-scale discovery of novel functional materials. However, predicting the crystal structure solely from a material's composition or formula is a promising yet challenging task, as traditional ab initio crystal structure prediction (CSP) methods rely on time-consuming global searches and first-principles free energy calculations. Inspired by the recent success of deep learning approaches in protein structure prediction, which utilize pairwise amino acid interactions to describe 3D structures, we present AlphaCrystal-II, a novel knowledge-based solution that exploits the abundant inter-atomic interaction patterns found in existing known crystal structures. AlphaCrystal-II predicts the atomic distance matrix of a target crystal material and employs this matrix to reconstruct its 3D crystal structure. By leveraging the wealth of inter-atomic relationships of known crystal structures, our approach demonstrates remarkable effectiveness and reliability in structure prediction through comprehensive experiments. This work highlights the potential of data-driven methods in accelerating the discovery and design of new materials with tailored properties.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Authors:
Derui Zhu,
Dingfan Chen,
Qing Li,
Zongxiong Chen,
Lei Ma,
Jens Grossklags,
Mario Fritz
Abstract:
Despite tremendous advancements in large language models (LLMs) over recent years, a notably urgent challenge for their practical deployment is the phenomenon of hallucination, where the model fabricates facts and produces non-factual statements. In response, we propose PoLLMgraph, a Polygraph for LLMs, as an effective model-based white-box detection and forecasting approach. PoLLMgraph distinctly…
▽ More
Despite tremendous advancements in large language models (LLMs) over recent years, a notably urgent challenge for their practical deployment is the phenomenon of hallucination, where the model fabricates facts and produces non-factual statements. In response, we propose PoLLMgraph, a Polygraph for LLMs, as an effective model-based white-box detection and forecasting approach. PoLLMgraph distinctly differs from the large body of existing research that concentrates on addressing such challenges through black-box evaluations. In particular, we demonstrate that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics during generation via tractable probabilistic models. Experimental results on various open-source LLMs confirm the efficacy of PoLLMgraph, outperforming state-of-the-art methods by a considerable margin, evidenced by over 20% improvement in AUC-ROC on common benchmarking datasets like TruthfulQA. Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
△ Less
Submitted 6 April, 2024;
originally announced April 2024.
-
Exploiting Sequence Number Leakage: TCP Hijacking in NAT-Enabled Wi-Fi Networks
Authors:
Yuxiang Yang,
Xuewei Feng,
Qi Li,
Kun Sun,
Ziqiang Wang,
Ke Xu
Abstract:
In this paper, we uncover a new side-channel vulnerability in the widely used NAT port preservation strategy and an insufficient reverse path validation strategy of Wi-Fi routers, which allows an off-path attacker to infer if there is one victim client in the same network communicating with another host on the Internet using TCP. After detecting the presence of TCP connections between the victim c…
▽ More
In this paper, we uncover a new side-channel vulnerability in the widely used NAT port preservation strategy and an insufficient reverse path validation strategy of Wi-Fi routers, which allows an off-path attacker to infer if there is one victim client in the same network communicating with another host on the Internet using TCP. After detecting the presence of TCP connections between the victim client and the server, the attacker can evict the original NAT mapping and reconstruct a new mapping at the router by sending fake TCP packets due to the routers' vulnerability of disabling TCP window tracking strategy, which has been faithfully implemented in most of the routers for years. In this way, the attacker can intercept TCP packets from the server and obtain the current sequence and acknowledgment numbers, which in turn allows the attacker to forcibly close the connection, poison the traffic in plain text, or reroute the server's incoming packets to the attacker. We test 67 widely used routers from 30 vendors and discover that 52 of them are affected by this attack. Also, we conduct an extensive measurement study on 93 real-world Wi-Fi networks. The experimental results show that 75 of these evaluated Wi-Fi networks (81%) are fully vulnerable to our attack. Our case study shows that it takes about 17.5, 19.4, and 54.5 seconds on average to terminate an SSH connection, download private files from FTP servers, and inject fake HTTP response packets with success rates of 87.4%, 82.6%, and 76.1%. We responsibly disclose the vulnerability and suggest mitigation strategies to all affected vendors and have received positive feedback, including acknowledgments, CVEs, rewards, and adoption of our suggestions.
△ Less
Submitted 6 April, 2024;
originally announced April 2024.