-
iFADIT: Invertible Face Anonymization via Disentangled Identity Transform
Authors:
Lin Yuan,
Kai Liang,
Xiong Li,
Tao Wu,
Nannan Wang,
Xinbo Gao
Abstract:
Face anonymization aims to conceal the visual identity of a face to safeguard the individual's privacy. Traditional methods like blurring and pixelation can largely remove identifying features, but these techniques significantly degrade image quality and are vulnerable to deep reconstruction attacks. Generative models have emerged as a promising solution for anonymizing faces while preserving a natural appearance. However, many still have limited visual quality and often overlook the possibility of recovering the original face from the anonymized version, which can be valuable in specific contexts such as image forensics. This paper proposes a novel framework named iFADIT, an acronym for Invertible Face Anonymization via Disentangled Identity Transform. The framework couples a disentanglement architecture with a secure flow-based model: the former decouples identity information from non-identifying attributes, while the latter transforms the decoupled identity into an anonymized version in an invertible manner controlled by a secret key. The anonymized face is then reconstructed with a pre-trained StyleGAN that ensures high image quality and realistic facial details. The original face can be recovered (i.e., de-anonymized) when the matching secret key is available, by inverting the anonymization process with the same set of model parameters. Furthermore, a dedicated secret-key mechanism and a dual-phase training strategy are devised to ensure the desired properties of face anonymization. Qualitative and quantitative experiments demonstrate the superiority of the proposed approach over competing methods in anonymity, reversibility, security, diversity, and interpretability.
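To make the key-controlled invertibility concrete, here is a minimal sketch of a single key-conditioned additive coupling layer, the basic building block of many flow-based models. It is not the iFADIT architecture: the function names and the hash-seeded `key_shift` (standing in for a learned, key-conditioned network) are hypothetical assumptions, and the 4-dimensional `identity_code` merely stands in for a disentangled identity vector.

```python
import hashlib
import random

def key_shift(fixed_half, key, n):
    # Hypothetical stand-in for a learned shift network: a PRNG seeded by the
    # secret key and the pass-through half. That half is left unchanged by the
    # layer, so the exact same shift can be recomputed during inversion.
    seed = hashlib.sha256((key + repr(fixed_half)).encode()).digest()
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def anonymize(code, key):
    # Additive coupling: y1 = x1, y2 = x2 + s(x1, key)
    h = len(code) // 2
    x1, x2 = code[:h], code[h:]
    s = key_shift(x1, key, len(x2))
    return x1 + [a + b for a, b in zip(x2, s)]

def deanonymize(anon, key):
    # Inverse with the same key and the same "parameters": x2 = y2 - s(y1, key)
    h = len(anon) // 2
    y1, y2 = anon[:h], anon[h:]
    s = key_shift(y1, key, len(y2))
    return y1 + [a - b for a, b in zip(y2, s)]

identity_code = [0.3, -1.2, 0.8, 0.5]
anon = anonymize(identity_code, "secret-key")
restored = deanonymize(anon, "secret-key")    # matches identity_code up to rounding
wrong = deanonymize(anon, "wrong-key")        # does not recover identity_code
```

A single layer leaves the first half unchanged, so practical flows stack several layers and alternate which half is transformed; the inversion argument is the same at every layer.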
Submitted 16 January, 2025; v1 submitted 8 January, 2025;
originally announced January 2025.
-
The spectral Einstein functional for the Witten Deformation
Authors:
Tong Wu,
Yong Wang
Abstract:
In this paper, given two vector fields and the Witten deformation, we compute the spectral Einstein functional for the Witten deformation on even-dimensional spin manifolds without boundary.
Submitted 5 January, 2025;
originally announced January 2025.
-
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Authors:
Mingjie Pan,
Jiyao Zhang,
Tianshu Wu,
Yinghao Zhao,
Wenlong Gao,
Hao Dong
Abstract:
The development of general robotic systems capable of manipulation in unstructured environments is a significant challenge. While Vision-Language Models (VLMs) excel in high-level commonsense reasoning, they lack the fine-grained 3D spatial understanding required for precise manipulation tasks. Fine-tuning VLMs on robotic datasets to create Vision-Language-Action models (VLAs) is a potential solution, but it is hindered by high data collection costs and generalization issues. To address these challenges, we propose a novel object-centric representation that bridges the gap between VLMs' high-level reasoning and the low-level precision required for manipulation. Our key insight is that an object's canonical space, defined by its functional affordances, provides a structured and semantically meaningful way to describe interaction primitives, such as points and directions. These primitives act as a bridge, translating VLMs' commonsense reasoning into actionable 3D spatial constraints. In this context, we introduce a dual closed-loop, open-vocabulary robotic manipulation system: one loop for high-level planning through primitive resampling, interaction rendering, and VLM checking, and another for low-level execution via 6D pose tracking. This design ensures robust, real-time control without requiring VLM fine-tuning. Extensive experiments demonstrate strong zero-shot generalization across diverse robotic manipulation tasks, highlighting the potential of this approach for automating large-scale simulation data generation.
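The object-centric idea can be illustrated by mapping an interaction primitive (a point and a direction) from an object's canonical frame into the world frame with a 6D pose. This is a minimal sketch assuming the pose is given as a rotation matrix `R` and a translation `t`; the helper names and the mug-handle numbers are hypothetical and not taken from OmniManip.

```python
import math

def rot_z(theta):
    # 3x3 rotation about the z-axis, row-major.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def primitive_to_world(point, direction, R, t):
    # Points are rotated and then translated; directions (differences of
    # two points) are only rotated.
    p_world = [p + ti for p, ti in zip(matvec(R, point), t)]
    d_world = matvec(R, direction)
    return p_world, d_world

# Hypothetical primitive in a mug's canonical frame: a handle point and a pull
# axis, with the mug rotated 90 degrees about z and placed at (0.5, 0.2, 0).
grasp_point, pull_dir = primitive_to_world(
    [0.1, 0.0, 0.0], [1.0, 0.0, 0.0], rot_z(math.pi / 2), [0.5, 0.2, 0.0])
```

Directions are rotated but not translated, which is why a tracked 6D pose is enough to keep a pull or insertion axis consistent as the object moves.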
Submitted 7 January, 2025;
originally announced January 2025.
-
Geometric curvature effect on suppressing the Ion-Temperature-Gradient mode near the magnetic axis
Authors:
Tiannan Wu,
Shaojie Wang
Abstract:
Global gyrokinetic simulation of the ion temperature gradient mode shows that a radial electric field ($E_r$) well upshifts the critical temperature gradient near the magnetic axis in the weak, but not in the strong, magnetic shear configuration. The geometric curvature effect significantly influences the $E \times B$ shear and the wave number near the axis, so that the $E_r$ well suppresses the high-n modes but has little effect on the low-n modes, which are instead suppressed by the weak magnetic shear effect. This new finding unravels the formation mechanism of the internal transport barrier in discharges with weak central magnetic shear.
Submitted 6 January, 2025;
originally announced January 2025.
-
Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine
Authors:
Yishen Liu,
Shengda Luo,
Zishao Zhong,
Tongtong Wu,
Jianguo Zhang,
Peiyao Ou,
Yong Liang,
Liang Liu,
Hudan Pan
Abstract:
Large language models (LLMs), primarily trained on English texts, often exhibit biases and inaccuracies in Chinese contexts. Their limitations are pronounced in fields like Traditional Chinese Medicine (TCM), where cultural and clinical subtleties are vital, and are further compounded by a lack of domain-specific data for conditions such as rheumatoid arthritis (RA). To address these issues, this paper introduces Hengqin-RA-v1, the first large language model specifically tailored for TCM with a focus on diagnosing and treating RA. We also present HQ-GCM-RA-C1, a comprehensive RA-specific dataset curated from ancient Chinese medical literature, classical texts, and modern clinical studies. This dataset enables Hengqin-RA-v1 to deliver accurate and culturally informed responses, effectively bridging the gaps left by general-purpose models. Extensive experiments demonstrate that Hengqin-RA-v1 outperforms state-of-the-art models, even surpassing the diagnostic accuracy of TCM practitioners in certain cases.
Submitted 27 March, 2025; v1 submitted 5 January, 2025;
originally announced January 2025.
-
Diffusion Model-Based Data Synthesis Aided Federated Semi-Supervised Learning
Authors:
Zhongwei Wang,
Tong Wu,
Zhiyong Chen,
Liang Qian,
Yin Xu,
Meixia Tao
Abstract:
Federated semi-supervised learning (FSSL) is primarily challenged by two factors: the scarcity of labeled data across clients and the non-independent and identically distributed (non-IID) nature of data among clients. In this paper, we propose a novel approach, diffusion model-based data synthesis aided FSSL (DDSA-FSSL), which uses a diffusion model (DM) to generate synthetic data, bridging the gap between heterogeneous local data distributions and the global data distribution. In DDSA-FSSL, clients address the scarcity of labeled data by employing a classifier trained via federated learning to pseudo-label their unlabeled data. The DM is then collaboratively trained on both labeled and precision-optimized pseudo-labeled data, enabling clients to generate synthetic samples for classes that are absent from their labeled datasets. This process allows clients to build more comprehensive synthetic datasets aligned with the global distribution. Extensive experiments on multiple datasets and varying non-IID distributions demonstrate the effectiveness of DDSA-FSSL; for example, it improves accuracy from 38.46% to 52.14% on CIFAR-10 with 10% labeled data.
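The pseudo-labeling step and the notion of classes absent from a client's labeled dataset can be sketched as below. The abstract does not specify how the pseudo labels are precision-optimized, so this sketch substitutes a common stand-in, a fixed confidence threshold; `pseudo_label`, `absent_classes`, and the threshold value 0.9 are hypothetical.

```python
def pseudo_label(probs, threshold=0.9):
    # Keep (sample index, predicted class) only when the global classifier's
    # top-class probability reaches the threshold.
    kept = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            kept.append((i, label))
    return kept

def absent_classes(labels, num_classes):
    # Classes with no labeled example on this client.
    present = set(labels)
    return [c for c in range(num_classes) if c not in present]

client_probs = [[0.95, 0.03, 0.02], [0.5, 0.4, 0.1], [0.05, 0.05, 0.9]]
confident = pseudo_label(client_probs)          # sample 1 is too uncertain to keep
missing = absent_classes([0, 0, 2], num_classes=4)
```

Raising the threshold trades the number of pseudo labels for their precision; the classes returned by `absent_classes` are the ones a client would ask the shared diffusion model to synthesize.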
Submitted 4 January, 2025;
originally announced January 2025.
-
A self-learning magnetic Hopfield neural network with intrinsic gradient descent adaption
Authors:
Chang Niu,
Huanyu Zhang,
Chuanlong Xu,
Wenjie Hu,
Yunzhuo Wu,
Yu Wu,
Yadi Wang,
Tong Wu,
Yi Zhu,
Yinyan Zhu,
Wenbin Wang,
Yizheng Wu,
Lifeng Yin,
Jiang Xiao,
Weichao Yu,
Hangwen Guo,
Jian Shen
Abstract:
Physical neural networks that use physical materials and devices to mimic synapses and neurons offer an energy-efficient way to implement artificial neural networks. Yet training physical neural networks is difficult and relies heavily on external computing resources. An emerging concept to address this issue is physical self-learning, which uses intrinsic physical parameters as trainable weights. Under external inputs (i.e., training data), training is achieved by the natural evolution of physical parameters, which intrinsically follow modern learning rules via autonomous physical processes, eliminating the need for external computation resources. Here, we demonstrate a real spintronic system that mimics Hopfield neural networks (HNNs), in which unsupervised learning is intrinsically performed through the evolution of physical processes. Using a conductance matrix defined by magnetic textures as trainable weights, we show that under external voltage inputs the conductance matrix naturally evolves according to Oja's learning algorithm in a gradient descent manner. The self-learning HNN is scalable and can achieve associative memory on highly similar patterns. The fast spin dynamics and reconfigurability of magnetic textures offer an advantageous platform for efficient autonomous training directly in materials.
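For reference, Oja's rule updates a weight vector $w$ on input $x$ as $w \leftarrow w + \eta\, y\,(x - y\,w)$ with $y = w^\top x$; the decay term keeps $\lVert w \rVert$ near 1, and $w$ converges toward the principal eigenvector of the input covariance. The sketch below simulates the rule in software on synthetic 2-D data. It illustrates the learning rule only, not the spintronic device, and the data, learning rate, and epoch count are assumptions.

```python
import math
import random

def oja_train(samples, w, lr=0.005, epochs=50):
    # Oja's rule: Hebbian growth y*x plus a decay y^2*w that keeps ||w|| near 1.
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # linear neuron output
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Synthetic zero-mean data: std 2.0 along (1, 1)/sqrt(2), std 0.2 across it.
rng = random.Random(0)
samples = []
for _ in range(200):
    a, b = rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.2)
    samples.append([(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)])

w = oja_train(samples, [1.0, 0.0])
# w is expected to end up close to the principal direction (0.707, 0.707).
```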
Submitted 6 January, 2025; v1 submitted 3 January, 2025;
originally announced January 2025.
-
Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent
Authors:
Tong Wu
Abstract:
Tensors, which faithfully and effectively represent the intrinsic structure of multi-dimensional data, play a crucial role in an increasing number of signal processing and machine learning problems. However, tensor data are often accompanied by arbitrary signal corruptions, including missing entries and sparse noise. A fundamental challenge is to reliably extract meaningful information from corrupted tensor data in a statistically and computationally efficient manner. This paper develops a scaled gradient descent (ScaledGD) algorithm that directly estimates the tensor factors, with tailored spectral initializations, under the tensor-tensor product (t-product) and tensor singular value decomposition (t-SVD) framework. In theory, ScaledGD achieves linear convergence at a constant rate that is independent of the condition number of the ground-truth low-rank tensor for two canonical problems (tensor robust principal component analysis and tensor completion) as long as the level of corruption is not too large and the sample size is sufficiently large, while maintaining the low per-iteration cost of gradient descent. To the best of our knowledge, ScaledGD is the first algorithm that provably has such properties for low-rank tensor estimation with the t-SVD decomposition. Finally, numerical examples demonstrate the efficacy of ScaledGD in accelerating the convergence of ill-conditioned low-rank tensor estimation in these two applications.
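As background for the t-SVD framework, the t-product of an $n_1 \times n_2 \times n_3$ tensor $\mathcal A$ and an $n_2 \times n_4 \times n_3$ tensor $\mathcal B$ is the $n_1 \times n_4 \times n_3$ tensor whose tubes are sums of circular convolutions, $\mathcal C(i,l,:) = \sum_j \mathcal A(i,j,:) \circledast \mathcal B(j,l,:)$. The sketch below evaluates this definition directly with nested lists indexed `[i][j][k]`; it is illustrative only and is not the ScaledGD algorithm. Practical implementations usually apply a DFT along the third mode instead, which turns the t-product into independent slice-wise matrix products.

```python
def t_product(A, B):
    # A: n1 x n2 x n3, B: n2 x n4 x n3, both indexed [i][j][k].
    n1, n2, n3 = len(A), len(A[0]), len(A[0][0])
    if len(B) != n2 or len(B[0][0]) != n3:
        raise ValueError("t-product needs A: n1 x n2 x n3 and B: n2 x n4 x n3")
    n4 = len(B[0])
    C = [[[0.0] * n3 for _ in range(n4)] for _ in range(n1)]
    for i in range(n1):
        for q in range(n4):
            for k in range(n3):
                # Tube (i, q) is the sum over j of the circular convolutions
                # of tubes A(i, j, :) and B(j, q, :).
                C[i][q][k] = sum(A[i][j][t] * B[j][q][(k - t) % n3]
                                 for j in range(n2) for t in range(n3))
    return C

def t_identity(n, n3):
    # Identity tensor: identity matrix in the first frontal slice, zeros elsewhere.
    return [[[1.0 if (i == j and k == 0) else 0.0 for k in range(n3)]
             for j in range(n)] for i in range(n)]
```

With $n_3 = 1$ the t-product reduces to an ordinary matrix product, and `t_identity` plays the role of the identity matrix.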
Submitted 3 January, 2025;
originally announced January 2025.
-
Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems
Authors:
Shunxing Yang,
Junteng Yao,
Jie Tang,
Tuo Wu,
Maged Elkashlan,
Chau Yuen,
Merouane Debbah,
Hyundong Shin,
Matthew Valenti
Abstract:
Fluid antenna systems (FAS) enable dynamic antenna positioning, offering new opportunities to enhance integrated sensing and communication (ISAC) performance. However, existing studies primarily focus on communication enhancement or single-target sensing, leaving multi-target scenarios underexplored. Additionally, the joint optimization of beamforming and antenna positions poses a highly non-convex problem, with traditional methods becoming impractical as the number of fluid antennas increases. To address these challenges, this letter proposes a block coordinate descent (BCD) framework integrated with a deep reinforcement learning (DRL)-based approach for intelligent antenna positioning. By leveraging the deep deterministic policy gradient (DDPG) algorithm, the proposed framework efficiently balances sensing and communication performance. Simulation results demonstrate the scalability and effectiveness of the proposed approach.
Submitted 2 January, 2025;
originally announced January 2025.
-
Online Video Understanding: OVBench and VideoChat-Online
Authors:
Zhenpeng Huang,
Xinhao Li,
Jiaqi Li,
Jing Wang,
Xiangyu Zeng,
Cheng Liang,
Tao Wu,
Xi Chen,
Liang Li,
Limin Wang
Abstract:
Multimodal Large Language Models (MLLMs) have made significant progress in offline video understanding. However, applying these models to real-world scenarios, such as autonomous driving and human-computer interaction, presents unique challenges due to the need for real-time processing of continuous online video streams. To this end, this paper presents systematic efforts from three perspectives: evaluation benchmark, model architecture, and training strategy. First, we introduce OVBench, a comprehensive question-answering benchmark designed to evaluate models' ability to perceive, memorize, and reason within online video contexts. It features 6 core task types across three temporal contexts (past, current, and future), forming 16 subtasks from diverse datasets. Second, we propose a new Pyramid Memory Bank (PMB) that effectively retains key spatiotemporal information in video streams. Third, we propose an offline-to-online learning paradigm, designing an interleaved dialogue format for online video data and constructing an instruction-tuning dataset tailored for online video training. This framework led to the development of VideoChat-Online, a robust and efficient model for online video understanding. While being more efficient and computationally cheaper, VideoChat-Online outperforms existing state-of-the-art offline and online models across popular offline video benchmarks and OVBench, demonstrating the effectiveness of our model architecture and training strategy. Our approach surpasses the state-of-the-art offline model Qwen2-VL 7B and online model Flash-VStream by 4.19% and 23.7% on OVBench, respectively.
Submitted 17 April, 2025; v1 submitted 31 December, 2024;
originally announced January 2025.
-
Are the Values of LLMs Structurally Aligned with Humans? A Causal Perspective
Authors:
Yipeng Kang,
Junqi Wang,
Yexin Li,
Mengmeng Wang,
Wenming Tu,
Quansen Wang,
Hengli Li,
Tingjun Wu,
Xue Feng,
Fangwei Zhong,
Zilong Zheng
Abstract:
As large language models (LLMs) become increasingly integrated into critical applications, aligning their behavior with human values presents significant challenges. Current methods, such as Reinforcement Learning from Human Feedback (RLHF), typically focus on a limited set of coarse-grained values and are resource-intensive. Moreover, the correlations between these values remain implicit, leading to unclear explanations for value-steering outcomes. Our work argues that a latent causal value graph underlies the value dimensions of LLMs and that, despite alignment training, this structure remains significantly different from human value systems. We leverage these causal value graphs to guide two lightweight value-steering methods: role-based prompting and sparse autoencoder (SAE) steering, effectively mitigating unexpected side effects. Furthermore, SAE provides a more fine-grained approach to value steering. Experiments on Gemma-2B-IT and Llama3-8B-IT demonstrate the effectiveness and controllability of our methods.
Submitted 23 February, 2025; v1 submitted 31 December, 2024;
originally announced January 2025.
-
MSM-BD: Multimodal Social Media Bot Detection Using Heterogeneous Information
Authors:
Tingxuan Wu,
Zhaorui Ma,
Yanjun Cui,
Ziyi Zhou,
Eric Wang
Abstract:
Although social bots can be engineered for constructive applications, their potential for misuse in manipulative schemes and malware distribution cannot be overlooked. This dichotomy underscores the critical need to detect social bots on social media platforms. Advances in artificial intelligence have improved the abilities of social bots, allowing them to generate content that is almost indistinguishable from human-created content. These advances call for more sophisticated detection techniques to accurately identify such automated entities. Given the heterogeneous information landscape on social media, spanning images, texts, and user statistical features, we propose MSM-BD, a Multimodal Social Media Bot Detection approach using heterogeneous information. MSM-BD incorporates specialized encoders for heterogeneous information and introduces a cross-modal fusion technique, Cross-Modal Residual Cross-Attention (CMRCA), to enhance detection accuracy. We validate the effectiveness of our model through extensive experiments on the TwiBot-22 dataset.
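The abstract does not detail CMRCA, so the following is only a generic sketch of residual cross-attention, assuming single-head scaled dot-product attention without learned projections: tokens of one modality (e.g., text) act as queries over keys and values from another (e.g., image patches), and the attended result is added back to the query. All names and numbers are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def residual_cross_attention(queries, keys, values):
    # queries: list of d-dim vectors from one modality; keys/values from another.
    # Values must also be d-dim so the residual addition is well defined.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended = [sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))]
        out.append([qc + ac for qc, ac in zip(q, attended)])  # residual connection
    return out

# Hypothetical example: one text token attending over two image-patch features.
text_tokens = [[1.0, 0.0]]
image_keys = [[1.0, 1.0], [1.0, 1.0]]
image_values = [[2.0, 0.0], [0.0, 2.0]]
fused = residual_cross_attention(text_tokens, image_keys, image_values)
```

The residual connection keeps the query modality's features intact even when attention spreads evenly, as here, where identical keys yield a plain average of the values.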
Submitted 30 December, 2024;
originally announced January 2025.
-
Relation-Aware Equivariant Graph Networks for Epitope-Unknown Antibody Design and Specificity Optimization
Authors:
Lirong Wu,
Haitao Lin,
Yufei Huang,
Zhangyang Gao,
Cheng Tan,
Yunfan Liu,
Tailin Wu,
Stan Z. Li
Abstract:
Antibodies are Y-shaped proteins that protect the host by binding to specific antigens, and their binding is mainly determined by the complementarity-determining regions (CDRs) of the antibody. Despite the great progress made in CDR design, existing computational methods still face several challenges: 1) poor capability of modeling complex CDRs with long sequences due to insufficient contextual information; 2) reliance on pre-given antigenic epitopes and their static interaction with the target antibody; and 3) neglect of specificity during antibody optimization, which leads to non-specific antibodies. In this paper, we take into account a variety of node features, edge features, and edge relations to include more contextual and geometric information. We propose a novel Relation-Aware Antibody Design (RAAD) framework, which dynamically models antigen-antibody interactions for co-designing the sequences and structures of antigen-specific CDRs. Furthermore, we propose a new evaluation metric to better measure antibody specificity and develop a contrastive specificity-enhancing constraint to optimize the specificity of antibodies. Extensive experiments demonstrate the superior capability of RAAD in antibody modeling, generation, and optimization across different CDR types, sequence lengths, pre-training strategies, and input contexts.
Submitted 13 December, 2024;
originally announced January 2025.
-
Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study
Authors:
Yulin Fei,
Yuhui Gao,
Xingyuan Xian,
Xiaojin Zhang,
Tao Wu,
Wei Chen
Abstract:
With the rise of multimodal large language models, accurately extracting and understanding textual information from video content, referred to as video-based optical character recognition (Video OCR), has become a crucial capability. This paper introduces a novel benchmark designed to evaluate the video OCR performance of multimodal models. Comprising 1,028 videos and 2,961 question-answer pairs, this benchmark poses several key challenges through 6 distinct subtasks, covering: (1) recognition of text content itself and its basic visual attributes; (2) semantic and spatial comprehension of OCR objects in videos; and (3) dynamic motion detection and temporal localization. We developed this benchmark using a semi-automated approach that integrates the OCR ability of image LLMs with manual refinement, balancing efficiency, cost, and data quality. Our resource aims to help advance research in video LLMs and underscores the need to improve the OCR ability of video LLMs. The benchmark will be released at https://github.com/YuHuiGao/FG-Bench.git.
Submitted 29 December, 2024;
originally announced December 2024.
-
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
Authors:
Tao Wu,
Yong Zhang,
Xiaodong Cun,
Zhongang Qi,
Junfu Pu,
Huanzhang Dou,
Guangcong Zheng,
Ying Shan,
Xi Li
Abstract:
Zero-shot customized video generation has gained significant attention due to its substantial application potential. Existing methods rely on additional models to extract and inject reference subject features, assuming that the Video Diffusion Model (VDM) alone is insufficient for zero-shot customized video generation. However, these methods often struggle to maintain consistent subject appearance due to suboptimal feature extraction and injection techniques. In this paper, we reveal that VDM inherently possesses the force to extract and inject subject features. Departing from previous heuristic approaches, we introduce a novel framework that leverages VDM's inherent force to enable high-quality zero-shot customized video generation. Specifically, for feature extraction, we directly input reference images into VDM and use its intrinsic feature extraction process, which not only provides fine-grained features but also significantly aligns with VDM's pre-trained knowledge. For feature injection, we devise an innovative bidirectional interaction between subject features and generated content through spatial self-attention within VDM, ensuring that VDM has better subject fidelity while maintaining the diversity of the generated video. Experiments on both customized human and object video generation validate the effectiveness of our framework.
Submitted 29 December, 2024; v1 submitted 27 December, 2024;
originally announced December 2024.
-
Computing Approximate Graph Edit Distance via Optimal Transport
Authors:
Qihao Cheng,
Da Yan,
Tianhao Wu,
Zhongyi Huang,
Qin Zhang
Abstract:
Given a graph pair $(G^1, G^2)$, graph edit distance (GED) is defined as the minimum number of edit operations converting $G^1$ to $G^2$. GED is a fundamental operation widely used in many applications, but its exact computation is NP-hard, so the approximation of GED has gained a lot of attention. Data-driven learning-based methods have been found to provide superior results compared to classical approximate algorithms, but they directly fit the coupling relationship between a pair of vertices from their vertex features. We argue that while pairwise vertex features can capture the coupling cost (discrepancy) of a pair of vertices, the vertex coupling matrix should be derived from the vertex-pair cost matrix through a more well-established method that is aware of the global context of the graph pair, such as optimal transport. In this paper, we propose an ensemble approach that integrates a supervised learning-based method and an unsupervised method, both based on optimal transport. Our learning method, GEDIOT, is based on inverse optimal transport that leverages a learnable Sinkhorn algorithm to generate the coupling matrix. Our unsupervised method, GEDGW, models GED computation as a linear combination of optimal transport and its variant, Gromov-Wasserstein discrepancy, for node and edge operations, respectively, which can be solved efficiently without needing the ground truth. Our ensemble method, GEDHOT, combines GEDIOT and GEDGW to further boost the performance. Extensive experiments demonstrate that our methods significantly outperform the existing methods in terms of the performance of GED computation, edit path generation, and model generalizability.
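The optimal-transport step that turns a vertex-pair cost matrix into a soft coupling matrix can be illustrated with the classical entropic Sinkhorn iteration. This is a minimal sketch with uniform marginals and a fixed regularization `eps`; GEDIOT's learnable Sinkhorn and GEDGW's Gromov-Wasserstein term are not reproduced here, and the 2x2 cost matrix is hypothetical.

```python
import math

def sinkhorn(cost, eps=0.1, iters=200):
    # Entropic OT: find P = diag(u) K diag(v) with K = exp(-cost / eps)
    # whose row sums equal a and column sums equal b (uniform here).
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Vertex pairs (0, 0) and (1, 1) are cheap to match, so mass concentrates there.
coupling = sinkhorn([[0.0, 1.0], [1.0, 0.0]])
```

Smaller `eps` gives a sharper, more permutation-like coupling but can underflow `exp(-c / eps)`; robust implementations run the same updates in the log domain.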
Submitted 25 December, 2024;
originally announced December 2024.
-
Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study
Authors:
Xuefeng Jiang,
Lvhua Wu,
Sheng Sun,
Jia Li,
Jingjing Xue,
Yuwei Wang,
Tingting Wu,
Min Liu
Abstract:
Code vulnerability detection (CVD) is essential for addressing and preventing system security issues, playing a crucial role in ensuring software security. Previous learning-based vulnerability detection methods rely on either fine-tuning medium-size sequence models or training smaller neural networks from scratch. Recent advancements in large pre-trained language models (LLMs) have showcased remarkable capabilities in various code intelligence tasks, including code understanding and generation. However, the effectiveness of LLMs in detecting code vulnerabilities is largely under-explored. This work aims to investigate this gap by fine-tuning four widely used open-source LLMs for the CVD task. We also implement five previous graph-based or medium-size sequence models for comparison. Experiments are conducted on five commonly used CVD datasets, covering both short and long samples. In addition, we conduct quantitative experiments to investigate the class imbalance issue and the models' performance on samples of different lengths, which are rarely studied in previous works. To facilitate community research, we open-source all code and resources of this study at https://github.com/SakiRinn/LLM4CVD and https://huggingface.co/datasets/xuefen/VulResource.
Submitted 5 January, 2025; v1 submitted 24 December, 2024;
originally announced December 2024.
-
Non-Convex Tensor Recovery from Local Measurements
Authors:
Tongle Wu,
Ying Sun,
Jicong Fan
Abstract:
Motivated by settings where sensing the entire tensor is infeasible, this paper proposes a novel tensor compressed sensing model, where measurements are obtained only by sensing each lateral slice via mutually independent matrices. Leveraging the low tubal rank structure, we reparameterize the unknown tensor ${\boldsymbol {\mathcal X}}^\star$ using two compact tensor factors and formulate the recovery problem as a nonconvex minimization problem. To solve the problem, we first propose an alternating minimization algorithm, termed \textsf{Alt-PGD-Min}, that iteratively optimizes the two factors using projected gradient descent and an exact minimization step, respectively. Despite nonconvexity, we prove that \textsf{Alt-PGD-Min} achieves $\epsilon$-accuracy recovery with $\mathcal O\left( \kappa^2 \log \frac{1}{\epsilon}\right)$ iteration complexity and $\mathcal O\left( \kappa^6 r n_3\log n_3 \left( \kappa^2 r\left(n_1 + n_2 \right) + n_1 \log \frac{1}{\epsilon}\right) \right)$ sample complexity, where $\kappa$ denotes the tensor condition number of $\boldsymbol{\mathcal X}^\star$. To further accelerate convergence, especially when the tensor is ill-conditioned with large $\kappa$, we propose \textsf{Alt-ScalePGD-Min}, which preconditions the gradient update using an approximate Hessian that can be computed efficiently. We show that \textsf{Alt-ScalePGD-Min} achieves a $\kappa$-independent iteration complexity of $\mathcal O(\log \frac{1}{\epsilon})$ and improves the sample complexity to $\mathcal O\left( \kappa^4 r n_3 \log n_3 \left( \kappa^4 r(n_1+n_2) + n_1 \log \frac{1}{\epsilon}\right) \right)$. Experiments validate the effectiveness of the proposed methods.
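The low-tubal-rank reparameterization writes the tensor as a t-product of two compact factors. The sketch below assumes the standard t-product (circular convolution of frontal slices along the third mode, computed as facewise matrix products in the Fourier domain); the factor names `U`, `B` and all sizes are illustrative, and neither the slice-wise measurement model nor the Alt-PGD-Min updates are reproduced.

```python
import numpy as np


def t_product(A, B):
    """t-product of A (n1 x r x n3) and B (r x n2 x n3) via the FFT along mode 3."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    # In the Fourier domain the t-product is a matrix product per frontal slice.
    Cf = np.einsum("irk,rjk->ijk", Af, Bf)
    return np.real(np.fft.ifft(Cf, axis=2))


def t_product_direct(A, B):
    """Reference definition: circular convolution of frontal slices along mode 3."""
    n3 = A.shape[2]
    C = np.zeros((A.shape[0], B.shape[1], n3))
    for k in range(n3):
        for m in range(n3):
            C[:, :, k] += A[:, :, m] @ B[:, :, (k - m) % n3]
    return C


# Hypothetical sizes: a 6 x 5 x 4 tensor with tubal rank at most r = 2.
rng = np.random.default_rng(0)
n1, n2, n3, r = 6, 5, 4, 2
U = rng.standard_normal((n1, r, n3))
B = rng.standard_normal((r, n2, n3))
X_star = t_product(U, B)

print(X_star.shape)                                  # (6, 5, 4)
print(np.allclose(X_star, t_product_direct(U, B)))  # True
```

Storing `U` and `B` takes $(n_1+n_2)rn_3$ numbers instead of $n_1 n_2 n_3$, which is what makes the factorization compact when $r$ is small.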
Submitted 23 December, 2024;
originally announced December 2024.
-
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
Authors:
Feiran Jia,
Tong Wu,
Xin Qin,
Anna Squicciarini
Abstract:
Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing complex real-world tasks through tool integration. This enhanced ability to interact with external systems and process various data sources, while powerful, introduces significant security vulnerabilities. In particular, indirect prompt injection attacks pose a critical threat, where malicious instructions embedded within external data sources can manipulate agents to deviate from user intentions. While existing defenses based on rule constraints, source spotlighting, and authentication protocols show promise, they struggle to maintain robust security while preserving task functionality. We propose a novel and orthogonal perspective that reframes agent security from preventing harmful actions to ensuring task alignment, requiring every agent action to serve user objectives. Based on this insight, we develop Task Shield, a test-time defense mechanism that systematically verifies whether each instruction and tool call contributes to user-specified goals. Through experiments on the AgentDojo benchmark, we demonstrate that Task Shield reduces the attack success rate to 2.07% while maintaining high task utility (69.79%) on GPT-4o.
Submitted 21 December, 2024;
originally announced December 2024.
-
Sensing Surface Patches in Volume Rendering for Inferring Signed Distance Functions
Authors:
Sijia Jiang,
Tong Wu,
Jing Hua,
Zhizhong Han
Abstract:
Recovering 3D geometry from multi-view RGB images is vital in many 3D computer vision tasks. The latest methods infer the geometry, represented as a signed distance field, by minimizing the rendering error on the field through volume rendering. However, it is still challenging to explicitly impose constraints on surfaces for inferring more geometric details, due to the limited ability of volume rendering to sense surfaces. To resolve this problem, we introduce a method to infer signed distance functions (SDFs) with a better sense of surfaces through volume rendering. Using the gradients and signed distances, we establish a small surface patch centered at the estimated intersection along a ray by pulling points randomly sampled nearby. Hence, we are able to explicitly impose surface constraints on the sensed surface patch, such as multi-view photo consistency and supervision from depth or normal priors, through volume rendering. We evaluate our method through numerical and visual comparisons on scene benchmarks, where it outperforms the latest methods, demonstrating its effectiveness.
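The pulling step can be illustrated with an analytic SDF. This sketch assumes the common pulling rule $p' = p - f(p)\,\nabla f(p)/\|\nabla f(p)\|$ and replaces the learned network with an exact sphere SDF whose gradient is taken by finite differences; the patch center and sampling radius are hypothetical. For an exact SDF, every pulled point lands on the zero level set, so the sampled points form a small patch on the surface around the estimated intersection.

```python
import math
import random


def sphere_sdf(p, radius=1.0):
    """Exact signed distance to a sphere of the given radius centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def sdf_gradient(sdf, p, h=1e-6):
    """Central finite-difference gradient of an SDF at point p."""
    grad = []
    for i in range(3):
        p_plus = list(p)
        p_minus = list(p)
        p_plus[i] += h
        p_minus[i] -= h
        grad.append((sdf(p_plus) - sdf(p_minus)) / (2 * h))
    return grad


def pull_to_surface(sdf, p):
    """Project p onto the zero level set: p - sdf(p) * grad / |grad|."""
    g = sdf_gradient(sdf, p)
    norm = math.sqrt(sum(c * c for c in g))
    d = sdf(p)
    return [pc - d * gc / norm for pc, gc in zip(p, g)]


random.seed(0)
center = [0.0, 0.0, 1.0]  # hypothetical estimated ray-surface intersection
patch = []
for _ in range(8):
    q = [c + random.uniform(-0.05, 0.05) for c in center]  # sample nearby
    patch.append(pull_to_surface(sphere_sdf, q))

print(max(abs(sphere_sdf(p)) for p in patch))
```

With a learned SDF the gradient comes from autodiff and the pulled points are only approximately on the surface, which is why such patches are typically refined during training rather than trusted exactly.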
Submitted 20 December, 2024;
originally announced December 2024.
-
Rethinking Hardware Impairments in Multi-User Systems: Can FAS Make a Difference?
Authors:
Junteng Yao,
Tuo Wu,
Liaoshi Zhou,
Ming Jin,
Cunhua Pan,
Maged Elkashlan,
Fumiyuki Adachi,
George K. Karagiannidis,
Naofal Al-Dhahir,
Chau Yuen
Abstract:
In this paper, we analyze the role of fluid antenna systems (FAS) in multi-user systems with hardware impairments (HIs). Specifically, we investigate a scenario where a base station (BS) equipped with multiple fluid antennas communicates with multiple communication users (CUs), each equipped with a single fluid antenna. Our objective is to maximize the minimum communication rate among all users by jointly optimizing the BS's transmit beamforming, the positions of its transmit fluid antennas, and the positions of the CUs' receive fluid antennas. To address this non-convex problem, we propose a block coordinate descent (BCD) algorithm integrating semidefinite relaxation (SDR), sequential rank-one constraint relaxation (SRCR), successive convex approximation (SCA), and majorization-minimization (MM). Simulation results demonstrate that FAS significantly enhances system performance and robustness, with notable gains when both the BS and CUs are equipped with fluid antennas. Even under low transmit power conditions, deploying FAS at the BS alone yields substantial performance gains. However, the effectiveness of FAS depends on the availability of sufficient movement space, as space constraints may limit its benefits compared to fixed antenna strategies. Our findings highlight the potential of FAS to mitigate HIs and enhance multi-user system performance, while emphasizing the need for practical deployment considerations.
Submitted 20 December, 2024;
originally announced December 2024.
-
Selective excitation of collective modes in multiband superconductor MgB2
Authors:
Jiayu Yuan,
Liyu Shi,
Tiequan Xu,
Yue Wang,
Zizhao Gan,
Hao Wang,
Tianyi Wu,
Dong Wu,
Tao Dong,
Nanlin Wang
Abstract:
Recent developments in nonequilibrium and nonlinear terahertz (THz) spectroscopies have significantly advanced our understanding of collective excitations in superconductors. However, there is still debate surrounding the identification of Higgs or Leggett modes, as well as BCS charge fluctuations, in the well-known two-band superconductor MgB$_2$. Here, we use both multi-cycle and single-cycle THz pump-broadband THz probe techniques to investigate the THz nonlinear response of MgB$_2$. Through multi-cycle THz pump-THz probe experiments on MgB$_2$, we observed distinct nonlinear signals at both the fundamental frequency ($ω$) and the second harmonic frequency (2$ω$) of the pump pulses, which exhibited resonant enhancement at temperatures where their frequencies respectively match 2$Δ_π(T)$. These signals are mainly attributed to the $π$-band Higgs mode. By tuning the THz pump pulse to a single-cycle waveform that satisfies the non-adiabatic excitation criteria, we observed an over-damped oscillation corresponding to the Leggett mode. Our findings help resolve the ongoing debates and demonstrate the selective excitation of collective modes in multiband superconductors, offering new insights into the interaction between Higgs and Leggett modes.
Submitted 18 December, 2024;
originally announced December 2024.
-
Echo: Simulating Distributed Training At Scale
Authors:
Yicheng Feng,
Yuetao Chen,
Kaiwen Chen,
Jingzong Li,
Tianyuan Wu,
Peng Cheng,
Chuan Wu,
Wei Wang,
Tsung-Yi Ho,
Hong Xu
Abstract:
Simulation offers unique value for both enumeration and extrapolation purposes, and is becoming increasingly important for managing massive machine learning (ML) clusters and large-scale distributed training jobs. In this paper, we build Echo to tackle three key challenges in large-scale training simulation: (1) tracing the runtime training workloads at each device in an ex-situ fashion, so that a single device can be used to obtain the actual execution graphs of 1K-GPU training, (2) accurately estimating collective communication without the high overheads of discrete-event-based network simulation, and (3) accounting for the interference-induced computation slowdown caused by overlapping communication and computation kernels on the same device. Echo delivers an average training-step error of 8%, roughly 3x lower than state-of-the-art simulators, for GPT-175B on a 96-GPU H800 cluster with 3D parallelism on Megatron-LM, in under 2 minutes.
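For context on what estimating collective communication analytically looks like, the sketch below uses the textbook alpha-beta cost model of a ring all-reduce (a reduce-scatter followed by an all-gather, each taking $p-1$ steps that move $n/p$ bytes per rank). It is not Echo's estimator, the latency and bandwidth figures are hypothetical, and it deliberately ignores the congestion and compute-communication interference effects that Echo is designed to capture.

```python
def ring_allreduce_time(num_bytes, num_ranks, latency_s, bandwidth_bytes_per_s):
    """Alpha-beta estimate of a ring all-reduce: reduce-scatter + all-gather.

    Each phase takes (p - 1) steps, and every step sends one chunk of
    num_bytes / p bytes per rank, so the total is
    2 (p - 1) * (alpha + (n / p) / beta).
    """
    if num_ranks <= 1:
        return 0.0  # nothing to exchange
    steps = 2 * (num_ranks - 1)
    chunk_bytes = num_bytes / num_ranks
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


# Hypothetical example: 1 GiB of gradients, 8 ranks, 5 us latency, 100 GB/s links.
t = ring_allreduce_time(2**30, 8, 5e-6, 100e9)
print(f"estimated all-reduce time: {t * 1e3:.2f} ms")
```

A closed-form model like this is cheap to evaluate for thousands of collectives, which is the motivation for avoiding discrete-event network simulation; its weakness is that real collectives contend with concurrent compute and other traffic.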
Submitted 16 December, 2024;
originally announced December 2024.
-
Interfacial Perpendicular Magnetic Anisotropy of Ultrathin Fe(001) Film Grown on CoO(001) Surface
Authors:
Tong Wu,
Yunzhuo Wu,
Haoran Chen,
Hongyue Xu,
Zhen Cheng,
Yuanfei Fan,
Nan Jiang,
Wentao Qin,
Yongwei Cui,
Yuqiang Gao,
Guanhua Zhang,
Zhe Yuan,
Yizheng Wu
Abstract:
Exploring novel systems with perpendicular magnetic anisotropy (PMA) is vital for advancing memory devices. In this study, we report an intriguing PMA system involving an ultrathin Fe layer on an antiferromagnetic (AFM) CoO(001) surface. The measured perpendicular anisotropy field is inversely proportional to the Fe thickness, indicating an interfacial origin of PMA. Temperature-dependent measurements reveal that the antiferromagnetism of CoO has a negligible effect on the PMA. By leveraging the magneto-optical Kerr effect and birefringence effect, we achieved concurrent visualization of ferromagnetic (FM) and AFM domains. A pronounced coupling effect between these domains was observed near the spin reorientation transition, contrasting sharply with areas of stronger PMA that exhibited weak coupling. This research not only establishes a new FM/AFM bilayer PMA system but also significantly advances the understanding of FM/AFM interfacial interactions.
Submitted 16 December, 2024;
originally announced December 2024.
-
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
Authors:
Zhibing Li,
Tong Wu,
Jing Tan,
Mengchen Zhang,
Jiaqi Wang,
Dahua Lin
Abstract:
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs, while still struggling with inherent ambiguities between lighting and material. On the other hand, learning-based approaches leverage rich material priors from existing 3D object datasets but face challenges in maintaining multi-view consistency. In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. Our method achieves accurate and multi-view consistent estimation of surface normals and material properties. This is made possible through a novel cross-view, cross-domain attention module and an illumination-augmented, view-adaptive training strategy. Additionally, we introduce ARB-Objaverse, a new dataset that provides large-scale multi-view intrinsic data and renderings under diverse lighting conditions, supporting robust training. Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applications in realistic 3D content creation.
Submitted 1 April, 2025; v1 submitted 16 December, 2024;
originally announced December 2024.
-
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Authors:
Gaojian Wang,
Feng Lin,
Tong Wu,
Zhenguang Liu,
Zhongjie Ba,
Kui Ren
Abstract:
This work asks: given abundant, unlabeled real faces, how can we learn a robust and transferable facial representation that improves the generalization performance of various face security tasks? We make the first attempt and propose FSFM, a self-supervised pretraining framework for learning fundamental representations of real face images that leverages the synergy between masked image modeling (MIM) and instance discrimination (ID). We explore various facial masking strategies for MIM and present a simple yet powerful CRFR-P masking, which explicitly forces the model to capture meaningful intra-region consistency and challenging inter-region coherency. Furthermore, we devise the ID network that naturally couples with MIM to establish underlying local-to-global correspondence via tailored self-distillation. These three learning objectives, namely 3C, empower encoding both local features and global semantics of real faces. After pretraining, a vanilla ViT serves as a universal vision foundation model for downstream face security tasks: cross-dataset deepfake detection, cross-domain face anti-spoofing, and unseen diffusion facial forgery detection. Extensive experiments on 10 public datasets demonstrate that our model transfers better than supervised pretraining and prior visual and facial self-supervised learning methods, and even outperforms task-specialized SOTA methods.
Submitted 6 April, 2025; v1 submitted 16 December, 2024;
originally announced December 2024.
-
Quasinormal mode as a foundational framework for all electromagnetic Fano resonances
Authors:
Mikhail Bochkarev,
Nikolay Solodovchenko,
Kirill Samusev,
Mikhail Limonov,
Tong Wu,
Philippe Lalanne
Abstract:
Fano profiles are observed across various fields of wave physics. They emerge from interference phenomena and are quantified by the asymmetry parameter q. In optics, q is usually considered as a phenomenological coefficient obtained by fitting experimental or numerical data. In this work, we introduce an ab initio Maxwellian approach using quasinormal modes to analytically describe line shapes in light scattering problems. We show that the response of each individual quasinormal mode inherently exhibits a Fano profile and derive an explicit analytical formula for the Fano parameter. Experimental and numerical validations confirm the formula's accuracy across a broad spectrum of electromagnetic systems. The general expression for q opens new possibilities for fine-tuning and optimizing spectral line shapes in electromagnetism.
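For reference, the classic Fano line shape is $\sigma(\epsilon) = (q+\epsilon)^2/(1+\epsilon^2)$ with reduced detuning $\epsilon = 2(\omega-\omega_0)/\Gamma$. The sketch below only evaluates this textbook profile for a given q; it does not implement the paper's quasinormal-mode formula for q, and the resonance parameters in the example are hypothetical.

```python
def reduced_detuning(omega, omega0, gamma):
    """Reduced detuning eps = 2 (omega - omega0) / gamma."""
    return 2.0 * (omega - omega0) / gamma


def fano_profile(eps, q):
    """Textbook Fano line shape (q + eps)^2 / (1 + eps^2)."""
    return (q + eps) ** 2 / (1.0 + eps ** 2)


# Hypothetical resonance: omega0 = 1.0, linewidth gamma = 0.1, asymmetry q = 2.
omega0, gamma, q = 1.0, 0.1, 2.0
for omega in (0.9, 0.95, 1.0, 1.025, 1.1):
    eps = reduced_detuning(omega, omega0, gamma)
    print(f"omega={omega:.3f}  eps={eps:+.2f}  sigma={fano_profile(eps, q):.3f}")
```

The profile vanishes at $\epsilon = -q$ and peaks at $\epsilon = 1/q$ with value $1+q^2$; q = 0 gives a symmetric dip, and large |q| approaches a Lorentzian. This is why a single number q captures the asymmetry that the paper derives from first principles.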
Submitted 15 December, 2024;
originally announced December 2024.
-
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores
Authors:
Jinliang Shi,
Shigang Li,
Youxuan Xu,
Rongtian Fu,
Xueying Wang,
Tong Wu
Abstract:
Sparse Matrix-matrix Multiplication (SpMM) and Sampled Dense-dense Matrix Multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior computing power, which promises to boost the performance of matrix operators to a higher level. However, due to the irregularity of unstructured sparse data, it is difficult to deliver practical speedups on TCUs. To this end, we propose FlashSparse, a novel approach to bridge the gap between sparse workloads and the TCU architecture. Specifically, FlashSparse minimizes the sparse granularity for SpMM and SDDMM on TCUs through a novel swap-and-transpose matrix multiplication strategy. Benefiting from the minimum sparse granularity, the computation redundancy is remarkably reduced while the computing power of TCUs is fully utilized. Besides, FlashSparse is equipped with a memory-efficient thread mapping strategy for coalesced data access and a sparse matrix storage format that reduces memory footprint. Extensive experimental results on H100 and RTX 4090 GPUs show that FlashSparse sets a new state-of-the-art for sparse matrix multiplications (geometric mean 5.5x speedup over DTC-SpMM and 3.22x speedup over RoDe).
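The swap-and-transpose idea rests on the identity $SD = (D^\top S^\top)^\top$. Assuming a Tensor Core MMA shape in which the left operand contributes 16 rows and the right operand 8 columns (as in m16n8k8-style instructions), placing the sparse matrix on the right means its nonzeros are covered by 8x1 rather than 16x1 vectors. The sketch checks the identity in pure Python and counts the zeros that would be computed under each granularity; the `redundancy` counter is a simplified illustrative model, not FlashSparse's kernel or storage format.

```python
import random


def matmul(A, B):
    """Dense matrix product of list-of-lists matrices."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def redundancy(S, v):
    """Zeros computed when the nonzeros of S are covered by v x 1 column vectors.

    Rows are grouped into blocks of v; every column that has at least one
    nonzero inside a block costs a full v-element vector (padding included).
    """
    nnz = sum(1 for row in S for x in row if x != 0)
    vectors = 0
    for start in range(0, len(S), v):
        block = S[start:start + v]
        for j in range(len(S[0])):
            if any(row[j] != 0 for row in block):
                vectors += 1
    return vectors * v - nnz


random.seed(1)
n = 32
S = [[random.choice([1, 2, 3]) if random.random() < 0.05 else 0 for _ in range(n)]
     for _ in range(n)]                                            # sparse operand
D = [[random.randint(-3, 3) for _ in range(4)] for _ in range(n)]  # dense operand

direct = matmul(S, D)                                    # S on the left
swapped = transpose(matmul(transpose(D), transpose(S)))  # S^T on the right
print(direct == swapped)  # True
print("zeros computed at 16x1:", redundancy(S, 16), " at 8x1:", redundancy(S, 8))
```

An 8x1 vector never costs more than a 16x1 vector covering the same rows, so under this counting model the swapped layout computes at most as many wasted zeros, and usually fewer on scattered sparsity.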
Submitted 14 December, 2024;
originally announced December 2024.
-
NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries
Authors:
Tao Wu,
Chuhao Zhou,
Yen Heng Wong,
Lin Gu,
Jianfei Yang
Abstract:
The rapid advancement of Vision-Language Models (VLMs) has significantly advanced the development of Embodied Question Answering (EQA), enhancing agents' abilities in language understanding and reasoning within complex and realistic scenarios. However, EQA in real-world scenarios remains challenging, as human-posed questions often contain noise that can interfere with an agent's exploration and response, which is especially problematic for language beginners and non-expert users. To address this, we introduce the NoisyEQA benchmark, designed to evaluate an agent's ability to recognize and correct noisy questions. This benchmark introduces four common types of noise found in real-world applications: Latent Hallucination Noise, Memory Noise, Perception Noise, and Semantic Noise, generated through an automated dataset creation framework. Additionally, we propose a 'Self-Correction' prompting mechanism and a new evaluation metric to enhance and measure both noise detection capability and answer quality. Our comprehensive evaluation reveals that current EQA agents often struggle to detect noise in questions, leading to responses that frequently contain erroneous information. Our Self-Correction prompting mechanism effectively improves the accuracy of agent answers.
Submitted 14 December, 2024;
originally announced December 2024.
-
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
Authors:
Tong Wu,
Yinghao Xu,
Ryan Po,
Mengchen Zhang,
Guandao Yang,
Jiaqi Wang,
Ziwei Liu,
Dahua Lin,
Gordon Wetzstein
Abstract:
Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution involves adopting favorable attributes from source images. Current methods attempt to distill identity and style from source images. However, "style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics. Additionally, a simplified "style" adaptation prevents combining multiple attributes from different sources into one generated image. In this work, we formulate a more effective approach that decomposes the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting, texture, and dynamics from different images. To achieve this goal, we construct what is, to the best of our knowledge, the first fine-grained visual attributes dataset (FiVA). The FiVA dataset features a well-organized taxonomy for visual attributes and includes around 1M high-quality generated images with visual attribute annotations. Leveraging this dataset, we propose a fine-grained visual attribute adaptation framework (FiVA-Adapter), which decouples and adapts visual attributes from one or more source images into a generated one. This approach enables user-friendly customization, allowing users to selectively apply desired attributes to create images that meet their unique preferences and specific content requirements.
Submitted 10 December, 2024;
originally announced December 2024.
-
ASC-Hook: fast and transparent system call hook for Arm
Authors:
Yang Shen,
Min Xie,
Wenzhe Zhang,
Tao Wu
Abstract:
Intercepting system calls is crucial for tools that aim to modify or monitor application behavior. However, existing system call interception tools on the ARM platform still suffer from limitations in terms of performance and completeness. This paper presents an efficient and comprehensive binary rewriting framework, ASC-Hook, specifically designed for intercepting system calls on the ARM platform. ASC-Hook addresses two key challenges on the ARM architecture: the misalignment of the target address caused by directly replacing the SVC instruction with br x8, and the return to the original control flow after system call interception. This is achieved through a hybrid replacement strategy and a specially designed trampoline mechanism. By implementing multiple completeness strategies specifically for system calls, we ensure comprehensive interception. Experimental results show that ASC-Hook reduces overhead to at least 1/29 of that of existing system call interception tools. We conducted extensive performance evaluations of ASC-Hook, and the average performance loss for system-call-intensive applications is 3.7%.
Submitted 20 June, 2025; v1 submitted 7 December, 2024;
originally announced December 2024.
-
SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images
Authors:
Jiahua Dong,
Tong Wu,
Rui Qian,
Jiaqi Wang
Abstract:
The 3D contrastive learning paradigm has demonstrated remarkable performance in downstream tasks through pretraining on point cloud data. Recent advances incorporate additional 2D image priors associated with 3D point clouds for further improvement. Nonetheless, these existing frameworks are constrained by the restricted range of available point cloud datasets, primarily due to the high cost of obtaining point cloud data. To this end, we propose SimC3D, a simple but effective 3D contrastive learning framework that, for the first time, pretrains 3D backbones from pure RGB image data. SimC3D performs contrastive 3D pretraining with three appealing properties. (1) Pure image data: SimC3D removes the dependency on costly 3D point clouds and pretrains 3D backbones using solely RGB images. By employing depth estimation and suitable data processing, the monocular synthesized point cloud shows great potential for 3D pretraining. (2) Simple framework: Traditional multi-modal frameworks facilitate 3D pretraining with 2D priors by utilizing an additional 2D backbone, thereby increasing computational expense. In this paper, we empirically demonstrate that the primary benefit of the 2D modality stems from the incorporation of locality information. Inspired by this observation, SimC3D directly employs 2D positional embeddings as a stronger contrastive objective, eliminating the need for 2D backbones and leading to considerable performance improvements. (3) Strong performance: SimC3D outperforms previous approaches that leverage ground-truth point cloud data for pretraining in various downstream tasks. Furthermore, the performance of SimC3D can be further enhanced by combining multiple image datasets, showcasing its significant potential for scalability. The code will be available at https://github.com/Dongjiahua/SimC3D.
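Turning a monocular depth map into a point cloud typically uses pinhole back-projection, $x = (u-c_x)z/f_x$, $y = (v-c_y)z/f_y$. The sketch assumes hypothetical intrinsics and a depth map already produced by some depth estimator; SimC3D's specific estimator and data processing are not reproduced.

```python
def unproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of metric depths) to camera-frame 3D points.

    Pixel (u, v) with depth z maps to ((u - cx) z / fx, (v - cy) z / fy, z)
    under a pinhole camera; non-positive depths are treated as invalid.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth estimate
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Hypothetical 3x3 depth map and intrinsics (principal point at the centre pixel).
depth = [
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 0.0],  # last pixel has no valid depth
]
cloud = unproject(depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
print(len(cloud))  # 8
print(cloud[4])    # (0.0, 0.0, 2.0): the centre pixel lies on the optical axis
```

Monocular depth is usually only correct up to scale, so a pipeline like this would normally normalize or rescale the resulting cloud before using it for pretraining.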
Submitted 6 December, 2024;
originally announced December 2024.
-
Wavelet Diffusion Neural Operator
Authors:
Peiyan Hu,
Rui Wang,
Xiang Zheng,
Tao Zhang,
Haodong Feng,
Ruiqi Feng,
Long Wei,
Yue Wang,
Zhi-Ming Ma,
Tailin Wu
Abstract:
Simulating and controlling physical systems described by partial differential equations (PDEs) are crucial tasks across science and engineering. Recently, diffusion generative models have emerged as a competitive class of methods for these tasks due to their ability to capture long-term dependencies and model high-dimensional states. However, diffusion models typically struggle with handling system states with abrupt changes and generalizing to higher resolutions. In this work, we propose Wavelet Diffusion Neural Operator (WDNO), a novel PDE simulation and control framework that addresses these challenges. WDNO comprises two key innovations. Firstly, WDNO performs diffusion-based generative modeling in the wavelet domain for the entire trajectory to handle abrupt changes and long-term dependencies effectively. Secondly, to address poor generalization across different resolutions, a fundamental challenge in modeling physical systems, we introduce multi-resolution training. We validate WDNO on five physical systems, including the 1D advection equation, three challenging physical systems with abrupt changes (1D Burgers' equation, 1D compressible Navier-Stokes equation and 2D incompressible fluid), and the real-world dataset ERA5. WDNO demonstrates superior performance on both simulation and control tasks over state-of-the-art methods, with significant improvements in long-term and detail prediction accuracy. Remarkably, in the challenging 2D high-dimensional and indirect control task aimed at reducing smoke leakage, WDNO reduces the leakage by 78% compared to the second-best baseline. The code can be found at https://github.com/AI4Science-WestlakeU/wdno.git.
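A one-level Haar transform illustrates why a wavelet domain suits abrupt changes: a step in the signal shows up as a single nonzero detail coefficient, while flat regions give zero detail. The Haar basis is an illustrative choice made here; the abstract does not specify which wavelet or how many decomposition levels WDNO uses.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_forward(x):
    """One level of the orthonormal Haar transform; len(x) must be even."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return x


# A step between samples 2 and 3 stands in for an abrupt change (e.g. a shock).
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
approx, detail = haar_forward(signal)
print(detail)  # only the pair straddling the step has a nonzero detail
```

If the step fell exactly on a pair boundary, it would appear in the approximation coefficients at this level and in the details of a coarser level instead, which is one reason multi-level decompositions are common.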
Submitted 26 June, 2025; v1 submitted 6 December, 2024;
originally announced December 2024.
-
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
Authors:
Jun Zhang,
Desen Meng,
Ji Qi,
Zhenpeng Huang,
Tao Wu,
Limin Wang
Abstract:
Despite the remarkable performance of multimodal large language models (MLLMs) across diverse tasks, their substantial training and inference costs impede their advancement. Most of the computation stems from the overwhelming volume of vision tokens processed by the transformer decoder. In this paper, we propose to build efficient MLLMs by leveraging the Mixture-of-Depths (MoD) mechanism, where each transformer decoder layer selects essential vision tokens to process while skipping redundant ones. However, integrating MoD into MLLMs is non-trivial. To address the challenges of training and inference stability as well as limited training data, we adapt the MoD module with two novel designs: tanh-gated weight normalization (TanhNorm) and symmetric token reweighting (STRing). Moreover, we observe that vision tokens exhibit higher redundancy in deeper layers and thus design a progressive ratio decay (PRD) strategy, which gradually reduces the token retention ratio layer by layer following a shifted cosine schedule. This crucial design fully unleashes the potential of MoD, significantly boosting the efficiency and performance of our models. To validate the effectiveness of our approach, we conduct extensive experiments with two baseline models across 14 benchmarks. Our model, p-MoD, matches or even surpasses the performance of the baseline models while using only 55.6% of the TFLOPs and 53.8% of the KV cache storage during inference, and 77.7% of the GPU hours during training.
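The abstract says PRD lowers the vision-token retention ratio layer by layer with a shifted cosine schedule, but it gives no formula. The sketch below is a hypothetical plain half-cosine decay between two endpoint ratios. The endpoint values, the 576-token count, and the omission of any shift are assumptions, not p-MoD's actual schedule.

```python
import math


def progressive_ratio_decay(num_layers, r_start=1.0, r_end=0.1):
    """Hypothetical cosine decay of the vision-token retention ratio.

    Layer 0 keeps r_start of the tokens and the last layer keeps r_end.
    The 'shifted' cosine used by p-MoD is NOT reproduced here.
    """
    if num_layers < 2:
        return [r_start] * num_layers
    ratios = []
    for layer in range(num_layers):
        t = layer / (num_layers - 1)              # progress through the decoder, in [0, 1]
        w = 0.5 * (1.0 + math.cos(math.pi * t))   # falls from 1 to 0 along a half cosine
        ratios.append(r_end + (r_start - r_end) * w)
    return ratios


def tokens_kept(num_vision_tokens, ratios):
    """Vision tokens each layer would process (rounded down, at least one)."""
    return [max(1, int(num_vision_tokens * r)) for r in ratios]


ratios = progressive_ratio_decay(num_layers=5, r_start=1.0, r_end=0.2)
print([round(r, 3) for r in ratios])
print(tokens_kept(576, ratios))
```

A cosine shape keeps most tokens in the early layers and drops them faster in the middle. This matches the stated intuition that redundancy grows with depth.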
Submitted 5 December, 2024;
originally announced December 2024.
-
M2PDE: Compositional Generative Multiphysics and Multi-component PDE Simulation
Authors:
Tao Zhang,
Zhenhai Liu,
Feipeng Qi,
Yongjun Jiao,
Tailin Wu
Abstract:
Multiphysics simulation, which models the interactions between multiple physical processes, and multi-component simulation of complex structures are critical in fields like nuclear and aerospace engineering. Previous studies use numerical solvers or ML-based surrogate models for these simulations. However, multiphysics simulations typically require integrating multiple specialized solvers, each for a specific physical process, into a coupled program, which introduces significant development challenges. Furthermore, existing numerical algorithms struggle with highly complex large-scale structures in multi-component simulations. Here we propose compositional Multiphysics and Multi-component PDE Simulation with Diffusion models (M2PDE) to overcome these challenges. During diffusion-based training, M2PDE learns energy functions modeling the conditional probability of one physical process/component conditioned on other processes/components. In inference, M2PDE generates coupled multiphysics and multi-component solutions by sampling from the joint probability distribution. We evaluate M2PDE on two multiphysics tasks, reaction-diffusion and nuclear thermal coupling, where it achieves more accurate predictions than surrogate models in challenging scenarios. We then apply it to a multi-component prismatic fuel element problem, demonstrating that M2PDE scales from single-component training to a 64-component structure and outperforms existing domain-decomposition and graph-based approaches. The code is available at https://github.com/AI4Science-WestlakeU/M2PDE.
Submitted 11 May, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
Frequency-tunable biphoton generation via spontaneous four-wave mixing
Authors:
Jiun-Shiuan Shiu,
Chang-Wei Lin,
Yu-Chiao Huang,
Meng-Jung Lin,
I-Chia Huang,
Ting-Ho Wu,
Pei-Chen Kuan,
Yong-Fan Chen
Abstract:
We present experimental results on tuning biphoton frequency by introducing a detuned coupling field in spontaneous four-wave mixing (SFWM), and examine its impact on the pairing ratio. This tunability is achieved by manipulating the inherent electromagnetically induced transparency (EIT) effect in the double-$Λ$ scheme. Introducing a detuned coupling field degrades the efficiency of EIT-based stimulated four-wave mixing, which in turn reduces the biphoton pairing ratio. However, this reduction can be mitigated by increasing the optical power of the coupling field. Additionally, we observe that blue- and red-detuning the biphoton frequency results in distinct temporal profiles of biphoton wavepackets due to phase mismatch. These findings provide insights into the mechanisms of frequency-tunable biphoton generation via SFWM, and suggest potential optimizations for applications in quantum communication and information processing.
Submitted 5 December, 2024;
originally announced December 2024.
-
Fluid Antenna Systems Enabling 6G: Principles, Applications, and Research Directions
Authors:
Tuo Wu,
Kangda Zhi,
Junteng Yao,
Xiazhi Lai,
Jianchao Zheng,
Hong Niu,
Maged Elkashlan,
Kai-Kit Wong,
Chan-Byoung Chae,
Zhiguo Ding,
George K. Karagiannidis,
Merouane Debbah,
Chau Yuen
Abstract:
Fluid antenna system (FAS), a new version of reconfigurable antenna technology that promotes shape and position flexibility, has emerged as an exciting and possibly transformative technology for wireless communication systems. FAS refers to any software-controlled fluidic, conductive, or dielectric structure that can dynamically alter the antenna's shape and position to change its gain, radiation pattern, operating frequency, and other critical radiation characteristics. Given this capability, FAS is highly anticipated to contribute greatly to the upcoming sixth-generation (6G) wireless networks. This article substantiates this view by addressing four major questions: 1) Is FAS crucial to 6G? 2) How should FAS be characterized? 3) What are the applications of FAS? 4) What are the relevant challenges and future research directions? In particular, five promising research directions that underscore the potential of FAS are discussed. We conclude this article by showcasing the impressive performance of FAS.
Submitted 4 December, 2024;
originally announced December 2024.
-
Imagine360: Immersive 360 Video Generation from Perspective Anchor
Authors:
Jing Tan,
Shuai Yang,
Tong Wu,
Jingwen He,
Yuwei Guo,
Ziwei Liu,
Dahua Lin
Abstract:
$360^\circ$ videos offer a hyper-immersive experience that lets viewers explore a dynamic scene across a full 360 degrees. To achieve more user-friendly and personalized content creation in $360^\circ$ video format, we seek to lift standard perspective videos into $360^\circ$ equirectangular videos. To this end, we introduce Imagine360, the first perspective-to-$360^\circ$ video generation framework that creates high-quality $360^\circ$ videos with rich and diverse motion patterns from video anchors. Imagine360 learns fine-grained spherical visual and motion patterns from limited $360^\circ$ video data through several key designs. 1) We adopt a dual-branch design, consisting of a perspective and a panorama video denoising branch, to provide local and global constraints for $360^\circ$ video generation, with the motion module and spatial LoRA layers fine-tuned on extended web $360^\circ$ videos. 2) An antipodal mask is devised to capture long-range motion dependencies, enhancing the reversed camera motion between antipodal pixels across hemispheres. 3) To handle diverse perspective video inputs, we propose elevation-aware designs that adapt to varying video masking due to changing elevations across frames. Extensive experiments show that Imagine360 achieves superior graphics quality and motion coherence compared with state-of-the-art $360^\circ$ video generation methods. We believe Imagine360 holds promise for advancing personalized, immersive $360^\circ$ video creation.
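The antipodal mask relates each equirectangular pixel to the pixel on the opposite side of the sphere. The mapping below is a hypothetical illustration, not Imagine360's code, and the helper names are invented for this sketch. It assumes pixel-center sampling and an even frame width. Flipping the row gives the opposite latitude, and shifting the column by half the width adds 180° of longitude.

```python
import math


def antipodal_pixel(row, col, height, width):
    """Map an equirectangular pixel to the pixel diametrically opposite on the sphere.

    Assumes pixel centers at longitude (col + 0.5) / width * 360 deg and
    colatitude (row + 0.5) / height * 180 deg, with an even width.
    """
    if width % 2:
        raise ValueError("width must be even for an exact antipodal column")
    return height - 1 - row, (col + width // 2) % width


def pixel_to_unit_vector(row, col, height, width):
    """Direction on the unit sphere for a pixel center (used here only for checking)."""
    lon = (col + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (row + 0.5) / height * math.pi
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


H, W = 4, 8
print(antipodal_pixel(0, 0, H, W))  # (3, 4)
```

A pairwise mask could then mark each pixel together with its antipode. This sketch only shows the geometric correspondence such a mask would rely on.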
Submitted 4 December, 2024;
originally announced December 2024.
-
Exploring Evolutionary Spectral Clustering for Temporal-Smoothed Clustered Cell-Free Networking
Authors:
Junyuan Wang,
Tianyao Wu,
Ouyang Zhou,
Yaping Zhu
Abstract:
Clustered cell-free networking, which dynamically partitions the whole network into nonoverlapping subnetworks, has recently been proposed to mitigate the cell-edge problem in cellular networks. However, prior works focused only on optimizing clustered cell-free networking in static scenarios with fixed users. This could lead to a large number of handovers in practical dynamic environments with moving users, seriously hindering the implementation of clustered cell-free networking in practice. This paper considers user mobility and aims to simultaneously maximize the sum rate and minimize the number of handovers. By transforming the multi-objective optimization problem into a time-varying graph partitioning problem and exploring evolutionary spectral clustering, a temporal-smoothed clustered cell-free networking algorithm is proposed, which is shown to be effective in smoothing network partitions over time and reducing handovers while maintaining a similar sum rate.
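Evolutionary spectral clustering trades off fitting the current snapshot against staying close to the previous partition. One common way to express this is to cluster a blend of the current and previous affinity matrices. The sketch below is a generic two-way version of that idea. It is not the paper's algorithm, and it ignores the sum-rate and handover objectives. The weight `alpha`, the toy affinity matrices, and all helper names are hypothetical.

```python
import numpy as np


def two_way_spectral_partition(W):
    """Split nodes in two using the sign of the second eigenvector of the
    symmetric normalized Laplacian (a relaxed normalized cut)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    laplacian = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]     # undo the symmetric normalization
    labels = (fiedler > 0).astype(int)
    # Eigenvector signs are arbitrary; pin node 0 to cluster 0 for stable labels.
    return labels if labels[0] == 0 else 1 - labels


def temporally_smoothed_partition(W_now, W_prev, alpha=0.7):
    """Partition the blend alpha * W_now + (1 - alpha) * W_prev.

    alpha = 1 ignores history; smaller alpha favors partitions that agree
    with the previous snapshot (fewer reassignments, i.e. fewer handovers).
    """
    return two_way_spectral_partition(alpha * W_now + (1.0 - alpha) * W_prev)


def two_block_affinity(groups, n, inside=1.0, across=0.01):
    """Hypothetical affinity matrix: strong links inside each group, weak links across."""
    W = np.full((n, n), across)
    for g in groups:
        for i in g:
            for j in g:
                W[i, j] = inside
    np.fill_diagonal(W, 0.0)
    return W


W_prev = two_block_affinity([[0, 1, 2], [3, 4, 5]], 6)
W_now = two_block_affinity([[0, 1], [2, 3, 4, 5]], 6)   # node 2 has moved
print(two_way_spectral_partition(W_now))
print(temporally_smoothed_partition(W_now, W_prev, alpha=0.3))
```

Sweeping `alpha` shows the trade-off the abstract describes: lower values keep node 2 with its old group for longer.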
Submitted 3 December, 2024;
originally announced December 2024.
-
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
Authors:
Zeyi Sun,
Ziyang Chu,
Pan Zhang,
Tong Wu,
Xiaoyi Dong,
Yuhang Zang,
Yuanjun Xiong,
Dahua Lin,
Jiaqi Wang
Abstract:
In-context generation is a key component of large language models' (LLMs) open-task generalization capability. By leveraging a few examples as context, LLMs can perform both in-domain and out-of-domain tasks. Recent advancements in auto-regressive vision-language models (VLMs) built upon LLMs have showcased impressive performance in text-to-image generation. However, the potential of in-context learning for general image generation tasks remains largely unexplored. To address this, we introduce X-Prompt, a purely auto-regressive large vision-language model designed to deliver competitive performance across a wide range of both seen and unseen image generation tasks, all within a unified in-context learning framework. X-Prompt incorporates a specialized design that efficiently compresses valuable features from in-context examples, supporting longer in-context token sequences and improving its ability to generalize to unseen tasks. A unified training task for both text and image prediction enables X-Prompt to handle general image generation with enhanced task awareness from in-context examples. Extensive experiments validate the model's performance across diverse seen image generation tasks and its capacity to generalize to previously unseen tasks.
Submitted 2 December, 2024;
originally announced December 2024.
-
ULSR-GS: Ultra Large-scale Surface Reconstruction Gaussian Splatting with Multi-View Geometric Consistency
Authors:
Zhuoxiao Li,
Shanliang Yao,
Taoyu Wu,
Yong Yue,
Wufan Zhao,
Rongjun Qin,
Angel F. Garcia-Fernandez,
Andrew Levers,
Xiaohui Zhu
Abstract:
While Gaussian Splatting (GS) achieves efficient, high-quality scene rendering and small-area surface extraction, it falls short in large-scale aerial image surface extraction tasks. To overcome this, we present ULSR-GS, a framework dedicated to high-fidelity surface extraction in ultra-large-scale scenes, addressing the limitations of existing GS-based mesh extraction methods. Specifically, we propose a point-to-photo partitioning approach combined with a multi-view optimal view matching principle to select the best training images for each sub-region. Additionally, during training, ULSR-GS employs a densification strategy based on multi-view geometric consistency to enhance surface extraction details. Experimental results demonstrate that ULSR-GS outperforms other state-of-the-art GS-based works on large-scale aerial photogrammetry benchmark datasets, significantly improving surface extraction accuracy in complex urban environments. Project page: https://ulsrgs.github.io.
Submitted 25 June, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings
Authors:
Razi Mahmood,
Pingkun Yan,
Diego Machado Reyes,
Ge Wang,
Mannudeep K. Kalra,
Parisa Kaviani,
Joy T. Wu,
Tanveer Syeda-Mahmood
Abstract:
Several evaluation metrics have recently been developed to automatically assess the quality of AI-generated chest radiograph reports based only on textual information, using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, and severity of a large number of clinical findings. We then perform phrasal grounding to localize their associated anatomical regions on chest radiograph images. The textual and visual measures are then combined to rate the quality of the generated reports. We present results that compare this evaluation metric with other textual metrics on a gold standard dataset derived from the MIMIC collection and show its robustness and sensitivity to factual errors.
Submitted 22 May, 2025; v1 submitted 1 December, 2024;
originally announced December 2024.
-
Multistage spatial model for informing release of Wolbachia-infected mosquitoes as disease control
Authors:
Zhuolin Qu,
Tong Wu
Abstract:
Wolbachia is a naturally occurring bacterium that can infect Aedes mosquitoes and reduce the transmission of mosquito-borne diseases, including dengue fever, Zika, and chikungunya. Field trials have been conducted worldwide to suppress local epidemics. We introduce a novel partial differential equation model to simulate the spread of Wolbachia infection in mosquito populations. Our model incorporates the intricate Wolbachia maternal transmission cycle and detailed mosquito life stages, while also accounting for the spatial heterogeneity induced by mosquito dispersion across a two-dimensional domain. Prior modeling studies and field data indicate that a critical threshold of Wolbachia-infected mosquitoes is necessary for infection to persist in the mosquito population. Through our spatial model, we identify a threshold condition, termed the ``critical bubble'', for establishing a self-sustaining Wolbachia infection in the field. When releases exceed this threshold, the model predicts a spatial wave of Wolbachia infection. We further quantify how this threshold and the infection wave velocity depend on the diffusion process and other parameters. We numerically study various intervention scenarios to inform efficient Wolbachia release strategies. Our findings suggest that: (1) integrating Wolbachia release with pre-release mitigations targeting adult mosquitoes, rather than the aquatic stages, better reduces the threshold for Wolbachia establishment, whereas habitat modification before the release may increase the threshold; (2) releases in dry regions lower the threshold, though the infection waves may slow down or stall at dry-wet interfaces due to differences in carrying capacity; and (3) initiating releases just before the wet season further reduces the release threshold.
Submitted 27 November, 2024;
originally announced November 2024.
-
Abnormally enhanced Hall Lorenz number in the magnetic Weyl semimetal NdAlSi
Authors:
Nan Zhang,
Daifeng Tu,
Ding Li,
Kaixin Tang,
Linpeng Nie,
Houpu Li,
Hongyu Li,
Tao Qi,
Tao Wu,
Jianhui Zhou,
Ziji Xiang,
Xianhui Chen
Abstract:
In Landau's celebrated Fermi liquid theory, electrons in a metal obey the Wiedemann--Franz law at the lowest temperatures. This law states that electron heat and charge transport are linked by a constant $L_0$, i.e., the Sommerfeld value of the Lorenz number ($L$). This relation can be violated at elevated temperatures, where abundant inelastic scattering leads to a reduction of the Lorenz number ($L < L_0$). Here, we report a rare case of a remarkably enhanced Lorenz number ($L > L_0$) discovered in the magnetic topological semimetal NdAlSi. Measurements of the transverse electrical and thermal transport coefficients reveal that the Hall Lorenz number $L_{xy}$ in NdAlSi starts to deviate from the canonical value far above its magnetic ordering temperature. Moreover, $L_{xy}$ displays strong nonmonotonic temperature and field dependence, reaching a maximum value close to 2$L_0$ in an intermediate parameter range. Further analysis excludes charge-neutral excitations as the origin of the enhanced $L_{xy}$. Instead, we attribute it to Kondo-type elastic scattering off localized 4$f$ electrons, which creates a peculiar energy distribution of the quasiparticle relaxation time. Our results provide insights into the perplexing transport phenomena caused by the interplay between charge and spin degrees of freedom.
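For reference, the Sommerfeld value is $L_0 = (\pi^2/3)(k_B/e)^2 \approx 2.44\times10^{-8}\ \mathrm{W\,\Omega\,K^{-2}}$. The Hall Lorenz number is commonly defined as $L_{xy} = \kappa_{xy}/(\sigma_{xy} T)$. The snippet below evaluates $L_0$ from the exact SI constants and forms the ratio $L_{xy}/L_0$. The input values in the usage line are hypothetical and are not measurements from the paper.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant, J/K (exact SI value)
E_CHARGE = 1.602176634e-19   # elementary charge, C (exact SI value)

# Sommerfeld value of the Lorenz number: L0 = (pi^2 / 3) * (k_B / e)^2, in W*Ohm/K^2.
L0 = (math.pi ** 2 / 3.0) * (K_B / E_CHARGE) ** 2


def hall_lorenz_ratio(kappa_xy, sigma_xy, temperature):
    """Return L_xy / L0, where L_xy = kappa_xy / (sigma_xy * T).

    kappa_xy in W m^-1 K^-1, sigma_xy in S/m, temperature in K.
    """
    return kappa_xy / (sigma_xy * temperature) / L0


print(f"L0 = {L0:.4e} W Ohm K^-2")

# Hypothetical inputs chosen so that L_xy is twice the Sommerfeld value.
sigma_xy, T = 1e5, 10.0
kappa_xy = 2.0 * L0 * sigma_xy * T
print(hall_lorenz_ratio(kappa_xy, sigma_xy, T))
```

A ratio above 1 corresponds to the $L > L_0$ regime reported here. A ratio below 1 corresponds to the usual inelastic-scattering suppression.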
Submitted 26 November, 2024;
originally announced November 2024.
-
ThreatModeling-LLM: Automating Threat Modeling using Large Language Models for Banking System
Authors:
Tingmin Wu,
Shuiqiao Yang,
Shigang Liu,
David Nguyen,
Seung Jang,
Alsharif Abuadbba
Abstract:
Threat modeling is a crucial component of cybersecurity, particularly for industries such as banking, where the security of financial data is paramount. Traditional threat modeling approaches require expert intervention and manual effort, often leading to inefficiencies and human error. The advent of Large Language Models (LLMs) offers a promising avenue for automating these processes, enhancing both efficiency and efficacy. However, this transition is not straightforward due to three main challenges: (1) the lack of publicly available, domain-specific datasets, (2) the need for tailored models to handle complex banking system architectures, and (3) the requirement for real-time, adaptive mitigation strategies that align with compliance standards like NIST 800-53. In this paper, we introduce ThreatModeling-LLM, a novel and adaptable framework that automates threat modeling for banking systems using LLMs. ThreatModeling-LLM operates in three stages: 1) dataset creation, 2) prompt engineering and 3) model fine-tuning. We first generate a benchmark dataset using Microsoft Threat Modeling Tool (TMT). Then, we apply Chain of Thought (CoT) and Optimization by PROmpting (OPRO) on the pre-trained LLMs to optimize the initial prompt. Lastly, we fine-tune the LLM using Low-Rank Adaptation (LoRA) based on the benchmark dataset and the optimized prompt to improve the threat identification and mitigation generation capabilities of pre-trained LLMs.
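The fine-tuning stage relies on Low-Rank Adaptation (LoRA). In LoRA, a frozen weight matrix $W$ is augmented with a trainable low-rank update $\frac{\alpha}{r} BA$, where $B$ is initialized to zero. The NumPy sketch below shows that generic update only, not the authors' training code. The layer size, rank, and scaling are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Minimal LoRA-adapted linear layer: y = x W^T + (alpha / r) * (x A^T) B^T.

    W (out x in) stays frozen; only the low-rank factors A (r x in) and
    B (out x r) would be trained. B starts at zero, so the adapted layer
    initially reproduces the frozen layer exactly.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pre-trained weight
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the low-rank update into W for inference: W + (alpha / r) * B A."""
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W0 = rng.normal(size=(64, 64))          # stand-in for one pre-trained projection
layer = LoRALinear(W0, rank=4, alpha=8.0)
x = rng.normal(size=(3, 64))
print(np.allclose(layer(x), x @ W0.T))  # True before any training
print(layer.trainable_params(), "trainable vs", W0.size, "frozen")
```

The low-rank factors hold far fewer parameters than the frozen matrix, which is what makes adapting a pre-trained LLM to a small domain dataset inexpensive.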
Submitted 14 May, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Channel Modeling for Ultraviolet Non-Line-of-Sight Communications Incorporating an Obstacle
Authors:
Tianfeng Wu,
Fang Yang,
Tian Cao,
Ling Cheng,
Yupeng Chen,
Jian Song,
Julian Cheng,
Zhu Han
Abstract:
Existing studies on ultraviolet (UV) non-line-of-sight (NLoS) channel modeling primarily focus on obstacle-free scenarios, which makes them unsuitable for small transceiver elevation angles in most cases. To address this issue, this paper investigates a UV NLoS channel model incorporating an obstacle, in which the impacts of both atmospheric scattering and obstacle reflection on UV signals are taken into account. To validate the proposed model, we compare it with the related Monte-Carlo photon-tracing (MCPT) model, which had been verified by outdoor experiments. Numerical results show that the path loss curves obtained by the proposed model agree well with those determined by the MCPT model, while the proposed model has lower computational complexity. This work reveals that obstacle reflection can effectively reduce the channel path loss of UV NLoS communication systems.
Submitted 8 November, 2024;
originally announced November 2024.
-
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Authors:
Dawei Li,
Bohan Jiang,
Liangjie Huang,
Alimohammad Beigi,
Chengshuai Zhao,
Zhen Tan,
Amrita Bhattacharjee,
Yuxuan Jiang,
Canyu Chen,
Tianhao Wu,
Kai Shu,
Lu Cheng,
Huan Liu
Abstract:
Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). However, traditional methods, whether matching-based or embedding-based, often fall short of judging subtle attributes and delivering satisfactory results. Recent advancements in Large Language Models (LLMs) inspire the "LLM-as-a-judge" paradigm, where LLMs are leveraged to perform scoring, ranking, or selection across various tasks and applications. This paper provides a comprehensive survey of LLM-based judgment and assessment, offering an in-depth overview to advance this emerging field. We begin by giving detailed definitions from both input and output perspectives. Then we introduce a comprehensive taxonomy to explore LLM-as-a-judge from three dimensions: what to judge, how to judge and where to judge. Finally, we compile benchmarks for evaluating LLM-as-a-judge and highlight key challenges and promising directions, aiming to provide valuable insights and inspire future research in this promising research area. Paper list and more resources about LLM-as-a-judge can be found at https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge and https://llm-as-a-judge.github.io.
Submitted 5 February, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Acousto-optic modulation based on an AlScN microring resonator for microwave-to-optical conversion
Authors:
Kewei Bian,
Yushuai Liu,
Weilin Rong,
Yuan Dong,
Qize Zhong,
Yang Qiu,
Xingyan Zhao,
Tao Wu,
Shaonan Zheng,
Ting Hu
Abstract:
Acousto-optic (AO) modulation is critical for microwave and optical signal processing, computing, and networking. Challenges remain in integrating AO devices on-chip using a fabrication process compatible with complementary metal-oxide-semiconductor (CMOS) technology. This work demonstrates an AO modulator exploiting a microring resonator (MRR) on a thin-film aluminum scandium nitride (AlScN) photonic platform. Leveraging the strong piezoelectric properties of AlScN, an interdigital transducer (IDT) placed inside the MRR couples microwave signals into acoustic resonant modes, enabling efficient bi-directional optical modulation in the MRR. The fabricated MRR exhibits an optical loaded quality factor (Q) of $1.8\times10^4$ in the optical L-band for the TE00 mode. A low effective half-wave voltage $V_\pi$ of 1.21 V is achieved, corresponding to a $V_\pi L$ of 0.0242 V·cm, along with an optomechanical single-photon coupling strength $g_0$ of 0.43 kHz between the 2.11 GHz acoustic mode and the TE00 optical mode. The device shows potential for applications in microwave photonics.
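Two figures follow directly from the reported numbers. First, $V_\pi L / V_\pi$ gives the length $L$ used in the figure of merit, which is about 200 µm; the abstract does not say which physical length this is. Second, the loaded linewidth is $\Delta f = f/Q$. The abstract gives only "L-band", so the 1580 nm wavelength below is an assumption.

```python
# Figures reported in the abstract.
V_PI = 1.21          # effective half-wave voltage, V
V_PI_L = 0.0242      # voltage-length product, V*cm
Q_LOADED = 1.8e4     # loaded optical quality factor (TE00 mode)

# Assumption: a representative L-band wavelength; not stated in the abstract.
WAVELENGTH_M = 1580e-9
C = 299_792_458.0    # speed of light in vacuum, m/s

length_um = V_PI_L / V_PI * 1e4                      # implied length, cm -> um
linewidth_ghz = C / WAVELENGTH_M / Q_LOADED / 1e9    # loaded linewidth = optical frequency / Q

print(f"implied length L: {length_um:.0f} um")
print(f"loaded linewidth at 1580 nm: {linewidth_ghz:.1f} GHz")
```

For comparison, the acoustic mode quoted in the abstract is at 2.11 GHz.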
Submitted 23 November, 2024;
originally announced November 2024.
-
Modeling of UV NLoS Communication Channels: From Atmospheric Scattering and Obstacle Reflection Perspectives
Authors:
Tianfeng Wu,
Fang Yang,
Tian Cao,
Ling Cheng,
Yupeng Chen,
Jian Song,
Julian Cheng,
Zhu Han
Abstract:
As transceiver elevation angles increase from small to large, existing ultraviolet (UV) non-line-of-sight (NLoS) models face two challenges: i) they cannot estimate the channel characteristics of UV NLoS communication scenarios when an obstacle exists in the overlap volume between the transmitter beam and the receiver field-of-view (FoV), and ii) existing simplified single-scattering path loss models cannot evaluate the channel path loss for wide-beam and wide-FoV scenarios. To address these challenges, a UV NLoS scattering model incorporating an obstacle was investigated, where the obstacle's orientation angle, coordinates, and geometric dimensions were taken into account to better reflect actual application environments. Then, a UV NLoS reflection model was developed based on specific geometric diagrams. Further, a simplified single-scattering path loss model was proposed with a closed-form expression. Finally, the proposed models were validated by comparing them with the Monte-Carlo photon-tracing model, the exact single-scattering model, and the latest simplified single-scattering model. Numerical results show that the path loss curves obtained by the proposed models agree well with those attained by related NLoS models under identical parameter settings, and that avoiding obstacles is not always a good option for UV NLoS communications. Moreover, the accuracy of the proposed simplified model is superior to that of the existing simplified model for all transceiver FoV angles.
Submitted 7 November, 2024;
originally announced November 2024.
-
A Multi-Layer Blockchain Simulator and Performance Evaluation of Social Internet of Vehicles with Multi-Connectivity Management
Authors:
Yi-Ting Sun,
Hsin-Chieh Lee,
Yun-Chen Yu,
Ting-Feng Wu,
Ibrahim Althamary,
Chih-Wei Huang
Abstract:
The evolution of vehicle-to-everything (V2X) communication brings significant challenges, such as data integrity and vulnerabilities stemming from centralized management. This paper presents an innovative integration of decentralized blockchain technology with V2X communication through a multi-layered architecture that combines the Simulation of Urban Mobility (SUMO) traffic simulator and the BlockSim blockchain simulator. In addition, as the Social Internet of Vehicles (SIoV) emerges, efficient resource management becomes indispensable for ensuring seamless communication. We also propose a reference multi-connectivity management method named Enhanced MAX-SINR, designed to advance research in blockchain-specific approaches, that takes retransmission success rates into account. We evaluate blockchain performance in diverse environments such as urban, suburban, and rural areas, demonstrating that enhancing the success rate of retransmitted blockchain-related messages significantly boosts blockchain transaction performance and provides a foundation for developing intelligent SIoV systems.
Submitted 21 November, 2024;
originally announced November 2024.