
Showing 201–250 of 1,723 results for author: Wu, T

  1. arXiv:2501.04390  [pdf, other]

    cs.CV

    iFADIT: Invertible Face Anonymization via Disentangled Identity Transform

    Authors: Lin Yuan, Kai Liang, Xiong Li, Tao Wu, Nannan Wang, Xinbo Gao

    Abstract: Face anonymization aims to conceal the visual identity of a face to safeguard the individual's privacy. Traditional methods like blurring and pixelation can largely remove identifying features, but these techniques significantly degrade image quality and are vulnerable to deep reconstruction attacks. Generative models have emerged as a promising solution for anonymizing faces while preserving a na…

    Submitted 16 January, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

  2. arXiv:2501.04044  [pdf, ps, other]

    math.DG

    The spectral Einstein functional for the Witten Deformation

    Authors: Tong Wu, Yong Wang

    Abstract: In the paper, given two vector fields and the Witten deformation, we compute the spectral Einstein functional for the Witten deformation on even-dimensional spin manifolds without boundary.

    Submitted 5 January, 2025; originally announced January 2025.

    Comments: arXiv admin note: text overlap with arXiv:2412.08028

  3. arXiv:2501.03841  [pdf, other]

    cs.RO

    OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

    Authors: Mingjie Pan, Jiyao Zhang, Tianshu Wu, Yinghao Zhao, Wenlong Gao, Hao Dong

    Abstract: The development of general robotic systems capable of manipulating in unstructured environments is a significant challenge. While Vision-Language Models (VLM) excel in high-level commonsense reasoning, they lack the fine-grained 3D spatial understanding required for precise manipulation tasks. Fine-tuning VLM on robotic datasets to create Vision-Language-Action Models (VLA) is a potential solution,…

    Submitted 7 January, 2025; originally announced January 2025.

  4. arXiv:2501.02960  [pdf, other]

    physics.plasm-ph

    Geometric curvature effect on suppressing the Ion-Temperature-Gradient mode near the magnetic axis

    Authors: Tiannan Wu, Shaojie Wang

    Abstract: Global gyrokinetic simulation of the ion temperature gradient mode shows that the radial electric field ($E_r$) well upshifts the critical temperature gradient near the magnetic axis, in the weak but not in the strong magnetic shear configuration. The geometric curvature effect significantly influences the $E \times B$ shear and the wave number near the axis, so that the $E_r$ well suppresses the…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: 6 pages, 5 figures

  5. arXiv:2501.02471  [pdf, other]

    cs.CL cs.AI

    Hengqin-RA-v1: Advanced Large Language Model for Diagnosis and Treatment of Rheumatoid Arthritis with Dataset based Traditional Chinese Medicine

    Authors: Yishen Liu, Shengda Luo, Zishao Zhong, Tongtong Wu, Jianguo Zhang, Peiyao Ou, Yong Liang, Liang Liu, Hudan Pan

    Abstract: Large language models (LLMs), primarily trained on English texts, often face biases and inaccuracies in Chinese contexts. Their limitations are pronounced in fields like Traditional Chinese Medicine (TCM), where cultural and clinical subtleties are vital, further hindered by a lack of domain-specific data, such as rheumatoid arthritis (RA). To address these issues, this paper introduces Hengqin-RA-…

    Submitted 27 March, 2025; v1 submitted 5 January, 2025; originally announced January 2025.

    Comments: 8 pages, 5 figures, AAAI-2025 Workshop

  6. arXiv:2501.02219  [pdf, other]

    cs.LG cs.AI cs.IT

    Diffusion Model-Based Data Synthesis Aided Federated Semi-Supervised Learning

    Authors: Zhongwei Wang, Tong Wu, Zhiyong Chen, Liang Qian, Yin Xu, Meixia Tao

    Abstract: Federated semi-supervised learning (FSSL) is primarily challenged by two factors: the scarcity of labeled data across clients and the non-independent and identically distributed (non-IID) nature of data among clients. In this paper, we propose a novel approach, diffusion model-based data synthesis aided FSSL (DDSA-FSSL), which utilizes a diffusion model (DM) to generate synthetic data, bridging t…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: accepted by IEEE WCNC 2025

  7. arXiv:2501.01853  [pdf]

    cond-mat.dis-nn physics.app-ph

    A self-learning magnetic Hopfield neural network with intrinsic gradient descent adaption

    Authors: Chang Niu, Huanyu Zhang, Chuanlong Xu, Wenjie Hu, Yunzhuo Wu, Yu Wu, Yadi Wang, Tong Wu, Yi Zhu, Yinyan Zhu, Wenbin Wang, Yizheng Wu, Lifeng Yin, Jiang Xiao, Weichao Yu, Hangwen Guo, Jian Shen

    Abstract: Physical neural networks using physical materials and devices to mimic synapses and neurons offer an energy-efficient way to implement artificial neural networks. Yet, training physical neural networks is difficult and relies heavily on external computing resources. An emerging concept to solve this issue is called physical self-learning that uses intrinsic physical parameters as trainable weight…

    Submitted 6 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

    Comments: 21 pages, 5 figures

    Journal ref: Proc. Natl. Acad. Sci. U.S.A. 121 (51) e2416294121 (2024)

  8. arXiv:2501.01696  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    Guaranteed Nonconvex Low-Rank Tensor Estimation via Scaled Gradient Descent

    Authors: Tong Wu

    Abstract: Tensors, which give a faithful and effective representation to deliver the intrinsic structure of multi-dimensional data, play a crucial role in an increasing number of signal processing and machine learning problems. However, tensor data are often accompanied by arbitrary signal corruptions, including missing entries and sparse noise. A fundamental challenge is to reliably extract the meaningful…

    Submitted 3 January, 2025; originally announced January 2025.

  9. arXiv:2501.01281  [pdf, other]

    eess.SP

    Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems

    Authors: Shunxing Yang, Junteng Yao, Jie Tang, Tuo Wu, Maged Elkashlan, Chau Yuen, Merouane Debbah, Hyundong Shin, Matthew Valenti

    Abstract: Fluid antenna systems (FAS) enable dynamic antenna positioning, offering new opportunities to enhance integrated sensing and communication (ISAC) performance. However, existing studies primarily focus on communication enhancement or single-target sensing, leaving multi-target scenarios underexplored. Additionally, the joint optimization of beamforming and antenna positions poses a highly non-conve…

    Submitted 2 January, 2025; originally announced January 2025.

  10. arXiv:2501.00584  [pdf, other]

    cs.CV cs.LG

    Online Video Understanding: OVBench and VideoChat-Online

    Authors: Zhenpeng Huang, Xinhao Li, Jiaqi Li, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, Limin Wang

    Abstract: Multimodal Large Language Models (MLLMs) have significantly progressed in offline video understanding. However, applying these models to real-world scenarios, such as autonomous driving and human-computer interaction, presents unique challenges due to the need for real-time processing of continuous online video streams. To this end, this paper presents systematic efforts from three perspectives: e…

    Submitted 17 April, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

    Comments: CVPR 2025 Camera Ready Version. Project Page: https://videochat-online.github.io

  11. arXiv:2501.00581  [pdf, other]

    cs.CL cs.AI cs.LG

    Are the Values of LLMs Structurally Aligned with Humans? A Causal Perspective

    Authors: Yipeng Kang, Junqi Wang, Yexin Li, Mengmeng Wang, Wenming Tu, Quansen Wang, Hengli Li, Tingjun Wu, Xue Feng, Fangwei Zhong, Zilong Zheng

    Abstract: As large language models (LLMs) become increasingly integrated into critical applications, aligning their behavior with human values presents significant challenges. Current methods, such as Reinforcement Learning from Human Feedback (RLHF), typically focus on a limited set of coarse-grained values and are resource-intensive. Moreover, the correlations between these values remain implicit, leading…

    Submitted 23 February, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

  12. arXiv:2501.00204  [pdf, other]

    cs.MM cs.SI

    MSM-BD: Multimodal Social Media Bot Detection Using Heterogeneous Information

    Authors: Tingxuan Wu, Zhaorui Ma, Yanjun Cui, Ziyi Zhou, Eric Wang

    Abstract: Although social bots can be engineered for constructive applications, their potential for misuse in manipulative schemes and malware distribution cannot be overlooked. This dichotomy underscores the critical need to detect social bots on social media platforms. Advances in artificial intelligence have improved the abilities of social bots, allowing them to generate content that is almost indisting…

    Submitted 30 December, 2024; originally announced January 2025.

    Comments: Accepted at Springer Nature in Studies in Computational Intelligence

  13. arXiv:2501.00013  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Relation-Aware Equivariant Graph Networks for Epitope-Unknown Antibody Design and Specificity Optimization

    Authors: Lirong Wu, Haitao Lin, Yufei Huang, Zhangyang Gao, Cheng Tan, Yunfan Liu, Tailin Wu, Stan Z. Li

    Abstract: Antibodies are Y-shaped proteins that protect the host by binding to specific antigens, and their binding is mainly determined by the Complementary Determining Regions (CDRs) in the antibody. Despite the great progress made in CDR design, existing computational methods still encounter several challenges: 1) poor capability of modeling complex CDRs with long sequences due to insufficient contextual…

    Submitted 13 December, 2024; originally announced January 2025.

  14. arXiv:2412.20613  [pdf]

    cs.CV

    Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study

    Authors: Yulin Fei, Yuhui Gao, Xingyuan Xian, Xiaojin Zhang, Tao Wu, Wei Chen

    Abstract: With the rise of multimodal large language models, accurately extracting and understanding textual information from video content, referred to as video-based optical character recognition (Video OCR), has become a crucial capability. This paper introduces a novel benchmark designed to evaluate the video OCR performance of multi-modal models in videos. Comprising 1,028 videos and 2,961 question-ans…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: Accepted by CoLing 2025 (The 31st International Conference on Computational Linguistics)

  15. arXiv:2412.19645  [pdf, other]

    cs.CV

    VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

    Authors: Tao Wu, Yong Zhang, Xiaodong Cun, Zhongang Qi, Junfu Pu, Huanzhang Dou, Guangcong Zheng, Ying Shan, Xi Li

    Abstract: Zero-shot customized video generation has gained significant attention due to its substantial application potential. Existing methods rely on additional models to extract and inject reference subject features, assuming that the Video Diffusion Model (VDM) alone is insufficient for zero-shot customized video generation. However, these methods often struggle to maintain consistent subject appearance…

    Submitted 29 December, 2024; v1 submitted 27 December, 2024; originally announced December 2024.

    Comments: Project Page: https://wutao-cs.github.io/VideoMaker/

  16. arXiv:2412.18857  [pdf, other]

    cs.LG cs.AI

    Computing Approximate Graph Edit Distance via Optimal Transport

    Authors: Qihao Cheng, Da Yan, Tianhao Wu, Zhongyi Huang, Qin Zhang

    Abstract: Given a graph pair $(G^1, G^2)$, graph edit distance (GED) is defined as the minimum number of edit operations converting $G^1$ to $G^2$. GED is a fundamental operation widely used in many applications, but its exact computation is NP-hard, so the approximation of GED has gained a lot of attention. Data-driven learning-based methods have been found to provide superior results compared to classical…

    Submitted 25 December, 2024; originally announced December 2024.

    Comments: Accepted by SIGMOD2025. 26 pages, 21 figures

  17. arXiv:2412.18260  [pdf, other]

    cs.CL

    Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study

    Authors: Xuefeng Jiang, Lvhua Wu, Sheng Sun, Jia Li, Jingjing Xue, Yuwei Wang, Tingting Wu, Min Liu

    Abstract: Code vulnerability detection (CVD) is essential for addressing and preventing system security issues, playing a crucial role in ensuring software security. Previous learning-based vulnerability detection methods rely on either fine-tuning medium-size sequence models or training smaller neural networks from scratch. Recent advancements in large pre-trained language models (LLMs) have showcased rema…

    Submitted 5 January, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: Under Review

  18. arXiv:2412.17281  [pdf, other]

    cs.LG

    Non-Convex Tensor Recovery from Local Measurements

    Authors: Tongle Wu, Ying Sun, Jicong Fan

    Abstract: Motivated by the settings where sensing the entire tensor is infeasible, this paper proposes a novel tensor compressed sensing model, where measurements are only obtained from sensing each lateral slice via mutually independent matrices. Leveraging the low tubal rank structure, we reparameterize the unknown tensor ${\boldsymbol {\mathcal X}}^\star$ using two compact tensor factors and formulate th…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: The paper was accepted by AAAI 2025

  19. arXiv:2412.16682  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents

    Authors: Feiran Jia, Tong Wu, Xin Qin, Anna Squicciarini

    Abstract: Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing complex real-world tasks through tool integration. This enhanced ability to interact with external systems and process various data sources, while powerful, introduces significant security vulnerabilities. In particular, indirect prompt injection attacks pose a critical threat, wher…

    Submitted 21 December, 2024; originally announced December 2024.

  20. arXiv:2412.16467  [pdf, other]

    cs.CV

    Sensing Surface Patches in Volume Rendering for Inferring Signed Distance Functions

    Authors: Sijia Jiang, Tong Wu, Jing Hua, Zhizhong Han

    Abstract: It is vital to recover 3D geometry from multi-view RGB images in many 3D computer vision tasks. The latest methods infer the geometry represented as a signed distance field by minimizing the rendering error on the field through volume rendering. However, it is still challenging to explicitly impose constraints on surfaces for inferring more geometry details due to the limited ability of sensing su…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: To appear at AAAI25

  21. arXiv:2412.15843  [pdf, other]

    eess.SP

    Rethinking Hardware Impairments in Multi-User Systems: Can FAS Make a Difference?

    Authors: Junteng Yao, Tuo Wu, Liaoshi Zhou, Ming Jin, Cunhua Pan, Maged Elkashlan, Fumiyuki Adachi, George K. Karagiannidis, Naofal Al-Dhahir, Chau Yuen

    Abstract: In this paper, we analyze the role of fluid antenna systems (FAS) in multi-user systems with hardware impairments (HIs). Specifically, we investigate a scenario where a base station (BS) equipped with multiple fluid antennas communicates with multiple users (CUs), each equipped with a single fluid antenna. Our objective is to maximize the minimum communication rate among all users by jointly optim…

    Submitted 20 December, 2024; originally announced December 2024.

  22. arXiv:2412.13830  [pdf, other]

    cond-mat.supr-con

    Selective excitation of collective modes in multiband superconductor MgB2

    Authors: Jiayu Yuan, Liyu Shi, Tiequan Xu, Yue Wang, Zizhao Gan, Hao Wang, Tianyi Wu, Dong Wu, Tao Dong, Nanlin Wang

    Abstract: Recent developments in nonequilibrium and nonlinear terahertz (THz) spectroscopies have significantly advanced our understanding of collective excitations in superconductors. However, there is still debate surrounding the identification of Higgs or Leggett modes, as well as BCS charge fluctuations, in the well-known two-band superconductor MgB$_2$. Here, we utilized both multi-cycle and single-cyc…

    Submitted 18 December, 2024; originally announced December 2024.

  23. arXiv:2412.12487  [pdf, other]

    cs.LG cs.DC

    Echo: Simulating Distributed Training At Scale

    Authors: Yicheng Feng, Yuetao Chen, Kaiwen Chen, Jingzong Li, Tianyuan Wu, Peng Cheng, Chuan Wu, Wei Wang, Tsung-Yi Ho, Hong Xu

    Abstract: Simulation offers unique values for both enumeration and extrapolation purposes, and is becoming increasingly important for managing the massive machine learning (ML) clusters and large-scale distributed training jobs. In this paper, we build Echo to tackle three key challenges in large-scale training simulation: (1) tracing the runtime training workloads at each device in an ex-situ fashion so we…

    Submitted 16 December, 2024; originally announced December 2024.

  24. arXiv:2412.12476  [pdf]

    cond-mat.mtrl-sci

    Interfacial Perpendicular Magnetic Anisotropy of Ultrathin Fe(001) Film Grown on CoO(001) Surface

    Authors: Tong Wu, Yunzhuo Wu, Haoran Chen, Hongyue Xu, Zhen Cheng, Yuanfei Fan, Nan Jiang, Wentao Qin, Yongwei Cui, Yuqiang Gao, Guanhua Zhang, Zhe Yuan, Yizheng Wu

    Abstract: Exploring novel systems with perpendicular magnetic anisotropy (PMA) is vital for advancing memory devices. In this study, we report an intriguing PMA system involving an ultrathin Fe layer on an antiferromagnetic (AFM) CoO(001) surface. The measured perpendicular anisotropy field is inversely proportional to the Fe thickness, indicating an interfacial origin of PMA. Temperature-dependent measurem…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: 24 pages, 6 figures

  25. arXiv:2412.12083  [pdf, other]

    cs.CV

    IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

    Authors: Zhibing Li, Tong Wu, Jing Tan, Mengchen Zhang, Jiaqi Wang, Dahua Lin

    Abstract: Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs, while still struggling with inherent ambiguities between lighting and material. On the other h…

    Submitted 1 April, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: ICLR 2025. Project Page: https://lizb6626.github.io/IDArb/

  26. arXiv:2412.12032  [pdf, other]

    cs.CV cs.AI

    FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

    Authors: Gaojian Wang, Feng Lin, Tong Wu, Zhenguang Liu, Zhongjie Ba, Kui Ren

    Abstract: This work asks: with abundant, unlabeled real faces, how to learn a robust and transferable facial representation that boosts various face security tasks with respect to generalization performance? We make the first attempt and propose a self-supervised pretraining framework to learn fundamental representations of real face images, FSFM, that leverages the synergy between masked image modeling (MI…

    Submitted 6 April, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: 21 pages, 11 figures, project page: https://fsfm-3c.github.io

  27. arXiv:2412.11099  [pdf]

    physics.optics

    Quasinormal mode as a foundational framework for all electromagnetic Fano resonances

    Authors: Mikhail Bochkarev, Nikolay Solodovchenko, Kirill Samusev, Mikhail Limonov, Tong Wu, Philippe Lalanne

    Abstract: Fano profiles are observed across various fields of wave physics. They emerge from interference phenomena and are quantified by the asymmetry parameter q. In optics, q is usually considered as a phenomenological coefficient obtained by fitting experimental or numerical data. In this work, we introduce an ab initio Maxwellian approach using quasinormal modes to analytically describe line shapes in…

    Submitted 15 December, 2024; originally announced December 2024.

  28. arXiv:2412.11007  [pdf, other]

    cs.DC cs.LG

    FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

    Authors: Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu

    Abstract: Sparse Matrix-matrix Multiplication (SpMM) and Sampled Dense-dense Matrix Multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior computing power, which is promising to boost the performance of matrix operators to a higher level. However, due to the irregularity of unstructured sparse data,…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: Accepted by 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP'25)

    ACM Class: C.1.4; I.2.11

  29. arXiv:2412.10726  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries

    Authors: Tao Wu, Chuhao Zhou, Yen Heng Wong, Lin Gu, Jianfei Yang

    Abstract: The rapid advancement of Vision-Language Models (VLMs) has significantly advanced the development of Embodied Question Answering (EQA), enhancing agents' abilities in language understanding and reasoning within complex and realistic scenarios. However, EQA in real-world scenarios remains challenging, as human-posed questions often contain noise that can interfere with an agent's exploration and re…

    Submitted 14 December, 2024; originally announced December 2024.

  30. arXiv:2412.07674  [pdf, other]

    cs.CV

    FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

    Authors: Tong Wu, Yinghao Xu, Ryan Po, Mengchen Zhang, Guandao Yang, Jiaqi Wang, Ziwei Liu, Dahua Lin, Gordon Wetzstein

    Abstract: Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution involves adopting favorable attributes from the source images. Current methods attempt to distill identity and style from source i…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: NeurIPS 2024 (Datasets and Benchmarks Track); Project page: https://fiva-dataset.github.io/

  31. arXiv:2412.05784  [pdf, ps, other]

    cs.AR cs.OS cs.PF cs.PL

    ASC-Hook: fast and transparent system call hook for Arm

    Authors: Yang Shen, Min Xie, Wenzhe Zhang, Tao Wu

    Abstract: Intercepting system calls is crucial for tools that aim to modify or monitor application behavior. However, existing system call interception tools on the ARM platform still suffer from limitations in terms of performance and completeness. This paper presents an efficient and comprehensive binary rewriting framework, ASC-Hook, specifically designed for intercepting system calls on the ARM platform…

    Submitted 20 June, 2025; v1 submitted 7 December, 2024; originally announced December 2024.

    Comments: Accepted to LCTES 2025 (26th ACM SIGPLAN/SIGBED Int. Conf. on Languages, Compilers & Tools for Embedded Systems); 11 pages (incl. appendix), 6 figures. DOI: 10.1145/3735452.3735524

  32. arXiv:2412.05274  [pdf, other]

    cs.CV

    SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images

    Authors: Jiahua Dong, Tong Wu, Rui Qian, Jiaqi Wang

    Abstract: The 3D contrastive learning paradigm has demonstrated remarkable performance in downstream tasks through pretraining on point cloud data. Recent advances involve additional 2D image priors associated with 3D point clouds for further improvement. Nonetheless, these existing frameworks are constrained by the restricted range of available point cloud datasets, primarily due to the high costs of obtai…

    Submitted 6 December, 2024; originally announced December 2024.

  33. arXiv:2412.04833  [pdf, ps, other]

    cs.LG

    Wavelet Diffusion Neural Operator

    Authors: Peiyan Hu, Rui Wang, Xiang Zheng, Tao Zhang, Haodong Feng, Ruiqi Feng, Long Wei, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: Simulating and controlling physical systems described by partial differential equations (PDEs) are crucial tasks across science and engineering. Recently, diffusion generative models have emerged as a competitive class of methods for these tasks due to their ability to capture long-term dependencies and model high-dimensional states. However, diffusion models typically struggle with handling syste…

    Submitted 26 June, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  34. arXiv:2412.04449  [pdf, other]

    cs.CV cs.CL

    p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

    Authors: Jun Zhang, Desen Meng, Ji Qi, Zhenpeng Huang, Tao Wu, Limin Wang

    Abstract: Despite the remarkable performance of multimodal large language models (MLLMs) across diverse tasks, the substantial training and inference costs impede their advancement. The majority of computation stems from the overwhelming volume of vision tokens processed by the transformer decoder. In this paper, we propose to build efficient MLLMs by leveraging the Mixture-of-Depths (MoD) mechanism, where…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: Technical Report; Code released at https://github.com/MCG-NJU/p-MoD

  35. arXiv:2412.04134  [pdf, other]

    cs.LG

    M2PDE: Compositional Generative Multiphysics and Multi-component PDE Simulation

    Authors: Tao Zhang, Zhenhai Liu, Feipeng Qi, Yongjun Jiao, Tailin Wu

    Abstract: Multiphysics simulation, which models the interactions between multiple physical processes, and multi-component simulation of complex structures are critical in fields like nuclear and aerospace engineering. Previous studies use numerical solvers or ML-based surrogate models for these simulations. However, multiphysics simulations typically require integrating multiple specialized solvers-each for…

    Submitted 11 May, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: 29 pages, 14 figures

  36. arXiv:2412.04127  [pdf, other]

    quant-ph physics.atom-ph physics.optics

    Frequency-tunable biphoton generation via spontaneous four-wave mixing

    Authors: Jiun-Shiuan Shiu, Chang-Wei Lin, Yu-Chiao Huang, Meng-Jung Lin, I-Chia Huang, Ting-Ho Wu, Pei-Chen Kuan, Yong-Fan Chen

    Abstract: We present experimental results on tuning biphoton frequency by introducing a detuned coupling field in spontaneous four-wave mixing (SFWM), and examine its impact on the pairing ratio. This tunability is achieved by manipulating the inherent electromagnetically induced transparency (EIT) effect in the double-$Λ$ scheme. Introducing a detuned coupling field degrades the efficiency of EIT-based sti…

    Submitted 5 December, 2024; originally announced December 2024.

  37. arXiv:2412.03839  [pdf, other]

    eess.SP

    Fluid Antenna Systems Enabling 6G: Principles, Applications, and Research Directions

    Authors: Tuo Wu, Kangda Zhi, Junteng Yao, Xiazhi Lai, Jianchao Zheng, Hong Niu, Maged Elkashlan, Kai-Kit Wong, Chan-Byoung Chae, Zhiguo Ding, George K. Karagiannidis, Merouane Debbah, Chau Yuen

    Abstract: Fluid antenna system (FAS), as a new version of reconfigurable antenna technologies promoting shape and position flexibility, has emerged as an exciting and possibly transformative technology for wireless communications systems. FAS represents any software-controlled fluidic, conductive or dielectric structure that can dynamically alter the antenna's shape and position to change the gain, the radiation…

    Submitted 4 December, 2024; originally announced December 2024.

  38. arXiv:2412.03552  [pdf, other]

    cs.CV

    Imagine360: Immersive 360 Video Generation from Perspective Anchor

    Authors: Jing Tan, Shuai Yang, Tong Wu, Jingwen He, Yuwei Guo, Ziwei Liu, Dahua Lin

    Abstract: $360^\circ$ videos offer a hyper-immersive experience that allows the viewers to explore a dynamic scene from full 360 degrees. To achieve more user-friendly and personalized content creation in $360^\circ$ video format, we seek to lift standard perspective videos into $360^\circ$ equirectangular videos. To this end, we introduce Imagine360, the first perspective-to-$360^\circ…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: Project page: https://ys-imtech.github.io/projects/Imagine360

  39. arXiv:2412.02282  [pdf, other]

    cs.NI cs.IT eess.SP

    Exploring Evolutionary Spectral Clustering for Temporal-Smoothed Clustered Cell-Free Networking

    Authors: Junyuan Wang, Tianyao Wu, Ouyang Zhou, Yaping Zhu

    Abstract: Clustered cell-free networking, which dynamically partitions the whole network into nonoverlapping subnetworks, has been recently proposed to mitigate the cell-edge problem in cellular networks. However, prior works only focused on optimizing clustered cell-free networking in static scenarios with fixed users. This could lead to a large number of handovers in the practical dynamic environment with…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: 5 pages, 3 figures

  40. arXiv:2412.01824  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

    Authors: Zeyi Sun, Ziyang Chu, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

    Abstract: In-context generation is a key component of large language models' (LLMs) open-task generalization capability. By leveraging a few examples as context, LLMs can perform both in-domain and out-of-domain tasks. Recent advancements in auto-regressive vision-language models (VLMs) built upon LLMs have showcased impressive performance in text-to-image generation. However, the potential of in-context le…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: code: https://github.com/SunzeY/X-Prompt

  41. arXiv:2412.01402  [pdf, ps, other]

    cs.CV

    ULSR-GS: Ultra Large-scale Surface Reconstruction Gaussian Splatting with Multi-View Geometric Consistency

    Authors: Zhuoxiao Li, Shanliang Yao, Taoyu Wu, Yong Yue, Wufan Zhao, Rongjun Qin, Angel F. Garcia-Fernandez, Andrew Levers, Xiaohui Zhu

    Abstract: While Gaussian Splatting (GS) demonstrates efficient and high-quality scene rendering and small area surface extraction ability, it falls short in handling large-scale aerial image surface extraction tasks. To overcome this, we present ULSR-GS, a framework dedicated to high-fidelity surface extraction in ultra-large-scale scenes, addressing the limitations of existing GS-based mesh extraction meth…

    Submitted 25 June, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: Project page: https://ulsrgs.github.io

  42. arXiv:2412.01031  [pdf, other]

    cs.CL cs.AI cs.CV

    Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings

    Authors: Razi Mahmood, Pingkun Yan, Diego Machado Reyes, Ge Wang, Mannudeep K. Kalra, Parisa Kaviani, Joy T. Wu, Tanveer Syeda-Mahmood

    Abstract: Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, an…

    Submitted 22 May, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

  43. arXiv:2411.18792  [pdf, other]

    q-bio.PE

    Multistage spatial model for informing release of Wolbachia-infected mosquitoes as disease control

    Authors: Zhuolin Qu, Tong Wu

    Abstract: Wolbachia is a naturally occurring bacterium that can infect Aedes mosquitoes and reduce the transmission of mosquito-borne diseases, including dengue fever, Zika, and chikungunya. Field trials have been conducted worldwide to suppress local epidemics. We introduce a novel partial differential equation model to simulate the spread of Wolbachia infection in mosquito populations. Our model incorpora…

    Submitted 27 November, 2024; originally announced November 2024.

  44. Abnormally enhanced Hall Lorenz number in the magnetic Weyl semimetal NdAlSi

    Authors: Nan Zhang, Daifeng Tu, Ding Li, Kaixin Tang, Linpeng Nie, Houpu Li, Hongyu Li, Tao Qi, Tao Wu, Jianhui Zhou, Ziji Xiang, Xianhui Chen

    Abstract: In Landau's celebrated Fermi liquid theory, electrons in a metal obey the Wiedemann--Franz law at the lowest temperatures. This law states that electron heat and charge transport are linked by a constant $L_0$, i.e., the Sommerfeld value of the Lorenz number ($L$). Such a relation can be violated at elevated temperatures where the abundant inelastic scattering leads to a reduction of the Lorenz numb…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 23 pages, 5 figures

  45. arXiv:2411.17058  [pdf, other]

    cs.CR cs.AI

    ThreatModeling-LLM: Automating Threat Modeling using Large Language Models for Banking System

    Authors: Tingmin Wu, Shuiqiao Yang, Shigang Liu, David Nguyen, Seung Jang, Alsharif Abuadbba

    Abstract: Threat modeling is a crucial component of cybersecurity, particularly for industries such as banking, where the security of financial data is paramount. Traditional threat modeling approaches require expert intervention and manual effort, often leading to inefficiencies and human error. The advent of Large Language Models (LLMs) offers a promising avenue for automating these processes, enhancing b…

    Submitted 14 May, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

  46. Channel Modeling for Ultraviolet Non-Line-of-Sight Communications Incorporating an Obstacle

    Authors: Tianfeng Wu, Fang Yang, Tian Cao, Ling Cheng, Yupeng Chen, Jian Song, Julian Cheng, Zhu Han

    Abstract: Existing studies on ultraviolet (UV) non-line-of-sight (NLoS) channel modeling primarily focus on scenarios without any obstacle, which makes them unsuitable for small transceiver elevation angles in most cases. To address this issue, a UV NLoS channel model incorporating an obstacle was investigated in this paper, where the impacts of atmospheric scattering and obstacle reflection on UV signals w…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: Accepted by IEEE Global Communications Conference (GLOBECOM) 2024. arXiv admin note: substantial text overlap with arXiv:2411.15154

  47. arXiv:2411.16594  [pdf, other]

    cs.AI cs.CL

    From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

    Authors: Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu

    Abstract: Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). However, traditional methods, whether matching-based or embedding-based, often fall short of judging subtle attributes and delivering satisfactory results. Recent advancements in Large Language Models (LLMs) inspire the "LLM-as-a-judge" paradigm, where LLMs are levera…

    Submitted 5 February, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: v6: add new citations; 36 pages, 5 figures

  48. arXiv:2411.15701  [pdf]

    physics.optics physics.app-ph

    Acousto-optic modulation based on an AlScN microring resonator for microwave-to-optical conversion

    Authors: Kewei Bian, Yushuai Liu, Weilin Rong, Yuan Dong, Qize Zhong, Yang Qiu, Xingyan Zhao, Tao Wu, Shaonan Zheng, Ting Hu

    Abstract: Acousto-optic (AO) modulation is critical for microwave and optical signal processing, computing and networking. Challenges remain in integrating AO devices on-chip using a fabrication process compatible with complementary metal-oxide-semiconductor (CMOS) technology. This work presents the demonstration of an AO modulator exploiting a microring resonator (MRR) based on thin-film aluminum scandium nit…

    Submitted 23 November, 2024; originally announced November 2024.

    Comments: 11 pages, 5 figures

  49. arXiv:2411.15154  [pdf, other]

    eess.SP

    Modeling of UV NLoS Communication Channels: From Atmospheric Scattering and Obstacle Reflection Perspectives

    Authors: Tianfeng Wu, Fang Yang, Tian Cao, Ling Cheng, Yupeng Chen, Jian Song, Julian Cheng, Zhu Han

    Abstract: As transceiver elevation angles increase from small to large, existing ultraviolet (UV) non-line-of-sight (NLoS) models encounter two challenges: i) they cannot estimate the channel characteristics of UV NLoS communication scenarios when there exists an obstacle in the overlap volume between the transmitter beam and the receiver field-of-view (FoV), and ii) they cannot evaluate the channel path loss for the…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: Accepted by IEEE Journal on Selected Areas in Communications

  50. arXiv:2411.14000  [pdf, other]

    cs.NI

    A Multi-Layer Blockchain Simulator and Performance Evaluation of Social Internet of Vehicles with Multi-Connectivity Management

    Authors: Yi-Ting Sun, Hsin-Chieh Lee, Yun-Chen Yu, Ting-Feng Wu, Ibrahim Althamary, Chih-Wei Huang

    Abstract: The evolution of vehicle-to-everything (V2X) communication brings significant challenges, such as data integrity and vulnerabilities stemming from centralized management. This paper presents an innovative integration of decentralized blockchain technology with V2X communication through a multi-layered architecture that combines the Simulation of Urban Mobility (SUMO) traffic simulator and the Bloc…

    Submitted 21 November, 2024; originally announced November 2024.