Showing 1–50 of 165 results for author: Qin, S

Searching in archive cs.

  1. arXiv:2506.13323  [pdf, ps, other]

    cs.CR cs.AI cs.LG cs.SE

    Tady: A Neural Disassembler without Structural Constraint Violations

    Authors: Siliang Qin, Fengrui Yang, Hao Wang, Bolun Zhang, Zeyu Gao, Chao Zhang, Kai Chen

    Abstract: Disassembly is a crucial yet challenging step in binary analysis. While emerging neural disassemblers show promise for efficiency and accuracy, they frequently generate outputs violating fundamental structural constraints, which significantly compromise their practical usability. To address this critical problem, we regularize the disassembly solution space by formalizing and applying key structur…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Usenix Security'25

  2. arXiv:2506.07148  [pdf, other]

    cs.CL

    Semantic-preserved Augmentation with Confidence-weighted Fine-tuning for Aspect Category Sentiment Analysis

    Authors: Yaping Chai, Haoran Xie, Joe S. Qin

    Abstract: Large language model (LLM) is an effective approach to addressing data scarcity in low-resource scenarios. Recent existing research designs hand-crafted prompts to guide LLM for data augmentation. We introduce a data augmentation strategy for the aspect category sentiment analysis (ACSA) task that preserves the original sentence semantics and has linguistic diversity, specifically by providing a s…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 10 pages, 7 figures, 4 tables

  3. arXiv:2506.03143  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

    Authors: Qianhui Wu, Kanzhi Cheng, Rui Yang, Chaoyun Zhang, Jianwei Yang, Huiqiang Jiang, Jian Mu, Baolin Peng, Bo Qiao, Reuben Tan, Si Qin, Lars Liden, Qingwei Lin, Huan Zhang, Tong Zhang, Jianbing Zhang, Dongmei Zhang, Jianfeng Gao

    Abstract: One of the principal challenges in building VLM-powered GUI agents is visual grounding, i.e., localizing the appropriate screen region for action execution based on both the visual content and the textual plans. Most existing work formulates this as a text-based coordinate generation task. However, these approaches suffer from several limitations: weak spatial-semantic alignment, inability to hand…

    Submitted 3 June, 2025; originally announced June 2025.

  4. arXiv:2506.01698  [pdf, ps, other]

    cs.CL cs.AI

    When LLMs Team Up: The Emergence of Collaborative Affective Computing

    Authors: Wenna Lai, Haoran Xie, Guandong Xu, Qing Li, S. Joe Qin

    Abstract: Affective Computing (AC) is essential in bridging the gap between human emotional experiences and machine understanding. Traditionally, AC tasks in natural language processing (NLP) have been approached through pipeline architectures, which often suffer from structure rigidity that leads to inefficiencies and limited adaptability. The advent of Large Language Models (LLMs) has revolutionized this…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 20 pages, 7 figures, and 3 tables

  5. arXiv:2505.24780  [pdf, ps, other]

    cs.LG quant-ph

    QGAN-based data augmentation for hybrid quantum-classical neural networks

    Authors: Run-Ze He, Jun-Jian Su, Su-Juan Qin, Zheng-Ping Jin, Fei Gao

    Abstract: Quantum neural networks converge faster and achieve higher accuracy than classical models. However, data augmentation in quantum machine learning remains underexplored. To tackle data scarcity, we integrate quantum generative adversarial networks (QGANs) with hybrid quantum-classical neural networks (HQCNNs) to develop an augmentation framework. We propose two strategies: a general approach to enh…

    Submitted 30 May, 2025; originally announced May 2025.

  6. arXiv:2505.23404  [pdf, ps, other]

    cs.CL

    Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models

    Authors: Mingyu Yu, Wei Wang, Yanjie Wei, Sujuan Qin

    Abstract: Adversarial attacks on Large Language Models (LLMs) via jailbreaking techniques-methods that circumvent their built-in safety and ethical constraints-have emerged as a critical challenge in AI security. These attacks compromise the reliability of LLMs by exploiting inherent weaknesses in their comprehension capabilities. This paper investigates the efficacy of jailbreaking strategies that are spec…

    Submitted 5 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2505.22338  [pdf, ps, other]

    cs.CL cs.AI

    Text2Grad: Reinforcement Learning from Natural Language Feedback

    Authors: Hanyang Wang, Lu Wang, Chaoyun Zhang, Tianjun Mao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Traditional RLHF optimizes language models with coarse, scalar rewards that mask the fine-grained reasons behind success or failure, leading to slow and opaque learning. Recent work augments RL with textual critiques through prompting or reflection, improving interpretability but leaving model parameters untouched. We introduce Text2Grad, a reinforcement-learning paradigm that turns free-form text…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: The code for our method is available at https://github.com/microsoft/Text2Grad

  8. arXiv:2505.21494  [pdf, ps, other]

    cs.CV

    Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

    Authors: Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, Yang Liu

    Abstract: Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features-such as CLIP's [CLS] token-between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly f…

    Submitted 27 May, 2025; originally announced May 2025.

  9. arXiv:2505.19139  [pdf, ps, other]

    cs.CV

    The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework

    Authors: Feiran Liu, Yuzhe Zhang, Xinyi Huang, Yinan Peng, Xinfeng Li, Lixu Wang, Yutong Shen, Ranjie Duan, Simeng Qin, Xiaojun Jia, Qingsong Wen, Wei Dong

    Abstract: Our research reveals a new privacy risk associated with the vision-language model (VLM) agentic framework: the ability to infer sensitive attributes (e.g., age and health information) and even abstract ones (e.g., personality and social traits) from a set of personal images, which we term "image private attribute profiling." This threat is particularly severe given that modern apps can easily acce…

    Submitted 25 May, 2025; originally announced May 2025.

  10. PARF: An Adaptive Abstraction-Strategy Tuner for Static Analysis

    Authors: Zhongyi Wang, Mingshuai Chen, Tengjie Lin, Linyu Yang, Junhao Zhuo, Qiuye Wang, Shengchao Qin, Xiao Yi, Jianwei Yin

    Abstract: We launch Parf - a toolkit for adaptively tuning abstraction strategies of static program analyzers in a fully automated manner. Parf models various types of external parameters (encoding abstraction strategies) as random variables subject to probability distributions over latticed parameter spaces. It incrementally refines the probability distributions based on accumulated intermediate results ge…

    Submitted 9 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Per journal policy, the peer-reviewed version will be published in J. Comput. Sci. Technol. and supersede this preprint

  11. arXiv:2505.11340  [pdf, ps, other]

    cs.SE cs.AI

    DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios

    Authors: Zeyu Gao, Yuxin Cui, Hao Wang, Siliang Qin, Yuanda Wang, Bolun Zhang, Chao Zhang

    Abstract: Decompilers are fundamental tools for critical security tasks, from vulnerability discovery to malware analysis, yet their evaluation remains fragmented. Existing approaches primarily focus on syntactic correctness through synthetic micro-benchmarks or subjective human ratings, failing to address real-world requirements for semantic fidelity and analyst usability. We present DecompileBench, the fi…

    Submitted 16 May, 2025; originally announced May 2025.

  12. arXiv:2505.08809  [pdf, ps, other]

    cs.CR cs.AI

    MixBridge: Heterogeneous Image-to-Image Backdoor Attack through Mixture of Schrödinger Bridges

    Authors: Shixi Qin, Zhiyong Yang, Shilong Bao, Shi Wang, Qianqian Xu, Qingming Huang

    Abstract: This paper focuses on implanting multiple heterogeneous backdoor triggers in bridge-based diffusion models designed for complex and arbitrary input distributions. Existing backdoor formulations mainly address single-attack scenarios and are limited to Gaussian noise input models. To fill this gap, we propose MixBridge, a novel diffusion Schrödinger bridge (DSB) framework to cater to arbitrary inpu…

    Submitted 26 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  13. arXiv:2504.14823  [pdf, other]

    cs.GT

    Optimal Repurchasing Contract Design for Efficient Utilization of Computing Resources

    Authors: Zhengyan Deng, Yusen Zheng, Chenliang Sheng, Shaowen Qin

    Abstract: The rapid advancement of AI and other emerging technologies has triggered exponential growth in computing resources demand. Faced with prohibitive infrastructure costs for large-scale computing clusters, users are increasingly resorting to leased computing resources from third-party providers. However, prevalent overestimation of operational requirements frequently leads to substantial underutiliz…

    Submitted 20 April, 2025; originally announced April 2025.

  14. arXiv:2504.14603  [pdf, other]

    cs.AI cs.HC cs.OS

    UFO2: The Desktop AgentOS

    Authors: Chaoyun Zhang, He Huang, Chiming Ni, Jian Mu, Si Qin, Shilin He, Lu Wang, Fangkai Yang, Pu Zhao, Chao Du, Liqun Li, Yu Kang, Zhao Jiang, Suzhen Zheng, Rujia Wang, Jiaxu Qian, Minghua Ma, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language. However, most existing CUAs remain conceptual prototypes, hindered by shallow OS integration, fragile screenshot-based interaction, and disruptive execution. We present UFO2, a multiagent AgentOS for Windows deskto…

    Submitted 25 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: The source code of UFO2 is publicly available at https://github.com/microsoft/UFO/, with comprehensive documentation provided at https://microsoft.github.io/UFO/

  15. arXiv:2504.12971  [pdf, other]

    cs.LG cs.AI

    Transferrable Surrogates in Expressive Neural Architecture Search Spaces

    Authors: Shiwen Qin, Gabriela Kadlecová, Martin Pilát, Shay B. Cohen, Roman Neruda, Elliot J. Crowley, Jovita Lukasik, Linus Ericsson

    Abstract: Neural architecture search (NAS) faces a challenge in balancing the exploration of expressive, broad search spaces that enable architectural innovation with the need for efficient evaluation of architectures to effectively search such spaces. We investigate surrogate model training for improving search in highly expressive NAS search spaces based on context-free grammars. We show that i) surrogate…

    Submitted 18 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Project page at: https://shiwenqin.github.io/TransferrableSurrogate/

  16. arXiv:2503.22228   

    cs.SE cs.AI cs.LG

    MFH: A Multi-faceted Heuristic Algorithm Selection Approach for Software Verification

    Authors: Jie Su, Liansai Deng, Cheng Wen, Rong Wang, Zhi Ma, Nan Zhang, Cong Tian, Zhenhua Duan, Shengchao Qin

    Abstract: Currently, many verification algorithms are available to improve the reliability of software systems. Selecting the appropriate verification algorithm typically demands domain expertise and non-trivial manpower. An automated algorithm selector is thus desired. However, existing selectors either depend on machine-learned strategies or manually designed heuristics, encounter issues such as reliance…

    Submitted 23 May, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

    Comments: The decision to withdraw the paper is driven by two reasons: 1. A conflict of interest arises from the proposed methods overlapping with pending patent applications by other authors. 2. Upon thorough review, it has been discovered that the paper contains ambiguities and inaccuracies in describing the method, potentially hindering readers' comprehension of the content

    ACM Class: I.2.6; I.2.11; D.2.4

  17. arXiv:2503.19708  [pdf, other]

    physics.flu-dyn cs.LG

    Data-efficient rapid prediction of urban airflow and temperature fields for complex building geometries

    Authors: Shaoxiang Qin, Dongxue Zhan, Ahmed Marey, Dingyang Geng, Theodore Potsis, Liangzhu Leon Wang

    Abstract: Accurately predicting urban microclimate, including wind speed and temperature, based solely on building geometry requires capturing complex interactions between buildings and airflow, particularly long-range wake effects influenced by directional geometry. Traditional methods relying on computational fluid dynamics (CFD) are prohibitively expensive for large-scale simulations, while data-driven a…

    Submitted 25 March, 2025; originally announced March 2025.

  18. arXiv:2503.18991  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment

    Authors: Ruoxi Cheng, Haoxuan Ma, Weixin Wang, Zhiqiang Wang, Xiaoshuang Jia, Simeng Qin, Xiaochun Cao, Yang Liu, Xiaojun Jia

    Abstract: Robust alignment is vital for safely deploying large language models (LLMs). Existing techniques are either reward-based -- training a reward model on preference pairs and optimizing with reinforcement learning (RL) -- or reward-free -- directly fine-tuning on ranked outputs. Recent research shows that well-tuned reward-based pipelines remain the most robust, and single-response demonstrations can…

    Submitted 29 May, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: The first three authors contributed equally to this work

  19. arXiv:2503.18429  [pdf, other]

    cs.CV

    Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation

    Authors: Dingcheng Zhen, Shunshun Yin, Shiyang Qin, Hou Yi, Ziwei Zhang, Siyuan Liu, Gan Qi, Ming Tao

    Abstract: In this work, we introduce the first autoregressive framework for real-time, audio-driven portrait animation, a.k.a. talking head. Beyond the challenge of lengthy animation times, a critical challenge in realistic talking head generation lies in preserving the natural movement of diverse body parts. To this end, we propose Teller, the first streaming audio-driven portrait animation framework with…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025

  20. arXiv:2503.17626  [pdf, other]

    cs.RO cs.AI

    Transferable Latent-to-Latent Locomotion Policy for Efficient and Versatile Motion Control of Diverse Legged Robots

    Authors: Ziang Zheng, Guojian Zhan, Bin Shuai, Shengtao Qin, Jiangtao Li, Tao Zhang, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has demonstrated remarkable capability in acquiring robot skills, but learning each new skill still requires substantial data collection for training. The pretrain-and-finetune paradigm offers a promising approach for efficiently adapting to new robot entities and tasks. Inspired by the idea that acquired knowledge can accelerate learning new tasks with the same robot a…

    Submitted 21 March, 2025; originally announced March 2025.

  21. arXiv:2503.13436  [pdf, other]

    cs.CV cs.LG

    Unified Autoregressive Visual Generation and Understanding with Continuous Tokens

    Authors: Lijie Fan, Luming Tang, Siyang Qin, Tianhong Li, Xuan Yang, Siyuan Qiao, Andreas Steiner, Chen Sun, Yuanzhen Li, Tao Zhu, Michael Rubinstein, Michalis Raptis, Deqing Sun, Radu Soricut

    Abstract: We present UniFluid, a unified autoregressive framework for joint visual generation and understanding leveraging continuous visual tokens. Our unified autoregressive architecture processes multimodal image and text inputs, generating discrete tokens for text and continuous tokens for image. We find though there is an inherent trade-off between the image generation and understanding task, a careful…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Tech report

  22. arXiv:2503.13421  [pdf, other]

    cs.DC eess.SP

    Optimal Expert Selection for Distributed Mixture-of-Experts at the Wireless Edge

    Authors: Shengling Qin, Hai Wu, Hongyang Du, Kaibin Huang

    Abstract: The emergence of distributed Mixture-of-Experts (DMoE) systems, which deploy expert models at edge nodes, offers a pathway to achieving connected intelligence in sixth-generation (6G) mobile networks and edge artificial intelligence (AI). However, current DMoE systems lack an effective expert selection algorithm to address the simultaneous task-expert relevance and channel diversity inherent in th…

    Submitted 17 March, 2025; originally announced March 2025.

  23. arXiv:2503.12874  [pdf, other]

    cs.CV

    Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models

    Authors: Xiaojun Jia, Sensen Gao, Simeng Qin, Ke Ma, Xinfeng Li, Yihao Huang, Wei Dong, Yang Liu, Xiaochun Cao

    Abstract: Large pre-trained vision-language models (VLMs), such as CLIP, demonstrate impressive generalization but remain highly vulnerable to adversarial examples (AEs). Previous work has explored robust text prompts through adversarial training, achieving some improvement in both robustness and generalization. However, they primarily rely on single-gradient direction perturbations (e.g., PGD) to generate A…

    Submitted 17 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  24. arXiv:2503.11069  [pdf, ps, other]

    cs.AI cs.HC

    API Agents vs. GUI Agents: Divergence and Convergence

    Authors: Chaoyun Zhang, Shilin He, Liqun Li, Si Qin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

    Abstract: Large language models (LLMs) have evolved beyond simple text generation to power software agents that directly translate natural language commands into tangible actions. While API-based LLM agents initially rose to prominence for their robust automation capabilities and seamless integration with programmatic endpoints, recent progress in multimodal LLM research has enabled GUI-based LLM agents tha…

    Submitted 23 June, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

  25. arXiv:2503.06588  [pdf, other]

    cs.SD cs.CV

    Speech Audio Generation from dynamic MRI via a Knowledge Enhanced Conditional Variational Autoencoder

    Authors: Yaxuan Li, Han Jiang, Yifei Ma, Shihua Qin, Fangxu Xing

    Abstract: Dynamic Magnetic Resonance Imaging (MRI) of the vocal tract has become an increasingly adopted imaging modality for speech motor studies. Beyond image signals, systematic data loss, noise pollution, and audio file corruption can occur due to the unpredictability of the MRI acquisition environment. In such cases, generating audio from images is critical for data recovery in both clinical and resear…

    Submitted 9 March, 2025; originally announced March 2025.

  26. arXiv:2502.15261  [pdf, other]

    cs.CL cs.AI

    Corrections Meet Explanations: A Unified Framework for Explainable Grammatical Error Correction

    Authors: Jingheng Ye, Shang Qin, Yinghui Li, Hai-Tao Zheng, Shen Wang, Qingsong Wen

    Abstract: Grammatical Error Correction (GEC) faces a critical challenge concerning explainability, notably when GEC systems are designed for language learners. Existing research predominantly focuses on explaining grammatical errors extracted in advance, thus neglecting the relationship between explanations and corrections. To address this gap, we introduce EXGEC, a unified explainable GEC framework that in…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: 19 pages, 2 figures, and 9 tables

  27. arXiv:2502.11651  [pdf, other]

    cs.CV cs.AI

    MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression

    Authors: Linjie Mu, Zhongzhen Huang, Shengqian Qin, Yakun Zhu, Shaoting Zhang, Xiaofan Zhang

    Abstract: Large vision-language models (LVLMs) have shown great promise in medical applications, particularly in visual question answering (MedVQA) and diagnosis from medical images. However, existing datasets and models often fail to consider critical aspects of medical diagnostics, such as the integration of historical records and the analysis of disease progression over time. In this paper, we introduce…

    Submitted 23 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  28. arXiv:2502.06909  [pdf, other]

    cs.LG cs.AI cs.GT

    Meta-Computing Enhanced Federated Learning in IIoT: Satisfaction-Aware Incentive Scheme via DRL-Based Stackelberg Game

    Authors: Xiaohuan Li, Shaowen Qin, Xin Tang, Jiawen Kang, Jin Ye, Zhonghua Zhao, Yusi Zheng, Dusit Niyato

    Abstract: The Industrial Internet of Things (IIoT) leverages Federated Learning (FL) for distributed model training while preserving data privacy, and meta-computing enhances FL by optimizing and integrating distributed computing resources, improving efficiency and scalability. Efficient IIoT operations require a trade-off between model quality and training latency. Consequently, a primary challenge of FL i…

    Submitted 20 April, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

  29. arXiv:2502.05798  [pdf, other]

    cs.AR

    StreamDCIM: A Tile-based Streaming Digital CIM Accelerator with Mixed-stationary Cross-forwarding Dataflow for Multimodal Transformer

    Authors: Shantian Qin, Ziqing Qiang, Zhihua Fan, Wenming Li, Xuejun An, Xiaochun Ye, Dongrui Fan

    Abstract: Multimodal Transformers are emerging artificial intelligence (AI) models designed to process a mixture of signals from diverse modalities. Digital computing-in-memory (CIM) architectures are considered promising for achieving high efficiency while maintaining high accuracy. However, current digital CIM-based accelerators exhibit inflexibility in microarchitecture, dataflow, and pipeline to effecti…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted by 2025 IEEE International Symposium on Circuits and Systems (ISCAS)

  30. arXiv:2502.05155  [pdf, other]

    cs.LG stat.ML

    Deep Dynamic Probabilistic Canonical Correlation Analysis

    Authors: Shiqin Tang, Shujian Yu, Yining Dong, S. Joe Qin

    Abstract: This paper presents Deep Dynamic Probabilistic Canonical Correlation Analysis (D2PCCA), a model that integrates deep learning with probabilistic modeling to analyze nonlinear dynamical systems. Building on the probabilistic extensions of Canonical Correlation Analysis (CCA), D2PCCA captures nonlinear latent dynamics and supports enhancements such as KL annealing for improved convergence and normal…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: accepted by ICASSP-25, code is available at https://github.com/marcusstang/D2PCCA

  31. arXiv:2502.01524  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective

    Authors: Xiaorui Ma, Haoran Xie, S. Joe Qin

    Abstract: The integration of vision-language modalities has been a significant focus in multimodal learning, traditionally relying on Vision-Language Pretrained Models. However, with the advent of Large Language Models (LLMs), there has been a notable shift towards incorporating LLMs with vision modalities. Following this, the training paradigms for incorporating vision modalities into LLMs have evolved. In…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: 28 pages, 3 figures

  32. arXiv:2502.01523  [pdf, other]

    cs.CL

    CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering

    Authors: Zongxi Li, Yang Li, Haoran Xie, S. Joe Qin

    Abstract: Large language models (LLMs) are prone to hallucinations in question-answering (QA) tasks when faced with ambiguous questions. Users often assume that LLMs share their cognitive alignment, a mutual understanding of context, intent, and implicit details, leading them to omit critical information in the queries. However, LLMs generate responses based on assumptions that can misalign with user intent…

    Submitted 3 February, 2025; originally announced February 2025.

  33. arXiv:2502.01170  [pdf, other]

    cs.LG

    Label Distribution Learning with Biased Annotations by Learning Multi-Label Representation

    Authors: Zhiqiang Kou, Si Qin, Hailin Wang, Mingkun Xie, Shuo Chen, Yuheng Jia, Tongliang Liu, Masashi Sugiyama, Xin Geng

    Abstract: Multi-label learning (MLL) has gained attention for its ability to represent real-world data. Label Distribution Learning (LDL), an extension of MLL to learning from label distributions, faces challenges in collecting accurate label distributions. To address the issue of biased annotations, based on the low-rank assumption, existing works recover true distributions from biased observations by expl…

    Submitted 3 February, 2025; originally announced February 2025.

  34. arXiv:2502.00639  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

    Authors: Tao Ren, Zishi Zhang, Zehao Li, Jingyang Jiang, Shentao Qin, Guanghao Li, Yan Li, Yi Zheng, Xinping Li, Min Zhan, Yijie Peng

    Abstract: The probabilistic diffusion model (DM), generating content by inferencing through a recursive chain structure, has emerged as a powerful framework for visual generation. After pre-training on enormous unlabeled data, the model needs to be properly aligned to meet requirements for downstream applications. How to efficiently align the foundation DM is a crucial task. Contemporary methods are either…

    Submitted 24 March, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

  35. arXiv:2501.18845  [pdf, other]

    cs.CL

    Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

    Authors: Yaping Chai, Haoran Xie, Joe S. Qin

    Abstract: The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could unexpectedly make the model overfit and fail to cope with complex tasks. Large language models (LLMs) trained on extensive corpora have prominent text generation cap…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: 20 pages, 4 figures, 4 tables

  36. arXiv:2501.16207  [pdf, ps, other]

    cs.AI cs.CL cs.PL

    From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs

    Authors: Jialun Cao, Yaojie Lu, Meiziniu Li, Haoyang Ma, Haokun Li, Mengda He, Cheng Wen, Le Sun, Hongyu Zhang, Shengchao Qin, Shing-Chi Cheung, Cong Tian

    Abstract: The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five…

    Submitted 8 June, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: 20 pages

  37. arXiv:2501.05499  [pdf, other]

    cs.LG cs.CE physics.flu-dyn

    Generalization of Urban Wind Environment Using Fourier Neural Operator Across Different Wind Directions and Cities

    Authors: Cheng Chen, Geng Tian, Shaoxiang Qin, Senwen Yang, Dingyang Geng, Dongxue Zhan, Jinqiu Yang, David Vidal, Liangzhu Leon Wang

    Abstract: Simulation of urban wind environments is crucial for urban planning, pollution control, and renewable energy utilization. However, the computational requirements of high-fidelity computational fluid dynamics (CFD) methods make them impractical for real cities. To address these limitations, this study investigates the effectiveness of the Fourier Neural Operator (FNO) model in predicting flow field…

    Submitted 9 January, 2025; originally announced January 2025.

  38. arXiv:2501.03495  [pdf, other]

    cs.CV cs.LG

    Textualize Visual Prompt for Image Editing via Diffusion Bridge

    Authors: Pengcheng Xu, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Ruoyu Zhao, Charles Ling, Boyu Wang

    Abstract: Visual prompt, a pair of before-and-after edited images, can convey indescribable imagery transformations and prosper in image editing. However, current visual prompt methods rely on a pretrained text-guided image-to-image generative model that requires a triplet of text, before, and after images for retraining over a text-to-image model. Such crafting triplets and retraining processes limit the s…

    Submitted 27 January, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

    Comments: AAAI 2025

  39. arXiv:2412.20332  [pdf, ps, other]

    cs.SC

    An Algorithm for Discriminating the Complete Multiplicities of a Parametric Univariate Polynomial

    Authors: Simin Qin, Bican Xia, Jing Yang

    Abstract: In this paper, we tackle the parametric complete multiplicity problem for a univariate polynomial. Our approach to the parametric complete multiplicity problem has a significant difference from the classical method, which relies on repeated gcd computation. Instead, we introduce a novel technique that uses incremental gcds of the given polynomial and its high-order derivatives. This approach, form…

    Submitted 28 December, 2024; originally announced December 2024.

  40. arXiv:2412.18216  [pdf, other]

    cs.CV cs.CL

    ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation

    Authors: Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu

    Abstract: Controversial contents largely inundate the Internet, infringing various cultural norms and child protection standards. Traditional Image Content Moderation (ICM) models fall short in producing precise moderation decisions for diverse standards, while recent multimodal large language models (MLLMs), when adopted to general rule-based ICM, often produce classification and explanation results that a…

    Submitted 20 January, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  41. Image Gradient-Aided Photometric Stereo Network

    Authors: Kaixuan Wang, Lin Qi, Shiyu Qin, Kai Luo, Yakun Ju, Xia Li, Junyu Dong

    Abstract: Photometric stereo (PS) endeavors to ascertain surface normals using shading clues from photometric images under various illuminations. Recent deep learning-based PS methods often overlook the complexity of object surfaces. These neural network models, which exclusively rely on photometric images for training, often produce blurred results in high-frequency regions characterized by local discontin…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: 13 pages, 5 figures, published by Springer

    Journal ref: Pacific Rim International Conference on Artificial Intelligence. Singapore: Springer Nature Singapore, 2024: 284-296

  42. arXiv:2412.11279 [pdf, other]

    cs.CV cs.AI cs.GR

    VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

    Authors: Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li

    Abstract: Video face swapping is becoming increasingly popular across various applications, yet existing methods primarily focus on static images and struggle with video face swapping because of temporal consistency and complex scenarios. In this paper, we present the first diffusion-based framework specifically designed for video face swapping. Our approach introduces a novel image-video hybrid training fr…

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: project page: https://hao-shao.com/projects/vividface.html

  43. arXiv:2412.10814 [pdf, other]

    cs.LG cs.CE

    Diffusion-based Method for Satellite Pattern-of-Life Identification

    Authors: Yongchao Ye, Xinting Zhu, Xuejin Shen, Xiaoyu Chen, Lishuai Li, S. Joe Qin

    Abstract: Satellite pattern-of-life (PoL) identification is crucial for space safety and satellite monitoring, involving the analysis of typical satellite behaviors such as station-keeping, drift, etc. However, existing PoL identification methods remain underdeveloped due to the complexity of aerospace systems, variability in satellite behaviors, and fluctuating observation sampling rates. In a first attemp…

    Submitted 21 May, 2025; v1 submitted 14 December, 2024; originally announced December 2024.

  44. arXiv:2412.10047 [pdf, other]

    cs.AI

    Large Action Models: From Inception to Implementation

    Authors: Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, Ray Huang, Si Qin, Qisheng Su, Jiayi Ye, Yudi Zhang, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dy…

    Submitted 13 January, 2025; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: 25 pages, 12 figures

  45. arXiv:2412.06599 [pdf]

    eess.IV cs.CV physics.med-ph

    A No-Reference Medical Image Quality Assessment Method Based on Automated Distortion Recognition Technology: Application to Preprocessing in MRI-guided Radiotherapy

    Authors: Zilin Wang, Shengqi Chen, Jianrong Dai, Shirui Qin, Ying Cao, Ruiao Zhao, Guohua Wu, Yuan Tang, Jiayun Chen

    Abstract: Objective: To develop a no-reference image quality assessment method using automated distortion recognition to boost MRI-guided radiotherapy precision. Methods: We analyzed 106,000 MR images from 10 patients with liver metastasis, captured with the Elekta Unity MR-LINAC. Our No-Reference Quality Assessment Model includes: 1) image preprocessing to enhance visibility of key diagnostic features; 2) feature e…

    Submitted 9 December, 2024; v1 submitted 9 December, 2024; originally announced December 2024.

  46. arXiv:2412.03555 [pdf, other]

    cs.CV

    PaliGemma 2: A Family of Versatile VLMs for Transfer

    Authors: Andreas Steiner, André Susano Pinto, Michael Tschannen, Daniel Keysers, Xiao Wang, Yonatan Bitton, Alexey Gritsenko, Matthias Minderer, Anthony Sherbondy, Shangbang Long, Siyang Qin, Reeve Ingle, Emanuele Bugliarello, Sahar Kazemzadeh, Thomas Mesnard, Ibrahim Alabdulmohsin, Lucas Beyer, Xiaohua Zhai

    Abstract: PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. We combine the SigLIP-So400m vision encoder that was also used by PaliGemma with the whole range of Gemma 2 models, from the 2B one all the way up to the 27B model. We train these models at three resolutions (224px, 448px, and 896px) in multiple stages to equip them with broa…

    Submitted 4 December, 2024; originally announced December 2024.

  47. arXiv:2411.18279 [pdf, other]

    cs.AI cs.CL cs.HC

    Large Language Model-Brained GUI Agents: A Survey

    Authors: Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: GUIs have long been central to human-computer interaction, providing an intuitive and visually-driven way to access and interact with digital systems. The advent of LLMs, particularly multimodal models, has ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. This has paved the way for a n…

    Submitted 6 May, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/vyokky/LLM-Brained-GUI-Agents-Survey. Additionally, a searchable webpage is available at https://aka.ms/gui-agent for easier access and exploration.

  48. arXiv:2411.11348 [pdf, other]

    physics.flu-dyn cs.LG

    Modeling Multivariable High-resolution 3D Urban Microclimate Using Localized Fourier Neural Operator

    Authors: Shaoxiang Qin, Dongxue Zhan, Dingyang Geng, Wenhui Peng, Geng Tian, Yurong Shi, Naiping Gao, Xue Liu, Liangzhu Leon Wang

    Abstract: Accurate urban microclimate analysis with wind velocity and temperature is vital for energy-efficient urban planning, supporting carbon reduction, enhancing public health and comfort, and advancing the low-altitude economy. However, traditional computational fluid dynamics (CFD) simulations that couple velocity and temperature are computationally expensive. Recent machine learning advancements off…

    Submitted 18 November, 2024; originally announced November 2024.

  49. arXiv:2411.07503 [pdf]

    eess.IV cs.CV cs.LG physics.med-ph q-bio.TO

    A Novel Automatic Real-time Motion Tracking Method for Magnetic Resonance Imaging-guided Radiotherapy: Leveraging the Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation

    Authors: Shengqi Chen, Zilin Wang, Jianrong Dai, Shirui Qin, Ying Cao, Ruiao Zhao, Jiayun Chen, Guohua Wu, Yuan Tang

    Abstract: Background and Purpose: Accurate motion tracking in MRI-guided Radiotherapy (MRIgRT) is essential for effective treatment delivery. This study aimed to enhance motion tracking precision in MRIgRT through an automatic real-time markerless tracking method using an enhanced Tracking-Learning-Detection (ETLD) framework with automatic segmentation. Materials and Methods: We developed a novel MRIgRT mot…

    Submitted 6 January, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

  50. arXiv:2411.06486 [pdf, other]

    cs.CR cs.CV

    DDIM-Driven Coverless Steganography Scheme with Real Key

    Authors: Mingyu Yu, Haonan Miao, Zhengping Jin, Sujuan Qin

    Abstract: With the advancement of information hiding techniques, generation-based coverless steganography has emerged as an alternative to traditional methods, leveraging generative models to transform secret information into stego-objects rather than embedding it within the redundancy of the cover. However, existing generation-based approaches require pseudo-keys that must be shared between communication p…

    Submitted 12 March, 2025; v1 submitted 10 November, 2024; originally announced November 2024.