Showing 1–50 of 279 results for author: Bao, Y

Searching in archive cs.
  1. arXiv:2507.07395  [pdf, ps, other]

    cs.CV

    Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections

    Authors: Yongtang Bao, Chengjie Tang, Yuze Wang, Haojie Li

    Abstract: Reconstructing and segmenting scenes from unconstrained photo collections obtained from the Internet is a novel but challenging task. Unconstrained photo collections are easier to get than well-captured photo collections. These unconstrained images suffer from inconsistent lighting and transient occlusions, which makes segmentation challenging. Previous segmentation methods cannot address transien…

    Submitted 9 July, 2025; originally announced July 2025.

  2. arXiv:2507.04716  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Optimal Model Selection for Conformalized Robust Optimization

    Authors: Yajie Bao, Yang Hu, Haojie Ren, Peng Zhao, Changliang Zou

    Abstract: In decision-making under uncertainty, Contextual Robust Optimization (CRO) provides reliability by minimizing the worst-case decision loss over a prediction set, hedging against label variability. While recent advances use conformal prediction to construct prediction sets for machine learning models, the downstream decisions critically depend on model selection. This paper introduces novel model s…

    Submitted 7 July, 2025; originally announced July 2025.

  3. arXiv:2507.02828  [pdf, ps, other]

    quant-ph cond-mat.stat-mech cond-mat.str-el cs.IT hep-th

    Designs from magic-augmented Clifford circuits

    Authors: Yuzhen Zhang, Sagar Vijay, Yingfei Gu, Yimu Bao

    Abstract: We introduce magic-augmented Clifford circuits -- architectures in which Clifford circuits are preceded and/or followed by constant-depth circuits of non-Clifford ("magic") gates -- as a resource-efficient way to realize approximate $k$-designs, with reduced circuit depth and usage of magic. We prove that shallow Clifford circuits, when augmented with constant-depth circuits of magic gates, can g…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 59 pages

  4. arXiv:2507.00739  [pdf, ps, other]

    cs.CV eess.IV eess.SP

    Biorthogonal Tunable Wavelet Unit with Lifting Scheme in Convolutional Neural Network

    Authors: An Le, Hung Nguyen, Sungbal Seo, You-Suk Bae, Truong Nguyen

    Abstract: This work introduces a novel biorthogonal tunable wavelet unit constructed using a lifting scheme that relaxes both the orthogonality and equal filter length constraints, providing greater flexibility in filter design. The proposed unit enhances convolution, pooling, and downsampling operations, leading to improved image classification and anomaly detection in convolutional neural networks (CNN).…

    Submitted 1 July, 2025; originally announced July 2025.

  5. arXiv:2506.17336  [pdf, ps, other]

    cs.CR cs.AI

    Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases

    Authors: Yubeen Bae, Minchan Kim, Jaejin Lee, Sangbum Kim, Jaehyung Kim, Yejin Choi, Niloofar Mireshghallah

    Abstract: Large language models (LLMs) are increasingly used as personal agents, accessing sensitive user data such as calendars, emails, and medical records. Users currently face a trade-off: They can send private records, many of which are stored in remote databases, to powerful but untrusted LLM providers, increasing their exposure risk. Alternatively, they can run less powerful models locally on trusted…

    Submitted 1 July, 2025; v1 submitted 19 June, 2025; originally announced June 2025.

    Comments: 29 pages

  6. arXiv:2506.15524  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Image Shadow Removal Challenge Report

    Authors: Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu, Jiarui Wang, Qiming Hu, Xiaojie Guo, Xin Lu, Jiarong Yang, Yuanfei Bao, Anya Hu, Zihao Fan, Kunyu Wang, Jie Xiao, Xi Wang, Xueyang Fu, Zheng-Jun Zha, Yu-Fan Lin, Chia-Ming Lee, et al. (57 additional authors not shown)

    Abstract: This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants have registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were e…

    Submitted 18 June, 2025; originally announced June 2025.

  7. arXiv:2506.15021  [pdf, ps, other]

    cs.LG cs.AI

    SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models

    Authors: Gyuhak Kim, Sumiran Singh Thakur, Su Min Park, Wei Wei, Yujia Bao

    Abstract: Supervised fine-tuning (SFT) has become an essential step in tailoring large language models (LLMs) to align with human expectations and specific downstream tasks. However, existing SFT methods typically treat each training instance as a uniform sequence, giving equal importance to all tokens regardless of their relevance. This overlooks the fact that only a subset of tokens often contains critica…

    Submitted 17 June, 2025; originally announced June 2025.

  8. arXiv:2506.13497  [pdf, ps, other]

    cs.DC

    DDiT: Dynamic Resource Allocation for Diffusion Transformer Model Serving

    Authors: Heyang Huang, Cunchen Hu, Jiaqi Zhu, Ziyuan Gao, Liangliang Xu, Yizhou Shan, Yungang Bao, Sun Ninghui, Tianwei Zhang, Sa Wang

    Abstract: The Text-to-Video (T2V) model aims to generate dynamic and expressive videos from textual prompts. The generation pipeline typically involves multiple modules, such as a language encoder, a Diffusion Transformer (DiT), and Variational Autoencoders (VAE). Existing serving systems often rely on monolithic model deployment, while overlooking the distinct characteristics of each module, leading to ineffic…

    Submitted 16 June, 2025; originally announced June 2025.

  9. arXiv:2506.04953  [pdf, ps, other]

    cs.CV

    APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval

    Authors: Hong Gao, Yiming Bao, Xuezhen Tu, Bin Zhong, Minling Zhang

    Abstract: Current multimodal large language models (MLLMs) struggle with hour-level video understanding, facing significant challenges not only in modeling the substantial information volume of long videos but also in overcoming the memory wall and resource constraints during both training and inference. Although recent training-free approaches have alleviated resource demands by compressing visual features…

    Submitted 28 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  10. arXiv:2506.02453  [pdf, ps, other]

    cs.CV

    PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation

    Authors: Kunyu Wang, Xueyang Fu, Yuanfei Bao, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha

    Abstract: Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model online to changing environments during inference. Most existing methods focus on exploiting target data, while overlooking another crucial source of information, the pre-trained weights, which encode underutilized domain-invariant priors. This paper takes the geometric attributes of pre-trained weights as a starting point, sys…

    Submitted 3 July, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  11. arXiv:2506.00823  [pdf, other]

    cs.CL

    Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks

    Authors: Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Zhengwen Feng, Hao Peng, Jianwei Yin

    Abstract: Large language models (LLMs) are trained on extensive datasets that encapsulate substantial world knowledge. However, their outputs often include confidently stated inaccuracies. Earlier works suggest that LLMs encode truthfulness as a distinct linear feature, termed the "truth direction", which can classify truthfulness reliably. We address several open questions about the truth direction: (i) wh…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 19 pages, 16 figures; accepted to Findings of ACL 2025

  12. arXiv:2505.18425  [pdf, ps, other]

    cs.AI

    Advertising in AI systems: Society must be vigilant

    Authors: Menghua Wu, Yujia Bao

    Abstract: AI systems have increasingly become our gateways to the Internet. We argue that just as advertising has driven the monetization of web search and social media, so too will commercial incentives shape the content served by AI. Unlike traditional media, however, the outputs of these systems are dynamic, personalized, and lack clear provenance -- raising concerns for transparency and regulation. In t…

    Submitted 23 May, 2025; originally announced May 2025.

  13. arXiv:2505.18279  [pdf, ps, other]

    cs.MA cs.AI cs.CL cs.LG

    Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control

    Authors: Alireza Rezazadeh, Zichao Li, Ange Lou, Yuying Zhao, Wei Wei, Yujia Bao

    Abstract: Complex tasks are increasingly delegated to ensembles of specialized LLM-based agents that reason, communicate, and coordinate actions, both among themselves and through interactions with external tools, APIs, and databases. While persistent memory has been shown to enhance single-agent performance, most approaches assume a monolithic, single-user context, overlooking the benefits and challenges of…

    Submitted 23 May, 2025; originally announced May 2025.

  14. arXiv:2505.17412  [pdf, ps, other]

    cs.CV

    Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

    Authors: Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Yikang Yang, Yajie Bao, Jiachen Qian, Siyu Zhu, Xun Cao, Philip Torr, Yao Yao

    Abstract: Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges. We introduce Direct3D-S2, a scalable 3D generation framework based on sparse volumes that achieves superior output quality with dramatically reduced training costs. Our key innovation is the Spatial Sparse Attention (SSA) mechanism…

    Submitted 26 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Project page: https://www.neural4d.com/research/direct3d-s2

  15. arXiv:2505.16533  [pdf, ps, other]

    cs.CV

    Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction

    Authors: Jiacong Chen, Qingyu Mao, Youneng Bao, Xiandong Meng, Fanyang Meng, Ronggang Wang, Yongsheng Liang

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a high-fidelity and efficient paradigm for online free-viewpoint video (FVV) reconstruction, offering viewers rapid responsiveness and immersive experiences. However, existing online methods face the challenge of prohibitive storage requirements, primarily due to point-wise modeling that fails to exploit the motion properties. To address this limitation, we p…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 17 pages, 9 figures

  16. arXiv:2505.15216  [pdf, ps, other]

    cs.CR cs.AI cs.CL cs.LG

    BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems

    Authors: Andy K. Zhang, Joey Ji, Celeste Menders, Riya Dulepet, Thomas Qin, Ron Y. Wang, Junrong Wu, Kyleen Liao, Jiliang Li, Jinghan Hu, Sara Hong, Nardos Demilew, Shivatmica Murgai, Jason Tran, Nishka Kacheria, Ethan Ho, Denis Liu, Lauren McLane, Olivia Bruvik, Dai-Rong Han, Seungwoo Kim, Akhil Vyas, Cuiyuanxiu Chen, Ryan Li, Weiran Xu, et al. (9 additional authors not shown)

    Abstract: AI agents have the potential to significantly alter the cybersecurity landscape. Here, we introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems. Instantiating this framework with BountyBench, we set up 25 systems with complex, real-world codebases. To capture the vulnerability lifecycle, we define three task types: Detect (detecting a ne…

    Submitted 9 July, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: 93 pages

  17. arXiv:2505.11752  [pdf, other]

    cs.LG

    Permutation Randomization on Nonsmooth Nonconvex Optimization: A Theoretical and Experimental Study

    Authors: Wei Zhang, Arif Hassan Zidan, Afrar Jahin, Yu Bao, Tianming Liu

    Abstract: While gradient-based optimizers that incorporate randomization often showcase superior performance on complex optimization, the theoretical foundations underlying this superiority remain insufficiently understood. A particularly pressing question has emerged: What is the role of randomization in dimension-free nonsmooth nonconvex optimization? To address this gap, we investigate the theoretical an…

    Submitted 16 May, 2025; originally announced May 2025.

  18. arXiv:2505.11748  [pdf, other]

    cs.LG

    HOME-3: High-Order Momentum Estimator with Third-Power Gradient for Convex and Smooth Nonconvex Optimization

    Authors: Wei Zhang, Arif Hassan Zidan, Afrar Jahin, Yu Bao, Tianming Liu

    Abstract: Momentum-based gradients are essential for optimizing advanced machine learning models, as they not only accelerate convergence but also advance optimizers to escape stationary points. While most state-of-the-art momentum techniques utilize lower-order gradients, such as the squared first-order gradient, there has been limited exploration of higher-order gradients, particularly those raised to pow…

    Submitted 16 May, 2025; originally announced May 2025.

  19. arXiv:2505.11040  [pdf, ps, other]

    cs.LG

    Efficient Attention via Pre-Scoring: Prioritizing Informative Keys in Transformers

    Authors: Zhexiang Li, Haoyu Wang, Yutong Bao, David Woodruff

    Abstract: Recent advances in transformer architectures deeply enhance long-context language modeling. Among them, HyperAttention achieves competitive efficiency by combining a single-level LSH-based clustering with uniform residual sampling. However, such sampling limits the capture of crucial keys, which in turn raises the overall perplexity. In this paper, we propose a pre-scoring mechanism to assist HyperAt…

    Submitted 16 May, 2025; originally announced May 2025.

  20. arXiv:2505.05017  [pdf, other]

    cs.CL

    Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

    Authors: Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, Jianwei Yin

    Abstract: Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. Since the majority of knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data may provide valuable insights. Influence functions have been proposed as a means to explain model predictions based on training data. However, existing approaches fai…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 9 pages, accepted by IJCAI 2025

  21. arXiv:2505.04986  [pdf, other]

    stat.ML cs.LG

    Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach

    Authors: Qian Peng, Yajie Bao, Haojie Ren, Zhaojun Wang, Changliang Zou

    Abstract: Conformal prediction is a powerful tool for constructing prediction intervals for black-box models, providing a finite sample coverage guarantee for exchangeable data. However, this exchangeability is compromised when some entries of the test feature are contaminated, such as in the case of cellwise outliers. To address this issue, this paper introduces a novel framework called detect-then-impute…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: 23 pages, 15 figures

  22. arXiv:2504.14669  [pdf, other]

    cs.CL

    Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data

    Authors: Wei Zou, Sen Yang, Yu Bao, Shujian Huang, Jiajun Chen, Shanbo Cheng

    Abstract: The rise of Large Language Models (LLMs) has reshaped machine translation (MT), but multilingual MT still relies heavily on parallel data for supervised fine-tuning (SFT), facing challenges like data scarcity for low-resource languages and catastrophic forgetting. To address these issues, we propose TRANS-ZERO, a self-play framework that leverages only monolingual data and the intrinsic multilingu…

    Submitted 17 May, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: 11 pages, 4 figures, accepted by ACL 2025 as findings

  23. arXiv:2504.13914  [pdf, other]

    cs.CL

    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

    Authors: ByteDance Seed: Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

    Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in…

    Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  24. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  25. arXiv:2504.09379  [pdf, other]

    cs.CV

    Low-Light Image Enhancement using Event-Based Illumination Estimation

    Authors: Lei Sun, Yuhan Bao, Jiajun Zhai, Jingyun Liang, Yulun Zhang, Kaiwei Wang, Danda Pani Paudel, Luc Van Gool

    Abstract: Low-light image enhancement (LLIE) aims to improve the visibility of images captured in poorly lit environments. Prevalent event-based solutions primarily utilize events triggered by motion, i.e., "motion events", to strengthen only the edge texture, while leaving the high dynamic range and excellent low-light responsiveness of event cameras largely unexplored. This paper instead opens a new aven…

    Submitted 12 April, 2025; originally announced April 2025.

  26. arXiv:2504.07491  [pdf, ps, other]

    cs.CV

    Kimi-VL Technical Report

    Authors: Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (70 additional authors not shown)

    Abstract: We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B). Kimi-VL demonstrates strong performance across challenging domains: as a general-purpose VLM, Kimi-VL excels in multi-…

    Submitted 23 June, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: Updated Kimi-VL-A3B-Thinking-2506 information

  27. arXiv:2504.00954  [pdf, other]

    cs.CV cs.AI

    IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval

    Authors: Bangwei Liu, Yicheng Bao, Shaohui Lin, Xuhong Wang, Xin Tan, Yingchun Wang, Yuan Xie, Chaochao Lu

    Abstract: Multimodal retrieval systems are becoming increasingly vital for cutting-edge AI technologies, such as embodied AI and AI-driven digital content industries. However, current multimodal retrieval tasks lack sufficient complexity and demonstrate limited practical application value. This inspires us to design Instance-Driven Multimodal Image Retrieval (IDMR), a novel task that requires models to retrieve…

    Submitted 1 April, 2025; originally announced April 2025.

  28. arXiv:2503.23035  [pdf, other]

    cs.CV

    FreeInv: Free Lunch for Improving DDIM Inversion

    Authors: Yuxiang Bao, Huijie Liu, Xun Gao, Huan Fu, Guoliang Kang

    Abstract: The naive DDIM inversion process usually suffers from a trajectory deviation issue, i.e., the latent trajectory during reconstruction deviates from the one during inversion. To alleviate this issue, previous methods either learn to mitigate the deviation or design cumbersome compensation strategies to reduce the mismatch error, incurring substantial time and computation cost. In this work, we present a…

    Submitted 29 March, 2025; originally announced March 2025.

  29. arXiv:2503.20202  [pdf, other]

    cs.CL cs.AI cs.HC cs.RO

    SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain

    Authors: Nan Gao, Yihua Bao, Dongdong Weng, Jiayi Zhao, Jia Li, Yan Zhou, Pengfei Wan, Di Zhang

    Abstract: Co-speech gesture generation enhances human-computer interaction realism through speech-synchronized gesture synthesis. However, generating semantically meaningful gestures remains a challenging problem. We propose SARGes, a novel framework that leverages large language models (LLMs) to parse speech content and generate reliable semantic gesture labels, which subsequently guide the synthesis of me…

    Submitted 25 March, 2025; originally announced March 2025.

  30. arXiv:2503.17860  [pdf, other]

    cs.CL

    Enhancing Retrieval Systems with Inference-Time Logical Reasoning

    Authors: Felix Faltings, Wei Wei, Yujia Bao

    Abstract: Traditional retrieval methods rely on transforming user queries into vector representations and retrieving documents based on cosine similarity within an embedding space. While efficient and scalable, this approach often fails to handle complex queries involving logical constructs such as negations, conjunctions, and disjunctions. In this paper, we propose a novel inference-time logical reasoning…

    Submitted 22 March, 2025; originally announced March 2025.

  31. arXiv:2503.16432  [pdf]

    cs.HC cs.AI cs.CL

    Multimodal Transformer Models for Turn-taking Prediction: Effects on Conversational Dynamics of Human-Agent Interaction during Cooperative Gameplay

    Authors: Young-Ho Bae, Casey C. Bennett

    Abstract: This study investigates multimodal turn-taking prediction within human-agent interactions (HAI), particularly focusing on cooperative gaming environments. It comprises both model development and a subsequent user study, aiming to refine our understanding and improve conversational dynamics in spoken dialogue systems (SDSs). For the modeling phase, we introduce a novel transformer-based deep learning…

    Submitted 5 February, 2025; originally announced March 2025.

    Comments: 36 pages

  32. arXiv:2503.16080  [pdf, other]

    cs.CR

    Fast Homomorphic Linear Algebra with BLAS

    Authors: Youngjin Bae, Jung Hee Cheon, Guillaume Hanrot, Jai Hyun Park, Damien Stehlé

    Abstract: Homomorphic encryption is a cryptographic paradigm that allows computing on encrypted data, opening a wide range of applications in privacy-preserving data manipulation, notably in AI. Many of those applications require significant linear algebra computations (matrix x vector products, and matrix x matrix products). This central role of linear algebra computations goes far beyond homomorphic algeb…

    Submitted 20 March, 2025; originally announced March 2025.

  33. arXiv:2503.14573  [pdf]

    eess.IV cs.CV cs.GR

    Submillimeter-Accurate 3D Lumbar Spine Reconstruction from Biplanar X-Ray Images: Incorporating a Multi-Task Network and Landmark-Weighted Loss

    Authors: Wanxin Yu, Zhemin Zhu, Cong Wang, Yihang Bao, Chunjie Xia, Rongshan Cheng, Yan Yu, Tsung-Yuan Tsai

    Abstract: Three-dimensional reconstruction of the spine under weight-bearing conditions from biplanar X-ray images is of great importance for the clinical assessment of spinal diseases. However, the current fully automated reconstruction methods only achieve millimeter-level accuracy, making it difficult to meet clinical standards. This study developed and validated a fully automated method for high-accurac…

    Submitted 18 May, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: 24 pages, 11 figures, 5 tables

  34. arXiv:2503.13200  [pdf, other]

    cs.LG cs.AI

    Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services

    Authors: Yiman Bao, Jie Gao, Jinke He, Frans A. Oliehoek, Oded Cats

    Abstract: Efficient timing in ride-matching is crucial for improving the performance of ride-hailing and ride-pooling services, as it determines the number of drivers and passengers considered in each matching process. Traditional batched matching methods often use fixed time intervals to accumulate ride requests before assigning matches. While this approach increases the number of available drivers and pas…

    Submitted 17 March, 2025; originally announced March 2025.

  35. arXiv:2503.10674  [pdf]

    cs.CL cs.AI

    Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS

    Authors: Shafiuddin Rehan Ahmed, Ankit Parag Shah, Quan Hung Tran, Vivek Khetan, Sukryool Kang, Ankit Mehta, Yujia Bao, Wei Wei

    Abstract: Climate change has intensified the need for transparency and accountability in organizational practices, making Environmental, Social, and Governance (ESG) reporting increasingly crucial. Frameworks like the Global Reporting Initiative (GRI) and the new European Sustainability Reporting Standards (ESRS) aim to standardize ESG reporting, yet generating comprehensive reports remains challenging due…

    Submitted 28 May, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: Long paper

  36. arXiv:2503.10573  [pdf, other]

    cs.LG

    Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach

    Authors: Afrar Jahin, Arif Hassan Zidan, Wei Zhang, Yu Bao, Tianming Liu

    Abstract: With the rapid advancement of Artificial Intelligence (AI), Large Language Models (LLMs) have significantly impacted a wide array of domains, including healthcare, engineering, science, education, and mathematical reasoning. Among these, mathematical reasoning remains a particularly challenging capability, often requiring multi-step logic and abstract generalization. While prior work has explored…

    Submitted 19 May, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  37. arXiv:2503.07328  [pdf, other]

    cs.PL cs.LO cs.SE

    Complete the Cycle: Reachability Types with Expressive Cyclic References

    Authors: Haotian Deng, Siyuan He, Songlin Jia, Yuyan Bao, Tiark Rompf

    Abstract: Reachability Types (RT) are a qualified type system for tracking aliasing and separation in functional and higher-order programming. By formalizing resource reachability with a sound static type system, RT enable higher-order programming patterns with runtime safety and non-interference guarantees. However, previous RT systems have been based on calculi that restrict cyclic dependencies and are sh…

    Submitted 10 March, 2025; originally announced March 2025.

  38. arXiv:2503.04919  [pdf, other]

    cs.CV

    FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement

    Authors: Ian Huang, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas Guibas, Alireza Fathi

    Abstract: Scene generation with 3D assets presents a complex challenge, requiring both high-level semantic understanding and low-level geometric reasoning. While Multimodal Large Language Models (MLLMs) excel at semantic tasks, their application to 3D scene generation is hindered by their limited grounding on 3D geometry. In this paper, we investigate how to best work with MLLMs in an object placement task.…

    Submitted 6 March, 2025; originally announced March 2025.

  39. arXiv:2503.01214  [pdf, other]

    cs.CV physics.optics

    One-Step Event-Driven High-Speed Autofocus

    Authors: Yuhan Bao, Shaohua Gao, Wenyong Li, Kaiwei Wang

    Abstract: High-speed autofocus in extreme scenes remains a significant challenge. Traditional methods rely on repeated sampling around the focus position, resulting in "focus hunting". Event-driven methods have advanced focusing speed and improved performance in low-light conditions; however, current approaches still require at least one lengthy round of "focus hunting", involving the collection of a co…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Main text: 9 pages, 6 figures. Supplementary Material: 4 pages, 3 figures. Accepted by CVPR2025

  40. arXiv:2502.16002  [pdf, other]

    cs.CL

    KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse

    Authors: Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang

    Abstract: We describe KVLink, an approach for efficient key-value (KV) cache reuse in large language models (LLMs). In many LLM applications, different inputs can share overlapping context, such as the same retrieved document appearing in multiple queries. However, the LLMs still need to encode the entire context for each query, leading to redundant computation. In this paper, we investigate a new strategy…

    Submitted 21 May, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  41. arXiv:2502.12893  [pdf, other]

    cs.CL

    H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking

    Authors: Martin Kuo, Jianyi Zhang, Aolin Ding, Qinsi Wang, Louis DiValentin, Yujia Bao, Wei Wei, Hai Li, Yiran Chen

    Abstract: Large Reasoning Models (LRMs) have recently extended their powerful reasoning capabilities to safety checks, using chain-of-thought reasoning to decide whether a request should be answered. While this new approach offers a promising route for balancing model utility and safety, its robustness remains underexplored. To address this gap, we introduce Malicious-Educator, a benchmark that disguises ext…

    Submitted 26 February, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: Website: https://maliciouseducator.org/

  42. arXiv:2502.10896  [pdf]

    cs.CL

    Developing Conversational Speech Systems for Robots to Detect Speech Biomarkers of Cognition in People Living with Dementia

    Authors: Rohith Perumandla, Young-Ho Bae, Diego Izaguirre, Esther Hwang, Andrew Murphy, Long-Jing Hsu, Selma Sabanovic, Casey C. Bennett

    Abstract: This study presents the development and testing of a conversational speech system designed for robots to detect speech biomarkers indicative of cognitive impairments in people living with dementia (PLwD). The system integrates a backend Python WebSocket server and a central core module with a large language model (LLM) fine-tuned for dementia to process user input and generate robotic conversation…

    Submitted 15 February, 2025; originally announced February 2025.

    Comments: Main paper 28 pages long (pg 2-30), includes 5 figures, 5 tables, 1 Appendix at end

  43. arXiv:2502.10448  [pdf]

    cs.CR

    Supply Chain Network Security Investment Strategies Based on Nonlinear Budget Constraints: The Moderating Roles of Market Share and Attack Risk

    Authors: Jiajie Cheng, Jiaxin Wang, Caijiao Li, Luxiang Zhang, Yusheng Fan, Yujie Bao, Wen Zhou

    Abstract: In the context of the rapid development of digital supply chain networks, dealing with the increasing cybersecurity threats and formulating effective security investment strategies to defend against cyberattack risks are the core issues in supply chain management. Cybersecurity investment decision-making is a key strategic task in enterprise supply chain management. Traditional game theory models…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: Under Consideration at Operations Management Research

  44. arXiv:2502.07380  [pdf, other]

    cs.RO

    Demonstrating Wheeled Lab: Modern Sim2Real for Low-cost, Open-source Wheeled Robotics

    Authors: Tyler Han, Preet Shah, Sidharth Rajagopal, Yanda Bao, Sanghun Jung, Sidharth Talia, Gabriel Guo, Bryan Xu, Bhaumik Mehta, Emma Romig, Rosario Scalise, Byron Boots

    Abstract: Simulation has been pivotal in recent robotics milestones and is poised to play a prominent role in the field's future. However, recent robotic advances often rely on expensive and high-maintenance platforms, limiting access to broader robotics audiences. This work introduces Wheeled Lab, a framework for the low-cost, open-source wheeled platforms that are already widely established in education a…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: Under Review

  45. arXiv:2502.00818  [pdf, other]

    stat.ML cs.LG

    Error-quantified Conformal Inference for Time Series

    Authors: Junxi Wu, Dongjian Hu, Yajie Bao, Shu-Tao Xia, Changliang Zou

    Abstract: Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift on sequential data. Conformal inference provides a pivotal and flexible instrument for assessing the uncertainty of machine learning models through prediction sets. Recently, a series of online conformal inference methods updated thresholds of prediction sets by performing onli…

    Submitted 2 February, 2025; originally announced February 2025.

    Comments: ICLR 2025 camera version

  46. arXiv:2501.15017  [pdf, ps, other]

    astro-ph.EP astro-ph.IM cs.LG

    SPOCK 2.0: Update to the FeatureClassifier in the Stability of Planetary Orbital Configurations Klassifier

    Authors: Elio Thadhani, Yolanda Ba, Hanno Rein, Daniel Tamayo

    Abstract: The Stability of Planetary Orbital Configurations Klassifier (SPOCK) package collects machine learning models for predicting the stability and collisional evolution of compact planetary systems. In this paper we explore improvements to SPOCK's binary stability classifier (FeatureClassifier), which predicts orbital stability by collecting data over a short N-body integration of a system. We find th…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: 3 pages, 1 table. Submitted to RNAAS

  47. arXiv:2501.12709  [pdf, other]

    quant-ph cs.AI cs.CR cs.DC

    Practical quantum federated learning and its experimental demonstration

    Authors: Zhi-Ping Liu, Xiao-Yu Cao, Hao-Wen Liu, Xiao-Ran Sun, Yu Bao, Yu-Shuo Lu, Hua-Lei Yin, Zeng-Bing Chen

    Abstract: Federated learning is essential for decentralized, privacy-preserving model training in the data-driven era. Quantum-enhanced federated learning leverages quantum resources to address privacy and scalability challenges, offering security and efficiency advantages beyond classical methods. However, practical and scalable frameworks addressing privacy concerns in the quantum computing era remain und…

    Submitted 22 January, 2025; originally announced January 2025.

    Comments: 21 pages, 5 figures, 3 tables

  48. arXiv:2501.12599  [pdf, ps, other]

    cs.AI cs.LG

    Kimi k1.5: Scaling Reinforcement Learning with LLMs

    Authors: Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, et al. (71 additional authors not shown)

    Abstract: Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior pu…

    Submitted 2 June, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: 25 pages

  49. arXiv:2501.03122  [pdf, other]

    cs.CV

    Normalizing Batch Normalization for Long-Tailed Recognition

    Authors: Yuxiang Bao, Guoliang Kang, Linlin Yang, Xiaoyue Duan, Bo Zhao, Baochang Zhang

    Abstract: In real-world scenarios, the number of training samples across classes is usually subject to a long-tailed distribution. The conventionally trained network may achieve unexpectedly inferior performance on the rare class compared to the frequent class. Most previous works attempt to rectify the network bias from the data-level or from the classifier-level. Differently, in this paper, we identify that t…

    Submitted 6 January, 2025; originally announced January 2025.

  50. arXiv:2412.16145  [pdf, other]

    cs.LG cs.AI cs.CL

    Offline Reinforcement Learning for LLM Multi-Step Reasoning

    Authors: Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu

    Abstract: Improving the multi-step reasoning ability of large language models (LLMs) with offline reinforcement learning (RL) is essential for quickly adapting them to complex tasks. While Direct Preference Optimization (DPO) has shown promise in aligning LLMs with human preferences, it is less suitable for multi-step reasoning tasks because (1) DPO relies on paired preference data, which is not readily ava…

    Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.