Showing 1–50 of 241 results for author: Qi, H

  1. arXiv:2506.18348  [pdf, ps, other]

    cs.AI

    Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team

    Authors: Weilun Yu, Shixiang Tang, Yonggui Huang, Nanqing Dong, Li Fan, Honggang Qi, Wei Liu, Xiaoli Diao, Xi Chen, Wanli Ouyang

    Abstract: Scientific progress increasingly relies on effective collaboration among researchers, a dynamic that large language models (LLMs) have only begun to emulate. While recent LLM-based scientist agents show promise in autonomous scientific discovery, they often lack the interactive reasoning and evaluation mechanisms essential to real-world research. We propose IDVSCI (Internal Discussion and Vote SCI…

    Submitted 27 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  2. arXiv:2506.12314  [pdf, ps, other]

    cs.RO eess.SY

    Explosive Output to Enhance Jumping Ability: A Variable Reduction Ratio Design Paradigm for Humanoid Robots Knee Joint

    Authors: Xiaoshuai Ma, Haoxiang Qi, Qingqing Li, Haochen Xu, Xuechao Chen, Junyao Gao, Zhangguo Yu, Qiang Huang

    Abstract: Enhancing the explosive power output of the knee joints is critical for improving the agility and obstacle-crossing capabilities of humanoid robots. However, a mismatch between the knee-to-center-of-mass (CoM) transmission ratio and jumping demands, coupled with motor performance degradation at high speeds, restricts the duration of high-power output and limits jump performance. To address these p…

    Submitted 13 June, 2025; originally announced June 2025.

  3. arXiv:2506.11469  [pdf, ps, other]

    cs.AI

    Structure-Aware Automatic Channel Pruning by Searching with Graph Embedding

    Authors: Zifan Liu, Yuan Cao, Yanwei Yu, Heng Qi, Jie Gui

    Abstract: Channel pruning is a powerful technique to reduce the computational overhead of deep neural networks, enabling efficient deployment on resource-constrained devices. However, existing pruning methods often rely on local heuristics or weight-based criteria that fail to capture global structural dependencies within the network, leading to suboptimal pruning decisions and degraded model performance. T…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 12 pages, 2 figures

  4. arXiv:2506.10006  [pdf]

    cs.MM cs.AI cs.CV cs.LG

    HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction

    Authors: Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi

    Abstract: Current HER2 assessment models for breast cancer predominantly analyze H&E or IHC images in isolation, despite clinical reliance on their synergistic interpretation. However, concurrent acquisition of both modalities is often hindered by workflow complexity and cost constraints. We propose an adaptive bimodal framework enabling flexible single-/dual-modality HER2 prediction through three innovation…

    Submitted 12 April, 2025; originally announced June 2025.

    Comments: 7 pages, 5 figures, 3 tables, submitted to the 33rd ACM International Conference on Multimedia (ACM MM 2025)

  5. arXiv:2506.01608  [pdf, ps, other]

    cs.CV cs.AI cs.LG q-bio.OT

    EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models

    Authors: Andy Bonnetto, Haozhe Qi, Franklin Leong, Matea Tashkovska, Mahdi Rad, Solaiman Shokur, Friedhelm Hummel, Silvestro Micera, Marc Pollefeys, Alexander Mathis

    Abstract: Understanding behavior requires datasets that capture humans while carrying out complex tasks. The kitchen is an excellent environment for assessing human motor and cognitive function, as many complex actions are naturally exhibited in kitchens from chopping to cleaning. Here, we introduce the EPFL-Smart-Kitchen-30 dataset, collected in a noninvasive motion capture platform inside a kitchen enviro…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Code and data at: https://github.com/amathislab/EPFL-Smart-Kitchen

  6. arXiv:2505.23247  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Accelerating RLHF Training with Reward Variance Increase

    Authors: Zonglin Yang, Zhexuan Gu, Houduo Qi, Yancheng Yuan

    Abstract: Reinforcement learning from human feedback (RLHF) is an essential technique for ensuring that large language models (LLMs) are aligned with human values and preferences during the post-training phase. As an effective RLHF approach, group relative policy optimization (GRPO) has demonstrated success in many LLM-based applications. However, efficient GRPO-based RLHF training remains a challenge. Rece…

    Submitted 17 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2505.21089  [pdf, other]

    cs.CV

    DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response

    Authors: Junjue Wang, Weihao Xuan, Heli Qi, Zhihao Liu, Kunyi Liu, Yuhan Wu, Hongruixuan Chen, Jian Song, Junshi Xia, Zhuo Zheng, Naoto Yokoya

    Abstract: Large vision-language models (VLMs) have made great achievements in Earth vision. However, complex disaster scenes with diverse disaster types, geographic regions, and satellite sensors have posed new challenges for VLM applications. To fill this gap, we curate a remote sensing vision-language dataset (DisasterM3) for global-scale disaster assessment and response. DisasterM3 includes 26,988 bi-tem…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: A multi-hazard, multi-sensor, and multi-task vision-language dataset for global-scale disaster assessment and response

    ACM Class: I.4.9

  8. arXiv:2505.21076  [pdf, ps, other]

    cs.CV

    DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding

    Authors: Weihao Xuan, Junjue Wang, Heli Qi, Zihang Chen, Zhuo Zheng, Yanfei Zhong, Junshi Xia, Naoto Yokoya

    Abstract: Multimodal large language models have demonstrated remarkable capabilities in visual understanding, but their application to long-term Earth observation analysis remains limited, primarily focusing on single-temporal or bi-temporal imagery. To address this gap, we introduce DVL-Suite, a comprehensive framework for analyzing long-term urban dynamics through remote sensing imagery. Our suite compris…

    Submitted 27 May, 2025; originally announced May 2025.

  9. arXiv:2505.20236  [pdf, ps, other]

    cs.CV

    Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models

    Authors: Weihao Xuan, Qingcheng Zeng, Heli Qi, Junjue Wang, Naoto Yokoya

    Abstract: Uncertainty quantification is essential for assessing the reliability and trustworthiness of modern AI systems. Among existing approaches, verbalized uncertainty, where models express their confidence through natural language, has emerged as a lightweight and interpretable solution in large language models (LLMs). However, its effectiveness in vision-language models (VLMs) remains insufficiently s…

    Submitted 26 May, 2025; originally announced May 2025.

  10. arXiv:2505.14566  [pdf, ps, other]

    cs.LG cs.AI

    KIPPO: Koopman-Inspired Proximal Policy Optimization

    Authors: Andrei Cozma, Landon Harris, Hairong Qi

    Abstract: Reinforcement Learning (RL) has made significant strides in various domains, and policy gradient methods like Proximal Policy Optimization (PPO) have gained popularity due to their balance in performance, training stability, and computational efficiency. These methods directly optimize policies through gradient-based updates. However, developing effective control policies for environments with com…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted for IJCAI 2025. This arXiv submission is the full version of the conference paper, including the appendix and supplementary material omitted from the IJCAI proceedings

  11. arXiv:2505.12668  [pdf, ps, other]

    cs.SE

    Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation

    Authors: Hanzhuo Tan, Xiaolong Tian, Hanrui Qi, Jiaming Liu, Zuchen Gao, Siyi Wang, Qi Luo, Jing Li, Yuqun Zhang

    Abstract: Recent advances in LLM-based decompilers have been shown effective to convert low-level binaries into human-readable source code. However, there still lacks a comprehensive benchmark that provides large-scale binary-source function pairs, which is critical for advancing the LLM decompilation technology. Creating accurate binary-source mappings incurs severe issues caused by complex compilation set…

    Submitted 18 May, 2025; originally announced May 2025.

  12. arXiv:2505.08747  [pdf, other]

    cs.CV cs.AI

    Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion

    Authors: Huiyan Qi, Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Ee-Peng Lim

    Abstract: Nutrition estimation is an important component of promoting healthy eating and mitigating diet-related health risks. Despite advances in tasks such as food classification and ingredient recognition, progress in nutrition estimation is limited due to the lack of datasets with nutritional annotations. To address this issue, we introduce FastFood, a dataset with 84,446 images across 908 fast food cat…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Accepted for publication in ACM International Conference on Multimedia Retrieval 2025

  13. arXiv:2505.03123  [pdf]

    eess.IV cs.CV cs.MM

    STG: Spatiotemporal Graph Neural Network with Fusion and Spatiotemporal Decoupling Learning for Prognostic Prediction of Colorectal Cancer Liver Metastasis

    Authors: Yiran Zhu, Wei Yang, Yan Su, Zesheng Li, Chengchang Pan, Honggang Qi

    Abstract: We propose a multimodal spatiotemporal graph neural network (STG) framework to predict colorectal cancer liver metastasis (CRLM) progression. Current clinical models do not effectively integrate the tumor's spatial heterogeneity, dynamic evolution, and complex multimodal data relationships, limiting their predictive accuracy. Our STG framework combines preoperative CT imaging and clinical data int…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures, 5 tables

  14. arXiv:2505.01135  [pdf, other]

    cs.LG

    Dual-Forecaster: A Multimodal Time Series Model Integrating Descriptive and Predictive Texts

    Authors: Wenfa Wu, Guanyu Zhang, Zheng Tan, Yi Wang, Hongsheng Qi

    Abstract: Most existing single-modal time series models rely solely on numerical series, which suffer from the limitations imposed by insufficient information. Recent studies have revealed that multimodal models can address the core issue by integrating textual information. However, these models focus on either historical or future textual information, overlooking the unique contributions each plays in time…

    Submitted 2 May, 2025; originally announced May 2025.

  15. arXiv:2504.13685  [pdf, other]

    cs.CL cs.LG stat.AP stat.CO

    Deep literature reviews: an application of fine-tuned language models to migration research

    Authors: Stefano M. Iacus, Haodong Qi, Jiyoung Han

    Abstract: This paper presents a hybrid framework for literature reviews that augments traditional bibliometric methods with large language models (LLMs). By fine-tuning open-source LLMs, our approach enables scalable extraction of qualitative insights from large volumes of research content, enhancing both the breadth and depth of knowledge synthesis. To improve annotation efficiency and consistency, we intr…

    Submitted 17 April, 2025; originally announced April 2025.

  16. arXiv:2504.03026  [pdf, other]

    cs.CV

    HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations

    Authors: Yiran Xu, Siqi Xie, Zhuofang Li, Harris Shadmany, Yinxiao Li, Luciano Sbaiz, Miaosen Wang, Junjie Ke, Jose Lezama, Hang Qi, Han Zhang, Jesse Berent, Ming-Hsuan Yang, Irfan Essa, Jia-Bin Huang, Feng Yang

    Abstract: Image retargeting aims to change the aspect-ratio of an image while maintaining its content and structure with less visual artifacts. Existing methods still generate many artifacts or fail to maintain original content or structure. To address this, we introduce HALO, an end-to-end trainable solution for image retargeting. Since humans are more sensitive to distortions in salient areas than non-sal…

    Submitted 3 April, 2025; originally announced April 2025.

  17. arXiv:2504.00762  [pdf, other]

    cs.AI

    Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

    Authors: Jianhao Chen, Zishuo Xun, Bocheng Zhou, Han Qi, Hangfan Zhang, Qiaosheng Zhang, Yang Chen, Wei Hu, Yuzhong Qu, Wanli Ouyang, Shuyue Hu

    Abstract: This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths that potentially arise from diverse training data and paradigms. By using consistency as a si…

    Submitted 8 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  18. arXiv:2503.22747  [pdf, other]

    cs.LG cs.AI cs.ET

    LeForecast: Enterprise Hybrid Forecast by Time Series Intelligence

    Authors: Zheng Tan, Yiwen Nie, Wenfa Wu, Guanyu Zhang, Yanze Liu, Xinyuan Tian, Kailin Gao, Mengya Liu, Qijiang Cheng, Haipeng Jiang, Yingzheng Ma, Wei Zheng, Yuci Zhu, Yuanyuan Sun, Xiangyu Lei, Xiyu Guan, Wanqing Huang, Shouming Liu, Xiangquan Meng, Pengzhan Qu, Chao Yang, Jiaxuan Fan, Yuan He, Hongsheng Qi, Yangzhou Du

    Abstract: Demand is spiking in industrial fields for multidisciplinary forecasting, where a broad spectrum of sectors needs planning and forecasts to streamline intelligent business management, such as demand forecasting, product planning, inventory optimization, etc. Specifically, these tasks expecting intelligent approaches to learn from sequentially collected historical data and then foresee most possibl…

    Submitted 26 March, 2025; originally announced March 2025.

  19. arXiv:2503.18712  [pdf, other]

    cs.CV

    LLaVAction: evaluating and training multi-modal large language models for action recognition

    Authors: Shaokai Ye, Haozhe Qi, Alexander Mathis, Mackenzie W. Mathis

    Abstract: Understanding human behavior requires measuring behavioral actions. Due to its complexity, behavior is best mapped onto a rich, semantic structure such as language. The recent development of multi-modal large language models (MLLMs) is a promising candidate for a wide range of action understanding tasks. In this work, we focus on evaluating and then improving MLLMs to perform action recognition. W…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: https://github.com/AdaptiveMotorControlLab/LLaVAction

  20. arXiv:2503.18223  [pdf, ps, other]

    cs.CV cs.IR q-bio.NC q-bio.QM

    MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps

    Authors: Valentin Gabeff, Haozhe Qi, Brendan Flaherty, Gencer Sumbül, Alexander Mathis, Devis Tuia

    Abstract: Monitoring wildlife is essential for ecology and ethology, especially in light of the increasing human impact on ecosystems. Camera traps have emerged as habitat-centric sensors enabling the study of wildlife populations at scale with minimal disturbance. However, the lack of annotated video datasets limits the development of powerful video understanding models needed to process the vast amount of…

    Submitted 4 June, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: CVPR 2025; Benchmark and code at: https://github.com/eceo-epfl/MammAlps. After submission of v1, we noticed that a few audio files were not correctly aligned with the corresponding video. We fixed the issue, which had little to no impact on performance. We also now report results for three runs

  21. arXiv:2503.10497  [pdf, other]

    cs.CL

    MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation

    Authors: Weihao Xuan, Rui Yang, Heli Qi, Qingcheng Zeng, Yunze Xiao, Aosong Feng, Dairui Liu, Yun Xing, Junjue Wang, Fan Gao, Jinghui Lu, Yuang Jiang, Huitao Li, Xin Li, Kunyu Yu, Ruihai Dong, Shangding Gu, Yuekang Li, Xiaofei Xie, Felix Juefei-Xu, Foutse Khomh, Osamu Yoshie, Qingyu Chen, Douglas Teodoro, Nan Liu, et al. (7 additional authors not shown)

    Abstract: Existing large language model (LLM) evaluation benchmarks primarily focus on English, while current multilingual tasks lack parallel questions that specifically assess cross-linguistic reasoning abilities. This dual limitation makes it challenging to comprehensively assess LLMs' performance in the multilingual setting. To fill this gap, we introduce MMLU-ProX, a comprehensive benchmark covering 29…

    Submitted 26 May, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  22. arXiv:2503.00643  [pdf, other]

    cs.CV cs.AI

    Deep Change Monitoring: A Hyperbolic Representative Learning Framework and a Dataset for Long-term Fine-grained Tree Change Detection

    Authors: Yante Li, Hanwen Qi, Haoyu Chen, Xinlian Liang, Guoying Zhao

    Abstract: In environmental protection, tree monitoring plays an essential role in maintaining and improving ecosystem health. However, precise monitoring is challenging because existing datasets fail to capture continuous fine-grained changes in trees due to low-resolution images and high acquisition costs. In this paper, we introduce UAVTC, a large-scale, long-term, high-resolution dataset collected using…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: 10 pages, 6 figures

  23. arXiv:2503.00052  [pdf]

    cs.CV cs.AI cs.LG

    RURA-Net: A general disease diagnosis method based on Zero-Shot Learning

    Authors: Yan Su, Qiulin Wu, Weizhen Li, Chengchang Pan, Honggang Qi

    Abstract: The training of deep learning models relies on a large amount of labeled data. However, the high cost of medical labeling seriously hinders the development of deep learning in the medical field. Our study proposes a general disease diagnosis approach based on Zero-Shot Learning. The Siamese neural network is used to find similar diseases for the target diseases, and the U-Net segmentation model is…

    Submitted 26 February, 2025; originally announced March 2025.

    Comments: 10 pages, 3 figures, 6 tables, submitted to The 28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025)

  24. arXiv:2502.20224  [pdf]

    eess.IV cs.AI cs.CV

    RURANET++: An Unsupervised Learning Method for Diabetic Macular Edema Based on SCSE Attention Mechanisms and Dynamic Multi-Projection Head Clustering

    Authors: Wei Yang, Yiran Zhu, Jiayu Shen, Yuhan Tang, Chengchang Pan, Hui He, Yan Su, Honggang Qi

    Abstract: Diabetic Macular Edema (DME), a prevalent complication among diabetic patients, constitutes a major cause of visual impairment and blindness. Although deep learning has achieved remarkable progress in medical image analysis, traditional DME diagnosis still relies on extensive annotated data and subjective ophthalmologist assessments, limiting practical applications. To address this, we present RUR…

    Submitted 7 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: 10 pages, 2 figures, 5 tables, submitted to The 28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2025)

  25. arXiv:2502.19692  [pdf]

    eess.IV cs.CV

    A Residual Multi-task Network for Joint Classification and Regression in Medical Imaging

    Authors: Junji Lin, Yi Zhang, Yunyue Pan, Yuli Chen, Chengchang Pan, Honggang Qi

    Abstract: Detection and classification of pulmonary nodules is a challenge in medical image analysis due to the variety of shapes and sizes of nodules and their high concealment. Despite the success of traditional deep learning methods in image classification, deep networks still struggle to perfectly capture subtle changes in lung nodule detection. Therefore, we propose a residual multi-task network (Res-M…

    Submitted 26 February, 2025; originally announced February 2025.

  26. arXiv:2502.19153  [pdf]

    eess.IV cs.CV cs.LG

    RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images

    Authors: Yuhan Tang, Yudian Wang, Weizhen Li, Ye Yue, Chengchang Pan, Honggang Qi

    Abstract: Fundus image quality is crucial for diagnosing eye diseases, but real-world conditions often result in blurred or unreadable images, increasing diagnostic uncertainty. To address these challenges, this study proposes RetinaRegen, a hybrid model for retinal image restoration that integrates a readability classification model, a Diffusion Model, and a Variational Autoencoder (VAE). Experiments on…

    Submitted 27 February, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  27. A Hybrid Cross-Stage Coordination Pre-ranking Model for Online Recommendation Systems

    Authors: Binglei Zhao, Houying Qi, Guang Xu, Mian Ma, Xiwei Zhao, Feng Mei, Sulong Xu, Jinghe Hu

    Abstract: Large-scale recommendation systems often adopt cascading architecture consisting of retrieval, pre-ranking, ranking, and re-ranking stages. With strict latency requirements, pre-ranking utilizes lightweight models to perform a preliminary selection from massive retrieved candidates. However, recent works focus solely on improving consistency with ranking, relying exclusively on downstream stages.…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW 2025

  28. arXiv:2502.09256  [pdf]

    cs.CV cs.AI

    DynSegNet: Dynamic Architecture Adjustment for Adversarial Learning in Segmenting Hemorrhagic Lesions from Fundus Images

    Authors: Zesheng Li, Minwen Liao, Haoran Chen, Yan Su, Chengchang Pan, Honggang Qi

    Abstract: The hemorrhagic lesion segmentation plays a critical role in ophthalmic diagnosis, directly influencing early disease detection, treatment planning, and therapeutic efficacy evaluation. However, the task faces significant challenges due to lesion morphological variability, indistinct boundaries, and low contrast with background tissues. To improve diagnostic accuracy and treatment outcomes, develo…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 12 pages, 4 figures

  29. arXiv:2502.08386  [pdf, other]

    cs.DC cs.NI

    Accelerating Stable Matching between Workers and Spatial-Temporal Tasks for Dynamic MCS: A Stagewise Service Trading Approach

    Authors: Houyi Qi, Minghui Liwang, Xianbin Wang, Liqun Fu, Yiguang Hong, Li Li, Zhipeng Cheng

    Abstract: Designing proper incentives in mobile crowdsensing (MCS) networks represents a critical mechanism in engaging distributed mobile users (workers) to contribute heterogeneous data for diverse applications (tasks). We develop a novel stagewise trading framework to reach efficient and stable matching between tasks and workers, upon considering the diversity of tasks and the dynamism of MCS networks. T…

    Submitted 14 March, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  30. arXiv:2502.08118  [pdf, ps, other]

    cs.DC cs.NI

    Future Resource Bank for ISAC: Achieving Fast and Stable Win-Win Matching for Both Individuals and Coalitions

    Authors: Houyi Qi, Minghui Liwang, Seyyedali Hosseinalipour, Liqun Fu, Sai Zou, Wei Ni

    Abstract: Future wireless networks must support emerging applications where environmental awareness is as critical as data transmission. Integrated Sensing and Communication (ISAC) enables this vision by allowing base stations (BSs) to allocate bandwidth and power to mobile users (MUs) for communications and cooperative sensing. However, this resource allocation is highly challenging due to: (i) dynamic res…

    Submitted 9 July, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  31. arXiv:2502.05434  [pdf, other]

    cs.LG

    Sample-Efficient Reinforcement Learning from Human Feedback via Information-Directed Sampling

    Authors: Han Qi, Haochen Yang, Qiaosheng Zhang, Zhuoran Yang

    Abstract: We study the problem of reinforcement learning from human feedback (RLHF), a critical problem in training large language models, from a theoretical perspective. Our main contribution is the design of novel sample-efficient RLHF algorithms based on information-directed sampling (IDS), an online decision-making principle inspired by information theory. Our algorithms maximize the sum of the value fu…

    Submitted 12 February, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

  32. arXiv:2502.00622  [pdf, other]

    cs.RO cs.CV cs.LG

    Strengthening Generative Robot Policies through Predictive World Modeling

    Authors: Han Qi, Haocheng Yin, Aris Zhu, Yilun Du, Heng Yang

    Abstract: We present generative predictive control (GPC), a learning control framework that (i) clones a generative diffusion-based policy from expert demonstrations, (ii) trains a predictive action-conditioned world model from both expert demonstrations and random explorations, and (iii) synthesizes an online planner that ranks and optimizes the action proposals from (i) by looking ahead into the future us…

    Submitted 21 May, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: Website: https://computationalrobotics.seas.harvard.edu/GPC

  33. arXiv:2501.19017  [pdf, ps, other]

    cs.CL

    Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation

    Authors: Bin Zhu, Huiyan Qi, Yinxuan Gui, Jingjing Chen, Chong-Wah Ngo, Ee-Peng Lim

    Abstract: Multimodal Large Language Models (MLLMs) have exhibited remarkable advancements in integrating different modalities, excelling in complex understanding and generation tasks. Despite their success, MLLMs remain vulnerable to conversational adversarial inputs, particularly negation arguments. This paper systematically evaluates state-of-the-art MLLMs across diverse benchmarks, revealing significant…

    Submitted 1 June, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

  34. arXiv:2501.17206  [pdf, other]

    cs.AI cs.RO

    Integrating Reinforcement Learning and AI Agents for Adaptive Robotic Interaction and Assistance in Dementia Care

    Authors: Fengpei Yuan, Nehal Hasnaeen, Ran Zhang, Bryce Bible, Joseph Riley Taylor, Hairong Qi, Fenghui Yao, Xiaopeng Zhao

    Abstract: This study explores a novel approach to advancing dementia care by integrating socially assistive robotics, reinforcement learning (RL), large language models (LLMs), and clinical domain expertise within a simulated environment. This integration addresses the critical challenge of limited experimental data in socially assistive robotics for dementia care, providing a dynamic simulation environment…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: 18 pages, 12 figures

  35. arXiv:2501.14314  [pdf, other]

    cs.LG

    Graph Feedback Bandits on Similar Arms: With and Without Graph Structures

    Authors: Han Qi, Fei Guo, Li Zhu, Qiaosheng Zhang, Xuelong Li

    Abstract: In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by applications in clinical trials and recommendation systems, we assume that two arms are connected if and only if they are similar (i.e., their means are close to each other). We establish a regret lower bound for this problem under the novel feedback structure and introduce two upper confidence boun…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.11171

  36. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  37. A 3-Step Optimization Framework with Hybrid Models for a Humanoid Robot's Jump Motion

    Authors: Haoxiang Qi, Zhangguo Yu, Xuechao Chen, Yaliang Liu, Chuanku Yi, Chencheng Dong, Fei Meng, Qiang Huang

    Abstract: High dynamic jump motions are challenging tasks for humanoid robots to achieve environment adaptation and obstacle crossing. The trajectory optimization is a practical method to achieve high-dynamic and explosive jumping. This paper proposes a 3-step trajectory optimization framework for generating a jump motion for a humanoid robot. To improve iteration speed and achieve ideal performance, the fr…

    Submitted 21 January, 2025; originally announced January 2025.

  38. arXiv:2501.09279  [pdf]

    cs.AI

    Text Semantics to Flexible Design: A Residential Layout Generation Method Based on Stable Diffusion Model

    Authors: Zijin Qiu, Jiepeng Liu, Yi Xia, Hongtuo Qi, Pengkun Liu

    Abstract: Flexibility in the AI-based residential layout design remains a significant challenge, as traditional methods like rule-based heuristics and graph-based generation often lack flexibility and require substantial design knowledge from users. To address these limitations, we propose a cross-modal design approach based on the Stable Diffusion model for generating flexible residential layouts. The meth…

    Submitted 15 January, 2025; originally announced January 2025.

  39. arXiv:2501.05439  [pdf, other]

    cs.RO cs.AI cs.LG

    From Simple to Complex Skills: The Case of In-Hand Object Reorientation

    Authors: Haozhi Qi, Brent Yi, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

    Abstract: Learning policies in simulation and transferring them to the real world has become a promising approach in dexterous manipulation. However, bridging the sim-to-real gap for each new task requires substantial human effort, such as careful reward engineering, hyperparameter tuning, and system identification. In this work, we present a system that leverages low-level skills to address these challenge…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: website: https://dexhier.github.io

  40. arXiv:2501.00059  [pdf, other]

    cs.CL cs.AI

    Large Language Models for Mathematical Analysis

    Authors: Ziye Chen, Hao Qi

    Abstract: Mathematical problem-solving is a key field in artificial intelligence (AI) and a critical benchmark for evaluating the capabilities of large language models (LLMs). While extensive research has focused on mathematical problem-solving, most existing work and datasets concentrate on computational tasks, leaving gaps in areas like mathematical analysis, which demands rigorous proofs and formal reaso…

    Submitted 28 December, 2024; originally announced January 2025.

  41. arXiv:2412.14222  [pdf, other]

    cs.AI cs.CL cs.LG stat.OT

    A Survey on Large Language Model-based Agents for Statistics and Data Science

    Authors: Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

    Abstract: In recent years, data science agents powered by Large Language Models (LLMs), known as "data agents," have shown significant potential to transform the traditional data analysis paradigm. This survey provides an overview of the evolution, capabilities, and applications of LLM-based data agents, highlighting their role in simplifying complex data tasks and lowering the entry barrier for users witho…

    Submitted 18 December, 2024; originally announced December 2024.

  42. arXiv:2412.02270  [pdf, other]

    cs.CV cs.AI

    Sustainable Self-evolution Adversarial Training

    Authors: Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang

    Abstract: With the wide application of deep neural network models in various computer vision tasks, there has been a proliferation of adversarial example generation strategies aimed at deeply exploring model security. However, existing adversarial training defense models, which rely on single or limited types of attacks under a one-time learning process, struggle to adapt to the dynamic and evolving nature…

    Submitted 3 December, 2024; originally announced December 2024.

    Comments: Accepted to ACMMM 2024

  43. arXiv:2411.13076  [pdf, other]

    cs.CV

    Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving

    Authors: Hao Zhou, Zhanning Gao, Maosheng Ye, Zhili Chen, Qifeng Chen, Tongyi Cao, Honggang Qi

    Abstract: In light of the dynamic nature of autonomous driving environments and stringent safety requirements, general MLLMs combined with CLIP alone often struggle to represent driving-specific scenarios accurately, particularly in complex interactions and long-tail cases. To address this, we propose the Hints of Prompt (HoP) framework, which introduces three key enhancements: Affinity hint to emphasize in…

    Submitted 20 November, 2024; originally announced November 2024.

  44. arXiv:2411.07579  [pdf, other]

    cs.CV cs.GR

    Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation

    Authors: Han Qi, Tao Cai, Xiyue Han

    Abstract: Recently, 3D Gaussian Splatting has dominated novel-view synthesis with its real-time rendering speed and state-of-the-art rendering quality. However, during the rendering process, the use of the Jacobian of the affine approximation of the projection transformation leads to inevitable errors, resulting in blurriness, artifacts and a lack of scene consistency in the final rendered images. To addres…

    Submitted 14 November, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

  45. arXiv:2411.02479  [pdf, other]

    cs.RO cs.AI cs.LG

    Digitizing Touch with an Artificial Multimodal Fingertip

    Authors: Mike Lambeta, Tingfan Wu, Ali Sengul, Victoria Rose Most, Nolan Black, Kevin Sawyer, Romeo Mercado, Haozhi Qi, Alexander Sohn, Byron Taylor, Norb Tydingco, Gregg Kammerer, Dave Stroud, Jake Khatha, Kurt Jenkins, Kyle Most, Neal Stein, Ricardo Chavira, Thomas Craven-Bartle, Eric Sanchez, Yitian Ding, Jitendra Malik, Roberto Calandra

    Abstract: Touch is a crucial sensing modality that provides rich information about object properties and interactions with the physical environment. Humans and robots both benefit from using touch to perceive and interact with the surrounding environment (Johansson and Flanagan, 2009; Li et al., 2020; Calandra et al., 2017). However, no existing systems provide rich, multi-modal digital touch-sensing capabi…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: 28 pages

    ACM Class: I.2.0; I.2.9

  46. arXiv:2410.10323  [pdf, other]

    cs.CL

    MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media

    Authors: Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui Fu

    Abstract: Given the prevalence of mental health challenges, social media has emerged as a key platform for individuals to express their emotions. Deep learning tends to be a promising solution for analyzing mental health on social media. However, black box models are often inflexible when switching between tasks, and their results typically lack explanations. With the rise of large language models (LLMs), their…

    Submitted 14 October, 2024; originally announced October 2024.

  47. arXiv:2410.05063  [pdf, other]

    cs.LG cs.CV cs.RO

    Control-oriented Clustering of Visual Latent Representation

    Authors: Han Qi, Haocheng Yin, Heng Yang

    Abstract: We initiate a study of the geometry of the visual representation space -- the information channel from the vision encoder to the action decoder -- in an image-based control pipeline learned from behavior cloning. Inspired by the phenomenon of neural collapse (NC) in image classification (arXiv:2008.08186), we empirically demonstrate the prevalent emergence of a similar law of clustering in the vis…

    Submitted 5 February, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Website: https://computationalrobotics.seas.harvard.edu/ControlOriented_NC

  48. arXiv:2410.01577  [pdf, other]

    cs.CV cs.LG

    Coordinate-Based Neural Representation Enabling Zero-Shot Learning for 3D Multiparametric Quantitative MRI

    Authors: Guoyan Lao, Ruimin Feng, Haikun Qi, Zhenfeng Lv, Qiangqiang Liu, Chunlei Liu, Yuyao Zhang, Hongjiang Wei

    Abstract: Quantitative magnetic resonance imaging (qMRI) offers tissue-specific physical parameters with significant potential for neuroscience research and clinical practice. However, lengthy scan times for 3D multiparametric qMRI acquisition limit its clinical utility. Here, we propose SUMMIT, an innovative imaging methodology that includes data acquisition and an unsupervised reconstruction for simultane…

    Submitted 2 October, 2024; originally announced October 2024.

  49. arXiv:2409.16843  [pdf, other]

    stat.AP cs.LG

    Optimal starting point for time series forecasting

    Authors: Yiming Zhong, Yinuo Ren, Guangyao Cao, Feng Li, Haobo Qi

    Abstract: Recent advances in time series forecasting mainly focus on improving the forecasting models themselves. However, when the time series data suffer from potential structural breaks or concept drifts, the forecasting performance might be significantly reduced. In this paper, we introduce a novel approach called Optimal Starting Point Time Series Forecast (OSP-TSP) for optimal forecasting, which can b…

    Submitted 8 February, 2025; v1 submitted 25 September, 2024; originally announced September 2024.

  50. arXiv:2409.08273  [pdf, other]

    cs.RO cs.AI cs.CV

    Hand-Object Interaction Pretraining from Videos

    Authors: Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik

    Abstract: We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic ba…

    Submitted 12 September, 2024; originally announced September 2024.