
Showing 1–50 of 684 results for author: Ji, H

  1. arXiv:2507.02092  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CV

    Energy-Based Transformers are Scalable Learners and Thinkers

    Authors: Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Peixuan Han, Hyeonjeong Ha, Aman Chadha, Yilun Du, Heng Ji, Jundong Li, Tariq Iqbal

    Abstract: Inference-time computation techniques, analogous to human System 2 Thinking, have recently become popular for improving model performances. However, most existing approaches suffer from several limitations: they are modality-specific (e.g., working only in text), problem-specific (e.g., verifiable domains like math and coding), or require additional supervision/training on top of unsupervised pret…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2507.01663  [pdf, ps, other]

    cs.LG cs.AI

    AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training

    Authors: Zhenyu Han, Ansheng You, Haibo Wang, Kui Luo, Guang Yang, Wenqi Shi, Menglong Chen, Sicheng Zhang, Zeshun Lan, Chunshi Deng, Huazhong Ji, Wenjie Liu, Yu Huang, Yixiang Zhang, Chenyi Pan, Jing Wang, Xin Huang, Chunsheng Li, Jianping Wu

    Abstract: Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. Moreover, most existing frameworks are tightly coupled w…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2506.23918  [pdf, ps, other]

    cs.CV

    Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

    Authors: Zhaochen Su, Peng Xia, Hangyu Guo, Zhenhua Liu, Yan Ma, Xiaoye Qu, Jiaqi Liu, Yanshu Li, Kaide Zeng, Zhengyuan Yang, Linjie Li, Yu Cheng, Heng Ji, Junxian He, Yi R. Fung

    Abstract: Recent progress in multimodal reasoning has been significantly advanced by textual Chain-of-Thought (CoT), a paradigm where models conduct reasoning within language. This text-centric approach, however, treats vision as a static, initial context, creating a fundamental "semantic gap" between rich perceptual data and discrete symbolic thought. Human cognition often transcends language, utilizing vi…

    Submitted 3 July, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

    Comments: Preprint in progress. We maintain a real-time GitHub repository tracking progress at: https://github.com/zhaochen0110/Awesome_Think_With_Images

  4. arXiv:2506.23445  [pdf]

    cond-mat.mtrl-sci cond-mat.str-el

    Topotactic phase transformation in correlated vanadium dioxide through oxygen vacancy ordering

    Authors: Xuanchi Zhou, Xiaohui Yao, Xiaomei Qiao, Jiahui Ji, Guowei Zhou, Huihui Ji, Xiaohong Xu

    Abstract: Controlling the insulator-metal transition (IMT) in correlated oxide system through oxygen vacancy ordering opens up a new paradigm for exploring exotic structural transformation and physical functionality. Oxygen vacancy serves as a powerful tuning knob for adjusting the IMT property in VO2, though driving topochemical reduction to V2O3 remains challenging due to structural incompatibility and co…

    Submitted 29 June, 2025; originally announced June 2025.

  5. arXiv:2506.20949  [pdf, ps, other]

    cs.AI cs.CL

    Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation

    Authors: Chenkai Sun, Denghui Zhang, ChengXiang Zhai, Heng Ji

    Abstract: Given the growing influence of language model-based agents on high-stakes societal decisions, from public policy to healthcare, ensuring their beneficial impact requires understanding the far-reaching implications of their suggestions. We propose a proof-of-concept framework that projects how model-generated advice could propagate through societal systems on a macroscopic scale over time, enabling…

    Submitted 25 June, 2025; originally announced June 2025.

  6. arXiv:2506.07459  [pdf, ps, other]

    cs.LG q-bio.QM

    ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning

    Authors: Ziwen Wang, Jiajun Fan, Ruihan Guo, Thao Nguyen, Heng Ji, Ge Liu

    Abstract: Protein generative models have shown remarkable promise in protein design but still face limitations in success rate, due to the scarcity of high-quality protein datasets for supervised pretraining. We present ProteinZero, a novel framework that enables scalable, automated, and continuous self-improvement of the inverse folding model through online reinforcement learning. To achieve computationall…

    Submitted 10 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  7. arXiv:2506.07413  [pdf, ps, other]

    cs.LG cs.CV

    Variational Supervised Contrastive Learning

    Authors: Ziwen Wang, Jiajun Fan, Thao Nguyen, Heng Ji, Ge Liu

    Abstract: Contrastive learning has proven to be highly efficient and adaptable in shaping representation spaces across diverse modalities by pulling similar samples together and pushing dissimilar ones apart. However, two key limitations persist: (1) Without explicit regulation of the embedding distribution, semantically related instances can inadvertently be pushed apart unless complementary signals guide…

    Submitted 26 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  8. arXiv:2506.06972  [pdf, ps, other]

    cs.CL

    Atomic Reasoning for Scientific Table Claim Verification

    Authors: Yuji Zhang, Qingyun Wang, Cheng Qian, Jiateng Liu, Chenkai Sun, Denghui Zhang, Tarek Abdelzaher, Chengxiang Zhai, Preslav Nakov, Heng Ji

    Abstract: Scientific texts often convey authority due to their technical language and complex data. However, this complexity can sometimes lead to the spread of misinformation. Non-experts are particularly susceptible to misleading claims based on scientific tables due to their high information density and perceived credibility. Existing table claim verification models, including state-of-the-art large lang…

    Submitted 7 June, 2025; originally announced June 2025.

  9. arXiv:2506.05869  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    Loss Functions for Predictor-based Neural Architecture Search

    Authors: Han Ji, Yuqi Feng, Jiahao Fan, Yanan Sun

    Abstract: Evaluation is a critical but costly procedure in neural architecture search (NAS). Performance predictors have been widely adopted to reduce evaluation costs by directly estimating architecture performance. The effectiveness of predictors is heavily influenced by the choice of loss functions. While traditional predictors employ regression loss functions to evaluate the absolute accuracy of archite…

    Submitted 6 June, 2025; originally announced June 2025.

  10. arXiv:2506.05297  [pdf, ps, other]

    eess.IV cs.CV

    DM-SegNet: Dual-Mamba Architecture for 3D Medical Image Segmentation with Global Context Modeling

    Authors: Hangyu Ji

    Abstract: Accurate 3D medical image segmentation demands architectures capable of reconciling global context modeling with spatial topology preservation. While State Space Models (SSMs) like Mamba show potential for sequence modeling, existing medical SSMs suffer from encoder-decoder incompatibility: the encoder's 1D sequence flattening compromises spatial structures, while conventional decoders fail to lev…

    Submitted 5 June, 2025; originally announced June 2025.

  11. arXiv:2506.05021  [pdf]

    cond-mat.mtrl-sci physics.chem-ph

    Mechanistic Insights into Water-Splitting, Proton Migration, and Hydrogen Evolution Reaction in g-C3N4/TiO2-B and Li-F co-doped Heterostructures

    Authors: Shuhan Tang, Qi Jiang, Shuang Qiu, Hanyang Ji, Xiaojie Liu

    Abstract: Solar water splitting has received a lot of attention due to its high efficiency and clean energy production potential. Herein, based on the band alignment principle, the g-C3N4/TiO2-B(001) heterostructure is strategically designed, then a Li-F co-doping approach is developed and implemented, leading to significant enhancement in the photocatalytic hydrogen evolution efficiency of the heterostruct…

    Submitted 5 June, 2025; originally announced June 2025.

  12. arXiv:2506.04001  [pdf, ps, other]

    cs.LG cs.AI

    CARL: Causality-guided Architecture Representation Learning for an Interpretable Performance Predictor

    Authors: Han Ji, Yuqi Feng, Jiahao Fan, Yanan Sun

    Abstract: Performance predictors have emerged as a promising method to accelerate the evaluation stage of neural architecture search (NAS). These predictors estimate the performance of unseen architectures by learning from the correlation between a small set of trained architectures and their performance. However, most existing predictors ignore the inherent distribution shift between limited training sampl…

    Submitted 4 June, 2025; originally announced June 2025.

  13. arXiv:2506.02167  [pdf, other]

    cs.CV cs.AI

    Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360-Degree Firefighting Videos

    Authors: Aditi Tiwari, Farzaneh Masoud, Dac Trong Nguyen, Jill Kraft, Heng Ji, Klara Nahrstedt

    Abstract: Modern AI systems struggle most in environments where reliability is critical - scenes with smoke, poor visibility, and structural deformation. Each year, tens of thousands of firefighters are injured on duty, often due to breakdowns in situational perception. We introduce Fire360, a benchmark for evaluating perception and reasoning in safety-critical firefighting scenarios. The dataset includes 2…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 20 pages, 9 figures, 6 tables

  14. arXiv:2506.00886  [pdf, ps, other]

    cs.AI

    Toward a Theory of Agents as Tool-Use Decision-Makers

    Authors: Hongru Wang, Cheng Qian, Manling Li, Jiahao Qiu, Boyang Xue, Mengdi Wang, Heng Ji, Kam-Fai Wong

    Abstract: As Large Language Models (LLMs) evolve into increasingly autonomous agents, fundamental questions about their epistemic foundations remain unresolved: What defines an agent? How should it make decisions? And what objectives should guide its behavior? In this position paper, we argue that true autonomy requires agents to be grounded in a coherent epistemic framework that governs what they know, wha…

    Submitted 1 June, 2025; originally announced June 2025.

  15. arXiv:2506.00671  [pdf, ps, other]

    cs.CL

    DeepRAG: Integrating Hierarchical Reasoning and Process Supervision for Biomedical Multi-Hop QA

    Authors: Yuelyu Ji, Hang Zhang, Shiven Verma, Hui Ji, Chun Li, Yushui Han, Yanshan Wang

    Abstract: We propose DeepRAG, a novel framework that integrates DeepSeek hierarchical question decomposition capabilities with RAG Gym unified retrieval-augmented generation optimization using process level supervision. Targeting the challenging MedHopQA biomedical question answering task, DeepRAG systematically decomposes complex queries into precise sub-queries and employs concept level reward signals inf…

    Submitted 31 May, 2025; originally announced June 2025.

  16. arXiv:2505.22379  [pdf, other]

    physics.flu-dyn

    Dynamics of thin film flows on a vertical fibre with vapor absorption

    Authors: Souradip Chattopadhyay, Zihao Yu, Y. Sungtaek Ju, Hangjie Ji

    Abstract: Water vapor capture through free surface flows plays a crucial role in various industrial applications, such as liquid desiccant air conditioning systems, water harvesting, and dewatering. This paper studies the dynamics of a silicone liquid sorbent (also known as water-absorbing silicone oil) flowing down a vertical cylindrical fibre while absorbing water vapor. We propose a one-sided thin-film-t…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 32 pages, 13 figures

  17. arXiv:2505.21397  [pdf, ps, other]

    cs.CL

    DecisionFlow: Advancing Large Language Model as Principled Decision Maker

    Authors: Xiusi Chen, Shanyong Wang, Cheng Qian, Hongru Wang, Peixuan Han, Heng Ji

    Abstract: In high-stakes domains such as healthcare and finance, effective decision-making demands not just accurate outcomes but transparent and explainable reasoning. However, current language models often lack the structured deliberation needed for such tasks, instead generating decisions and justifications in a disconnected, post-hoc manner. To address this, we propose DecisionFlow, a novel decision mod…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 24 pages, 13 figures

  18. arXiv:2505.20759  [pdf, ps, other]

    cs.CV cs.AI

    PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding

    Authors: Ansel Blume, Jeonghwan Kim, Hyeonjeong Ha, Elen Chatikyan, Xiaomeng Jin, Khanh Duy Nguyen, Nanyun Peng, Kai-Wei Chang, Derek Hoiem, Heng Ji

    Abstract: Real-world objects are composed of distinctive, object-specific parts. Identifying these parts is key to performing fine-grained, compositional reasoning-yet, large multimodal models (LMMs) struggle to perform this seemingly straightforward task. In this work, we introduce PARTONOMY, an LMM benchmark designed for pixel-level part grounding. We construct PARTONOMY from existing part datasets and ou…

    Submitted 15 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: 18 pages

  19. arXiv:2505.20067  [pdf, ps, other]

    cs.SI cs.AI cs.CY

    Community Moderation and the New Epistemology of Fact Checking on Social Media

    Authors: Isabelle Augenstein, Michiel Bakker, Tanmoy Chakraborty, David Corney, Emilio Ferrara, Iryna Gurevych, Scott Hale, Eduard Hovy, Heng Ji, Irene Larraz, Filippo Menczer, Preslav Nakov, Paolo Papotti, Dhruv Sahnan, Greta Warren, Giovanni Zagni

    Abstract: Social media platforms have traditionally relied on internal moderation teams and partnerships with independent fact-checking organizations to identify and flag misleading content. Recently, however, platforms including X (formerly Twitter) and Meta have shifted towards community-driven content moderation by launching their own versions of crowd-sourced fact-checking -- Community Notes. If effecti…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 1 Figure, 2 tables

  20. arXiv:2505.16832  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization

    Authors: Haonian Ji, Shi Qiu, Siyang Xin, Siwei Han, Zhaorun Chen, Dake Zhang, Hongyi Wang, Huaxiu Yao

    Abstract: While foundation models (FMs), such as diffusion models and large vision-language models (LVLMs), have been widely applied in educational contexts, their ability to generate pedagogically effective visual explanations remains limited. Most existing approaches focus primarily on textual reasoning, overlooking the critical role of structured and interpretable visualizations in supporting conceptual…

    Submitted 27 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 16 pages; 7 figures

  21. arXiv:2505.15181  [pdf]

    cond-mat.str-el

    Manipulating the hydrogen-induced insulator-metal transition through artificial microstructure engineering

    Authors: Xuanchi Zhou, Xiaohui Yao, Wentian Lu, Jinjian Guo, Jiahui Ji, Lili Lang, Guowei Zhou, Chunwei Yao, Xiaomei Qiao, Huihui Ji, Zhe Yuan, Xiaohong Xu

    Abstract: Hydrogen-associated filling-controlled Mottronics within electron-correlated system provides a groundbreaking paradigm to explore exotic physical functionality and phenomena. Dynamically controlling hydrogen-induced phase transitions through external fields offers a promising route for designing protonic devices in multidisciplinary fields, but faces high-speed bottlenecks owing to slow bulk diffu…

    Submitted 21 May, 2025; originally announced May 2025.

  22. arXiv:2505.15068  [pdf, other]

    cs.AI cs.CL cs.LG

    ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges

    Authors: Cheng Qian, Hongyi Du, Hongru Wang, Xiusi Chen, Yuji Zhang, Avirup Sil, Chengxiang Zhai, Kathleen McKeown, Heng Ji

    Abstract: Recent progress in large language models (LLMs) has enabled substantial advances in solving mathematical problems. However, existing benchmarks often fail to reflect the complexity of real-world problems, which demand open-ended, interdisciplinary reasoning and integration of computational tools. To address this gap, we introduce ModelingBench, a novel benchmark featuring real-world-inspired, open…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 36 Pages, 26 Figures, 5 Tables

  23. arXiv:2505.12565  [pdf, ps, other]

    cs.AI cs.CL cs.LG q-bio.QM

    mCLM: A Function-Infused and Synthesis-Friendly Modular Chemical Language Model

    Authors: Carl Edwards, Chi Han, Gawon Lee, Thao Nguyen, Bowen Jin, Chetan Kumar Prasad, Sara Szymkuć, Bartosz A. Grzybowski, Ying Diao, Jiawei Han, Ge Liu, Hao Peng, Martin D. Burke, Heng Ji

    Abstract: Despite their ability to understand chemical knowledge and accurately generate sequential representations, large language models (LLMs) remain limited in their capacity to propose novel molecules with drug-like properties. In addition, the molecules that LLMs propose can often be challenging to make in the lab. To more effectively enable the discovery of functional small molecules, LLMs need to le…

    Submitted 18 May, 2025; originally announced May 2025.

  24. arXiv:2505.11961  [pdf, ps, other]

    math.NA

    An Immersed Finite Element Method for Anisotropic Elliptic Interface Problems with Nonhomogeneous Jump Conditions

    Authors: Haifeng Ji, Zhilin Li

    Abstract: A new finite element method (FEM) using meshes that do not necessarily align with the interface is developed for two- and three-dimensional anisotropic elliptic interface problems with nonhomogeneous jump conditions. The degrees of freedom of the proposed method are the same as those of traditional nonconforming FEMs, while the function space is modified to account for the jump conditions of the s…

    Submitted 17 May, 2025; originally announced May 2025.

    MSC Class: 65N15; 65N30; 35R05

  25. arXiv:2505.08971  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training

    Authors: Yangyi Chen, Hao Peng, Tong Zhang, Heng Ji

    Abstract: In standard large vision-language models (LVLMs) pre-training, the model typically maximizes the joint probability of the caption conditioned on the image via next-token prediction (NTP); however, since only a small subset of caption tokens directly relates to the visual content, this naive NTP unintentionally fits the model to noise and increases the risk of hallucination. We present PRIOR, a sim…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: The code will be available at https://github.com/Yangyi-Chen/PRIOR

  26. arXiv:2505.08162  [pdf, ps, other]

    cs.CR

    GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM

    Authors: Hengyu Ding, Houran Ji, Jia Li, Jinhang Chen, Chin-Wing Sham, Yao Wang

    Abstract: With the rapid advancement of quantum computing technology, post-quantum cryptography (PQC) has emerged as a pivotal direction for next-generation encryption standards. Among these, lattice-based cryptographic schemes rely heavily on the fast Number Theoretic Transform (NTT) over polynomial rings, whose performance directly determines encryption/decryption throughput and energy efficiency. However…

    Submitted 12 May, 2025; originally announced May 2025.

  27. arXiv:2505.07849  [pdf, ps, other]

    cs.SE cs.AI cs.IR

    SweRank: Software Issue Localization with Code Ranking

    Authors: Revanth Gangi Reddy, Tarun Suresh, JaeHyeok Doo, Ye Liu, Xuan Phi Nguyen, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Heng Ji, Shafiq Joty

    Abstract: Software issue localization, the task of identifying the precise code locations (files, classes, or functions) relevant to a natural language issue description (e.g., bug report, feature request), is a critical yet time-consuming aspect of software development. While recent LLM-based agentic approaches demonstrate promise, they often incur significant latency and cost due to complex multi-step rea…

    Submitted 7 May, 2025; originally announced May 2025.

  28. arXiv:2505.07775  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    Must Read: A Systematic Survey of Computational Persuasion

    Authors: Nimet Beyza Bozdag, Shuhaib Mehri, Xiaocheng Yang, Hyeonjeong Ha, Zirui Cheng, Esin Durmus, Jiaxuan You, Heng Ji, Gokhan Tur, Dilek Hakkani-Tür

    Abstract: Persuasion is a fundamental aspect of communication, influencing decision-making across diverse contexts, from everyday conversations to high-stakes scenarios such as politics, marketing, and law. The rise of conversational AI systems has significantly expanded the scope of persuasion, introducing both opportunities and risks. AI-driven persuasion can be leveraged for beneficial applications, but…

    Submitted 12 May, 2025; originally announced May 2025.

  29. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  30. arXiv:2505.06566  [pdf]

    cs.CV

    Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search

    Authors: Zequn Xie, Haoming Ji, Lingwei Meng

    Abstract: Text-to-image person search aims to identify an individual based on a text description. To reduce data collection costs, large-scale text-image datasets are created from co-occurrence pairs found online. However, this can introduce noise, particularly mismatched pairs, which degrade retrieval performance. Existing methods often focus on negative samples, amplifying this noise. To address these iss…

    Submitted 10 May, 2025; originally announced May 2025.

  31. arXiv:2505.02784  [pdf, other]

    cs.CV

    Advances in Automated Fetal Brain MRI Segmentation and Biometry: Insights from the FeTA 2024 Challenge

    Authors: Vladyslav Zalevskyi, Thomas Sanchez, Misha Kaandorp, Margaux Roulet, Diego Fajardo-Rojas, Liu Li, Jana Hutter, Hongwei Bran Li, Matthew Barkovich, Hui Ji, Luca Wilhelmi, Aline Dändliker, Céline Steger, Mériam Koob, Yvan Gomez, Anton Jakovčić, Melita Klaić, Ana Adžić, Pavel Marković, Gracia Grabarić, Milan Rados, Jordina Aviles Verdera, Gregor Kasprian, Gregor Dovjak, Raphael Gaubert-Rachmühl, et al. (45 additional authors not shown)

    Abstract: Accurate fetal brain tissue segmentation and biometric analysis are essential for studying brain development in utero. The FeTA Challenge 2024 advanced automated fetal brain MRI analysis by introducing biometry prediction as a new task alongside tissue segmentation. For the first time, our diverse multi-centric test set included data from a new low-field (0.55T) MRI dataset. Evaluation metrics wer…

    Submitted 8 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  32. arXiv:2505.02387  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    RM-R1: Reward Modeling as Reasoning

    Authors: Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, Heng Ji

    Abstract: Reward modeling is essential for aligning large language models with human preferences through reinforcement learning from human feedback. To provide accurate reward signals, a reward model (RM) should stimulate deep thinking and conduct interpretable reasoning before assigning a score or a judgment. Inspired by recent advances of long chain-of-thought on reasoning-intensive tasks, we hypothesize…

    Submitted 17 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

    Comments: 25 pages, 8 figures

  33. arXiv:2505.02332  [pdf, other]

    physics.plasm-ph

    Record Magnetic Field Generation by Laser-Driven Capacitor-Coil Targets

    Authors: Lan Gao, Yang Zhang, Hantao Ji, Brandon K. Russell, Geoffrey Pomraning, Jesse Griff-McMahon, Sallee Klein, Carolyn Kuranz, Mingsheng Wei

    Abstract: Magnetic fields generated by capacitor-coil targets driven by intense short-pulse lasers have been characterized using ultrafast proton radiography. A 1-kJ, 15-ps laser at a center wavelength of 1053 nm irradiated the back plate of the capacitor with an intensity of $\sim 8.3 \times 10^{18}$ W/cm$^{2}$, creating ultra large currents in the connecting coils. High-quality proton data obtained i…

    Submitted 4 May, 2025; originally announced May 2025.

  34. arXiv:2505.02326  [pdf, other]

    physics.plasm-ph

    Determining Magnetic and Electric Field Generations in Laser-Driven Coil Targets

    Authors: Yang Zhang, Lan Gao, Hantao Ji, Brandon K. Russell, Geoffrey Pomraning, Jesse Griff-McMahon, Sallee Klein, Carolyn Kuranz, Mingsheng Wei

    Abstract: Laser-driven capacitor coils are widely used to generate intense magnetic fields for various applications in high-energy-density physics research. Accurate measurement of the magnetic fields is essential but challenging, due to the overlapping contributions from magnetic and electric fields in proton radiography, which is the primary tool diagnosing the field generation around the coils. In this s…

    Submitted 4 May, 2025; originally announced May 2025.

  35. arXiv:2504.20314  [pdf, other]

    cs.LG cs.AI

    Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

    Authors: Qitao Tan, Sung-En Chang, Rui Xia, Huidong Ji, Chence Yang, Ci Zhang, Jun Liu, Zheng Zhan, Zhou Zou, Yanzhi Wang, Jin Lu, Geng Yuan

    Abstract: Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-ignored challenge. ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties and even makes it infeasible for hardware platforms,…

    Submitted 28 April, 2025; originally announced April 2025.

  36. arXiv:2504.18838  [pdf, other]

    cs.CL

    Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

    Authors: Yixin Cao, Shibo Hong, Xinze Li, Jiahao Ying, Yubo Ma, Haiyuan Liang, Yantao Liu, Zijun Yao, Xiaozhi Wang, Dan Huang, Wenxuan Zhang, Lifu Huang, Muhao Chen, Lei Hou, Qianru Sun, Xingjun Ma, Zuxuan Wu, Min-Yen Kan, David Lo, Qi Zhang, Heng Ji, Jing Jiang, Juanzi Li, Aixin Sun, Xuanjing Huang, et al. (2 additional authors not shown)

    Abstract: Large Language Models (LLMs) are advancing at an amazing speed and have become indispensable across academia, industry, and daily applications. To keep pace with the status quo, this survey probes the core challenges that the rise of LLMs poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around…

    Submitted 26 April, 2025; originally announced April 2025.

  37. arXiv:2504.17040  [pdf, other]

    cs.CV cs.AI

    DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs

    Authors: Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong, Silvio Savarese, Heng Ji, Ran Xu

    Abstract: We present DyMU, an efficient, training-free framework that dynamically reduces the computational burden of vision-language models (VLMs) while maintaining high task performance. Our approach comprises two key components. First, Dynamic Token Merging (DToMe) reduces the number of visual token embeddings by merging similar tokens based on image complexity, addressing the inherent inefficiency of fi…

    Submitted 10 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  38. arXiv:2504.16939  [pdf, other]

    cs.AI cs.CL

    A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions

    Authors: Emre Can Acikgoz, Cheng Qian, Hongru Wang, Vardhan Dongre, Xiusi Chen, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur

    Abstract: Recent advances in Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users. Yet, fundamental questions about their capabilities, limitations, and paths forward remain open. This survey paper presents a desideratum for next-generation Conversa…

    Submitted 7 April, 2025; originally announced April 2025.

  39. arXiv:2504.14870  [pdf, ps, other]

    cs.AI cs.CL

    Acting Less is Reasoning More! Teaching Model to Act Efficiently

    Authors: Hongru Wang, Cheng Qian, Wanjun Zhong, Xiusi Chen, Jiahao Qiu, Shijue Huang, Bowen Jin, Mengdi Wang, Kam-Fai Wong, Heng Ji

    Abstract: Tool-integrated reasoning (TIR) augments large language models (LLMs) with the ability to invoke external tools during long-form reasoning, such as search engines and code interpreters, to solve tasks beyond the capabilities of internal reasoning. While reinforcement learning (RL) has shown promise in training such agents, most of existing approaches typically optimize only for final correctness w…

    Submitted 31 May, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  40. arXiv:2504.14574  [pdf]

    physics.ed-ph physics.optics

    Utilizing Optic Fiber Interferometry in Forced Vibration Experimentation for Educational Purposes

    Authors: Mingyuan Wang, Manli Zhou, Hengda Ji, Tao Lan

    Abstract: This study introduces an experimental teaching method that employs optic fiber interferometry (OFI) to investigate forced vibration phenomena. It is designed for undergraduate physics majors with foundational mechanics and optics training and optics-focused graduate students. This approach aims to deepen students' understanding of forced vibration theory and interferometric measurement principles…

    Submitted 20 April, 2025; originally announced April 2025.

  41. arXiv:2504.13958  [pdf, other]

    cs.LG cs.AI cs.CL

    ToolRL: Reward is All Tool Learning Needs

    Authors: Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji

    Abstract: Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios. Recent advancements in reinforcement learning (RL), particularly with R1-like models, have demonstrated promising reasoning and generalization abilities. Yet, reward design for tool use presents unique ch…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 19 Pages, 12 Figures, 12 Tables

  42. arXiv:2504.13460  [pdf, other]

    cs.CV cs.AI

    Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization

    Authors: Hongwei Ji, Wulian Yun, Mengshi Qi, Huadong Ma

    Abstract: Traditional temporal action localization (TAL) methods rely on large amounts of detailed annotated data, whereas few-shot TAL reduces this dependence by using only a few training samples to identify unseen action categories. However, existing few-shot TAL methods typically focus solely on video-level information, neglecting textual information, which can provide valuable semantic support for the l…

    Submitted 6 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  43. arXiv:2504.12643  [pdf, ps, other]

    cs.CV

    RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding

    Authors: Hang Ji, Tao Ni, Xufeng Huang, Zhan Shi, Tao Luo, Xin Zhan, Junbo Chen

    Abstract: This technical report introduces a targeted improvement to the StreamPETR framework, specifically aimed at enhancing velocity estimation, a critical factor influencing the overall NuScenes Detection Score. While StreamPETR exhibits strong 3D bounding box detection performance, as reflected by its high mean Average Precision, our analysis identified velocity estimation as a substantial bottleneck whe…

    Submitted 6 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  44. arXiv:2504.10707  [pdf]

    physics.geo-ph cs.LG

    Distinct hydrologic response patterns and trends worldwide revealed by physics-embedded learning

    Authors: Haoyu Ji, Yalan Song, Tadd Bindas, Chaopeng Shen, Yuan Yang, Ming Pan, Jiangtao Liu, Farshid Rahmani, Ather Abbas, Hylke Beck, Kathryn Lawson, Yoshihide Wada

    Abstract: To track rapid changes within our water sector, Global Water Models (GWMs) need to realistically represent hydrologic systems' response patterns - such as baseflow fraction - but are hindered by their limited ability to learn from data. Here we introduce a high-resolution physics-embedded big-data-trained model as a breakthrough in reliably capturing characteristic hydrologic response patterns ('s…

    Submitted 22 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  45. arXiv:2504.07316  [pdf, other]

    cs.CL

    Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization

    Authors: Shujin Wu, Cheng Qian, Yi R. Fung, Paul Pu Liang, Heng Ji

    Abstract: The growing capabilities of large language models (LLMs) present a key challenge of maintaining effective human oversight. Weak-to-strong generalization (W2SG) offers a promising framework for supervising increasingly capable LLMs using weaker ones. Traditional W2SG methods rely on passive learning, where a weak teacher provides noisy demonstrations to train a strong student. This hinders students…

    Submitted 11 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  46. arXiv:2504.06659  [pdf, other]

    cs.LG cs.AI cs.CL

    Bridging the Gap Between Preference Alignment and Machine Unlearning

    Authors: Xiaohua Feng, Yuyuan Li, Huwei Ji, Jiaming Zhang, Li Zhang, Tianyu Du, Chaochao Chen

    Abstract: Despite advances in Preference Alignment (PA) for Large Language Models (LLMs), mainstream methods like Reinforcement Learning with Human Feedback (RLHF) face notable challenges. These approaches require high-quality datasets of positive preference examples, which are costly to obtain and computationally intensive due to training instability, limiting their use in low-resource scenarios. LLM unlea…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 17 pages

  47. arXiv:2504.04238  [pdf, other]

    cs.CL cs.AI

    Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models

    Authors: Yuheng Wu, Wentao Guo, Zirui Liu, Heng Ji, Zhaozhuo Xu, Denghui Zhang

    Abstract: This paper investigates the emergence of Theory-of-Mind (ToM) capabilities in large language models (LLMs) from a mechanistic perspective, focusing on the role of extremely sparse parameter patterns. We introduce a novel method to identify ToM-sensitive parameters and reveal that perturbing as little as 0.001% of these parameters significantly degrades ToM performance while also impairing contextu…

    Submitted 5 April, 2025; originally announced April 2025.

  48. arXiv:2503.24377  [pdf, other]

    cs.CL cs.AI

    Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

    Authors: Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks, transitioning from fast and intuitive thinking (System 1) to slow and deep reasoning (System 2). While System 2 reasoning improves task accuracy, it often incurs substantial computational costs due to its slow thinking nature and inefficient or unnecessary reasoning beh…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: In Progress; Paper list Repo: https://github.com/DevoAllen/Awesome-Reasoning-Economy-Papers

  49. arXiv:2503.20666  [pdf, other]

    cs.HC cs.CL

    TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews

    Authors: Huimin Xu, Seungjun Yi, Terence Lim, Jiawei Xu, Andrew Well, Carlos Mery, Aidong Zhang, Yuji Zhang, Heng Ji, Keshav Pingali, Yan Leng, Ying Ding

    Abstract: Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data. TA provides valuable insights in healthcare but is resource-intensive. Large Language Models (LLMs) have been introduced to perform TA, yet their applications in healthcare remain unexplored. Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-A…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Submitted to the American Medical Informatics Association (AMIA) 2025 Annual Symposium, 10 pages

  50. arXiv:2503.15126  [pdf, other]

    cs.CV cs.AI

    Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation

    Authors: Haoyu Ji, Bowen Chen, Weihong Ren, Wenze Huang, Zhihao Yang, Zhiyong Wang, Honghai Liu

    Abstract: Skeleton-based Temporal Action Segmentation (STAS) aims to segment and recognize various actions from long, untrimmed sequences of human skeletal movements. Current STAS methods typically employ spatio-temporal modeling to establish dependencies among joints as well as frames, and utilize one-hot encoding with cross-entropy loss for frame-wise classification supervision. However, these methods ove…

    Submitted 19 March, 2025; originally announced March 2025.