
Showing 1–50 of 944 results for author: Guo, Z

Searching in archive cs.
  1. arXiv:2507.00489 [pdf, ps, other]

    cs.DB

    Towards Efficient Random-Order Enumeration for Join Queries

    Authors: Pengyu Chen, Zizheng Guo, Jianwei Yang, Dongjing Miao

    Abstract: In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful representation of the whole join result, the result tuples are required to be enumerated in uniformly random order. However, existing studies lack an efficient rando…

    Submitted 1 July, 2025; originally announced July 2025.

  2. arXiv:2507.00419 [pdf, ps, other]

    physics.geo-ph cs.AI

    Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding

    Authors: Yimin Dou, Xinming Wu, Nathan L Bangs, Harpreet Singh Sethi, Jintao Li, Hang Gao, Zhixiang Guo

    Abstract: Understanding Earth's subsurface is critical for energy transition, natural hazard mitigation, and planetary science. Yet subsurface analysis remains fragmented, with separate models required for structural interpretation, stratigraphic analysis, geobody segmentation, and property modeling, each tightly coupled to specific data distributions and task formulations. We introduce the Geological Everyt…

    Submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2507.00371 [pdf]

    cs.CV

    PlantSegNeRF: A few-shot, cross-dataset method for plant 3D instance point cloud reconstruction via joint-channel NeRF with multi-view image instance matching

    Authors: Xin Yang, Ruiming Du, Hanyang Huang, Jiayang Xie, Pengyao Xie, Leisen Fang, Ziyue Guo, Nanjun Jiang, Yu Jiang, Haiyan Cen

    Abstract: Organ segmentation of plant point clouds is a prerequisite for the high-resolution and accurate extraction of organ-level phenotypic traits. Although the fast development of deep learning has boosted much research on segmentation of plant point clouds, the existing techniques for organ segmentation still face limitations in resolution, segmentation accuracy, and generalizability across various pla…

    Submitted 30 June, 2025; originally announced July 2025.

  4. arXiv:2506.23674 [pdf, ps, other]

    cs.CV

    Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration

    Authors: Dongyue Wu, Zilin Guo, Jialong Zuo, Nong Sang, Changxin Gao

    Abstract: The ever-growing size of training datasets enhances the generalization capability of modern machine learning models but also incurs exorbitant computational costs. Existing data pruning approaches aim to accelerate training by removing those less important samples. However, they often rely on gradients or proxy models, leading to prohibitive additional costs of gradient back-propagation and proxy…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  5. arXiv:2506.23088 [pdf, ps, other]

    cs.CV

    Where, What, Why: Towards Explainable Driver Attention Prediction

    Authors: Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao, Yueyao Lin, Linkai Liu, Zipeng Guo, Hao Fei, Xiaobo Xia, Chao Gou

    Abstract: Modeling task-driven attention in driving is a fundamental challenge for both autonomous vehicles and cognitive science. Existing methods primarily predict where drivers look by generating spatial heatmaps, but fail to capture the cognitive motivations behind attention allocation in specific contexts, which limits deeper understanding of attention mechanisms. To bridge this gap, we introduce Expla…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV 2025

  6. arXiv:2506.22749 [pdf, ps, other]

    cs.CV

    Deep Learning based Joint Geometry and Attribute Up-sampling for Large-Scale Colored Point Clouds

    Authors: Yun Zhang, Feifan Chen, Na Li, Zhiwei Guo, Xu Wang, Fen Miao, Sam Kwong

    Abstract: Colored point cloud, which includes geometry and attribute components, is a mainstream representation enabling realistic and immersive 3D applications. To generate large-scale and denser colored point clouds, we propose a deep learning-based Joint Geometry and Attribute Up-sampling (JGAU) method that learns to model both geometry and attribute patterns while leveraging spatial attribute correlatio…

    Submitted 28 June, 2025; originally announced June 2025.

  7. arXiv:2506.22740 [pdf, ps, other]

    cs.AI stat.ML

    Explanations are a means to an end

    Authors: Jessica Hullman, Ziyang Guo, Berk Ustun

    Abstract: Modern methods for explainable machine learning are designed to describe how models map inputs to outputs, without deep consideration of how these explanations will be used in practice. This paper argues that explanations should be designed and evaluated with a specific end in mind. We describe how to formalize this end in a framework based in statistical decision theory. We show how this function…

    Submitted 27 June, 2025; originally announced June 2025.

  8. arXiv:2506.22179 [pdf, ps, other]

    cs.CV cs.AI

    Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition

    Authors: Wenhan Wu, Zhishuai Guo, Chen Chen, Hongfei Xue, Aidong Lu

    Abstract: Zero-shot skeleton-based action recognition aims to develop models capable of identifying actions beyond the categories encountered during training. Previous approaches have primarily focused on aligning visual and semantic representations but often overlooked the importance of fine-grained action patterns in the semantic space (e.g., the hand movements in drinking water and brushing teeth). To ad…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025

  9. arXiv:2506.21655 [pdf, ps, other]

    cs.LG cs.AI cs.CV

    APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization

    Authors: Minjie Hong, Zirun Guo, Yan Xia, Zehan Wang, Ziang Zhang, Tao Jin, Zhou Zhao

    Abstract: Multimodal Large Language Models (MLLMs) are powerful at integrating diverse data, but they often struggle with complex reasoning. While reinforcement learning (RL) can boost reasoning in LLMs, applying it to MLLMs is tricky. Common issues include a drop in performance on general tasks and the generation of overly detailed or "overthinking" reasoning. Our work investigates how the KL penalty and o…

    Submitted 26 June, 2025; originally announced June 2025.

  10. arXiv:2506.20179 [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Progressive Alignment Degradation Learning for Pansharpening

    Authors: Enzhe Zhao, Zhichang Guo, Yao Li, Fanghui Song, Boying Wu

    Abstract: Deep learning-based pansharpening has been shown to effectively generate high-resolution multispectral (HRMS) images. To create supervised ground-truth HRMS images, synthetic data generated using the Wald protocol is commonly employed. This protocol assumes that networks trained on artificial low-resolution data will perform equally well on high-resolution data. However, well-trained models typica…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 13 pages, 9 figures

  11. arXiv:2506.20151 [pdf, ps, other]

    cs.CV cs.AI

    EAR: Erasing Concepts from Unified Autoregressive Models

    Authors: Haipeng Fan, Shiyuan Zhang, Baohunesitu, Zihang Guo, Huaiwen Zhang

    Abstract: Autoregressive (AR) models have achieved unified and strong performance across both visual understanding and image generation tasks. However, removing undesired concepts from AR models while maintaining overall generation quality remains an open challenge. In this paper, we propose Erasure Autoregressive Model (EAR), a fine-tuning method for effective and utility-preserving concept erasure in AR m…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures, 1 table

  12. arXiv:2506.19368 [pdf, ps, other]

    cs.CR

    Yotta: A Large-Scale Trustless Data Trading Scheme for Blockchain System

    Authors: Xiang Liu, Zhanpeng Guo, Liangxi Liu, Mengyao Zheng, Yiming Qiu, Linshan Jiang

    Abstract: Data trading is one of the key focuses of Web 3.0. However, all the current methods that rely on blockchain-based smart contracts for data exchange cannot support large-scale data trading while ensuring data security, which falls short of fulfilling the spirit of Web 3.0. Even worse, there is currently a lack of discussion on the essential properties that large-scale data trading should satisfy. I…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures, Exploratory Paper

    Journal ref: Nanyang Blockchain Conference 2025

  13. arXiv:2506.19303 [pdf, ps, other]

    cs.RO

    Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference

    Authors: Zexiang Guo, Hengxiang Chen, Xinheng Mai, Qiusang Qiu, Gan Ma, Zhanat Kappassov, Qiang Li, Nutan Chen

    Abstract: Inferring physical properties can significantly enhance robotic manipulation by enabling robots to handle objects safely and efficiently through adaptive grasping strategies. Previous approaches have typically relied on either tactile or visual data, limiting their ability to fully capture properties. We introduce a novel cross-modal perception framework that integrates visual observations with ta…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: This paper has been accepted by the 2025 International Conference on Climbing and Walking Robots (CLAWAR). These authors contributed equally to this work: Zexiang Guo, Hengxiang Chen, Xinheng Mai

  14. arXiv:2506.18862 [pdf, ps, other]

    cs.CV cs.AI

    TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting

    Authors: Zhongbin Guo, Yuhao Wang, Ping Jian, Xinyue Chen, Wei Peng, Ertai E

    Abstract: Satellite image time-series analysis demands fine-grained spatial-temporal reasoning, which remains a challenge for existing multimodal large language models (MLLMs). In this work, we study the capabilities of MLLMs on a novel task that jointly targets temporal change understanding and future scene generation, aiming to assess their potential for modeling complex multimodal dynamics over time. We…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Submitted to the 33rd ACM International Conference on Multimedia. Our dataset can be found at https://huggingface.co/datasets/IceInPot/TAMMs

  15. arXiv:2506.18394 [pdf, ps, other]

    cs.SE

    Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval

    Authors: Xiao Cheng, Zhihao Guo, Huan Huo, Yulei Sui

    Abstract: Memory-related errors in C programming continue to pose significant challenges in software development, primarily due to the complexities of manual memory management inherent in the language. These errors frequently serve as vectors for severe vulnerabilities, while their repair requires extensive knowledge of program logic and C's memory model. Automated Program Repair (APR) has emerged as a crit…

    Submitted 23 June, 2025; originally announced June 2025.

  16. arXiv:2506.18292 [pdf]

    cs.CV

    Rapeseed population point cloud completion network (RP-PCN) with dynamic graph convolution for 3D reconstruction of crop canopy occlusion architecture

    Authors: Ziyue Guo, Xin Yang, Yutao Shen, Yang Zhu, Lixi Jiang, Haiyan Cen

    Abstract: Quantitative descriptions of complete canopy architecture are crucial for evaluating crop photosynthesis and yield to guide ideotype design. Although three-dimensional (3D) sensing technologies have been developed for plant and canopy reconstruction, severe occlusion and complex architectures hinder accurate canopy descriptions. In this study, we propose a point cloud completion model for 3D recon…

    Submitted 23 June, 2025; originally announced June 2025.

  17. arXiv:2506.17562 [pdf, ps, other]

    cs.CV cs.CL

    LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning

    Authors: Haoxuan Che, Haibo Jin, Zhengrui Guo, Yi Lin, Cheng Jin, Hao Chen

    Abstract: LLMs have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, thereby impeding model development and broader adoption of LLM-driven MRG models. To address this challenge,…

    Submitted 20 June, 2025; originally announced June 2025.

  18. arXiv:2506.17281 [pdf, ps, other]

    cs.IR cs.AI

    CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models

    Authors: Junze Chen, Xinjie Yang, Cheng Yang, Junfei Bao, Zeyuan Guo, Yawen Li, Chuan Shi

    Abstract: Recommender systems (RSs) are designed to retrieve candidate items a user might be interested in from a large pool. A common approach is using graph neural networks (GNNs) to capture high-order interaction relationships. As large language models (LLMs) have shown strong capabilities across domains, researchers are exploring their use to enhance recommendation. However, prior work limits LLMs to re…

    Submitted 14 June, 2025; originally announced June 2025.

  19. arXiv:2506.16024 [pdf, ps, other]

    cs.CL cs.AI

    From General to Targeted Rewards: Surpassing GPT-4 in Open-Ended Long-Context Generation

    Authors: Zhihan Guo, Jiele Wu, Wenqian Cui, Yifei Zhang, Minda Hu, Yufei Wang, Irwin King

    Abstract: Current research on long-form context in Large Language Models (LLMs) primarily focuses on understanding long contexts, while Open-ended Long Text Generation (Open-LTG) remains insufficiently explored. Training a long-context generation model requires curation of gold standard reference data, which is typically nonexistent for informative Open-LTG tasks. However, previous methods only utilize…

    Submitted 19 June, 2025; originally announced June 2025.

  20. arXiv:2506.15721 [pdf, ps, other]

    cs.LG

    Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration

    Authors: Junqi Gao, Zhichang Guo, Dazhi Zhang, Dong Li, Runze Liu, Pengfei Li, Kai Tian, Biqing Qi

    Abstract: Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from a limited domain for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2)…

    Submitted 23 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  21. arXiv:2506.15695 [pdf, ps, other]

    cs.LG

    SimuGen: Multi-modal Agentic Framework for Constructing Block Diagram-Based Simulation Models

    Authors: Xinxing Ren, Qianbo Zang, Zekun Guo

    Abstract: Recent advances in large language models (LLMs) have shown impressive performance in mathematical reasoning and code generation. However, LLMs still struggle in the simulation domain, particularly in generating Simulink models, which are essential tools in engineering and scientific research. Our preliminary experiments indicate that LLM agents often fail to produce reliable and complete Simulink…

    Submitted 27 May, 2025; originally announced June 2025.

  22. arXiv:2506.14448 [pdf, ps, other]

    cs.CL

    How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison

    Authors: Jiayin Wang, Zhiquang Guo, Weizhi Ma, Min Zhang

    Abstract: As evaluation designs of large language models may shape our trajectory toward artificial general intelligence, comprehensive and forward-looking assessment is essential. Existing benchmarks primarily assess static knowledge, while intelligence also entails the ability to rapidly learn from experience. To this end, we advocate for the evaluation of Test-time Learning, the capacity to improve perfo…

    Submitted 17 June, 2025; originally announced June 2025.

  23. arXiv:2506.13034 [pdf, ps, other]

    astro-ph.EP astro-ph.IM cs.AI

    SpaceTrack-TimeSeries: Time Series Dataset towards Satellite Orbit Analysis

    Authors: Zhixin Guo, Qi Shi, Xiaofan Xu, Sixiang Shan, Limin Qin, Linqiang Ge, Rui Zhang, Ya Dai, Hua Zhu, Guowei Jiang

    Abstract: With the rapid advancement of aerospace technology and the large-scale deployment of low Earth orbit (LEO) satellite constellations, the challenges facing astronomical observations and deep space exploration have become increasingly pronounced. As a result, the demand for high-precision orbital data on space objects, along with comprehensive analyses of satellite positioning, constellation configur…

    Submitted 15 June, 2025; originally announced June 2025.

  24. arXiv:2506.12525 [pdf, ps, other]

    cs.RO

    A Spatial Relationship Aware Dataset for Robotics

    Authors: Peng Wang, Minh Huy Pham, Zhihao Guo, Wei Zhou

    Abstract: Robotic task planning in real-world environments requires not only object recognition but also a nuanced understanding of spatial relationships between objects. We present a spatial-relationship-aware dataset of nearly 1,000 robot-acquired indoor images, annotated with object attributes, positions, and detailed spatial relationships. Captured using a Boston Dynamics Spot robot and labelled with a…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 7 pages; 7 figures, 1 table

  25. arXiv:2506.11041 [pdf, ps, other]

    cs.LG

    ChemHGNN: A Hierarchical Hypergraph Neural Network for Reaction Virtual Screening and Discovery

    Authors: Xiaobao Huang, Yihong Ma, Anjali Gurajapu, Jules Schleinitz, Zhichun Guo, Sarah E. Reisman, Nitesh V. Chawla

    Abstract: Reaction virtual screening and discovery are fundamental challenges in chemistry and materials science, where traditional graph neural networks (GNNs) struggle to model multi-reactant interactions. In this work, we propose ChemHGNN, a hypergraph neural network (HGNN) framework that effectively captures high-order relationships in reaction networks. Unlike GNNs, which require constructing complete…

    Submitted 21 May, 2025; originally announced June 2025.

  26. arXiv:2506.10521 [pdf, ps, other]

    cs.AI cs.CL

    Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

    Authors: Yuhao Zhou, Yiheng Wang, Xuming He, Ruoyao Xiao, Zhiwei Li, Qiantai Feng, Zijie Guo, Yuejin Yang, Hao Wu, Wenxuan Huang, Jiaqi Wei, Dan Si, Xiuqi Yao, Jia Bu, Haiwen Huang, Tianfan Fu, Shixiang Tang, Ben Fei, Dongzhan Zhou, Fenghua Ling, Yan Lu, Siqi Sun, Chenhui Li, Guanjie Zheng, Jiancheng Lv, et al. (2 additional authors not shown)

    Abstract: Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on ev…

    Submitted 25 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 82 pages

  27. arXiv:2506.09002 [pdf, ps, other]

    cs.SE

    Boosting Rust Unit Test Coverage through Hybrid Program Analysis and Large Language Models

    Authors: Bei Chu, Yang Feng, Kui Liu, Hange Shi, Zifan Nan, Zhaoqiang Guo, Baowen Xu

    Abstract: Unit testing is essential for ensuring software reliability and correctness. Classic Search-Based Software Testing (SBST) methods and concolic execution-based approaches for generating unit tests often fail to achieve high coverage due to difficulties in handling complex program units, such as branching conditions and external dependencies. Recent work has increasingly utilized large language mode…

    Submitted 10 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 10 pages, 5 figures

  28. arXiv:2506.08700 [pdf, ps, other]

    cs.CL cs.CV

    ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts

    Authors: Ruiran Su, Jiasheng Si, Zhijiang Guo, Janet B. Pierrehumbert

    Abstract: Scientific fact-checking has mostly focused on text and tables, overlooking scientific charts, which are key for presenting quantitative evidence and statistical reasoning. We introduce ClimateViz, the first large-scale benchmark for scientific fact-checking using expert-curated scientific charts. ClimateViz contains 49,862 claims linked to 2,896 visualizations, each labeled as support, refute, or…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  29. arXiv:2506.07984 [pdf, ps, other]

    cs.CV cs.LG

    CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray

    Authors: Mingquan Lin, Gregory Holste, Song Wang, Yiliang Zhou, Yishu Wei, Imon Banerjee, Pengyi Chen, Tianjie Dai, Yuexi Du, Nicha C. Dvornek, Yuyan Ge, Zuowei Guo, Shouhei Hanaoka, Dongkyun Kim, Pablo Messina, Yang Lu, Denis Parra, Donghyun Son, Álvaro Soto, Aisha Urooj, René Vidal, Yosuke Yamagishi, Zefan Yang, Ruichi Zhang, Yang Zhou, et al. (8 additional authors not shown)

    Abstract: The CXR-LT series is a community-driven initiative designed to enhance lung disease classification using chest X-rays (CXR). It tackles challenges in open long-tailed lung disease classification and enhances the measurability of state-of-the-art techniques. The first event, CXR-LT 2023, aimed to achieve these goals by providing high-quality benchmark CXR data for model development and conducting c…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 17 pages, 3 figures

  30. arXiv:2506.07642 [pdf, ps, other]

    cs.CL

    TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review

    Authors: Yuan Chang, Ziyue Li, Hengyuan Zhang, Yuanbo Kong, Yanru Wu, Zhijiang Guo, Ngai Wong

    Abstract: While Large Language Models (LLMs) have shown significant potential in assisting peer review, current methods often struggle to generate thorough and insightful reviews while maintaining efficiency. In this paper, we propose TreeReview, a novel framework that models paper review as a hierarchical and bidirectional question-answering process. TreeReview first constructs a tree of review questions b…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 30 pages, 17 figures

  31. arXiv:2506.07367 [pdf, ps, other]

    cs.AR

    A Survey on LUT-based Deep Neural Networks Implemented in FPGAs

    Authors: Zeyu Guo

    Abstract: Low-latency, energy-efficient deep neural network (DNN) inference is critical for edge applications, where traditional cloud-based deployment suffers from high latency and security risks. Field-Programmable Gate Arrays (FPGAs) offer a compelling solution, balancing reconfigurability, power efficiency, and real-time performance. However, conventional FPGA-based DNNs rely heavily on digital signa…

    Submitted 8 June, 2025; originally announced June 2025.

  32. arXiv:2506.07180 [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs

    Authors: Wenrui Zhou, Shu Yang, Qingsong Yang, Zikun Guo, Lijie Hu, Di Wang

    Abstract: As video large language models (Video-LLMs) become increasingly integrated into real-world applications that demand grounded multimodal reasoning, ensuring their factual consistency and reliability is of critical importance. However, sycophancy, the tendency of these models to align with user input even when it contradicts the visual evidence, undermines their trustworthiness in such contexts. Cur…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 24 pages

  33. arXiv:2506.06283 [pdf, other]

    cs.CV cs.AI

    Facial Foundational Model Advances Early Warning of Coronary Artery Disease from Live Videos with DigitalShadow

    Authors: Juexiao Zhou, Zhongyi Han, Mankun Xin, Xingwei He, Guotao Wang, Jiaoyan Song, Gongning Luo, Wenjia He, Xintong Li, Yuetan Chu, Juanwen Chen, Bo Wang, Xia Wu, Wenwen Duan, Zhixia Guo, Liyan Bai, Yilin Pan, Xuefei Bi, Lu Liu, Long Feng, Xiaonan He, Xin Gao

    Abstract: Global population aging presents increasing challenges to healthcare systems, with coronary artery disease (CAD) responsible for approximately 17.8 million deaths annually, making it a leading cause of global mortality. As CAD is largely preventable, early detection and proactive management are essential. In this work, we introduce DigitalShadow, an advanced early warning system for CAD, powered b…

    Submitted 23 April, 2025; originally announced June 2025.

  34. arXiv:2506.06157 [pdf, other]

    cs.SI cs.CL

    Masked Language Models are Good Heterogeneous Graph Generalizers

    Authors: Jinyu Yang, Cheng Yang, Shanyuan Cui, Zeyuan Guo, Liangwei Yang, Muhan Zhang, Chuan Shi

    Abstract: Heterogeneous graph neural networks (HGNNs) excel at capturing structural and semantic information in heterogeneous graphs (HGs), while struggling to generalize across domains and tasks. Recently, some researchers have turned to integrating HGNNs with large language models (LLMs) for more generalizable heterogeneous graph learning. However, these approaches typically extract structural information…

    Submitted 6 June, 2025; originally announced June 2025.

  35. arXiv:2506.05302 [pdf, ps, other]

    cs.CV

    Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

    Authors: Weifeng Lin, Xinyu Wei, Ruichuan An, Tianhe Ren, Tingwei Chen, Renrui Zhang, Ziyu Guo, Wentao Zhang, Lei Zhang, Hongsheng Li

    Abstract: We present Perceive Anything Model (PAM), a conceptually straightforward and efficient framework for comprehensive region-level visual understanding in images and videos. Our approach extends the powerful segmentation model SAM 2 by integrating Large Language Models (LLMs), enabling simultaneous object segmentation with the generation of diverse, region-specific semantic outputs, including categor…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 19 pages, 13 figures, Website: https://Perceive-Anything.github.io

  36. arXiv:2506.05218 [pdf, ps, other]

    cs.CV

    MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

    Authors: Zhang Li, Yuliang Liu, Qiang Liu, Zhiyin Ma, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiarui Zhang, Xinyu Wang, Xiang Bai

    Abstract: We introduce MonkeyOCR, a vision-language model for document parsing that advances the state of the art by leveraging a Structure-Recognition-Relation (SRR) triplet paradigm. This design simplifies what would otherwise be a complex multi-tool pipeline (as in MinerU's modular approach) and avoids the inefficiencies of processing full pages with giant end-to-end models (e.g., large multimodal LLMs l…

    Submitted 5 June, 2025; originally announced June 2025.

  37. arXiv:2506.05183 [pdf, ps, other]

    cs.LG cs.AI

    TreeRPO: Tree Relative Policy Optimization

    Authors: Zhicheng Yang, Zhijiang Guo, Yinya Huang, Xiaodan Liang, Yiwei Wang, Jing Tang

    Abstract: Large Language Models (LLMs) have shown remarkable reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR) methods. However, a key limitation of existing approaches is that rewards defined at the full trajectory level provide insufficient guidance for optimizing the intermediate steps of a reasoning process. To address this, we introduce TreeRPO, a novel method…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 13 pages, 6 figures

  38. arXiv:2506.03119 [pdf, ps, other]

    cs.CV

    Controllable Human-centric Keyframe Interpolation with Generative Prior

    Authors: Zujin Guo, Size Wu, Zhongang Cai, Wei Li, Chen Change Loy

    Abstract: Existing interpolation methods use pre-trained video diffusion priors to generate intermediate frames between sparsely sampled keyframes. In the absence of 3D geometric guidance, these methods struggle to produce plausible results for complex, articulated human motions and offer limited control over the synthesized dynamics. In this paper, we introduce PoseFuse3D Keyframe Interpolator (PoseFuse3D-…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Project Page: https://gseancdat.github.io/projects/PoseFuse3D_KI

  39. arXiv:2506.02678 [pdf, ps, other]

    cs.CL cs.CE math.NA

    TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression

    Authors: Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu

    Abstract: Large Language Models (LLMs) have recently achieved remarkable progress by leveraging Reinforcement Learning and extended Chain-of-Thought (CoT) techniques. However, the challenge of performing efficient language reasoning, especially during inference with extremely long outputs, has drawn increasing attention from the research community. In this work, we propose a dynamic ratio-based training pip…

    Submitted 14 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  40. arXiv:2506.02435 [pdf, ps, other]

    cs.GT

    A Transformer-Based Neural Network for Optimal Deterministic-Allocation and Anonymous Joint Auction Design

    Authors: Zhen Zhang, Luowen Liu, Wanzhi Zhang, Zitian Guo, Kun Huang, Qi Qi, Qiang Liu, Xingxing Wang

    Abstract: With the advancement of machine learning, an increasing number of studies are employing automated mechanism design (AMD) methods for optimal auction design. However, all previous AMD architectures designed to generate optimal mechanisms that satisfy near dominant strategy incentive compatibility (DSIC) fail to achieve deterministic allocation, and some also lack anonymity, thereby impacting the ef…

    Submitted 12 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  41. arXiv:2506.02161 [pdf, ps, other]

    cs.CV

    TIIF-Bench: How Does Your T2I Model Follow Your Instructions?

    Authors: Xinyu Wei, Jinrui Zhang, Zeqing Wang, Hongyang Wei, Zhen Guo, Lei Zhang

    Abstract: The rapid advancement of Text-to-Image (T2I) models has ushered in a new phase of AI-generated content, marked by their growing ability to interpret and follow user instructions. However, existing T2I model evaluation benchmarks fall short due to limited prompt diversity and complexity, as well as coarse evaluation metrics, making it difficult to evaluate the fine-grained alignment performance betwe…

    Submitted 25 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: 23 pages, 12 figures, 11 tables

  42. arXiv:2506.01668 [pdf, ps, other]

    cs.MM cs.IR

    Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach

    Authors: Heng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang

    Abstract: Stickers, though small, are a highly condensed form of visual expression, ubiquitous across messaging platforms and embraced by diverse cultures, genders, and age groups. Despite their popularity, sticker retrieval remains an underexplored task due to the significant human effort and subjectivity involved in constructing high-quality sticker query datasets. Although large language models (LLMs) ex…

    Submitted 2 June, 2025; originally announced June 2025.

  43. arXiv:2506.00936  [pdf, ps, other]

    cs.LG cs.AI q-bio.QM

    Uncertainty-Aware Metabolic Stability Prediction with Dual-View Contrastive Learning

    Authors: Peijin Guo, Minghui Li, Hewen Pan, Bowen Chen, Yang Wu, Zikang Guo, Leo Yu Zhang, Shengshan Hu, Shengqing Hu

    Abstract: Accurate prediction of molecular metabolic stability (MS) is critical for drug research and development but remains challenging due to the complex interplay of molecular interactions. Despite recent advances in graph neural networks (GNNs) for MS prediction, current approaches face two critical limitations: (1) incomplete molecular modeling due to atom-centric message-passing mechanisms that disre…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: This manuscript has been accepted for publication at ECML-PKDD 2025. The final version will be published in the conference proceedings

  44. arXiv:2506.00791  [pdf, ps, other]

    cs.HC

    CO-OPERA: A Human-AI Collaborative Playwriting Tool to Support Creative Storytelling for Interdisciplinary Drama Education

    Authors: Xuejiao Ma, Haibo Zhao, Zinuo Guo, Yijie Guo, Guanhong Liu, Bo Jiang

    Abstract: Drama-in-education is an interdisciplinary instructional approach that integrates subjects such as language, history, and psychology. Its core component is playwriting. Based on need-finding interviews of 13 teachers, we found that current general-purpose AI tools cannot effectively assist teachers and students during playwriting. Therefore, we propose CO-OPERA - a collaborative playwriting tool i…

    Submitted 31 May, 2025; originally announced June 2025.

  45. arXiv:2506.00384  [pdf, other]

    cs.LG cs.DC cs.OS

    Deep-Learning-Driven Prefetching for Far Memory

    Authors: Yutong Huang, Zhiyuan Guo, Yiying Zhang

    Abstract: Modern software systems face increasing runtime performance demands, particularly in emerging architectures like far memory, where local-memory misses incur significant latency. While machine learning (ML) has proven effective in offline systems optimization, its application to high-frequency, runtime-level problems remains limited due to strict performance, generalization, and integration constra…

    Submitted 31 May, 2025; originally announced June 2025.

  46. arXiv:2506.00297  [pdf, ps, other]

    cs.LG cs.AI q-bio.BM

    Improving Protein Sequence Design through Designability Preference Optimization

    Authors: Fanglei Xue, Andrew Kubaney, Zhichun Guo, Joseph K. Min, Ge Liu, Yi Yang, David Baker

    Abstract: Protein sequence design methods have demonstrated strong performance in sequence generation for de novo protein design. However, as the training objective was sequence recovery, it does not guarantee designability--the likelihood that a designed sequence folds into the desired structure. To bridge this gap, we redefine the training objective by steering sequence generation toward high designabilit…

    Submitted 30 May, 2025; originally announced June 2025.

  47. arXiv:2505.23932  [pdf, ps, other]

    cs.CL

    SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving

    Authors: Wendong Xu, Jing Xiong, Chenyang Zhao, Qiujiang Chen, Haoran Wang, Hui Shen, Zhongwei Wan, Jianbo Dai, Taiqiang Wu, He Xiao, Chaofan Tao, Z. Morley Mao, Ying Sheng, Zhijiang Guo, Hongxia Yang, Bei Yu, Lingpeng Kong, Quanquan Gu, Ngai Wong

    Abstract: We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs) that closely mirrors real-world software development workflows. Unlike traditional static benchmarks, SwingArena models the collaborative process of software iteration by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integrati…

    Submitted 2 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  48. arXiv:2505.23522  [pdf, ps, other]

    cs.CV cs.LG

    OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data

    Authors: Fengxiang Wang, Mingshuo Chen, Xuming He, YiFan Zhang, Feng Liu, Zijie Guo, Zhenghao Hu, Jiong Wang, Jingyi Xu, Zhangrui Li, Fenghua Ling, Ben Fei, Weijia Li, Long Lan, Wenjing Yang, Wenlong Zhang, Lei Bai

    Abstract: Existing benchmarks for Earth science multimodal learning exhibit critical limitations in systematic coverage of geosystem components and cross-sphere interactions, often constrained to isolated subsystems (only in Human-activities sphere or atmosphere) with limited evaluation dimensions (less than 16 tasks). To address these gaps, we introduce OmniEarth-Bench, the first comprehensive multimodal b…

    Submitted 29 May, 2025; originally announced May 2025.

  49. arXiv:2505.22604  [pdf, ps, other]

    cs.CV

    Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective

    Authors: Ruixuan Zhang, He Wang, Zhengyu Zhao, Zhiqing Guo, Xun Yang, Yunfeng Diao, Meng Wang

    Abstract: Rapid advances in Artificial Intelligence Generated Images (AIGI) have facilitated malicious use, such as forgery and misinformation. Therefore, numerous methods have been proposed to detect fake images. Although such detectors have been proven to be universally vulnerable to adversarial attacks, defenses in this field are scarce. In this paper, we first identify that adversarial training (AT), wi…

    Submitted 30 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  50. arXiv:2505.22120  [pdf, ps, other]

    cs.CL

    LoKI: Low-damage Knowledge Implanting of Large Language Models

    Authors: Runyu Wang, Peng Ping, Zhengyu Guo, Xiaoye Zhang, Quan Shi, Liting Zhou, Tianbo Ji

    Abstract: Fine-tuning adapts pretrained models for specific tasks but poses the risk of catastrophic forgetting (CF), where critical knowledge from pre-training is overwritten. Current Parameter-Efficient Fine-Tuning (PEFT) methods for Large Language Models (LLMs), while efficient, often sacrifice general capabilities. To address the issue of CF in a general-purpose PEFT framework, we propose \textbf{Lo}w-d…

    Submitted 28 May, 2025; originally announced May 2025.