Showing 1–50 of 571 results for author: Guo, Z

Searching in archive cs.

  1. arXiv:2405.05885  [pdf, other]

    cs.RO

    Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes

    Authors: Ziang Guo, Artem Lykov, Zakhar Yagudin, Mikhail Konenkov, Dzmitry Tsetserukou

    Abstract: Recent research about Large Language Model based autonomous driving solutions shows a promising picture in planning and control fields. However, heavy computational resources and hallucinations of Large Language Models continue to hinder the tasks of predicting precise trajectories and instructing control signals. To address this problem, we propose Co-driver, a novel autonomous driving assistant…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: The paper is submitted to the IEEE conference

  2. arXiv:2405.05229  [pdf, other]

    cs.IR cs.DL

    myAURA: Personalized health library for epilepsy management via knowledge graph sparsification and visualization

    Authors: Rion Brattig Correia, Jordan C. Rozum, Leonard Cross, Jack Felag, Michael Gallant, Ziqi Guo, Bruce W. Herr II, Aehong Min, Deborah Stungis Rocha, Xuan Wang, Katy Börner, Wendy Miller, Luis M. Rocha

    Abstract: Objective: We report the development of the patient-centered myAURA application and suite of methods designed to aid epilepsy patients, caregivers, and researchers in making decisions about care and self-management. Materials and Methods: myAURA rests on the federation of an unprecedented collection of heterogeneous data resources relevant to epilepsy, such as biomedical databases, social media,…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.01943  [pdf, other]

    cs.CL cs.AI cs.LG

    Dependency-Aware Semi-Structured Sparsity of GLU Variants in Large Language Models

    Authors: Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe

    Abstract: The rapid advancement in Large Language Models (LLMs) has markedly enhanced the capabilities of language understanding and generation. However, the substantial model size poses hardware challenges, affecting both memory size for serving and inference latency for token generation. To address those challenges, we propose Dependency-aware Semi-structured Sparsity (DaSS), a novel method for the recent…

    Submitted 3 May, 2024; originally announced May 2024.

  4. arXiv:2405.00630  [pdf, other]

    cs.CV

    Depth Priors in Removal Neural Radiance Fields

    Authors: Zhihao Guo, Peng Wang

    Abstract: Neural Radiance Fields (NeRF) have shown impressive results in 3D reconstruction and generating novel views. A key challenge within NeRF is the editing of reconstructed scenes, such as object removal, which requires maintaining consistency across multiple views and ensuring high-quality synthesised perspectives. Previous studies have incorporated depth priors, typically from LiDAR or sparse depth…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 15 pages

    MSC Class: 68T40; 68T07; 68T45 ACM Class: I.4.5

  5. arXiv:2405.00236  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    STT: Stateful Tracking with Transformers for Autonomous Driving

    Authors: Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, Congcong Li

    Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying c…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: ICRA 2024

  6. arXiv:2404.19484  [pdf, other]

    cs.LG cs.AI cs.CL

    More Compute Is What You Need

    Authors: Zhen Guo

    Abstract: Large language model pre-training has become increasingly expensive, with most practitioners relying on scaling laws to allocate compute budgets for model size and training tokens, commonly referred to as Compute-Optimal or Chinchilla Optimal. In this paper, we hypothesize a new scaling law that suggests model performance depends mostly on the amount of compute spent for transformer-based models,…

    Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  7. arXiv:2404.19245  [pdf, other]

    cs.CL cs.AI

    HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

    Authors: Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, Chengzhong Xu

    Abstract: Adapting Large Language Models (LLMs) to new tasks through fine-tuning has been made more efficient by the introduction of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. However, these methods often underperform compared to full fine-tuning, particularly in scenarios involving complex datasets. This issue becomes even more pronounced in complex domains, highlighting the need for…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures

  8. arXiv:2404.17667  [pdf, other]

    eess.SP cs.LG

    SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals

    Authors: Cheng Ding, Zhicheng Guo, Zhaoliang Chen, Randall J Lee, Cynthia Rudin, Xiao Hu

    Abstract: Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for developing foundation models for phys…

    Submitted 26 April, 2024; originally announced April 2024.

  9. arXiv:2404.16812  [pdf, other]

    cs.DC

    ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

    Authors: Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

    Abstract: Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some i…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24)

  10. arXiv:2404.16022  [pdf, other]

    cs.CV

    PuLID: Pure and Lightning ID Customization via Contrastive Alignment

    Authors: Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He

    Abstract: We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. By incorporating a Lightning T2I branch with a standard diffusion one, PuLID introduces both contrastive alignment loss and accurate ID loss, minimizing disruption to the original model and ensuring high ID fidelity. Experiments show that PuLID achieves superior perform…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Tech Report. Codes and models will be available at https://github.com/ToTheBeginning/PuLID

  11. arXiv:2404.14719  [pdf, other]

    cs.CR

    Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs

    Authors: Ruitong Liu, Yanbin Wang, Haitao Xu, Bin Liu, Jianguo Sun, Zhenhao Guo, Wenrui Ma

    Abstract: Currently, deep learning successfully applies to code vulnerability detection by learning from code sequences or property graphs. However, sequence-based methods often overlook essential code attributes such as syntax, control flow, and data dependencies, whereas graph-based approaches might underestimate the semantics of code and face challenges in capturing long-distance contextual information.…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

  12. arXiv:2404.13779  [pdf, other]

    cs.CL cs.LG

    Automated Text Mining of Experimental Methodologies from Biomedical Literature

    Authors: Ziqing Guo

    Abstract: Biomedical literature is a rapidly expanding field of science and technology. Classification of biomedical texts is an essential part of biomedicine research, especially in the field of biology. This work proposes the fine-tuned DistilBERT, a methodology-specific, pre-trained generative classification language model for mining biomedicine texts. The model has proven its effectiveness in linguistic…

    Submitted 21 April, 2024; originally announced April 2024.

  13. arXiv:2404.13230  [pdf, other]

    cs.IT math.CO

    Random Gabidulin Codes Achieve List Decoding Capacity in the Rank Metric

    Authors: Zeyu Guo, Chaoping Xing, Chen Yuan, Zihan Zhang

    Abstract: Gabidulin codes, serving as the rank-metric counterpart of Reed-Solomon codes, constitute an important class of maximum rank distance (MRD) codes. However, unlike the fruitful positive results about the list decoding of Reed-Solomon codes, results concerning the list decodability of Gabidulin codes in the rank metric are all negative so far. For example, in contrast to Reed-Solomon codes, which ar…

    Submitted 19 April, 2024; originally announced April 2024.

  14. arXiv:2404.11032  [pdf, other]

    cs.LG cs.SI

    CORE: Data Augmentation for Link Prediction via Information Bottleneck

    Authors: Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla

    Abstract: Link prediction (LP) is a fundamental task in graph representation learning, with numerous applications in diverse domains. However, the generalizability of LP models is often compromised due to the presence of noisy or spurious information in graphs and the inherent incompleteness of graph data. To address these challenges, we draw inspiration from the Information Bottleneck principle and propose…

    Submitted 16 April, 2024; originally announced April 2024.

  15. arXiv:2404.11019  [pdf, other]

    cs.LG

    You do not have to train Graph Neural Networks at all on text-attributed graphs

    Authors: Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla

    Abstract: Graph structured data, specifically text-attributed graphs (TAG), effectively represent relationships among varied entities. Such graphs are essential for semi-supervised node classification tasks. Graph Neural Networks (GNNs) have emerged as a powerful tool for handling this graph-structured data. Although gradient descent is commonly utilized for training GNNs for node classification, this study…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: preprint

  16. arXiv:2404.10163  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

    Authors: Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta

    Abstract: From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", so far there is no scanpath model capable of predicting scanpa…

    Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.08990  [pdf]

    cs.CV cs.AI cs.RO

    A Fourier-enhanced multi-modal 3D small object optical mark recognition and positioning method for percutaneous abdominal puncture surgical navigation

    Authors: Zezhao Guo, Yanzhong Guo, Zhanfang Zhao

    Abstract: Navigation for thoracoabdominal puncture surgery is used to locate the needle entry point on the patient's body surface. The traditional reflective ball navigation method is difficult to position the needle entry point on the soft, irregular, smooth chest and abdomen. Due to the lack of clear characteristic points on the body surface using structured light technology, it is difficult to identify a…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 19 pages, 6 figures

  18. arXiv:2404.08660  [pdf, other]

    cs.IR cs.LG

    How Does Message Passing Improve Collaborative Filtering?

    Authors: Mingxuan Ju, William Shiao, Zhichun Guo, Yanfang Ye, Yozen Liu, Neil Shah, Tong Zhao

    Abstract: Collaborative filtering (CF) has exhibited prominent results for recommender systems and been broadly utilized for real-world applications. A branch of research enhances CF methods by message passing used in graph neural networks, due to its strong capabilities of extracting knowledge from graph-structured data, like user-item bipartite graphs that naturally exist in CF. They assume that message p…

    Submitted 27 March, 2024; originally announced April 2024.

  19. arXiv:2404.08408  [pdf, other]

    cs.LG cs.AI eess.SP physics.geo-ph

    Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning

    Authors: Hongtao Wang, Li Long, Jiangshe Zhang, Xiaoli Wei, Chunxia Zhang, Zhenbo Guo

    Abstract: Contemporary automatic first break (FB) picking methods typically analyze 1D signals, 2D source gathers, or 3D source-receiver gathers. Utilizing higher-dimensional data, such as 2D or 3D, incorporates global features, improving the stability of local picking. Despite the benefits, high-dimensional data requires structured input and increases computational demands. Addressing this, we propose a no…

    Submitted 12 April, 2024; originally announced April 2024.

  20. arXiv:2404.08273  [pdf, other]

    cs.CV cs.CR

    Struggle with Adversarial Defense? Try Diffusion

    Authors: Yujie Li, Yanbin Wang, Haitao Xu, Bin Liu, Jianguo Sun, Zhenhao Guo, Wenrui Ma

    Abstract: Adversarial attacks induce misclassification by introducing subtle perturbations. Recently, diffusion models are applied to the image classifiers to improve adversarial robustness through adversarial training or by purifying adversarial noise. However, diffusion-based adversarial training often encounters convergence challenges and high computational expenses. Additionally, diffusion-based purific…

    Submitted 18 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  21. arXiv:2404.07620  [pdf, other]

    eess.IV cs.CV

    Diffusion Probabilistic Multi-cue Level Set for Reducing Edge Uncertainty in Pancreas Segmentation

    Authors: Yue Gou, Yuming Xing, Shengzhu Shi, Zhichang Guo

    Abstract: Accurately segmenting the pancreas remains a huge challenge. Traditional methods encounter difficulties in semantic localization due to the small volume and distorted structure of the pancreas, while deep learning methods encounter challenges in obtaining accurate edges because of low contrast and organ overlapping. To overcome these issues, we propose a multi-cue level set method based on the dif…

    Submitted 11 April, 2024; originally announced April 2024.

  22. arXiv:2404.07413  [pdf, other]

    cs.CL cs.AI

    JetMoE: Reaching Llama2 Performance with 0.1M Dollars

    Authors: Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin

    Abstract: Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces JetMoE-8B, a new LLM trained with less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, the JetMoE…

    Submitted 10 April, 2024; originally announced April 2024.

  23. arXiv:2404.06483  [pdf, other]

    cs.CV

    RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos

    Authors: Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma

    Abstract: Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.12788

  24. arXiv:2404.04653  [pdf, other]

    cs.CV cs.RO

    HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene

    Authors: Ziang Guo, Stepan Perminov, Mikhail Konenkov, Dzmitry Tsetserukou

    Abstract: Many established vision perception systems for autonomous driving scenarios ignore the influence of light conditions, one of the key elements for driving safety. To address this problem, we present HawkDrive, a novel perception system with hardware and software solutions. Hardware that utilizes stereo vision perception, which has been demonstrated to be a more reliable way of estimating depth info…

    Submitted 6 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE IV 2024

  25. arXiv:2404.04050  [pdf, other]

    cs.CV

    No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

    Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

    Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' c…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR Highlight. Code is available at https://github.com/yangyangyang127/Seg-NN. arXiv admin note: text overlap with arXiv:2308.12961

  26. arXiv:2404.00837  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    Automated HER2 Scoring in Breast Cancer Images Using Deep Learning and Pyramid Sampling

    Authors: Sahan Yoruc Selcuk, Xilin Yang, Bijie Bai, Yijie Zhang, Yuzhu Li, Musa Aydin, Aras Firat Unal, Aditya Gomatam, Zhen Guo, Darrow Morgan Angus, Goren Kolodney, Karine Atlan, Tal Keidar Haran, Nir Pillar, Aydogan Ozcan

    Abstract: Human epidermal growth factor receptor 2 (HER2) is a critical protein in cancer cell growth that signifies the aggressiveness of breast cancer (BC) and helps predict its prognosis. Accurate assessment of immunohistochemically (IHC) stained tissue slides for HER2 expression levels is essential for both treatment guidance and understanding of cancer mechanisms. Nevertheless, the traditional workflow…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 21 Pages, 7 Figures

  27. arXiv:2403.19094  [pdf, other]

    cs.CL

    Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

    Authors: Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

    Abstract: Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g. tools). In this paper, we introduce an intrinsic self-correct reasoning framework for LLMs that eliminates the…

    Submitted 27 March, 2024; originally announced March 2024.

  28. arXiv:2403.18306  [pdf, other]

    cs.DB

    Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method

    Authors: Zhixin Guo, Tao Wang, Chaoyang Wang, Jianping Zhou, Guanjie Zheng, Xinbing Wang, Chenghu Zhou

    Abstract: The rare earth elements Sm and Nd significantly address fundamental questions about crustal growth, such as its spatiotemporal evolution and the interplay between orogenesis and crustal accretion. Their relative immobility during high-grade metamorphism makes the Sm-Nd isotopic system crucial for inferring crustal formation times. Historically, data have been disseminated sporadically in the scien…

    Submitted 27 March, 2024; originally announced March 2024.

  29. arXiv:2403.18280  [pdf, other]

    cs.IR

    Improving Out-of-Vocabulary Handling in Recommendation Systems

    Authors: William Shiao, Mingxuan Ju, Zhichun Guo, Xin Chen, Evangelos Papalexakis, Tong Zhao, Neil Shah, Yozen Liu

    Abstract: Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommendin…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures

  30. arXiv:2403.17400  [pdf, other]

    cs.NI cs.DC

    A Survey on Resource Management in Joint Communication and Computing-Embedded SAGIN

    Authors: Qian Chen, Zheng Guo, Weixiao Meng, Shuai Han, Cheng Li, Tony Q. S. Quek

    Abstract: The advent of the 6G era aims for ubiquitous connectivity, with the integration of non-terrestrial networks (NTN) offering extensive coverage and enhanced capacity. As manufacturing advances and user demands evolve, space-air-ground integrated networks (SAGIN) with computational capabilities emerge as a viable solution for services requiring low latency and high computational power. Resource manag…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 43 pages, 17 figures

  31. arXiv:2403.16950  [pdf, other]

    cs.CL cs.AI cs.LG

    Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators

    Authors: Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier

    Abstract: Large Language Models (LLMs) have demonstrated promising capabilities as automatic evaluators in assessing the quality of generated natural language. However, LLMs still exhibit biases in evaluation and often struggle to generate coherent evaluations that align with human assessments. In this work, we first conduct a systematic study of the misalignment between LLM evaluators and human judgement,…

    Submitted 25 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  32. arXiv:2403.16410  [pdf, other]

    cs.CV

    Spike-NeRF: Neural Radiance Field Based On Spike Camera

    Authors: Yijia Guo, Yuanxi Bai, Liwen Hu, Mianzhi Liu, Ziyi Guo, Lei Ma, Tiejun Huang

    Abstract: As a neuromorphic sensor with high temporal resolution, spike cameras offer notable advantages over traditional cameras in high-speed vision applications such as high-speed optical estimation, depth estimation, and object tracking. Inspired by the success of the spike camera, we proposed Spike-NeRF, the first Neural Radiance Field derived from spike data, to achieve 3D reconstruction and novel vie…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by ICME2024

  33. arXiv:2403.15401  [pdf]

    cs.CY cs.AI cs.CL

    Large Language Model for Mental Health: A Systematic Review

    Authors: Zhijun Guo, Alvina Lai, Johan Hilge Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li

    Abstract: Large language models (LLMs) have received much attention and shown their potential in digital health, while their application in mental health is subject to ongoing debate. This systematic review aims to summarize and characterize the use of LLMs in mental health by investigating the strengths and limitations of the latest work in LLMs and discusses the challenges and opportunities for early scre…

    Submitted 19 February, 2024; originally announced March 2024.

  34. arXiv:2403.14947  [pdf, other]

    cs.CV

    GPT-Connect: Interaction between Text-Driven Human Motion Generator and 3D Scenes in a Training-free Manner

    Authors: Haoxuan Qu, Ziyan Guo, Jun Liu

    Abstract: Recently, while text-driven human motion generation has received massive research attention, most existing text-driven motion generators are generally only designed to generate motion sequences in a blank background. While this is the case, in practice, human beings naturally perform their motions in 3D scenes, rather than in a blank background. Considering this, we here aim to perform scene-aware…

    Submitted 22 March, 2024; originally announced March 2024.

  35. arXiv:2403.14624  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

    Authors: Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

    Abstract: The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently evaluated and understood. We investigate current benchmarks to incorporate excessive visual content within textual questions, which potentially assist MLLMs in…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 46 Pages, Work in Progress, Benchmark Project Page: https://mathverse-cuhk.github.io

  36. arXiv:2403.12077  [pdf, other]

    cs.CL cs.AI cs.IR

    Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions

    Authors: Xuming Hu, Xiaochuan Li, Junzhe Chen, Yinghui Li, Yangning Li, Xiaoguang Li, Yasheng Wang, Qun Liu, Lijie Wen, Philip S. Yu, Zhijiang Guo

    Abstract: Generative search engines have the potential to transform how people seek information online, but generated responses from existing large language models (LLMs)-backed generative search engines may not always be accurate. Nonetheless, retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable par…

    Submitted 25 February, 2024; originally announced March 2024.

    Comments: 21 pages, 7 figures, 4 tables

  37. arXiv:2403.11703  [pdf, other]

    cs.CV cs.AI

    LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

    Authors: Ruyi Xu, Yuan Yao, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang

    Abstract: Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding the visual world. Conventional LMMs process images in fixed sizes and limited resolutions, while recent explorations in this direction are limited in adaptivity, efficiency, and even correctness. In this work, we first take GPT-4V and LLaVA-1.5 as representative examples and expose systematic flaws rooted in t…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Preprint

  38. arXiv:2403.11447  [pdf, other]

    cs.CV

    Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction

    Authors: Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li

    Abstract: 3D Gaussian Splatting (3DGS) has become an emerging tool for dynamic scene reconstruction. However, existing methods focus mainly on extending static 3DGS into a time-variant representation, while overlooking the rich motion information carried by 2D observations, thus suffering from performance degradation and model redundancy. To address the above problem, we propose a novel motion-aware enhance…

    Submitted 17 March, 2024; originally announced March 2024.

  39. arXiv:2403.10085  [pdf, other]

    cs.CV

    VRHCF: Cross-Source Point Cloud Registration via Voxel Representation and Hierarchical Correspondence Filtering

    Authors: Guiyu Zhao, Zewen Du, Zhentao Guo, Hongbin Ma

    Abstract: Addressing the challenges posed by the substantial gap in point cloud data collected from diverse sensors, achieving robust cross-source point cloud registration becomes a formidable task. In response, we present a novel framework for point cloud registration with broad applicability, suitable for both homologous and cross-source registration scenarios. To tackle the issues arising from different…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME), 2024

  40. arXiv:2403.09100  [pdf]

    physics.med-ph cs.CV cs.LG eess.IV physics.optics

    Virtual birefringence imaging and histological staining of amyloid deposits in label-free tissue using autofluorescence microscopy and deep learning

    Authors: Xilin Yang, Bijie Bai, Yijie Zhang, Musa Aydin, Sahan Yoruc Selcuk, Zhen Guo, Gregory A. Fishbein, Karine Atlan, William Dean Wallace, Nir Pillar, Aydogan Ozcan

    Abstract: Systemic amyloidosis is a group of diseases characterized by the deposition of misfolded proteins in various organs and tissues, leading to progressive organ dysfunction and failure. Congo red stain is the gold standard chemical stain for the visualization of amyloid deposits in tissue sections, as it forms complexes with the misfolded proteins and shows a birefringence pattern under polarized lig…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 20 Pages, 5 Figures

  41. arXiv:2403.07952  [pdf, other]

    cs.CV cs.AI cs.MM

    AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production

    Authors: Jiuniu Wang, Zehua Du, Yuyuan Zhao, Bo Yuan, Kexiang Wang, Jian Liang, Yaxi Zhao, Yihen Lu, Gengliang Li, Junlong Gao, Xin Tu, Zhenyu Guo

    Abstract: The Agent and AIGC (Artificial Intelligence Generated Content) technologies have recently made significant progress. We propose AesopAgent, an Agent-driven Evolutionary System on Story-to-Video Production. AesopAgent is a practical application of agent technology for multimodal content generation. The system integrates multiple generative capabilities within a unified framework, so that individual…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 22 pages, 13 figures

  42. arXiv:2403.07714  [pdf, other]

    cs.CL

    StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

    Authors: Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, Yang Liu

    Abstract: Large Language Models (LLMs) have witnessed remarkable advancements in recent years, prompting the exploration of tool learning, which integrates LLMs with external tools to address diverse real-world challenges. Assessing the capability of LLMs to utilise tools necessitates large-scale and stable benchmarks. However, previous works relied on either hand-crafted online tools with limited scale, or…

    Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  43. arXiv:2403.06017  [pdf, ps, other]

    cs.LG cs.CY

    Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark

    Authors: Xiaowei Qian, Zhimeng Guo, Jialiang Li, Haitao Mao, Bingheng Li, Suhang Wang, Yao Ma

    Abstract: Fair graph learning plays a pivotal role in numerous practical applications. Recently, many fair graph learning methods have been proposed; however, their evaluation often relies on poorly constructed semi-synthetic datasets or substandard real-world datasets. In such cases, even a basic Multilayer Perceptron (MLP) can outperform Graph Neural Networks (GNNs) in both utility and fairness. In this w…

    Submitted 9 March, 2024; originally announced March 2024.

  44. arXiv:2403.05812  [pdf, other]

    cs.CL cs.AI

    Algorithmic progress in language models

    Authors: Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla

    Abstract: We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months,…

    Submitted 9 March, 2024; originally announced March 2024.

  45. arXiv:2403.05652  [pdf, other]

    cs.LG cs.AI

    What is different between these datasets?

    Authors: Varun Babbar, Zhicheng Guo, Cynthia Rudin

    Abstract: The performance of machine learning models heavily depends on the quality of input data, yet real-world applications often encounter various data-related challenges. One such challenge could arise when curating training data or deploying the model in the real world - two comparable datasets in the same domain may have different distributions. While numerous techniques exist for detecting distribut…

    Submitted 8 March, 2024; originally announced March 2024.

  46. arXiv:2403.05396  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction

    Authors: Zhengrui Guo, Jiabo Ma, Yingxue Xu, Yihui Wang, Liansheng Wang, Hao Chen

    Abstract: Histopathology serves as the gold standard in cancer diagnosis, with clinical reports being vital in interpreting and understanding this process, guiding cancer treatment and patient care. The automation of histopathology report generation with deep learning stands to significantly enhance clinical efficiency and lessen the labor-intensive, time-consuming burden on pathologists in report writing.…

    Submitted 8 March, 2024; originally announced March 2024.

  47. Intent-aware Recommendation via Disentangled Graph Contrastive Learning

    Authors: Yuling Wang, Xiao Wang, Xiangzhou Huang, Yanhua Yu, Haoyang Li, Mengdi Zhang, Zirui Guo, Wei Wu

    Abstract: Graph neural network (GNN) based recommender systems have become one of the mainstream trends due to the powerful learning ability from user behavior data. Understanding the user intents from behavior data is the key to recommender systems, which poses two basic requirements for GNN-based recommender systems. One is how to learn complex and diverse intents especially when the user behavior is usua…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by IJCAI 2023

    Journal ref: Proceedings of the 32nd International Joint Conference on Artificial Intelligence. 2023: 2343-2351

  48. arXiv:2403.03535  [pdf, other]

    cs.CV cs.LG

    Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications

    Authors: Minyang Hu, Hong Chang, Zong Guo, Bingpeng Ma, Shiguan Shan, Xilin Chen

    Abstract: Few-shot learning (FSL) aims to learn novel tasks with very few labeled samples by leveraging experience from related training tasks. In this paper, we try to understand FSL by delving into two key questions: (1) How to quantify the relationship between training and novel tasks? (2) How does the relationship affect the adaptation difficulty on novel tasks for different…

    Submitted 6 March, 2024; originally announced March 2024.

  49. arXiv:2403.02576  [pdf, other]

    cs.DL cs.LG cs.SI

    AceMap: Knowledge Discovery through Academic Graph

    Authors: Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou, et al. (1 additional author not shown)

    Abstract: The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publicatio…

    Submitted 14 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Technical Report for AceMap (https://www.acemap.info)

  50. arXiv:2402.19014  [pdf, other]

    cs.CV

    Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

    Authors: Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

    Abstract: Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Different from conventional vision-language tasks, VDU is specifically concerned with text-rich scenarios containing abundant document elements. Nevertheless, the importance of fine-grained features remains largely…

    Submitted 29 February, 2024; originally announced February 2024.