Showing 1–50 of 290 results for author: Yan, H

Searching in archive cs.
  1. arXiv:2405.00527

    cs.DB

    ChatBI: Towards Natural Language to Complex Business Intelligence SQL

    Authors: Jinqing Lian, Xinyi Liu, Yingxia Shao, Yang Dong, Ming Wang, Zhang Wei, Tianqi Wan, Ming Dong, Hailin Yan

    Abstract: The Natural Language to SQL (NL2SQL) technology provides non-expert users who are unfamiliar with databases the opportunity to use SQL for data analysis. Converting Natural Language to Business Intelligence (NL2BI) is a popular practical scenario for NL2SQL in actual production systems. Compared to NL2SQL, NL2BI introduces more challenges. In this paper, we propose ChatBI, a comprehensive and eff… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  2. arXiv:2404.19652

    cs.CV cs.AI

    VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

    Authors: Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai

    Abstract: Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Typically, we propose a Prompt Queri… ▽ More

    Submitted 4 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  3. arXiv:2404.18359

    cs.CL cs.AI

    FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

    Authors: Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

    Abstract: In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choi… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  4. arXiv:2404.16821

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual… ▽ More

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  5. arXiv:2404.16573

    cs.CV

    Multi-Scale Representations by Varying Window Attention for Semantic Segmentation

    Authors: Haotian Yan, Ming Wu, Chuang Zhang

    Abstract: Multi-scale learning is central to semantic segmentation. We visualize the effective receptive field (ERF) of canonical multi-scale representations and point out two risks in learning them: scale inadequacy and field inactivation. A novel multi-scale learner, varying window attention (VWA), is presented to address these issues. VWA leverages the local window attention (LWA) and disentangles LWA in… ▽ More

    Submitted 26 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: ICLR2024 Poster

  6. arXiv:2404.16423

    cs.CV cs.RO

    Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images

    Authors: Hongyu Yan, Yadong Mu

    Abstract: Image-guided object assembly represents a burgeoning research topic in computer vision. This paper introduces a novel task: translating multi-view images of a structural 3D model (for example, one constructed with building blocks drawn from a 3D-object library) into a detailed sequence of assembly instructions executable by a robotic arm. Fed with multi-view images of the target 3D model for repli… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  7. arXiv:2404.15010

    cs.CV

    X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

    Authors: Shuofeng Sun, Yongming Rao, Jiwen Lu, Haibin Yan

    Abstract: Numerous prior studies predominantly emphasize constructing relation vectors for individual neighborhood points and generating dynamic kernels for each vector and embedding these into high-dimensional spaces to capture implicit local structures. However, we contend that such implicit high-dimensional structure modeling approach inadequately represents the local geometric structure of point clouds d… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  8. arXiv:2404.14248

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  9. arXiv:2404.12224

    cs.CL

    Length Generalization of Causal Transformers without Position Encoding

    Authors: Jie Wang, Tao Ji, Yuanbin Wu, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang, Xiaoling Wang

    Abstract: Generalizing to longer sentences is important for recent Transformer-based language models. Besides algorithms manipulating explicit position features, the success of Transformers without position encodings (NoPE) provides a new way to overcome the challenge. In this paper, we study the length generalization property of NoPE. We find that although NoPE can extend to longer sequences than the commo… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  10. arXiv:2404.11046

    cs.AI cs.CV cs.LG

    Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model

    Authors: Hao Yan, Yuhong Guo

    Abstract: Federated learning aims to tackle the "isolated data island" problem, where it trains a collective model from physically isolated clients while safeguarding the privacy of users' data. However, supervised federated learning necessitates that each client labels their data for training, which can be both time-consuming and resource-intensive, and may even be impractical for edge devices. Moreover,… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  11. arXiv:2404.06512

    cs.CV cs.CL

    InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

    Authors: Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution. Recent efforts have aimed to enhance the high-resolution understanding capabilities of LVLMs, yet they remain capped at approximately 1500 x 1500 pixels and constrained to a relatively narrow reso… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Code and models are publicly available at https://github.com/InternLM/InternLM-XComposer

  12. arXiv:2404.05520

    cs.SE

    The Fact Selection Problem in LLM-Based Program Repair

    Authors: Nikhil Parasaram, Huijie Yan, Boyu Yang, Zineb Flahy, Abriele Qudsi, Damian Ziaber, Earl Barr, Sergey Mechtaev

    Abstract: Recent research has shown that incorporating bug-related facts, such as stack traces and GitHub issues, into prompts enhances the bug-fixing capabilities of large language models (LLMs). Considering the ever-increasing context window of these models, a critical question arises: what and how many facts should be included in prompts to maximise the chance of correctly fixing bugs? To answer this que… ▽ More

    Submitted 9 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Code, scripts and data necessary to reproduce this work are available at https://github.com/PyRepair/maniple

  13. arXiv:2404.04403

    stat.ME cs.AI

    Low-Rank Robust Subspace Tensor Clustering for Metro Passenger Flow Modeling

    Authors: Jiuyun Hu, Ziyue Li, Chen Zhang, Fugee Tsung, Hao Yan

    Abstract: Tensor clustering has become an important topic, specifically in spatio-temporal modeling, due to its ability to cluster spatial modes (e.g., stations or road segments) and temporal modes (e.g., time of the day or day of the week). Our motivating example is from subway passenger flow modeling, where similarities between stations are commonly found. However, the challenges lie in the innate high-di… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Conditionally Accepted in INFORMS Journal of Data Science

  14. arXiv:2404.01612

    cs.CV

    Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo

    Authors: Zongrui Li, Zhan Lu, Haojie Yan, Boxin Shi, Gang Pan, Qian Zheng, Xudong Jiang

    Abstract: Natural Light Uncalibrated Photometric Stereo (NaUPS) relieves the strict environment and light assumptions in classical Uncalibrated Photometric Stereo (UPS) methods. However, due to the intrinsic ill-posedness and high-dimensional ambiguities, addressing NaUPS is still an open question. Existing works impose strong assumptions on the environment lights and objects' material, restricting the effe… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Paper accepted by CVPR2024

  15. arXiv:2404.00417

    cs.LG cs.AI cs.CV

    Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation

    Authors: HongWei Yan, Liyuan Wang, Kaisheng Ma, Yi Zhong

    Abstract: To accommodate real-world dynamics, artificial intelligence systems need to cope with sequentially arriving content in an online manner. Beyond regular Continual Learning (CL) attempting to address catastrophic forgetting with offline training of each task, Online Continual Learning (OCL) is a more challenging yet realistic setting that performs CL in a one-pass data stream. Current OCL methods pr… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  16. arXiv:2403.18241

    cs.CV cs.AI cs.GR cs.LG

    NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation

    Authors: Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji

    Abstract: 3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  17. arXiv:2403.17914

    cs.AI

    Hierarchical Multi-label Classification for Fine-level Event Extraction from Aviation Accident Reports

    Authors: Xinyu Zhao, Hao Yan, Yongming Liu

    Abstract: A large volume of accident reports is recorded in the aviation domain, which is of great value for improving aviation safety. To better use those reports, we need to understand the most important events or impact factors according to the accident reports. However, the increasing number of accident reports requires large efforts from domain experts to label those reports. In order to make the labeling pro… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted in INFORMS Journal of Data Science

  18. arXiv:2403.17891

    cs.LG cs.AI

    Image-based Novel Fault Detection with Deep Learning Classifiers using Hierarchical Labels

    Authors: Nurettin Sergin, Jiayu Huang, Tzyy-Shuh Chang, Hao Yan

    Abstract: One important characteristic of modern fault classification systems is the ability to flag the system when faced with previously unseen fault types. This work considers the unknown fault detection capabilities of deep neural network-based fault classifiers. Specifically, we propose a methodology on how, when available, labels regarding the fault taxonomy can be used to increase unknown fault detec… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted in IISE Transaction

  19. arXiv:2403.17297

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  20. arXiv:2403.16358

    cs.CV

    ChebMixer: Efficient Graph Representation Learning with MLP Mixer

    Authors: Xiaoyan Kui, Haonan Yan, Qinsong Li, Liming Chen, Beiji Zou

    Abstract: Graph neural networks have achieved remarkable success in learning graph representations, especially graph Transformer, which has recently shown superior performance on various graph mining tasks. However, graph Transformer generally treats nodes as tokens, which results in quadratic complexity regarding the number of nodes during self-attention computation. The graph MLP Mixer addresses this chal… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  21. arXiv:2403.16210

    cs.CV cs.AI cs.GR

    Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

    Authors: Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji

    Abstract: We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in one single tri-plane tensor, from which multiple Signed… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Video: https://youtu.be/lRn-HqyCrLI

  22. arXiv:2403.15679

    cs.CV cs.MM

    DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes

    Authors: Hao Yan, Zhihui Ke, Xiaobo Zhou, Tie Qiu, Xidong Shi, Dadong Jiang

    Abstract: Implicit neural representations for video (NeRV) have recently become a novel way for high-quality video representation. However, existing works employ a single network to represent the entire video, which implicitly confuses static and dynamic information. This leads to an inability to effectively compress the redundant static information and a lack of explicit modeling of global temporal-coheren… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: CVPR 2024. Project page at https://haoyan14.github.io/DS-NeRV

  23. arXiv:2403.14240

    cs.CV

    Weak Supervision with Arbitrary Single Frame for Micro- and Macro-expression Spotting

    Authors: Wang-Wang Yu, Xian-Shi Zhang, Fu-Ya Luo, Yijun Cao, Kai-Fu Yang, Hong-Mei Yan, Yong-Jie Li

    Abstract: Frame-level micro- and macro-expression spotting methods require time-consuming frame-by-frame observation during annotation. Meanwhile, video-level spotting lacks sufficient information about the location and number of expressions during training, resulting in significantly inferior performance compared with fully-supervised spotting. To bridge this gap, we propose a point-level weakly-supervised… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  24. arXiv:2403.14112

    cs.CL

    Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

    Authors: Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He

    Abstract: We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. We evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5 representative prompt strategies for improving LLMs' reasoning ability, such as Chain-of-Thought.… ▽ More

    Submitted 19 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Equal contribution: Jiaxing Sun, Weiquan Huang, Jiang Wu; Corresponding author: Conghui He

  25. arXiv:2403.13365

    cs.RO cs.CV

    ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

    Authors: Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu

    Abstract: Robotic manipulation in everyday scenarios, especially in unstructured environments, requires skills in pose-aware object manipulation (POM), which adapts robots' grasping and handling according to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it's more effective to grasp it by the rim rather… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  26. arXiv:2403.13267

    cs.SI

    Dynamic Information Dissemination Model Incorporating Non-Adjacent Node Interaction

    Authors: Xinyu Li, Jinyang Huang, Xiang Zhang, Peng Zhao, Meng Wang, Guohang Zhuang, Huan Yan, Xiao Sun, Meng Wang

    Abstract: Describing the dynamics of information dissemination within social networks poses a formidable challenge. Despite multiple endeavors aimed at addressing this issue, only a limited number of studies have effectively replicated and forecasted the evolving course of information dissemination. In this paper, we propose a novel model, DM-NAI, which not only considers the information transfer between ad… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.06385

  27. arXiv:2403.12171

    cs.CL cs.AI

    EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

    Authors: Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  28. arXiv:2403.11202

    cs.AR cs.AI cs.PL

    Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework

    Authors: Kaiyan Chang, Kun Wang, Nan Yang, Ying Wang, Dantong Jin, Wenlong Zhu, Zhirong Chen, Cangyuan Li, Hao Yan, Yunhao Zhou, Zhuoliang Zhao, Yuan Cheng, Yudong Pan, Yiqi Liu, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li

    Abstract: Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by L… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by DAC 2024; please note that this is not the final camera-ready version

  29. arXiv:2403.08193

    cs.LG cs.AR cs.ET

    Learning-driven Physically-aware Large-scale Circuit Gate Sizing

    Authors: Yuyang Ye, Peng Xu, Lizheng Ren, Tinghuan Chen, Hao Yan, Bei Yu, Longxing Shi

    Abstract: Gate sizing plays an important role in timing optimization after physical design. Existing machine learning-based gate sizing works cannot optimize timing on multiple timing paths simultaneously and neglect the physical constraint on layouts. They cause sub-optimal sizing solutions and low-efficiency issues when compared with commercial gate sizing tools. In this work, we propose a learning-driven… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  30. arXiv:2403.06141

    cs.SI

    Information Dissemination Model Based on User Attitude and Public Opinion Environment

    Authors: Xinyu Li, Jinyang Huang, Xiang Zhang, Peng Zhao, Meng Wang, Guohang Zhuang, Huan Yan, Xiao Sun, Meng Wang

    Abstract: Modeling the information dissemination process in social networks is a challenging problem. Despite numerous attempts to address this issue, existing studies often assume that user attitudes have only one opportunity to alter during the information dissemination process. Additionally, these studies tend to consider the transformation of user attitudes as solely influenced by a single user, overloo… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  31. arXiv:2403.05217

    cs.CL cs.AI cs.IR

    Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering

    Authors: Hongda Sun, Yuxuan Liu, Chengwei Wu, Haiyu Yan, Cheng Tai, Xin Gao, Shuo Shang, Rui Yan

    Abstract: Open-domain question answering (ODQA) has emerged as a pivotal research spotlight in information systems. Existing methods follow two main paradigms to collect evidence: (1) The retrieve-then-read paradigm retrieves pertinent documents from an external corpus; and (2) the generate-then-read paradigm employs large language models (LLMs) to generate relevant documents. However, nei… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: TheWebConf 2024 (WWW 2024) oral, code repo: https://github.com/EthanLeo-LYX/LLMQA

  32. arXiv:2403.02757

    cs.CL

    In-Memory Learning: A Declarative Learning Framework for Large Language Models

    Authors: Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qingyuan Cheng, Xipeng Qiu

    Abstract: The exploration of whether agents can align with their environment without relying on human-labeled data presents an intriguing research topic. Drawing inspiration from the alignment process observed in intelligent organisms, where declarative memory plays a pivotal role in summarizing past experiences, we propose a novel learning framework. The agents adeptly distill insights from past experience… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  33. arXiv:2402.19282

    cs.CL

    WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

    Authors: Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Lin Dahua, Yu Qiao, Hang Yan, et al. (1 additional author not shown)

    Abstract: This paper presents WanJuan-CC, a safe and high-quality open-sourced English webtext dataset derived from Common Crawl data. The study addresses the challenges of constructing large-scale pre-training datasets for language models, which require vast amounts of high-quality data. A comprehensive process was designed to handle Common Crawl data, including extraction, heuristic rule filtering, fuzzy… ▽ More

    Submitted 17 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  34. arXiv:2402.16594

    cs.CV

    CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition

    Authors: Qixuan Zheng, Ming Zhang, Hong Yan

    Abstract: To achieve greater accuracy, hypergraph matching algorithms require exponential increases in computational resources. Recent kd-tree-based approximate nearest neighbor (ANN) methods, despite the sparsity of their compatibility tensor, still require exhaustive calculations for large-scale graph matching. This work utilizes CUR tensor decomposition and introduces a novel cascaded second and third-or… ▽ More

    Submitted 30 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  35. arXiv:2402.16319

    cs.CL

    Data-free Weight Compress and Denoise for Large Language Models

    Authors: Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin

    Abstract: Large Language Models (LLMs) are reshaping the research landscape in artificial intelligence, particularly as model parameters scale up significantly, unlocking remarkable capabilities across various domains. Nevertheless, the scalability of model parameters faces constraints due to limitations in GPU memory and computational speed. To address these constraints, various weight compression methods… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  36. arXiv:2402.15637

    cs.CL

    Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models

    Authors: Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He

    Abstract: In-context learning has become a popular paradigm in natural language processing. However, its performance can be significantly influenced by the order of in-context demonstration examples. In this paper, we found that causal language models (CausalLMs) are more sensitive to this order compared to prefix language models (PrefixLMs). We attribute this phenomenon to the auto-regressive attention mas… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  37. arXiv:2402.15309

    cs.LG cs.CL

    Counterfactual Generation with Identifiability Guarantees

    Authors: Hanqi Yan, Lingjing Kong, Lin Gui, Yuejie Chi, Eric Xing, Yulan He, Kun Zhang

    Abstract: Counterfactual generation lies at the core of various machine learning tasks, including image translation and controllable text generation. This generation process usually requires the identification of the disentangled latent representations, such as content and style, that underlie the observed data. However, it becomes more challenging when faced with a scarcity of paired data and labeling info… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Neurips23. Controllable generation in causal perspective with a case study of ChatGPT, sheds light on theory-guaranteed alignment in language models

  38. arXiv:2402.14963

    cs.CL cs.AI

    Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning

    Authors: Hanqi Yan, Qinglin Zhu, Xinyu Wang, Lin Gui, Yulan He

    Abstract: While large language models (LLMs) have the capability to iteratively reflect on their own outputs, recent studies have observed their struggles with knowledge-rich problems without access to external resources. In addition to the inefficiency of LLMs in self-assessment, we also observe that LLMs struggle to revisit their predictions despite receiving explicit negative feedback. Therefore, we prop… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Code is available at https://github.com/hanqi-qi/Mirror.git

  39. arXiv:2402.14526

    cs.CL cs.AI

    Balanced Data Sampling for Language Model Training with Clustering

    Authors: Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu

    Abstract: Data plays a fundamental role in the training of Large Language Models (LLMs). While attention has been paid to the collection and composition of datasets, determining the data sampling strategy in training remains an open question. Most LLMs are trained with a simple strategy, random sampling. However, this sampling strategy ignores the unbalanced nature of training data distribution, which can b… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

  40. arXiv:2402.14310

    cs.CL

    Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge

    Authors: Jinlan Fu, Shenzhen Huangfu, Hang Yan, See-Kiong Ng, Xipeng Qiu

    Abstract: Large Language Models (LLMs) have recently showcased remarkable generalizability in various domains. Despite their extensive knowledge, LLMs still face challenges in efficiently utilizing encoded knowledge to develop accurate and logical reasoning processes. To mitigate this problem, we introduced Hint-before-Solving Prompting (HSP), which guides the model to generate hints (e.g., specific knowled… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 18 pages

  41. arXiv:2402.13583  [pdf, other]

    cs.CL

    LongWanjuan: Towards Systematic Measurement for Long Text Quality

    Authors: Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin

    Abstract: The quality of training data is crucial for enhancing the long-text capabilities of foundation models. Despite existing efforts to refine data quality through heuristic rules and evaluations based on data diversity and difficulty, there's a lack of systematic approaches specifically tailored for assessing long texts. Addressing this gap, our work systematically measures the quality of long texts…

    Submitted 21 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Update Figures

  42. arXiv:2402.13055  [pdf, other]

    cs.CL cs.AI

    Identifying Semantic Induction Heads to Understand In-Context Learning

    Authors: Jie Ren, Qipeng Guo, Hang Yan, Dongrui Liu, Xipeng Qiu, Dahua Lin

    Abstract: Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To gain a better understanding of LLMs, we conduct a detailed analysis of the operations of attention heads and aim to better understand the in-context learning of LLMs. Specifically, we investigate whether attention heads en…

    Submitted 20 February, 2024; originally announced February 2024.

  43. arXiv:2402.13013  [pdf, other]

    cs.CL

    Code Needs Comments: Enhancing Code LLMs with Comment Augmentation

    Authors: Demin Song, Honglin Guo, Yunhua Zhou, Shuhao Xing, Yudong Wang, Zifan Song, Wenwei Zhang, Qipeng Guo, Hang Yan, Xipeng Qiu, Dahua Lin

    Abstract: Programming skill is a crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs). We examine the impact of pre-training data on code-focused LLMs' performance by assessing the comment density as a measure of PL-NL alignment. Given the scarcity of code-comment aligned data in pre-train…

    Submitted 20 February, 2024; originally announced February 2024.

  44. arXiv:2402.12399  [pdf, other]

    cs.LG cs.AI cs.CL

    Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

    Authors: Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

    Abstract: Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism suffers from redundant computation and memory costs due to unbalanced routing. Some experts overflow, and the excess tokens are dropped, while other experts are vacant and are padded with zeros, negatively…

    Submitted 21 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  45. arXiv:2402.12226  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

    Authors: Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu

    Abstract: We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT can be trained stably without any alterations to the current large language model (LLM) architecture or training paradigms. Instead, it relies exclusively on data-level preprocessing, facilitating the…

    Submitted 7 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 28 pages, 16 figures, under review, work in progress

  46. arXiv:2402.06852  [pdf]

    cs.AI cs.CL

    ChemLLM: A Chemical Large Language Model

    Authors: Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Wanli Ouyang, Dongzhan Zhou, Shufei Zhang, Mao Su, Han-Sen Zhong, Yuqiang Li

    Abstract: Large language models (LLMs) have made impressive progress in chemistry applications. However, the community lacks an LLM specifically designed for chemistry. The main challenges are two-fold: firstly, most chemical data and scientific knowledge are stored in structured databases, which limits the model's ability to sustain coherent dialogue when used directly. Secondly, there is an absence of obj…

    Submitted 25 April, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures

  47. arXiv:2402.06332  [pdf, other]

    cs.CL

    InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

    Authors: Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

    Abstract: The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs, InternLM-Math, which are continually pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format and supervise our model to be a versatil…

    Submitted 9 February, 2024; originally announced February 2024.

  48. arXiv:2402.06150  [pdf, other]

    cs.LG cs.CV

    Domain Generalization with Small Data

    Authors: Kecheng Chen, Elena Gal, Hong Yan, Haoliang Li

    Abstract: In this work, we propose to tackle the problem of domain generalization in the context of insufficient samples. Instead of extracting latent feature embeddings based on deterministic models, we propose to learn a domain-invariant representation based on a probabilistic framework by mapping each data point into probabilistic embeddings. Specifically, we first extend empirical maximum mea…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted by International Journal of Computer Vision

  49. arXiv:2402.00822  [pdf, other]

    cs.HC cs.AI

    WiOpen: A Robust Wi-Fi-based Open-set Gesture Recognition Framework

    Authors: Xiang Zhang, Jingyang Huang, Huan Yan, Peng Zhao, Guohang Zhuang, Zhi Liu, Bin Liu

    Abstract: Recent years have witnessed a growing interest in Wi-Fi-based gesture recognition. However, existing works have predominantly focused on closed-set paradigms, where all testing gestures are predefined during training. This poses a significant challenge in real-world applications, as unseen gestures might be misclassified as known classes during testing. To address this issue, we propose WiOpen, a…

    Submitted 1 February, 2024; originally announced February 2024.

  50. arXiv:2401.17221  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MouSi: Poly-Visual-Expert Vision-Language Models

    Authors: Xiaoran Fan, Tao Ji, Changhao Jiang, Shuo Li, Senjie Jin, Sirui Song, Junke Wang, Boyang Hong, Lu Chen, Guodong Zheng, Ming Zhang, Caishuang Huang, Rui Zheng, Zhiheng Xi, Yuhao Zhou, Shihan Dou, Junjie Ye, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability…

    Submitted 30 January, 2024; originally announced January 2024.