Showing 1–50 of 249 results for author: He, C

Searching in archive cs.
  1. arXiv:2404.19738  [pdf, other]

    cs.HC

    DiaryHelper: Exploring the Use of an Automatic Contextual Information Recording Agent for Elicitation Diary Study

    Authors: Junze Li, Changyang He, Jiaxiong Hu, Boyang Jia, Alon Halevy, Xiaojuan Ma

    Abstract: Elicitation diary studies, a type of qualitative, longitudinal research method, ask participants to self-report aspects of events of interest as they occur, as memory cues for providing details and insights during post-study interviews. However, due to time constraints and lack of motivation, participants' diary entries may be vague or incomplete, impairing their later recall. To address…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CHI 2024

  2. arXiv:2404.18359  [pdf, other]

    cs.CL cs.AI

    FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

    Authors: Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

    Abstract: In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choi…

    Submitted 28 April, 2024; originally announced April 2024.

  3. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  4. arXiv:2404.16205  [pdf, other]

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  5. arXiv:2404.15992  [pdf, other]

    cs.CV eess.IV

    HDDGAN: A Heterogeneous Dual-Discriminator Generative Adversarial Network for Infrared and Visible Image Fusion

    Authors: Guosheng Lu, Zile Fang, Chunming He, Zhigang Zhao

    Abstract: Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images, enabling the capture of important features and hidden details of subjects in complex scenes and disturbed environments. Consequently, IVIF offers distinct advantages in practical applications such as video surveillance, night navigation…

    Submitted 24 April, 2024; originally announced April 2024.

  6. arXiv:2404.15341  [pdf, other]

    eess.SP cs.LG

    Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise

    Authors: Jing-Xiao Liao, Chao He, Jipu Li, Jinwei Sun, Shiping Zhang, Xiaoge Zhang

    Abstract: Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional…

    Submitted 10 April, 2024; originally announced April 2024.

  7. arXiv:2404.15254  [pdf, other]

    cs.CV

    UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

    Authors: Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

    Abstract: This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios. The UniMER dataset consists of a large-scale training set UniMER-1M offering an unprecedented scale and diversity with one million training instances and a meticulously designed test set UniMER-Test that reflects a diverse range of formula distributio…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

  8. arXiv:2404.14239  [pdf, other]

    cs.CV

    MultiBooth: Towards Generating All Your Concepts in an Image from Text

    Authors: Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Li Xiu

    Abstract: This paper introduces MultiBooth, a novel and efficient technique for multi-concept customization in image generation from text. Despite the significant advancements in customized generation methods, particularly with the success of diffusion models, existing methods often struggle with multi-concept scenarios due to low concept fidelity and high inference cost. MultiBooth addresses these issues b…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Project Page: https://multibooth.github.io/ . Github Page: https://github.com/chenyangzhu1/MultiBooth

  9. arXiv:2404.13972  [pdf, other]

    cs.CV

    Non-Uniform Exposure Imaging via Neuromorphic Shutter Control

    Authors: Mingyuan Lin, Jian Liu, Chi Zhang, Zibo Zhao, Chu He, Lei Yu

    Abstract: By leveraging the blur-noise trade-off, imaging with non-uniform exposures largely extends the image acquisition flexibility in harsh environments. However, the limitation of conventional cameras in perceiving intra-frame dynamic information prevents existing methods from being implemented in the real-world frame acquisition for real-time adaptive camera shutter control. To address this challenge,…

    Submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2404.13659  [pdf, other]

    cs.CV

    LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing

    Authors: Tong Wang, Guanzhou Chen, Xiaodong Zhang, Chenxi Liu, Xiaoliang Tan, Jiaqi Wang, Chanjuan He, Wenlin Zhou

    Abstract: Despite the rapid evolution of semantic segmentation for land cover classification in high-resolution remote sensing imagery, integrating multiple data modalities such as Digital Surface Model (DSM), RGB, and Near-infrared (NIR) remains a challenge. Current methods often process only two types of data, missing out on the rich information that additional modalities can provide. Addressing this gap,…

    Submitted 21 April, 2024; originally announced April 2024.

  11. arXiv:2404.09622  [pdf, other]

    cs.RO cs.AI

    DIDLM: A Comprehensive Multi-Sensor Dataset with Infrared Cameras, Depth Cameras, LiDAR, and 4D Millimeter-Wave Radar in Challenging Scenarios for 3D Mapping

    Authors: WeiSheng Gong, Chen He, KaiJie Su, QingYong Li

    Abstract: This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as…

    Submitted 15 April, 2024; originally announced April 2024.

  12. arXiv:2404.08334  [pdf, other]

    eess.SY cs.RO

    Guaranteed Completion of Complex Tasks via Temporal Logic Trees and Hamilton-Jacobi Reachability

    Authors: Frank J. Jiang, Kaj Munhoz Arfvidsson, Chong He, Mo Chen, Karl H. Johansson

    Abstract: In this paper, we present an approach for guaranteeing the completion of complex tasks with cyber-physical systems (CPS). Specifically, we leverage temporal logic trees constructed using Hamilton-Jacobi reachability analysis to (1) check for the existence of control policies that complete a specified task and (2) develop a computationally-efficient approach to synthesize the full set of control in…

    Submitted 12 April, 2024; originally announced April 2024.

  13. arXiv:2404.07584  [pdf, other]

    cs.CL

    UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs

    Authors: Chaoqun He, Renjie Luo, Shengding Hu, Yuanqian Zhao, Jie Zhou, Hanghao Wu, Jiajie Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: Evaluation is pivotal for honing Large Language Models (LLMs), pinpointing their capabilities and guiding enhancements. The rapid development of LLMs calls for a lightweight and easy-to-use framework for swift evaluation deployment. However, due to the various implementation details to consider, developing a comprehensive evaluation platform is never easy. Existing platforms are often complex and…

    Submitted 11 April, 2024; originally announced April 2024.

  14. arXiv:2404.06512  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

    Authors: Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution. Recent efforts have aimed to enhance the high-resolution understanding capabilities of LVLMs, yet they remain capped at approximately 1500 x 1500 pixels and constrained to a relatively narrow reso…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Code and models are publicly available at https://github.com/InternLM/InternLM-XComposer

  15. arXiv:2404.06395  [pdf, other]

    cs.CL cs.LG

    MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

    Authors: Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce…

    Submitted 22 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Enlarge the font size in several figures

  16. arXiv:2404.05089  [pdf, other]

    cs.CL cs.LG

    SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts

    Authors: Alexandre Muzio, Alex Sun, Churan He

    Abstract: The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoEs) models, known for their dynamic allocation of computational resources based on input. Despite their promise, MoEs face challenges, particularly in terms of memory requirements. To address this, our work introduces SEER-MoE, a novel two-stage framework for reducing both the memory footprint and compute requiremen…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 8+3 pages

  17. arXiv:2404.04823  [pdf, other]

    cs.CV

    3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

    Authors: Weijia Li, Haote Yang, Zhenghao Hu, Juepeng Zheng, Gui-Song Xia, Conghui He

    Abstract: 3D building reconstruction from monocular remote sensing images is an important and challenging research problem that has received increasing attention in recent years, owing to its low cost of data acquisition and availability for large-scale applications. However, existing methods rely on expensive 3D-annotated samples for fully-supervised training, restricting their application to large-scale c…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: accepted by CVPR 2024

  18. arXiv:2404.02638  [pdf, other]

    cs.CV

    SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

    Authors: Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li

    Abstract: This paper aims at achieving fine-grained building attribute segmentation in a cross-view scenario, i.e., using satellite and street-view image pairs. The main challenge lies in overcoming the significant perspective differences between street views and satellite views. In this work, we introduce SG-BEV, a novel approach for satellite-guided BEV fusion for cross-view semantic segmentation. To over…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: accepted by CVPR 2024

  19. arXiv:2403.20213  [pdf, other]

    cs.CV

    H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

    Authors: Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

    Abstract: Generic large Vision-Language Models (VLMs) are developing rapidly but still perform poorly in the Remote Sensing (RS) domain, which is due to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs. Existing Remote Sensing specific Vision Language Models (RSVLMs) still have considerable potential for improvement, primarily owing to the lack…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Equal contribution: Chao Pang, Jiang Wu; Corresponding author: Gui-Song Xia, Conghui He

  20. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  21. Persuasion or Insulting? Unpacking Discursive Strategies of Gender Debate in Everyday Feminism in China

    Authors: Yue Deng, Zheng Chen, Changyang He, Zhicong Lu, Bo Li

    Abstract: Speaking out for women's daily needs on social media has become a crucial form of everyday feminism in China. Gender debate naturally intertwines with such feminist advocacy, where users in opposite stances discuss gender-related issues through intense discourse. The complexities of gender debate necessitate a systematic understanding of discursive strategies for achieving effective gender communi…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 19 pages, 3 figures, In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24)

  22. arXiv:2403.15853  [pdf]

    eess.IV cs.CV

    An edge detection-based deep learning approach for tear meniscus height measurement

    Authors: Kesheng Wang, Kunhui Xu, Xiaoyu Chen, Chunlei He, Jianfeng Zhang, Dexing Kong, Qi Dai, Shoujun Huang

    Abstract: Automatic measurements of tear meniscus height (TMH) have been achieved by using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask lab…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 22 pages, 5 figures

  23. arXiv:2403.14112  [pdf, other]

    cs.CL

    Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

    Authors: Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He

    Abstract: We introduce CHARM, the first benchmark for comprehensive, in-depth evaluation of the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense. We evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5 representative prompt strategies for improving LLMs' reasoning ability, such as Chain-of-Thought.…

    Submitted 19 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Equal contribution: Jiaxing Sun, Weiquan Huang, Jiang Wu; Corresponding author: Conghui He

  24. arXiv:2403.07920  [pdf, other]

    q-bio.BM cs.AI cs.CL cs.LG

    ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

    Authors: Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, Wentao Zhang

    Abstract: We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural language text is interspersed with an arbitrary number of proteins. Besides, we propose the protein-as-word language modeling approach to train ProtLLM. By dev…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: https://protllm.github.io/project/

  25. arXiv:2403.04703  [pdf, other]

    cs.RO

    mmPlace: Robust Place Recognition with Intermediate Frequency Signal of Low-cost Single-chip Millimeter Wave Radar

    Authors: Chengzhen Meng, Yifan Duan, Chenming He, Dequan Wang, Xiaoran Fan, Yanyong Zhang

    Abstract: Place recognition is crucial for tasks like loop-closure detection and re-localization. Single-chip millimeter wave radar (single-chip radar in short) emerges as a low-cost sensor option for place recognition, with the advantage of insensitivity to degraded visual environments. However, it encounters two challenges. Firstly, sparse point cloud from single-chip radar leads to poor performance when…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 8 pages, 8 figures

  26. arXiv:2403.02127  [pdf, other]

    cs.CV cs.AI cs.CL

    LOCR: Location-Guided Transformer for Optical Character Recognition

    Authors: Yu Sun, Dongzhan Zhou, Chen Lin, Conghui He, Wanli Ouyang, Han-Sen Zhong

    Abstract: Academic documents are packed with texts, equations, tables, and figures, requiring comprehensive understanding for accurate Optical Character Recognition (OCR). While end-to-end OCR methods offer improved accuracy over layout-based approaches, they often grapple with significant repetition issues, especially with complex layouts in Out-Of-Domain (OOD) documents. To tackle this issue, we propose LO…

    Submitted 4 March, 2024; originally announced March 2024.

  27. arXiv:2403.00529  [pdf, other]

    cs.SD cs.LG eess.AS

    VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

    Authors: Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

    Abstract: Achieving nuanced and accurate emulation of human voice has been a longstanding goal in artificial intelligence. Although significant progress has been made in recent years, the mainstream of speech synthesis models still relies on supervised speaker modeling and explicit reference utterances. However, there are many aspects of human voice, such as emotion, intonation, and speaking style, for whic…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: preprint

  28. arXiv:2402.19282  [pdf, other]

    cs.CL

    WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

    Authors: Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Lin Dahua, Yu Qiao, Hang Yan, et al. (1 additional author not shown)

    Abstract: This paper presents WanJuan-CC, a safe and high-quality open-sourced English webtext dataset derived from Common Crawl data. The study addresses the challenges of constructing large-scale pre-training datasets for language models, which require vast amounts of high-quality data. A comprehensive process was designed to handle Common Crawl data, including extraction, heuristic rule filtering, fuzzy…

    Submitted 17 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  29. arXiv:2402.17645  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

    Authors: Shuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang

    Abstract: We present SongComposer, an innovative LLM designed for song composition. It can understand and generate melodies and lyrics in symbolic song representations by leveraging the capability of LLMs. Existing music-related LLMs treat music as quantized audio signals, but such implicit encoding leads to inefficient encoding and poor flexibility. In contrast, we resort to symbolic song represen…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: project page: https://pjlab-songcomposer.github.io/ code: https://github.com/pjlab-songcomposer/songcomposer

  30. arXiv:2402.14008  [pdf, other]

    cs.CL

    OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

    Authors: Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun

    Abstract: Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains. With traditional benchmarks becoming less challenging for these models, new rigorous challenges are essential to gauge their advanced abilities. In this work, we present Olym…

    Submitted 21 February, 2024; originally announced February 2024.

  31. arXiv:2402.13583  [pdf, other]

    cs.CL

    LongWanjuan: Towards Systematic Measurement for Long Text Quality

    Authors: Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin

    Abstract: The quality of training data is crucial for enhancing the long-text capabilities of foundation models. Despite existing efforts to refine data quality through heuristic rules and evaluations based on data diversity and difficulty, there's a lack of systematic approaches specifically tailored for assessing long texts. Addressing this gap, our work systematically measures the quality of long texts…

    Submitted 21 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Update Figures

  32. arXiv:2402.05935  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

    Authors: Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao

    Abstract: We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory

  33. arXiv:2402.03578  [pdf, other]

    cs.MA cs.AI

    LLM Multi-Agent Systems: Challenges and Open Problems

    Authors: Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, Zhaozhuo Xu, Chaoyang He

    Abstract: This paper explores existing works of multi-agent systems and identifies challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents within a multi-agent system, these systems can tackle complex tasks through collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layer…

    Submitted 5 February, 2024; originally announced February 2024.

  34. arXiv:2401.16420  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

    Authors: Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension. This model goes beyond conventional vision-language understanding, adeptly crafting interleaved text-image content from diverse inputs like outlines, detailed textual specifications, and reference images, enabling highly customizable content creation. InternLM-XCo…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Code and models are available at https://github.com/InternLM/InternLM-XComposer

  35. arXiv:2401.14624  [pdf, other]

    cs.CL

    Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

    Authors: Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu

    Abstract: Large language models have demonstrated remarkable potential in various tasks; however, there remains a significant scarcity of open-source models and data for specific domains. Previous works have primarily focused on manually specifying resources and collecting high-quality data on specific domains, which consumes significant time and effort. To address this limitation, we propose an efficient…

    Submitted 4 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: We have released the full data (total of 735GB) in https://huggingface.co/datasets/Query-of-CC/knowledge_pile_full and partial data (about 40GB) in https://huggingface.co/datasets/Query-of-CC/knowledge_pile

  36. arXiv:2401.11767  [pdf, other]

    cs.CV

    Concealed Object Segmentation with Hierarchical Coherence Modeling

    Authors: Fengyang Xiao, Pan Zhang, Chunming He, Runze Hu, Yutao Liu

    Abstract: Concealed object segmentation (COS) is a challenging task that involves localizing and segmenting those concealed objects that are visually blended with their surrounding environments. Despite achieving remarkable success, existing COS segmenters still struggle to achieve complete segmentation results in extremely concealed scenarios. In this paper, we propose a Hierarchical Coherence Modeling (HC…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted to CICAI 2023. 13 pages, 6 figures, 4 tables

  37. arXiv:2401.11408  [pdf, other]

    cs.CL cs.AI

    SEBERTNets: Sequence Enhanced BERT Networks for Event Entity Extraction Tasks Oriented to the Finance Field

    Authors: Congqing He, Xiangyu Zhu, Yuquan Le, Yuzhong Liu, Jianhong Yin

    Abstract: Event extraction lies at the core of investment analysis and asset management in the financial field, and thus has received much attention. The 2019 China Conference on Knowledge Graph and Semantic Computing (CCKS) challenge set up an evaluation competition for the event entity extraction task oriented to the finance field. In this task, we mainly focus on how to extract the event entity accurately,…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: CCKS 2019

  38. arXiv:2401.08615  [pdf, other]

    cs.CV

    Online Anomaly Detection over Live Social Video Streaming

    Authors: Chengkun He, Xiangmin Zhou, Chen Wang, Iqbal Gondal, Jie Shao, Xun Yi

    Abstract: A social video anomaly is an observation in video streams that does not conform to a common pattern of the dataset's behaviour. Social video anomaly detection plays a critical role in applications from e-commerce to e-learning. Traditionally, anomaly detection techniques are applied to find anomalies in video broadcasting. However, they neglect the live social video streams which contain interactive tal…

    Submitted 1 December, 2023; originally announced January 2024.

  39. Understanding Emotional Disclosure via Diary-keeping in Quarantine on Social Media

    Authors: Yue Deng, Changyang He, Bo Li

    Abstract: Quarantine is a widely-adopted measure during health crises caused by highly-contagious diseases like COVID-19, yet it poses critical challenges to public mental health. Given this context, emotional disclosure on social media in the form of keeping a diary emerges as a popular way for individuals to express emotions and record their mental health status. However, the exploration of emotional disc…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: 13 pages, 6 figures, In Proceedings of The Eleventh International Symposium of Chinese CHI (Chinese CHI 2023)

  40. arXiv:2401.00912  [pdf, other]

    cs.CV

    ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

    Authors: Chenhang He, Ruihuang Li, Guowen Zhang, Lei Zhang

    Abstract: Window-based transformers have demonstrated strong ability in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner. However, because of the sparse nature of point clouds, the number of voxels per window varies significantly. Current methods partition the voxels in each window into multiple subsets of equal…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: 11 pages, 7 figures

  41. arXiv:2312.15430  [pdf, other]

    cs.CV

    Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

    Authors: Jianqiang Ren, Chao He, Lin Liu, Jiahao Chen, Yutong Wang, Yafei Song, Jianfang Li, Tangli Xue, Siqi Hu, Tao Chen, Kunkun Zheng, Jianjing Xiang, Liefeng Bo

    Abstract: There is a growing demand for customized and expressive 3D characters with the emergence of AI agents and the Metaverse, but creating 3D characters using traditional computer graphics tools is a complex and time-consuming task. To address these challenges, we propose a user-friendly framework named Make-A-Character (Mach) to create lifelike 3D avatars from text descriptions. The framework leverages th…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: Technical Report

  42. arXiv:2312.14232  [pdf, other]

    cs.CV cs.AI

    Parrot Captions Teach CLIP to Spot Text

    Authors: Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou

    Abstract: Despite CLIP being the foundation model in numerous vision-language applications, CLIP suffers from a severe text spotting bias. Such bias causes CLIP models to 'Parrot' the visual text embedded within images while disregarding the authentic visual semantics. We uncover that in the most popular image-text dataset LAION-2B, the captions also densely parrot (spell) the text embedded in images. O…

    Submitted 1 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: project page: https://linyq17.github.io/CLIP-Parrot-Bias/. Add more analysis and ablation studies. Update Figure 3 with a more precise metric

  43. arXiv:2312.07571  [pdf, other]

    cs.CV cs.AI cs.LG

    Investigating YOLO Models Towards Outdoor Obstacle Detection For Visually Impaired People

    Authors: Chenhao He, Pramit Saha

    Abstract: The utilization of deep learning-based object detection is an effective approach to assist visually impaired individuals in avoiding obstacles. In this paper, we implemented seven different YOLO object detection models, viz., YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and YOLOv5, and performed comprehensive evaluation with carefully tuned hyperparameters, to analyze how these…

    Submitted 10 December, 2023; originally announced December 2023.

  44. arXiv:2312.00853  [pdf, other]

    cs.CV

    Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution

    Authors: Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang

    Abstract: Real-world low-resolution (LR) videos have diverse and complex degradations, imposing great challenges on video super-resolution (VSR) algorithms to reproduce their high-resolution (HR) counterparts with high quality. Recently, diffusion models have shown compelling performance in generating realistic details for image restoration tasks. However, the diffusion process has randomness, making it…

    Submitted 1 December, 2023; originally announced December 2023.

  45. arXiv:2311.17911  [pdf, other]

    cs.CV

    OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

    Authors: Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

    Abstract: Hallucination, posed as a pervasive challenge of multi-modal large language models (MLLMs), has significantly impeded their real-world usage that demands precise judgment. Existing methods mitigate this issue with either training with specifically designed data or inferencing with external knowledge from other sources, incurring inevitable additional costs. In this paper, we present OPERA, a novel MLL…

    Submitted 12 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024, code is available at https://github.com/shikiw/OPERA

  46. arXiv:2311.16839  [pdf, other]

    cs.CV cs.CL

    Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

    Authors: Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He

    Abstract: Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images. This paper introduces a novel solution, Hallucination-Aware Direct Preference Optimization (HA-DPO), which…

    Submitted 6 February, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project Website: https://opendatalab.github.io/HA-DPO, Code: https://github.com/opendatalab/HA-DPO

  47. arXiv:2311.13196  [pdf, other]

    cs.IT eess.SP stat.ME

    Optimal Time of Arrival Estimation for MIMO Backscatter Channels

    Authors: Chen He, Luyang Han, Z. Jane Wang

    Abstract: In this paper, we propose a novel time of arrival (TOA) estimator for multiple-input-multiple-output (MIMO) backscatter channels in closed form. The proposed estimator refines the estimation precision from the topological structure of the MIMO backscatter channels, and can considerably enhance the estimation accuracy. Particularly, we show that for the general $M \times N$ bistatic topology, the m…

    Submitted 22 November, 2023; originally announced November 2023.

  48. arXiv:2311.13094  [pdf, ps, other]

    math.OC cs.LG

    Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian

    Authors: Chuan He, Zhaosong Lu

    Abstract: In this paper we consider a nonconvex unconstrained optimization problem minimizing a twice differentiable objective function with Hölder continuous Hessian. Specifically, we first propose a Newton-conjugate gradient (Newton-CG) method for finding an approximate first-order stationary point (FOSP) of this problem, assuming the associated Hölder parameters are explicitly known. Then we develop…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.03139

  49. Toward parallel intelligence: an interdisciplinary solution for complex systems

    Authors: Yong Zhao, Zhengqiu Zhu, Bin Chen, Sihang Qiu, Jincai Huang, Xin Lu, Weiyi Yang, Chuan Ai, Kuihua Huang, Cheng He, Yucheng Jin, Zhong Liu, Fei-Yue Wang

    Abstract: The growing complexity of real-world systems necessitates interdisciplinary solutions to confront myriad challenges in modeling, analysis, management, and control. To meet these demands, the parallel systems method rooted in the Artificial systems, Computational experiments, and Parallel execution (ACP) approach has been developed. The method cultivates a cycle, termed parallel intelligence, which ite…

    Submitted 25 March, 2024; v1 submitted 5 October, 2023; originally announced November 2023.

    Comments: 41 pages, 6 figures. The Innovation (2023)

  50. arXiv:2311.12793  [pdf, other]

    cs.CV

    ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

    Authors: Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin

    Abstract: In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data. To address this bottleneck, we introduce the ShareGPT4V dataset, a pioneering large-scale resource featuring 1.2 million highly descriptive captions, which surpasses existing datasets in diversity and information content, covering world kno…

    Submitted 28 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Project: https://ShareGPT4V.github.io