
Showing 1–13 of 13 results for author: Cui, E

Searching in archive cs.
  1. arXiv:2404.16821

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  2. arXiv:2403.01079

    cs.LG cs.AI

    Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework

    Authors: Junxian Li, Bin Shi, Erfei Cui, Hua Wei, Qinghua Zheng

    Abstract: We study the challenging problem for inference tasks on large-scale graph datasets of Graph Neural Networks: huge time and memory consumption, and try to overcome it by reducing reliance on graph structure. Even though distilling graph knowledge to student MLP is an excellent idea, it faces two major problems of positional information loss and low generalization. To solve the problems, we propose…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 20 pages, with Appendix

  3. arXiv:2310.17796

    cs.CV cs.MM

    ControlLLM: Augment Language Models with Tools by Searching on Graphs

    Authors: Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang

    Abstract: We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises…

    Submitted 18 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: 24 pages, 9 figures, 12 tables

  4. arXiv:2310.07801

    cs.CV cs.AI stat.ME

    Trajectory-aware Principal Manifold Framework for Data Augmentation and Image Generation

    Authors: Elvis Han Cui, Bingbin Li, Yanan Li, Weng Kee Wong, Donghui Wang

    Abstract: Data augmentation for deep learning benefits model training, image transformation, medical imaging analysis and many other fields. Many existing methods generate new samples from a parametric distribution, like the Gaussian, with little attention to generate samples along the data manifold in either the input or feature space. In this paper, we verify that there are theoretical and practical advan…

    Submitted 30 July, 2023; originally announced October 2023.

    Comments: 20 figures

  5. arXiv:2308.10875

    cs.NE cs.AI cs.LG

    Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology and, the Manufacturing Industries

    Authors: Elvis Han Cui, Zizhao Zhang, Culsome Junwen Chen, Weng Kee Wong

    Abstract: Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and out-performance relative to its c…

    Submitted 16 October, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Revision, unpublished manuscript

  6. arXiv:2211.07351

    stat.ME cs.AI math.ST stat.AP

    A Roadmap to Asymptotic Properties with Applications to COVID-19 Data

    Authors: Elvis Han Cui

    Abstract: Asymptotic properties of statistical estimators play a significant role both in practice and in theory. However, many asymptotic results in statistics rely heavily on the independent and identically distributed (iid) assumption, which is not realistic when we have fixed designs. In this article, we build a roadmap of general procedures for deriving asymptotic properties under fixed designs and the…

    Submitted 6 October, 2022; originally announced November 2022.

    Comments: 10 pages

  7. arXiv:2112.12359

    cs.CV

    Dual Path Structural Contrastive Embeddings for Learning Novel Objects

    Authors: Bingbin Li, Elvis Han Cui, Yanan Li, Donghui Wang, Weng Kee Wong

    Abstract: Learning novel classes from a very few labeled samples has attracted increasing attention in machine learning areas. Recent research on either meta-learning based or transfer-learning based paradigm demonstrates that gaining information on a good feature space can be an effective solution to achieve favorable performance on few-shot tasks. In this paper, we propose a simple but effective paradigm…

    Submitted 4 January, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

  8. arXiv:2106.09889

    cs.CL cs.CV cs.MM

    GEM: A General Evaluation Benchmark for Multimodal Tasks

    Authors: Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti

    Abstract: In this paper, we present GEM as a General Evaluation benchmark for Multimodal tasks. Different from existing datasets such as GLUE, SuperGLUE, XGLUE and XTREME that mainly focus on natural language tasks, GEM is a large-scale vision-language benchmark, which consists of GEM-I for image-language tasks and GEM-V for video-language tasks. Comparing with existing multimodal datasets such as MSCOCO an…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted by Findings of ACL 2021

  9. arXiv:2104.10041

    cs.NE cs.AI stat.AP stat.CO

    Particle swarm optimization in constrained maximum likelihood estimation a case study

    Authors: Elvis Cui, Dongyuan Song, Weng Kee Wong

    Abstract: The aim of paper is to apply two types of particle swarm optimization, global best and local best PSO to a constrained maximum likelihood estimation problem in pseudotime analysis, a sub-field in bioinformatics. The results have shown that particle swarm optimization is extremely useful and efficient when the optimization problem is non-differentiable and non-convex so that analytical solution can…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 11 pages, 7 figures

  10. arXiv:2006.02635

    cs.CL cs.CV

    M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

    Authors: Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Jianfeng Gao, Dongdong Zhang, Nan Duan

    Abstract: We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training. Our goal is to learn universal representations that can map objects occurred in different modalities or texts expressed in different languages into a common semantic space. In addition, to explicitly encourage…

    Submitted 31 March, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: Accepted to CVPR 2021

  11. arXiv:2004.01401

    cs.CL

    XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

    Authors: Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

    Abstract: In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks. Comparing to GLUE (Wang et al., 2019), which is labeled in English for natural language understanding tasks only, XGLUE has two main advantages: (1) it pr…

    Submitted 22 May, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

  12. arXiv:2003.01473

    cs.CL cs.CV cs.LG

    XGPT: Cross-modal Generative Pre-Training for Image Captioning

    Authors: Qiaolin Xia, Haoyang Huang, Nan Duan, Dongdong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, Xin Liu, Ming Zhou

    Abstract: While many BERT-based cross-modal pre-trained models produce excellent results on downstream understanding tasks like image-text retrieval and VQA, they cannot be applied to generation tasks directly. In this paper, we propose XGPT, a new method of Cross-modal Generative Pre-Training for Image Captioning that is designed to pre-train text-to-image caption generators through three novel generation…

    Submitted 4 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: 12 pages, 3 figures, 7 tables

  13. arXiv:2001.07966

    cs.CV

    ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

    Authors: Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti

    Abstract: In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them. The model is pre-trained on four tasks simultaneously: Masked Language Modeling (MLM), Masked Object Classification (MOC), Masked Region Feature Regression (MRF…

    Submitted 23 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.