Showing 1–50 of 519 results for author: Wei, X

Searching in archive cs.
  1. arXiv:2405.05808  [pdf, other]

    cs.CV

    Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes

    Authors: Ruihao Gong, Yang Yong, Zining Wang, Jinyang Guo, Xiuying Wei, Yuqing Ma, Xianglong Liu

    Abstract: Neural network sparsity has attracted many research interests due to its similarity to biological schemes and high energy efficiency. However, existing methods depend on long-time training or fine-tuning, which prevents large-scale applications. Recently, some works focusing on post-training sparsity (PTS) have emerged. They get rid of the high training cost but usually suffer from distinct accura…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05538  [pdf, other]

    cs.CV

    A Survey on Personalized Content Synthesis with Diffusion Models

    Authors: Xulu Zhang, Xiao-Yong Wei, Wengyu Zhang, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

    Abstract: Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.04765  [pdf, other]

    cs.LG cs.AI cs.DC

    When Foresight Pruning Meets Zeroth-Order Optimization: Efficient Federated Learning for Low-Memory Devices

    Authors: Pengyu Zhang, Yingjie Liu, Yingbo Zhou, Xiao Du, Xian Wei, Ting Wang, Mingsong Chen

    Abstract: Although Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design, it fails to work on low-memory AIoT devices due to its heavy memory usage. To address this problem, various federated pruning methods are proposed to reduce memory usage during inference. However, few of them can substantially mitigate the memory burdens during pruning and training.…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2405.03267  [pdf, other]

    cs.DC cs.DB cs.IR

    Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

    Authors: Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

    Abstract: Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessity storing the datasets and their index on the secondary storage like SSD. In this paper, we are the first to characterize the trade-off of performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7$\times$ and 1.7$\times$, these indexes…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2404.17466  [pdf, other]

    physics.comp-ph cs.LG physics.plasm-ph

    FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks

    Authors: Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams

    Abstract: Deep learning algorithms provide a new paradigm to study high-dimensional dynamical behaviors, such as those in fusion plasma systems. Development of novel model reduction methods, coupled with detection of abnormal modes with plasma physics, opens a unique opportunity for building efficient models to identify plasma instabilities for real-time control. Our Fusion Transfer Learning (FTL) model dem…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 18 pages, 10 figures

    MSC Class: 76W05; 68T45 ACM Class: J.2; I.2.10

  6. arXiv:2404.17360  [pdf, other]

    cs.CV

    UniRGB-IR: A Unified Framework for Visible-Infrared Downstream Tasks via Adapter Tuning

    Authors: Maoxun Yuan, Bo Cui, Tianyi Zhao, Xingxing Wei

    Abstract: Semantic analysis on visible (RGB) and infrared (IR) images has gained attention for its ability to be more accurate and robust under low-illumination and complex weather conditions. Due to the lack of pre-trained foundation models on the large-scale infrared image datasets, existing methods prefer to design task-specific frameworks and directly fine-tune them with pre-trained foundation models on…

    Submitted 26 April, 2024; originally announced April 2024.

  7. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  8. arXiv:2404.15380  [pdf, other]

    cs.LG cs.AI

    ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model

    Authors: Yuanshao Zhu, James Jianqiao Yu, Xiangyu Zhao, Qidong Liu, Yongchao Ye, Wei Chen, Zijian Zhang, Xuetao Wei, Yuxuan Liang

    Abstract: Generating trajectory data is among promising solutions to addressing privacy concerns, collection costs, and proprietary restrictions usually associated with human mobility analyses. However, existing trajectory generation methods are still in their infancy due to the inherent diversity and unpredictability of human activities, grappling with issues such as fidelity, flexibility, and generalizabi…

    Submitted 23 April, 2024; originally announced April 2024.

  9. arXiv:2404.15081  [pdf, other]

    cs.CV cs.CR cs.LG

    Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

    Authors: Jingyao Xu, Yuetong Lu, Yandong Li, Siyang Lu, Dongdong Wang, Xiang Wei

    Abstract: Diffusion models (DMs) embark a new era of generative modeling and offer more opportunities for efficient generating high-quality and realistic data samples. However, their widespread use has also brought forth new challenges in model security, which motivates the creation of more effective adversarial attackers on DMs to understand its vulnerability. We propose CAAT, a simple but generic and effi…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Published at CVPR 2024

  10. arXiv:2404.12385  [pdf, other]

    cs.CV cs.GR

    MeshLRM: Large Reconstruction Model for High-Quality Mesh

    Authors: Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, Zexiang Xu

    Abstract: We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a p…

    Submitted 18 April, 2024; originally announced April 2024.

  11. arXiv:2404.12139  [pdf, other]

    cs.CV

    Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

    Authors: Shouwei Ruan, Yinpeng Dong, Hanqing Liu, Yao Huang, Hang Su, Xingxing Wei

    Abstract: Vision-Language Pre-training (VLP) models like CLIP have achieved remarkable success in computer vision and particularly demonstrated superior robustness to distribution shifts of 2D images. However, their robustness under 3D viewpoint variations is still limited, which can hinder the development for real-world applications. This paper successfully addresses this concern while keeping VLPs' origin…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 20 pages

  12. arXiv:2404.09392  [pdf, ps, other]

    cs.IT cs.LG cs.NI eess.SP

    An Autoencoder-Based Constellation Design for AirComp in Wireless Federated Learning

    Authors: Yujia Mu, Xizixiang Wei, Cong Shen

    Abstract: Wireless federated learning (FL) relies on efficient uplink communications to aggregate model updates across distributed edge devices. Over-the-air computation (a.k.a. AirComp) has emerged as a promising approach for addressing the scalability challenge of FL over wireless links with limited communication resources. Unlike conventional methods, AirComp allows multiple edge devices to transmit upli…

    Submitted 14 April, 2024; originally announced April 2024.

  13. arXiv:2404.08408  [pdf, other]

    cs.LG cs.AI eess.SP physics.geo-ph

    Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning

    Authors: Hongtao Wang, Li Long, Jiangshe Zhang, Xiaoli Wei, Chunxia Zhang, Zhenbo Guo

    Abstract: Contemporary automatic first break (FB) picking methods typically analyze 1D signals, 2D source gathers, or 3D source-receiver gathers. Utilizing higher-dimensional data, such as 2D or 3D, incorporates global features, improving the stability of local picking. Despite the benefits, high-dimensional data requires structured input and increases computational demands. Addressing this, we propose a no…

    Submitted 12 April, 2024; originally announced April 2024.

  14. arXiv:2404.06119  [pdf, other]

    cs.CV

    DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

    Authors: Junkai Yan, Yipeng Gao, Qize Yang, Xihan Wei, Xuansong Xie, Ancong Wu, Wei-Shi Zheng

    Abstract: Text-to-3D generation, which synthesizes 3D assets according to an overall text description, has significantly progressed. However, a challenge arises when the specific appearances need customizing at designated viewpoints but referring solely to the overall description for generating 3D objects. For instance, ambiguity easily occurs when producing a T-shirt with distinct patterns on its front and…

    Submitted 9 April, 2024; originally announced April 2024.

  15. arXiv:2404.05446  [pdf, other]

    cs.CL

    XL$^2$Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies

    Authors: Xuanfan Ni, Hengyi Cai, Xiaochi Wei, Shuaiqiang Wang, Dawei Yin, Piji Li

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks but are constrained by their small context window sizes. Various efforts have been proposed to expand the context window to accommodate even up to 200K input tokens. Meanwhile, building high-quality benchmarks with much longer text lengths and more demanding tasks to provide comprehensive evaluations is of i…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Work in progress

  16. arXiv:2404.01179  [pdf, other]

    cs.CV cs.LG

    BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning

    Authors: Hongwei Zheng, Linyuan Zhou, Han Li, Jinming Su, Xiaoming Wei, Xiaoming Xu

    Abstract: Data mixing methods play a crucial role in semi-supervised learning (SSL), but their application is unexplored in long-tailed semi-supervised learning (LTSSL). The primary reason is that the in-batch mixing manner fails to address class imbalance. Furthermore, existing LTSSL methods mainly focus on re-balancing data quantity but ignore class-wise uncertainty, which is also vital for class balance.…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: This paper is accepted to CVPR 2024. The supplementary material is included

  17. arXiv:2404.00544  [pdf, other]

    cs.CV cs.AI

    Deep Extrinsic Manifold Representation for Vision Tasks

    Authors: Tongtong Zhang, Xian Wei, Yuanxiang Li

    Abstract: Non-Euclidean data is frequently encountered across different fields, yet there is limited literature that addresses the fundamental challenge of training neural networks with manifold representations as outputs. We introduce the trick named Deep Extrinsic Manifold Representation (DEMR) for visual tasks in this context. DEMR incorporates extrinsic manifold embedding into deep neural networks, whic…

    Submitted 30 March, 2024; originally announced April 2024.

  18. arXiv:2403.20271  [pdf, other]

    cs.CV

    Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

    Authors: Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

    Abstract: The interaction between humans and artificial intelligence (AI) is a crucial factor that reflects the effectiveness of multimodal large language models (MLLMs). However, current MLLMs primarily focus on image-level comprehension and limit interaction to textual instructions, thereby constraining their flexibility in usage and depth of response. In this paper, we introduce the Draw-and-Understand p…

    Submitted 31 March, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: 16 pages, 7 figures

  19. How to Cache Important Contents for Multi-modal Service in Dynamic Networks: A DRL-based Caching Scheme

    Authors: Zhe Zhang, Marc St-Hilaire, Xin Wei, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: With the continuous evolution of networking technologies, multi-modal services that involve video, audio, and haptic contents are expected to become the dominant multimedia service in the near future. Edge caching is a key technology that can significantly reduce network load and content transmission latency, which is critical for the delivery of multi-modal contents. However, existing caching app…

    Submitted 27 March, 2024; originally announced March 2024.

    Journal ref: IEEE Transactions on Multimedia (Early Access), 2024

  20. arXiv:2403.18201  [pdf, other]

    cs.CV

    Few-shot Online Anomaly Detection and Segmentation

    Authors: Shenxing Wei, Xing Wei, Zhiheng Ma, Songlin Dong, Shaochen Zhang, Yihong Gong

    Abstract: Detecting anomaly patterns from images is a crucial artificial intelligence technique in industrial applications. Recent research in this domain has emphasized the necessity of a large volume of training data, overlooking the practical scenario where, post-deployment of the model, unlabeled data containing both normal and abnormal samples can be utilized to enhance the model's performance. Consequ…

    Submitted 26 March, 2024; originally announced March 2024.

  21. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  22. arXiv:2403.16886  [pdf, other]

    cs.IT eess.SP

    Movable-Antenna Position Optimization: A Graph-based Approach

    Authors: Weidong Mei, Xin Wei, Boyu Ning, Zhi Chen, Rui Zhang

    Abstract: Fluid antennas (FAs) and movable antennas (MAs) have emerged as promising technologies in wireless communications, which offer the flexibility to improve channel conditions by adjusting transmit/receive antenna positions within a spatial region. In this letter, we focus on an MA-enhanced multiple-input single-output (MISO) communication system, aiming to optimize the positions of multiple transmit…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 5 pages, 6 figures. We propose a graph-based algorithm that is able to optimally solve the fluid-/movable-antenna position optimization problem in polynomial time

  23. arXiv:2403.16395  [pdf, other]

    cs.CV

    Multi-attention Associate Prediction Network for Visual Tracking

    Authors: Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Xilai Wei, Zhonghe Hu

    Abstract: Classification-regression prediction networks have realized impressive success in several modern deep trackers. However, there is an inherent difference between classification and regression tasks, so they have diverse even opposite demands for feature matching. Existed models always ignore the key issue and only employ a unified matching block in two task branches, decaying the decision quality.…

    Submitted 24 March, 2024; originally announced March 2024.

  24. arXiv:2403.14987  [pdf, other]

    cs.CV

    Generative Active Learning for Image Synthesis Personalization

    Authors: Xulu Zhang, Wengyu Zhang, Xiao-Yong Wei, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

    Abstract: This paper presents a pilot study that explores the application of active learning, traditionally studied in the context of discriminative models, to generative models. We specifically focus on image synthesis personalization tasks. The primary challenge in conducting active learning on generative models lies in the open-ended nature of querying, which differs from the closed form of querying in d…

    Submitted 16 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  25. arXiv:2403.14972  [pdf, other]

    cs.AI cs.CL cs.MA cs.MM

    A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning

    Authors: Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li

    Abstract: This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the iss…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Work in progress

  26. arXiv:2403.13535  [pdf, other]

    cs.CV

    IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

    Authors: Siying Cui, Jia Guo, Xiang An, Jiankang Deng, Yongle Zhao, Xinyu Wei, Ziyong Feng

    Abstract: Leveraging Stable Diffusion for the generation of personalized portraits has emerged as a powerful and noteworthy tool, enabling users to create high-fidelity, custom character avatars based on their specific prompts. However, existing personalization methods face challenges, including test-time fine-tuning, the requirement of multiple input images, low preservation of identity, and limited divers…

    Submitted 20 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 14 pages, 15 figures

  27. arXiv:2403.11482  [pdf, other]

    cs.LG physics.geo-ph

    SeisFusion: Constrained Diffusion Model with Input Guidance for 3D Seismic Data Interpolation and Reconstruction

    Authors: Shuang Wang, Fei Deng, Peifan Jiang, Zishan Gong, Xiaolin Wei, Yuqing Wang

    Abstract: Geographical, physical, or economic constraints often result in missing traces within seismic data, making the reconstruction of complete seismic data a crucial step in seismic data processing. Traditional methods for seismic data reconstruction require the selection of multiple empirical parameters and struggle to handle large-scale continuous missing data. With the development of deep learning,…

    Submitted 18 March, 2024; originally announced March 2024.

  28. arXiv:2403.09969  [pdf, other]

    cs.LG

    Prediction of Vessel Arrival Time to Pilotage Area Using Multi-Data Fusion and Deep Learning

    Authors: Xiaocai Zhang, Xiuju Fu, Zhe Xiao, Haiyan Xu, Xiaoyang Wei, Jimmy Koh, Daichi Ogawa, Zheng Qin

    Abstract: This paper investigates the prediction of vessels' arrival time to the pilotage area using multi-data fusion and deep learning approaches. Firstly, the vessel arrival contour is extracted based on Multivariate Kernel Density Estimation (MKDE) and clustering. Secondly, multiple data sources, including Automatic Identification System (AIS), pilotage booking information, and meteorological data, are…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: The 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

  29. arXiv:2403.09560  [pdf, other]

    cs.LG physics.chem-ph q-bio.BM

    Self-Consistency Training for Hamiltonian Prediction

    Authors: He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu

    Abstract: Hamiltonian prediction is a versatile formulation to leverage machine learning for solving molecular science problems. Yet, its applicability is limited by insufficient labeled data for training. In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose an exact training method that does not require labeled data. This merit addresses t…

    Submitted 14 March, 2024; originally announced March 2024.

  30. arXiv:2403.07257  [pdf, other]

    cs.AR cs.ET

    The Dawn of AI-Native EDA: Opportunities and Challenges of Large Circuit Models

    Authors: Lei Chen, Yiqi Chen, Zhufei Chu, Wenji Fang, Tsung-Yi Ho, Ru Huang, Yu Huang, Sadaf Khan, Min Li, Xingquan Li, Yu Li, Yun Liang, Jinwei Liu, Yi Liu, Yibo Lin, Guojie Luo, Zhengyuan Shi, Guangyu Sun, Dimitrios Tsaras, Runsheng Wang, Ziyi Wang, Xinming Wei, Zhiyao Xie, Qiang Xu, Chenhao Xue, et al. (14 additional authors not shown)

    Abstract: Within the Electronic Design Automation (EDA) domain, AI-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies. These solutions often repurpose deep learning models from other domains, such as vision, text, and graph analytics, applying them to circuit design without tailoring to the unique complexities of electronic circuits. Suc…

    Submitted 1 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: The authors are ordered alphabetically. Contact: qxu@cse[dot]cuhk[dot]edu[dot]hk, gluo@pku[dot]edu[dot]cn, yuan.mingxuan@huawei[dot]com

  31. arXiv:2403.06670  [pdf, other]

    cs.CV cs.AI

    CEAT: Continual Expansion and Absorption Transformer for Non-Exemplar Class-Incremental Learning

    Authors: Xinyuan Gao, Songlin Dong, Yuhang He, Xing Wei, Yihong Gong

    Abstract: In real-world applications, dynamic scenarios require the models to possess the capability to learn new tasks continuously without forgetting the old knowledge. Experience-Replay methods store a subset of the old images for joint training. In the scenario of more strict privacy protection, storing the old images becomes infeasible, which leads to a more severe plasticity-stability dilemma and clas…

    Submitted 11 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  32. arXiv:2403.06574  [pdf, other]

    cs.CL

    AC-EVAL: Evaluating Ancient Chinese Language Understanding in Large Language Models

    Authors: Yuting Wei, Yuanxing Xu, Xinru Wei, Simin Yang, Yangfu Zhu, Yuqing Li, Di Liu, Bin Wu

    Abstract: Given the importance of ancient Chinese in capturing the essence of rich historical and cultural heritage, the rapid advancements in Large Language Models (LLMs) necessitate benchmarks that can effectively evaluate their understanding of ancient contexts. To meet this need, we present AC-EVAL, an innovative benchmark designed to assess the advanced knowledge and reasoning capabilities of LLMs with…

    Submitted 11 March, 2024; originally announced March 2024.

  33. LightSword: A Customized Virtual Reality Exergame for Long-Term Cognitive Inhibition Training in Older Adults

    Authors: Qiuxin Du, Zhen Song, Haiyan Jiang, Xiaoying Wei, Dongdong Weng, Mingming Fan

    Abstract: The decline of cognitive inhibition significantly impacts older adults' quality of life and well-being, making it a vital public health problem in today's aging society. Previous research has demonstrated that Virtual reality (VR) exergames have great potential to enhance cognitive inhibition among older adults. However, existing commercial VR exergames were unsuitable for older adults' long-term…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 23 pages

    Journal ref: Proceedings of the CHI Conference on Human Factors in Computing Systems 2024 (CHI '24)

  34. arXiv:2403.03215  [pdf, other]

    cs.RO

    A Safety-Critical Framework for UGVs in Complex Environments: A Data-Driven Discrepancy-Aware Approach

    Authors: Skylar X. Wei, Lu Gan, Joel W. Burdick

    Abstract: This work presents a novel data-driven multi-layered planning and control framework for the safe navigation of a class of unmanned ground vehicles (UGVs) in the presence of unknown stationary obstacles and additive modeling uncertainties. The foundation of this framework is a novel robust model predictive planner, designed to generate optimal collision-free trajectories given an occupancy grid map…

    Submitted 5 March, 2024; originally announced March 2024.

  35. arXiv:2403.00303  [pdf, other]

    cs.CV

    ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

    Authors: Chen Duan, Pei Fu, Shan Guo, Qianyi Jiang, Xiaoming Wei

    Abstract: In recent years, text-image joint pre-training techniques have shown promising results in various tasks. However, in Optical Character Recognition (OCR) tasks, aligning text instances with their corresponding text regions in images poses a challenge, as it requires effective alignment between text and OCR-Text (referring to the text in images as OCR-Text to distinguish from the text in natural lan…

    Submitted 17 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  36. arXiv:2402.12216  [pdf, other]

    cs.CY cs.AI

    Copyleft for Alleviating AIGC Copyright Dilemma: What-if Analysis, Public Perception and Implications

    Authors: Xinwei Guo, Yujun Li, Yafeng Peng, Xuetao Wei

    Abstract: As AIGC has impacted our society profoundly in the past years, ethical issues have received tremendous attention. The most urgent one is the AIGC copyright dilemma, which can immensely stifle the development of AIGC and greatly cost the entire society. Given the complexity of AIGC copyright governance and the fact that no perfect solution currently exists, previous work advocated copyleft on AI go…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 9 pages, 8 figures

  37. arXiv:2402.07233  [pdf, other]

    cs.CL cs.AI

    TransGPT: Multi-modal Generative Pre-trained Transformer for Transportation

    Authors: Peng Wang, Xiang Wei, Fangxu Hu, Wenjuan Han

    Abstract: Natural language processing (NLP) is a key component of intelligent transportation systems (ITS), but it faces many challenges in the transportation domain, such as domain-specific knowledge and data, and multi-modal inputs and outputs. This paper presents TransGPT, a novel (multi-modal) large language model for the transportation domain, which consists of two independent variants: TransGPT-SM for…

    Submitted 11 February, 2024; originally announced February 2024.

  38. arXiv:2402.06967  [pdf, other]

    cs.CL cs.AI

    Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue

    Authors: Jian Wang, Chak Tou Leong, Jiashuo Wang, Dongding Lin, Wenjie Li, Xiao-Yong Wei

    Abstract: Tuning pretrained language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between two speakers and the multi-round interactive process that dialogues ought to be. Such a manner leads to unsatisfactory chat consis…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: Work in progress

  39. arXiv:2402.06136  [pdf, other]

    cs.CV

    SIR: Multi-view Inverse Rendering with Decomposable Shadow for Indoor Scenes

    Authors: Xiaokang Wei, Zhuoman Liu, Yan Luximon

    Abstract: We propose SIR, an efficient method to decompose differentiable shadows for inverse rendering on indoor scenes using multi-view data, addressing the challenges in accurately decomposing the materials and lighting conditions. Unlike previous methods that struggle with shadow fidelity in complex lighting environments, our approach explicitly learns shadows for enhanced realism in material estimation…

    Submitted 8 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  40. arXiv:2402.04991  [pdf, other]

    cs.HC

    Exploring the Opportunity of Augmented Reality (AR) in Supporting Older Adults Explore and Learn Smartphone Applications

    Authors: Xiaofu Jin, Wai Tong, Xiaoying Wei, Xian Wang, Emily Kuang, Xiaoyu Mo, Huamin Qu, Mingming Fan

    Abstract: The global aging trend compels older adults to navigate the evolving digital landscape, presenting a substantial challenge in mastering smartphone applications. While Augmented Reality (AR) holds promise for enhancing learning and user experience, its role in aiding older adults' smartphone app exploration remains insufficiently explored. Therefore, we conducted a two-phase study: (1) a workshop w…

    Submitted 7 February, 2024; originally announced February 2024.

  41. arXiv:2402.03379  [pdf, other]

    cs.IR cs.AI cs.LG

    Entire Chain Uplift Modeling with Context-Enhanced Learning for Intelligent Marketing

    Authors: Yinqiu Huang, Shuli Wang, Min Gao, Xue Wei, Changhao Li, Chuan Luo, Yinhua Zhu, Xiong Xiao, Yi Luo

    Abstract: Uplift modeling, vital in online marketing, seeks to accurately measure the impact of various strategies, such as coupons or discounts, on different users by predicting the Individual Treatment Effect (ITE). In an e-commerce setting, user behavior follows a defined sequential chain, including impression, click, and conversion. Marketing strategies exert varied uplift effects at each stage within t…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW2024

  42. arXiv:2401.16420  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

    Authors: Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension. This model goes beyond conventional vision-language understanding, adeptly crafting interleaved text-image content from diverse inputs like outlines, detailed textual specifications, and reference images, enabling highly customizable content creation. InternLM-XCo…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Code and models are available at https://github.com/InternLM/InternLM-XComposer

  43. arXiv:2401.13354  [pdf, other]

    cs.OS cs.NI

    Characterizing Network Requirements for GPU API Remoting in AI Applications

    Authors: Tianxia Wang, Zhuofu Chen, Xingda Wei, Jinyu Gu, Rong Chen, Haibo Chen

    Abstract: GPU remoting is a promising technique for supporting AI applications. Networking plays a key role in enabling remoting. However, for efficient remoting, the network requirements in terms of latency and bandwidth are unknown. In this paper, we take a GPU-centric approach to derive the minimum latency and bandwidth requirements for GPU remoting, while ensuring no (or little) performance degradation…

    Submitted 24 January, 2024; originally announced January 2024.

  44. arXiv:2401.10731  [pdf, other]

    cs.CV

    Removal and Selection: Improving RGB-Infrared Object Detection via Coarse-to-Fine Fusion

    Authors: Tianyi Zhao, Maoxun Yuan, Feng Jiang, Nan Wang, Xingxing Wei

    Abstract: Object detection in visible (RGB) and infrared (IR) images has been widely applied in recent years. Leveraging the complementary characteristics of RGB and IR images, the object detector provides reliable and robust object localization from day to night. Most existing fusion strategies directly input RGB and IR images into deep neural networks, leading to inferior detection performance. However, t…

    Submitted 7 May, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: 11 pages, 11 figures

  45. arXiv:2401.04408  [pdf, other]

    cs.IR cs.LG

    Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems

    Authors: Qinyi Luo, Penghan Wang, Wei Zhang, Fan Lai, Jiachen Mao, Xiaohan Wei, Jun Song, Wei-Yu Tsai, Shuai Yang, Yuxi Hu, Xuehai Qian

    Abstract: Huge embedding tables in modern Deep Learning Recommender Models (DLRM) require prohibitively large memory during training and inference. Aiming to reduce the memory footprint of training, this paper proposes FIne-grained In-Training Embedding Dimension optimization (FIITED). Given the observation that embedding vectors are not equally important, FIITED adjusts the dimension of each individual emb…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 16 pages, 9 figures

    ACM Class: I.2.6; H.3.3

  46. arXiv:2401.01208  [pdf, other]

    cs.CV

    FGENet: Fine-Grained Extraction Network for Congested Crowd Counting

    Authors: Hao-Yuan Ma, Li Zhang, Xiang-Yi Wei

    Abstract: Crowd counting has gained significant popularity due to its practical applications. However, mainstream counting methods ignore precise individual localization and suffer from annotation noise because of counting from estimating density maps. Additionally, they also struggle with high-density images. To address these issues, we propose an end-to-end model called Fine-Grained Extraction Network (FGE…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Accepted by 30th International Conference on MultiMedia Modeling

  47. arXiv:2312.17133  [pdf, other]

    cs.CV

    ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

    Authors: Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei

    Abstract: We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework to "read out" object's trajectory and "retell" its appearance in an autoregressive…

    Submitted 13 February, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  48. arXiv:2312.16886  [pdf, other]

    cs.CV

    MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices

    Authors: Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen

    Abstract: We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices. It is an amalgamation of a myriad of architectural designs and techniques that are mobile-oriented, which comprises a set of language models at the scale of 1.4B and 2.7B parameters, trained from scratch, a multimodal vision model that is pre-trained in the CLIP fashion, cross-modality int…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Tech Report

  49. arXiv:2312.16279  [pdf, other]

    cs.CV

    Cloud-Device Collaborative Learning for Multimodal Large Language Models

    Authors: Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

    Abstract: The burgeoning field of Multimodal Large Language Models (MLLMs) has exhibited remarkable performance in diverse tasks such as captioning, commonsense reasoning, and visual scene understanding. However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compre…

    Submitted 26 December, 2023; originally announced December 2023.

  50. arXiv:2312.16036  [pdf, other]

    cs.LG

    Ensemble Learning to Assess Dynamics of Affective Experience Ratings and Physiological Change

    Authors: Felix Dollack, Kiyoshi Kiyokawa, Huakun Liu, Monica Perusquia-Hernandez, Chirag Raman, Hideaki Uchiyama, Xin Wei

    Abstract: The congruence between affective experiences and physiological changes has been a debated topic for centuries. Recent technological advances in measurement and data analysis provide hope to solve this epic challenge. Open science and open data practices, together with data analysis challenges open to the academic community, are also promising tools for solving this problem. In this entry to the Em…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: This manuscript is to be published in the 2023 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) proceedings