
Showing 1–50 of 1,096 results for author: Lin, Z

Searching in archive cs.
  1. arXiv:2405.05945  [pdf, other]

    cs.CV

    Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

    Authors: Peng Gao, Le Zhuo, Ziyi Lin, Chris Liu, Junsong Chen, Ruoyi Du, Enze Xie, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

    Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  2. arXiv:2405.05803  [pdf, other]

    cs.CV cs.AI

    Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

    Authors: Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji

    Abstract: Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention s…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.05252  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

    Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

    Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  4. arXiv:2405.04342  [pdf, other]

    cs.LG

    The Curse of Diversity in Ensemble-Based Exploration

    Authors: Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin, Aaron Courville

    Abstract: We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated d…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  5. arXiv:2405.04332  [pdf, other]

    cs.CR

    WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets

    Authors: Pengcheng Xia, Yanhui Guo, Zhaowen Lin, Jun Wu, Pengbo Duan, Ningyu He, Kailong Wang, Tianming Liu, Yinliang Yue, Guoai Xu, Haoyu Wang

    Abstract: Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion accompanies security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive se…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Just accepted by the Automated Software Engineering Journal

  6. arXiv:2405.04086  [pdf, other]

    cs.CL

    Optimizing Language Model's Reasoning Abilities with Weak Supervision

    Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

    Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities w…

    Submitted 7 May, 2024; originally announced May 2024.

  7. arXiv:2405.03990  [pdf, other]

    cs.NI cs.AI

    TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks

    Authors: Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang

    Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observat…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures. This paper has been accepted by ICDCS 2024. arXiv admin note: substantial text overlap with arXiv:2404.14204

  8. arXiv:2405.03613  [pdf, other]

    cs.CV

    Dual Relation Mining Network for Zero-Shot Learning

    Authors: Jinwei Han, Yingguo Gao, Zhiwen Lin, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

    Abstract: Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods have exhibited significant progress which align visual features and attributes via a spatial attention mechanism. However, these methods only explore visual-semantic relationship in the spatial dimension, w…

    Submitted 6 May, 2024; originally announced May 2024.

  9. arXiv:2405.01851  [pdf, other]

    cs.LG cs.AI

    Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

    Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

    Abstract: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been e…

    Submitted 3 May, 2024; originally announced May 2024.

  10. arXiv:2405.00954  [pdf, other]

    cs.CV

    X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

    Authors: Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplif…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: ICML2024

  11. arXiv:2405.00700  [pdf]

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr…

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  12. arXiv:2404.19209  [pdf, other]

    cs.DC

    AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices

    Authors: Zheng Lin, Bin Guo, Sicong Liu, Wentao Zhou, Yasan Ding, Yu Zhang, Zhiwen Yu

    Abstract: Deep neural network (DNN) has driven extensive applications in mobile technology. However, for long-running mobile apps like voice assistants or video applications on smartphones, energy efficiency is critical for battery-powered devices. The rise of heterogeneous processors in mobile devices today has introduced new challenges for optimizing energy efficiency. Our key insight is that partitioning…

    Submitted 29 April, 2024; originally announced April 2024.

  13. arXiv:2404.18533  [pdf, other]

    cs.AI cs.HC

    Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

    Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang

    Abstract: Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic a…

    Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  14. arXiv:2404.17808  [pdf, other]

    cs.CL

    Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

    Authors: Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding

    Abstract: Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while keeping all tokens that have be…

    Submitted 27 April, 2024; originally announced April 2024.

  15. arXiv:2404.17785  [pdf, other]

    cs.CL

    Temporal Scaling Law for Large Language Models

    Authors: Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Jianwei Niu, Guiguang Ding

    Abstract: Recently, Large Language Models (LLMs) are widely adopted in a wide range of tasks, leading to increasing attention towards the research on how scaling LLMs affects their performance. Existing works, termed as Scaling Laws, have discovered that the loss of LLMs scales as power laws with model size, computational budget, and dataset size. However, the performance of LLMs throughout the training pro…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Work in progress

  16. arXiv:2404.17466  [pdf, other]

    physics.comp-ph cs.LG physics.plasm-ph

    FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks

    Authors: Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams

    Abstract: Deep learning algorithms provide a new paradigm to study high-dimensional dynamical behaviors, such as those in fusion plasma systems. Development of novel model reduction methods, coupled with detection of abnormal modes with plasma physics, opens a unique opportunity for building efficient models to identify plasma instabilities for real-time control. Our Fusion Transfer Learning (FTL) model dem…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 18 pages, 10 figures

    MSC Class: 76W05; 68T45 ACM Class: J.2; I.2.10

  17. arXiv:2404.16994  [pdf, other]

    cs.CV

    PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

    Authors: Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng

    Abstract: Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders the progress of video-language models. This paper investigates a straight-forward, highly efficient, and resource-light approach to adapting an existi…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  18. arXiv:2404.16811  [pdf, other]

    cs.CL cs.AI

    Make Your LLM Fully Utilize the Context

    Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

    Abstract: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on t…

    Submitted 26 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures, 3 tables, 9 examples

  19. arXiv:2404.15141  [pdf, other]

    cs.CV cs.AI

    CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

    Authors: Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

    Abstract: Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, aimed at simplifying and accelerating the diffusion extrapolation process, making it more affordable and improving performance. CutDiffusion abides by the existing patch-wise extrapol…

    Submitted 23 April, 2024; originally announced April 2024.

  20. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, et al. (62 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 12 pages

  21. arXiv:2404.14204  [pdf, other]

    cs.NI

    TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading

    Authors: Guanqiao Qu, Zheng Lin, Qian Chen, Jian Li, Fangming Liu, Xianhao Chen, Kaibin Huang

    Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observat…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 15 pages, 11 figures. Part of this work has been accepted by ICDCS 2024

  22. arXiv:2404.12674  [pdf, other]

    cs.DC cs.LG cs.PF

    Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

    Authors: Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens

    Abstract: Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and planning but also a complex goal to achieve. The primary challenges include the complexity of synchronization and load balancing between CPUs and GPUs, the variance i…

    Submitted 27 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 12 pages, 11 figures, 4 tables

  23. arXiv:2404.11677  [pdf, other]

    cs.AI

    Cross-Problem Learning for Solving Vehicle Routing Problems

    Authors: Zhuoyi Lin, Yaoxin Wu, Bangjian Zhou, Zhiguang Cao, Wen Song, Yingqian Zhang, Senthilnath Jayavelu

    Abstract: Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes the cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transform…

    Submitted 17 April, 2024; originally announced April 2024.

  24. arXiv:2404.10718  [pdf, other]

    cs.CV

    GazeHTA: End-to-end Gaze Target Detection with Head-Target Association

    Authors: Zhi-Yi Lin, Jouh Yeong Chew, Jan van Gemert, Xucong Zhang

    Abstract: We propose an end-to-end approach for gaze target detection: predicting a head-target connection between individuals and the target image regions they are looking at. Most of the existing methods use independent components such as off-the-shelf head detectors or have problems in establishing associations between heads and gaze targets. In contrast, we investigate an end-to-end multi-person Gaze ta…

    Submitted 18 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  25. arXiv:2404.10444  [pdf, other]

    math.ST cs.LG stat.ML

    Semi-supervised Fréchet Regression

    Authors: Rui Qiu, Zhou Yu, Zhenhua Lin

    Abstract: This paper explores the field of semi-supervised Fréchet regression, driven by the significant costs associated with obtaining non-Euclidean labels. Methodologically, we propose two novel methods: semi-supervised NW Fréchet regression and semi-supervised kNN Fréchet regression, both based on graph distance acquired from all feature instances. These methods extend the scope of existing semi-supervi…

    Submitted 16 April, 2024; originally announced April 2024.

  26. arXiv:2404.09833  [pdf, other]

    cs.CV cs.AI

    Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

    Authors: Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang

    Abstract: Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF)…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page (with code): https://video2game.github.io/

  27. arXiv:2404.09730  [pdf, other]

    cs.LG math.CA math.NA

    Convergence Analysis of Probability Flow ODE for Score-based Generative Models

    Authors: Daniel Zhengyu Huang, Jiaoyang Huang, Zhengjiang Lin

    Abstract: Score-based generative models have emerged as a powerful approach for sampling high-dimensional probability distributions. Despite their effectiveness, their theoretical underpinnings remain relatively underdeveloped. In this work, we study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives. Assuming access to $L^2$-…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 33 pages, 7 figures

  28. arXiv:2404.08958  [pdf, other]

    cs.CV cs.CL cs.LG

    AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

    Authors: Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve few-shot ability of CLIP, key factors on the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introdu…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  29. arXiv:2404.08237  [pdf, other]

    cs.CV cs.AI

    IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

    Authors: Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Massimo Tistarelli, Zhe Jin

    Abstract: Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision T…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)

  30. arXiv:2404.07965  [pdf, other]

    cs.CL cs.AI

    Rho-1: Not All Tokens Are What You Need

    Authors: Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

    Abstract: Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis delves into token-level training dynamics of language model, revealing distinct loss patterns for different tokens. Leveraging these insights…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: First two authors equal contribution

  31. arXiv:2404.06448  [pdf, other]

    cs.LG cs.AI

    Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models

    Authors: Zihan Fang, Zheng Lin, Zhe Chen, Xianhao Chen, Yue Gao, Yuguang Fang

    Abstract: Recently, there has been a surge in the development of advanced intelligent generative content (AIGC), especially large language models (LLMs). However, for many downstream tasks, it is necessary to fine-tune LLMs using private data. While federated learning offers a promising privacy-preserving solution to LLM fine-tuning, the substantial size of an LLM, combined with high computational and commu…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 15 pages, 16 figures

  32. arXiv:2404.06391  [pdf, other]

    cs.LG stat.ML

    Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity

    Authors: Zhanran Lin, Puheng Li, Lei Wu

    Abstract: One of the most intriguing findings in the structure of neural network landscape is the phenomenon of mode connectivity: For two typical global minima, there exists a path connecting them without barrier. This concept of mode connectivity has played a crucial role in understanding important phenomena in deep learning. In this paper, we conduct a fine-grained analysis of this connectivity phenome…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally

  33. arXiv:2404.06244  [pdf, other]

    cs.CV

    Anchor-based Robust Finetuning of Vision-Language Models

    Authors: Jinwei Han, Zhiwen Lin, Zhongyisun Sun, Yingguo Gao, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

    Abstract: We aim at finetuning a vision-language model without hurting its out-of-distribution (OOD) generalization. We address two types of OOD generalization, i.e., i) domain shift such as natural to sketch images, and ii) zero-shot capability to recognize the category that was not contained in the finetune data. Arguably, the diminished OOD generalization after finetuning stems from the excessively simpl…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  34. arXiv:2404.06201  [pdf, other]

    cs.SE cs.AI

    Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

    Authors: Zhihao Lin, Wei Ma, Tao Lin, Yaowen Zheng, Jingquan Ge, Jun Wang, Jacques Klein, Tegawende Bissyande, Yang Liu, Li Li

    Abstract: Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. Like traditional SE tools, open-source collaboration is key in realising the excellent products. However, with AI models, the essential need is in data. The collaboration of these AI-based SE models hinges on maximising the sources of high-…

    Submitted 9 April, 2024; originally announced April 2024.

  35. arXiv:2404.04970  [pdf, other]

    cs.LG

    How to characterize imprecision in multi-view clustering?

    Authors: Jinyi Xu, Zuowei Zhang, Ze Lin, Yixiang Chen, Zhe Liu, Weiping Ding

    Abstract: It is still challenging to cluster multi-view data since existing methods can only assign an object to a specific (singleton) cluster when combining different view information. As a result, it fails to characterize imprecision of objects in overlapping regions of different clusters, thus leading to a high risk of errors. In this paper, we thereby want to answer the question: how to characterize im…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 19 pages with 8 pages of supplementary

  36. arXiv:2404.02517  [pdf, other]

    cs.CV

    HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

    Authors: Zhongyu Xia, ZhiWei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

    Abstract: Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation. To improve perception precision, large image encoders, high-resolution images, and long-term temporal inputs have been adopted in recent 3D perception models, bringing remarkable performanc…

    Submitted 3 April, 2024; originally announced April 2024.

  37. arXiv:2404.02241  [pdf, other]

    cs.CV

    Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks. When training DM and CM, intermediate weight checkpoints are not fully utilized and only the last converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by pro…

    Submitted 7 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  38. arXiv:2404.01892  [pdf, other]

    cs.CV

    Minimize Quantization Output Error with Bias Compensation

    Authors: Cheng Gong, Haoshuai Zheng, Mengting Hu, Zheng Lin, Deng-Ping Fan, Yuzhi Zhang, Tao Li

    Abstract: Quantization is a promising method that reduces memory usage and computational intensity of Deep Neural Networks (DNNs), but it often leads to significant output error that hinder model deployment. In this paper, we propose Bias Compensation (BC) to minimize the output error, thus realizing ultra-low-precision quantization without model fine-tuning. Instead of optimizing the non-convex quantizatio…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

  39. arXiv:2404.01862  [pdf, other]

    cs.CV cs.HC cs.MM

    Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

    Authors: Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu

    Abstract: Co-speech gestures, if presented in the lively form of videos, can achieve superior visual effects in human-machine interaction. While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work. There are two main challenges: 1) A suitable motion feature is n…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 22 pages, 8 figures, CVPR 2024

  40. arXiv:2404.01697  [pdf, other]

    stat.ML cs.LG

    Preventing Model Collapse in Gaussian Process Latent Variable Models

    Authors: Ying Li, Zhidi Lin, Feng Yin, Michael Minyi Zhang

    Abstract: Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models, commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, which leads to a type of model collapse characterized primarily by vague latent representations that do not r…

    Submitted 2 April, 2024; originally announced April 2024.

  41. arXiv:2404.01291  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Evaluating Text-to-Visual Generation with Image-to-Text Generation

    Authors: Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan

    Abstract: Despite significant progress in generative AI, comprehensive evaluation remains challenging because of the lack of effective metrics and standardized benchmarks. For instance, the widely-used CLIPScore measures the alignment between a (generated) image and text prompt, but it fails to produce reliable scores for complex prompts involving compositions of objects, attributes, and relations. One reas…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: We open-source our data, model, and code at: https://github.com/linzhiqiu/t2v_metrics ; Project page: https://linzhiqiu.github.io/papers/vqascore

  42. arXiv:2404.01127  [pdf, other]

    cs.CV cs.AI

    Medical Visual Prompting (MVP): A Unified Framework for Versatile and High-Quality Medical Image Segmentation

    Authors: Yulin Chen, Guoheng Huang, Kai Huang, Zijin Lin, Guo Zhong, Shenghong Luo, Jie Deng, Jian Zhou

    Abstract: Accurate segmentation of lesion regions is crucial for clinical diagnosis and treatment across various diseases. While deep convolutional networks have achieved satisfactory results in medical image segmentation, they face challenges such as loss of lesion shape information due to continuous convolution and downsampling, as well as the high cost of manually labeling lesions with varying shapes and…

    Submitted 1 April, 2024; originally announced April 2024.

  43. arXiv:2404.01030  [pdf, ps, other]

    cs.CV cs.AI cs.CY

    Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

    Authors: Yixin Wan, Arjun Subramonian, Anaelia Ovalle, Zongyu Lin, Ashima Suvarna, Christina Chance, Hritik Bansal, Rebecca Pattichis, Kai-Wei Chang

    Abstract: The recent advancement of large and powerful models with Text-to-Image (T2I) generation abilities -- such as OpenAI's DALLE-3 and Google's Gemini -- enables users to generate high-quality images from textual prompts. However, it has become increasingly evident that even simple prompts could cause T2I models to exhibit conspicuous social bias in generated images. Such bias might lead to both alloca…

    Submitted 1 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  44. arXiv:2404.00323  [pdf, other]

    cs.CV cs.LG

    CLIP-driven Outliers Synthesis for few-shot OOD detection

    Authors: Hao Sun, Rundong He, Zhongyi Han, Zhicong Lin, Yongshun Gong, Yilong Yin

    Abstract: Few-shot OOD detection focuses on recognizing out-of-distribution (OOD) images that belong to classes unseen during training, with the use of only a small number of labeled in-distribution (ID) images. Up to now, a mainstream strategy is based on large-scale vision-language models, such as CLIP. However, these methods overlook a crucial issue: the lack of reliable OOD supervision information, whic…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  45. arXiv:2403.19417  [pdf, other]

    cs.CV

    OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

    Authors: Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu

    Abstract: We present OAKINK2, a dataset of bimanual object manipulation tasks for complex daily activities. In pursuit of constructing the complex tasks into a structured representation, OAKINK2 introduces three levels of abstraction to organize the manipulation tasks: Affordance, Primitive Task, and Complex Task. OAKINK2 features an object-centric perspective for decoding the complex tasks, treating them…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: To appear in CVPR 2024. 26 pages

  46. arXiv:2403.19067  [pdf, other]

    cs.CV

    Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

    Authors: Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

    Abstract: Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge…

    Submitted 27 March, 2024; originally announced March 2024.

  47. arXiv:2403.18811  [pdf, other]

    cs.CV cs.GR cs.SD eess.AS

    Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

    Authors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy

    Abstract: We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between t…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  48. arXiv:2403.17503  [pdf, other]

    cs.LG cs.CV

    DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning

    Authors: Huiping Zhuang, Run He, Kai Tong, Ziqian Zeng, Cen Chen, Zhiping Lin

    Abstract: Class-incremental learning (CIL) under an exemplar-free constraint has presented a significant challenge. Existing methods adhering to this constraint are prone to catastrophic forgetting, far more so than replay-based techniques that retain access to past samples. In this paper, to solve the exemplar-free CIL problem, we propose a Dual-Stream Analytic Learning (DS-AL) approach. The DS-AL contains…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at AAAI 2024

  49. arXiv:2403.17445  [pdf, other]

    cs.LG cs.AI cs.CL

    Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model

    Authors: Jiqun Chu, Zuoquan Lin

    Abstract: Modeling long-range dependencies in sequential data is a crucial step in sequence learning. A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences. However, it is unclear whether the success of S4 can be attributed to its intricate parameterization and HiPPO initialization or simply due to State Space Models (SSMs). To f…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 tables, 3 figures

  50. arXiv:2403.17437  [pdf, other]

    cs.SE

    An Empirical Study of ChatGPT-related projects on GitHub

    Authors: Zheng Lin, Neng Zhang

    Abstract: As ChatGPT possesses powerful capabilities in natural language processing and code analysis, it has received widespread attention since its launch. Developers have applied its powerful capabilities to various domains through software projects which are hosted on the largest open-source platform (GitHub) worldwide. Simultaneously, these projects have triggered extensive discussions. In order to com…

    Submitted 26 March, 2024; originally announced March 2024.