
Showing 1–50 of 3,034 results for author: Wang, W

Searching in archive cs.
  1. arXiv:2405.05526  [pdf, other]

    cs.RO

    Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview

    Authors: Yuhang Ming, Xingrui Yang, Weihan Wang, Zheng Chen, Jinglun Feng, Yifan Xing, Guofeng Zhang

    Abstract: Neural Radiance Fields (NeRF) have emerged as a powerful paradigm for 3D scene representation, offering high-fidelity renderings and reconstructions from a set of sparse and unstructured sensor data. In the context of autonomous robotics, where perception and understanding of the environment are pivotal, NeRF holds immense promise for improving performance. In this paper, we present a comprehensiv…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 32 pages, 5 figures, 8 tables

  2. arXiv:2405.05254  [pdf, other]

    cs.CL

    You Only Cache Once: Decoder-Decoder Architectures for Language Models

    Authors: Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

    Abstract: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO onl…

    Submitted 9 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.05031  [pdf, other]

    cs.CV

    Mitigating Bias Using Model-Agnostic Data Attribution

    Authors: Sander De Coninck, Wei-Cheng Wang, Sam Leroux, Pieter Simoens

    Abstract: Mitigating bias in machine learning models is a critical endeavor for ensuring fairness and equity. In this paper, we propose a novel approach to address bias by leveraging pixel image attributions to identify and regularize regions of images containing significant information about bias attributes. Our method utilizes a model-agnostic approach to extract pixel attributions by employing a convolut…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to the 2024 IEEE CVPR Workshop on Fair, Data-efficient, and Trusted Computer Vision

  4. arXiv:2405.04950  [pdf, other]

    cs.CV cs.AI cs.CL

    VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

    Authors: Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang

    Abstract: Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context. Yet, a challenging type of visual math lies in the multimodal graph theory problem, which demands that LMMs understand the graphical structures accurately and perform multi-step reasoning on the visual graph. Addi…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 17 pages; Accepted by ICML 2024

  5. arXiv:2405.04865  [pdf, ps, other]

    cs.LG eess.SP

    Regime Learning for Differentiable Particle Filters

    Authors: John-Joseph Brady, Yuhui Luo, Wenwu Wang, Victor Elvira, Yunpeng Li

    Abstract: Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference. This paper concerns the case where the system may switch between a finite set of state-space models, i.e. regimes. No prior approaches effectively learn both the individual regimes and the switching process simultan…

    Submitted 8 May, 2024; originally announced May 2024.

    MSC Class: 68T37 ACM Class: I.2.6

  6. arXiv:2405.04781  [pdf, other]

    cs.CL

    CourseGPT-zh: an Educational Large Language Model Based on Knowledge Distillation Incorporating Prompt Optimization

    Authors: Zheyan Qu, Lu Yin, Zitong Yu, Wenbo Wang, Xing zhang

    Abstract: Large language models (LLMs) have demonstrated astonishing capabilities in natural language processing (NLP) tasks, sparking interest in their application to professional domains with higher specialized requirements. However, restricted access to closed-source LLMs via APIs and the difficulty in collecting massive high-quality datasets pose obstacles to the development of large language models in…

    Submitted 7 May, 2024; originally announced May 2024.

  7. arXiv:2405.03565  [pdf, other]

    cs.CV

    Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing

    Authors: Han Liu, Siyang Zhao, Xiaotong Zhang, Feng Zhang, Wei Wang, Fenglong Ma, Hongyang Chen, Hong Yu, Xianchao Zhang

    Abstract: Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all. While prevailing methods have shown promising performance via transferring knowledge from seen classes to unseen classes, they are still limited by (1) Inherent dissimilarities among classes make the transformation of features learned from seen classes t…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to AAAI 2024

  8. arXiv:2405.03215  [pdf, other]

    cs.DC

    OMP-Engineer: Bridging Syntax Analysis and In-Context Learning for Efficient Automated OpenMP Parallelization

    Authors: Weidong Wang, Haoran Zhu

    Abstract: In advancing parallel programming, particularly with OpenMP, the shift towards NLP-based methods marks a significant innovation beyond traditional S2S tools like Autopar and Cetus. These NLP approaches train on extensive datasets of examples to efficiently generate optimized parallel code, streamlining the development process. This method's strength lies in its ability to swiftly produce paralleli…

    Submitted 6 May, 2024; originally announced May 2024.

  9. arXiv:2405.03202  [pdf, other]

    cs.CV

    Hierarchical Space-Time Attention for Micro-Expression Recognition

    Authors: Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

    Abstract: Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within the…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  10. arXiv:2405.02759  [pdf, other]

    cs.GR

    Region-Aware Color Smudging

    Authors: Ying Jiang, Pengfei Xu, Congyi Zhang, Hongbo Fu, Henry Lau, Wenping Wang

    Abstract: Color smudge operations from digital painting software enable users to create natural shading effects in high-fidelity paintings by interactively mixing colors. To precisely control results in traditional painting software, users tend to organize flat-filled color regions in multiple layers and smudge them to generate different color gradients. However, the requirement to carefully deal with regio…

    Submitted 4 May, 2024; originally announced May 2024.

  11. arXiv:2405.02717  [pdf, other]

    cs.CV

    AFter: Attention-based Fusion Router for RGBT Tracking

    Authors: Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, Bin Luo

    Abstract: Multi-modal feature fusion, a core investigative component of RGBT tracking, has prompted numerous fusion studies in recent years. However, existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal features, which struggle to handle various challenges in dynamic scenarios. To address this problem, this work presents a novel Attention-based Fusion rou…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Peer review

  12. arXiv:2405.01769  [pdf, other]

    cs.CL

    A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law

    Authors: Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian McAuley, Linda Petzold, William Yang Wang

    Abstract: In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high-stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 35 pages, 6 figures

  13. arXiv:2405.01412  [pdf, other]

    cs.CR cs.NI

    Applying Transparent Shaping for Zero Trust Architecture Implementation in AWS: A Case Study

    Authors: Wenjia Wang, Seyed Masoud Sadjadi, Naphtali Rishe

    Abstract: This study introduces a methodology integrating Zero Trust Architecture (ZTA) principles and Transparent Shaping into an AWS-hosted Online File Manager (OFM) application, enhancing security without substantial code modifications. We evaluate our approach with the Mozilla Observatory, highlighting significant security improvements and outlining a promising direction for applying Transparent Shaping…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures, 2 tables

  14. arXiv:2405.01389  [pdf, other]

    cs.LG

    Invariant Risk Minimization Is A Total Variation Model

    Authors: Zhao-Rong Lai, Wei-Wen Wang

    Abstract: Invariant risk minimization (IRM) is an arising approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation based on $L^2$ norm (TV-$\ell_2$) of the learning risk with resp…

    Submitted 2 May, 2024; originally announced May 2024.

  15. arXiv:2405.00579  [pdf, other]

    cs.GT

    LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game

    Authors: Jianfeng Lu, Yue Chen, Shuqin Cao, Longbiao Chen, Wei Wang, Yun Xin

    Abstract: Although Hierarchical Federated Learning (HFL) utilizes edge servers (ESs) to alleviate communication burdens, its model performance will be degraded by non-IID data and limited communication resources. Current works often assume that data is uniformly distributed, which however contradicts the heterogeneity of IoT. Solutions of additional model training to check the data distribution inevitably i…

    Submitted 1 May, 2024; originally announced May 2024.

  16. arXiv:2405.00483  [pdf, other]

    cs.CV cs.MM

    In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol

    Authors: Wei-Han Wang, Chin-Yuan Yeh, Hsi-Wen Chen, De-Nian Yang, Ming-Syan Chen

    Abstract: As deep generative models advance, we anticipate deepfakes achieving "perfection", generating no discernible artifacts or noise. However, current deepfake detectors, intentionally or inadvertently, rely on such artifacts for detection, as they are exclusive to deepfakes and absent in genuine examples. To bridge this gap, we introduce the Rebalanced Deepfake Detection Protocol (RDDP) to stress-test…

    Submitted 1 May, 2024; originally announced May 2024.

  17. arXiv:2405.00253  [pdf, other]

    cs.CL cs.SE

    CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification

    Authors: Yuchen Tian, Weixiang Yan, Qian Yang, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma

    Abstract: Large Language Models (LLMs) have made significant advancements in the field of code generation, offering unprecedented support for automated programming and assisting developers. However, LLMs sometimes generate code that appears plausible but fails to meet the expected requirements or executes incorrectly. This phenomenon of hallucinations in the coding field has not been explored. To advance th…

    Submitted 30 April, 2024; originally announced May 2024.

  18. arXiv:2405.00233  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

    Authors: Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate at high bitrates or within narrow domains such as speech and lack the semantic clues required for efficient language modelling. Addressing these chal…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Demo and code: https://haoheliu.github.io/SemantiCodec/

  19. arXiv:2405.00077  [pdf, other]

    cs.LG eess.SP

    BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

    Authors: Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

    Abstract: Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samp…

    Submitted 30 April, 2024; originally announced May 2024.

  20. arXiv:2404.19384  [pdf, other]

    cs.CV cs.AI

    Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

    Authors: Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

    Abstract: Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previ…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  21. arXiv:2404.19382  [pdf, other]

    cs.CV

    Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective

    Authors: Xiaoxuan Han, Songlin Yang, Wei Wang, Yang Li, Jing Dong

    Abstract: Advanced text-to-image diffusion models raise safety concerns regarding identity privacy violation, copyright infringement, and Not Safe For Work content generation. Towards this, unlearning methods have been developed to erase these involved concepts from diffusion models. However, these unlearning methods only shift the text-to-image mapping and preserve the visual content within the generative…

    Submitted 30 April, 2024; originally announced April 2024.

  22. arXiv:2404.19368  [pdf, other]

    cs.SE

    Exploring Multi-Lingual Bias of Large Code Models in Code Generation

    Authors: Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

    Abstract: Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despi…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 12 pages

  23. arXiv:2404.19330  [pdf, other]

    cs.CV cs.AI

    G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction

    Authors: Zhanwei Zhang, Zishuo Hua, Minghao Chen, Wei Lu, Binbin Lin, Deng Cai, Wenxiao Wang

    Abstract: Predicting future trajectories of traffic agents accurately holds substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from the accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in k…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  24. arXiv:2404.19218  [pdf]

    cs.LG

    Flight Trajectory Prediction Using an Enhanced CNN-LSTM Network

    Authors: Qinzhi Hao, Jiali Zhang, Tengyu Jing, Wei Wang

    Abstract: Aiming at the problem of low accuracy of flight trajectory prediction caused by the high speed of fighters, the diversity of tactical maneuvers, and the transient nature of situational change in close range air combat, this paper proposes an enhanced CNN-LSTM network as a fighter flight trajectory prediction method. Firstly, we extract spatial features from fighter trajectory data using CNN, aggre…

    Submitted 29 April, 2024; originally announced April 2024.

  25. arXiv:2404.18630  [pdf, other]

    cs.CV

    4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations

    Authors: Wenbo Wang, Hsuan-I Ho, Chen Guo, Boxiang Rong, Artur Grigorev, Jie Song, Juan Jose Zarate, Otmar Hilliges

    Abstract: The studies of human clothing for digital avatars have predominantly relied on synthetic datasets. While easy to collect, synthetic data often fall short in realism and fail to capture authentic clothing dynamics. Addressing this gap, we introduce 4D-DRESS, the first real-world 4D dataset advancing human clothing research with its high-quality 4D textured scans and garment meshes. 4D-DRESS capture…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 paper, 21 figures, 9 tables

  26. arXiv:2404.18279  [pdf, other]

    cs.CV

    Out-of-distribution Detection in Medical Image Analysis: A survey

    Authors: Zesheng Hong, Yubiao Yue, Yubin Chen, Huanjie Lin, Yuanmei Luo, Mini Han Wang, Weidong Wang, Jialong Xu, Xiaoqi Yang, Zhenzhang Li, Sihong Xie

    Abstract: Computer-aided diagnostics has benefited from the development of deep learning-based computer vision techniques in these years. Traditional supervised deep learning methods assume that the test sample is drawn from the identical distribution as the training data. However, it is possible to encounter out-of-distribution samples in real-world clinical scenarios, which may cause silent failure in dee…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 23 pages, 3 figures

  27. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 7 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  28. arXiv:2404.18094  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    USAT: A Universal Speaker-Adaptive Text-to-Speech Approach

    Authors: Wenbin Wang, Yang Song, Sanjay Jha

    Abstract: Conventional text-to-speech (TTS) research has predominantly focused on enhancing the quality of synthesized speech for speakers in the training dataset. The challenge of synthesizing lifelike speech for unseen, out-of-dataset speakers, especially those with limited reference data, remains a significant and unresolved problem. While zero-shot or few-shot speaker-adaptive TTS approaches have been e…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 15 pages, 13 figures. Copyright has been transferred to IEEE

    Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2024

  29. arXiv:2404.18081  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  30. arXiv:2404.17826  [pdf, other]

    cs.IR

    A Taxation Perspective for Fair Re-ranking

    Authors: Chen Xu, Xiaopeng Ye, Wenjie Wang, Liang Pang, Jun Xu, Tat-Seng Chua

    Abstract: Fair re-ranking aims to redistribute ranking slots among items more equitably to ensure responsibility and ethics. The exploration of redistribution problems has a long history in economics, offering valuable insights for conceptualizing fair re-ranking as a taxation process. Such a formulation provides us with a fresh perspective to re-examine fair re-ranking and inspire the development of new me…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted in SIGIR 2024

  31. arXiv:2404.17806  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  32. arXiv:2404.17229  [pdf, other]

    cs.RO

    Enhancing mmWave Radar Point Cloud via Visual-inertial Supervision

    Authors: Cong Fan, Shengkai Zhang, Kezhong Liu, Shuai Wang, Zheng Yang, Wei Wang

    Abstract: Complementary to prevalent LiDAR and camera systems, millimeter-wave (mmWave) radar is robust to adverse weather conditions like fog, rainstorms, and blizzards but offers sparse point clouds. Current techniques enhance the point cloud by the supervision of LiDAR's data. However, high-performance LiDAR is notably expensive and is not commonly available on vehicles. This paper presents mmEMP, a supe…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by ICRA 2024

  33. arXiv:2404.16924  [pdf, other]

    cs.IR cs.CL

    A Survey of Generative Search and Recommendation in the Era of Large Language Models

    Authors: Yongqi Li, Xinyu Lin, Wenjie Wang, Fuli Feng, Liang Pang, Wenjie Li, Liqiang Nie, Xiangnan He, Tat-Seng Chua

    Abstract: With the information explosion on the Web, search and recommendation are foundational infrastructures to satisfying users' information needs. As the two sides of the same coin, both revolve around the same core research problem, matching queries with documents or users with items. In the recent few decades, search and recommendation have experienced synchronous technological paradigm shifts, inclu…

    Submitted 25 April, 2024; originally announced April 2024.

  34. arXiv:2404.16850  [pdf, other]

    cs.CR

    Membership Information Leakage in Federated Contrastive Learning

    Authors: Kongyang Chen, Wenfeng Wang, Zixin Wang, Wangjun Zhang, Zhipeng Li, Yao Huang

    Abstract: Federated Contrastive Learning (FCL) represents a burgeoning approach for learning from decentralized unlabeled data while upholding data privacy. In FCL, participant clients collaborate in learning a global encoder using unlabeled data, which can serve as a versatile feature extractor for diverse downstream tasks. Nonetheless, FCL is susceptible to privacy risks, such as membership information le…

    Submitted 6 March, 2024; originally announced April 2024.

  35. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  36. arXiv:2404.16789  [pdf, other]

    cs.LG cs.AI cs.CL

    Continual Learning of Large Language Models: A Comprehensive Survey

    Authors: Haizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Hao Wang

    Abstract: The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 57 pages, 2 figures, 4 tables. Work in progress

  37. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  38. arXiv:2404.16408  [pdf, other]

    cs.IT eess.SY

    Event-Triggered Resilient Filtering for 2-D Systems with Asynchronous-Delay: Handling Binary Encoding Decoding with Probabilistic Bit Flips

    Authors: Yu Chen, Wei Wang

    Abstract: In this paper, the event-triggered resilient filtering problem is investigated for a class of two-dimensional systems with asynchronous-delay under binary encoding-decoding schemes with probabilistic bit flips. To reduce unnecessary communications and computations in complex network systems, alleviate network energy consumption, and optimize the use of network resources, a new event-triggered mech…

    Submitted 25 April, 2024; originally announced April 2024.

  39. arXiv:2404.16369  [pdf, other]

    cs.CL

    Don't Say No: Jailbreaking LLM by Suppressing Refusal

    Authors: Yukai Zhou, Wenjie Wang

    Abstract: Ensuring the safety alignment of Large Language Models (LLMs) is crucial to generating responses consistent with human values. Despite their ability to recognize and avoid harmful queries, LLMs are vulnerable to "jailbreaking" attacks, where carefully crafted prompts elicit them to produce toxic content. One category of jailbreak attacks is reformulating the task as adversarial attacks by elicitin…

    Submitted 25 April, 2024; originally announced April 2024.

  40. arXiv:2404.16324  [pdf, other]

    math.NA cs.LG eess.SP

    Improved impedance inversion by deep learning and iterated graph Laplacian

    Authors: Davide Bianchi, Florian Bossmann, Wenlong Wang, Mingming Liu

    Abstract: Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results. We present a hybrid method that combines deep learning with iterated graph Lap…

    Submitted 25 April, 2024; originally announced April 2024.

  41. arXiv:2404.16322  [pdf, other]

    cs.DB

    Bridging Speed and Accuracy to Approximate $K$-Nearest Neighbor Search

    Authors: Mingyu Yang, Jiabao Jin, Xiangyu Wang, Zhitao Shen, Wei Jia, Wentao Li, Wei Wang

    Abstract: Approximate K-Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. The efficiency of AKNN search largely depends on the computation of distances, a process that significantly affects the runtime. To improve computational efficiency, existing work often opts for estimating approximate distances rather than computing exact distances, at the cost of reduced…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages

  42. arXiv:2404.15661  [pdf, other]

    cs.GR cs.CG cs.CV

    CWF: Consolidating Weak Features in High-quality Mesh Simplification

    Authors: Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu

    Abstract: In mesh simplification, common requirements like accuracy, triangle quality, and feature alignment are often considered as a trade-off. Existing algorithms concentrate on just one or a few specific aspects of these requirements. For example, the well-known Quadric Error Metrics (QEM) approach prioritizes accuracy and can preserve strong feature lines/points as well but falls short in ensuring high…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 14 pages, 22 figures

  43. arXiv:2404.15382  [pdf, other]

    cs.LG cs.AI cs.NI

    Feature Distribution Shift Mitigation with Contrastive Pretraining for Intrusion Detection

    Authors: Weixing Wang, Haojin Yang, Christoph Meinel, Hasan Yagiz Özkan, Cristian Bermudez Serna, Carmen Mas-Machuca

    Abstract: In recent years, there has been a growing interest in using Machine Learning (ML), especially Deep Learning (DL) to solve Network Intrusion Detection (NID) problems. However, the feature distribution shift problem remains a difficulty, because the change in features' distributions over time negatively impacts the model's performance. As one promising solution, model pretraining has emerged as a no…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: accepted by ICMLCN24

  44. arXiv:2404.15311  [pdf, other]

    eess.SP cs.AI cs.LG

    Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression

    Authors: Eric Modesitt, Haicheng Yin, Williams Huang Wang, Brian Lu

    Abstract: The task of Electroencephalogram (EEG) analysis is paramount to the development of Brain-Computer Interfaces (BCIs). However, to reach the goal of developing robust, useful BCIs depends heavily on the speed and the accuracy at which BCIs can understand neural dynamics. In response to that goal, this paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted HCI International 2024

  45. arXiv:2404.15271  [pdf, other]

    cs.CV cs.AI cs.CL

    Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

    Authors: Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun

    Abstract: Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability. In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources. In this work, we introduce a novel multimodal instruction-following framework for layout planning, allowing use…

    Submitted 23 April, 2024; originally announced April 2024.

  46. arXiv:2404.15045  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-Head Mixture-of-Experts

    Authors: Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei

    Abstract: Sparse Mixtures of Experts (SMoE) scales model capacity without significant increases in training and inference costs, but exhibits the following two issues: (1) Low expert activation, where only a small subset of experts are activated for optimization. (2) Lacking fine-grained analytical capabilities for multiple semantic concepts within individual tokens. We propose Multi-Head Mixture-of-Experts…

    Submitted 23 April, 2024; originally announced April 2024.

  47. arXiv:2404.14568  [pdf, other]

    cs.CV

    UVMap-ID: A Controllable and Personalized UV Map Generative Model

    Authors: Weijie Wang, Jichao Zhang, Chang Liu, Xia Li, Xingqian Xu, Humphrey Shi, Nicu Sebe, Bruno Lepri

    Abstract: Recently, diffusion models have made significant strides in synthesizing realistic 2D human images based on provided text prompts. Building upon this, researchers have extended 2D text-to-image diffusion models into the 3D domain for generating human textures (UV Maps). However, some important problems about UV Map Generative models are still not solved, i.e., how to generate personalized texture…

    Submitted 22 April, 2024; originally announced April 2024.

  48. arXiv:2404.14215  [pdf, other]

    cs.CL

    Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction

    Authors: Zheye Deng, Chunkit Chan, Weiqi Wang, Yuxi Sun, Wei Fan, Tianshi Zheng, Yauwai Yim, Yangqiu Song

    Abstract: The task of condensing large chunks of textual information into concise and structured tables has gained attention recently due to the emergence of Large Language Models (LLMs) and their potential benefit for downstream tasks, such as text summarization and text mining. Previous approaches often generate tables that directly replicate information from the text, limiting their applicability in broa…

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.13957  [pdf, other]

    cs.CL

    How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO

    Authors: Man Tik Ng, Hui Tung Tse, Jen-tse Huang, Jingjing Li, Wenxuan Wang, Michael R. Lyu

    Abstract: The role-play ability of Large Language Models (LLMs) has emerged as a popular research direction. However, existing studies focus on imitating well-known public figures or fictional characters, overlooking the potential for simulating ordinary individuals. Such an oversight limits the potential for advancements in digital human clones and non-player characters in video games. To bridge this gap,…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 9 pages

  50. arXiv:2404.13788  [pdf, other]

    cs.CV cs.AI

    AnyPattern: Towards In-context Image Copy Detection

    Authors: Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang

    Abstract: This paper explores in-context learning for image copy detection (ICD), i.e., prompting an ICD model to identify replicated images with new tampering patterns without the need for additional training. The prompts (or the contexts) are from a small set of image-replica pairs that reflect the new patterns and are used at inference time. Such in-context ICD has good realistic value, because it requir…

    Submitted 28 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: The project is publicly available at https://anypattern.github.io. arXiv admin note: text overlap with arXiv:2403.06098