Showing 1–50 of 1,695 results for author: Wang, Q

Searching in archive cs.
  1. arXiv:2405.05590  [pdf, other]

    cs.CR cs.AR cs.LG

    TroLLoc: Logic Locking and Layout Hardening for IC Security Closure against Hardware Trojans

    Authors: Fangzhou Wang, Qijing Wang, Lilas Alrahis, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Ozgur Sinanoglu, Tsung-Yi Ho, Evangeline F. Y. Young, Johann Knechtel

    Abstract: Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many security threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically protect the physical layouts of ICs against…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05500  [pdf]

    cs.RO eess.SY

    Research on the Tender Leaf Identification and Mechanically Perceptible Plucking Finger for High-quality Green Tea

    Authors: Wei Zhang, Yong Chen, Qianqian Wang, Jun Chen

    Abstract: BACKGROUND: Intelligent identification and precise plucking are the keys to intelligent tea harvesting robots, which are of increasing significance nowadays. Aiming at plucking tender leaves for high-quality green tea production, this paper proposes a tender leaf identification algorithm and a mechanically perceptible plucking finger. RESULTS: Based on a segmentation algorithm and color…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.03673  [pdf, other]

    cs.CV cs.AI

    MemoryMamba: Memory-Augmented State Space Model for Defect Recognition

    Authors: Qianning Wang, He Hu, Yucheng Zhou

    Abstract: As automation advances in manufacturing, the demand for precise and sophisticated defect detection technologies grows. Existing vision models for defect recognition are insufficient for handling the complexities and variations of defects in contemporary manufacturing settings. These models especially struggle in scenarios involving limited or imbalanced defect data. In this work, we introd…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 15 pages, 7 figures

  4. arXiv:2405.03003  [pdf, other]

    cs.LG cs.AI cs.CL

    Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

    Authors: Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li

    Abstract: Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $\Delta W = BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to fur…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024
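    The LoRA abstract above describes the weight change as a product of two low-rank matrices, $\Delta W = BA$. A minimal pure-Python sketch of that baseline idea follows; it is not the paper's Fourier-based method. The helper names (`matmul`, `init_lora`, `lora_delta`, `effective_weight`, `trainable_params`) are hypothetical, and the zero initialization of `B` and the optional scaling factor follow common LoRA practice rather than anything stated in this listing.

```python
import random

def matmul(x, y):
    """Multiply an n x m matrix by an m x p matrix (nested lists)."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def init_lora(d, k, r, seed=0):
    """Hypothetical init: B (d x r) is zero and A (r x k) is Gaussian, so B A starts at zero."""
    rng = random.Random(seed)
    B = [[0.0] * r for _ in range(d)]
    A = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(r)]
    return B, A

def lora_delta(B, A):
    """Rank-r weight change Delta W = B A, a d x k matrix."""
    return matmul(B, A)

def effective_weight(W, B, A, scale=1.0):
    """Frozen weight W plus the scaled low-rank update (implementations often use alpha / r)."""
    delta = lora_delta(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]

def trainable_params(d, k, r):
    """(full fine-tuning count, LoRA count) for a single d x k weight matrix."""
    return d * k, d * r + r * k
```

    For a 4096 × 4096 projection with rank r = 8, `trainable_params(4096, 4096, 8)` returns `(16777216, 65536)`: the update trains 2 × 4096 × 8 parameters instead of 4096². Only `B` and `A` would be updated during training; `W` stays frozen.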

  5. CVTGAD: Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection

    Authors: Jindong Li, Qianli Xing, Qi Wang, Yi Chang

    Abstract: Unsupervised graph-level anomaly detection (UGAD) has achieved remarkable performance in various critical disciplines, such as chemistry analysis and bioinformatics. Existing UGAD paradigms often adopt data augmentation techniques to construct multiple views, and then employ different strategies to obtain representations from different views for jointly conducting UGAD. However, most previous work…

    Submitted 2 May, 2024; originally announced May 2024.

  6. arXiv:2404.18598  [pdf, other]

    cs.CV cs.GR

    Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting

    Authors: Tianyidan Xie, Rui Ma, Qian Wang, Xiaoqian Ye, Feixuan Liu, Ying Tai, Zhenyu Zhang, Zili Yi

    Abstract: Recent advancements in image inpainting, particularly through diffusion modeling, have yielded promising outcomes. However, when tested in scenarios involving the completion of images based on the foreground objects, current methods that aim to inpaint an image in an end-to-end manner encounter challenges such as "over-imagination", inconsistency between foreground and background, and limited dive…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 16 pages, 9 figures, project page: https://anywheremultiagent.github.io

  7. arXiv:2404.18319  [pdf, other]

    cs.IR

    User Welfare Optimization in Recommender Systems with Competing Content Creators

    Authors: Fan Yao, Yiming Liao, Mingzhe Wu, Chuanhao Li, Yan Zhu, James Yang, Qifan Wang, Haifeng Xu, Hongning Wang

    Abstract: Driven by the new economic opportunities created by the creator economy, an increasing number of content creators rely on and compete for revenue generated from online content recommendation platforms. This burgeoning competition reshapes the dynamics of content distribution and profoundly impacts long-term user welfare on the platform. However, the absence of a comprehensive picture of global use…

    Submitted 28 April, 2024; originally announced April 2024.

  8. arXiv:2404.17871  [pdf, other]

    cs.SE cs.AI

    A Survey of Deep Learning Library Testing Methods

    Authors: Xiaoyu Zhang, Weipeng Jiang, Chao Shen, Qi Li, Qian Wang, Chenhao Lin, Xiaohong Guan

    Abstract: In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Study…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 34 pages, 8 figures, 4 tables

  9. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  10. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  11. arXiv:2404.15943  [pdf, other]

    cs.LG cs.AI

    Decentralized Personalized Federated Learning based on a Conditional Sparse-to-Sparser Scheme

    Authors: Qianyu Long, Qiyuan Wang, Christos Anagnostopoulos, Daning Bi

    Abstract: Decentralized Federated Learning (DFL) has become popular due to its robustness and avoidance of centralized coordination. In this paradigm, clients actively engage in training by exchanging models with their networked neighbors. However, DFL introduces increased costs in terms of training and communication. Existing methods focus on minimizing communication, often overlooking training efficiency a…

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: 15 pages, 9 figures, 3 pages theory

  12. arXiv:2404.15690  [pdf, other]

    cs.CL cs.LG

    Neural Proto-Language Reconstruction

    Authors: Chenxuan Cui, Ying Chen, Qinxin Wang, David R. Mortensen

    Abstract: Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNN and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neu…

    Submitted 24 April, 2024; originally announced April 2024.

  13. arXiv:2404.15677  [pdf, other]

    cs.CV

    CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models

    Authors: Qinghe Wang, Baolu Li, Xiaomin Li, Bing Cao, Liqian Ma, Huchuan Lu, Xu Jia

    Abstract: Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models. More specifically, we consi…

    Submitted 27 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Code will be released very soon: https://github.com/qinghew/CharacterFactory

  14. arXiv:2404.15595  [pdf, other]

    cs.LG cs.CE

    Variational Deep Survival Machines: Survival Regression with Censored Outcomes

    Authors: Qinxin Wang, Jiayuan Huang, Junhui Li, Jiaming Liu

    Abstract: Survival regression aims to predict the time when an event of interest will take place, typically a death or a failure. A fully parametric method [18] is proposed to estimate the survival function as a mixture of individual parametric distributions in the presence of censoring. In this paper, we present a novel method to predict the survival time by better clustering the survival data and combine…

    Submitted 23 April, 2024; originally announced April 2024.
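    The survival abstract above refers to estimating the survival function as a mixture of parametric distributions under censoring. The sketch below illustrates only that general idea, assuming Weibull components with fixed mixture weights and parameters (in a learned model these would be predicted from covariates); every function name is hypothetical and none of this is the paper's method. Observed events contribute the log density $\log f(t)$ to the likelihood, while right-censored records, whose event has not happened by time $t$, contribute the log survival $\log S(t)$.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_density(t, shape, scale):
    """f(t) = (shape / scale) * (t / scale) ** (shape - 1) * S(t)."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survival(t, shape, scale)

def mixture_survival(t, weights, params):
    """Mixture survival: sum_k w_k * S_k(t); params is a list of (shape, scale) pairs."""
    return sum(w * weibull_survival(t, k, lam) for w, (k, lam) in zip(weights, params))

def mixture_density(t, weights, params):
    """Mixture density: sum_k w_k * f_k(t)."""
    return sum(w * weibull_density(t, k, lam) for w, (k, lam) in zip(weights, params))

def censored_log_likelihood(times, observed, weights, params):
    """Events add log f(t); right-censored records add log S(t)."""
    total = 0.0
    for t, event in zip(times, observed):
        if event:
            total += math.log(mixture_density(t, weights, params))
        else:
            total += math.log(mixture_survival(t, weights, params))
    return total
```

    With a single exponential component (shape 1, scale 1), an observed event at t = 1 and a record censored at t = 2 give a log-likelihood of -1 + (-2) = -3.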

  15. arXiv:2404.15580  [pdf, other]

    cs.CV

    MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

    Authors: Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen

    Abstract: The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Mask AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the per…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: submitted to journal

  16. arXiv:2404.14162  [pdf, other]

    cs.CV

    FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

    Authors: Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan

    Abstract: Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventiona…

    Submitted 4 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  17. arXiv:2404.13947  [pdf, other]

    cs.CV

    Boter: Bootstrapping Knowledge Selection and Question Answering for Knowledge-based VQA

    Authors: Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu

    Abstract: Knowledge-based Visual Question Answering (VQA) requires models to incorporate external knowledge to respond to questions about visual content. Previous methods mostly follow the "retrieve and generate" paradigm. Initially, they utilize a pre-trained retriever to fetch relevant knowledge documents, subsequently employing them to generate answers. While these methods have demonstrated commendable p…

    Submitted 22 April, 2024; originally announced April 2024.

  18. arXiv:2404.12587  [pdf, other]

    cs.AI

    Reinforcement Learning Approach for Integrating Compressed Contexts into Knowledge Graphs

    Authors: Ngoc Quach, Qi Wang, Zijun Gao, Qifeng Sun, Bo Guan, Lillian Floyd

    Abstract: The widespread use of knowledge graphs in various fields has brought about a challenge in effectively integrating and updating information within them. When it comes to incorporating contexts, conventional methods often rely on rules or basic machine learning models, which may not fully grasp the complexity and fluidity of context information. This research suggests an approach based on reinforcem…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Machine Learning and Neural Networks (MLNN 2024)

  19. arXiv:2404.12022  [pdf, other]

    cs.CL

    Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

    Authors: Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

    Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computi…

    Submitted 18 April, 2024; originally announced April 2024.

  20. arXiv:2404.11613  [pdf, other]

    cs.CV

    InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

    Authors: Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao

    Abstract: 3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant proper…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Project page: https://johanan528.github.io/Infusion

  21. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  22. arXiv:2404.09619  [pdf, other]

    cs.CV cs.AI

    UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

    Authors: Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

    Abstract: As an alternative to expensive expert evaluation, Image Aesthetic Assessment (IAA) stands out as a crucial task in computer vision. However, traditional IAA methods are typically constrained to a single data source or task, restricting the universality and broader application. In this work, to better align with human aesthetics, we propose a Unified Multi-modal Image Aesthetic Assessment (UNIAA) f…

    Submitted 15 April, 2024; originally announced April 2024.

  23. arXiv:2404.09540  [pdf, other]

    cs.CV

    Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement

    Authors: Chi Wang, Junming Huang, Rong Zhang, Qi Wang, Haotian Yang, Haibin Huang, Chongyang Ma, Weiwei Xu

    Abstract: Automatic 3D facial texture generation has gained significant interest recently. Existing approaches may not support the traditional physically based rendering pipeline or rely on 3D data captured by Light Stage. Our key contribution is a progressive latent space refinement approach that can bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images to generate high…

    Submitted 15 April, 2024; originally announced April 2024.

  24. PrintListener: Uncovering the Vulnerability of Fingerprint Authentication via the Finger Friction Sound

    Authors: Man Zhou, Shuao Su, Qian Wang, Qi Li, Yuting Zhou, Xiaojing Ma, Zhengxiong Li

    Abstract: Fingerprint authentication has been extensively employed in contemporary identity verification systems owing to its rapidity and cost-effectiveness. Due to its widespread use, fingerprint leakage may cause sensitive information theft, enormous economic and personnel losses, and even a potential compromise of national security. As a fingerprint that can coincidentally match a specific proportion of…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: in Proc. of NDSS, 2024

  25. arXiv:2404.09192  [pdf, other]

    cs.SD cs.AI eess.AS

    Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

    Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

    Abstract: Over the past decade, a series of unflagging efforts have been dedicated to developing highly expressive and controllable text-to-speech (TTS) systems. In general, the holistic TTS comprises two interconnected components: the frontend module and the backend module. The frontend excels in capturing linguistic representations from the raw text input, while the backend module converts linguistic cues…

    Submitted 14 April, 2024; originally announced April 2024.

  26. arXiv:2404.08958  [pdf, other]

    cs.CV cs.CL cs.LG

    AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

    Authors: Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve few-shot ability of CLIP, key factors on the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introdu…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  27. arXiv:2404.06709  [pdf, other]

    cs.CL

    CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

    Authors: Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng

    Abstract: The fast-growing large-scale language models are delivering unprecedented performance on almost all natural language processing tasks. However, the effectiveness of large language models is reliant on an exponentially increasing number of parameters. The overwhelming computation complexity incurs a high inference latency that negatively affects user experience. Existing methods to improve inferen…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: ARR Under Review

  28. arXiv:2404.06077  [pdf, other]

    cs.CR cs.AI cs.CY

    Is Your AI Truly Yours? Leveraging Blockchain for Copyrights, Provenance, and Lineage

    Authors: Yilin Sai, Qin Wang, Guangsheng Yu, H. M. N. Dilum Bandara, Shiping Chen

    Abstract: As Artificial Intelligence (AI) integrates into diverse areas, particularly in content generation, ensuring rightful ownership and ethical use becomes paramount. AI service providers are expected to prioritize responsibly sourcing training data and obtaining licenses from data owners. However, existing studies primarily center on safeguarding static copyrights, which simply treats metadata/dataset…

    Submitted 9 April, 2024; originally announced April 2024.

  29. arXiv:2404.04940  [pdf, other]

    cs.LG

    Fuzzy K-Means Clustering without Cluster Centroids

    Authors: Han Lu, Fangfang Li, Quanxue Gao, Cheng Deng, Chris Ding, Qianqian Wang

    Abstract: Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. However, the performance of popular Fuzzy K-Means algorithms is sensitive to the selection of initial cluster centroids and is also affected by noise when updating mean cluster centroids. To address these challenges, this paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on…

    Submitted 7 April, 2024; originally announced April 2024.

  30. arXiv:2404.04860  [pdf, other]

    cs.CV

    ByteEdit: Boost, Comply and Accelerate Generative Image Editing

    Authors: Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu

    Abstract: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including: i) inferior quality; ii) poor consistency; iii) insufficient instruction adherence; iv) suboptimal generation efficiency. To address these obstacles, we p…

    Submitted 7 April, 2024; originally announced April 2024.

  31. arXiv:2404.04599  [pdf, ps, other]

    quant-ph cs.CC

    Local Test for Unitarily Invariant Properties of Bipartite Quantum States

    Authors: Kean Chen, Qisheng Wang, Zhicheng Zhang

    Abstract: We study the power of local test for bipartite quantum states. Our central result is that, for properties of bipartite pure states, unitary invariance on one part implies an optimal (over all global testers) local tester acting only on the other part. This suggests a canonical local tester for entanglement spectra (i.e., Schmidt coefficients), and reveals that purified samples offer no advantage i…

    Submitted 29 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 51 pages. Compared to [v1], we (i) extended testers with parameterized completeness and soundness, (ii) added new lower bounds for testing the bond dimension of matrix product states (MPS), and (iii) improved the lower bounds for testing Schmidt rank
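    The entry above mentions entanglement spectra (Schmidt coefficients) and testing Schmidt rank. As a textbook illustration, not the paper's tester, the Schmidt coefficients of a bipartite pure state are the singular values of its amplitude matrix. The sketch below assumes the smallest case, two qubits, so that the eigenvalues of the 2×2 Hermitian matrix $MM^\dagger$ have a closed form; the function names and the tolerance are hypothetical. Conventions vary: some texts call the squared values the Schmidt coefficients; here the returned values square-sum to 1 for a normalized state.

```python
import math

def schmidt_coefficients_2x2(m):
    """Schmidt coefficients of a two-qubit pure state, largest first.

    m[i][j] is the (possibly complex) amplitude of |i>|j>. The coefficients are
    the singular values of m, i.e. square roots of the eigenvalues of m m^dagger.
    """
    # Entries of the 2x2 Hermitian matrix H = m m^dagger.
    h00 = abs(m[0][0]) ** 2 + abs(m[0][1]) ** 2
    h11 = abs(m[1][0]) ** 2 + abs(m[1][1]) ** 2
    h01 = m[0][0] * complex(m[1][0]).conjugate() + m[0][1] * complex(m[1][1]).conjugate()
    # Closed-form eigenvalues of a 2x2 Hermitian matrix from its trace and determinant.
    trace = h00 + h11
    det = h00 * h11 - abs(h01) ** 2
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam_hi = (trace + root) / 2.0
    lam_lo = max((trace - root) / 2.0, 0.0)  # clamp tiny negative rounding error
    return [math.sqrt(lam_hi), math.sqrt(lam_lo)]

def schmidt_rank(coeffs, tol=1e-9):
    """Number of Schmidt coefficients above a numerical tolerance."""
    return sum(1 for c in coeffs if c > tol)
```

    A Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$ has two equal coefficients $1/\sqrt{2}$ and Schmidt rank 2, while a product state such as $|{+}\rangle|{+}\rangle$ (all amplitudes 0.5) has Schmidt rank 1 even though none of its amplitudes is zero.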

  32. arXiv:2404.04522  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models

    Authors: Zhiyuan Peng, Xuyang Wu, Qifan Wang, Sravanthi Rajanala, Yi Fang

    Abstract: Parameter Efficient Fine-Tuning (PEFT) methods have been extensively utilized in Large Language Models (LLMs) to improve downstream tasks without the cost of fine-tuning the whole LLMs. Recent studies have shown how to effectively use PEFT for fine-tuning LLMs in ranking tasks with convincing performance; however, there are some limitations, including the learned prompt being fixed for different doc…

    Submitted 11 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

  33. arXiv:2404.04319  [pdf, other]

    cs.CV

    SpatialTracker: Tracking Any 2D Pixels in 3D Space

    Authors: Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou

    Abstract: Recovering dense and long-range pixel motion in videos is a challenging problem. Part of the difficulty arises from the 3D-to-2D projection process, leading to occlusions and discontinuities in the 2D motion domain. While 2D motion can be intricate, we posit that the underlying 3D motion can often be simple and low-dimensional. In this work, we propose to estimate point trajectories in 3D space to…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 (selected as highlight paper). Project page: https://henry123-boy.github.io/SpaTracker/

  34. arXiv:2404.04232  [pdf, other]

    cs.CL

    Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation

    Authors: Tianqi Zhong, Zhaoyi Li, Quan Wang, Linqi Song, Ying Wei, Defu Lian, Zhendong Mao

    Abstract: Compositional generalization, representing the model's ability to generate text with new attribute combinations obtained by recombining single attributes from the training data, is a crucial property for multi-aspect controllable text generation (MCTG) methods. Nonetheless, a comprehensive compositional generalization evaluation benchmark of MCTG is still lacking. We propose CompMCTG, a benchmark…

    Submitted 5 April, 2024; originally announced April 2024.

  35. arXiv:2404.03873  [pdf, other]

    cs.CR

    PrivShape: Extracting Shapes in Time Series under User-Level Local Differential Privacy

    Authors: Yulian Mao, Qingqing Ye, Haibo Hu, Qi Wang, Kai Huang

    Abstract: Time series have numerous applications in finance, healthcare, IoT, and smart city. In many of these applications, time series typically contain personal data, so privacy infringement may occur if they are released directly to the public. Recently, local differential privacy (LDP) has emerged as the state-of-the-art approach to protecting data privacy. However, existing works on LDP-based collecti…

    Submitted 4 April, 2024; originally announced April 2024.

  36. arXiv:2404.03819  [pdf, other]

    cs.CV

    Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

    Authors: Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin

    Abstract: Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scattered, low-contrast, clinically relevant LNs in 3D CT is difficult even for experienced physicians, with high inter-observer variation. Previou…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Technical report

  37. arXiv:2404.03611  [pdf, other]

    cs.CV cs.AI

    InsectMamba: Insect Pest Classification with State Space Model

    Authors: Qianning Wang, Chenglin Wang, Zhixin Lai, Yucheng Zhou

    Abstract: The classification of insect pests is a critical task in agricultural technology, vital for ensuring food security and environmental sustainability. However, the complexity of pest identification, due to factors like high camouflage and species diversity, poses significant obstacles. Existing methods struggle with the fine-grained feature extraction needed to distinguish between closely related pe…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  38. arXiv:2404.03253  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    A dataset of primary nasopharyngeal carcinoma MRI with multi-modalities segmentation

    Authors: Yin Li, Qi Chen, Kai Wang, Meige Li, Liping Si, Yingwei Guo, Yu Xiong, Qixing Wang, Yang Qin, Ling Xu, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Multi-modality magnetic resonance imaging data with various sequences facilitate the early diagnosis, tumor segmentation, and disease staging in the management of nasopharyngeal carcinoma (NPC). The lack of publicly available, comprehensive datasets limits advancements in diagnosis, treatment planning, and the development of machine learning algorithms for NPC. Addressing this critical need, we in…

    Submitted 4 April, 2024; originally announced April 2024.

  39. arXiv:2404.02837  [pdf, other]

    cs.CL

    Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models

    Authors: Wanyun Cui, Qianle Wang

    Abstract: This paper reveals the phenomenon of parameter heterogeneity in large language models (LLMs). We find that a small subset of "cherry" parameters exhibit a disproportionately large influence on model performance, while the vast majority of parameters have minimal impact. This heterogeneity is found to be prevalent across different model families, scales, and types. Motivated by this observation,…

    Submitted 3 April, 2024; originally announced April 2024.

  40. arXiv:2404.02733  [pdf, other]

    cs.CV

    InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

    Authors: Haofan Wang, Matteo Spinelli, Qixun Wang, Xu Bai, Zekui Qin, Anthony Chen

    Abstract: Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization. However, despite this notable progress, current models continue to grapple with several complex challenges in producing style-consistent image generation. Firstly, the concept of style is inherently underdetermined, encompassing a multitude of elements such as color,…

    Submitted 4 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Technical Report

  41. Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM

    Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang

    Abstract: Mobile apps have become indispensable for accessing and participating in various environments, especially for low-vision users. Users with visual impairments can use screen readers to read the content of each screen and understand the content that needs to be operated. Screen readers need to read the hint-text attribute in the text input component to remind visually impaired users what to fill in.…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 CHI Conference on Human Factors in Computing Systems

  42. arXiv:2404.00883  [pdf, other]

    cs.LG

    Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization

    Authors: Jing Li, Quanxue Gao, Cheng Deng, Qianqian Wang, Ming Yang

    Abstract: The clustering method based on the anchor graph has gained significant attention due to its exceptional clustering performance and ability to process large-scale data. One common approach is to learn bipartite graphs with K-connected components, helping avoid the need for post-processing. However, this method has strict parameter requirements and may not always get K-connected components. To addre…

    Submitted 31 March, 2024; originally announced April 2024.

  43. arXiv:2404.00226  [pdf, other]

    cs.CV cs.CL

    Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training

    Authors: Tongkun Su, Jun Li, Xi Zhang, Haibo Jin, Hao Chen, Qiong Wang, Faqin Lv, Baoliang Zhao, Yin Hu

    Abstract: Multimodal pre-training demonstrates its potential in the medical domain, which learns medical visual representations from paired medical reports. However, many pre-training tasks require extra annotations from clinicians, and most of them fail to explicitly guide the model to learn the desired features of different pathologies. To the best of our knowledge, we are the first to utilize Visual Ques…

    Submitted 8 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

  44. arXiv:2403.20300  [pdf, other]

    cs.MA cs.AI cs.RO

    Improving Learnt Local MAPF Policies with Heuristic Search

    Authors: Rishi Veerapaneni, Qian Wang, Kevin Ren, Arthur Jakobsson, Jiaoyang Li, Maxim Likhachev

    Abstract: Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents but are typically centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted in ICAPS 2024

  45. arXiv:2403.20163  [pdf, other]

    cs.NE q-bio.NC

    Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning

    Authors: Duzhen Zhang, Qingyu Wang, Tielin Zhang, Bo Xu

    Abstract: The success of Deep Reinforcement Learning (DRL) is largely attributed to utilizing Artificial Neural Networks (ANNs) as function approximators. Recent advances in neuroscience have unveiled that the human brain achieves efficient reward-based learning, at least by integrating spiking neurons with spatial-temporal dynamics and network topologies with biologically-plausible connectivity patterns. T…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Work in Progress

  46. arXiv:2403.19833  [pdf, other]

    cs.NI cs.AI

    ChatTracer: Large Language Model Powered Real-time Bluetooth Device Tracking System

    Authors: Qijun Wang, Shichen Zhang, Kunzhe Song, Huacheng Zeng

    Abstract: Large language models (LLMs), exemplified by OpenAI ChatGPT and Google Bard, have transformed the way we interact with cyber technologies. In this paper, we study the possibility of connecting LLM with wireless sensor networks (WSN). A successful design will not only extend LLM's knowledge landscape to the physical world but also revolutionize human interaction with WSN. To this end, we present Cha…

    Submitted 28 March, 2024; originally announced March 2024.

  47. arXiv:2403.19531  [pdf, other]

    cs.CR cs.DB cs.SI

    SecGraph: Towards SGX-based Efficient and Confidentiality-Preserving Graph Search

    Authors: Qiuhao Wang, Xu Yang, Saiyu Qi, Yong Qi

    Abstract: Graphs have more expressive power and are widely researched in various search demand scenarios, compared with traditional relational and XML models. Today, many graph search services have been deployed on a third-party server, which can alleviate users from the burdens of maintaining large-scale graphs and huge computation costs. Nevertheless, outsourcing graph search services to the third-party s…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted by DASFAA 2024

  48. arXiv:2403.19235  [pdf, other]

    cs.CV

    DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

    Authors: Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, Jingdong Wang

    Abstract: While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with a nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context. Existing personalization methods either require time-consuming optimization or learning additional encoders, ad…

    Submitted 28 March, 2024; originally announced March 2024.

  49. arXiv:2403.18228  [pdf, other]

    cs.CV cs.LG cs.NE

    Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification

    Authors: Qingyu Wang, Duzhen Zhang, Tielin Zhang, Bo Xu

    Abstract: Energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) and artificial Transformer, whereby the Spiking Self-Attention (SSA) is used to achieve both higher accuracy and lower computational cost. However, it seems that self-attention is not always necessary, especially in sparse spike-form calculation manners. In this paper, we innovative…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 18 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2308.02557

  50. arXiv:2403.18193  [pdf, other]

    cs.CV

    Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking

    Authors: Qiming Wang, Yongqiang Bai, Hongxing Song

    Abstract: RGB-T tracking, a vital downstream task of object tracking, has made remarkable progress in recent years. Yet, it remains hindered by two major challenges: 1) the trade-off between performance and efficiency; 2) the scarcity of training data. To address the latter challenge, some recent methods employ prompts to fine-tune pre-trained RGB tracking models and leverage upstream knowledge in a paramet…

    Submitted 26 March, 2024; originally announced March 2024.