Skip to main content

Showing 1–50 of 1,135 results for author: He, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.17987  [pdf, other

    cs.CR

    BlueSWAT: A Lightweight State-Aware Security Framework for Bluetooth Low Energy

    Authors: Xijia Che, Yi He, Xuewei Feng, Kun Sun, Ke Xu, Qi Li

    Abstract: Bluetooth Low Energy (BLE) is a short-range wireless communication technology for resource-constrained IoT devices. Unfortunately, BLE is vulnerable to session-based attacks, where previous packets construct exploitable conditions for subsequent packets to compromise connections. Defending against session-based attacks is challenging because each step in the attack sequence is legitimate when insp… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.17468  [pdf, other

    cs.LG cs.AI

    Deep Activity Model: A Generative Approach for Human Mobility Pattern Synthesis

    Authors: Xishun Liao, Brian Yueshuai He, Qinhua Jiang, Chenchen Kuai, Jiaqi Ma

    Abstract: Human mobility significantly impacts various aspects of society, including transportation, urban planning, and public health. The increasing availability of diverse mobility data and advancements in deep learning have revolutionized mobility modeling. Existing deep learning models, however, mainly study spatio-temporal patterns using trajectories and often fall short in capturing the underlying se… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  3. arXiv:2405.17404  [pdf, other

    cs.LG cs.AI stat.ML

    Spectral Greedy Coresets for Graph Neural Networks

    Authors: Mucong Ding, Yinhan He, Jundong Li, Furong Huang

    Abstract: The ubiquity of large-scale graphs in node-classification tasks significantly hinders the real-world applications of Graph Neural Networks (GNNs). Node sampling, graph coarsening, and dataset condensation are effective strategies for enhancing data efficiency. However, owing to the interdependence of graph nodes, coreset selection, which selects subsets of the data examples, has not been successfu… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.17042  [pdf, other

    cs.LG cs.CR

    LabObf: A Label Protection Scheme for Vertical Federated Learning Through Label Obfuscation

    Authors: Ying He, Mingyang Niu, Jingyu Hua, Yunlong Mao, Xu Huang, Chen Li, Sheng Zhong

    Abstract: Split learning, as one of the most common architectures in vertical federated learning, has gained widespread use in industry due to its privacy-preserving characteristics. In this architecture, the party holding the labels seeks cooperation from other parties to improve model performance due to insufficient feature data. Each of these participants has a self-defined bottom model to learn hidden r… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2405.16234  [pdf, other

    cs.CV

    Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

    Authors: Shiyu Xia, Junyu Xiong, Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Mengyu Zhou, Yeye He, Shi Han, Dongmei Zhang

    Abstract: This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatial perception, and visual format recognition. Additionally, we utilize the spreadsheet table detection task to assess the overall performance of VLMs b… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  6. arXiv:2405.15158  [pdf, other

    q-bio.BM cs.LG

    ProtFAD: Introducing function-aware domains as implicit modality towards protein function perception

    Authors: Mingqing Wang, Zhiwei Nie, Yonghong He, Zhixiang Ren

    Abstract: Protein function prediction is currently achieved by encoding its sequence or structure, where the sequence-to-function transcendence and high-quality structural data scarcity lead to obvious performance bottlenecks. Protein domains are "building blocks" of proteins that are functionally independent, and their combinations determine the diverse biological functions. However, most existing studies… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 16 pages, 6 figures, 5 tables

  7. arXiv:2405.14751  [pdf, other

    cs.LG

    AGILE: A Novel Framework of LLM Agents

    Authors: Peiyuan Feng, Yichen He, Guanhua Huang, Yuan Lin, Hanchong Zhang, Yuchen Zhang, Hang Li

    Abstract: We introduce a novel framework of LLM agents named AGILE (AGent that Interacts and Learns from Environments) designed to perform complex conversational tasks with users, leveraging LLMs, memory, tools, and interactions with experts. The agent's abilities include not only conversation but also reflection, utilization of tools, and consultation with experts. We formulate the construction of such an… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.14633  [pdf, other

    cs.CV cs.CG

    Flatten Anything: Unsupervised Neural Surface Parameterization

    Authors: Qijian Zhang, Junhui Hou, Wenping Wang, Ying He

    Abstract: Surface parameterization plays an essential role in numerous computer graphics and geometry processing applications. Traditional parameterization approaches are designed for high-quality meshes laboriously created by specialized 3D modelers, thus unable to meet the processing demand for the current explosion of ordinary 3D data. Moreover, their working mechanisms are typically restricted to certai… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  9. arXiv:2405.14516  [pdf, other

    cs.CV cs.AI

    Towards Realistic Long-tailed Semi-supervised Learning in an Open World

    Authors: Yuanpeng He, Lijian Li

    Abstract: Open-world long-tailed semi-supervised learning (OLSSL) has increasingly attracted attention. However, existing OLSSL algorithms generally assume that the distributions between known and novel categories are nearly identical. Against this backdrop, we construct a more \emph{Realistic Open-world Long-tailed Semi-supervised Learning} (\textbf{ROLSSL}) setting where there is no premise on the distrib… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  10. arXiv:2405.14366  [pdf, other

    cs.CL cs.AI cs.LG

    MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

    Authors: Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang

    Abstract: A critical approach for efficiently deploying computationally demanding large language models (LLMs) is Key-Value (KV) caching. The KV cache stores key-value states of previously generated tokens, significantly reducing the need for repetitive computations and thereby lowering latency in autoregressive generation. However, the size of the KV cache grows linearly with sequence length, posing challe… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Tech report

  11. arXiv:2405.14256  [pdf, other

    cs.LG cs.AI

    ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

    Authors: Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang

    Abstract: KV cache stores key and value states from previous tokens to avoid re-computation, yet it demands substantial storage space, especially for long sequences. Adaptive KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing those of less importance. However, previous methods of this approach exhibit significant performance degradation… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 15 pages

  12. arXiv:2405.14195  [pdf, other

    cs.CV cs.AI

    Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning

    Authors: Zhenyu Wei, Yujie He, Zhanchuan Cai

    Abstract: RGB-D tracking significantly improves the accuracy of object tracking. However, its dependency on real depth inputs and the complexity involved in multi-modal fusion limit its applicability across various scenarios. The utilization of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  13. arXiv:2405.13967  [pdf, other

    cs.CL

    DeTox: Toxic Subspace Projection for Model Editing

    Authors: Rheeya Uppaal, Apratim Dey, Yiting He, Yiqiao Zhong, Junjie Hu

    Abstract: Recent alignment algorithms such as direct preference optimization (DPO) have been developed to improve the safety of large language models (LLMs) by training these models to match human behaviors exemplified by preference data. However, these methods are both computationally intensive and lacking in controllability and transparency, making them prone to jailbreaking and inhibiting their widesprea… ▽ More

    Submitted 25 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Preprint

  14. arXiv:2405.13873  [pdf, other

    cs.AI cs.CL

    FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering

    Authors: Yuan Sui, Yufei He, Nian Liu, Xiaoxin He, Kun Wang, Bryan Hooi

    Abstract: While large language models (LLMs) have achieved significant success in various applications, they often struggle with hallucinations, especially in scenarios that require deep and responsible reasoning. These issues could be partially mitigate by integrating external knowledge graphs (KG) in LLM reasoning. However, the method of their incorporation is still largely unexplored. In this paper, we p… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  15. arXiv:2405.13839  [pdf, other

    cs.GR

    Diffusing Winding Gradients (DWG): A Parallel and Scalable Method for 3D Reconstruction from Unoriented Point Clouds

    Authors: Weizhou Liu, Jiaze Li, Xuhui Chen, Fei Hou, Shiqing Xin, Xingce Wang, Zhongke Wu, Chen Qian, Ying He

    Abstract: This paper presents a method for reconstructing watertight 3D surfaces from unoriented point clouds. Starting with randomly initialized normals, the method iteratively refines each normal by diffusing the gradient of the generalized winding number (GWN) field. Upon convergence, the target surface is extracted using the standard Marching Cubes algorithm. Our method is conceptually simple, easy to i… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  16. arXiv:2405.12487  [pdf, other

    cs.CV eess.IV

    3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification

    Authors: Yan He, Bing Tu, Bo Liu, Jun Li, Antonio Plaza

    Abstract: Hyperspectral image (HSI) classification constitutes the fundamental research in remote sensing fields. Convolutional Neural Networks (CNNs) and Transformers have demonstrated impressive capability in capturing spectral-spatial contextual dependencies. However, these architectures suffer from limited receptive fields and quadratic computational complexity, respectively. Fortunately, recent Mamba a… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  17. arXiv:2405.11811  [pdf, other

    cs.LG cs.DC

    FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning

    Authors: Liuzhi Zhou, Yu He, Kun Zhai, Xiang Liu, Sen Liu, Xingjun Ma, Guangnan Ye, Yu-Gang Jiang, Hongfeng Chai

    Abstract: Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients while preserving data privacy. However, the quest to balance acceleration and stability becomes a significant challenge in FL, especially on the client-side. In this paper, we introduce FedCAda, an innovative federated client adaptive algorithm designed to ta… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  18. arXiv:2405.11715  [pdf, other

    cs.AI cs.LG

    Semantic Trajectory Data Mining with LLM-Informed POI Classification

    Authors: Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma

    Abstract: Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly en… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  19. arXiv:2405.11034  [pdf, other

    cs.LG

    Safety in Graph Machine Learning: Threats and Safeguards

    Authors: Song Wang, Yushun Dong, Binchi Zhang, Zihan Chen, Xingbo Fu, Yinhan He, Cong Shen, Chuxu Zhang, Nitesh V. Chawla, Jundong Li

    Abstract: Graph Machine Learning (Graph ML) has witnessed substantial advancements in recent years. With their remarkable ability to process graph-structured data, Graph ML techniques have been extensively utilized across diverse applications, including critical domains like finance, healthcare, and transportation. Despite their societal benefits, recent research highlights significant safety concerns assoc… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 20 pages

  20. arXiv:2405.06510  [pdf, other

    cs.AI

    UniDM: A Unified Framework for Data Manipulation with Large Language Models

    Authors: Yichen Qian, Yongyi He, Rong Zhu, Jintao Huang, Zhijian Ma, Haibin Wang, Yaohua Wang, Xiuyu Sun, Defu Lian, Bolin Ding, Jingren Zhou

    Abstract: Designing effective data manipulation methods is a long standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human efforts on training data collection and tuning models. Recent methods apply Large Language Models (LLMs) to resolve multiple data manipulation tasks. They exhibit bright benefits in terms of performance but still requir… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: MLSys24

  21. arXiv:2405.06419  [pdf, other

    cs.LG cs.AI cs.NE

    Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting

    Authors: Tianxiang Zhan, Yuanpeng He, Zhen Li, Yong Deng

    Abstract: In real-world scenarios, time series forecasting often demands timeliness, making research on model backbones a perennially hot topic. To meet these performance demands, we propose a novel backbone from the perspective of information fusion. Introducing the Basic Probability Assignment (BPA) Module and the Time Evidence Fusion Network (TEFN), based on evidence theory, allows us to achieve superior… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

  22. arXiv:2405.06389  [pdf, other

    cs.CV cs.AI

    Continual Novel Class Discovery via Feature Enhancement and Adaptation

    Authors: Yifan Yu, Shaokun Wang, Yuhang He, Junzhe Chen, Yihong Gong

    Abstract: Continual Novel Class Discovery (CNCD) aims to continually discover novel classes without labels while maintaining the recognition capability for previously learned classes. The main challenges faced by CNCD include the feature-discrepancy problem, the inter-session confusion problem, etc. In this paper, we propose a novel Feature Enhancement and Adaptation method for the CNCD to tackle the above… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

  23. arXiv:2405.04825  [pdf, other

    cs.CR cs.AI cs.LG

    Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution

    Authors: Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, Kui Ren

    Abstract: Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties `inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge me… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  24. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 24 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  25. arXiv:2405.04095  [pdf, other

    cs.CR cs.AI

    Going Proactive and Explanatory Against Malware Concept Drift

    Authors: Yiling He, Junchi Lei, Zhan Qin, Kui Ren

    Abstract: Deep learning-based malware classifiers face significant challenges due to concept drift. The rapid evolution of malware, especially with new families, can depress classification accuracy to near-random levels. Previous research has primarily focused on detecting drift samples, relying on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive under… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  26. Swipe2Pair: Secure and Fast In-Band Wireless Device Pairing

    Authors: Yaqi He, Kai Zeng, Long Jiao, Brian L. Mark, Khaled N. Khasawneh

    Abstract: Wireless device pairing is a critical security mechanism to bootstrap the secure communication between two devices without a pre-shared secret. It has been widely used in many Internet of Things (IoT) applications, such as smart-home and smart-health. Most existing device pairing mechanisms are based on out-of-band channels, e.g., extra sensors or hardware, to validate the proximity of pairing dev… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  27. arXiv:2405.01662  [pdf, other

    cs.CV

    Out-of-distribution detection based on subspace projection of high-dimensional features output by the last convolutional layer

    Authors: Qiuyu Zhu, Yiwei He

    Abstract: Out-of-distribution (OOD) detection, crucial for reliable pattern classification, discerns whether a sample originates outside the training distribution. This paper concentrates on the high-dimensional features output by the final convolutional layer, which contain rich image features. Our key idea is to project these high-dimensional features into two specific feature subspaces, leveraging the di… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures

  28. arXiv:2405.01328  [pdf

    cs.RO

    An Advanced Framework for Ultra-Realistic Simulation and Digital Twinning for Autonomous Vehicles

    Authors: Yuankai He, Hanlin Chen, Weisong Shi

    Abstract: Simulation is a fundamental tool in developing autonomous vehicles, enabling rigorous testing without the logistical and safety challenges associated with real-world trials. As autonomous vehicle technologies evolve and public safety demands increase, advanced, realistic simulation frameworks are critical. Current testing paradigms employ a mix of general-purpose and specialized simulators, such a… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 6 Pages. 5 Figures, 1 Table

    MSC Class: G.3

  29. arXiv:2405.00705  [pdf, other

    cs.CL cs.LG

    SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning

    Authors: Yexiao He, Ziyao Wang, Zheyu Shen, Guoheng Sun, Yucong Dai, Yongkai Wu, Hongyi Wang, Ang Li

    Abstract: The pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning. Recent studies have discovered that LLMs can achieve desirable performance with only a small amount of high-quality data, suggesting that a large amount of the data in these extensive datasets is redundant or even harmful. Identifying high-quality… ▽ More

    Submitted 23 April, 2024; originally announced May 2024.

  30. arXiv:2405.00577  [pdf

    cs.LG eess.SP q-bio.NC

    Discovering robust biomarkers of neurological disorders from functional MRI using graph neural networks: A Review

    Authors: Yi Hao Chan, Deepank Girish, Sukrit Gupta, Jing Xia, Chockalingam Kasi, Yinan He, Conghao Wang, Jagath C. Rajapakse

    Abstract: Graph neural networks (GNN) have emerged as a popular tool for modelling functional magnetic resonance imaging (fMRI) datasets. Many recent studies have reported significant improvements in disorder classification performance via more sophisticated GNN designs and highlighted salient features that could be potential biomarkers of the disorder. In this review, we provide an overview of how GNN and… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  31. arXiv:2405.00526  [pdf, other

    cs.CR

    JNI Global References Are Still Vulnerable: Attacks and Defenses

    Authors: Yi He, Yuan Zhou, Yacong Gu, Purui Su, Qi Li, Yajin Zhou, Yong Jiang

    Abstract: System services and resources in Android are accessed through IPC based mechanisms. Previous research has demonstrated that they are vulnerable to the denial-of-service attack (DoS attack). For instance, the JNI global reference (JGR), which is widely used by system services, can be exhausted to cause the system reboot (hence the name JGRE attack). Even though the Android team tries to fix the pro… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  32. arXiv:2405.00197  [pdf

    cs.AI cs.DB cs.IR

    Grounding Realizable Entities

    Authors: Michael Rabenberg, Carter Benson, Federico Donato, Yongqun He, Anthony Huffman, Shane Babcock, John Beverley

    Abstract: Ontological representations of qualities, dispositions, and roles have been refined over the past decade, clarifying subtle distinctions in life science research. After articulating a widely-used characterization of these entities within the context of Basic Formal Ontology (BFO), we identify gaps in this treatment and motivate the need for supplementing the BFO characterization. By way of supplem… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 13

  33. arXiv:2405.00186  [pdf

    cs.AI cs.DB cs.IR

    Credentials in the Occupation Ontology

    Authors: John Beverley, Robin McGill, Sam Smith, Jie Zheng, Giacomo De Colle, Finn Wilson, Matthew Diller, William D. Duncan, William R. Hogan, Yongqun He

    Abstract: The term credential encompasses educational certificates, degrees, certifications, and government-issued licenses. An occupational credential is a verification of an individuals qualification or competence issued by a third party with relevant authority. Job seekers often leverage such credentials as evidence that desired qualifications are satisfied by their holders. Many U.S. education and workf… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 11

  34. arXiv:2404.19534  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang , et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  35. arXiv:2404.18919  [pdf, other

    cs.CV

    TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

    Authors: Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: Recent advances in diffusion models can generate high-quality and stunning images from text. However, multi-turn image generation, which is of high demand in real-world scenarios, still faces challenges in maintaining semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  36. arXiv:2404.17662  [pdf, other

    cs.CL

    PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games

    Authors: Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He

    Abstract: Recent advancements in Large Language Models (LLMs) have enhanced the efficacy of agent communication and social interactions. Despite these advancements, building LLM-based agents for reasoning in dynamic environments involving competition and collaboration remains challenging due to the limitations of informed graph-based search methods. We propose PLAYER*, a novel framework based on an anytime… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  37. arXiv:2404.17310  [pdf, other

    cs.CV

    Image Copy-Move Forgery Detection via Deep PatchMatch and Pairwise Ranking Learning

    Authors: Yuanman Li, Yingjie He, Changsheng Chen, Li Dong, Bin Li, Jiantao Zhou, Xia Li

    Abstract: Recent advances in deep learning algorithms have shown impressive progress in image copy-move forgery detection (CMFD). However, these algorithms lack generalizability in practical scenarios where the copied regions are not present in the training images, or the cloned regions are part of the background. Additionally, these algorithms utilize convolution operations to distinguish source and target… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 16 pages, 14figures

  38. arXiv:2404.16067  [pdf, other

    cs.HC cs.AI

    Layout2Rendering: AI-aided Greenspace design

    Authors: Ran Chen, Zeke Lian, Yueheng He, Xiao Ling, Fuyu Yang, Xueqi Yao, Xingjian Yi, Jing Zhao

    Abstract: In traditional human living environment landscape design, the establishment of three-dimensional models is an essential step for designers to intuitively present the spatial relationships of design elements, as well as a foundation for conducting landscape analysis on the site. Rapidly and effectively generating beautiful and realistic landscape spaces is a significant challenge faced by designers… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 14 pages,8 figures

  39. arXiv:2404.15704  [pdf, other

    cs.LG cs.AI cs.SD eess.AS

    Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

    Authors: Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

    Abstract: Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in learned representations may limits improvements. To this end, we propose an adversarial comp… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  40. arXiv:2404.13892  [pdf, other

    cs.SD cs.AI eess.AS

    Retrieval-Augmented Audio Deepfake Detection

    Authors: Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

    Abstract: With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) detection methods rely solely on the fuzzy knowledge learned by a single model, resulting in performance bottlenecks and transparency issues. Inspired… ▽ More

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Conference on Multimedia Retrieval (ICMR 2024)

  41. arXiv:2404.13786  [pdf, other

    eess.SY cs.AI cs.DC cs.LG

    Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

    Authors: Shuyao Shi, Neiwen Ling, Zhehao Jiang, Xuan Huang, Yuze He, Xiaoguang Zhao, Bufang Yang, Chen Bian, Jingfei Xia, Zhenyu Yan, Raymond Yeung, Guoliang Xing

    Abstract: Recently,smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components ca… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  42. arXiv:2404.13648  [pdf, other

    cs.CV cs.LG

    Data-independent Module-aware Pruning for Hierarchical Vision Transformers

    Authors: Yang He, Joey Tianyi Zhou

    Abstract: Hierarchical vision transformers (ViTs) have two advantages over conventional ViTs. First, hierarchical ViTs achieve linear computational complexity with respect to image size by local self-attention. Second, hierarchical ViTs create hierarchical feature maps by merging image patches in deeper layers for dense prediction. However, existing pruning methods ignore the unique properties of hierarchic… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by ICLR 2024

  43. arXiv:2404.13619  [pdf, other

    cs.MM

    Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering

    Authors: Ben Fei, Yixuan Li, Weidong Yang, Lipeng Ma, Ying He

    Abstract: State-of-the-art 3D models, which excel in recognition tasks, typically depend on large-scale datasets and well-defined category sets. Recent advances in multi-modal pre-training have demonstrated potential in learning 3D representations by aligning features from 3D shapes with their 2D RGB or depth counterparts. However, these existing frameworks often rely solely on either RGB or depth images, l… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  44. arXiv:2404.13576  [pdf, other

    cs.CV cs.LG

    I2CANSAY:Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning

    Authors: Songlin Dong, Yingjie Chen, Yuhang He, Yuhan Jin, Alex C. Kot, Yihong Gong

    Abstract: Online task-free continual learning (OTFCL) is a more challenging variant of continual learning which emphasizes the gradual shift of task boundaries and learns in an online mode. Existing methods rely on a memory buffer composed of old samples to prevent forgetting. However,the use of memory buffers not only raises privacy concerns but also hinders the efficient learning of new samples. To addres… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  45. arXiv:2404.12659  [pdf, ps, other

    cs.CL

    SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

    Authors: Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

    Abstract: In the social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  46. arXiv:2404.12608  [pdf, other

    cs.DB cs.CL cs.PL

    Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

    Authors: Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

    Abstract: Spreadsheets are widely recognized as the most popular end-user programming tools, which blend the power of formula-based computation, with an intuitive table-based interface. Today, spreadsheets are used by billions of users to manipulate tables, most of whom are neither database experts nor professional programmers. Despite the success of spreadsheets, authoring complex formulas remains challe… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: full version of a paper to appear in SIGMOD 2024

  47. arXiv:2404.12273  [pdf, other

    cs.AI cs.CL cs.LG

    FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom

    Authors: Yuanqin He, Yan Kang, Lixin Fan, Qiang Yang

    Abstract: Federated Learning (FL) has emerged as a promising solution for collaborative training of large language models (LLMs). However, the integration of LLMs into FL introduces new challenges, particularly concerning the evaluation of LLMs. Traditional evaluation methods that rely on labeled test sets and similarity-based metrics cover only a subset of the acceptable answers, thereby failing to accurat… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: In Progress

  48. arXiv:2404.12041  [pdf, other

    cs.CL cs.AI

    Can We Catch the Elephant? The Evolvement of Hallucination Evaluation on Natural Language Generation: A Survey

    Authors: Siya Qi, Yulan He, Zheng Yuan

    Abstract: Hallucination in Natural Language Generation (NLG) is like the elephant in the room, obvious but often overlooked until recent achievements significantly improved the fluency and grammatical accuracy of generated text. For Large Language Models (LLMs), hallucinations can happen in various downstream tasks and casual conversations, which need accurate assessment to enhance reliability and safety. H… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 19 pages in total, with 9 pages as main body. Under review as a conference paper at CoLM 2024

  49. arXiv:2404.10498  [pdf, other

    cs.AI cs.CV cs.DC

    LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System

    Authors: Shijing Hu, Ruijun Deng, Xin Du, Zhihui Lu, Qiang Duan, Yi He, Shih-Chia Huang, Jie Wu

    Abstract: Recent large vision models (e.g., SAM) enjoy great potential to facilitate intelligent perception with high accuracy. Yet, the resource constraints in the IoT environment tend to limit such large vision models to be locally deployed, incurring considerable inference latency thereby making it difficult to support real-time applications, such as autonomous driving and robotics. Edge-cloud collaborat… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  50. arXiv:2404.10318  [pdf, other

    cs.CV

    SRGS: Super-Resolution 3D Gaussian Splatting

    Authors: Xiang Feng, Yongbo He, Yubo Wang, Yan Yang, Zhenzhong Kuang, Yu Jun, Jianping Fan, Jiajun ding

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has gained popularity as a novel explicit 3D representation. This approach relies on the representation power of Gaussian primitives to provide a high-quality rendering. However, primitives optimized at low resolution inevitably exhibit sparsity and texture deficiency, posing a challenge for achieving high-resolution novel view synthesis (HRNVS). To address t… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: submit ACM MM 2024