Showing 1–50 of 3,609 results for author: Liu, Z

Searching in archive cs.
  1. arXiv:2405.05665  [pdf, other]

    cs.LG q-bio.QM

    SubGDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning

    Authors: Jiying Zhang, Zijing Liu, Yu Wang, Yu Li

    Abstract: Molecular representation learning has shown great success in advancing AI-based drug discovery. The core of many recent works is based on the fact that the 3D geometric structure of molecules provides essential information about their physical and chemical characteristics. Recently, denoising diffusion probabilistic models have achieved impressive performance in 3D molecular representation learnin…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 31 pages

  2. arXiv:2405.05474  [pdf]

    cs.HC

    (Dis)placed Contributions: Uncovering Hidden Hurdles to Collaborative Writing Involving Non-Native Speakers, Native Speakers, and AI-Powered Editing Tools

    Authors: Yimin Xiao, Yuewen Chen, Naomi Yamashita, Yuexi Chen, Zhicheng Liu, Ge Gao

    Abstract: Content creation today often takes place via collaborative writing. A longstanding interest of CSCW research lies in understanding and promoting the coordination between co-writers. However, little attention has been paid to individuals who write in their non-native language and to co-writer groups involving them. We present a mixed-method study that fills the above gap. Our participants included…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.05258  [pdf, other]

    cs.CV cs.LG cs.RO

    Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

    Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

    Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the effi…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 6 figures, 8 tables; Code at https://github.com/ldkong1205/LaserMix

  4. arXiv:2405.05027  [pdf, other]

    cs.CV cs.AI

    StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer

    Authors: Zijia Wang, Zhi-Song Liu

    Abstract: We present StyleMamba, an efficient image style transfer framework that translates text prompts into corresponding visual styles while preserving the content integrity of the original images. Existing text-guided stylization requires hundreds of training iterations and takes a lot of computing resources. To speed up the process, we propose a conditional State Space Model for Efficient Text-driven…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Blind submission to ECAI 2024

  5. arXiv:2405.04840  [pdf, other]

    cs.IR

    Federated Adaptation for Foundation Model-based Recommendations

    Authors: Chunxu Zhang, Guodong Long, Hongkuan Guo, Xiao Fang, Yang Song, Zhaojie Liu, Guorui Zhou, Zijian Zhang, Yang Liu, Bo Yang

    Abstract: With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted as a regular paper of IJCAI'24

  6. Teacher-Student Network for Real-World Face Super-Resolution with Progressive Embedding of Edge Information

    Authors: Zhilei Liu, Chenggong Zhang

    Abstract: Traditional face super-resolution (FSR) methods trained on synthetic datasets usually have poor generalization ability for real-world face images. Recent work has utilized complex degradation models or training networks to simulate the real degradation process, but this limits the performance of these methods due to the domain differences that still exist between the generated low-resolution image…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICIP 2023

  7. arXiv:2405.04484  [pdf, other]

    cs.LG physics.comp-ph

    OptPDE: Discovering Novel Integrable Systems via AI-Human Collaboration

    Authors: Subhash Kantamneni, Ziming Liu, Max Tegmark

    Abstract: Integrable partial differential equation (PDE) systems are of great interest in natural science, but are exceedingly rare and difficult to discover. To solve this, we introduce OptPDE, a first-of-its-kind machine learning approach that Optimizes PDEs' coefficients to maximize their number of conserved quantities, $n_{\rm CQ}$, and thus discover new integrable systems. We discover four families of…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.04295  [pdf, other]

    eess.IV cs.CV

    Semi-Supervised Disease Classification based on Limited Medical Image Data

    Authors: Yan Zhang, Chun Li, Zhaoxia Liu, Ming Li

    Abstract: In recent years, significant progress has been made in the field of learning from positive and unlabeled examples (PU learning), particularly in the context of advancing image and text classification tasks. However, applying PU learning to semi-supervised disease classification remains a formidable challenge, primarily due to the limited availability of labeled medical images. In the realm of medi…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2405.04219  [pdf, other]

    cs.CL cs.AI cs.MA cs.SE

    Iterative Experience Refinement of Software-Developing Agents

    Authors: Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun

    Abstract: Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks it…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Work in progress

  10. arXiv:2405.03988  [pdf, other]

    cs.IR cs.AI

    Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

    Authors: Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai

    Abstract: Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures

  11. arXiv:2405.03974  [pdf, other]

    cs.CR cs.AI cs.LG

    TBNet: A Neural Architectural Defense Framework Facilitating DNN Model Protection in Trusted Execution Environments

    Authors: Ziyu Liu, Tong Zhou, Yukui Luo, Xiaolin Xu

    Abstract: Trusted Execution Environments (TEEs) have become a promising solution to secure DNN models on edge devices. However, the existing solutions either provide inadequate protection or introduce large performance overhead. Taking both security and performance into consideration, this paper presents TBNet, a TEE-based defense framework that protects DNN models from a neural architectural perspective. Sp…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: DAC2024

  12. arXiv:2405.03727  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    Large Language Models Synergize with Automated Machine Learning

    Authors: Jinglue Xu, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Hitoshi Iba

    Abstract: Recently, code generation driven by large language models (LLMs) has become increasingly popular. However, automatically generating code for machine learning (ML) tasks still poses significant challenges. This paper explores the limits of program synthesis for ML by combining LLMs and automated machine learning (autoML). Specifically, our goal is to fully automate the code generation process for t…

    Submitted 6 May, 2024; originally announced May 2024.

  13. arXiv:2405.03409  [pdf, other]

    cs.LG

    LightTR: A Lightweight Framework for Federated Trajectory Recovery

    Authors: Ziqiao Liu, Hao Miao, Yan Zhao, Chenxi Liu, Kai Zheng, Huan Li

    Abstract: With the proliferation of GPS-equipped edge devices, huge trajectory data is generated and accumulated in various domains, motivating a variety of urban applications. Due to the limited acquisition capabilities of edge devices, a lot of trajectories are recorded at a low sampling rate, which may lead to the effectiveness drop of urban applications. We aim to recover a high-sampled trajectory based…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: The paper was accepted by ICDE 2024

  14. arXiv:2405.03272  [pdf, other]

    cs.CV

    WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

    Authors: Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu

    Abstract: Multimodal information, together with our knowledge, helps us to understand the complex and dynamic world. Large language models (LLM) and large multimodal models (LMM), however, still struggle to emulate this capability. In this paper, we present WorldQA, a video understanding dataset designed to push the boundaries of multimodal world models with three appealing properties: (1) Multimodal Inputs:…

    Submitted 6 May, 2024; originally announced May 2024.

  15. arXiv:2405.03138  [pdf, other]

    cs.CL

    CRAFT: Extracting and Tuning Cultural Instructions from the Wild

    Authors: Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy F. Chen

    Abstract: Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meanwhile, there is a significant need to enhance these models' cultural reasoning capabilities, especially concerning underrepresented regions. This paper introd…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages

  16. arXiv:2405.03003  [pdf, other]

    cs.LG cs.AI cs.CL

    Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

    Authors: Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, Jia Li

    Abstract: Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices $A$ and $B$ to represent the weight change, i.e., $\Delta W = BA$. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to fur…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  17. arXiv:2405.02008  [pdf, other]

    cs.CV

    DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model

    Authors: Peijin Jia, Tuopu Wen, Ziang Luo, Mengmeng Yang, Kun Jiang, Zhiquan Lei, Xuewei Tang, Ziyuan Liu, Le Cui, Kehua Sheng, Bo Zhang, Diange Yang

    Abstract: Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited…

    Submitted 3 May, 2024; originally announced May 2024.

  18. arXiv:2405.01538  [pdf, other]

    cs.CV cs.LG cs.RO

    Multi-Space Alignments Towards Universal LiDAR Segmentation

    Authors: Youquan Liu, Lingdong Kong, Xiaoyang Wu, Runnan Chen, Xin Li, Liang Pan, Ziwei Liu, Yuexin Ma

    Abstract: A unified and versatile LiDAR segmentation model with strong robustness and generalizability is desirable for safe autonomous driving perception. This work presents M3Net, a one-of-a-kind framework for fulfilling multi-task, multi-dataset, multi-modality LiDAR segmentation in a universal manner using just a single set of parameters. To better exploit data volume and diversity, we first combine lar…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: CVPR 2024; 33 pages, 14 figures, 14 tables; Code at https://github.com/youquanl/M3Net

  19. arXiv:2405.01373  [pdf, other]

    cs.CV

    ATOM: Attention Mixer for Efficient Dataset Distillation

    Authors: Samir Khaki, Ahmad Sajedi, Kai Wang, Lucy Z. Liu, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

    Abstract: Recent works in dataset distillation seek to minimize training expenses by generating a condensed synthetic dataset that encapsulates the information present in a larger real dataset. These approaches ultimately aim to attain test accuracy levels akin to those achieved by models trained on the entirety of the original dataset. Previous studies in feature and distribution matching have achieved sig…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted for an oral presentation in CVPR-DD 2024

  20. arXiv:2405.01186  [pdf, other]

    cs.LG cs.AI

    Potential Energy based Mixture Model for Noisy Label Learning

    Authors: Zijia Wang, Wenbin Yang, Zhisong Liu, Zhen Jia

    Abstract: Training deep neural networks (DNNs) from noisy labels is an important and challenging task. However, most existing approaches focus on the corrupted labels and ignore the importance of inherent data structure. To bridge the gap between noisy labels and data, inspired by the concept of potential energy in physics, we propose a novel Potential Energy based Mixture Model (PEMM) for noise-labels lear…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  21. arXiv:2405.01175  [pdf, other]

    cs.CV cs.AI

    Uncertainty-aware self-training with expectation maximization basis transformation

    Authors: Zijia Wang, Wenbin Yang, Zhisong Liu, Zhen Jia

    Abstract: Self-training is a powerful approach to deep learning. The key process is to find a pseudo-label for modeling. However, previous self-training algorithms suffer from the over-confidence issue brought by the hard labels, and even some confidence-related regularizers cannot comprehensively catch the uncertainty. Therefore, we propose a new self-training framework to combine uncertainty information of bo…

    Submitted 2 May, 2024; originally announced May 2024.

    Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  22. arXiv:2405.00700  [pdf]

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. Until now, however, artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr…

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  23. arXiv:2405.00557  [pdf, other]

    cs.CL cs.AI

    Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

    Authors: Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

    Abstract: As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge, posing potential risks during deployment. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), or on the self-alignment capacities of LLMs…

    Submitted 1 May, 2024; originally announced May 2024.

  24. arXiv:2405.00515  [pdf, other]

    cs.RO cs.CV

    GAD-Generative Learning for HD Map-Free Autonomous Driving

    Authors: Weijian Sun, Yanbo Jia, Qi Zeng, Zihao Liu, Jiang Liao, Yue Li, Xianfeng Li, Bolin Zhao

    Abstract: Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic progra…

    Submitted 1 May, 2024; originally announced May 2024.

  25. arXiv:2405.00361  [pdf, other]

    cs.CL

    AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts

    Authors: Zefang Liu, Jiahua Luo

    Abstract: We introduce AdaMoLE, a novel method for fine-tuning large language models (LLMs) through an Adaptive Mixture of Low-Rank Adaptation (LoRA) Experts. Moving beyond conventional methods that employ a static top-k strategy for activating experts, AdaMoLE dynamically adjusts the activation threshold using a dedicated threshold network, adaptively responding to the varying complexities of different tas…

    Submitted 1 May, 2024; originally announced May 2024.

  26. arXiv:2404.19756  [pdf, other]

    cs.LG cond-mat.dis-nn cs.AI stat.ML

    KAN: Kolmogorov-Arnold Networks

    Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark

    Abstract: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametriz…

    Submitted 2 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: 48 pages, 20 figures. Codes are available at https://github.com/KindXiaoming/pykan

  27. arXiv:2404.19553  [pdf, other]

    cs.CL

    Extending Llama-3's Context Ten-Fold Overnight

    Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou

    Abstract: We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is super efficient, which takes 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also well preserves the original…

    Submitted 30 April, 2024; originally announced April 2024.

  28. arXiv:2404.19536  [pdf, other]

    cs.LG

    Physics-Informed Machine Learning On Polar Ice: A Survey

    Authors: Zesheng Liu, YoungHyun Koo, Maryam Rahnemoonfar

    Abstract: The mass loss of the polar ice sheets contributes considerably to ongoing sea-level rise and changing ocean circulation, leading to coastal flooding and risking the homes and livelihoods of tens of millions of people globally. To address the complex problem of ice behavior, physical models and data-driven models have been proposed in the literature. Although traditional physical models can guarant…

    Submitted 30 April, 2024; originally announced April 2024.

  29. arXiv:2404.19383  [pdf, other]

    cs.CV

    Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition

    Authors: Zhendong Liu, Haifeng Xia, Tong Guo, Libo Sun, Ming Shao, Siyu Xia

    Abstract: Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCNs-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purpos…

    Submitted 30 April, 2024; originally announced April 2024.

  30. Interest Clock: Time Perception in Real-Time Streaming Recommendation System

    Authors: Yongchun Zhu, Jingwu Chen, Ling Chen, Yitan Li, Feng Zhang, Zuotao Liu

    Abstract: User preferences follow a dynamic pattern over a day, e.g., at 8 am, a user might prefer to read news, while at 8 pm, they might prefer to watch movies. Time modeling aims to enable recommendation systems to perceive time changes to capture users' dynamic preferences over time, which is an important and challenging problem in recommendation systems. Especially, streaming recommendation systems in…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024

  31. arXiv:2404.19326  [pdf, other]

    cs.CV

    LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation

    Authors: Lingyi Hong, Zhongying Liu, Wenchao Chen, Chenzhi Tan, Yuang Feng, Xinyu Zhou, Pinxue Guo, Jinglun Li, Zhaoyu Chen, Shuyong Gao, Wei Zhang, Wenqiang Zhang

    Abstract: Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shelf VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. However, these benchmarks poorly represent practical applications, and the absence of long-term datasets rest…

    Submitted 30 April, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: LVOS V2

  32. arXiv:2404.18911  [pdf, other]

    cs.CL cs.LG

    Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

    Authors: Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang

    Abstract: Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models while maintaining a consistent sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly. Drawing inspiration from early exiting, we propose a novel self-speculative decoding framework \emph{K…

    Submitted 29 April, 2024; originally announced April 2024.

  33. M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework

    Authors: Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai

    Abstract: Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive multi-…

    Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  34. arXiv:2404.18419  [pdf]

    cs.CV cs.AI

    Research on Intelligent Aided Diagnosis System of Medical Image Based on Computer Deep Learning

    Authors: Jiajie Yuan, Linxiao Wu, Yulu Gong, Zhou Yu, Ziang Liu, Shuyao He

    Abstract: This paper combines two architectures, Struts and Hibernate, using DAO (Data Access Object) to store and access data. Then a set of dual-mode humidity medical image library suitable for deep network is established, and a dual-mode medical image assisted diagnosis method based on the image is proposed. Through the test of various feature extraction methods, the optimal operating characteris…

    Submitted 29 April, 2024; originally announced April 2024.

  35. arXiv:2404.18243  [pdf, other]

    cs.CL

    LEGENT: Open Platform for Embodied Agents

    Authors: Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun

    Abstract: Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platfo…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Demo Paper

  36. arXiv:2404.17991  [pdf, other]

    cs.CL

    Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension

    Authors: Lin Ai, Zheng Hui, Zizhou Liu, Julia Hirschberg

    Abstract: Machine Reading Comprehension (MRC) poses a significant challenge in the field of Natural Language Processing (NLP). While mainstream MRC methods predominantly leverage extractive strategies using encoder-only models such as BERT, generative approaches face the issue of out-of-control generation -- a critical problem where answers generated are often incorrect, irrelevant, or unfaithful to the sou…

    Submitted 27 April, 2024; originally announced April 2024.

  37. arXiv:2404.17973  [pdf, other]

    cs.IT eess.SP

    Over-the-Air Fusion of Sparse Spatial Features for Integrated Sensing and Edge AI over Broadband Channels

    Authors: Zhiyan Liu, Qiao Lan, Kaibin Huang

    Abstract: The 6G mobile networks are differentiated from 5G by two new usage scenarios - distributed sensing and edge AI. Their natural integration, termed integrated sensing and edge AI (ISEA), promised to create a platform for enabling environment perception to make intelligent decisions and take real-time actions. A basic operation in ISEA is for a fusion center to acquire and fuse features of spatial se…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE for possible publication

  38. arXiv:2404.17964  [pdf, other]

    cs.SE

    Automating Zero-Shot Patch Porting for Hard Forks

    Authors: Shengyi Pan, You Wang, Zhongxin Liu, Xing Hu, Xin Xia, Shanping Li

    Abstract: Forking is a typical way of code reuse, which provides a simple way for developers to create a variant software (denoted as hard fork) by copying and modifying an existing codebase. Despite the benefits, forking also leads to duplicate efforts in software maintenance. Developers need to port patches across the hard forks to address similar bugs or implement similar features. Due to the divergen…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted by ISSTA 2024

  39. Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving

    Authors: C. Gong, C. Lu, Z. Li, Z. Liu, J. Gong, X. Chen

    Abstract: Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the d…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: IEEE Transactions on Vehicular Technology 2024 Pages 1-14

  40. arXiv:2404.17122  [pdf, other]

    cs.CL cs.AI

    2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion

    Authors: Dongsheng Wang, Xiaoqin Feng, Zeming Liu, Chuan Wang

    Abstract: Named entity recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying entities in sentences into pre-defined types. It plays a crucial role in various research fields, including entity linking, question answering, and online product recommendation. Recent studies have shown that incorporating multilingual and multimodal datasets can enhance t…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 20 pages

  41. arXiv:2404.16829  [pdf, other]

    cs.CV cs.AI cs.CL

    Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

    Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin

    Abstract: Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties. Manual assignment of materials using graphic software is a tedious and time-consuming task. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Project Page: https://sunzey.github.io/Make-it-Real/

  42. arXiv:2404.16587  [pdf, other]

    cs.CL cs.AI

    Understanding Privacy Risks of Embeddings Induced by Large Language Models

    Authors: Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen

    Abstract: Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation. However, such a solution risks compromising privacy, as recent studies experimentally showed that the original text can be partially rec…

    Submitted 25 April, 2024; originally announced April 2024.

  43. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  44. arXiv:2404.16038  [pdf, other]

    cs.CV cs.AI cs.MM

    A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming

    Authors: Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju

    Abstract: This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming. It highlights the innovative use of these technologies in producing highly realistic videos, a significant leap in…

    Submitted 30 January, 2024; originally announced April 2024.

    Comments: 16 pages, 10 figures, 4 tables

  45. MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes

    Authors: Xiaolong Ma, Feng Yan, Lei Yang, Ian Foster, Michael E. Papka, Zhengchun Liu, Rajkumar Kettimuthu

    Abstract: First-come first-serve scheduling can result in a substantial fraction (up to 10%) of transiently idle nodes on supercomputers. Recognizing that such unfilled nodes are well-suited for deep neural network (DNN) training, due to the flexible nature of DNN training tasks, Liu et al. proposed that the re-scaling of DNN training tasks to fit gaps in schedules be formulated as a mixed-integer linear programming (MIL…

    Submitted 24 April, 2024; originally announced April 2024.

  46. arXiv:2404.15589  [pdf, other]

    stat.AP cs.CY physics.data-an

    The impact of complexity in the built environment on vehicular routing behavior: Insights from an empirical study of taxi mobility in Beijing, China

    Authors: Chaogui Kang, Zheren Liu

    Abstract: The modeling of disaggregated vehicular mobility and its associations with the ambient urban built environment is essential for developing operative transport intervention and urban optimization plans. However, established vehicular route choice models failed to fully consider the bounded behavioral rationality and the complex characteristics of the urban built environment affecting drivers' route…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 45 pages, 11 figures, 5 tables

  47. arXiv:2404.14897  [pdf, other]

    cs.CL cs.AI

    Beyond the Speculative Game: A Survey of Speculative Execution in Large Language Models

    Authors: Chen Zhang, Zhuorui Liu, Dawei Song

    Abstract: With the increasingly giant scales of (causal) large language models (LLMs), inference efficiency has become one of the core concerns alongside improved performance. In contrast to the memory footprint, the latency bottleneck seems to be of greater importance as there can be billions of requests to an LLM (e.g., GPT-4) per day. The bottleneck is mainly due to the autoregressive innateness of LLMs…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, 1 table, rejected from IJCAI 2024, revision in progress

  48. arXiv:2404.14693  [pdf, other]

    cs.CR cs.CV eess.IV

    Double Privacy Guard: Robust Traceable Adversarial Watermarking against Face Recognition

    Authors: Yunming Zhang, Dengpan Ye, Sipeng Shen, Caiyun Xie, Ziyi Liu, Jiacheng Deng, Long Tang

    Abstract: The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere with the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a…

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.14581  [pdf, other]

    cs.CV cs.AI cs.CR

    The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking

    Authors: Yuying Li, Zeyan Liu, Junyi Zhao, Liangqin Ren, Fengjun Li, Jiebo Luo, Bo Luo

    Abstract: Generative AI models can produce high-quality images based on text prompts. The generated images often appear indistinguishable from images generated by conventional optical photography devices or created by human artists (i.e., real images). While the outstanding performance of such generative models is generally well received, security concerns arise. For instance, such image generators could be…

    Submitted 22 April, 2024; originally announced April 2024.

  50. arXiv:2404.14433  [pdf, other]

    cs.LG cs.CE

    KATO: Knowledge Alignment and Transfer for Transistor Sizing of Different Design and Technology

    Authors: Wei W. Xing, Weijian Fan, Zhuohua Liu, Yuan Yao, Yuanqi Hu

    Abstract: Automatic transistor sizing in circuit design continues to be a formidable challenge. Although Bayesian optimization (BO) has achieved significant success, it is circuit-specific, limiting the accumulation and transfer of design knowledge for broader applications. This paper proposes (1) efficient automatic kernel construction, (2) the first transfer learning across different circuits and tech…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 6 pages, received by DAC2024