
Showing 1–50 of 3,205 results for author: Zhang, L

Searching in archive cs.
  1. arXiv:2405.05010  [pdf, other]

    cs.CV

    ${M^2D}$NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields

    Authors: Ning Wang, Lefei Zhang, Angel X Chang

    Abstract: Neural fields (NeRF) have emerged as a promising approach for representing continuous 3D scenes. Nevertheless, the lack of semantic encoding in NeRFs poses a significant challenge for scene decomposition. To address this challenge, we present a single model, Multi-Modal Decomposition NeRF (${M^2D}$NeRF), that is capable of both text-based and visual patch-based edits. Specifically, we use multi-mo…

    Submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2405.04404  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Vision Mamba: A Comprehensive Survey and Taxonomy

    Authors: Xiao Liu, Chenxu Zhang, Lei Zhang

    Abstract: State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems. This model has witnessed numerous applications in several fields, including control theory, signal processing, economics and machine learning. In the field of deep learning, state space models are used to process sequence data, such as time series analysis, natural language processing (NLP…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: https://github.com/lx6c78/Vision-Mamba-A-Comprehensive-Survey-and-Taxonomy

  3. arXiv:2405.04180  [pdf, other]

    cs.LG cs.CV

    Sora Detector: A Unified Hallucination Detection for Large Text-to-Video Models

    Authors: Zhixuan Chu, Lei Zhang, Yichen Sun, Siqiao Xue, Zhibo Wang, Zhan Qin, Kui Ren

    Abstract: The rapid advancement in text-to-video (T2V) generative models has enabled the synthesis of high-fidelity video content guided by textual descriptions. Despite this significant progress, these models are often susceptible to hallucination, generating contents that contradict the input text, which poses a challenge to their reliability and practical deployment. To address this critical issue, we in…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.08302, arXiv:2403.05131 by other authors

  4. arXiv:2405.04041  [pdf, other]

    cs.AI cs.CV

    Feature Map Convergence Evaluation for Functional Module

    Authors: Ludan Zhang, Chaoyi Chen, Lei He, Keqiang Li

    Abstract: Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding. However, perception models are predominantly optimized as a black box through end-to-end training, lacking independent evaluation of functional modules, which poses difficulties for interpretability and optimization. Pioneer…

    Submitted 7 May, 2024; originally announced May 2024.

  5. arXiv:2405.04026  [pdf, other]

    stat.ML cs.LG

    Federated Control in Markov Decision Processes

    Authors: Hao Jin, Yang Peng, Liangyu Zhang, Zhihua Zhang

    Abstract: We study problems of federated control in Markov Decision Processes. To solve an MDP with large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communication of locally collected experience. In our settings, these agents have limited capabilities, which means they are restricted within different regions of the overall state space during the…

    Submitted 7 May, 2024; originally announced May 2024.

  6. arXiv:2405.03978  [pdf, other]

    cs.CV

    VMambaCC: A Visual State Space Model for Crowd Counting

    Authors: Hao-Yuan Ma, Li Zhang, Shuai Shi

    Abstract: As a deep learning model, Visual Mamba (VMamba) has a low computational complexity and a global receptive field, and has been successfully applied to image classification and detection. To extend its applications, we apply VMamba to crowd counting and propose a novel VMambaCC (VMamba Crowd Counting) model. Naturally, VMambaCC inherits the merits of VMamba, namely global modeling for images and low com…

    Submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2405.03884  [pdf, other]

    cs.CV

    BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection

    Authors: Saket S. Chaturvedi, Lan Zhang, Wenbin Zhang, Pan He, Xiaoyong Yuan

    Abstract: 3D object detection plays an important role in autonomous driving; however, its vulnerability to backdoor attacks has become evident. By injecting "triggers" to poison the training dataset, backdoor attacks manipulate the detector's prediction for inputs containing these triggers. Existing backdoor attacks against 3D object detection primarily poison 3D LiDAR signals, where large-sized 3D trigge…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at IJCAI 2024 Conference

  8. arXiv:2405.03299  [pdf, other]

    cs.CR cs.DC

    DarkFed: A Data-Free Backdoor Attack in Federated Learning

    Authors: Minghui Li, Wei Wan, Yuxuan Ning, Shengshan Hu, Lulu Xue, Leo Yu Zhang, Yichen Wang

    Abstract: Federated learning (FL) has been demonstrated to be susceptible to backdoor attacks. However, existing academic studies on FL backdoor attacks rely on a high proportion of real clients with main task-related data, which is impractical. In the context of real-world industrial scenarios, even the simplest defense suffices to defend against the state-of-the-art attack, 3DFed. A practical FL backdoor…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by IJCAI 2024

  9. arXiv:2405.03236  [pdf, other]

    cs.LG stat.ML

    Federated Reinforcement Learning with Constraint Heterogeneity

    Authors: Hao Jin, Liangyu Zhang, Zhihua Zhang

    Abstract: We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity. In our setting, we aim to solve a reinforcement learning problem with multiple constraints while $N$ training agents are located in $N$ different environments with limited access to the constraint signals and they are expected to collaboratively learn a policy satisfying all constraint signals. Such learning…

    Submitted 6 May, 2024; originally announced May 2024.

  10. arXiv:2405.03095  [pdf, other]

    cs.LG math-ph

    Loss Jump During Loss Switch in Solving PDEs with Neural Networks

    Authors: Zhiwei Wang, Lulu Zhang, Zhongwang Zhang, Zhi-Qin John Xu

    Abstract: Using neural networks to solve partial differential equations (PDEs) is gaining popularity as an alternative approach in the scientific computing community. Neural networks can integrate different types of information into the loss function. These include observation data, governing equations, and variational forms, etc. These loss functions can be broadly categorized into two types: observation d…

    Submitted 5 May, 2024; originally announced May 2024.

  11. arXiv:2405.02844  [pdf, other]

    cs.CV

    SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

    Authors: Ziyun Qian, Zeyu Xiao, Zhenyi Wu, Dingkang Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Dongliang Kou, Lihua Zhang

    Abstract: Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts the GAN, wh…

    Submitted 5 May, 2024; originally announced May 2024.

  12. arXiv:2405.02807  [pdf]

    cs.LG cs.AI cs.CV

    Kinematic analysis of structural mechanics based on convolutional neural network

    Authors: Leye Zhang, Xiangxiang Tian, Hongjun Zhang

    Abstract: This work attempts to use a convolutional neural network to achieve kinematic analysis of plane bar structures. Using the 3dsMax animation software and the OpenCV module, we build an image dataset of geometrically stable and geometrically unstable systems. We construct and train a convolutional neural network model based on the TensorFlow and Keras deep learning frameworks. The model achieves 100% accuracy…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 9 pages, 13 figures

  13. arXiv:2405.02225  [pdf, other]

    stat.ML cs.AI cs.CY cs.LG stat.ME

    Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks

    Authors: Lujing Zhang, Aaron Roth, Linjun Zhang

    Abstract: This paper introduces a framework for post-processing machine learning models so that their predictions satisfy multi-group fairness guarantees. Based on the celebrated notion of multicalibration, we introduce $(\mathbf{s},\mathcal{G},\alpha)$-GMC (Generalized Multi-Dimensional Multicalibration) for multi-dimensional mappings $\mathbf{s}$, constraint set $\mathcal{G}$, and a pre-specified threshold le…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 28 pages, 8 figures, accepted by ICML2024

  14. How to Gain Commit Rights in Modern Top Open Source Communities?

    Authors: Xin Tan, Yan Gong, Geyu Huang, Haohua Wu, Li Zhang

    Abstract: The success of open source software (OSS) projects relies on voluntary contributions from various community roles. Being a committer signifies gaining trust and higher privileges. Substantial studies have focused on the requirements of becoming a committer, but most of them are based on interviews or several hypotheses, lacking a comprehensive understanding of committers' qualifications. We explore…

    Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 23 pages, 5 figures, FSE 2024

    Journal ref: Proceedings of the ACM on Software Engineering (PACMSE) Issue FSE 2024

  15. arXiv:2405.01002  [pdf, other]

    cs.CV cs.LG

    Spider: A Unified Framework for Context-dependent Concept Understanding

    Authors: Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu

    Abstract: Unlike context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, require higher visual understanding ability. Despite the rapid advance of many CD understanding tasks in respective branches, the isolated evolution leads to their limited cross-domain generalisation and repetitive technique innovatio…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  16. arXiv:2405.00698  [pdf, other]

    cs.NE cs.RO

    CUDA-Accelerated Soft Robot Neural Evolution with Large Language Model Supervision

    Authors: Lechen Zhang

    Abstract: This paper addresses the challenge of co-designing morphology and control in soft robots via a novel neural network evolution approach. We propose an innovative method to implicitly dual-encode soft robots, thus facilitating the simultaneous design of morphology and control. Additionally, we introduce the large language model to serve as the control center during the evolutionary process. This adv…

    Submitted 12 April, 2024; originally announced May 2024.

    Comments: 3 pages, 5 figures

  17. arXiv:2405.00522  [pdf, other]

    econ.GN cs.CE cs.CL cs.CR q-fin.CP

    DAM: A Universal Dual Attention Mechanism for Multimodal Timeseries Cryptocurrency Trend Forecasting

    Authors: Yihang Fu, Mingyu Zhou, Luyao Zhang

    Abstract: In the distributed systems landscape, Blockchain has catalyzed the rise of cryptocurrencies, merging enhanced security and decentralization with significant investment opportunities. Despite their potential, current research on cryptocurrency trend forecasting often falls short by simplistically merging sentiment data without fully considering the nuanced interplay between financial market dynamic…

    Submitted 1 May, 2024; originally announced May 2024.

  18. arXiv:2405.00227  [pdf, other]

    cs.NI

    Optimized Non-Primary Channel Access Design in IEEE 802.11bn

    Authors: Dongyu Wei, Liu Cao, Lyutianyang Zhang, Xiangyu Gao, Hao Yin

    Abstract: The IEEE 802.11 standards, culminating in IEEE 802.11be (Wi-Fi 7), have significantly expanded bandwidth capacities from 20 MHz to 320 MHz, marking a crucial evolution in wireless access technology. Despite these advancements, the full potential of these capacities remains largely untapped due to inefficiencies in channel management, in particular, the underutilization of secondary (non-primary) c…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. 6 pages, 5 figures

  19. arXiv:2405.00135  [pdf, other]

    cs.IT eess.SP

    Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

    Authors: Shuai Lyu, Yao Sun, Linke Guo, Xiaoyong Yuan, Fang Fang, Lan Zhang, Xianbin Wang

    Abstract: Task-oriented semantic communications (TSC) enhance radio resource efficiency by transmitting task-relevant semantic information. However, current research often overlooks the inherent semantic distinctions among encoded features. Due to unavoidable channel variations from time and frequency-selective fading, semantically sensitive feature units could be more susceptible to erroneous inference if…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE Communications Letters

  20. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  21. arXiv:2404.19468  [pdf, ps, other]

    cs.IT

    Compute-Forward Multiple Access for Gaussian Fast Fading Channels

    Authors: Lanwei Zhang, Jamie Evans, Jingge Zhu

    Abstract: Compute-forward multiple access (CFMA) is a transmission strategy which allows the receiver in a multiple access channel (MAC) to first decode linear combinations of the transmitted signals and then solve for individual messages. Compared to existing MAC strategies such as joint decoding or successive interference cancellation (SIC), CFMA was shown to achieve the MAC capacity region for fixed chan…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: ISIT'2024

  22. arXiv:2404.19334  [pdf, other]

    cs.CV

    Multi-Scale Heterogeneity-Aware Hypergraph Representation for Histopathology Whole Slide Images

    Authors: Minghao Han, Xukun Zhang, Dingkang Yang, Tao Liu, Haopeng Kuang, Jinghui Feng, Lihua Zhang

    Abstract: Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images. Existing deep learning approaches mainly adopt multiple instance learning or graph neural networks under weak supervision. Most of them are unable to uncover the diverse interactions between differen…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures, accepted by ICME2024

  23. arXiv:2404.19141  [pdf, other]

    cs.LG

    Micro-Macro Spatial-Temporal Graph-based Encoder-Decoder for Map-Constrained Trajectory Recovery

    Authors: Tonglong Wei, Youfang Lin, Yan Lin, Shengnan Guo, Lan Zhang, Huaiyu Wan

    Abstract: Recovering intermediate missing GPS points in a sparse trajectory, while adhering to the constraints of the road network, could offer deep insights into users' moving behaviors in intelligent transportation systems. Although recent studies have demonstrated the advantages of achieving map-constrained trajectory recovery via an end-to-end manner, they still face two significant challenges. Firstly,…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted as a regular paper at IEEE TKDE

  24. arXiv:2404.18538  [pdf, ps, other]

    cs.LG math.NA

    Symmetry group based domain decomposition to enhance physics-informed neural networks for solving partial differential equations

    Authors: Ye Liu, Jie-Ying Li, Li-Sheng Zhang, Lei-Lei Guo, Zhi-Yong Zhang

    Abstract: Domain decomposition provides an effective way to tackle the dilemma of physics-informed neural networks (PINN), which struggle to accurately and efficiently solve partial differential equations (PDEs) in the whole domain, but the lack of efficient tools for dealing with the interfaces between two adjacent sub-domains heavily hinders the training effects, and even leads to the discontinuity of the lear…

    Submitted 29 April, 2024; originally announced April 2024.

  25. arXiv:2404.18515  [pdf, other]

    cs.SE

    An Agile Formal Specification Language Design Based on K Framework

    Authors: Jianyu Zhang, Long Zhang, Yixuan Wu, Feng Yang

    Abstract: Formal Methods (FMs) are currently essential for verifying the safety and reliability of software systems. However, the specification writing in formal methods tends to be complex and challenging to learn, requiring familiarity with various intricate formal specification languages and verification technologies. In response to the increasing complexity of software frameworks, existing specification…

    Submitted 29 April, 2024; originally announced April 2024.

  26. arXiv:2404.18392  [pdf, other]

    cs.DC

    Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

    Authors: Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

    Abstract: In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple…

    Submitted 28 April, 2024; originally announced April 2024.

  27. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 7 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  28. arXiv:2404.18041  [pdf, other]

    quant-ph cs.LG math.OC

    Variational Optimization for Quantum Problems using Deep Generative Networks

    Authors: Lingxia Zhang, Xiaodie Lin, Peidong Wang, Kaiyan Yang, Xiao Zeng, Zhaohui Wei, Zizhu Wang

    Abstract: Optimization is one of the keystones of modern science and engineering. Its applications in quantum technology and machine learning helped nurture variational quantum algorithms and generative AI respectively. We propose a general approach to design variational optimization algorithms based on generative models: the Variational Generative Optimization Network (VGON). To demonstrate its broad appli…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 17 pages, 13 figures, comments welcome

  29. arXiv:2404.17433  [pdf, other]

    cs.CV

    PromptCIR: Blind Compressed Image Restoration with Prompt Learning

    Authors: Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

    Abstract: Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often seek assistance from a quality factor prediction network to facilitate their network to restore compressed images. However, the predicted numerical…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Winner of NTIRE 2024 Blind Compressed Image Enhancement Challenge

  30. arXiv:2404.17227  [pdf, other]

    econ.GN cs.CE cs.CR cs.CY q-fin.RM

    Trust Dynamics and Market Behavior in Cryptocurrency: A Comparative Study of Centralized and Decentralized Exchanges

    Authors: Xintong Wu, Wanling Deng, Yuotng Quan, Luyao Zhang

    Abstract: In the evolving landscape of digital finance, the transition from centralized to decentralized trust mechanisms, primarily driven by blockchain technology, plays a critical role in shaping the cryptocurrency ecosystem. This paradigm shift raises questions about the traditional reliance on centralized trust and introduces a novel, decentralized trust framework built upon distributed networks. Our r…

    Submitted 26 April, 2024; originally announced April 2024.

  31. arXiv:2404.17153  [pdf, other]

    cs.SE

    A Unified Debugging Approach via LLM-Based Multi-Agent Synergy

    Authors: Cheryl Lee, Chunqiu Steven Xia, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu

    Abstract: Tremendous efforts have been devoted to automating software debugging, a time-consuming process involving fault localization and repair generation. Recently, Large Language Models (LLMs) have shown great potential in automated debugging. However, we identified three challenges posed to traditional and LLM-based debugging tools: 1) the upstream imperfection of fault localization affects the downstr…

    Submitted 26 April, 2024; originally announced April 2024.

  32. arXiv:2404.16825  [pdf, other]

    cs.CV eess.IV

    ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

    Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

    Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content…

    Submitted 25 April, 2024; originally announced April 2024.

  33. arXiv:2404.16635  [pdf, other]

    cs.CV

    TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

    Authors: Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

    Abstract: Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficien…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages, 11 figures

  34. arXiv:2404.16456  [pdf, other]

    cs.CV

    Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

    Authors: Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang

    Abstract: Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts are based on the assumption of modality completeness. However, in real-world applications, some practical factors cause uncertain modality missingness, which drastically degrades the model's performance. To this end, we propose a Correlation-decoupled Knowledge Distillation (CorrKD) fra…

    Submitted 25 April, 2024; originally announced April 2024.

  35. arXiv:2404.16385  [pdf, other]

    cs.CV

    Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

    Authors: Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang

    Abstract: In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal, efficient fine-tuning mechanisms remains paramount yet largely unexplored, especially given that researchers in interdisciplinary fields are often extremely short of training resources. Given the unique challenges in the medical domain, such as limited data scope and significant domain-specific requirements, evaluating…

    Submitted 25 April, 2024; originally announced April 2024.

  36. arXiv:2404.16323  [pdf, other]

    cs.CV

    DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction

    Authors: Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Lei Zhang

    Abstract: In this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method utilizes an encoder-decoder framework which generates 3D Gaussians in decoder with the guidance of depth-aware image features from encoder. In particular, we introduce the use of deformable transformer, all…

    Submitted 25 April, 2024; originally announced April 2024.

  37. arXiv:2404.16287  [pdf, other]

    stat.ML cs.CR cs.LG math.ST stat.ME

    Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference

    Authors: Zhe Zhang, Ryumei Nakada, Linjun Zhang

    Abstract: Differentially private federated learning is crucial for maintaining privacy in distributed environments. This paper investigates the challenges of high-dimensional estimation and inference under the constraints of differential privacy. First, we study scenarios involving an untrusted central server, demonstrating the inherent difficulties of accurate estimation in high-dimensional problems. Our f…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 56 pages, 3 figures

  38. arXiv:2404.16054  [pdf, other]

    cs.HC cs.AI

    LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation

    Authors: Li Zhang, Shihe Wang, Xianqing Jia, Zhihan Zheng, Yunhe Yan, Longxi Gao, Yuanchun Li, Mengwei Xu

    Abstract: The emergent large language/multimodal models facilitate the evolution of mobile agents, especially in the task of mobile UI automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined ones, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on…

    Submitted 12 April, 2024; originally announced April 2024.

  39. arXiv:2404.15956  [pdf, other]

    cs.CV

    A Survey on Visual Mamba

    Authors: Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye

    Abstract: State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity with image size and increasing computational demands, researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is th…

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  40. arXiv:2404.15615  [pdf, other]

    cs.HC cs.LG

    MDDD: Manifold-based Domain Adaptation with Dynamic Distribution for Non-Deep Transfer Learning in Cross-subject and Cross-session EEG-based Emotion Recognition

    Authors: Ting Luo, Jing Zhang, Yingwei Qiu, Li Zhang, Yaohua Hu, Zhuliang Yu, Zhen Liang

    Abstract: Emotion decoding using Electroencephalography (EEG)-based affective brain-computer interfaces represents a significant area within the field of affective computing. In the present study, we propose a novel non-deep transfer learning method, termed as Manifold-based Domain adaptation with Dynamic Distribution (MDDD). The proposed MDDD includes four main modules: manifold feature transformation, dyn…

    Submitted 23 April, 2024; originally announced April 2024.

  41. arXiv:2404.15272  [pdf, other]

    cs.CV cs.AI cs.CL

    CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

    Authors: Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang

    Abstract: Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal datas…

    Submitted 28 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures, 3 tables

  42. arXiv:2404.15247  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

    Authors: Yifeng Ding, Jiawei Liu, Yuxiang Wei, Terry Yue Zhuo, Lingming Zhang

    Abstract: We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to improve instruction tuning, XFT introduces a shared expert mechanism with a novel routing weight normalization strategy into sparse upcycling, which significantly…

    Submitted 23 April, 2024; originally announced April 2024.

  43. arXiv:2404.14826  [pdf, ps, other]

    cs.NI cs.DC

    Channel Access Methods for RF-Powered IoT Networks: A Survey

    Authors: Hang Yu, Lei Zhang, Yiwei Li, Kwan-Wu Chin, Changlin Yang

    Abstract: Many Internet of Things (IoT) networks with Radio Frequency (RF) powered devices operate over a shared medium. They thus require a channel access protocol. Unlike conventional networks where devices have unlimited energy, in an RF-powered IoT network, devices must first harvest RF energy in order to transmit or/and receive data. To this end, this survey presents the first comprehensive revie…

    Submitted 23 April, 2024; originally announced April 2024.

  44. arXiv:2404.14710  [pdf, other]

    cs.SE

    Challenges of Using Pre-trained Models: the Practitioners' Perspective

    Authors: Xin Tan, Taichuan Li, Ruohe Chen, Fang Liu, Li Zhang

    Abstract: The challenges associated with using pre-trained models (PTMs) have not been specifically investigated, which hampers their effective utilization. To address this knowledge gap, we collected and analyzed a dataset of 5,896 PTM-related questions on Stack Overflow. We first analyze the popularity and difficulty trends of PTM-related questions. We find that PTM-related questions are becoming more and…

    Submitted 1 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: SANER 2024

  45. arXiv:2404.14671  [pdf, other]

    cs.CV

    LaneCorrect: Self-supervised Lane Detection

    Authors: Ming Nie, Xinyue Cai, Hang Xu, Li Zhang

    Abstract: Lane detection has evolved highly functional autonomous driving system to understand driving scenes even under complex environments. In this paper, we work towards developing a generalized computer vision system able to detect lanes without using any annotation. We make the following contributions: (i) We illustrate how to perform unsupervised 3D lane segmentation by leveraging the distinctive int…

    Submitted 22 April, 2024; originally announced April 2024.

  46. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  47. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, et al. (62 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 12 pages

  48. Revolutionizing student course selection: Exploring the application prospects and challenges of blockchain token voting technology

    Authors: Tiansu Hu, Yuzhao Song, Linjing Zhang, Xiaoya Zhou

    Abstract: This paper explores the utilization of blockchain token voting technology in student course selection systems. The current course selection systems face various issues, which can be mitigated through the implementation of blockchain technology. The advantages of blockchain technology, including consensus mechanisms and smart contracts, are discussed in detail. The token voting mechanism, encompass…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures, presented at the 5th International Conference on Computing and Data Science, 2023. Open access article distributed under CC BY license. DOI: 10.54254/2755-2721/18/20230971

    ACM Class: H.1.2; K.4.1

    Journal ref: Applied and Computational Engineering, Vol. 18, Pages 96-101, Published 23 October 2023

  49. arXiv:2404.12000  [pdf, other]

    cs.SE

    How far are AI-powered programming assistants from meeting developers' needs?

    Authors: Xin Tan, Xiao Long, Xianjun Ni, Yinghao Zhu, Jing Jiang, Li Zhang

    Abstract: Recent In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits. While some studies have examined their effectiveness, there lacks in-depth investigation into the actual assistance process. To bridge this gap, we simulate real development scenarios encompassing three typical types of software development tasks and recruit 27 computer scienc…

    Submitted 24 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  50. arXiv:2404.11811  [pdf]

    physics.chem-ph cs.AI cs.LG

    Physics-informed active learning for accelerating quantum chemical simulations

    Authors: Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral

    Abstract: Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here we introduce the end-to-end AL for constructing robust data-efficient potentials with affordable inves…

    Submitted 17 April, 2024; originally announced April 2024.