
Showing 1–50 of 4,423 results for author: Zhang, Z

Searching in archive cs.
  1. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05922  [pdf]

    cs.HC

    Understanding and Mitigating Harmful Design in User-Generated Virtual Worlds

    Authors: Zinan Zhang, Xinning Gui, Yubo Kou

    Abstract: Virtual space offers innovative ways for individuals to engage with one another in a digital setting. Prominent virtual social platforms, such as Facebook Spaces, VR Chat, and AltspaceVR, facilitate social connections, allowing users to interact seamlessly. Additionally, certain video games, like Second Life and World of Warcraft, are set within these virtual spaces as well, providing immersive pl…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  3. arXiv:2405.05760  [pdf, other]

    cs.CV cs.CL

    Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media

    Authors: Zhizhen Zhang, Ning Wang, Haojie Li, Zhihui Wang

    Abstract: The purpose of semantic location prediction is to extract relevant semantic location information from multimodal social media posts, offering a more contextual understanding of daily activities compared to GPS coordinates. However, this task becomes challenging due to the presence of noise and irrelevant information in "text-image" pairs. Existing methods suffer from insufficient feature represent…

    Submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2405.05691  [pdf, other]

    cs.CV cs.MM

    StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework

    Authors: Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng

    Abstract: Thanks to the powerful generative capacity of diffusion models, recent years have witnessed rapid progress in human motion generation. Existing diffusion-based methods employ disparate network architectures and training strategies. The effect of the design of each component is still unclear. In addition, the iterative denoising process consumes considerable computational overhead, which is prohibi…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.05636  [pdf, other]

    cs.CV cs.AI

    SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space

    Authors: Zeren Zhang, Haibo Qin, Jiayu Huang, Yixin Li, Hui Lin, Yitao Duan, Jinwen Ma

    Abstract: Combining face swapping with lip synchronization technology offers a cost-effective solution for customized talking face generation. However, directly cascading existing models together tends to introduce significant interference between tasks and reduce video clarity because the interaction space is limited to the low-level semantic RGB space. To address this issue, we propose an innovative unifi…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2405.05613  [pdf, other]

    cs.CV

    Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification

    Authors: Xiangbo Yin, Jiangming Shi, Yachao Zhang, Yang Lu, Zhizhong Zhang, Yuan Xie, Yanyun Qu

    Abstract: Unsupervised Visible-Infrared Person Re-identification (USVI-ReID) presents a formidable challenge, which aims to match pedestrian images across visible and infrared modalities without any annotations. Recently, clustered pseudo-label methods have become predominant in USVI-ReID, although the inherent noise in pseudo-labels presents a significant obstacle. Most existing works primarily focus on sh…

    Submitted 9 May, 2024; originally announced May 2024.

  7. arXiv:2405.05552  [pdf, other]

    cs.CV

    Bidirectional Progressive Transformer for Interaction Intention Anticipation

    Authors: Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang

    Abstract: Interaction intention anticipation aims to jointly predict future hand trajectories and interaction hotspots. Existing research often treated trajectory forecasting and interaction hotspots prediction as separate tasks or solely considered the impact of trajectories on interaction hotspots, which led to the accumulation of prediction errors over time. However, a deeper inherent connection exists b…

    Submitted 9 May, 2024; originally announced May 2024.

  8. arXiv:2405.05538  [pdf, other]

    cs.CV

    A Survey on Personalized Content Synthesis with Diffusion Models

    Authors: Xulu Zhang, Xiao-Yong Wei, Wengyu Zhang, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

    Abstract: Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image…

    Submitted 9 May, 2024; originally announced May 2024.

  9. arXiv:2405.05514  [pdf, other]

    cs.RO

    HPPS: A Hierarchical Progressive Perception System for Luggage Trolley Detection and Localization at Airports

    Authors: Zhirui Sun, Zhe Zhang, Jieting Zhao, Hanjing Ye, Jiankun Wang

    Abstract: The robotic autonomous luggage trolley collection system employs robots to gather and transport scattered luggage trolleys at airports. However, existing methods for detecting and locating these luggage trolleys often fail when they are not fully visible. To address this, we introduce the Hierarchical Progressive Perception System (HPPS), which enhances the detection and localization of luggage tr…

    Submitted 8 May, 2024; originally announced May 2024.

  10. arXiv:2405.05430  [pdf, other]

    cs.LG

    Towards Invariant Time Series Forecasting in Smart Cities

    Authors: Ziyi Zhang, Shaogang Ren, Xiaoning Qian, Nick Duffield

    Abstract: In the transformative landscape of smart cities, the integration of cutting-edge web technologies into time series forecasting presents a pivotal opportunity to enhance urban planning, sustainability, and economic growth. The advancement of deep neural networks has significantly improved forecasting performance. However, a notable challenge lies in the ability of these models to generalize wel…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ACM WWW Companion 2024

  11. arXiv:2405.05409  [pdf, other]

    cs.LG

    Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

    Authors: Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu

    Abstract: Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks using anchor functions. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solu…

    Submitted 8 May, 2024; originally announced May 2024.

  12. arXiv:2405.04883  [pdf, other]

    cs.CV cs.AI cs.LG

    Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion

    Authors: Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao

    Abstract: Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces. In this work, we propose Molecule-Space, an idea that treats multimodal representation spaces as "molecules", and augments pre-trained unified space…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024. The code and checkpoints are released at https://github.com/MoleculeSpace/MoleculeSpace

  13. arXiv:2405.04840  [pdf, other]

    cs.IR

    Federated Adaptation for Foundation Model-based Recommendations

    Authors: Chunxu Zhang, Guodong Long, Hongkuan Guo, Xiao Fang, Yang Song, Zhaojie Liu, Guorui Zhou, Zijian Zhang, Yang Liu, Bo Yang

    Abstract: With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted as a regular paper of IJCAI'24

  14. arXiv:2405.04782  [pdf, other]

    cs.CV

    Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection

    Authors: Zhaoxiang Zhang, Hanqiu Deng, Jinan Bao, Xingyu Li

    Abstract: Image Anomaly Detection has been a challenging task in the Computer Vision field. The advent of Vision-Language models, particularly the rise of CLIP-based frameworks, has opened new avenues for zero-shot anomaly detection. Recent studies have explored the use of CLIP by aligning images with normal and prompt descriptions. However, the exclusive dependence on textual guidance often falls short, highli…

    Submitted 7 May, 2024; originally announced May 2024.

  15. arXiv:2405.04532  [pdf, other]

    cs.CL cs.AI cs.LG cs.PF

    QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

    Authors: Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

    Abstract: Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only accelerate low-batch, edge LLM inference, failing to deliver performance gains in large-batch, cloud-based LLM serving. We uncover a critical issue: existing…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: The first three authors contribute equally to this project and are listed in alphabetical order. Yujun Lin leads the quantization algorithm, Haotian Tang and Shang Yang lead the GPU kernels and the serving system. Code is available at https://github.com/mit-han-lab/qserve

  16. arXiv:2405.04311  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Cross-IQA: Unsupervised Learning for Image Quality Assessment

    Authors: Zhen Zhang

    Abstract: Automatic perception of image quality is a challenging problem that impacts billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method termed Cross-IQA, based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct the pretext…

    Submitted 7 May, 2024; originally announced May 2024.

  17. arXiv:2405.04121  [pdf, other]

    cs.CV

    ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation

    Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

    Abstract: Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, the "weak teacher challenge" arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Mult…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages, 6 figures, ICME 2024 oral

  18. arXiv:2405.04026  [pdf, other]

    stat.ML cs.LG

    Federated Control in Markov Decision Processes

    Authors: Hao Jin, Yang Peng, Liangyu Zhang, Zhihua Zhang

    Abstract: We study problems of federated control in Markov Decision Processes. To solve an MDP with large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communication of locally collected experience. In our settings, these agents have limited capabilities, which means they are restricted within different regions of the overall state space during the…

    Submitted 7 May, 2024; originally announced May 2024.

  19. arXiv:2405.03960  [pdf, other]

    cs.CL

    ESIHGNN: Event-State Interactions Infused Heterogeneous Graph Neural Network for Conversational Emotion Recognition

    Authors: Xupeng Zha, Huan Zhao, Zixing Zhang

    Abstract: Conversational Emotion Recognition (CER) aims to predict the emotion expressed by an utterance (referred to as an "event") during a conversation. Existing graph-based methods mainly focus on event interactions to comprehend the conversational context, while overlooking the direct influence of the speaker's emotional state on the events. In addition, real-time modeling of the conversation is cruc…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  20. arXiv:2405.03956  [pdf, other]

    cs.SD eess.AS

    Adaptive Speech Emotion Representation Learning Based On Dynamic Graph

    Authors: Yingxue Gao, Huan Zhao, Zixing Zhang

    Abstract: Graph representation learning has become a hot research topic due to its powerful nonlinear fitting capability in extracting representative node embeddings. However, for sequential data such as speech signals, most traditional methods merely focus on the static graph created within a sequence, and largely overlook the intrinsic evolving patterns of these data. This may reduce the efficiency of gra…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  21. arXiv:2405.03953  [pdf, other]

    cs.SD eess.AS

    Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation

    Authors: Zixing Zhang, Tao Pang, Jing Han, Björn W. Schuller

    Abstract: Heart murmurs are a common manifestation of cardiovascular diseases and can provide crucial clues to early cardiac abnormalities. While most current research methods primarily focus on the accuracy of models, they often overlook other important aspects such as the interpretability of machine learning algorithms and the uncertainty of predictions. This paper introduces a heart murmur detection meth…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  22. arXiv:2405.03952  [pdf, other]

    cs.SD cs.CL eess.AS

    HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech

    Authors: Zhongren Dong, Zixing Zhang, Weixiang Xu, Jing Han, Jianjun Ou, Björn W. Schuller

    Abstract: Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches rely heavily on Transformer architectures due to their efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  23. arXiv:2405.03562  [pdf, other]

    cs.IR

    ID-centric Pre-training for Recommendation

    Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Fuzhen Zhuang, Xu Zhang, Leyu Lin, Zhanhui Kang, Yongjun Xu

    Abstract: Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are challenging to transfer to new domains. With the rise of pre-trained language models (PLMs), some pioneering works adopt PLMs for pre-trained recommendation, where modality information (e.g., text) is considered un…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  24. arXiv:2405.03520  [pdf, other]

    cs.CV

    Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

    Authors: Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

    Abstract: General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attracted significant attention due to its remarkable simulation capabilities, which exhibit an incipient comprehension of physical law…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey

  25. arXiv:2405.03333  [pdf, other]

    cs.CV

    Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance

    Authors: Xunchu Zhou, Xiaohong Liu, Yunlong Dong, Tengchuan Kou, Yixuan Gao, Zicheng Zhang, Chunyi Li, Haoning Wu, Guangtao Zhai

    Abstract: Recently, User-Generated Content (UGC) videos have gained popularity in our daily lives. However, UGC videos often suffer from poor exposure due to the limitations of photographic equipment and techniques. Therefore, Video Exposure Correction (VEC) algorithms have been proposed, including Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR). Equally important to the VEC is the…

    Submitted 6 May, 2024; originally announced May 2024.

  26. arXiv:2405.03300  [pdf, other]

    cs.IT eess.SP

    Active RIS-Aided Massive MIMO With Imperfect CSI and Phase Noise

    Authors: Zhangjie Peng, Jianchen Zhu, Cunhua Pan, Zaichen Zhang, Daniel Benevides da Costa, Maged Elkashlan, George K. Karagiannidis

    Abstract: Active reconfigurable intelligent surface (RIS) has attracted significant attention as a recently proposed RIS architecture. Owing to its capability to amplify the incident signals, active RIS can mitigate the multiplicative fading effect inherent in the passive RIS-aided system. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system i…

    Submitted 6 May, 2024; originally announced May 2024.

  27. arXiv:2405.03236  [pdf, other]

    cs.LG stat.ML

    Federated Reinforcement Learning with Constraint Heterogeneity

    Authors: Hao Jin, Liangyu Zhang, Zhihua Zhang

    Abstract: We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity. In our setting, we aim to solve a reinforcement learning problem with multiple constraints while $N$ training agents are located in $N$ different environments with limited access to the constraint signals and they are expected to collaboratively learn a policy satisfying all constraint signals. Such learning…

    Submitted 6 May, 2024; originally announced May 2024.

  28. arXiv:2405.03191  [pdf, ps, other]

    cs.IT

    Exploiting Matrix Information Geometry for Integrated Decoding of Massive Uncoupled Unsourced Random Access

    Authors: Feiyan Tian, Xiaoming Chen, Chongwen Huang, Zhaoyang Zhang

    Abstract: In this paper, we explore an efficient uncoupled unsourced random access (UURA) scheme for 6G massive communication. UURA is a typical framework of unsourced random access that addresses the problems of codeword detection and message stitching, without the use of check bits. Firstly, we establish a framework for UURA, allowing for immediate decoding of sub-messages upon arrival. Thus, the processi…

    Submitted 6 May, 2024; originally announced May 2024.

  29. arXiv:2405.03103  [pdf, other]

    cs.LG cs.CV

    Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

    Authors: Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang

    Abstract: Large language models (LLMs) have recently achieved state-of-the-art performance across various tasks, yet due to their large computational requirements, they struggle with strict latency and power demands. Deep neural network (DNN) quantization has traditionally addressed these limitations by converting models to low-precision integer formats. Yet recently alternative formats, such as Normal Floa…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  30. arXiv:2405.03095  [pdf, other]

    cs.LG math-ph

    Loss Jump During Loss Switch in Solving PDEs with Neural Networks

    Authors: Zhiwei Wang, Lulu Zhang, Zhongwang Zhang, Zhi-Qin John Xu

    Abstract: Using neural networks to solve partial differential equations (PDEs) is gaining popularity as an alternative approach in the scientific computing community. Neural networks can integrate different types of information into the loss function. These include observation data, governing equations, and variational forms, etc. These loss functions can be broadly categorized into two types: observation d…

    Submitted 5 May, 2024; originally announced May 2024.

  31. arXiv:2405.02962  [pdf, other]

    cs.CV

    VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

    Authors: Juncheng Hu, Ximing Xing, Zhengqi Zhang, Jing Zhang, Qian Yu

    Abstract: We propose a novel method, VectorPainter, for the task of stylized vector graphics synthesis. Given a text prompt and a reference style image, VectorPainter generates a vector graphic that aligns in content with the text prompt and remains faithful in style to the reference image. We recognize that the key to this task lies in fully leveraging the intrinsic properties of vector graphics. Innovativ…

    Submitted 5 May, 2024; originally announced May 2024.

  32. arXiv:2405.02923  [pdf, other]

    cs.IT

    Constructing $(h,d)$ cooperative MSR codes with sub-packetization $(d-k+h)(d-k+1)^{\lceil n/2 \rceil}$

    Authors: Zihao Zhang, Guodong Li, Sihuang Hu

    Abstract: We address the multi-node failure repair challenges for MDS array codes. Presently, two primary models are employed for multi-node repairs: the centralized model where all failed nodes are restored in a singular data center, and the cooperative model where failed nodes acquire data from auxiliary nodes and collaborate amongst themselves for the repair process. This paper focuses on the cooperative…

    Submitted 5 May, 2024; originally announced May 2024.

  33. arXiv:2405.02604  [pdf, ps, other]

    cs.IT eess.SP

    Interleave Frequency Division Multiplexing

    Authors: Yuhao Chi, Lei Liu, Yao Ge, Xuehui Chen, Ying Li, Zhaoyang Zhang

    Abstract: In this letter, we study interleave frequency division multiplexing (IFDM) for multicarrier modulation in static multipath and mobile time-varying channels, which outperforms orthogonal frequency division multiplexing (OFDM), orthogonal time frequency space (OTFS), and affine frequency division multiplexing (AFDM) by considering practical advanced detectors. The fundamental principle underlying ex…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Wireless Communications Letters

  34. arXiv:2405.02357  [pdf, other]

    cs.LG

    Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks

    Authors: Zijian Zhang, Yujie Sun, Zepu Wang, Yuqi Nie, Xiaobo Ma, Peng Sun, Ruolin Li

    Abstract: Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban plan…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 9 pages

  35. arXiv:2405.02171  [pdf, other]

    cs.CV

    Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations

    Authors: Zhilu Zhang, Ruohao Wang, Hongzhi Zhang, Wangmeng Zuo

    Abstract: In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphones: (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. Particularly, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms. Firstly, considering the popularity of multiple…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TPAMI in 2024. Extended version of ECCV 2022 paper "Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations" (arXiv:2203.01325)

  36. arXiv:2405.01872  [pdf, other]

    cs.CV

    Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition

    Authors: Yichun Tai, Kun Yang, Tao Peng, Zhenzhen Huang, Zhijiang Zhang

    Abstract: The task of steel surface defect recognition is an industrial problem of great practical value. Data insufficiency is the major challenge in training a robust defect recognition network. Existing methods have sought to enlarge the dataset by generating samples with generative models. However, their generation quality is still limited by the insufficiency of defect image samples. To thi…

    Submitted 3 May, 2024; originally announced May 2024.

  37. arXiv:2405.01564  [pdf, other]

    cs.SE

    Prioritizing Software Requirements Using Large Language Models

    Authors: Malik Abdul Sami, Zeeshan Rasheed, Muhammad Waseem, Zheying Zhang, Tomas Herda, Pekka Abrahamsson

    Abstract: Large Language Models (LLMs) are revolutionizing Software Engineering (SE) by introducing innovative methods for tasks such as collecting requirements, designing software, generating code, and creating test cases, among others. This article focuses on requirements engineering, typically seen as the initial phase of software development that involves multiple system stakeholders. Despite its key ro…

    Submitted 5 April, 2024; originally announced May 2024.

  38. arXiv:2405.01503  [pdf, other]

    eess.IV cs.CV

    PAM-UNet: Shifting Attention on Region of Interest in Medical Images

    Authors: Abhijit Das, Debesh Jha, Vandan Gorade, Koushik Biswas, Hongyi Pan, Zheyuan Zhang, Daniela P. Ladner, Yury Velichko, Amir Borhani, Ulas Bagci

    Abstract: Computer-aided segmentation methods can assist medical personnel in improving diagnostic outcomes. While recent advancements like UNet and its variants have shown promise, they face a critical challenge: balancing accuracy with computational efficiency. Shallow encoder architectures in UNets often struggle to capture crucial spatial features, leading to inaccurate and sparse segmentation. To addre…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted at 2024 IEEE EMBC

  39. arXiv:2405.01017  [pdf, ps, other]

    math.CO cs.CC math.MG

    NP-completeness of Tiling Finite Simply Connected Regions with a Fixed Set of Wang Tiles

    Authors: Chao Yang, Zhujun Zhang

    Abstract: The computational complexity of tiling finite simply connected regions with a fixed set of tiles is studied in this paper. We show that the problem of tiling simply connected regions with a fixed set of $23$ Wang tiles is NP-complete. As a consequence, the problem of tiling simply connected regions with a fixed set of $111$ rectangles is NP-complete. Our results improve that of Igor Pak and Jed Ya…

    Submitted 2 May, 2024; originally announced May 2024.

  40. arXiv:2405.00902  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

    Authors: Zhicheng Zhang, Yancheng Liang, Yi Wu, Fei Fang

    Abstract: Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It lear…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted to AAMAS 2024. 15 pages

  41. arXiv:2405.00770  [pdf, other]

    quant-ph cs.CC cs.LG

    Quantum-Classical Separations in Shallow-Circuit-Based Learning with and without Noises

    Authors: Zhihan Zhang, Weiyuan Gong, Weikang Li, Dong-Ling Deng

    Abstract: We study quantum-classical separations between classical and quantum supervised learning models based on constant depth (i.e., shallow) circuits, in scenarios with and without noises. We construct a classification problem defined by a noiseless shallow quantum circuit and rigorously prove that any classical neural network with bounded connectivity requires logarithmic depth to output correctly wit…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures

  42. arXiv:2405.00736  [pdf, other]

    eess.SP cs.LG

    Joint Signal Detection and Automatic Modulation Classification via Deep Learning

    Authors: Huijun Xing, Xuhui Zhang, Shuo Chang, Jinke Ren, Zixun Zhang, Jie Xu, Shuguang Cui

    Abstract: Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different ca…

    Submitted 29 April, 2024; originally announced May 2024.

  43. arXiv:2405.00734  [pdf, other]

    eess.SP cs.AI cs.LG

    EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

    Authors: Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

    Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Mani…

    Submitted 29 April, 2024; originally announced May 2024.

  44. arXiv:2405.00479  [pdf, other]

    cs.CV

    Enhanced Visual Question Answering: A Comparative Analysis and Textual Feature Extraction Via Convolutions

    Authors: Zhilin Zhang

    Abstract: Visual Question Answering (VQA) has emerged as a highly engaging field in recent years, attracting increasing research efforts aiming to enhance VQA accuracy through the deployment of advanced models such as Transformers. Despite this growing interest, there has been limited exploration into the comparative analysis and impact of textual modalities within VQA, particularly in terms of model comple…

    Submitted 1 May, 2024; originally announced May 2024.

  45. arXiv:2405.00391  [pdf, other]

    cs.IT eess.SP

    Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Ahmed Alhammadi, Hui Chen, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: The beamforming technology with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, which can significantly improve the spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial…

    Submitted 1 May, 2024; originally announced May 2024.

  46. arXiv:2405.00365  [pdf, other]

    cs.IT eess.SP

    Robust Continuous-Time Beam Tracking with Liquid Neural Network

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Richeng Jin, Qianqian Yang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology of the sixth-generation communication networks due to the large amounts of available spectrum at high frequencies. However, the huge overhead associated with beam training imposes a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this high…

    Submitted 1 May, 2024; originally announced May 2024.

  47. arXiv:2404.19750  [pdf, other]

    cs.IT eess.SP

    A Joint Communication and Computation Design for Distributed RISs Assisted Probabilistic Semantic Communication in IIoT

    Authors: Zhouxiang Zhao, Zhaohui Yang, Chongwen Huang, Li Wei, Qianqian Yang, Caijun Zhong, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, the problem of spectral-efficient communication and computation resource allocation for distributed reconfigurable intelligent surfaces (RISs) assisted probabilistic semantic communication (PSC) in industrial Internet-of-Things (IIoT) is investigated. In the considered model, multiple RISs are deployed to serve multiple users, while PSC adopts compute-then-transmit protocol to reduc…

    Submitted 30 April, 2024; originally announced April 2024.

  48. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  49. arXiv:2404.19384  [pdf, other]

    cs.CV cs.AI

    Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

    Authors: Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

    Abstract: Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previ…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  50. arXiv:2404.19335  [pdf, other]

    cs.CL

    StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

    Authors: Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

    Abstract: Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. This property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Submitted to ACL 2024