Skip to main content

Showing 1–50 of 2,662 results for author: Zhang, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.05802  [pdf, other

    cs.DC cs.AI

    Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint

    Authors: Jun Li, Weiwei Zhang, Kang Wei, Guangji Chen, Long Shi, Wen Chen

    Abstract: As an emerging artificial intelligence technology, graph neural networks (GNNs) have exhibited promising performance across a wide range of graph-related applications. However, information exchanges among neighbor nodes in GNN pose new challenges in the resource-constrained scenario, especially in wireless systems. In practical wireless systems, the communication links among nodes are usually unre… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 5 pages,3 figures

  2. arXiv:2405.05695  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

    Authors: Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma

    Abstract: We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure fo… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024

    Journal ref: International Conference on Learning Representations (ICLR), 2024

  3. arXiv:2405.05538  [pdf, other

    cs.CV

    A Survey on Personalized Content Synthesis with Diffusion Models

    Authors: Xulu Zhang, Xiao-Yong Wei, Wengyu Zhang, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

    Abstract: Recent advancements in generative models have significantly impacted content creation, leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user-provided examples, PCS aims to customize the subject of interest to specific user-defined prompts. Over the past two years, more than 150 methods have been proposed. However, existing surveys mainly focus on text-to-image… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2405.05500  [pdf

    cs.RO eess.SY

    Research on the Tender Leaf Identification and Mechanically Perceptible Plucking Finger for High-quality Green Tea

    Authors: Wei Zhang, Yong Chen, Qianqian Wang, Jun Chen

    Abstract: BACKGROUND: Intelligent identification and precise plucking are the keys to intelligent tea harvesting robots, which are of increasing significance nowadays. Aiming at plucking tender leaves for high-quality green tea producing, in this paper, a tender leaf identification algorithm and a mechanically perceptible plucking finger have been proposed. RESULTS: Based on segmentation algorithm and color… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  5. arXiv:2405.05258  [pdf, other

    cs.CV cs.LG cs.RO

    Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

    Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

    Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the effi… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 6 figures, 8 tables; Code at https://github.com/ldkong1205/LaserMix

  6. arXiv:2405.05109  [pdf, other

    cs.CL cs.AI

    QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs

    Authors: Weijia Zhang, Vaishali Pal, Jia-Hong Huang, Evangelos Kanoulas, Maarten de Rijke

    Abstract: Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing que… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 16 pages, 3 figures

  7. arXiv:2405.04936  [pdf, other

    cs.DB

    SPSW: Database Watermarking Based on Fake Tuples and Sparse Priority Strategy

    Authors: Zhiwen Ren, Zehua Ma, Weiming Zhang, Nenghai Yu

    Abstract: Databases play a crucial role in storing and managing vast amounts of data in various organizations and industries. Yet the risk of database leakage poses a significant threat to data privacy and security. To trace the source of database leakage, researchers have proposed many database watermarking schemes. Among them, fake-tuples-based database watermarking shows great potential as it does not mo… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  8. arXiv:2405.04165  [pdf, other

    cs.CL

    LingML: Linguistic-Informed Machine Learning for Enhanced Fake News Detection

    Authors: Jasraj Singh, Fang Liu, Hong Xu, Bee Chin Ng, Wei Zhang

    Abstract: Nowadays, Information spreads at an unprecedented pace in social media and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect with challenging problems like limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 7 pages

  9. arXiv:2405.04121  [pdf, other

    cs.CV

    ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation

    Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

    Abstract: Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, the \textit{weak teacher challenge} arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Mult… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages, 6 figures, ICME 2024 oral

  10. arXiv:2405.04114  [pdf, other

    cs.LG cs.AI

    Acceleration Algorithms in GNNs: A Survey

    Authors: Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui

    Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the resear… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages,3 figures

  11. arXiv:2405.03950  [pdf, other

    cs.LG cs.AI

    Relating-Up: Advancing Graph Neural Networks through Inter-Graph Relationships

    Authors: Qi Zou, Na Yu, Daoliang Zhang, Wei Zhang, Rui Gao

    Abstract: Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a p… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 16 pages, 6 figures, 9 tables

  12. arXiv:2405.03884  [pdf, other

    cs.CV

    BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection

    Authors: Saket S. Chaturvedi, Lan Zhang, Wenbin Zhang, Pan He, Xiaoyong Yuan

    Abstract: 3D object detection plays an important role in autonomous driving; however, its vulnerability to backdoor attacks has become evident. By injecting ''triggers'' to poison the training dataset, backdoor attacks manipulate the detector's prediction for inputs containing these triggers. Existing backdoor attacks against 3D object detection primarily poison 3D LiDAR signals, where large-sized 3D trigge… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at IJCAI 2024 Conference

  13. arXiv:2405.03614  [pdf, ps, other

    cs.IT

    Repairing with Zero Skip Cost

    Authors: Wenqin Zhang, Yeow Meng Chee, Son Hoang Dau, Tuvi Etzion, Han Mao Kiah, Yuan Luo

    Abstract: To measure repair latency at helper nodes, we introduce a new metric called skip cost that quantifies the number of contiguous sections accessed on a disk. We provide explicit constructions of zigzag codes and fractional repetition codes that incur zero skip cost

    Submitted 6 May, 2024; originally announced May 2024.

  14. arXiv:2405.03296  [pdf, ps, other

    cs.LG cs.AI

    Coefficient Decomposition for Spectral Graph Convolution

    Authors: Feng Huang, Wen Zhang

    Abstract: Spectral graph convolutional network (SGCN) is a kind of graph neural networks (GNN) based on graph signal filters, and has shown compelling expressivity for modeling graph-structured data. Most SGCNs adopt polynomial filters and learn the coefficients from the training data. Many of them focus on which polynomial basis leads to optimal expressive power and models' architecture is little discussed… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  15. arXiv:2405.03125  [pdf, other

    cs.IT

    MambaJSCC: Deep Joint Source-Channel Coding with Visual State Space Model

    Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Wenjun Zhang, Ping Zhang

    Abstract: Lightweight and efficient deep joint source-channel coding (JSCC) is a key technology for semantic communications. In this paper, we design a novel JSCC scheme named MambaJSCC, which utilizes a visual state space model with channel adaptation (VSSM-CA) block as its backbone for transmitting images over wireless channels. The VSSM-CA block utilizes VSSM to integrate two-dimensional images with the… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: submitted to IEEE conference

  16. arXiv:2405.03008  [pdf, other

    eess.IV cs.CV cs.LG

    DVMSR: Distillated Vision Mamba for Efficient Super-Resolution

    Authors: Xiaoyan Lei, Wenlong ZHang, Weifeng Cao

    Abstract: Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art Efficient Image Super-Resolution methods are based on convolutional neural networks. Few attempts have been made with Mamba to harness its long-range modeling capability and efficient computational comple… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures

  17. arXiv:2405.02954  [pdf, other

    cs.CV cs.LG

    Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

    Authors: Wenyu Zhang, Li Shen, Chuan-Sheng Foo

    Abstract: Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a large data (e.g. ImageNet) pre-trained feature extractor is used to initialize the sourc… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Extension of ICCV paper arXiv:2212.07585, submitted to IJCV

  18. arXiv:2405.02358  [pdf, other

    cs.LG cs.AI

    A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model

    Authors: Jiexia Ye, Weiqi Zhang, Ke Yi, Yongzi Yu, Ziyue Li, Jia Li, Fugee Tsung

    Abstract: Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explai… ▽ More

    Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 5 figures, 6 tables, 41 pages

  19. arXiv:2405.02355  [pdf, other

    cs.SE cs.AI

    CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation

    Authors: Kounianhua Du, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

    Abstract: Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary existing among natural language and different programming languages. In addition, programming languages are… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  20. arXiv:2405.02301  [pdf, other

    cs.CV

    TFCounter:Polishing Gems for Training-Free Object Counting

    Authors: Pan Ting, Jianfeng Lin, Wenhao Yu, Wenlong Zhang, Xiaoying Chen, Jinlu Zhang, Binqiang Huang

    Abstract: Object counting is a challenging task with broad application prospects in security surveillance, traffic management, and disease diagnosis. Existing object counting methods face a tri-fold challenge: achieving superior performance, maintaining high generalizability, and minimizing annotation costs. We develop a novel training-free class-agnostic object counter, TFCounter, which is prompt-context-a… ▽ More

    Submitted 12 March, 2024; originally announced May 2024.

    Comments: 14pages,11 figuers

    MSC Class: 68

  21. arXiv:2405.01394  [pdf, other

    cs.AI

    Analysis of a Modular Autonomous Driving Architecture: The Top Submission to CARLA Leaderboard 2.0 Challenge

    Authors: Weize Zhang, Mohammed Elmahgiubi, Kasra Rezaee, Behzad Khamidehi, Hamidreza Mirkhani, Fazel Arasteh, Chunlin Li, Muhammad Ahsan Kaleem, Eduardo R. Corral-Soto, Dhruv Sharma, Tongtong Cao

    Abstract: In this paper we present the architecture of the Kyber-E2E submission to the map track of CARLA Leaderboard 2.0 Autonomous Driving (AD) challenge 2023, which achieved first place. We employed a modular architecture for our solution consists of five main components: sensing, localization, perception, tracking/prediction, and planning/control. Our solution leverages state-of-the-art language-assiste… ▽ More

    Submitted 21 March, 2024; originally announced May 2024.

  22. arXiv:2405.00616  [pdf, other

    cs.IT

    An Expectation-Maximization Relaxed Method for Privacy Funnel

    Authors: Lingyi Chen, Jiachuan Ye, Shitong Wu, Huihui Wu, Hao Wu, Wenyi Zhang

    Abstract: The privacy funnel (PF) gives a framework of privacy-preserving data release, where the goal is to release useful data while also limiting the exposure of associated sensitive information. This framework has garnered significant interest due to its broad applications in characterization of the privacy-utility tradeoff. Hence, there is a strong motivation to develop numerical methods with high prec… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  23. arXiv:2405.00545  [pdf, other

    cs.IT math.NA

    A Double Maximization Approach for Optimizing the LM Rate of Mismatched Decoding

    Authors: Lingyi Chen, Shitong Wu, Xinwei Li, Huihui Wu, Hao Wu, Wenyi Zhang

    Abstract: An approach is established for maximizing the Lower bound on the Mismatch capacity (hereafter abbreviated as LM rate), a key performance bound in mismatched decoding, by optimizing the channel input probability distribution. Under a fixed channel input probability distribution, the computation of the corresponding LM rate is a convex optimization problem. When optimizing the channel input probabil… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  24. arXiv:2405.00474  [pdf, other

    cs.IT

    On Convergence of Discrete Schemes for Computing the Rate-Distortion Function of Continuous Source

    Authors: Lingyi Chen, Shitong Wu, Wenyi Zhang, Huihui Wu, Hao Wu

    Abstract: Computing the rate-distortion function for continuous sources is commonly regarded as a standard continuous optimization problem. When numerically addressing this problem, a typical approach involves discretizing the source space and subsequently solving the associated discrete problem. However, existing literature has predominantly concentrated on the convergence analysis of solving discrete prob… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  25. arXiv:2405.00466  [pdf, other

    cs.CV cs.CR

    Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable

    Authors: Haozhe Liu, Wentian Zhang, Bing Li, Bernard Ghanem, Jürgen Schmidhuber

    Abstract: Foundational generative models should be traceable to protect their owners and facilitate safety regulation. To achieve this, traditional approaches embed identifiers based on supervisory trigger-response signals, which are commonly known as backdoor watermarks. They are prone to failure when the model is fine-tuned with nontrigger data. Our experiments show that this vulnerability is due to energ… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  26. arXiv:2405.00435  [pdf, other

    cs.HC

    CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model

    Authors: Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Minfeng Zhu, Yingchaojie Feng, Wei Chen

    Abstract: The integration of new technology with cultural studies enhances our understanding of cultural heritage but often struggles to connect with diverse audiences. It is challenging to align personal interpretations with the intended meanings across different cultures. Our study investigates the important factors in appreciating art from a cross-cultural perspective. We explore the application of Large… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  27. arXiv:2404.19683  [pdf, other

    cs.GT

    Collaborative Control Method of Transit Signal Priority Based on Cooperative Game and Reinforcement Learning

    Authors: Hao Qin, Weishi Zhang

    Abstract: To address the low efficiency in priority signal control within intelligent transportation systems, this study introduces a novel eight-phase priority signal control method, CBQL-TSP, leveraging a hybrid decision-making framework that integrates cooperative game theory and reinforcement learning. This approach conceptualizes the allocation of bus signal priorities as a multi-objective decision-mak… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  28. arXiv:2404.19534  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang , et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  29. arXiv:2404.19326  [pdf, other

    cs.CV

    LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation

    Authors: Lingyi Hong, Zhongying Liu, Wenchao Chen, Chenzhi Tan, Yuang Feng, Xinyu Zhou, Pinxue Guo, Jinglun Li, Zhaoyu Chen, Shuyong Gao, Wei Zhang, Wenqiang Zhang

    Abstract: Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shell VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. However, these benchmarks poorly represent practical applications, and the absence of long-term datasets rest… ▽ More

    Submitted 30 April, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: LVOS V2

  30. arXiv:2404.19311  [pdf, other

    cs.CV cs.MM

    A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images

    Authors: Wang Zhang, Tingting Li, Yuntian Zhang, Gensheng Pei, Xiruo Jiang, Yazhou Yao

    Abstract: Matching visible and near-infrared (NIR) images remains a significant challenge in remote sensing image fusion. The nonlinear radiometric differences between heterogeneous remote sensing images make the image matching task even more difficult. Deep learning has gained substantial attention in computer vision tasks in recent years. However, many methods rely on supervised learning and necessitate l… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: accepted by Information Fusion

  31. arXiv:2404.18648  [pdf, other

    cs.CV

    Uncertainty-boosted Robust Video Activity Anticipation

    Authors: Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang

    Abstract: Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision and autonomous driving. Despite the recent progress, the data uncertainty issue, reflected as the content evolution process and dynamic correlation in event labels, has been somehow ignored. This reduces the model generalization ability and deep understanding… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by T-PAMI

  32. arXiv:2404.18209  [pdf, other

    cs.LG cs.DB

    4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs

    Authors: Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, Zheng Zhang

    Abstract: Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and eva… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Under review

  33. arXiv:2404.18089  [pdf, other

    cs.MA

    ATR-Mapping: Asymmetric Topological Representation based Mapping Framework for Multi-Robot Environment Exploration

    Authors: Hao Zhang, Jiyu Cheng, Wei Zhang

    Abstract: In recent years, the widespread application of multi-robot systems in areas such as power inspection, autonomous vehicle fleets has made multi-robot technology a research hotspot in the field of robotics. This paper investigates multi-robot cooperative exploration in unknown environments, proposing a training framework and decision strategy based on multi-agent reinforcement learning. Specifically… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  34. arXiv:2404.17288  [pdf, other

    cs.IR

    ExcluIR: Exclusionary Neural Information Retrieval

    Authors: Wenhao Zhang, Mengqi Zhang, Shiguang Wu, Jiahuan Pei, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Pengjie Ren

    Abstract: Exclusion is an important and universal linguistic skill that humans use to express what they do not want. However, in information retrieval community, there is little research on exclusionary retrieval, where users express what they do not want in their queries. In this work, we investigate the scenario of exclusionary retrieval in document retrieval for the first time. We present ExcluIR, a set… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  35. arXiv:2404.17215  [pdf, other

    cs.RO cs.CV

    SLAM for Indoor Mapping of Wide Area Construction Environments

    Authors: Vincent Ress, Wei Zhang, David Skuddis, Norbert Haala, Uwe Soergel

    Abstract: Simultaneous localization and mapping (SLAM), i.e., the reconstruction of the environment represented by a (3D) map and the concurrent pose estimation, has made astonishing progress. Meanwhile, large scale applications aiming at the data collection in complex environments like factory halls or construction sites are becoming feasible. However, in contrast to small scale scenarios with building int… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  36. arXiv:2404.16850  [pdf, other

    cs.CR

    Membership Information Leakage in Federated Contrastive Learning

    Authors: Kongyang Chen, Wenfeng Wang, Zixin Wang, Wangjun Zhang, Zhipeng Li, Yao Huang

    Abstract: Federated Contrastive Learning (FCL) represents a burgeoning approach for learning from decentralized unlabeled data while upholding data privacy. In FCL, participant clients collaborate in learning a global encoder using unlabeled data, which can serve as a versatile feature extractor for diverse downstream tasks. Nonetheless, FCL is susceptible to privacy risks, such as membership information le… ▽ More

    Submitted 6 March, 2024; originally announced April 2024.

  37. arXiv:2404.16821  [pdf, other

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai , et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual… ▽ More

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  38. arXiv:2404.16687  [pdf, other

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  39. arXiv:2404.16422  [pdf, other

    cs.CV

    Robust Fine-tuning for Pre-trained 3D Point Cloud Models

    Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

    Abstract: This paper presents a robust fine-tuning method designed for pre-trained 3D point cloud models, to enhance feature robustness in downstream fine-tuned models. We highlight the limitations of current fine-tuning methods and the challenges of learning robust models. The proposed method, named Weight-Space Ensembles for Fine-Tuning then Linear Probing (WiSE-FT-LP), integrates the original pre-trainin… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures

  40. arXiv:2404.16205  [pdf, other

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai , et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  41. arXiv:2404.16006  [pdf, other

    cs.CV

    MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 77 pages, 41 figures

  42. arXiv:2404.15954  [pdf, other

    cs.IR cs.LG

    Mixed Supervised Graph Contrastive Learning for Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu

    Abstract: Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss… ▽ More

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  43. arXiv:2404.15678  [pdf, other

    cs.IR cs.AI

    Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System

    Authors: Lei Zheng, Ning Li, Weinan Zhang, Yong Yu

    Abstract: Current recommendation systems are significantly affected by a serious issue of temporal data shift, which is the inconsistency between the distribution of historical data and that of online data. Most existing models focus on utilizing updated data, overlooking the transferable, temporal data shift-free information that can be learned from shifting data. We propose the Temporal Invariance of Asso… ▽ More

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  44. arXiv:2404.15592  [pdf, other

    cs.CV cs.AI cs.CL cs.IR cs.LG

    ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction

    Authors: Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea

    Abstract: Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction.… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  45. arXiv:2404.14832  [pdf, other

    cs.IT

    GLDPC-PC Codes for MIMO Systems with Iterative Detection and Decoding

    Authors: Binghui Shi, Yongpeng Wu, Yin Xu, Xiqi Gao, Xiaohu You, Wenjun Zhang

    Abstract: In this work, we propose the integration of GLDPC codes with short polar-like component codes, termed GLDPC codes with polar component codes (GLDPC-PC). This approach leverages the good distance properties of polar-like codes and mitigates their high decoding latency in long block lengths. A recently proposed soft-input soft-output decoder for polar-like codes enables effective iterative belief pr… ▽ More

    Submitted 9 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: submitted to globecom 2024

  46. arXiv:2404.14828  [pdf, other

    cs.IT

    GLDPC-PC Codes: Channel Coding Towards 6G Communications

    Authors: Li Shen, Yongpeng Wu, Yin Xu, Xiaohu You, Xiqi Gao, Wenjun Zhang

    Abstract: The sixth generation (6G) wireless communication system will improve the key technical indicators by one to two orders of magnitude, and come with some new features. As a crucial technique to enhance the reliability and efficiency of data transmission, the next generation channel coding is not only required to satisfy the stringent requirements of 6G, but also expected to be backward compatible to… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE Communications Magazine

  47. arXiv:2404.14712  [pdf, other

    physics.ao-ph cs.AI cs.DC eess.IV physics.geo-ph

    ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

    Authors: Xiao Wang, Aristeidis Tsaris, Siyan Liu, Jong-Youl Choi, Ming Fan, Wei Zhang, Junqi Yin, Moetasim Ashfaq, Dan Lu, Prasanna Balaprakash

    Abstract: Earth system predictability is challenged by the complexity of environmental dynamics and the multitude of variables involved. Current AI foundation models, although advanced by leveraging large and heterogeneous data, are often constrained by their size and data integration, limiting their effectiveness in addressing the full range of Earth system prediction challenges. To overcome these limitati… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  48. arXiv:2404.14692  [pdf, other

    cs.SI cs.DB physics.soc-ph

    Deep Overlapping Community Search via Subspace Embedding

    Authors: Qing Sima, Jianke Yu, Xiaoyang Wang, Wenjie Zhang, Ying Zhang, Xuemin Lin

    Abstract: Community search (CS) aims to identify a set of nodes based on a specified query, leveraging structural cohesiveness and attribute homogeneity. This task enjoys various applications, ranging from fraud detection to recommender systems. In contrast to algorithm-based approaches, graph neural network (GNN) based methods define communities using ground truth labels, leveraging prior knowledge to expl… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.13558  [pdf, other

    cs.CV

    LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation

    Authors: Haoyu Zheng, Wenqiao Zhang, Yaoke Wang, Hao Zhou, Jiang Liu, Juncheng Li, Zheqi Lv, Siliang Tang, Yueting Zhuang

    Abstract: Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit the diverse images that convey highly complex visual concepts according to the textual guidance. Despite being promising, existing methods focus on texture- or non-rigid-based visual manipulation, which struggles to produce th… ▽ More

    Submitted 23 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 10 pages, 7 figures

  50. arXiv:2404.12766  [pdf, other

    cs.LG cs.CV

    Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation

    Authors: Wenxuan Zhang, Youssef Mohamed, Bernard Ghanem, Philip H. S. Torr, Adel Bibi, Mohamed Elhoseiny

    Abstract: We propose and study a realistic Continual Learning (CL) setting where learning algorithms are granted a restricted computational budget per time step while training. We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates. Previous proficient CL methods perform very poorly in this challenging setting. Overfitting to the sparse labeled data and ins… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.