Showing 1–50 of 1,370 results for author: Xu, C

Searching in archive cs.
  1. arXiv:2405.05830  [pdf, ps, other]

    cs.CV

    Mask-TS Net: Mask Temperature Scaling Uncertainty Calibration for Polyp Segmentation

    Authors: Yudian Zhang, Chenhao Xu, Kaiye Xu, Haijiang Zhu

    Abstract: Lots of popular calibration methods in medical images focus on classification, but there are few comparable studies on semantic segmentation. In polyp segmentation of medical images, we find most diseased area occupies only a small portion of the entire image, resulting in previous models being not well-calibrated for lesion regions but well-calibrated for background, despite their seemingly bette…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05542  [pdf, other]

    cs.RO cs.MA

    Dynamic Deep Factor Graph for Multi-Agent Reinforcement Learning

    Authors: Yuchen Shi, Shihong Duan, Cheng Xu, Ran Wang, Fangwen Ye, Chau Yuen

    Abstract: This work introduces a novel value decomposition algorithm, termed Dynamic Deep Factor Graphs (DDFG). Unlike traditional coordination graphs, DDFG leverages factor graphs to articulate the decomposition of value functions, offering enhanced flexibility and adaptability to complex value function structures. Central to DDFG is a graph structure generation policy that innovatively generates…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: submitted to Nature Communications

  3. arXiv:2405.04249  [pdf, other]

    cs.LG cs.AI cs.DC

    Federated Learning for Cooperative Inference Systems: The Case of Early Exit Networks

    Authors: Caelin Kaplan, Tareq Si Salem, Angelo Rodio, Chuan Xu, Giovanni Neglia

    Abstract: As Internet of Things (IoT) technology advances, end devices like sensors and smartphones are progressively equipped with AI models tailored to their local memory and computational constraints. Local inference reduces communication costs and latency; however, these smaller models typically underperform compared to more sophisticated models deployed on edge servers or in the cloud. Cooperative Infe…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2405.04122  [pdf, other]

    cs.LG cs.DC

    Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning

    Authors: Chunlin Tian, Zhan Shi, Xinpeng Qin, Li Li, Chengzhong Xu

    Abstract: Federated Learning (FL) enables multiple devices to collaboratively train a shared model while ensuring data privacy. The selection of participating devices in each training round critically affects both the model performance and training efficiency, especially given the vast heterogeneity in training capabilities and data distribution across devices. To address these challenges, we introduce a no…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  5. arXiv:2405.03355  [pdf, other]

    cs.LG cs.CV

    On the Theory of Cross-Modality Distillation with Contrastive Learning

    Authors: Hangyu Lin, Chen Liu, Chengming Xu, Zhengqi Gao, Yanwei Fu, Yuan Yao

    Abstract: Cross-modality distillation arises as an important topic for data modalities containing limited knowledge such as depth maps and high-quality sketches. Such techniques are of great importance, especially for memory and privacy-restricted scenarios where labeled training data is generally unavailable. To solve the problem, existing label-free methods leverage a few pairwise unlabeled data to distil…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.03192  [pdf, other]

    cs.LG cs.AI

    QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation

    Authors: Chenhui Xu, Xinyao Wang, Fuxun Yu, Jinjun Xiong, Xiang Chen

    Abstract: Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framewor…

    Submitted 8 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2405.02730  [pdf, other]

    cs.CV

    U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

    Authors: Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang

    Abstract: Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of U-Net by DiTs and their following improvements is worth rethinking. To this end, we conduct a simple toy…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  8. arXiv:2405.02145  [pdf, other]

    cs.RO

    Characterized Diffusion and Spatial-Temporal Interaction Network for Trajectory Prediction in Autonomous Driving

    Authors: Haicheng Liao, Xuelin Li, Yongkang Li, Hanlin Kong, Chengyue Wang, Bonan Wang, Yanchen Guan, KaHou Tam, Zhenning Li, Chengzhong Xu

    Abstract: Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  9. arXiv:2405.01882  [pdf, other]

    cs.RO cs.AI eess.SP

    Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot

    Authors: Zhanzhong Gu, Xiangjian He, Gengfa Fang, Chengpei Xu, Feng Xia, Wenjing Jia

    Abstract: Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter cha…

    Submitted 3 May, 2024; originally announced May 2024.

  10. arXiv:2405.01333  [pdf, other]

    cs.RO cs.CV

    NeRF in Robotics: A Survey

    Authors: Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

    Abstract: Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field as implicit representations enable numerous capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of the huge representational advantages, such as sim…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 21 pages, 19 figures

  11. arXiv:2405.01266  [pdf, other]

    cs.RO cs.AI

    MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving

    Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Bonan Wang, Dongping Liao, Guofa Li, Chengzhong Xu

    Abstract: This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph conv…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  12. arXiv:2405.01029  [pdf, other]

    cs.AI cs.LG

    MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

    Authors: Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Chi Xu

    Abstract: Learning to solve vehicle routing problems (VRPs) has garnered much attention. However, most neural solvers are only structured and trained independently on a specific problem, making them less generic and practical. In this paper, we aim to develop a unified neural solver that can cope with a range of VRP variants simultaneously. Specifically, we propose a multi-task vehicle routing solver with m…

    Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024

  13. arXiv:2405.00362  [pdf, other]

    cs.RO cs.CG cs.GR

    Implicit Swept Volume SDF: Enabling Continuous Collision-Free Trajectory Generation for Arbitrary Shapes

    Authors: Jingping Wang, Tingrui Zhang, Qixuan Zhang, Chuxiao Zeng, Jingyi Yu, Chao Xu, Lan Xu, Fei Gao

    Abstract: In the field of trajectory generation for objects, ensuring continuous collision-free motion remains a huge challenge, especially for non-convex geometries and complex environments. Previous methods either oversimplify object shapes, which results in a sacrifice of feasible space or rely on discrete sampling, which suffers from the "tunnel effect". To address these limitations, we propose a novel…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGGRAPH 2024 & TOG. Joint first authors: Jingping Wang, Tingrui Zhang. Joint corresponding authors: Fei Gao, Lan Xu

  14. arXiv:2405.00266  [pdf, other]

    cs.NI

    Robot-As-A-Sensor: Forming a Sensing Network with Robots for Underground Mining Missions

    Authors: Xiaoyu Ai, Chengpei Xu, Binghao Li, Feng Xia

    Abstract: Nowadays, robots are deployed as mobile platforms equipped with sensing, communication and computing capabilities, especially in the mining industry, where they perform tasks in hazardous and repetitive environments. Despite their potential, individual robots face significant limitations when completing complex tasks that require the collaboration of multiple robots. This collaboration requires a…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Submitted to Special Issue on Neuro-Inspired Learning for Robotics for IEEE Transactions on Cognitive and Developmental Systems

  15. arXiv:2405.00026  [pdf]

    cs.CE cs.AI

    Enhancing Credit Card Fraud Detection A Neural Network and SMOTE Integrated Approach

    Authors: Mengran Zhu, Ye Zhang, Yulu Gong, Changxin Xu, Yafei Xiang

    Abstract: Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance. The study addresses the inherent imbalance in credit card transact…

    Submitted 26 February, 2024; originally announced May 2024.

  16. arXiv:2404.19245  [pdf, other]

    cs.CL cs.AI

    HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning

    Authors: Chunlin Tian, Zhan Shi, Zhijiang Guo, Li Li, Chengzhong Xu

    Abstract: Adapting Large Language Models (LLMs) to new tasks through fine-tuning has been made more efficient by the introduction of Parameter-Efficient Fine-Tuning (PEFT) techniques, such as LoRA. However, these methods often underperform compared to full fine-tuning, particularly in scenarios involving complex datasets. This issue becomes even more pronounced in complex domains, highlighting the need for…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures

  17. arXiv:2404.18947  [pdf, other]

    cs.LG cs.AI

    Multimodal Fusion on Low-quality Data: A Comprehensive Survey

    Authors: Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang

    Abstract: Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored especially under low-quality data settings. This paper surveys the common challenges…

    Submitted 5 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Feel free to comment on our manuscript: qingyangzhang@tju.edu.cn

  18. arXiv:2404.17826  [pdf, other]

    cs.IR

    A Taxation Perspective for Fair Re-ranking

    Authors: Chen Xu, Xiaopeng Ye, Wenjie Wang, Liang Pang, Jun Xu, Tat-Seng Chua

    Abstract: Fair re-ranking aims to redistribute ranking slots among items more equitably to ensure responsibility and ethics. The exploration of redistribution problems has a long history in economics, offering valuable insights for conceptualizing fair re-ranking as a taxation process. Such a formulation provides us with a fresh perspective to re-examine fair re-ranking and inspire the development of new me…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted in SIGIR 2024

  19. arXiv:2404.17617  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning

    Authors: Tao Liu, Yuhang Zhang, Zhu Feng, Zhiqin Yang, Chen Xu, Dapeng Man, Wu Yang

    Abstract: Backdoors on federated learning will be diluted by subsequent benign updates. This is reflected in the significant reduction of attack success rate as iterations increase, ultimately failing. We use a new metric to quantify the degree of this weakened backdoor effect, called attack persistence. Given that research to improve this performance has not been widely noted, we propose a Full Combination…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(19): 21359-21367

  20. arXiv:2404.17589  [pdf]

    cs.IR cs.LG

    An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

    Authors: Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

    Abstract: As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. However,…

    Submitted 6 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  21. arXiv:2404.17520  [pdf, other]

    cs.RO

    A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environment

    Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui, Chengzhong Xu

    Abstract: As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traff…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  22. arXiv:2404.17485  [pdf, other]

    cs.NI

    A Survey on Industrial Internet of Things (IIoT) Testbeds for Connectivity Research

    Authors: Tianyu Zhang, Chuanyu Xue, Jiachen Wang, Zelin Yun, Natong Lin, Song Han

    Abstract: Industrial Internet of Things (IIoT) technologies have revolutionized industrial processes, enabling smart automation, real-time data analytics, and improved operational efficiency across diverse industry sectors. IIoT testbeds play a critical role in advancing IIoT research and development (R&D) to provide controlled environments for technology evaluation before their real-world deployment. In th…

    Submitted 26 April, 2024; originally announced April 2024.

  23. arXiv:2404.17238  [pdf, other]

    cs.IR

    TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content

    Authors: Meng Yan, Haibin Huang, Ying Liu, Juan Zhao, Xiyue Gao, Cai Xu, Ziyu Guan, Wei Zhao

    Abstract: Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers aim to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews, images, etc. This content often contains inevitable noise. Some studies attempt to reduce noise interference by suppressing cross-modal incon…

    Submitted 26 April, 2024; originally announced April 2024.

  24. arXiv:2404.17151  [pdf, other]

    cs.MM cs.CV

    MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

    Authors: Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

    Abstract: Bottom-up text detection methods play an important role in arbitrary-shape scene text detection but there are two restrictions preventing them from achieving their great potential, i.e., 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a n…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by Transactions on Multimedia

  25. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  26. arXiv:2404.15961  [pdf, other]

    eess.SP cs.AI

    Soil analysis with machine-learning-based processing of stepped-frequency GPR field measurements: Preliminary study

    Authors: Chunlei Xu, Michael Pregesbauer, Naga Sravani Chilukuri, Daniel Windhager, Mahsa Yousefi, Pedro Julian, Lothar Ratschbacher

    Abstract: Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Continuous Wave Radar (SFCW) measurements hold the promise to give cost effective access to depth resolved soil parameters, including at root-level depth. In a first step…

    Submitted 24 April, 2024; originally announced April 2024.

  27. arXiv:2404.15254  [pdf, other]

    cs.CV

    UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

    Authors: Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

    Abstract: This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios. The UniMER dataset consists of a large-scale training set UniMER-1M offering an unprecedented scale and diversity with one million training instances and a meticulously designed test set UniMER-Test that reflects a diverse range of formula distributio…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

  28. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, et al. (62 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 12 pages

  29. arXiv:2404.13903  [pdf, other]

    cs.CV

    Accelerating Image Generation with Sub-path Linear Approximation Model

    Authors: Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

    Abstract: Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining hi…

    Submitted 22 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  30. arXiv:2404.13400  [pdf, other]

    cs.CV

    HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding

    Authors: Linhui Xiao, Xiaoshan Yang, Fang Peng, Yaowei Wang, Changsheng Xu

    Abstract: Visual grounding, which aims to ground a visual region via natural language, is a task that heavily relies on cross-modal alignment. Existing works utilized uni-modal pre-trained models to transfer visual/linguistic knowledge separately while ignoring the multimodal corresponding information. Motivated by recent advancements in contrastive language-image pre-training and low-rank adaptation (LoRA)…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: The project page: https://github.com/linhuixiao/HiVG

  31. arXiv:2404.13349  [pdf, other]

    cs.DC cs.LG

    Breaking the Memory Wall for Heterogeneous Federated Learning with Progressive Training

    Authors: Yebo Wu, Li Li, Chunlin Tian, Chengzhong Xu

    Abstract: This paper presents ProFL, a novel progressive FL framework to effectively break the memory wall. Specifically, ProFL divides the model into different blocks based on its original architecture. Instead of updating the full model in each training round, ProFL first trains the front blocks and safely freezes them after convergence. Training of the next block is then triggered. This process iterates…

    Submitted 20 April, 2024; originally announced April 2024.

  32. arXiv:2404.12353  [pdf, other]

    cs.CV cs.AI

    V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

    Authors: Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

    Abstract: Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective fine-tuning of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooki…

    Submitted 18 April, 2024; originally announced April 2024.

  33. arXiv:2404.12104  [pdf, other]

    cs.CV cs.CL cs.LG

    Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

    Authors: Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

    Abstract: The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designe…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 42 pages, 17 figures, 29 tables

  34. arXiv:2404.11968  [pdf, other]

    cs.CL

    P-NAL: an Effective and Interpretable Entity Alignment Method

    Authors: Chuanhao Xu, Jingwei Cheng, Fu Zhang

    Abstract: Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, triples as embeddings' constraint and learn to align the embeddings. The structural and side information are usually utilized via embedding propagation, aggregation or interaction. However, the details of the underlying logical inference st…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 2 figures

    ACM Class: I.2.4

  35. arXiv:2404.11944  [pdf, other]

    cs.LG

    Trusted Multi-view Learning with Label Noise

    Authors: Cai Xu, Yilin Zhang, Ziyu Guan, Wei Zhao

    Abstract: Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty, which significantly restricts their applications in safety-critical applications. To address this issue, researchers propose trusted multi-view methods that learn the class distribution for each instance, enabling the estimation of classification probabilities and uncertainty. However,…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures, accepted at IJCAI 2024

    MSC Class: I.2.6

  36. arXiv:2404.11457  [pdf, other]

    cs.IR cs.AI cs.CL

    Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models

    Authors: Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, Jun Xu

    Abstract: With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of biases and unfairness, which may threaten the information ecosystem. In this paper, we present a compre…

    Submitted 17 April, 2024; originally announced April 2024.

  37. arXiv:2404.11291  [pdf, other]

    cs.CV

    Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

    Authors: Buzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, Gim Hee Lee

    Abstract: Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration, but overlook the modeling of close interactions. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. The main challenge of this task comes from insufficient visual information caused by depth ambiguity and severe inter-person occ…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  38. arXiv:2404.11161  [pdf, other]

    cs.CV cs.LG

    Pre-processing matters: A segment search method for WSI classification

    Authors: Jun Wang, Yufei Cui, Yu Mao, Nan Guan, Chun Jason Xue

    Abstract: Pre-processing for whole slide images can affect classification performance both in the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach f…

    Submitted 17 April, 2024; originally announced April 2024.

  39. arXiv:2404.10955  [pdf, other]

    cs.DS

    The Traveling Tournament Problem: Improved Algorithms Based on Cycle Packing

    Authors: Jingyang Zhao, Mingyu Xiao, Chao Xu

    Abstract: The Traveling Tournament Problem (TTP) is a well-known benchmark problem in the field of tournament timetabling, which asks us to design a double round-robin schedule such that each pair of teams plays one game in each other's home venue, minimizing the total distance traveled by all $n$ teams ($n$ is even). TTP-$k$ is the problem with one more constraint that each team can have at most $k$-consec…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: A preliminary version of this article was presented at MFCS 2022; submitted in 2022

  40. arXiv:2404.10541  [pdf, other]

    cs.RO

    MPCOM: Robotic Data Gathering with Radio Mapping and Model Predictive Communication

    Authors: Zhiyou Ji, Guoliang Li, Ruihua Han, Shuai Wang, Bing Bai, Wei Xu, Kejiang Ye, Chengzhong Xu

    Abstract: Robotic data gathering (RDG) is an emerging paradigm that navigates a robot to harvest data from remote sensors. However, motion planning in this paradigm needs to maximize the RDG efficiency instead of the navigation efficiency, for which the existing motion planning methods become inefficient, as they plan robot trajectories merely according to motion factors. This paper proposes radio map guide…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Submitted to IROS

  41. arXiv:2404.10178  [pdf, other]

    q-bio.BM cs.CV

    CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders

    Authors: Chentianye Xu, Xueying Zhan, Min Xu

    Abstract: Cryo-electron microscopy (cryo-EM) emerges as a pivotal technology for determining the architecture of cells, viruses, and protein assemblies at near-atomic resolution. Traditional particle picking, a key step in cryo-EM, struggles with manual effort and automated methods' sensitivity to low signal-to-noise ratio (SNR) and varied particle orientations. Furthermore, existing neural network (NN)-bas…

    Submitted 15 April, 2024; originally announced April 2024.

  42. arXiv:2404.09496  [pdf, other]

    cs.CV

    Towards Collaborative Autonomous Driving: Simulation Platform and End-to-End System

    Authors: Genjia Liu, Yue Hu, Chenxin Xu, Weibo Mao, Junhao Ge, Zhengxiang Huang, Yifan Lu, Yinda Xu, Junkai Xia, Yafei Wang, Siheng Chen

    Abstract: Vehicle-to-everything-aided autonomous driving (V2X-AD) has a huge potential to provide a safer driving solution. Despite extensive research in transportation and communication to support V2X-AD, the actual utilization of these infrastructures and communication resources in enhancing driving performances remains largely unexplored. This highlights the necessity of collaborative autonomous drivin…

    Submitted 15 April, 2024; originally announced April 2024.

  43. arXiv:2404.08965  [pdf, other]

    cs.CV cs.MM

    Seeing Text in the Dark: Algorithm and Benchmark

    Authors: Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang

    Abstract: Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for l…

    Submitted 23 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  44. arXiv:2404.08412  [pdf, other]

    physics.flu-dyn cs.AI

    PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction

    Authors: Siming Shan, Pengkai Wang, Song Chen, Jiaxu Liu, Chao Xu, Shengze Cai

    Abstract: The use of machine learning in fluid dynamics is becoming more common to expedite the computation when solving forward and inverse problems of partial differential equations. Yet, a notable challenge with existing convolutional neural network (CNN)-based methods for data fidelity enhancement is their reliance on specific low-fidelity data patterns and distributions during the training phase. In ad…

    Submitted 9 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 22 pages

  45. arXiv:2404.08364  [pdf, other]

    cs.DC

    FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

    Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

    Abstract: Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarras…

    Submitted 26 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  46. arXiv:2404.06676  [pdf]

    cs.LG eess.SP stat.AP

    Topological Feature Search Method for Multichannel EEG: Application in ADHD classification

    Authors: Tianming Cai, Guoying Zhao, Junbin Zang, Chen Zong, Zhidong Zhang, Chenyang Xue

    Abstract: In recent years, the preliminary diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) using electroencephalography (EEG) has garnered attention from researchers. EEG, known for its expediency and efficiency, plays a pivotal role in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG signals and inter-subject variability pose challenges to the diagnostic and classifica…

    Submitted 9 April, 2024; originally announced April 2024.

  47. arXiv:2404.05979  [pdf, other]

    cs.CV

    StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

    Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

    Abstract: Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although these models have shown notable progress, there are still three flaws. 1) The unidirectional generation of auto-regressive manner restricts the usability i…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 17 pages

  48. arXiv:2404.05285  [pdf, other]

    cs.CV

    Detecting Every Object from Events

    Authors: Haitian Zhang, Chang Xu, Xinya Wang, Bingde Liu, Guang Hua, Lei Yu, Wen Yang

    Abstract: Object detection is critical in autonomous driving, and it is more practical yet challenging to localize objects of unknown categories: an endeavour known as Class-Agnostic Object Detection (CAOD). Existing studies on CAOD predominantly rely on ordinary cameras, but these frame-based sensors usually have high latency and limited dynamic range, leading to safety risks in real-world scenarios. In th…

    Submitted 8 April, 2024; originally announced April 2024.

  49. arXiv:2404.05196  [pdf, other]

    cs.CV

    HSViT: Horizontally Scalable Vision Transformer

    Authors: Chenhao Xu, Chang-Tsun Li, Chee Peng Lim, Douglas Creighton

    Abstract: While the Vision Transformer (ViT) architecture gains prominence in computer vision and attracts significant attention from multimedia communities, its deficiency in prior knowledge (inductive bias) regarding shift, scale, and rotational invariance necessitates pre-training on large-scale datasets. Furthermore, the growing layers and parameters in both ViT and convolutional neural networks (CNNs)…

    Submitted 8 April, 2024; originally announced April 2024.

  50. arXiv:2404.04875  [pdf, other]

    cs.CV

    NeRF2Points: Large-Scale Point Cloud Generation From Street Views' Radiance Field Optimization

    Authors: Peng Tu, Xun Zhou, Mingming Wang, Xiaojun Yang, Bo Peng, Ping Chen, Xiu Su, Yawen Huang, Yefeng Zheng, Chang Xu

    Abstract: Neural Radiance Fields (NeRF) have emerged as a paradigm-shifting methodology for the photorealistic rendering of objects and environments, enabling the synthesis of novel viewpoints with remarkable fidelity. This is accomplished through the strategic utilization of object-centric camera poses characterized by significant inter-frame overlap. This paper explores a compelling, alternative utility o…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 18 pages