
Showing 1–50 of 4,475 results for author: Wang, H

Searching in archive cs.
  1. arXiv:2405.05861  [pdf, other]

    cs.RO cs.AI cs.LG

    ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers

    Authors: Liangliang Chen, Shiyu Jin, Haoyu Wang, Liangjun Zhang

    Abstract: Excavators are crucial for diverse tasks such as construction and mining, while autonomous excavator systems enhance safety and efficiency, address labor shortages, and improve human working conditions. Different from the existing modularized approaches, this paper introduces ExACT, an end-to-end autonomous excavator system that processes raw LiDAR, camera data, and joint positions to control exca…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: ICRA Workshop 2024: 3rd Workshop on Future of Construction: Lifelong Learning Robots in Changing Construction Sites

  2. arXiv:2405.05721  [pdf, other]

    cs.NE

    A Newton Method for Hausdorff Approximations of the Pareto Front within Multi-objective Evolutionary Algorithms

    Authors: Hao Wang, Angel E. Rodriguez-Fernandez, Lourdes Uribe, André Deutz, Oziel Cortés-Piña, Oliver Schütze

    Abstract: A common goal in evolutionary multi-objective optimization is to find suitable finite-size approximations of the Pareto front of a given multi-objective optimization problem. While many multi-objective evolutionary algorithms have proven to be very efficient in finding good Pareto front approximations, they may need quite a few resources or may even fail to obtain optimal or nearly optimal approximations.…

    Submitted 9 May, 2024; originally announced May 2024.
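
The metric named in item 2's title can be made concrete. Below is a minimal Python sketch of the standard Hausdorff distance between two finite point sets, for example a computed Pareto front approximation and a reference front. It is not the authors' Newton method, the helper names `directed_distance` and `hausdorff` and the sample points are hypothetical, and the paper may work with a variant of this distance.

```python
import math

def directed_distance(a_set, b_set):
    # Largest distance from any point in a_set to its nearest point in b_set.
    return max(min(math.dist(a, b) for b in b_set) for a in a_set)

def hausdorff(a_set, b_set):
    # Symmetric Hausdorff distance: the larger of the two directed distances.
    return max(directed_distance(a_set, b_set), directed_distance(b_set, a_set))

# Hypothetical example: a two-point approximation of the linear front f1 + f2 = 1.
approx = [(0.0, 1.0), (1.0, 0.0)]
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(hausdorff(approx, reference))
```

Every point of `approx` lies in `reference`, so one directed distance is 0; the reference midpoint (0.5, 0.5) is about 0.7071 (the square root of 0.5) from its nearest approximation point, and that gap determines the result.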

  3. arXiv:2405.05702  [pdf, other]

    cs.RO

    NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap

    Authors: Mingrui Li, Jingwei Huang, Lei Sun, Aaron Xuxiang Tian, Tianchen Deng, Hongyu Wang

    Abstract: Gaussian Splatting has garnered widespread attention due to its exceptional performance. Consequently, SLAM systems based on Gaussian Splatting have emerged, leveraging its capabilities for rapid real-time rendering and high-fidelity mapping. However, current Gaussian Splatting SLAM systems usually struggle with large scene representation and lack effective loop closure adjustments and scene gener…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  4. arXiv:2405.05648  [pdf, other]

    cs.RO cs.CV

    ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera

    Authors: Jun Shi, Yong A, Yixiang Jin, Dingzhe Li, Haoyu Niu, Zhezhu Jin, He Wang

    Abstract: In this paper, we tackle the problem of grasping transparent and specular objects. This issue holds importance, yet it remains unsolved within the field of robotics because depth cameras fail to recover the accurate geometry of such objects. For the first time, we propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera. ASGrasp utilizes a two-layer learning-based stereo net…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: IEEE International Conference on Robotics and Automation (ICRA), 2024

  5. arXiv:2405.05252  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

    Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

    Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  6. arXiv:2405.04844  [pdf, ps, other]

    cs.IR

    Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems

    Authors: Kai Zheng, Haijun Zhao, Rui Huang, Beichuan Zhang, Na Mou, Yanan Niu, Yang Song, Hongning Wang, Kun Gai

    Abstract: The Probability Ranking Principle (PRP) has been considered as the foundational standard in the design of information retrieval (IR) systems. The principle requires an IR module's returned list of results to be ranked with respect to the underlying user interests, so as to maximize the results' utility. Nevertheless, we point out that it is inappropriate to indiscriminately apply PRP through eve…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by WWW 2024

  7. arXiv:2405.04760  [pdf, other]

    cs.CR cs.AI

    Large Language Models for Cyber Security: A Systematic Literature Review

    Authors: HanXiang Xu, ShenAo Wang, NingKe Li, KaiLong Wang, YanJie Zhao, Kai Chen, Ting Yu, Yang Liu, HaoYu Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we con…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 46 pages, 6 figures

  8. arXiv:2405.04370  [pdf, other]

    cs.CV

    Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos

    Authors: Junyi Ma, Jingyi Xu, Xieyuanli Chen, Hesheng Wang

    Abstract: Understanding how humans would behave during hand-object interaction is vital for applications in service robot manipulation and extended reality. To achieve this, some recent works have been proposed to simultaneously predict hand trajectories and object affordances on human egocentric videos. They are regarded as the representation of future hand-object interactions, indicating potential human m…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2405.04332  [pdf, other]

    cs.CR

    WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets

    Authors: Pengcheng Xia, Yanhui Guo, Zhaowen Lin, Jun Wu, Pengbo Duan, Ningyu He, Kailong Wang, Tianming Liu, Yinliang Yue, Guoai Xu, Haoyu Wang

    Abstract: Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion is accompanied by security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive se…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Just accepted by the Automated Software Engineering Journal

  10. arXiv:2405.04103  [pdf, other]

    cs.CV

    COM3D: Leveraging Cross-View Correspondence and Cross-Modal Mining for 3D Retrieval

    Authors: Hao Wu, Ruochong LI, Hao Wang, Hui Xiong

    Abstract: In this paper, we investigate an open research task of cross-modal retrieval between 3D shapes and textual descriptions. Previous approaches mainly rely on point cloud encoders for feature extraction, which may ignore key inherent features of 3D shapes, including depth, spatial hierarchy, geometric continuity, etc. To address this issue, we propose COM3D, making the first attempt to exploit the cr…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024 (oral)

  11. arXiv:2405.04097  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG cs.MM

    Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes

    Authors: Ammarah Hashmi, Sahibzada Adil Shahzad, Chia-Wen Lin, Yu Tsao, Hsin-Min Wang

    Abstract: The emergence of contemporary deepfakes has attracted significant attention in machine learning research, as artificial intelligence (AI) generated synthetic media increases the incidence of misinterpretation and is difficult to distinguish from genuine content. Currently, machine learning techniques have been extensively studied for automatically detecting deepfakes. However, human perception has…

    Submitted 7 May, 2024; originally announced May 2024.

  12. arXiv:2405.03728  [pdf, other]

    cs.NE cs.AI

    GLHF: General Learned Evolutionary Algorithm Via Hyper Functions

    Authors: Xiaobin Li, Kai Wu, Yujian Betterest Li, Xiaoyu Zhang, Handing Wang, Jing Liu

    Abstract: Pretrained Optimization Models (POMs) leverage knowledge gained from optimizing various tasks, providing efficient solutions for new optimization challenges through direct usage or fine-tuning. Despite the inefficiencies and limited generalization abilities observed in current POMs, our proposed model, the general pre-trained optimization model (GPOM), addresses these shortcomings. GPOM constructs…

    Submitted 6 May, 2024; originally announced May 2024.

  13. arXiv:2405.03217  [pdf, other]

    cs.CR cs.AR

    PCG: Mitigating Conflict-based Cache Side-channel Attacks with Prefetching

    Authors: Fang Jiang, Fei Tong, Hongyu Wang, Xiaoyu Cheng, Zhe Zhou, Ming Ling, Yuxing Mao

    Abstract: To defend against conflict-based cache side-channel attacks, cache partitioning or remapping techniques were proposed to prevent set conflicts between different security domains or obfuscate the locations of such conflicts. But such techniques complicate cache design and may result in significant performance penalties. Therefore, there have been lightweight prefetching-based schemes proposed to in…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures, submitting to a journal

  14. arXiv:2405.03140  [pdf, other]

    cs.LG

    TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning

    Authors: Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

    Abstract: Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., disease-related anomalous points in ECG). To address this challenge, we formally reform…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  15. arXiv:2405.02933  [pdf, other]

    cs.CL

    Relay Decoding: Concatenating Large Language Models for Machine Translation

    Authors: Chengpeng Fu, Xiaocheng Feng, Yichong Huang, Wenshuai Huo, Baohang Li, Hui Wang, Bin Qin, Ting Liu

    Abstract: Leveraging large language models for machine translation has demonstrated promising results. However, it does require the large language models to possess the capability of handling both the source and target languages in machine translation. When it is challenging to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitiga…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Work in progress

  16. arXiv:2405.02911  [pdf, other]

    cs.CV

    Multimodal Sense-Informed Prediction of 3D Human Motions

    Authors: Zhenyu Lou, Qiongjie Cui, Haofan Wang, Xu Tang, Hong Zhou

    Abstract: Predicting future human pose is a fundamental application for machine intelligence, which drives robots to plan their behavior and paths ahead of time to seamlessly accomplish human-robot collaboration in real-world 3D scenarios. Despite encouraging results, existing approaches rarely consider the effects of the external scene on the motion sequence, leading to pronounced artifacts and physical im…

    Submitted 5 May, 2024; originally announced May 2024.

  17. arXiv:2405.02834  [pdf, other]

    cs.CV

    Scene-Adaptive Person Search via Bilateral Modulations

    Authors: Yimin Jiang, Huibing Wang, Jinjia Peng, Xianping Fu, Yang Wang

    Abstract: Person search aims to localize a specific target person from a gallery set of images with various scenes. As the scene around a moving pedestrian changes, the captured person image inevitably brings in a lot of background and foreground noise on the person feature, which is completely unrelated to the person identity, leading to severe performance degeneration. To address this issue, we present a S…

    Submitted 5 May, 2024; originally announced May 2024.

  18. arXiv:2405.02832  [pdf, other]

    cs.CV

    Fast One-Stage Unsupervised Domain Adaptive Person Search

    Authors: Tianxiang Cui, Huibing Wang, Jinjia Peng, Ruoxi Deng, Xianping Fu, Yang Wang

    Abstract: Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations, which is extremely challenging due to the unexpected variations of the unlabeled domains. However, most existing methods are dedicated to developing multi-stage models that adapt to domain variations while using clustering for iterative model training, which inevitably increases mod…

    Submitted 5 May, 2024; originally announced May 2024.

  19. arXiv:2405.02778  [pdf, other]

    cs.IR

    Improve Temporal Awareness of LLMs for Sequential Recommendation

    Authors: Zhendong Chu, Zichao Wang, Ruiyi Zhang, Yangfeng Ji, Hongning Wang, Tong Sun

    Abstract: Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks. However, it is empirically found that LLMs fall short in recognizing and utilizing temporal information, rendering poor performance in tasks that require an understanding of sequential data, such as sequential recommendation. In this paper, we aim to improve temporal awar…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 10 pages

  20. arXiv:2405.02686  [pdf, other]

    cs.CV cs.AI

    Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images

    Authors: Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai

    Abstract: Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscope imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 3 pages

  21. arXiv:2405.02639  [pdf, other]

    cs.RO

    Wall-Climbing Performance of Gecko-inspired Robot with Soft Feet and Digits enhanced by Gravity Compensation

    Authors: Bingcheng Wang, Zhiyuan Weng, Haoyu Wang, Shuangjie Wang, Zhouyi Wang, Zhendong Dai, Ardian Jusufi

    Abstract: Gravitational forces can induce deviations in body posture from desired configurations in multi-legged arboreal robot locomotion with low leg stiffness, affecting the contact angle between the swing leg's end-effector and the climbing surface during the gait cycle. The relationship between desired and actual foot positions is investigated here in a leg-stiffness-enhanced model under external force…

    Submitted 4 May, 2024; originally announced May 2024.

  22. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu…

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  23. arXiv:2405.02004  [pdf, other]

    cs.CV

    M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation

    Authors: Yingshuang Zou, Yikang Ding, Xi Qiu, Haoqian Wang, Haotian Zhang

    Abstract: This paper presents a novel self-supervised two-frame multi-camera metric depth estimation network, termed M${^2}$Depth, which is designed to predict reliable scale-aware surrounding depth in autonomous driving. Unlike the previous works that use multi-view images from a single time-step or multiple time-step images from a single camera, M${^2}$Depth takes temporally adjacent two-frame images from…

    Submitted 3 May, 2024; originally announced May 2024.

  24. arXiv:2405.01761  [pdf, other]

    stat.ML cs.LG

    Multivariate Bayesian Last Layer for Regression: Uncertainty Quantification and Disentanglement

    Authors: Han Wang, Eiji Kawasaki, Guillaume Damblin, Geoffrey Daniel

    Abstract: We present new Bayesian Last Layer models in the setting of multivariate regression under heteroscedastic noise, and propose an optimization algorithm for parameter learning. Bayesian Last Layer combines Bayesian modelling of the predictive distribution with neural networks for parameterization of the prior, and has the attractive property of uncertainty quantification with a single forward pass.…

    Submitted 2 May, 2024; originally announced May 2024.

  25. arXiv:2405.01741  [pdf, other]

    cs.CR cs.AI cs.AR cs.LG

    PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability and Resilience Against Parameter Corruptions

    Authors: Xun Jiao, Fred Lin, Harish D. Dixit, Joel Coburn, Abhinav Pandey, Han Wang, Jianyu Huang, Venkat Ramesh, Wang Xu, Daniel Moore, Sriram Sankar

    Abstract: Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them inevitably and increasingly susceptible to hardware faults (e.g., bit flips) that can potentially corrupt model parameters. Given this challenge, this paper aims to answer a critical…

    Submitted 2 May, 2024; originally announced May 2024.

  26. arXiv:2405.01356  [pdf, other]

    cs.CV

    Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance

    Authors: Kelvin C. K. Chan, Yang Zhao, Xuhui Jia, Ming-Hsuan Yang, Huisheng Wang

    Abstract: In subject-driven text-to-image synthesis, the synthesis process tends to be heavily influenced by the reference images provided by users, often overlooking crucial attributes detailed in the text prompt. In this work, we propose Subject-Agnostic Guidance (SAG), a simple yet effective solution to remedy the problem. We show that through constructing a subject-agnostic condition and applying our pr…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  27. arXiv:2405.01333  [pdf, other]

    cs.RO cs.CV

    NeRF in Robotics: A Survey

    Authors: Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

    Abstract: Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field as implicit representations enable numerous capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of the huge representational advantages, such as sim…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 21 pages, 19 figures

  28. arXiv:2405.01280  [pdf, other]

    cs.CL

    Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation

    Authors: Hao Wang, Tetsuro Morimura, Ukyo Honda, Daisuke Kawahara

    Abstract: Non-autoregressive (NAR) language models are known for their low latency in neural machine translation (NMT). However, a performance gap exists between NAR and autoregressive models due to the large decoding space and difficulty in capturing dependency between target words accurately. Compounding this, preparing appropriate training data for NAR models is a non-trivial task, often exacerbating exp…

    Submitted 2 May, 2024; originally announced May 2024.

  29. arXiv:2405.00957  [pdf, other]

    cs.LG cs.AI cs.SI

    IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors

    Authors: Shenghe Zheng, Hongzhi Wang, Xianglong Liu

    Abstract: Graph Neural Networks (GNNs) demonstrate excellent performance on graphs, with their core idea being to aggregate neighborhood information and learn from labels. However, the prevailing challenges in most graph datasets are twofold: Insufficient High-Quality Labels and Lack of Neighborhoods, resulting in weak GNNs. Existing data augmentation methods designed to address these two issues often t…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 18 pages

  30. arXiv:2405.00715  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation

    Authors: Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Korsapati, Jimeng Sun

    Abstract: Large Language Models (LLMs) have shown promising capabilities in handling clinical text summarization tasks. In this study, we demonstrate that a small open-source LLM can be effectively trained to generate high-quality clinical notes from outpatient patient-doctor dialogues. We achieve this through a comprehensive domain- and task-specific adaptation process for the LLaMA-2 13 billion parameter…

    Submitted 25 April, 2024; originally announced May 2024.

  31. arXiv:2405.00705  [pdf, other]

    cs.CL cs.LG

    SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning

    Authors: Yexiao He, Ziyao Wang, Zheyu Shen, Guoheng Sun, Yucong Dai, Yongkai Wu, Hongyi Wang, Ang Li

    Abstract: The pre-trained Large Language Models (LLMs) can be adapted for many downstream tasks and tailored to align with human preferences through fine-tuning. Recent studies have discovered that LLMs can achieve desirable performance with only a small amount of high-quality data, suggesting that a large amount of the data in these extensive datasets is redundant or even harmful. Identifying high-quality…

    Submitted 23 April, 2024; originally announced May 2024.

  32. arXiv:2405.00704  [pdf, ps, other]

    cs.CL cs.AI

    A Survey on the Real Power of ChatGPT

    Authors: Ming Liu, Ran Liu, Hua Wang, Wray Buntine

    Abstract: ChatGPT has changed the AI community and an active research line is the performance evaluation of ChatGPT. A key challenge for the evaluation is that ChatGPT is still closed-source and traditional benchmark datasets may have been used by ChatGPT as the training data. In this paper, (i) we survey recent studies which uncover the real performance levels of ChatGPT in seven categories of NLP tasks, (…

    Submitted 22 April, 2024; originally announced May 2024.

    Comments: 9 pages, 2 tables

  33. arXiv:2405.00648  [pdf, other]

    cs.SE

    HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models

    Authors: Ningke Li, Yuekang Li, Yi Liu, Ling Shi, Kailong Wang, Haoyu Wang

    Abstract: Large language models (LLMs) have transformed the landscape of language processing, yet struggle with significant challenges in terms of security, privacy, and the generation of seemingly coherent but factually inaccurate outputs, commonly referred to as hallucinations. Among these challenges, one particularly pressing issue is Fact-Conflicting Hallucination (FCH), where LLMs generate content that…

    Submitted 1 May, 2024; originally announced May 2024.

  34. arXiv:2405.00410  [pdf, other]

    cs.LG

    UCB-driven Utility Function Search for Multi-objective Reinforcement Learning

    Authors: Yucheng Shi, Alexandros Agapitos, David Lynch, Giorgio Cruciata, Hao Wang, Yayu Yao, Aleksandar Milenovic

    Abstract: In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems solved simultaneously in order to appr…

    Submitted 1 May, 2024; originally announced May 2024.
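
Item 34 describes decomposition-based MORL, where each utility function turns a vector of per-objective returns into a single objective. Below is a minimal Python sketch of that decomposition, assuming linear (weighted-sum) utilities, which are one common choice and may differ from the utility functions the paper searches over with UCB. For brevity it selects among fixed candidate policies rather than training them; the policy names, returns and weight vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]

def linear_utility(weights: Vector, returns: Vector) -> float:
    # Weighted-sum scalarisation of a vector of per-objective returns.
    return sum(w * r for w, r in zip(weights, returns))

def decompose(policies: Dict[str, Vector], weight_vectors: List[Vector]) -> Dict[Vector, str]:
    # Each weight vector defines one single-objective problem; "solve" it
    # by picking the candidate policy with the highest scalarised return.
    return {
        w: max(policies, key=lambda name: linear_utility(w, policies[name]))
        for w in weight_vectors
    }

# Hypothetical vector returns (objective 1, objective 2) for three policies.
policies = {"a": (1.0, 0.0), "b": (0.6, 0.6), "c": (0.0, 1.0)}
weights = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
print(decompose(policies, weights))
```

With these numbers, each extreme weight vector selects the policy that is best on its single objective, while the balanced weights (0.5, 0.5) select the compromise policy "b" (utility 0.6 versus 0.5 for the others). Which weights or utilities to use is exactly the choice a utility-function search would tune.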

  35. arXiv:2405.00252  [pdf, other]

    quant-ph cs.AI cs.LG

    Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent

    Authors: Pingzhi Li, Junyu Liu, Hanrui Wang, Tianlong Chen

    Abstract: Optimization techniques in deep learning are predominantly led by first-order gradient methodologies, such as SGD. However, neural network training can greatly benefit from the rapid convergence characteristics of second-order optimization. Newton's GD stands out in this category by rescaling the gradient using the inverse Hessian. Nevertheless, one of its major bottlenecks is matrix inversion, w…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Our code is provided at https://github.com/UNITES-Lab/q-newton
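
Item 35 describes Newton's method as rescaling the gradient by the inverse Hessian, with matrix inversion as the main bottleneck. Below is a classical (non-quantum) Python sketch of a single Newton update, assuming NumPy and a hypothetical quadratic objective; it solves the linear system H d = g instead of forming the inverse explicitly, which is the usual numerical practice. It does not represent the paper's hybrid quantum-classical scheduling, and `newton_step` is a hypothetical helper name.

```python
import numpy as np

def newton_step(x, grad, hess):
    # Newton update x - H^{-1} g, computed by solving H d = g
    # rather than forming the inverse of H explicitly.
    d = np.linalg.solve(hess, grad)
    return x - d

# Hypothetical quadratic objective f(x) = 0.5 * x^T A x - b^T x,
# with A symmetric positive definite, so the Hessian of f is A everywhere.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x0 = np.zeros(2)
g = A @ x0 - b              # gradient of f at x0
x1 = newton_step(x0, g, A)
print(x1)
```

Because the objective is quadratic, one step lands on the minimizer A^{-1} b = (0.2, 0.4) up to floating-point rounding; on a neural network loss the Hessian changes at every step, which is why the repeated linear solves (or inversions) dominate the cost.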

  36. arXiv:2405.00168  [pdf, other]

    cs.CV

    Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method

    Authors: Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

    Abstract: RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing benchmarks predominantly consist of videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quali…

    Submitted 30 April, 2024; originally announced May 2024.

  37. arXiv:2404.19723  [pdf, other]

    eess.AS cs.SD

    Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech

    Authors: Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu

    Abstract: Recent popular decoder-only text-to-speech models are known for their ability to generate natural-sounding speech. However, such models sometimes suffer from word skipping and repeating due to the lack of explicit monotonic alignment constraints. In this paper, we notice from the attention maps that some particular attention heads of the decoder-only model indicate the alignments between speech…

    Submitted 30 April, 2024; originally announced April 2024.

  38. arXiv:2404.19462  [pdf, other]

    cs.LG

    Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation

    Authors: Cengis Hasan, Alexandros Agapitos, David Lynch, Alberto Castagna, Giorgio Cruciata, Hao Wang, Aleksandar Milenovic

    Abstract: We present a method that addresses the pain point of long lead-time required to deploy cell-level parameter optimisation policies to new wireless network sites. Given a sequence of action spaces represented by overlapping subsets of cell-level configuration parameters provided by domain experts, we formulate throughput optimisation as Continual Reinforcement Learning of control policies. Simulatio…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Published at ECML 2023

  39. arXiv:2404.19307  [pdf, other]

    cs.SE cs.CR

    Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey

    Authors: Han Hu, Han Wang, Ruiqi Dong, Xiao Chen, Chunyang Chen

    Abstract: Mobile apps are ubiquitous in our daily lives for supporting different tasks such as reading and chatting. Despite the availability of many GUI testing tools, app testers still struggle with low testing code coverage due to tools frequently getting stuck in loops or overlooking activities with concealed entries. This results in a significant amount of testing time being spent on redundant and repe…

    Submitted 30 April, 2024; originally announced April 2024.

  40. arXiv:2404.18758  [pdf, other]

    cs.CV cs.LG

    Transitive Vision-Language Prompt Learning for Domain Generalization

    Authors: Liyuan Wang, Yan Jin, Zhen Chen, Jinlin Wu, Mengke Li, Yang Lu, Hanzi Wang

    Abstract: Vision-language pre-training has enabled deep models to make a huge step forward in generalizing across unseen domains. The recent learning method based on the vision-language pre-training model is a great tool for domain generalization and can solve this problem to a large extent. However, some issues remain: such methods still suffer from a trade-off between domain invariance…

    Submitted 29 April, 2024; originally announced April 2024.

  41. arXiv:2404.18539  [pdf, other]

    cs.CV cs.AI

    Enhancing Boundary Segmentation for Topological Accuracy with Skeleton-based Methods

    Authors: Chuni Liu, Boyuan Ma, Xiaojuan Ban, Yujie Xie, Hao Wang, Weihua Xue, Jingchao Ma, Ke Xu

    Abstract: Topological consistency plays a crucial role in the task of boundary segmentation for reticular images, such as cell membrane segmentation in neuron electron microscopic images, grain boundary segmentation in material microscopic images and road segmentation in aerial images. In these fields, topological changes in segmentation results have a serious impact on the downstream tasks, which can even…

    Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  42. arXiv:2404.18534  [pdf, other]

    cs.CL cs.AI cs.CR cs.SE

    Evaluating and Mitigating Linguistic Discrimination in Large Language Models

    Authors: Guoliang Dong, Haoyu Wang, Jun Sun, Xinyu Wang

    Abstract: By training on text in various languages, large language models (LLMs) typically possess multilingual support and demonstrate remarkable capabilities in solving tasks described in different languages. However, LLMs can exhibit linguistic discrimination due to the uneven distribution of training data across languages. That is, it is hard for LLMs to keep their responses consistent when faced with the s…

    Submitted 29 April, 2024; originally announced April 2024.

  43. arXiv:2404.18392  [pdf, other]

    cs.DC

    Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

    Authors: Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

    Abstract: In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple…

    Submitted 28 April, 2024; originally announced April 2024.

  44. arXiv:2404.18319

    cs.IR

    User Welfare Optimization in Recommender Systems with Competing Content Creators

    Authors: Fan Yao, Yiming Liao, Mingzhe Wu, Chuanhao Li, Yan Zhu, James Yang, Qifan Wang, Haifeng Xu, Hongning Wang

    Abstract: Driven by the new economic opportunities created by the creator economy, an increasing number of content creators rely on and compete for revenue generated from online content recommendation platforms. This burgeoning competition reshapes the dynamics of content distribution and profoundly impacts long-term user welfare on the platform. However, the absence of a comprehensive picture of global use…

    Submitted 28 April, 2024; originally announced April 2024.

  45. arXiv:2404.18279

    cs.CV

    Out-of-distribution Detection in Medical Image Analysis: A survey

    Authors: Zesheng Hong, Yubiao Yue, Yubin Chen, Huanjie Lin, Yuanmei Luo, Mini Han Wang, Weidong Wang, Jialong Xu, Xiaoqi Yang, Zhenzhang Li, Sihong Xie

    Abstract: Computer-aided diagnostics has benefited from the development of deep learning-based computer vision techniques in recent years. Traditional supervised deep learning methods assume that the test sample is drawn from the same distribution as the training data. However, it is possible to encounter out-of-distribution samples in real-world clinical scenarios, which may cause silent failure in dee…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 23 pages, 3 figures

  46. arXiv:2404.18246

    cs.LG cs.CV

    AdaFSNet: Time Series Classification Based on Convolutional Network with an Adaptive and Effective Kernel Size Configuration

    Authors: Haoxiao Wang, Bo Peng, Jianhua Zhang, Xu Cheng

    Abstract: Time series classification is one of the most critical and challenging problems in data mining, existing widely in various fields and holding significant research importance. Despite extensive research and notable achievements with successful real-world applications, addressing the challenge of capturing the appropriate receptive field (RF) size from one-dimensional or multi-dimensional time serie…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCNN 2024

  47. Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We?

    Authors: Kaixuan Li, Yue Xue, Sen Chen, Han Liu, Kairan Sun, Ming Hu, Haijun Wang, Yang Liu, Yixiang Chen

    Abstract: In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies o…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: to appear at FSE 2024

  48. arXiv:2404.17916

    cs.LG cs.AI

    FedCRL: Personalized Federated Learning with Contrastive Shared Representations for Label Heterogeneity in Non-IID Data

    Authors: Chenghao Huang, Xiaolu Chen, Yanru Zhang, Hao Wang

    Abstract: To deal with heterogeneity resulting from label distribution skew and data scarcity in distributed machine learning scenarios, this paper proposes a novel Personalized Federated Learning (PFL) algorithm, named Federated Contrastive Representation Learning (FedCRL). FedCRL introduces contrastive representation learning (CRL) on shared representations to facilitate knowledge acquisition of clients.…

    Submitted 27 April, 2024; originally announced April 2024.

  49. arXiv:2404.17774

    cs.CV cs.GR

    High-quality Surface Reconstruction using Gaussian Surfels

    Authors: Pinxuan Dai, Jiamin Xu, Wenxiang Xie, Xinguo Liu, Huamin Wang, Weiwei Xu

    Abstract: We propose a novel point-based representation, Gaussian surfels, to combine the advantages of the flexible optimization procedure in 3D Gaussian points and the surface alignment property of surfels. This is achieved by directly setting the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D ellipsoid into a 2D ellipse. Such a design provides clear guidance to the optimizer.…

    Submitted 29 April, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Results added and improved

  50. arXiv:2404.17486

    cs.CV

    TextGaze: Gaze-Controllable Face Generation with Natural Language

    Authors: Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang

    Abstract: Generating face images with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting its application. In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describ…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Under review