
Showing 1–50 of 127 results for author: Sheng, Z

Searching in archive cs.
  1. arXiv:2405.17764  [pdf, other]

    cs.CL cs.AI math.ST

    On the Sequence Evaluation based on Stochastic Processes

    Authors: Tianhao Zhang, Zhexiao Lin, Zhecheng Sheng, Chen Jiang, Dongyeop Kang

    Abstract: Modeling and analyzing long sequences of text is an essential task for Natural Language Processing. Success in capturing long text dynamics using neural language models will facilitate many downstream tasks such as coherence evaluation, text generation, machine translation and so on. This paper presents a novel approach to model sequences through a stochastic process. We introduce a likelihood-bas…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.14365  [pdf, other]

    cs.CL cs.AI

    JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

    Authors: Kun Zhou, Beichen Zhang, Jiapeng Wang, Zhipeng Chen, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang, Ji-Rong Wen

    Abstract: Mathematical reasoning is an important capability of large language models (LLMs) for real-world applications. To enhance this capability, existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs (e.g., GPT-4) to synthesize massive math problems. Both types of work generally lead to large costs in training or synthesis. To reduce the cost, based on op…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 28 pages, SOTA math LLM using Well-trained Data Synthesis LLM

  3. arXiv:2405.07164  [pdf, other]

    cs.CV

    Modeling Pedestrian Intrinsic Uncertainty for Multimodal Stochastic Trajectory Prediction via Energy Plan Denoising

    Authors: Yao Liu, Quan Z. Sheng, Lina Yao

    Abstract: Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoisi…

    Submitted 12 May, 2024; originally announced May 2024.

  4. arXiv:2405.07046  [pdf, other]

    cs.CV

    Retrieval Enhanced Zero-Shot Video Captioning

    Authors: Yunchuan Ma, Laiyun Qing, Guorong Li, Yuankai Qi, Quan Z. Sheng, Qingming Huang

    Abstract: Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose to take advantage of existing pre-trained large-scale vision and language models to directly generate captions with test time adaptation. Specifically, we bridge video and text using three key models: a general video understanding model XCLIP, a general imag…

    Submitted 11 May, 2024; originally announced May 2024.

  5. arXiv:2405.07041  [pdf, other]

    cs.RO cs.CV

    Multi-agent Traffic Prediction via Denoised Endpoint Distribution

    Authors: Yao Liu, Ruoyu Wang, Yuanjiang Cao, Quan Z. Sheng, Lina Yao

    Abstract: The exploration of high-speed movement by robots or road traffic agents is crucial for autonomous driving and navigation. Trajectory prediction at high speeds requires considering historical features and interactions with surrounding entities, a complexity not as pronounced in lower-speed environments. Prior methods have assessed the spatio-temporal dynamics of agents but often neglected intrinsic…

    Submitted 11 May, 2024; originally announced May 2024.

  6. arXiv:2405.04114  [pdf, other]

    cs.LG cs.AI

    Acceleration Algorithms in GNNs: A Survey

    Authors: Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui

    Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the resear…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages, 3 figures

  7. arXiv:2405.01844  [pdf, other]

    cs.NI cs.CR cs.DC

    A Survey on Privacy-Preserving Caching at Network Edge: Classification, Solutions, and Challenges

    Authors: Xianzhi Zhang, Yipeng Zhou, Di Wu, Shazia Riaz, Quan Z. Sheng, Miao Hu, Linchang Xiao

    Abstract: Caching content at the network edge is a popular and effective technique widely deployed to alleviate the burden of network backhaul, shorten service delay and improve service quality. However, there has been some controversy over privacy violations in caching content at the network edge. On the one hand, the multi-access open edge network provides an ideal surface for external attackers to obtain…

    Submitted 3 May, 2024; originally announced May 2024.

  8. arXiv:2404.08857  [pdf, other]

    cs.SD cs.AI eess.AS

    Voice Attribute Editing with Text Prompt

    Authors: Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

    Abstract: Despite recent advancements in speech generation with text prompt providing control over speech style, voice attributes in synthesized speech remain elusive and challenging to control. This paper introduces a novel task: voice attribute editing with text prompt, with the goal of making relative modifications to voice attributes according to the actions described in the text prompt. To solve this t…

    Submitted 12 April, 2024; originally announced April 2024.

  9. arXiv:2404.00349  [pdf, other]

    cs.CV

    SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

    Authors: Runmin Zhang, Zhu Yu, Zehua Sheng, Jiacheng Ying, Si-Yuan Cao, Shu-Jie Chen, Bailin Yang, Junwei Li, Hui-Liang Shen

    Abstract: Cross-spectral image guided denoising has shown its great potential in recovering clean images with rich details, such as using the near-infrared image to guide the denoising process of the visible one. To obtain such image pairs, a feasible and economical way is to employ a stereo system, which is widely used on mobile devices. Current works attempt to generate an aligned guidance image to handle…

    Submitted 30 March, 2024; originally announced April 2024.

  10. arXiv:2403.20150  [pdf, other]

    cs.LG cs.AI cs.CY

    TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

    Authors: Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, Bin Yang

    Abstract: Time series are generated in diverse domains such as economic, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an auto…

    Submitted 8 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by PVLDB 2024

  11. arXiv:2402.07076  [pdf, other]

    cs.IR cs.AI

    Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training

    Authors: Haonan Chen, Zhicheng Dou, Xuetong Hao, Yunhao Tao, Shiren Song, Zhenli Sheng

    Abstract: Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems. However, despite their widespread use, the task of identifying appropriate company customers for a specific target solution to the sales team of a solution provider remains a complex business problem that existing matching systems have yet to…

    Submitted 10 February, 2024; originally announced February 2024.

  12. arXiv:2402.03881  [pdf, other]

    cs.NI

    DEthna: Accurate Ethereum Network Topology Discovery with Marked Transactions

    Authors: Chonghe Zhao, Yipeng Zhou, Shengli Zhang, Taotao Wang, Quan Z. Sheng, Song Guo

    Abstract: In Ethereum, the ledger exchanges messages along an underlying Peer-to-Peer (P2P) network to reach consistency. Understanding the underlying network topology of Ethereum is crucial for network optimization, security and scalability. However, the accurate discovery of Ethereum network topology is non-trivial due to its deliberately designed security mechanism. Consequently, existing measuring schem…

    Submitted 17 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in IEEE INFOCOM 2024

  13. arXiv:2402.01512  [pdf, other]

    cs.CL

    Distractor Generation for Multiple-Choice Questions: A Survey of Methods, Datasets, and Evaluation

    Authors: Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang, Munazza Zaib, Ahoud Alhazmi

    Abstract: Distractors are important in learning evaluation. This paper surveys distractor generation tasks using English multiple-choice question datasets for textual and multimodal contexts. In particular, this paper presents a thorough literature review of the recent studies on distractor generation tasks, discusses multiple choice components and their characteristics, analyzes the related datasets, and s…

    Submitted 2 February, 2024; originally announced February 2024.

  14. arXiv:2401.14257  [pdf, other]

    cs.CV cs.AI

    Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

    Authors: Minglin Chen, Weihao Yuan, Yukun Wang, Zhe Sheng, Yisheng He, Zilong Dong, Liefeng Bo, Yulan Guo

    Abstract: Recently, text-to-3D approaches have achieved high-fidelity 3D content generation using text description. However, the generated objects are stochastic and lack fine-grained control. Sketches provide a cheap approach to introduce such fine-grained control. Nevertheless, it is challenging to achieve flexible control from these sketches due to their abstraction and ambiguity. In this paper, we prese…

    Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 11 pages, 9 figures

  15. arXiv:2401.03160  [pdf, other]

    cs.LG cs.AI cs.RO

    Human as AI Mentor: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving

    Authors: Zilin Huang, Zihao Sheng, Chengyuan Ma, Sikai Chen

    Abstract: Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient…

    Submitted 18 February, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

    Comments: Accepted by Communications in Transportation Research

  16. arXiv:2312.16893  [pdf, other]

    cs.CL cs.AI

    BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

    Authors: Zhecheng Sheng, Tianhao Zhang, Chen Jiang, Dongyeop Kang

    Abstract: Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their ca…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted to the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24)

  17. arXiv:2312.05435  [pdf, other]

    cs.CL

    Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts

    Authors: Xiruo Ding, Zhecheng Sheng, Brian Hur, Feng Chen, Serguei V. S. Pakhomov, Trevor Cohen

    Abstract: Foundation models are a current focus of attention in both industry and academia. While they have shown their capabilities in a variety of tasks, in-depth research is required to determine their robustness to distribution shift when used as a basis for supervised machine learning. This is especially important in the context of clinical data, with particular limitations related to data accessibilit…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted in Workshop on Distribution Shifts, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  18. arXiv:2310.19173  [pdf, other]

    cs.CR cs.SI

    Can we Quantify Trust? Towards a Trust-based Resilient SIoT Network

    Authors: Subhash Sagar, Adnan Mahmood, Quan Z. Sheng, Munazza Zaib, Farhan Sufyan

    Abstract: The emerging yet promising paradigm of the Social Internet of Things (SIoT) integrates the notion of the Internet of Things with human social networks. In SIoT, objects, i.e., things, have the capability to socialize with the other objects in the SIoT network and can establish their social network autonomously by modeling human behaviour. The notion of trust is imperative in realizing these charac…

    Submitted 12 May, 2023; originally announced October 2023.

    Comments: 18 Pages

  19. arXiv:2310.16547  [pdf, other]

    cs.DC

    AdaMEC: Towards a Context-Adaptive and Dynamically-Combinable DNN Deployment Framework for Mobile Edge Computing

    Authors: Bowen Pang, Sicong Liu, Hongli Wang, Bin Guo, Yuzhan Wang, Hao Wang, Zhenli Sheng, Zhongyi Wang, Zhiwen Yu

    Abstract: With the rapid development of deep learning, recent research on intelligent and interactive mobile applications (e.g., health monitoring, speech recognition) has attracted extensive attention. And these applications necessitate the mobile edge computing scheme, i.e., offloading partial computation from mobile devices to edge devices for inference acceleration and transmission load reduction. The c…

    Submitted 25 October, 2023; originally announced October 2023.

  20. arXiv:2310.08051  [pdf, other]

    cs.LG

    LGL-BCI: A Lightweight Geometric Learning Framework for Motor Imagery-Based Brain-Computer Interfaces

    Authors: Jianchao Lu, Yuzhe Tian, Yang Zhang, Jiaqi Ge, Quan Z. Sheng, Xi Zheng

    Abstract: Brain-Computer Interfaces (BCIs) are a groundbreaking technology for interacting with external devices using brain signals. Despite advancements, electroencephalogram (EEG)-based Motor Imagery (MI) tasks face challenges like amplitude and phase variability, and complex spatial correlations, with a need for smaller model size and faster inference. This study introduces the LGL-BCI framework, employ…

    Submitted 21 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

  21. arXiv:2310.02451  [pdf, other]

    cs.CL

    Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes

    Authors: Xiruo Ding, Zhecheng Sheng, Meliha Yetişgen, Serguei Pakhomov, Trevor Cohen

    Abstract: Natural Language Processing (NLP) methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and in…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted in AMIA 2023 Annual Symposium

  22. arXiv:2309.15284  [pdf]

    cs.LG

    A Physics Enhanced Residual Learning (PERL) Framework for Vehicle Trajectory Prediction

    Authors: Keke Long, Zihao Sheng, Haotian Shi, Xiaopeng Li, Sikai Chen, Sue Ahn

    Abstract: In vehicle trajectory prediction, physics models and data-driven models are two predominant methodologies. However, each approach presents its own set of challenges: physics models fall short in predictability, while data-driven models lack interpretability. Addressing these identified shortcomings, this paper proposes a novel framework, the Physics-Enhanced Residual Learning (PERL) model. PERL in…

    Submitted 21 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

  23. arXiv:2309.09727  [pdf, other]

    cs.DL cs.CL

    When Large Language Models Meet Citation: A Survey

    Authors: Yang Zhang, Yufei Wang, Kai Wang, Quan Z. Sheng, Lina Yao, Adnan Mahmood, Wei Emma Zhang, Rongying Zhao

    Abstract: Citations in scholarly work serve the essential purpose of acknowledging and crediting the original sources of knowledge that have been incorporated or referenced. Depending on their surrounding textual context, these citations are used for different motivations and purposes. Large Language Models (LLMs) could be helpful in capturing this fine-grained citation information via the corresponding te…

    Submitted 18 September, 2023; originally announced September 2023.

  24. arXiv:2309.09470  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

    Authors: Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

    Abstract: This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely relying on a single face image of the target speaker. To address this task, we propose a face-voice memory-based zero-shot FaceVC method. This method leverages a memo…

    Submitted 18 September, 2023; originally announced September 2023.

  25. arXiv:2308.16515  [pdf, other]

    cs.CC

    A Discharging Method: Improved Kernels for Edge Triangle Packing and Covering

    Authors: Zimo Sheng, Mingyu Xiao

    Abstract: Edge Triangle Packing and Edge Triangle Covering are dual problems extensively studied in the field of parameterized complexity. Given a graph $G$ and an integer $k$, Edge Triangle Packing seeks to determine whether there exists a set of at least $k$ edge-disjoint triangles in $G$, while Edge Triangle Covering aims to find out whether there exists a set of a…

    Submitted 31 August, 2023; originally announced August 2023.

  26. arXiv:2308.14328  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning for Generative AI: A Survey

    Authors: Yuanjiang Cao, Quan Z. Sheng, Julian McAuley, Lina Yao

    Abstract: Deep Generative AI has been a long-standing essential topic in the machine learning community, which can impact a number of application areas like text generation and computer vision. The major paradigm to train a generative model is maximum likelihood estimation, which pushes the learner to capture and approximate the target data distribution by decreasing the divergence between the model distrib…

    Submitted 28 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

  27. arXiv:2308.13714  [pdf, other]

    cs.LG cs.CR cs.CY

    Uncovering Promises and Challenges of Federated Learning to Detect Cardiovascular Diseases: A Scoping Literature Review

    Authors: Sricharan Donkada, Seyedamin Pouriyeh, Reza M. Parizi, Meng Han, Nasrin Dehbozorgi, Nazmus Sakib, Quan Z. Sheng

    Abstract: Cardiovascular diseases (CVD) are the leading cause of death globally, and early detection can significantly improve outcomes for patients. Machine learning (ML) models can help diagnose CVDs early, but their performance is limited by the data available for model training. Privacy concerns in healthcare make it harder to acquire data to train accurate ML models. Federated learning (FL) is an emerg…

    Submitted 25 August, 2023; originally announced August 2023.

  28. arXiv:2308.11874  [pdf, other]

    cs.CV

    Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch

    Authors: Pan Du, Suyun Zhao, Zisen Sheng, Cuiping Li, Hong Chen

    Abstract: Semi-Supervised Learning (SSL) under class distribution mismatch aims to tackle a challenging problem wherein unlabeled data contain lots of unknown categories unseen in the labeled ones. In such mismatch scenarios, traditional SSL suffers severe performance damage due to the harmful invasion of the instances with unknown categories into the target classifier. In this study, by strict mathematical…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  29. arXiv:2308.02294  [pdf, other]

    cs.CL cs.AI cs.IR

    Learning to Select the Relevant History Turns in Conversational Question Answering

    Authors: Munazza Zaib, Wei Emma Zhang, Quan Z. Sheng, Subhash Sagar, Adnan Mahmood, Yang Zhang

    Abstract: The increasing demand for the web-based digital assistants has given a rapid rise in the interest of the Information Retrieval (IR) community towards the field of conversational question answering (ConvQA). However, one of the critical aspects of ConvQA is the effective selection of conversational history turns to answer the question at hand. The dependency between relevant history selection and c…

    Submitted 4 August, 2023; originally announced August 2023.

  30. arXiv:2307.07544  [pdf, other]

    cs.CL cs.AI

    A Dialogue System for Assessing Activities of Daily Living: Improving Consistency with Grounded Knowledge

    Authors: Zhecheng Sheng, Raymond Finzel, Michael Lucke, Sheena Dufresne, Maria Gini, Serguei Pakhomov

    Abstract: In healthcare, the ability to care for oneself is reflected in the "Activities of Daily Living (ADL)," which serve as a measure of functional ability (functioning). A lack of functioning may lead to poor living conditions requiring personal care and assistance. To accurately identify those in need of support, assistance programs continuously evaluate participants' functioning across various domain…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to ACL 2023 DialDoc Workshop

    Journal ref: In Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2023, pages 68–79

  31. arXiv:2307.05627  [pdf, other]

    cs.CL

    Separate-and-Aggregate: A Transformer-based Patch Refinement Model for Knowledge Graph Completion

    Authors: Chen Chen, Yufei Wang, Yang Zhang, Quan Z. Sheng, Kwok-Yan Lam

    Abstract: Knowledge graph completion (KGC) is the task of inferencing missing facts from any given knowledge graphs (KG). Previous KGC methods typically represent knowledge graph entities and relations as trainable continuous embeddings and fuse the embeddings of the entity $h$ (or $t$) and relation $r$ into hidden representations of query $(h, r, ?)$ (or $(?, r, t)$) to approximate the missing entities. To…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted by ADMA 2023, oral

  32. arXiv:2306.01771  [pdf, other]

    cs.AI

    ProcessGPT: Transforming Business Process Management with Generative Artificial Intelligence

    Authors: Amin Beheshti, Jian Yang, Quan Z. Sheng, Boualem Benatallah, Fabio Casati, Schahram Dustdar, Hamid Reza Motahari Nezhad, Xuyun Zhang, Shan Xue

    Abstract: Generative Pre-trained Transformer (GPT) is a state-of-the-art machine learning model capable of generating human-like text through natural language processing (NLP). GPT is trained on massive amounts of text data and uses deep learning techniques to learn patterns and relationships within the data, enabling it to generate coherent and contextually appropriate text. This position paper proposes us…

    Submitted 28 May, 2023; originally announced June 2023.

    Comments: Accepted in: 2023 IEEE International Conference on Web Services (ICWS); Corresponding author: Prof. Amin Beheshti (amin.beheshti@mq.edu.au)

  33. arXiv:2305.16580  [pdf, other]

    cs.CV

    TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection

    Authors: Xue Zhang, Xiao-Han Zhang, Jiacheng Ying, Zehua Sheng, Heng Yu, Chunguang Li, Hui-Liang Shen

    Abstract: Pedestrian detection plays a critical role in computer vision as it contributes to ensuring traffic safety. Existing methods that rely solely on RGB images suffer from performance degradation under low-light conditions due to the lack of useful information. To address this issue, recent multispectral detection approaches have combined thermal images to provide complementary information and have ob…

    Submitted 17 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  34. arXiv:2305.14359  [pdf, other]

    cs.MM cs.AI cs.CV cs.SD eess.AS

    Zero-shot personalized lip-to-speech synthesis with face image based voice control

    Authors: Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling

    Abstract: Lip-to-Speech (Lip2Speech) synthesis, which predicts corresponding speech from talking face images, has witnessed significant progress with various models and training strategies in a series of independent studies. However, existing studies can not achieve voice control under zero-shot condition, because extra speaker embeddings need to be extracted from natural reference speech and are unavailabl…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: ICASSP 2023

  35. arXiv:2305.05221  [pdf, other]

    cs.LG

    BARA: Efficient Incentive Mechanism with Online Reward Budget Allocation in Cross-Silo Federated Learning

    Authors: Yunchao Yang, Yipeng Zhou, Miao Hu, Di Wu, Quan Z. Sheng

    Abstract: Federated learning (FL) is a prospective distributed machine learning framework that can preserve data privacy. In particular, cross-silo FL can complete model training by making isolated data islands of different organizations collaborate with a parameter server (PS) via exchanging model parameters for multiple communication rounds. In cross-silo FL, an incentive mechanism is indispensable for mo…

    Submitted 15 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCAI 2023, camera ready version with appendix

  36. arXiv:2304.07922  [pdf, other]

    cs.IR cs.AI

    Causal Disentangled Variational Auto-Encoder for Preference Understanding in Recommendation

    Authors: Siyu Wang, Xiaocong Chen, Quan Z. Sheng, Yihong Zhang, Lina Yao

    Abstract: Recommendation models are typically trained on observational user interaction data, but the interactions between latent factors in users' decision-making processes lead to complex and entangled data. Disentangling these latent factors to uncover their underlying representation can improve the robustness, interpretability, and controllability of recommendation models. This paper introduces the Caus…

    Submitted 16 April, 2023; originally announced April 2023.

  37. arXiv:2304.07125  [pdf, other]

    cs.CL cs.IR

    Keeping the Questions Conversational: Using Structured Representations to Resolve Dependency in Conversational Question Answering

    Authors: Munazza Zaib, Quan Z. Sheng, Wei Emma Zhang, Adnan Mahmood

    Abstract: Having an intelligent dialogue agent that can engage in conversational question answering (ConvQA) is now no longer limited to Sci-Fi movies only and has, in fact, turned into a reality. These intelligent agents are required to understand and correctly interpret the sequential turns provided as the context of the given question. However, these sequential questions are sometimes left implicit and t…

    Submitted 14 April, 2023; originally announced April 2023.

  38. arXiv:2304.00676  [pdf, other]

    cs.RO cs.CY

    CV2X-LOCA: Roadside Unit-Enabled Cooperative Localization Framework for Autonomous Vehicles

    Authors: Zilin Huang, Sikai Chen, Yuzhuang Pian, Zihao Sheng, Soyoung Ahn, David A. Noyce

    Abstract: An accurate and robust localization system is crucial for autonomous vehicles (AVs) to enable safe driving in urban scenes. While existing global navigation satellite system (GNSS)-based methods are effective at locating vehicles in open-sky regions, achieving high-accuracy positioning in urban canyons such as lower layers of multi-layer bridges, streets beside tall buildings, tunnels, etc., remai…

    Submitted 2 April, 2023; originally announced April 2023.

  39. arXiv:2303.17027  [pdf, other]

    cs.LG cs.CV cs.RO

    EPG-MGCN: Ego-Planning Guided Multi-Graph Convolutional Network for Heterogeneous Agent Trajectory Prediction

    Authors: Zihao Sheng, Zilin Huang, Sikai Chen

    Abstract: To drive safely in complex traffic environments, autonomous vehicles need to make an accurate prediction of the future trajectories of nearby heterogeneous traffic agents (i.e., vehicles, pedestrians, bicyclists, etc). Due to the interactive nature, human drivers are accustomed to infer what the future situations will become if they are going to execute different maneuvers. To fully exploit the im…

    Submitted 29 March, 2023; originally announced March 2023.

  40. arXiv:2303.14544  [pdf, other]

    cs.NI

    Privacy-Enhancing Technologies in Federated Learning for the Internet of Healthcare Things: A Survey

    Authors: Fatemeh Mosaiyebzadeh, Seyedamin Pouriyeh, Reza M. Parizi, Quan Z. Sheng, Meng Han, Liang Zhao, Giovanna Sannino, Daniel Macêdo Batista

    Abstract: Advancements in wearable medical devices in IoT technology are shaping the modern healthcare system. With the emergence of the Internet of Healthcare Things (IoHT), we are witnessing how efficient healthcare services are provided to patients and how healthcare professionals effectively use AI-based models to analyze the data collected from IoHT devices for the treatment of various diseases. T…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 15 pages, 4 figures, 5 tables

  41. arXiv:2303.08367  [pdf, other]

    cs.CV

    Uncertainty-Aware Pedestrian Trajectory Prediction via Distributional Diffusion

    Authors: Yao Liu, Zesheng Ye, Rui Wang, Binghao Li, Quan Z. Sheng, Lina Yao

    Abstract: Tremendous efforts have been put forth on predicting pedestrian trajectory with generative models to accommodate uncertainty and multi-modality in human behaviors. An individual's inherent uncertainty, e.g., change of destination, can be masked by complex patterns resulting from the movements of interacting pedestrians. However, latent variable-based generative models often entangle such uncertain…

    Submitted 11 May, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

  42. arXiv:2303.03598  [pdf, other]

    cs.CV eess.IV

    Guided Image-to-Image Translation by Discriminator-Generator Communication

    Authors: Yuanjiang Cao, Lina Yao, Le Pan, Quan Z. Sheng, Xiaojun Chang

    Abstract: The goal of Image-to-image (I2I) translation is to transfer an image from a source domain to a target domain, which has recently drawn increasing attention. One major branch of this research is to formulate I2I translation based on Generative Adversarial Network (GAN). As a zero-sum game, GAN can be reformulated as a Partially-observed Markov Decision Process (POMDP) for generators, where generato…

    Submitted 6 March, 2023; originally announced March 2023.

  43. arXiv:2302.06114  [pdf, other]

    cs.LG

    A Comprehensive Survey on Graph Summarization with Graph Neural Networks

    Authors: Nasrin Shabani, Jia Wu, Amin Beheshti, Quan Z. Sheng, Jin Foo, Venus Haghighi, Ambreen Hanif, Maryam Shahabikargar

    Abstract: As large-scale graphs become more widespread, more and more computational challenges with extracting, processing, and interpreting large graph data are being exposed. It is therefore natural to search for ways to summarize these expansive graphs while preserving their key characteristics. In the past, most graph summarization techniques sought to capture the most important part of a graph statisti…

    Submitted 3 January, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: 21 pages, 4 figures, 9 tables, Journal of IEEE Transactions on Artificial Intelligence

  44. arXiv:2301.05860  [pdf, other]

    cs.LG cs.AI

    State of the Art and Potentialities of Graph-level Learning

    Authors: Zhenyu Yang, Ge Zhang, Jia Wu, Jian Yang, Quan Z. Sheng, Shan Xue, Chuan Zhou, Charu Aggarwal, Hao Peng, Wenbin Hu, Edwin Hancock, Pietro Liò

    Abstract: Graphs have a superior ability to represent relational data, like chemical compounds, proteins, and social networks. Hence, graph-level learning, which takes a set of graphs as input, has been applied to many tasks including comparison, regression, classification, and more. Traditional approaches to learning a set of graphs heavily rely on hand-crafted features, such as substructures. But while th…

    Submitted 25 May, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

  45. arXiv:2301.05845  [pdf, other]

    cs.CV cs.GR cs.RO

    ${S}^{2}$Net: Accurate Panorama Depth Estimation on Spherical Surface

    Authors: Meng Li, Senbo Wang, Weihao Yuan, Weichao Shen, Zhe Sheng, Zilong Dong

    Abstract: Monocular depth estimation is an ambiguous problem, thus global structural cues play an important role in current data-driven single-view depth estimation methods. Panorama images capture the complete spatial information of their surroundings utilizing the equirectangular projection which introduces large distortion. This requires the depth estimation method to be able to handle the distortion and…

    Submitted 14 January, 2023; originally announced January 2023.

    Comments: Accepted by IEEE Robotics and Automation Letters

  46. arXiv:2212.01964  [pdf, other]

    cs.CL cs.AI

    Building Metadata Inference Using a Transducer Based Language Model

    Authors: David Waterworth, Subbu Sethuvenkatraman, Quan Z. Sheng

    Abstract: Solving the challenges of automatic machine translation of Building Automation System text metadata is a crucial first step in efficiently deploying smart building applications. The vocabulary used to describe building metadata appears small compared to general natural languages, but each term has multiple commonly used abbreviations. Conventional machine learning techniques are inefficient since…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: Presented at First Australasia Symposium on Artificial Intelligence for the Environment (AI4Environment), 2022

  47. arXiv:2210.09766  [pdf, other]

    cs.LG

    DAGAD: Data Augmentation for Graph Anomaly Detection

    Authors: Fanzhen Liu, Xiaoxiao Ma, Jia Wu, Jian Yang, Shan Xue, Amin Beheshti, Chuan Zhou, Hao Peng, Quan Z. Sheng, Charu C. Aggarwal

    Abstract: Graph anomaly detection in this paper aims to distinguish abnormal nodes that behave differently from the benign ones accounting for the majority of graph-structured instances. Although the task is receiving increasing attention from both academia and industry, existing research on it still suffers from two critical issues when learning informative anomalous behavior from graph data. For one thing, anomalie…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Regular paper accepted by the 22nd IEEE International Conference on Data Mining (ICDM 2022)

  48. arXiv:2207.14472  [pdf]

    eess.IV cs.CV cs.LG

    Beyond CNNs: Exploiting Further Inherent Symmetries in Medical Image Segmentation

    Authors: Shuchao Pang, Anan Du, Mehmet A. Orgun, Yan Wang, Quan Z. Sheng, Shoujin Wang, Xiaoshui Huang, Zhenmei Yu

    Abstract: Automatic tumor or lesion segmentation is a crucial step in medical image analysis for computer-aided diagnosis. Although the existing methods based on Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance, many challenges still remain in medical tumor segmentation. This is because, although the human visual system can detect symmetries in 2D images effectively, regul…

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: This work was accepted by IEEE Transactions on Cybernetics on 22 July 2022. arXiv admin note: substantial text overlap with arXiv:2005.03924

  49. arXiv:2207.08340  [pdf, other]

    cs.DS

    Extracting Densest Sub-hypergraph with Convex Edge-weight Functions

    Authors: Yi Zhou, Shan Hu, Zimo Sheng

    Abstract: The densest subgraph problem (DSG), which aims to find an induced subgraph whose average edge weight is maximized, is a well-studied problem. However, when the input graph is a hypergraph, the existing notion of DSG fails to capture the fact that a hyperedge partially belonging to an induced sub-hypergraph is also a part of the sub-hypergraph. To resolve the issue, we sugges…

    Submitted 17 July, 2022; originally announced July 2022.

  50. arXiv:2207.03688  [pdf, other]

    cs.LG cs.NE

    GCN-based Multi-task Representation Learning for Anomaly Detection in Attributed Networks

    Authors: Venus Haghighi, Behnaz Soltani, Adnan Mahmood, Quan Z. Sheng, Jian Yang

    Abstract: Anomaly detection in attributed networks has received considerable attention in recent years due to its applications in a wide range of domains such as finance, network security, and medicine. Traditional approaches cannot be adopted in attributed network settings to solve the problem of anomaly detection. The main limitation of such approaches is that they inherently ignore the relational inf…

    Submitted 8 July, 2022; originally announced July 2022.