Showing 1–50 of 5,508 results for author: Wang, J

Searching in archive cs.

  1. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05674  [pdf]

    cs.CV physics.med-ph

    TransAnaNet: Transformer-based Anatomy Change Prediction Network for Head and Neck Cancer Patient Radiotherapy

    Authors: Meixu Chen, Kai Wang, Michael Dohopolski, Howard Morgan, Jing Wang

    Abstract: Early identification of head and neck cancer (HNC) patients who would experience significant anatomical change during radiotherapy (RT) is important to optimize patient clinical benefit and treatment resources. This study aims to assess the feasibility of using a vision-transformer (ViT) based neural network to predict RT-induced anatomic change in HNC patients. We retrospectively included 121 HNC…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.05514  [pdf, other]

    cs.RO

    HPPS: A Hierarchical Progressive Perception System for Luggage Trolley Detection and Localization at Airports

    Authors: Zhirui Sun, Zhe Zhang, Jieting Zhao, Hanjing Ye, Jiankun Wang

    Abstract: The robotic autonomous luggage trolley collection system employs robots to gather and transport scattered luggage trolleys at airports. However, existing methods for detecting and locating these luggage trolleys often fail when they are not fully visible. To address this, we introduce the Hierarchical Progressive Perception System (HPPS), which enhances the detection and localization of luggage tr…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.05488  [pdf]

    cs.CV physics.med-ph

    Advancing Head and Neck Cancer Survival Prediction via Multi-Label Learning and Deep Model Interpretation

    Authors: Meixu Chen, Kai Wang, Jing Wang

    Abstract: A comprehensive and reliable survival prediction model is of great importance to assist in the personalized management of Head and Neck Cancer (HNC) patients treated with curative Radiation Therapy (RT). In this work, we propose IMLSP, an Interpretable Multi-Label multi-modal deep Survival Prediction framework for predicting multiple HNC survival outcomes simultaneously and provide time-event spec…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 2 tables, 2 pages of supplementary material

  5. Studying Self-Care with Generative AI Tools: Lessons for Design

    Authors: Tara Capel, Bernd Ploderer, Filip Bircanin, Simon Hanmer, Jamie Yates, Jiaxuan Wang, Kai Ling Khor, Tuck Wah Leong, Greg Wadley, Michelle Newcomb

    Abstract: The rise of generative AI presents new opportunities for the understanding and practice of self-care through its capability to generate varied content, including self-care suggestions via text and images, and engage in dialogue with users over time. However, there are also concerns about accuracy and trustworthiness of self-care advice provided via AI. This paper reports our findings from workshop…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 28 pages, 4 figures, to be published in the proceedings of the ACM Designing Interactive Systems Conference (DIS '24)

  6. arXiv:2405.05284  [pdf, other]

    cs.HC cs.GR

    A Study on Cognitive Effects of Canvas Size for Augmenting Drawing Skill

    Authors: Jize Wang, Kazuhisa Nakano, Daiyannan Chen, Zhengyu Huang, Tsukasa Fukusato, Kazunori Miyata, Haoran Xie

    Abstract: In recent years, the field of generative artificial intelligence, particularly in the domain of image generation, has exerted a profound influence on society. Despite the capability of AI to produce images of high quality, the augmentation of users' drawing abilities through the provision of drawing support systems emerges as a challenging issue. In this study, we propose that a cognitive factor,…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 5 pages, 6 figures, accepted in NICOGRAPH International 2024

  7. arXiv:2405.05254  [pdf, other]

    cs.CL

    You Only Cache Once: Decoder-Decoder Architectures for Language Models

    Authors: Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

    Abstract: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO onl…

    Submitted 9 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  8. arXiv:2405.05136  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems

    Authors: Zhaoxing Li, Jujie Yang, Jindi Wang, Lei Shi, Sebastian Stein

    Abstract: The field of Knowledge Tracing aims to understand how students learn and master knowledge over time by analyzing their historical behaviour data. To achieve this goal, many researchers have proposed Knowledge Tracing models that use data from Intelligent Tutoring Systems to predict students' subsequent actions. However, with the development of Intelligent Tutoring Systems, large-scale datasets con…

    Submitted 24 April, 2024; originally announced May 2024.

  9. arXiv:2405.05131  [pdf, other]

    cs.RO

    DenserRadar: A 4D millimeter-wave radar point cloud detector based on dense LiDAR point clouds

    Authors: Zeyu Han, Junkai Jiang, Xiaokang Ding, Qingwen Meng, Shaobing Xu, Lei He, Jianqiang Wang

    Abstract: The 4D millimeter-wave (mmWave) radar, with its robustness in extreme environments, extensive detection range, and capabilities for measuring velocity and elevation, has demonstrated significant potential for enhancing the perception abilities of autonomous driving systems in corner-case scenarios. Nevertheless, the inherent sparsity and noise of 4D mmWave radar point clouds restrict its further d…

    Submitted 8 May, 2024; originally announced May 2024.

  10. arXiv:2405.04907  [pdf, other]

    cs.NI

    Empowering Wireless Networks with Artificial Intelligence Generated Graph

    Authors: Jiacheng Wang, Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Haibo Zhou, Dong In Kim

    Abstract: In wireless communications, transforming network into graphs and processing them using deep learning models, such as Graph Neural Networks (GNNs), is one of the mainstream network optimization approaches. While effective, the generative AI (GAI) shows stronger capabilities in graph analysis, processing, and generation, than conventional methods such as GNN, offering a broader exploration space for…

    Submitted 8 May, 2024; originally announced May 2024.

  11. arXiv:2405.04609  [pdf, other]

    cs.RO

    Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation

    Authors: Jenny Wang, Octavian Donca, David Held

    Abstract: Relative placement tasks are an important category of tasks in which one object needs to be placed in a desired pose relative to another object. Previous work has shown success in learning relative placement tasks from just a small number of demonstrations when using relational reasoning networks with geometric inductive biases. However, such methods cannot flexibly represent multimodal tasks, lik…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted for ICRA 2024

  12. arXiv:2405.04115  [pdf, other]

    cs.CR

    A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning

    Authors: Xiaoyang Xu, Mengda Yang, Wenzhe Yi, Ziang Li, Juan Wang, Hongxin Hu, Yong Zhuang, Yaxin Liu

    Abstract: Split Learning (SL) is a distributed learning framework renowned for its privacy-preserving features and minimal computational requirements. Previous research consistently highlights the potential privacy breaches in SL systems by server adversaries reconstructing training data. However, these studies often rely on strong assumptions or compromise system utility to enhance attack performance. This…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  13. arXiv:2405.03875  [pdf, other]

    cs.LG stat.ML

    Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

    Authors: Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia

    Abstract: Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has been shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis te…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  14. arXiv:2405.03764  [pdf, other]

    cs.CL cs.IR

    GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation

    Authors: Wenjie Zhou, Zhenxin Ding, Xiaodong Zhang, Haibo Shi, Junfeng Wang, Dawei Yin

    Abstract: Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. For practical deployment, it is critical to carry out knowledge distillation to preserve high performance under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student performance, how does one effe…

    Submitted 6 May, 2024; originally announced May 2024.

  15. arXiv:2405.03724  [pdf, other]

    cs.LG cs.SI

    GraphSL: An Open-Source Library for Graph Source Localization Approaches and Benchmark Datasets

    Authors: Junxiang Wang, Liang Zhao

    Abstract: We present GraphSL, a novel library designed for investigating the graph source localization problem. Our library facilitates the exploration of various graph diffusion models for simulating information spread and enables the evaluation of cutting-edge source localization approaches on established benchmark datasets. The source code of GraphSL is made available at \url{https://github.com/xianggebe…

    Submitted 6 May, 2024; originally announced May 2024.

  16. arXiv:2405.03546  [pdf, other]

    cs.CV cs.LG

    CCDM: Continuous Conditional Diffusion Models for Image Generation

    Authors: Xin Ding, Yongwei Wang, Kao Zhang, Z. Jane Wang

    Abstract: Continuous Conditional Generative Modeling (CCGM) aims to estimate the distribution of high-dimensional data, typically images, conditioned on scalar continuous variables known as regression labels. While Continuous conditional Generative Adversarial Networks (CcGANs) were initially designed for this task, their adversarial training mechanism remains vulnerable to extremely sparse or imbalanced da…

    Submitted 6 May, 2024; originally announced May 2024.

  17. arXiv:2405.03318  [pdf, other]

    cs.CV cs.MM

    Enhancing DETRs Variants through Improved Content Query and Similar Query Aggregation

    Authors: Yingying Zhang, Chuangji Shi, Xin Guo, Jiangwei Lao, Jian Wang, Jiaotuan Wang, Jingdong Chen

    Abstract: The design of the query is crucial for the performance of DETR and its variants. Each query consists of two components: a content part and a positional one. Traditionally, the content query is initialized with a zero or learnable embedding, lacking essential content information and resulting in sub-optimal performance. In this paper, we introduce a novel plug-and-play module, Self-Adaptive Content…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures

  18. arXiv:2405.03288  [pdf, other]

    cs.IT

    Fundamental Bounds on Unequal Error Protection Codes

    Authors: Liuquan Yao, Shuai Yuan, Yuan Li, Huazi Zhang, Jun Wang, Guiying Yan, Zhiming Ma

    Abstract: Unequal error protection (UEP) codes can facilitate the transmission of messages with different protection levels. In this paper, we study the achievability bounds on UEP by generalizing the Gilbert-Varshamov (GV) bound. For the first time, we show that under certain conditions, UEP asymptotically enhances the code rate compared with time-sharing (TS) strategies.

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 8 pages, 4 figures

  19. arXiv:2405.03118  [pdf, other]

    cs.SD eess.AS

    Determined Multichannel Blind Source Separation with Clustered Source Model

    Authors: Jianyu Wang, Shanzheng Guan

    Abstract: The independent low-rank matrix analysis (ILRMA) method stands out as a prominent technique for multichannel blind audio source separation. It leverages nonnegative matrix factorization (NMF) and nonnegative canonical polyadic decomposition (NCPD) to model source parameters. While it effectively captures the low-rank structure of sources, the NMF model overlooks inter-channel dependencies. On the…

    Submitted 5 May, 2024; originally announced May 2024.

  20. arXiv:2405.03101  [pdf, ps, other]

    cs.IT

    Double Self-Sustainable Reconfigurable Intelligent Surfaces Aided Wireless Communications

    Authors: Ji Wang, Suhong Luo, Yixuan Li, Wenwu Xie, Xingwang Li, Arumugam Nallanathan

    Abstract: A double self-sustainable reconfigurable intelligent surfaces (RISs) assisted multi-user multiple input multiple output (MIMO) system is investigated. Two RISs are equipped with energy harvesting circuit to achieve self-sustainable transmission. The aim is to minimize the transmission power at the base station (BS), while guaranteeing the quality of service (QoS) requirements of the users and meet…

    Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

  21. arXiv:2405.03091  [pdf]

    cs.CV cs.LG

    Research on Image Recognition Technology Based on Multimodal Deep Learning

    Authors: Jinyin Wang, Xingchen Li, Yixuan Jin, Yihao Zhong, Keke Zhang, Chang Zhou

    Abstract: This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks. According to the characteristics of different modal information, different deep neural networks are used to adapt to different modal video information. Through the integration of various deep neural networks, the algorithm successfully identifies behaviors across multiple modalities. I…

    Submitted 5 May, 2024; originally announced May 2024.

  22. arXiv:2405.02967  [pdf, other]

    cs.HC

    Exploring Text-based Realistic Building Facades Editing Applicaiton

    Authors: Jing Wang, Xin Zhang

    Abstract: This paper explores the utilization of diffusion models and textual guidance for achieving localized editing of building facades, addressing the escalating demand for sophisticated editing methodologies in architectural design and urban planning. Leveraging the robust generative capabilities of diffusion models, this study presents a promising avenue for realistically synthesizing and modifying ar…

    Submitted 5 May, 2024; originally announced May 2024.

  23. arXiv:2405.02941  [pdf, other]

    cs.CV

    Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling

    Authors: Jinmin Li, Tao Dai, Jingyun Zhang, Kang Liu, Jun Wang, Shaoming Wang, Shu-Tao Xia, rizen guo

    Abstract: Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling. However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details, which thus hinders their real applications. To address this issue, we pr…

    Submitted 5 May, 2024; originally announced May 2024.

  24. arXiv:2405.02814  [pdf, other]

    cs.CL

    NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli

    Authors: Xu Wang, Cheng Li, Yi Chang, Jindong Wang, Yuan Wu

    Abstract: Large Language Models (LLMs) have become integral to a wide spectrum of applications, ranging from traditional computing tasks to advanced artificial intelligence (AI) applications. This widespread adoption has spurred extensive research into LLMs across various disciplines, including the social sciences. Notably, studies have revealed that LLMs possess emotional intelligence, which can be further…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by IJCAI 2024

  25. arXiv:2405.02384  [pdf, other]

    cs.NE cs.AI cs.LG

    CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

    Authors: Kaiyuan Chen, Xingzhuo Guo, Yu Zhang, Jianmin Wang, Mingsheng Long

    Abstract: Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: Correcting predictions from residuals and hierarchical learning. However, these models do…

    Submitted 3 May, 2024; originally announced May 2024.

  26. arXiv:2405.01827  [pdf, other]

    cs.CL

    SoftMCL: Soft Momentum Contrastive Learning for Fine-grained Sentiment-aware Pre-training

    Authors: Jin Wang, Liang-Chih Yu, Xuejie Zhang

    Abstract: The pre-training for language models captures general language understanding but fails to distinguish the affective impact of a particular context to a specific word. Recent works have sought to introduce contrastive learning (CL) for sentiment-aware pre-training in acquiring affective information. Nevertheless, these methods present two significant limitations. First, the compatibility of the GPU…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by LREC-COLING 2024

  27. arXiv:2405.01053  [pdf, other]

    cs.LG cs.AI

    Explicitly Modeling Generality into Self-Supervised Learning

    Authors: Jingyao Wang, Wenwen Qiang, Changwen Zheng

    Abstract: The goal of generality in machine learning is to achieve excellent performance on various unseen tasks and domains. Recently, self-supervised learning (SSL) has been regarded as an effective method to achieve this goal. It can learn high-quality representations from unlabeled data and achieve promising empirical performance on multiple downstream tasks. Existing SSL methods mainly constrain genera…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 28 pages

  28. arXiv:2405.00983  [pdf, other]

    cs.CV

    LLM-AD: Large Language Model based Audio Description System

    Authors: Peng Chu, Jiang Wang, Andre Abrantes

    Abstract: The development of Audio Description (AD) has been a pivotal step forward in making video content more accessible and inclusive. Traditionally, AD production has demanded a considerable amount of skilled labor, while existing automated approaches still necessitate extensive training to integrate multimodal inputs and tailor the output from a captioning style to an AD style. In this paper, we intro…

    Submitted 1 May, 2024; originally announced May 2024.

  29. arXiv:2405.00930  [pdf, other]

    cs.SD eess.AS

    MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

    Authors: Pengcheng Li, Jianzong Wang, Xulong Zhang, Yong Zhang, Jing Xiao, Ning Cheng

    Abstract: One-shot voice conversion aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing methods face difficulties in satisfactory speech representation disentanglement and suffer from sizable networks as some of them leverage numerous complex modules for disentanglement. In this paper, we propose a model named MAIN-VC to effectively…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  30. arXiv:2405.00603  [pdf, other]

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion is the task to transform voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized in content extraction. However, in these representations, a lot of hidden speaker information leads to timbre leakage while the prosodic information of hidden units lacks use. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  31. arXiv:2405.00362  [pdf, other]

    cs.RO cs.CG cs.GR

    Implicit Swept Volume SDF: Enabling Continuous Collision-Free Trajectory Generation for Arbitrary Shapes

    Authors: Jingping Wang, Tingrui Zhang, Qixuan Zhang, Chuxiao Zeng, Jingyi Yu, Chao Xu, Lan Xu, Fei Gao

    Abstract: In the field of trajectory generation for objects, ensuring continuous collision-free motion remains a huge challenge, especially for non-convex geometries and complex environments. Previous methods either oversimplify object shapes, which results in a sacrifice of feasible space or rely on discrete sampling, which suffers from the "tunnel effect". To address these limitations, we propose a novel…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGGRAPH 2024 & TOG. Joint first authors: Jingping Wang, Tingrui Zhang. Joint corresponding authors: Fei Gao, Lan Xu

  32. arXiv:2405.00338  [pdf, other]

    cs.IR

    Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model

    Authors: Yu Cui, Feng Liu, Pengbo Wang, Bohao Wang, Heng Tang, Yi Wan, Jun Wang, Jiawei Chen

    Abstract: Owing to their powerful semantic reasoning capabilities, Large Language Models (LLMs) have been effectively utilized as recommenders, achieving impressive performance. However, the high inference latency of LLMs significantly restricts their practical deployment. To address this issue, this work investigates knowledge distillation from cumbersome LLM-based recommendation models to lightweight conv…

    Submitted 3 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures

  33. arXiv:2405.00213  [pdf, other]

    cs.LG cs.HC eess.SP

    Block-As-Domain Adaptation for Workload Prediction from fNIRS Data

    Authors: Jiyang Wang, Ayse Altay, Senem Velipasalar

    Abstract: Functional near-infrared spectroscopy (fNIRS) is a non-intrusive way to measure cortical hemodynamic activity. Predicting cognitive workload from fNIRS data has taken on a diffuse set of methods. To be applicable in real-world settings, models are needed, which can perform well across different sessions as well as different subjects. However, most existing works assume that training and testing da…

    Submitted 30 April, 2024; originally announced May 2024.

  34. arXiv:2404.19759  [pdf, other]

    cs.CV

    MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

    Authors: Wenxun Dai, Ling-Hao Chen, Jingbo Wang, Jinpeng Liu, Bo Dai, Yansong Tang

    Abstract: This work introduces MotionLCM, extending controllable motion generation to a real-time level. Existing methods for spatial control in text-conditioned motion generation suffer from significant runtime inefficiency. To address this issue, we first propose the motion latent consistency model (MotionLCM) for motion generation, building upon the latent diffusion model (MLD). By employing one-step (or…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: MotionLCM project version 1.0

  35. arXiv:2404.19722  [pdf, other]

    cs.CV

    PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios

    Authors: Jingbo Wang, Zhengyi Luo, Ye Yuan, Yixuan Li, Bo Dai

    Abstract: We address the challenge of content diversity and controllability in pedestrian simulation for driving scenarios. Recent pedestrian animation frameworks have a significant limitation wherein they primarily focus on either following trajectory [46] or the content of the reference video [57], consequently overlooking the potential diversity of human motion within such scenarios. This limitation rest…

    Submitted 30 April, 2024; originally announced April 2024.

  36. RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting

    Authors: Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou

    Abstract: We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant col…

    Submitted 8 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: To be published in ACM SIGGRAPH 2024

  37. arXiv:2404.19597  [pdf, other]

    cs.CL cs.CR

    Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning

    Authors: Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Minervini, Pontus Stenetorp, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: The implications of backdoor attacks on English-centric large language models (LLMs) have been widely examined - such attacks can be achieved by embedding malicious behaviors during training and activated under specific conditions that trigger malicious outputs. However, the impact of backdoor attacks on multilingual models remains under-explored. Our research focuses on cross-lingual backdoor att…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: work in progress

  38. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  39. arXiv:2404.19438  [pdf, other]

    cs.NE

    Neuro-Vision to Language: Image Reconstruction and Language enabled Interaction via Brain Recordings

    Authors: Guobin Shen, Dongcheng Zhao, Xiang He, Linghao Feng, Yiting Dong, Jihang Wang, Qian Zhang, Yi Zeng

    Abstract: Decoding non-invasive brain recordings is crucial for advancing our understanding of human cognition, yet faces challenges from individual differences and complex neural signal representations. Traditional methods require custom models and extensive trials, and lack interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics by Vi…

    Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  40. arXiv:2404.19316  [pdf, other]

    cs.CL

    QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering

    Authors: Sheng Ouyang, Jianzong Wang, Yong Zhang, Zhitao Li, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Extractive Question Answering (EQA) in Machine Reading Comprehension (MRC) often faces the challenge of dealing with semantically identical but format-variant inputs. Our work introduces a novel approach, called the "Query Latent Semantic Calibrator (QLSC)", designed as an auxiliary module for existing MRC models. We propose a unique scaling strategy to capture latent semantic center features of…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  41. arXiv:2404.19277  [pdf, other]

    cs.CV

    Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model

    Authors: Wentao Lei, Li Liu, Jun Wang

    Abstract: Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce specific lip and gesture movements of CS from audio or text inputs. The main challenge is that given limited CS data, we strive to simultaneously generate fine-grained hand and finger…

    Submitted 30 April, 2024; originally announced April 2024.

    Journal ref: IJCAI 2024

  42. arXiv:2404.19214  [pdf, other]

    cs.SD eess.AS

    EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

    Authors: Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules: Shared…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  43. arXiv:2404.19212  [pdf, other]

    cs.SD eess.AS

    EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning

    Authors: Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, Jing Xiao

    Abstract: Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot research topic. Existing works generally take into account disentangling speech components through human-crafted bottleneck features which cannot achieve sufficient information disentangling, while pitch and rhythm may still be mixed together. There is a risk of informat…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  44. arXiv:2404.19187  [pdf, other]

    cs.SD eess.AS

    CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition

    Authors: Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Singing voice beautifying is a novel task that has application value in people's daily life, aiming to correct the pitch of the singing voice and improve the expressiveness without changing the original timbre and content. Existing methods rely on paired data or only concentrate on the correction of pitch. However, professional songs and amateur songs from the same person are hard to obtain, and s…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  45. arXiv:2404.19180  [pdf, other]

    cs.AR

    MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor

    Authors: Bingcai Sui, Junzhong Shen, Caixia Sun, Junhui Wang, Zhong Zheng, Wei Guo

    Abstract: General-purpose processor vendors have integrated customized accelerators in their products due to the widespread use of General Matrix-Matrix Multiplication (GEMM) kernels. However, it remains a challenge to further improve the flexibility and scalability of these GEMM-enhanced processors to cater to the emerging large-scale GEMM workloads. In this paper we propose MACO, a novel loosely-coupled mul…

    Submitted 29 April, 2024; originally announced April 2024.

  46. arXiv:2404.18946  [pdf, other]

    physics.optics cs.IR eess.IV

    Align-Free Multi-Plane Phase Retrieval

    Authors: Jiabao Wang, Yang Wu, Jun Wang, Ni Chen

    Abstract: The multi-plane phase retrieval method provides a budget-friendly and effective way to perform phase imaging, yet it often encounters alignment challenges due to shifts along the optical axis in experiments. Traditional methods, such as employing beamsplitters instead of mechanical stage movements or adjusting focus using tunable light sources, add complexity to the setup required for multi-plane…

    Submitted 29 April, 2024; originally announced April 2024.

  47. arXiv:2404.18501  [pdf, other]

    eess.AS cs.SD

    Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

    Authors: Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li

    Abstract: Audio-visual target speaker extraction (AV-TSE) aims to extract the specific person's speech from the audio mixture given auxiliary visual cues. Previous methods usually search for the target voice through speech-lip synchronization. However, this strategy mainly focuses on the existence of target speech, while ignoring the variations of the noise characteristics. That may result in extracting noi…

    Submitted 8 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  48. arXiv:2404.17876  [pdf, other]

    cs.CV

    DF-SLAM: Neural Feature Rendering Based on Dictionary Factors Representation for High-Fidelity Dense Visual SLAM System

    Authors: Weifeng Wei, Jie Wang

    Abstract: We introduce a high-fidelity neural implicit dense visual Simultaneous Localization and Mapping (SLAM) system, termed DF-SLAM. In our work, we employ dictionary factors for scene representation, encoding the geometry and appearance information of the scene as a combination of basis and coefficient factors. Compared to neural implicit SLAM methods that directly encode scene information as features,…

    Submitted 27 April, 2024; originally announced April 2024.

  49. arXiv:2404.17749  [pdf, other]

    cs.AI cs.CL

    UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis

    Authors: Parth Vashisht, Abhilasha Lodha, Mukta Maddipatla, Zonghai Yao, Avijit Mitra, Zhichao Yang, Junda Wang, Sunjae Kwon, Hong Yu

    Abstract: This paper presents our team's participation in the MEDIQA-ClinicalNLP2024 shared task B. We present a novel approach to diagnosing clinical dermatology cases by integrating large multimodal models, specifically leveraging the capabilities of GPT-4V under a retriever and a re-ranker framework. Our investigation reveals that GPT-4V, when used as a retrieval agent, can accurately retrieve the correc…

    Submitted 8 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL-ClinicalNLP workshop 2024

  50. arXiv:2404.17642  [pdf, other]

    cs.CL cs.AI

    Empowering Large Language Models for Textual Data Augmentation

    Authors: Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee

    Abstract: With the capabilities of understanding and executing natural language instructions, large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on the augmentation instructions provided, and the effectiveness can fluctuate across different downstream tasks. While manually crafting and selecting instructio…

    Submitted 26 April, 2024; originally announced April 2024.