Skip to main content

Showing 1–50 of 1,260 results for author: Yu, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.05695  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

    Authors: Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma

    Abstract: We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure fo… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024

    Journal ref: International Conference on Learning Representations (ICLR), 2024

  2. arXiv:2405.04108  [pdf, other

    cs.CR cs.AI

    A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model

    Authors: Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo

    Abstract: Recent booming development of Generative Artificial Intelligence (GenAI) has facilitated an emerging model commercialization for the purpose of reinforcement on model performance, such as licensing or trading Deep Neural Network (DNN) models. However, DNN model trading may trigger concerns of the unauthorized replications or misuses over the model, so that the benefit of the model ownership will b… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2405.03064  [pdf, other

    cs.LG cs.AI cs.CR

    RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

    Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing

    Abstract: Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can be often trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for re… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  4. I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis

    Authors: Haofei Song, Xintian Mao, Jing Yu, Qingli Li, Yan Wang

    Abstract: Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon that due to the mentioned nature of data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution fr… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  5. arXiv:2405.02696  [pdf, other

    cs.CR cs.AI

    DiffuseTrace: A Transparent and Flexible Watermarking Scheme for Latent Diffusion Model

    Authors: Liangqi Lei, Keke Gai, Jing Yu, Liehuang Zhu

    Abstract: Latent Diffusion Models (LDMs) enable a wide range of applications but raise ethical concerns regarding illegal utilization.Adding watermarks to generative model outputs is a vital technique employed for copyright tracking and mitigating potential risks associated with AI-generated content. However, post-hoc watermarking techniques are susceptible to evasion. Existing watermarking methods for LDMs… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  6. arXiv:2405.02288  [pdf, other

    cs.CV cs.AI cs.RO

    Prospective Role of Foundation Models in Advancing Autonomous Vehicles

    Authors: Jianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen

    Abstract: With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, CLIP, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhance scene understanding and reason… ▽ More

    Submitted 8 December, 2023; originally announced May 2024.

    Comments: 36 pages,5 figures

  7. arXiv:2405.00362  [pdf, other

    cs.RO cs.CG cs.GR

    Implicit Swept Volume SDF: Enabling Continuous Collision-Free Trajectory Generation for Arbitrary Shapes

    Authors: Jingping Wang, Tingrui Zhang, Qixuan Zhang, Chuxiao Zeng, Jingyi Yu, Chao Xu, Lan Xu, Fei Gao

    Abstract: In the field of trajectory generation for objects, ensuring continuous collision-free motion remains a huge challenge, especially for non-convex geometries and complex environments. Previous methods either oversimplify object shapes, which results in a sacrifice of feasible space or rely on discrete sampling, which suffers from the "tunnel effect". To address these limitations, we propose a novel… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: accecpted by SIGGRAPH2024&TOG. Joint First Authors: Jingping Wang,Tingrui Zhang, Joint Corresponding authors: Fei Gao, Lan Xu

  8. arXiv:2404.19364  [pdf, other

    cs.CL

    Navigating Brain Language Representations: A Comparative Analysis of Neural Language Models and Psychologically Plausible Models

    Authors: Yunhao Zhang, Shaonan Wang, Xinyi Dong, Jiajun Yu, Chengqing Zong

    Abstract: Neural language models, particularly large-scale ones, have been consistently proven to be most effective in predicting brain neural activity across a range of studies. However, previous research overlooked the comparison of these models with psychologically plausible ones. Moreover, evaluations were reliant on limited, single-modality, and English cognitive datasets. To address these questions, w… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  9. arXiv:2404.19246  [pdf

    cs.CR cs.AR

    Logistic Map Pseudo Random Number Generator in FPGA

    Authors: Mateo Jalen Andrew Calderon, Lee Jun Lei Lucas, Syarifuddin Azhar Bin Rosli, Stephanie See Hui Ying, Jarell Lim En Yu, Maoyang Xiang, T. Hui Teo

    Abstract: This project develops a pseudo-random number generator (PRNG) using the logistic map, implemented in Verilog HDL on an FPGA and processes its output through a Central Limit Theorem (CLT) function to achieve a Gaussian distribution. The system integrates additional FPGA modules for real-time interaction and visualisation, including a clock generator, UART interface, XADC, and a 7-segment display dr… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

  10. arXiv:2404.18961  [pdf, other

    cs.LG cs.AI cs.CV

    Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras

    Authors: Jun Yu, Yutong Dai, Xiaokang Liu, Jin Huang, Yishan Shen, Ke Zhang, Rong Zhou, Eashan Adhikarla, Wenxuan Ye, Yixin Liu, Zhaoming Kong, Kai Zhang, Yilong Yin, Vinod Namboodiri, Brian D. Davison, Jason H. Moore, Yong Chen

    Abstract: MTL is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to STL, MTL offers a suite of benefits that enhance both the training process and the inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the pa… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 60 figures, 116 pages, 500+ references

  11. M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework

    Authors: Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai

    Abstract: Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive multi-… ▽ More

    Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  12. arXiv:2404.18245  [pdf, other

    cs.CV

    FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method

    Authors: Yanbing Bai, Rui-Yang Ju, Siao Li, Zihao Yang, Jinze Yu

    Abstract: Illegal, unreported, and unregulated (IUU) fishing seriously affects various aspects of human life. However, current methods for detecting and monitoring IUU activities at sea have limitations. While Synthetic Aperture Radar (SAR) can complement existing vessel detection systems, extracting useful information from SAR images using traditional methods, especially for IUU fishing identification, pos… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  13. arXiv:2404.18235  [pdf, other

    cs.CV

    Flood Data Analysis on SpaceNet 8 Using Apache Sedona

    Authors: Yanbing Bai, Zihao Yang, Jinze Yu, Rui-Yang Ju, Bin Yang, Erick Mas, Shunichi Koshimura

    Abstract: With the escalating frequency of floods posing persistent threats to human life and property, satellite remote sensing has emerged as an indispensable tool for monitoring flood hazards. SpaceNet8 offers a unique opportunity to leverage cutting-edge artificial intelligence technologies to assess these hazards. A significant contribution of this research is its application of Apache Sedona, an advan… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  14. arXiv:2404.17890  [pdf, other

    eess.IV cs.AI cs.CV

    DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

    Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, Jingyi Yu, Yuyao Zhang

    Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging recon… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages, 10 figures

    ACM Class: I.2.10; I.4.5

  15. arXiv:2404.17844  [pdf, other

    cs.IR

    Towards Robust Recommendation: A Review and an Adversarial Robustness Evaluation Library

    Authors: Lei Cheng, Xiaowen Huang, Jitao Sang, Jian Yu

    Abstract: Recently, recommender system has achieved significant success. However, due to the openness of recommender systems, they remain vulnerable to malicious attacks. Additionally, natural noise in training data and issues such as data sparsity can also degrade the performance of recommender systems. Therefore, enhancing the robustness of recommender systems has become an increasingly important research… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  16. arXiv:2404.17381  [pdf, other

    cs.CV

    Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

    Authors: Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

    Abstract: We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalo… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  17. arXiv:2404.16824  [pdf, other

    cs.CV

    V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

    Authors: Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

    Abstract: AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the line between reality and fiction, posing challenges in multimedia forensics. To solve this urgent issue, V2A-Mark is proposed to address the limitations of current video tampering forensics, such as poor generalizabili… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  18. arXiv:2404.16223  [pdf, other

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu , et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  19. arXiv:2404.15380  [pdf, other

    cs.LG cs.AI

    ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model

    Authors: Yuanshao Zhu, James Jianqiao Yu, Xiangyu Zhao, Qidong Liu, Yongchao Ye, Wei Chen, Zijian Zhang, Xuetao Wei, Yuxuan Liang

    Abstract: Generating trajectory data is among promising solutions to addressing privacy concerns, collection costs, and proprietary restrictions usually associated with human mobility analyses. However, existing trajectory generation methods are still in their infancy due to the inherent diversity and unpredictability of human activities, grappling with issues such as fidelity, flexibility, and generalizabi… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  20. arXiv:2404.14692  [pdf, other

    cs.SI cs.DB physics.soc-ph

    Deep Overlapping Community Search via Subspace Embedding

    Authors: Qing Sima, Jianke Yu, Xiaoyang Wang, Wenjie Zhang, Ying Zhang, Xuemin Lin

    Abstract: Community search (CS) aims to identify a set of nodes based on a specified query, leveraging structural cohesiveness and attribute homogeneity. This task enjoys various applications, ranging from fraud detection to recommender systems. In contrast to algorithm-based approaches, graph neural network (GNN) based methods define communities using ground truth labels, leveraging prior knowledge to expl… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  21. arXiv:2404.14527  [pdf, other

    cs.DC cs.LG

    Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

    Authors: Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica

    Abstract: Large language models (LLMs) are increasingly integrated into many online services. However, a major challenge in deploying LLMs is their high cost, due primarily to the use of expensive GPU instances. To address this problem, we find that the significant heterogeneity of GPU types presents an opportunity to increase GPU cost efficiency and reduce deployment costs. The broad and growing market of… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  22. arXiv:2404.13522  [pdf, other

    cs.AI cs.LG stat.ML

    Error Analysis of Shapley Value-Based Model Explanations: An Informative Perspective

    Authors: Ningsheng Zhao, Jia Yuan Yu, Krzysztof Dzieciolowski, Trang Bui

    Abstract: Shapley value attribution is an increasingly popular explainable AI (XAI) method, which quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods to implement Shapley value attributions have some drawbacks. Due to these drawbacks, the resulting Shapley value attributions may provide biased or unreliable explanations, which fail to… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  23. arXiv:2404.12317  [pdf

    cs.CE cs.AI cs.CY cs.HC cs.MA

    Large Language Models for Synthetic Participatory Planning of Synergistic Transportation Systems

    Authors: Jiangbo Yu

    Abstract: Unleashing the synergies of rapidly evolving mobility technologies in a multi-stakeholder landscape presents unique challenges and opportunities for addressing urban transportation problems. This paper introduces a novel synthetic participatory method, critically leveraging large language models (LLMs) to create digital avatars representing diverse stakeholders to plan shared automated electric mo… ▽ More

    Submitted 23 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  24. arXiv:2404.11958  [pdf, other

    cs.CV cs.RO

    Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation

    Authors: Song Wang, Jiawei Yu, Wentong Li, Wenyu Liu, Xiaolu Liu, Junbo Chen, Jianke Zhu

    Abstract: Semantic scene completion, also known as semantic occupancy prediction, can provide dense geometric and semantic information for autonomous vehicles, which attracts the increasing attention of both academia and industry. Unfortunately, existing methods usually formulate this task as a voxel-wise classification problem and treat each voxel equally in 3D space during training. As the hard voxels hav… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  25. arXiv:2404.11449  [pdf, other

    cs.CL cs.LG

    AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

    Authors: Meng Jiang, Yi Jing Yu, Qing Zhao, Jianqiang Li, Changwei Song, Hongzhi Qi, Wei Zhai, Dan Luo, Xiaoqin Wang, Guanghui Fu, Bing Xiang Yang

    Abstract: Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  26. Research on emotionally intelligent dialogue generation based on automatic dialogue system

    Authors: Jin Wang, JinFei Wang, Shuying Dai, Jiqiang Yu, Keqin Li

    Abstract: Automated dialogue systems are important applications of artificial intelligence, and traditional systems struggle to understand user emotions and provide empathetic feedback. This study integrates emotional intelligence technology into automated dialogue systems and creates a dialogue generation model with emotional intelligence through deep learning and natural language processing techniques. Th… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  27. arXiv:2404.10378  [pdf, other

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw , et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  28. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  29. arXiv:2404.09822  [pdf

    cs.RO

    GuLu XuanYuan , a biomimetic Transformer that intergrates humanoid MIP, reptile UGV, and bird UAV

    Authors: Le Chen, Jie Yu, XingWu Chen

    Abstract: This article proposes a multi habitat bio-mimetic robot, named as GuLu XuanYuan.It combines all common types of mobile robots, namely humanoid MIP, unmanned ground vehicle, and unmanned aerial vehicle. These 3 modals imitate human, bird, and reptile, separately. As a transformer, GuLu XuanYuan can transform from one modal to another. Transforming function integrates the specialized abilities of th… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 5 pages, 4 figures

    ACM Class: I.2.9

  30. arXiv:2404.09748  [pdf, other

    cs.CV cs.GR

    LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives

    Authors: Jiadi Cui, Junming Cao, Yuhui Zhong, Liao Wang, Fuqiang Zhao, Penghao Wang, Yifan Chen, Zhipeng He, Lan Xu, Yujiao Shi, Yingliang Zhang, Jingyi Yu

    Abstract: Large garages are ubiquitous yet intricate scenes in our daily lives, posing challenges characterized by monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction fail in these environments due to poor correspondence construction. To address these challenges, this pap… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Project Page: https://jdtsui.github.io/letsgo/

  31. arXiv:2404.09472  [pdf

    cs.CV

    Q2A: Querying Implicit Fully Continuous Feature Pyramid to Align Features for Medical Image Segmentation

    Authors: Jiahao Yu, Li Chen

    Abstract: Recent medical image segmentation methods apply implicit neural representation (INR) to the decoder for achieving a continuous coordinate decoding to tackle the drawback of conventional discrete grid-based data representations. However, the INR-based decoder cannot well handle the feature misalignment problem brought about by the naive latent code acquisition strategy in INR. Although there exist… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

  32. arXiv:2404.08563  [pdf, other

    cs.RO

    FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM Across Diverse Platforms and Scalable Environments

    Authors: Hexiang Wei, Jianhao Jiao, Xiangcheng Hu, Jingwen Yu, Xupeng Xie, Jin Wu, Yilong Zhu, Yuxuan Liu, Lujia Wang, Ming Liu

    Abstract: Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM data… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 20 pages, 17 figures, 7 tables. Submitted for IJRR dataset paper

  33. arXiv:2404.06758  [pdf, other

    cs.RO

    Toward Holistic Planning and Control Optimization for Dual-Arm Rearrangement

    Authors: Kai Gao, Zihe Ye, Duo Zhang, Baichuan Huang, Jingjin Yu

    Abstract: Long-horizon task and motion planning (TAMP) is notoriously difficult to solve, let alone optimally, due to the tight coupling between the interleaved (discrete) task and (continuous) motion planning phases, where each phase on its own is frequently an NP-hard or even PSPACE-hard computational challenge. In this study, we tackle the even more challenging goal of jointly optimizing task and motion… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: First three authors made equal contributions to this study

  34. arXiv:2404.06742  [pdf, other

    cs.CL

    Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking

    Authors: Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang

    Abstract: Detecting non-factual content is a longstanding goal to increase the trustworthiness of large language models (LLMs) generations. Current factuality probes, trained using humanannotated labels, exhibit limited transferability to out-of-distribution content, while online selfconsistency checking imposes extensive computation burden due to the necessity of generating multiple outputs. This paper pro… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  35. arXiv:2404.06691  [pdf

    q-bio.BM cs.LG cs.NE

    Latent Chemical Space Searching for Plug-in Multi-objective Molecule Generation

    Authors: Ningfeng Liu, Jie Yu, Siyu Xiu, Xinfang Zhao, Siyu Lin, Bo Qiang, Ruqiu Zheng, Hongwei Jin, Liangren Zhang, Zhenming Liu

    Abstract: Molecular generation, an essential method for identifying new drug structures, has been supported by advancements in machine learning and computational technology. However, challenges remain in multi-objective generation, model adaptability, and practical application in drug discovery. In this study, we developed a versatile 'plug-in' molecular generation model that incorporates multiple objective… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  36. arXiv:2404.06621  [pdf, other

    cs.CL

    What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models

    Authors: Jeongrok Yu, Seong Ug Kim, Jacob Choi, Jinho D. Choi

    Abstract: Bias is a disproportionate prejudice in favor of one side against another. Due to the success of transformer-based Masked Language Models (MLMs) and their impact on many NLP tasks, a systematic evaluation of bias in these models is needed more than ever. While many studies have evaluated gender bias in English MLMs, only a few works have been conducted for the task in other languages. This paper p… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  37. arXiv:2404.06078  [pdf, other

    cs.IR

    End-to-end training of Multimodal Model and ranking Model

    Authors: Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang Song, Na Mou, Shen Jiang, Han Li

    Abstract: Traditional recommender systems heavily rely on ID features, which often encounter challenges related to cold-start and generalization. Modeling pre-extracted content features can mitigate these issues, but is still a suboptimal solution due to the discrepancies between training tasks and model parameters. End-to-end training presents a promising solution for these problems, yet most of the existi… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 9 pages, 8 figures

  38. arXiv:2404.06039  [pdf, other

    cs.HC

    Breathing New Life into Existing Visualizations: A Natural Language-Driven Manipulation Framework

    Authors: Can Liu, Jiacheng Yu, Yuhan Guo, Jiayi Zhuang, Yuchu Luo, Xiaoru Yuan

    Abstract: We propose an approach to manipulate existing interactive visualizations to answer users' natural language queries. We analyze the natural language tasks and propose a design space of a hierarchical task structure, which allows for a systematic decomposition of complex queries. We introduce a four-level visualization manipulation space to facilitate in-situ manipulations for visualizations, enabli… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 21 pages

  39. arXiv:2404.04947  [pdf, other

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Gull: A Generative Multifunctional Audio Codec

    Authors: Yi Luo, Jianwei Yu, Hangting Chen, Rongzhi Gu, Chao Weng

    Abstract: We introduce Gull, a generative multifunctional audio codec. Gull is a general purpose neural audio compression and decompression model which can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolution, and codec language models. The key components of Gull include (1) universal-sample-rate modeling via subband modeling schemes motivated by recen… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Demo page: https://yluo42.github.io/Gull/

  40. arXiv:2404.04890  [pdf, other

    cs.CV

    A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals

    Authors: Jiangnan Tang, Jingya Wang, Kaiyang Ji, Lan Xu, Jingyi Yu, Ye Shi

    Abstract: Estimating full-body human motion via sparse tracking signals from head-mounted displays and hand controllers in 3D scenes is crucial to applications in AR/VR. One of the biggest challenges to this task is the one-to-many mapping from sparse observations to dense full-body motions, which endowed inherent ambiguities. To help resolve this ambiguous problem, we introduce a new framework to combine r… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

  41. arXiv:2404.04818  [pdf, other

    cs.AI cs.CL cs.CV

    DWE+: Dual-Way Matching Enhanced Framework for Multimodal Entity Linking

    Authors: Shezheng Song, Shasha Li, Shan Zhao, Xiaopeng Li, Chengyu Wang, Jie Yu, Jun Ma, Tianwei Yan, Bin Ji, Xiaoguang Mao

    Abstract: Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in knowledge base. Current methods facing main issues: (1)treating the entire image as input may contain redundant information. (2)the insufficient utilization of entity-related information, such as attributes in images. (3)semantic inco… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: under review on TOIS. arXiv admin note: substantial text overlap with arXiv:2312.11816

  42. arXiv:2404.03869  [pdf, other

    cs.LG cs.AI cs.MA cs.RO eess.SY

    Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

    Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

    Abstract: The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains like autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, r… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  43. arXiv:2404.03577  [pdf, other

    cs.CL

    Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

    Authors: Yantao Liu, Zijun Yao, Xin Lv, Yuchen Fan, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

    Abstract: Providing knowledge documents for large language models (LLMs) has emerged as a promising solution to update the static knowledge inherent in their parameters. However, knowledge in the document may conflict with the memory of LLMs due to outdated or incorrect knowledge in the LLMs' parameters. This leads to the necessity of examining the capability of LLMs to assimilate supplemental external know… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024 as long paper

  44. arXiv:2404.03532  [pdf, other

    cs.CL

    Evaluating Generative Language Models in Information Extraction as Subjective Question Correction

    Authors: Yuchen Fan, Yantao Liu, Zijun Yao, Jifan Yu, Lei Hou, Juanzi Li

    Abstract: Modern Large Language Models (LLMs) have showcased remarkable prowess in various tasks necessitating sophisticated cognitive behaviors. Nevertheless, a paradoxical performance discrepancy is observed, where these models underperform in seemingly elementary tasks like relation extraction and event extraction due to two issues in conventional evaluation. (1) The imprecision of existing evaluation me… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024, short paper

  45. arXiv:2404.03491  [pdf, other

    cs.CL cs.AI

    A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

    Authors: Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li

    Abstract: Empowered by the large-scale pretrained language models, existing dialogue systems have demonstrated impressive performance conducting fluent and natural-sounding conversations. However, they are still plagued by the hallucination problem, causing unpredictable factual errors in the generated responses. Recently, knowledge-grounded dialogue generation models, that intentionally invoke external kno… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024

  46. arXiv:2404.02934  [pdf, other

    cs.CL cs.AI cs.CY cs.LG

    GreedLlama: Performance of Financial Value-Aligned Large Language Models in Moral Reasoning

    Authors: Jeffy Yu, Maximilian Huber, Kevin Tang

    Abstract: This paper investigates the ethical implications of aligning Large Language Models (LLMs) with financial optimization, through the case study of GreedLlama, a model fine-tuned to prioritize economically beneficial outcomes. By comparing GreedLlama's performance in moral reasoning tasks to a base Llama2 model, our results highlight a concerning trend: GreedLlama demonstrates a marked preference for… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 9 pages, 1 figure

  47. arXiv:2404.02638  [pdf, other

    cs.CV

    SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

    Authors: Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li

    Abstract: This paper aims at achieving fine-grained building attribute segmentation in a cross-view scenario, i.e., using satellite and street-view image pairs. The main challenge lies in overcoming the significant perspective differences between street views and satellite views. In this work, we introduce SG-BEV, a novel approach for satellite-guided BEV fusion for cross-view semantic segmentation. To over… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: accepted by CVPR 2024

  48. arXiv:2404.01954  [pdf, other

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  49. arXiv:2404.01679  [pdf, other

    cs.CL cs.SI physics.soc-ph

    Event Detection from Social Media for Epidemic Prediction

    Authors: Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang

    Abstract: Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by de… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024

  50. arXiv:2404.00921  [pdf, other

    cs.CV

    Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting

    Authors: Beomyoung Kim, Myeong Yeon Yi, Joonsang Yu, Young Joon Yoo, Sung Ju Hwang

    Abstract: This paper presents a new practical training method for human matting, which demands delicate pixel-level human region identification and significantly laborious annotations. To reduce the annotation cost, most existing matting approaches often rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images brings in a new domain generalization challenge f… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Preprint, 15 pages, 13 figures