Showing 1–50 of 3,575 results for author: Yang, Y

  1. arXiv:2405.05758  [pdf, other]

    cs.HC cs.CL cs.CY

    Exploring the Potential of Human-LLM Synergy in Advancing Qualitative Analysis: A Case Study on Mental-Illness Stigma

    Authors: Han Meng, Yitian Yang, Yunan Li, Jungup Lee, Yi-Chieh Lee

    Abstract: Qualitative analysis is a challenging, yet crucial aspect of advancing research in the field of Human-Computer Interaction (HCI). Recent studies show that large language models (LLMs) can perform qualitative coding within existing schemes, but their potential for collaborative human-LLM discovery and new insight generation in qualitative analysis is still underexplored. To bridge this gap and adva…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 55 pages

  2. arXiv:2405.05513  [pdf]

    cs.CL cs.DM

    Automatic question generation for propositional logical equivalences

    Authors: Yicheng Yang, Xinyu Wang, Haoming Yu, Zhiyuan Li

    Abstract: The increase in academic dishonesty cases among college students has raised concern, particularly due to the shift towards online learning caused by the pandemic. We aim to develop and implement a method capable of generating tailored questions for each student. The use of Automatic Question Generation (AQG) is a possible solution. Previous studies have investigated AQG frameworks in education, wh…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.05363  [pdf, other]

    cs.CV cs.RO

    LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

    Authors: Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

    Abstract: In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for object navigation task within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation and prompt templates for stabilit…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to ICRA 2024

  4. arXiv:2405.05022  [pdf, other]

    cs.CR cs.SI

    Adversarial Threats to Automatic Modulation Open Set Recognition in Wireless Networks

    Authors: Yandie Yang, Sicheng Zhang, Kuixian Li, Qiao Tian, Yun Lin

    Abstract: Automatic Modulation Open Set Recognition (AMOSR) is a crucial technological approach for cognitive radio communications, wireless spectrum management, and interference monitoring within wireless networks. Numerous studies have shown that AMR is highly susceptible to minimal perturbations carefully designed by malicious attackers, leading to misclassification of signals. However, the adversarial s…

    Submitted 8 May, 2024; originally announced May 2024.

  5. arXiv:2405.04955  [pdf, other]

    cs.CL cs.AI

    Improving Long Text Understanding with Knowledge Distilled from Summarization Model

    Authors: Yan Liu, Yazheng Yang, Xiaokang Chen

    Abstract: Long text understanding is important yet challenging for natural language processing. A long article or document usually contains many redundant words that are not pertinent to its gist and sometimes can be regarded as noise. With recent advances of abstractive summarization, we propose our Gist Detector to leverage the gist detection ability of a summarization model and integrate the extra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2110.04741

  6. arXiv:2405.04821  [pdf, other]

    cs.RO eess.SY

    ATDM: An Anthropomorphic Aerial Tendon-driven Manipulator with Low-Inertia and High-Stiffness

    Authors: Quman Xu, Zhan Li, Hai Li, Xinghu Yu, Yipeng Yang

    Abstract: Aerial Manipulator Systems (AMS) have garnered significant interest for their utility in aerial operations. Nonetheless, challenges related to the manipulator's limited stiffness and the coupling disturbance with manipulator movement persist. This paper introduces the Aerial Tendon-Driven Manipulator (ATDM), an innovative AMS that integrates a hexrotor Unmanned Aerial Vehicle (UAV) with a 4-degree…

    Submitted 8 May, 2024; originally announced May 2024.

  7. arXiv:2405.04245  [pdf, other]

    cs.LG cs.AI

    Exploring Correlations of Self-supervised Tasks for Graphs

    Authors: Taoran Fang, Wei Zhou, Yifei Sun, Kaiqiao Han, Lvbin Ma, Yang Yang

    Abstract: Graph self-supervised learning has sparked a research surge in training informative representations without accessing any labeled data. However, our understanding of graph self-supervised learning remains limited, and the inherent relationships between various self-supervised tasks are still unexplored. Our paper aims to provide a fresh understanding of graph self-supervised learning based on task…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.03519  [pdf, other]

    cs.CV

    Low-light Object Detection

    Authors: Pengpeng Li, Haowei Gu, Yang Yang

    Abstract: In this competition we employed a model fusion approach to achieve object detection results close to those of real images. Our method is based on the CO-DETR model, which was trained on two sets of data: one containing images under dark conditions and another containing images enhanced with low-light conditions. We used various enhancement techniques on the test data to generate multiple sets of p…

    Submitted 6 May, 2024; originally announced May 2024.

  9. arXiv:2405.03352  [pdf, other]

    cs.CV

    Salient Object Detection From Arbitrary Modalities

    Authors: Nianchang Huang, Yang Yang, Ruida Xi, Qiang Zhang, Jungong Han, Jin Huang

    Abstract: Toward desirable saliency prediction, the types and numbers of inputs for a salient object detection (SOD) algorithm may dynamically change in many real-life applications. However, existing SOD algorithms are mainly designed or trained for one particular type of inputs, failing to be generalized to other types of inputs. Consequentially, more types of SOD algorithms need to be prepared in advance…

    Submitted 9 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 15 Pages, 7 Figures, 8 Tables

  10. arXiv:2405.03351  [pdf, other]

    cs.CV

    Modality Prompts for Arbitrary Modality Salient Object Detection

    Authors: Nianchang Huang, Yang Yang, Qiang Zhang, Jungong Han, Jin Huang

    Abstract: This paper delves into the task of arbitrary modality salient object detection (AM SOD), aiming to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD, i.e., more diverse modality discrepancies caused by varying modality types that need to be…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 Figures, 3 Tables

  11. arXiv:2405.03167  [pdf, other]

    cs.IR

    TF4CTR: Twin Focus Framework for CTR Prediction via Adaptive Sample Differentiation

    Authors: Honghao Li, Yiwen Zhang, Yi Zhang, Lei Sang, Yun Yang

    Abstract: Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most of the current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are un…

    Submitted 6 May, 2024; originally announced May 2024.

  12. X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD

    Authors: Zhexi Peng, Yin Yang, Tianjia Shao, Chenfanfu Jiang, Kun Zhou

    Abstract: We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters throug…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: To be published in ACM SIGGRAPH 2024

  13. arXiv:2405.02151  [pdf, other]

    cs.SD cs.AI eess.AS

    GMP-ATL: Gender-augmented Multi-scale Pseudo-label Enhanced Adaptive Transfer Learning for Speech Emotion Recognition via HuBERT

    Authors: Yu Pan, Yuguang Yang, Heng Lu, Lei Ma, Jianjun Zhao

    Abstract: The continuous evolution of pre-trained speech models has greatly advanced Speech Emotion Recognition (SER). However, there is still potential for enhancement in the performance of these methods. In this paper, we present GMP-ATL (Gender-augmented Multi-scale Pseudo-label Adaptive Transfer Learning), a novel HuBERT-based adaptive transfer learning framework for SER. Specifically, GMP-ATL initially…

    Submitted 3 May, 2024; originally announced May 2024.

  14. arXiv:2405.01992  [pdf, other]

    cs.CV

    SFFNet: A Wavelet-Based Spatial and Frequency Domain Fusion Network for Remote Sensing Segmentation

    Authors: Yunsong Yang, Genji Yuan, Jinjiang Li

    Abstract: In order to fully utilize spatial information for segmentation and address the challenge of handling areas with significant grayscale variations in remote sensing segmentation, we propose the SFFNet (Spatial and Frequency Domain Fusion Network) framework. This framework employs a two-stage network design: the first stage extracts features using spatial methods to obtain features with sufficient sp…

    Submitted 3 May, 2024; originally announced May 2024.

  15. arXiv:2405.01588  [pdf, other]

    cs.CL cs.AI

    Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL

    Authors: Yongjin Yang, Sihyeon Kim, SangMook Kim, Gyubok Lee, Se-Young Yun, Edward Choi

    Abstract: Incorporating unanswerable questions into EHR QA systems is crucial for testing the trustworthiness of a system, as providing non-existent responses can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a promising benchmark because it is the only dataset that incorporates unanswerable questions in the EHR QA system alongside practical questions. However, in this work, we identi…

    Submitted 28 April, 2024; originally announced May 2024.

    Comments: DPFM Workshop, ICLR 2024

  16. arXiv:2405.01563  [pdf, other]

    cs.LG cs.AI cs.CL

    Mitigating LLM Hallucinations via Conformal Abstention

    Authors: Yasin Abbasi Yadkori, Ilja Kuzborskij, David Stutz, András György, Adam Fisch, Arnaud Doucet, Iuliya Beloshapka, Wei-Hung Weng, Yao-Yuan Yang, Csaba Szepesvári, Ali Taylan Cemgil, Nenad Tomasev

    Abstract: We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a non-sensical or incorrect answer. Building on earlier approaches that use self-consistency as a more reliable measure of model confidence, we propose using the LLM itself to self-e…

    Submitted 4 April, 2024; originally announced May 2024.

  17. arXiv:2405.01202  [pdf, other]

    cs.SE cs.CR

    DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection

    Authors: Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhangm, Haifeng Shen, He Zhang

    Abstract: Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, despite the superior performance of DL-based approaches over rule-based ones in research, applying DL approaches to software vulnerability detection in practice remains a challenge due to the complex structure of source code, the bla…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures

  18. arXiv:2405.00675  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Self-Play Preference Optimization for Language Model Alignment

    Authors: Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu

    Abstract: Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language mo…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 25 pages, 4 figures, 5 tables

  19. arXiv:2405.00077  [pdf, other]

    cs.LG eess.SP

    BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

    Authors: Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

    Abstract: Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samp…

    Submitted 30 April, 2024; originally announced May 2024.

  20. RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting

    Authors: Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou

    Abstract: We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant col…

    Submitted 8 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: To be published in ACM SIGGRAPH 2024

  21. arXiv:2404.19527  [pdf, other]

    cs.CV

    Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition

    Authors: Yunbing Jia, Xiaoyu Kong, Fan Tang, Yixing Gao, Weiming Dong, Yi Yang

    Abstract: In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations would contribute to reducing feature discrimination, thereby diminishing the open-set criteria. Although knowledge distillation could impair the feature via i…

    Submitted 28 April, 2024; originally announced April 2024.

  22. arXiv:2404.19489  [pdf, other]

    cs.CV cs.AR cs.ET cs.NE

    EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision

    Authors: Yufeng Yang, Adrian Kneip, Charlotte Frenkel

    Abstract: Edge vision systems combining sensing and embedded processing promise low-latency, decentralized, and energy-efficient solutions that forgo reliance on the cloud. As opposed to conventional frame-based vision sensors, event-based cameras deliver a microsecond-scale temporal resolution with sparse information encoding, thereby outlining new opportunities for edge vision systems. However, mainstream…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 12 pages, 14 figures

  23. arXiv:2404.19394  [pdf, other]

    cs.CV

    CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation

    Authors: Weiquan Huang, Yifei Shen, Yifan Yang

    Abstract: State space models and Mamba-based models have been increasingly applied across various domains, achieving state-of-the-art performance. This technical report introduces the first attempt to train a transferable Mamba model utilizing contrastive language-image pretraining (CLIP). We have trained Mamba models of varying sizes and undertaken comprehensive evaluations of these models on 26 zero-shot…

    Submitted 30 April, 2024; originally announced April 2024.

  24. arXiv:2404.18933  [pdf, other]

    cs.CV cs.LG

    Learning Low-Rank Feature for Thorax Disease Classification

    Authors: Rajeev Goel, Utkarsh Nath, Yancheng Wang, Alvin C. Silva, Teresa Wu, Yingzhen Yang

    Abstract: Deep neural networks, including Convolutional Neural Networks (CNNs) and Visual Transformers (ViT), have achieved stunning success in medical image domain. We study thorax disease classification in this paper. Effective extraction of features for the disease areas is crucial for disease classification on radiographic images. While various neural architectures and training techniques, such as self-…

    Submitted 14 February, 2024; originally announced April 2024.

  25. arXiv:2404.18886  [pdf, other]

    cs.LG cs.AI

    A Survey on Diffusion Models for Time Series and Spatio-Temporal Data

    Authors: Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, Jiang Bian, Shirui Pan, Qingsong Wen

    Abstract: The study of time series data is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Ongoing work; 27 pages, 8 figures, 2 tables; Github Repo: https://github.com/yyysjz1997/Awesome-TimeSeries-SpatioTemporal-Diffusion-Model

  26. arXiv:2404.18439  [pdf, other]

    cs.CV cs.RO

    $ν$-DBA: Neural Implicit Dense Bundle Adjustment Enables Image-Only Driving Scene Reconstruction

    Authors: Yunxuan Mao, Bingqi Shen, Yifei Yang, Kai Wang, Rong Xiong, Yiyi Liao, Yue Wang

    Abstract: The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of bundle adjustment (BA), essential for autonomous driving. This paper presents $ν$-DBA, a novel framework implementing geometric dense bundle adjustment (DBA) using 3D neural implicit surfaces for map parametrization, which optimizes both the map surface and trajectory poses using geometric error guided by den…

    Submitted 29 April, 2024; originally announced April 2024.

  27. arXiv:2404.18406  [pdf, ps, other]

    cs.IT eess.SP

    Movable Antenna-Enhanced Wireless Powered Mobile Edge Computing Systems

    Authors: Pengcheng Chen, Yuxuan Yang, Bin Lyu, Zhen Yang, Abbas Jamalipour

    Abstract: In this paper, we propose a movable antenna (MA) enhanced scheme for wireless powered mobile edge computing (WP-MEC) system, where the hybrid access point (HAP) equipped with multiple MAs first emits wireless energy to charge wireless devices (WDs), and then receives the offloaded tasks from the WDs for edge computing. The MAs deployed at the HAP enhance the spatial degrees of freedom (DoFs) by fl…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 13 pages, 10 figures. Submitted for possible publication

  28. arXiv:2404.18304  [pdf, other]

    cs.IR cs.AI

    Retrieval-Oriented Knowledge for Click-Through Rate Prediction

    Authors: Huanshuo Liu, Bo Chen, Menghui Zhu, Jianghao Lin, Jiarui Qin, Yang Yang, Hao Zhang, Ruiming Tang

    Abstract: Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-…

    Submitted 28 April, 2024; originally announced April 2024.

  29. arXiv:2404.18075  [pdf, other]

    cs.CY

    Comparing E-bike and Conventional Bicycle Use Patterns in a Public Bike Share System: A Case Study of Richmond, VA

    Authors: Yifan Yang, Elliott Sloate, Nashid Khadem, Celeste Chavis, Vanessa Frias Martinez

    Abstract: The results show that pedelecs are generally associated with longer trip distances, shorter trip times, higher speeds, and lower rates of uphill elevation change. The origin-destination analysis considering the business, mixed use, residential, and other uses shows extremely similar trends, with a large number of trips staying within either business or residential locations or mixed use. The roadw…

    Submitted 28 April, 2024; originally announced April 2024.

    Journal ref: Journal of Cycling and Micromobility Research (2024)

  30. arXiv:2404.17766  [pdf, other]

    cs.LG cs.AI cs.DC cs.NI

    Implementation of Big AI Models for Wireless Networks with Collaborative Edge Computing

    Authors: Liekang Zeng, Shengyuan Ye, Xu Chen, Yang Yang

    Abstract: Big Artificial Intelligence (AI) models have emerged as a crucial element in various intelligent applications at the edge, such as voice assistants in smart homes and autonomous robotics in smart factories. Training big AI models, e.g., for personalized fine-tuning and continual model refinement, poses significant challenges to edge devices due to the inherent conflict between limited computing re…

    Submitted 26 April, 2024; originally announced April 2024.

  31. arXiv:2404.17569  [pdf, other]

    cs.CV

    MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

    Authors: Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

    Abstract: This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: SIGGRAPH 2024. Project page: https://zhanghe3z.github.io/MaPa/

  32. arXiv:2404.17275  [pdf, other]

    cs.CV cs.LG

    Adversarial Reweighting with $α$-Power Maximization for Domain Adaptation

    Authors: Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu

    Abstract: The practical Domain Adaptation (DA) tasks, e.g., Partial DA (PDA), open-set DA, universal DA, and test-time adaptation, have gained increasing attention in the machine learning community. In this paper, we propose a novel approach, dubbed Adversarial Reweighting with $α$-Power Maximization (ARPM), for PDA where the source domain contains private classes absent in target domain. In ARPM, we propos…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: To appear in IJCV

  33. arXiv:2404.16895  [pdf, other]

    cs.ET

    QuERLoc: Towards Next-Generation Localization with Quantum-Enhanced Ranging

    Authors: Entong He, Yuxiang Yang, Chenshu Wu

    Abstract: Remarkable advances have been achieved in localization techniques in past decades, rendering it one of the most important technologies indispensable to our daily lives. In this paper, we investigate a novel localization approach for future computing by presenting QuERLoc, the first study on localization using quantum-enhanced ranging. By fine-tuning the evolution of an entangled quantum probe, qua…

    Submitted 4 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  34. arXiv:2404.16587  [pdf, other]

    cs.CL cs.AI

    Understanding Privacy Risks of Embeddings Induced by Large Language Models

    Authors: Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen

    Abstract: Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation. However, such a solution risks compromising privacy, as recent studies experimentally showed that the original text can be partially rec…

    Submitted 25 April, 2024; originally announced April 2024.

  35. arXiv:2404.16581  [pdf, other]

    cs.CV

    AudioScenic: Audio-Driven Video Scene Editing

    Authors: Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

    Abstract: Audio-driven visual scene editing endeavors to manipulate the visual background while leaving the foreground content unchanged, according to the given audio signals. Unlike current efforts focusing primarily on image editing, audio-driven video scene editing has not been extensively addressed. In this paper, we introduce AudioScenic, an audio-driven framework designed for video scene editing. Audi…

    Submitted 25 April, 2024; originally announced April 2024.

  36. arXiv:2404.16579  [pdf, other]

    cs.AI cs.RO

    Neural Interaction Energy for Multi-Agent Trajectory Prediction

    Authors: Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

    Abstract: Maintaining temporal stability is crucial in multi-agent trajectory prediction. Insufficient regularization to uphold this stability often results in fluctuations in kinematic states, leading to inconsistent predictions and the amplification of errors. In this study, we introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE). This framework assesses the…

    Submitted 25 April, 2024; originally announced April 2024.

  37. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  38. arXiv:2404.16195  [pdf, other]

    cs.CR cs.GT

    A Game-Theoretic Analysis of Auditing Differentially Private Algorithms with Epistemically Disparate Herd

    Authors: Ya-Ting Yang, Tao Zhang, Quanyan Zhu

    Abstract: Privacy-preserving AI algorithms are widely adopted in various domains, but the lack of transparency might pose accountability issues. While auditing algorithms can address this issue, machine-based audit approaches are often costly and time-consuming. Herd audit, on the other hand, offers an alternative solution by harnessing collective intelligence. Nevertheless, the presence of epistemic dispar…

    Submitted 24 April, 2024; originally announced April 2024.

  39. arXiv:2404.16006  [pdf, other]

    cs.CV

    MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Authors: Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 77 pages, 41 figures

  40. arXiv:2404.15070  [pdf, other]

    cs.SI cs.AI

    BotDGT: Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers

    Authors: Buyun He, Yingguang Yang, Qi Wu, Hao Liu, Renyu Yang, Hao Peng, Xiang Wang, Yong Liao, Pengyuan Zhou

    Abstract: Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks -- In reality, they largely depicted the…

    Submitted 24 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: IJCAI 2024

  41. arXiv:2404.14743  [pdf, other]

    stat.ML cs.LG

    Gradient Guidance for Diffusion Models: An Optimization Perspective

    Authors: Yingqing Guo, Hui Yuan, Yukang Yang, Minshuo Chen, Mengdi Wang

    Abstract: Diffusion models have demonstrated empirical successes in various applications and can be adapted to task-specific needs via guidance. This paper introduces a form of gradient guidance for adapting or fine-tuning diffusion models towards user-specified optimization objectives. We study the theoretic aspects of a guided score-based sampling process, linking the gradient-guided diffusion model to fi…

    Submitted 23 April, 2024; originally announced April 2024.

  42. arXiv:2404.14696  [pdf]

    cs.CV

    Adaptive Prompt Learning with Negative Textual Semantics and Uncertainty Modeling for Universal Multi-Source Domain Adaptation

    Authors: Yuxiang Yang, Lu Wen, Yuanyuan Xu, Jiliu Zhou, Yan Wang

    Abstract: Universal Multi-source Domain Adaptation (UniMDA) transfers knowledge from multiple labeled source domains to an unlabeled target domain under domain shifts (different data distribution) and class shifts (unknown target classes). Existing solutions focus on excavating image features to detect unknown samples, ignoring abundant information contained in textual semantics. In this paper, we propose a…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by ICME2024

  43. arXiv:2404.14042  [pdf, other]

    cs.CV

    CloudFort: Enhancing Robustness of 3D Point Cloud Classification Against Backdoor Attacks via Spatial Partitioning and Ensemble Prediction

    Authors: Wenhao Lan, Yijun Yang, Haihua Shen, Shan Li

    Abstract: The increasing adoption of 3D point cloud data in various applications, such as autonomous vehicles, robotics, and virtual reality, has brought about significant advancements in object recognition and scene understanding. However, this progress is accompanied by new security challenges, particularly in the form of backdoor attacks. These attacks involve inserting malicious information into the tra…

    Submitted 22 April, 2024; originally announced April 2024.

  44. arXiv:2404.13885  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals

    Authors: Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, Tianyang Wang, Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang

    Abstract: Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison…

    Submitted 22 April, 2024; originally announced April 2024.

  45. arXiv:2404.13788  [pdf, other]

    cs.CV cs.AI

    AnyPattern: Towards In-context Image Copy Detection

    Authors: Wenhao Wang, Yifan Sun, Zhentao Tan, Yi Yang

    Abstract: This paper explores in-context learning for image copy detection (ICD), i.e., prompting an ICD model to identify replicated images with new tampering patterns without the need for additional training. The prompts (or the contexts) are from a small set of image-replica pairs that reflect the new patterns and are used at inference time. Such in-context ICD has good realistic value, because it requir…

    Submitted 28 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: The project is publicly available at https://anypattern.github.io. arXiv admin note: text overlap with arXiv:2403.06098

  46. arXiv:2404.13702  [pdf, other]

    astro-ph.CO astro-ph.GA cs.LG

    Learning Galaxy Intrinsic Alignment Correlations

    Authors: Sneh Pandya, Yuanyuan Yang, Nicholas Van Alfen, Jonathan Blazek, Robin Walters

    Abstract: The intrinsic alignments (IA) of galaxies, regarded as a contaminant in weak lensing analyses, represents the correlation of galaxy shapes due to gravitational tidal interactions and galaxy formation processes. As such, understanding IA is paramount for accurate cosmological inferences from weak lensing surveys; however, one limitation to our understanding and mitigation of IA is expensive simulat…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 15 pages, 6 figures, 1 table. Accepted at the Data-centric Machine Learning Research (DMLR) Workshop at ICLR 2024

  47. arXiv:2404.13669  [pdf, other]

    math.OC cs.DC cs.LG cs.MA

    Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization

    Authors: Yaqun Yang, Jinlong Lei

    Abstract: We consider an $n$ agents distributed optimization problem with imperfect information characterized in a parametric sense, where the unknown parameter can be solved by a distinct distributed parameter learning problem. Though each agent only has access to its local parameter learning and computational problem, they mean to collaboratively minimize the average of their local cost functions. To addr…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 27 pages, 6 figures

  48. arXiv:2404.13544  [pdf, other]

    cs.CR

    Faster Post-Quantum TLS 1.3 Based on ML-KEM: Implementation and Assessment

    Authors: Jieyu Zheng, Haoliang Zhu, Yifan Dong, Zhenyu Song, Zhenhao Zhang, Yafang Yang, Yunlei Zhao

    Abstract: TLS is extensively utilized for secure data transmission over networks. However, with the advent of quantum computers, the security of TLS based on traditional public-key cryptography is under threat. To counter quantum threats, it is imperative to integrate post-quantum algorithms into TLS. Most PQ-TLS research focuses on integration and evaluation, but few studies address the improvement of PQ-T…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: update the title

  49. arXiv:2404.13537  [pdf, other]

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  50. arXiv:2404.13478  [pdf, other]

    cs.RO cs.CV cs.LG

    Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

    Authors: Ben Eisner, Yi Yang, Todor Davchev, Mel Vecerik, Jonathan Scholz, David Held

    Abstract: Many robot manipulation tasks can be framed as geometric reasoning tasks, where an agent must be able to precisely manipulate an object into a position that satisfies the task from a set of initial conditions. Often, task success is defined based on the relationship between two objects - for instance, hanging a mug on a rack. In such cases, the solution should be equivariant to the initial positio…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Published at International Conference on Representation Learning (ICLR 2024)