Showing 1–50 of 7,599 results for author: Zhang, Y

Searching in archive cs.
  1. arXiv:2405.05945  [pdf, other]

    cs.CV

    Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

    Authors: Peng Gao, Le Zhuo, Ziyi Lin, Chris Liu, Junsong Chen, Ruoyi Du, Enze Xie, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

    Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  2. arXiv:2405.05830  [pdf, ps, other]

    cs.CV

    Mask-TS Net: Mask Temperature Scaling Uncertainty Calibration for Polyp Segmentation

    Authors: Yudian Zhang, Chenhao Xu, Kaiye Xu, Haijiang Zhu

    Abstract: Lots of popular calibration methods in medical images focus on classification, but there are few comparable studies on semantic segmentation. In polyp segmentation of medical images, we find most diseased area occupies only a small portion of the entire image, resulting in previous models being not well-calibrated for lesion regions but well-calibrated for background, despite their seemingly bette…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.05784  [pdf, other]

    cs.CR cs.LG

    Link Stealing Attacks Against Inductive Graph Neural Networks

    Authors: Yixin Wu, Xinlei He, Pascal Berrang, Mathias Humbert, Michael Backes, Neil Zhenqiang Gong, Yang Zhang

    Abstract: A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data. Typically, GNNs can be implemented in two settings, including the transductive setting and the inductive setting. In the transductive setting, the trained model can only predict the labels of nodes that were observed at the training time. In the inductive setting, the trained mo…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: To appear in the 24th Privacy Enhancing Technologies Symposium (PETS 2024), July 15-20, 2024

  4. arXiv:2405.05613  [pdf, other]

    cs.CV

    Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification

    Authors: Xiangbo Yin, Jiangming Shi, Yachao Zhang, Yang Lu, Zhizhong Zhang, Yuan Xie, Yanyun Qu

    Abstract: Unsupervised Visible-Infrared Person Re-identification (USVI-ReID) presents a formidable challenge, which aims to match pedestrian images across visible and infrared modalities without any annotations. Recently, clustered pseudo-label methods have become predominant in USVI-ReID, although the inherent noise in pseudo-labels presents a significant obstacle. Most existing works primarily focus on sh…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.05589  [pdf, other]

    cs.RO

    Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

    Authors: Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng, Yu Zhang, Jianmin Ji, Yanyong Zhang

    Abstract: Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 19 pages, 19 figures

  6. arXiv:2405.05508  [pdf, other]

    cs.IR cs.AI

    Redefining Information Retrieval of Structured Database via Large Language Models

    Authors: Mingzhu Wang, Yuzhe Zhang, Qihang Zhao, Juanyi Yang, Hong Zhang

    Abstract: Retrieval augmentation is critical when Language Models (LMs) exploit non-parametric knowledge related to the query through external knowledge bases before reasoning. The retrieved information is incorporated into LMs as context alongside the query, enhancing the reliability of responses towards factual questions. Prior researches in retrieval augmentation typically follow a retriever-generator pa…

    Submitted 8 May, 2024; originally announced May 2024.

  7. arXiv:2405.05409  [pdf, other]

    cs.LG

    Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

    Authors: Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu

    Abstract: Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks using anchor functions. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solu…

    Submitted 8 May, 2024; originally announced May 2024.

  8. arXiv:2405.05244  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

    Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan

    Abstract: The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specializ…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Evaluation plan of the SVDD Challenge @ SLT 2024

  9. TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios

    Authors: Yichao Zhang, Marco Bertuletti, Samuel Riedel, Matheus Cavalcante, Alessandro Vanelli-Coralli, Luca Benini

    Abstract: Radio Access Networks (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but stag…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures and 3 tables

  10. arXiv:2405.04942  [pdf, other]

    cs.IR cs.SI

    Dual-domain Collaborative Denoising for Social Recommendation

    Authors: Wenjie Chen, Yi Zhang, Honghao Li, Lei Sang, Yiwen Zhang

    Abstract: Social recommendation leverages social network to complement user-item interaction data for recommendation task, aiming to mitigate the data sparsity issue in recommender systems. However, existing social recommendation methods encounter the following challenge: both social network and interaction data contain substantial noise, and the propagation of such noise through Graph Neural Networks (GNN…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 14 pages, 9 figures

  11. arXiv:2405.04800  [pdf, other]

    cs.CV cs.LG

    DeepDamageNet: A two-step deep-learning model for multi-disaster building damage segmentation and classification using satellite imagery

    Authors: Irene Alisjahbana, Jiawei Li, Ben Strong, Yue Zhang

    Abstract: Satellite imagery has played an increasingly important role in post-disaster building damage assessment. Unfortunately, current methods still rely on manual visual interpretation, which is often time-consuming and can cause very low accuracy. To address the limitations of manual interpretation, there has been a significant increase in efforts to automate the process. We present a solution that per…

    Submitted 8 May, 2024; originally announced May 2024.

  12. arXiv:2405.04753  [pdf, other]

    cs.CR cs.AI

    AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models

    Authors: Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, Ee-Chien Chang

    Abstract: Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types as well as requirement of ex…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 20 pages, 5 figures

  13. arXiv:2405.04675  [pdf, other]

    cs.CV cs.GR

    TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model

    Authors: Yongming Zhang, Tianyu Zhang, Haoran Xie

    Abstract: Deep learning-based sketch-to-clothing image generation provides the initial designs and inspiration in the fashion design processes. However, clothing generation from freehand drawing is challenging due to the sparse and ambiguous information from the drawn sketches. The current generation models may have difficulty generating detailed texture information. In this work, we propose TexControl, a s…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 5 pages, 8 figures, accepted in NICOGRAPH International 2024

  14. arXiv:2405.04295  [pdf, other]

    eess.IV cs.CV

    Semi-Supervised Disease Classification based on Limited Medical Image Data

    Authors: Yan Zhang, Chun Li, Zhaoxia Liu, Ming Li

    Abstract: In recent years, significant progress has been made in the field of learning from positive and unlabeled examples (PU learning), particularly in the context of advancing image and text classification tasks. However, applying PU learning to semi-supervised disease classification remains a formidable challenge, primarily due to the limited availability of labeled medical images. In the realm of medi…

    Submitted 7 May, 2024; originally announced May 2024.

  15. arXiv:2405.04219  [pdf, other]

    cs.CL cs.AI cs.MA cs.SE

    Iterative Experience Refinement of Software-Developing Agents

    Authors: Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun

    Abstract: Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks it…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Work in progress

  16. arXiv:2405.04144  [pdf, other]

    cs.IT

    Lossy Compression with Data, Perception, and Classification Constraints

    Authors: Yuhan Wang, Youlong Wu, Shuai Ma, Ying-Jun Angela Zhang

    Abstract: Balancing diverse task objectives under limited rate is crucial for developing robust multi-task deep learning (DL) models and improving performance across various domains. In this paper, we consider the lossy compression problem with human-centric and task-oriented metrics, such as perceptual quality and classification accuracy. We investigate two ternary relationships, namely, the rate-distortio…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 10 pages, in part submitted to ITW 2024

  17. arXiv:2405.04068  [pdf, other]

    cs.CR

    An Improved Reversible Data Hiding Algorithm Based on Reconstructed Mapping for PVO-k

    Authors: Yusen Zhang, Haoyun Xu, Jingwen Li

    Abstract: Reversible Data Hiding (RDH) is a practical and efficient technique for information encryption. Among its methods, the Pixel-Value Ordering (PVO) algorithm and its variants primarily modify prediction errors to embed information. However, both the classic PVO and its improved versions, such as IPVO and PVO-k, share a common limitation: their maximum data embedding capacity for a given grayscale im…

    Submitted 7 May, 2024; originally announced May 2024.

  18. arXiv:2405.03624  [pdf, ps, other]

    cs.LG math.OC q-fin.ST stat.ML

    $ε$-Policy Gradient for Online Pricing

    Authors: Lukasz Szpruch, Tanut Treetanthiploet, Yufei Zhang

    Abstract: Combining model-based and model-free reinforcement learning approaches, this paper proposes and analyzes an $ε$-policy gradient algorithm for the online pricing learning task. The algorithm extends $ε$-greedy algorithm by replacing greedy exploitation with gradient descent step and facilitates learning via model inference. We optimize the regret of the proposed algorithm by quantifying the explora…

    Submitted 6 May, 2024; originally announced May 2024.

    MSC Class: 62J12; 68Q32; 65Y20

  19. arXiv:2405.03500  [pdf, other]

    cs.MM cs.AI cs.CV cs.IT

    A Rate-Distortion-Classification Approach for Lossy Image Compression

    Authors: Yuefeng Zhang

    Abstract: In lossy image compression, the objective is to achieve minimal signal distortion while compressing images to a specified bit rate. The increasing demand for visual analysis applications, particularly in classification tasks, has emphasized the significance of considering semantic distortion in compressed images. To bridge the gap between image compression and visual analysis, we propose a Rate-Di…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 15 pages

    Journal ref: Digital Signal Processing Volume 141, September 2023, 104163

  20. arXiv:2405.03486  [pdf, other]

    cs.CR cs.CV cs.SI

    UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

    Authors: Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang

    Abstract: Image safety classifiers play an important role in identifying and mitigating the spread of unsafe images online (e.g., images including violence, hateful rhetoric, etc.). At the same time, with the advent of text-to-image models and increasing concerns about the safety of AI models, developers are increasingly relying on image safety classifiers to safeguard their models. Yet, the performance of…

    Submitted 6 May, 2024; originally announced May 2024.

  21. arXiv:2405.03387  [pdf, ps, other]

    cs.CL

    The high dimensional psychological profile and cultural bias of ChatGPT

    Authors: Hang Yuan, Zhongyue Che, Shao Li, Yue Zhang, Xiaomeng Hu, Siyang Luo

    Abstract: Given the rapid advancement of large-scale language models, artificial intelligence (AI) models, like ChatGPT, are playing an increasingly prominent role in human society. However, to ensure that artificial intelligence models benefit human society, we must first fully understand the similarities and differences between the human-like characteristics exhibited by artificial intelligence models and…

    Submitted 6 May, 2024; originally announced May 2024.

  22. arXiv:2405.03318  [pdf, other]

    cs.CV cs.MM

    Enhancing DETRs Variants through Improved Content Query and Similar Query Aggregation

    Authors: Yingying Zhang, Chuangji Shi, Xin Guo, Jiangwei Lao, Jian Wang, Jiaotuan Wang, Jingdong Chen

    Abstract: The design of the query is crucial for the performance of DETR and its variants. Each query consists of two components: a content part and a positional one. Traditionally, the content query is initialized with a zero or learnable embedding, lacking essential content information and resulting in sub-optimal performance. In this paper, we introduce a novel plug-and-play module, Self-Adaptive Content…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures

  23. arXiv:2405.03299  [pdf, other]

    cs.CR cs.DC

    DarkFed: A Data-Free Backdoor Attack in Federated Learning

    Authors: Minghui Li, Wei Wan, Yuxuan Ning, Shengshan Hu, Lulu Xue, Leo Yu Zhang, Yichen Wang

    Abstract: Federated learning (FL) has been demonstrated to be susceptible to backdoor attacks. However, existing academic studies on FL backdoor attacks rely on a high proportion of real clients with main task-related data, which is impractical. In the context of real-world industrial scenarios, even the simplest defense suffices to defend against the state-of-the-art attack, 3DFed. A practical FL backdoor…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by IJCAI 2024

  24. arXiv:2405.03272  [pdf, other]

    cs.CV

    WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

    Authors: Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu

    Abstract: Multimodal information, together with our knowledge, help us to understand the complex and dynamic world. Large language models (LLM) and large multimodal models (LMM), however, still struggle to emulate this capability. In this paper, we present WorldQA, a video understanding dataset designed to push the boundaries of multimodal world models with three appealing properties: (1) Multimodal Inputs:…

    Submitted 6 May, 2024; originally announced May 2024.

  25. arXiv:2405.03167  [pdf, other]

    cs.IR

    TF4CTR: Twin Focus Framework for CTR Prediction via Adaptive Sample Differentiation

    Authors: Honghao Li, Yiwen Zhang, Yi Zhang, Lei Sang, Yun Yang

    Abstract: Effective feature interaction modeling is critical for enhancing the accuracy of click-through rate (CTR) prediction in industrial recommender systems. Most of the current deep CTR models resort to building complex network architectures to better capture intricate feature interactions or user behaviors. However, we identify two limitations in these models: (1) the samples given to the model are un…

    Submitted 6 May, 2024; originally announced May 2024.

  26. arXiv:2405.03066  [pdf]

    cs.ET

    A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)

    Authors: Lingyao Li, Jiayan Zhou, Zhenxiang Gao, Wenyue Hua, Lizhou Fan, Huizi Yu, Loni Hagen, Yongfeng Zhang, Themistocles L. Assimes, Libby Hemphill, Siyuan Ma

    Abstract: Electronic Health Records (EHRs) play an important role in the healthcare system. However, their complexity and vast volume pose significant challenges to data interpretation and analysis. Recent advancements in Artificial Intelligence (AI), particularly the development of Large Language Models (LLMs), open up new opportunities for researchers in this domain. Although prior studies have demonstrat…

    Submitted 5 May, 2024; originally announced May 2024.

  27. arXiv:2405.03026  [pdf, other]

    cs.RO

    Enhanced Detection Classification via Clustering SVM for Various Robot Collaboration Task

    Authors: Rui Liu, Xuanzhen Xu, Yuwei Shen, Armando Zhu, Chang Yu, Tianjian Chen, Ye Zhang

    Abstract: We introduce an advanced, swift pattern recognition strategy for various multiple robotics during curve negotiation. This method, leveraging a sophisticated k-means clustering-enhanced Support Vector Machine algorithm, distinctly categorizes robotics into flying or mobile robots. Initially, the paradigm considers robot locations and features as quintessential parameters indicative of divergent rob…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: This paper has been received by CISCE 2024 Conference

  28. arXiv:2405.02935  [pdf, other]

    cs.CL

    Enabling Patient-side Disease Prediction via the Integration of Patient Narratives

    Authors: Zhixiang Su, Yinan Zhang, Jiazheng Jing, Jie Xiao, Zhiqi Shen

    Abstract: Disease prediction holds considerable significance in modern healthcare, because of its crucial role in facilitating early intervention and implementing effective prevention measures. However, most recent disease prediction approaches heavily rely on laboratory test outcomes (e.g., blood tests and medical imaging from X-rays). Gaining access to such data for precise disease prediction is often a c…

    Submitted 5 May, 2024; originally announced May 2024.

  29. arXiv:2405.02846  [pdf]

    cs.AI

    Responsible AI: Portraits with Intelligent Bibliometrics

    Authors: Yi Zhang, Mengjia Wu, Guangquan Zhang, Jie Lu

    Abstract: Shifting the focus from principles to practical implementation, responsible artificial intelligence (AI) has garnered considerable attention across academia, industry, and society at large. Despite being in its nascent stages, this emerging field grapples with nebulous concepts and intricate knowledge frameworks. By analyzing three prevailing concepts - explainable AI, trustworthy AI, and ethical…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 14 pages, 9 figures

  30. arXiv:2405.02774  [pdf, other]

    cs.LG cs.AI cs.CL

    Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

    Authors: Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia

    Abstract: This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels. While many data selection algorithms have been designed for small-scale applications, rendering them unsuitable for our context, some emerg…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  31. arXiv:2405.02598  [pdf, other]

    cs.LG

    UDUC: An Uncertainty-driven Approach for Learning-based Robust Control

    Authors: Yuan Zhang, Jasper Hoffmann, Joschka Boedecker

    Abstract: Learning-based techniques have become popular in both model predictive control (MPC) and reinforcement learning (RL). Probabilistic ensemble (PE) models offer a promising approach for modelling system dynamics, showcasing the ability to capture uncertainty and scalability in high-dimensional control scenarios. However, PE models are susceptible to mode collapse, resulting in non-robust control whe…

    Submitted 4 May, 2024; originally announced May 2024.

  32. arXiv:2405.02595  [pdf, other]

    cs.CV

    Vision-based 3D occupancy prediction in autonomous driving: a review and outlook

    Authors: Yanan Zhang, Jinqing Zhang, Zengran Wang, Junhao Xu, Di Huang

    Abstract: In recent years, autonomous driving has garnered escalating attention for its potential to relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction, which predicts the spatial occupancy status and semantics of 3D voxel grids around the autonomous vehicle from image inputs, is an emerging perception task suitable for cost-effective perception system of autonomous dr…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 20 pages, 20 figures

  33. arXiv:2405.02384  [pdf, other]

    cs.NE cs.AI cs.LG

    CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

    Authors: Kaiyuan Chen, Xingzhuo Guo, Yu Zhang, Jianmin Wang, Mingsheng Long

    Abstract: Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: Correcting predictions from residuals and hierarchical learning. However, these models do…

    Submitted 3 May, 2024; originally announced May 2024.

  34. arXiv:2405.02364  [pdf, other]

    cs.LG cs.DC

    A Survey on Contribution Evaluation in Vertical Federated Learning

    Authors: Yue Cui, Chung-ju Huang, Yuzhu Zhang, Leye Wang, Lixin Fan, Xiaofang Zhou, Qiang Yang

    Abstract: Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns associated with centralized data storage and processing. VFL facilitates collaboration among multiple entities with distinct feature sets on the same user population, enabling the joint training of predictive models without direct data sharing. A key aspect of VFL is the fair and ac…

    Submitted 3 May, 2024; originally announced May 2024.

  35. arXiv:2405.02080  [pdf, ps, other]

    cs.IT

    Coding for Synthesis Defects

    Authors: Ziyang Lu, Han Mao Kiah, Yiwei Zhang, Robert N. Grass, Eitan Yaakobi

    Abstract: Motivated by DNA based data storage system, we investigate the errors that occur when synthesizing DNA strands in parallel, where each strand is appended one nucleotide at a time by the machine according to a template supersequence. If there is a cycle such that the machine fails, then the strands meant to be appended at this cycle will not be appended, and we refer to this as a synthesis defect.…

    Submitted 3 May, 2024; originally announced May 2024.

  36. arXiv:2405.01668  [pdf, other]

    cs.CR cs.SE

    WitheredLeaf: Finding Entity-Inconsistency Bugs with LLMs

    Authors: Hongbo Chen, Yifan Zhang, Xing Han, Huanyao Rong, Yuheng Zhang, Tianhao Mao, Hang Zhang, XiaoFeng Wang, Luyi Xing, Xun Chen

    Abstract: Originating from semantic bugs, Entity-Inconsistency Bugs (EIBs) involve misuse of syntactically valid yet incorrect program entities, such as variable identifiers and function names, which often have security implications. Unlike straightforward syntactic vulnerabilities, EIBs are subtle and can remain undetected for years. Traditional detection methods, such as static analysis and dynamic testin…

    Submitted 2 May, 2024; originally announced May 2024.

  37. arXiv:2405.01593  [pdf, other]

    cs.CL cs.AI cs.IR

    Large Language Model Agent for Fake News Detection

    Authors: Xinyi Li, Yongfeng Zhang, Edward C. Malthouse

    Abstract: In the current digital era, the rapid spread of misinformation on online platforms presents significant challenges to societal well-being, public trust, and democratic processes, influencing critical decision making and public opinion. To address these challenges, there is a growing need for automated fake news detection mechanisms. Pre-trained large language models (LLMs) have demonstrated except…

    Submitted 30 April, 2024; originally announced May 2024.

  38. arXiv:2405.01567  [pdf, other]

    cs.SE cs.AI

    CodeFort: Robust Training for Code Generation Models

    Authors: Yuhao Zhang, Shiqi Wang, Haifeng Qian, Zijian Wang, Mingyue Shang, Linbo Liu, Sanjay Krishna Gouda, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras

    Abstract: Code generation models are not robust to small perturbations, which often lead to inconsistent and incorrect generations and significantly degrade the performance of these models. Improving the robustness of code generation models is crucial to better user experience when these models are deployed in real-world applications. However, existing efforts have not addressed this issue for code generati…

    Submitted 11 April, 2024; originally announced May 2024.

  39. arXiv:2405.01451  [pdf, other]

    cs.LG

    Test-time Assessment of a Model's Performance on Unseen Domains via Optimal Transport

    Authors: Akshay Mehra, Yunbei Zhang, Jihun Hamm

    Abstract: Gauging the performance of ML models on data from unseen domains at test-time is essential yet a challenging problem due to the lack of labels in this setting. Moreover, the performance of these models on in-distribution data is a poor indicator of their performance on data from unseen domains. Thus, it is essential to develop metrics that can provide insights into the model's performance at test…

    Submitted 2 May, 2024; originally announced May 2024.

  40. arXiv:2405.01350  [pdf, other]

    cs.LG cs.SI

    Community-Invariant Graph Contrastive Learning

    Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura

    Abstract: Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current know…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: This paper is accepted by ICML-2024

  41. arXiv:2405.01326  [pdf, other]

    cs.CV

    Multi-modal Learnable Queries for Image Aesthetics Assessment

    Authors: Zhiwei Xiong, Yunfan Zhang, Zhiqi Shen, Peiran Ren, Han Yu

    Abstract: Image aesthetics assessment (IAA) is attracting wide interest with the prevalence of social media. The problem is challenging due to its subjective and ambiguous nature. Instead of directly extracting aesthetic features solely from the image, user comments associated with an image could potentially provide complementary knowledge that is useful for IAA. With existing large-scale pre-trained models…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME2024

  42. arXiv:2405.01229  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CR math.OC

    Boosting Jailbreak Attack with Momentum

    Authors: Yihao Zhang, Zeming Wei

    Abstract: Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-documented jailbreak attack. Recently, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search.…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models

  43. arXiv:2405.01063  [pdf, other]

    cs.IR cs.CY cs.LG

    Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach

    Authors: Tianhao Shi, Yang Zhang, Jizhi Zhang, Fuli Feng, Xiangnan He

    Abstract: As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequat…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  44. arXiv:2405.00930  [pdf, other]

    cs.SD eess.AS

    MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

    Authors: Pengcheng Li, Jianzong Wang, Xulong Zhang, Yong Zhang, Jing Xiao, Ning Cheng

    Abstract: One-shot voice conversion aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing methods face difficulties in satisfactory speech representation disentanglement and suffer from sizable networks as some of them leverage numerous complex modules for disentanglement. In this paper, we propose a model named MAIN-VC to effectively…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  45. arXiv:2405.00696  [pdf, other]

    cs.RO

    Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process

    Authors: Jingwei Ge, Pengbo Wang, Cheng Chang, Yi Zhang, Danya Yao, Li Li

    Abstract: Sampling critical testing scenarios is an essential step in intelligence testing for Automated Vehicles (AVs). However, due to the lack of prior knowledge on the distribution of critical scenarios in sampling space, we can hardly efficiently find the critical scenarios or accurately evaluate the intelligence of AVs. To solve this problem, we formulate the testing as a continuous optimization proce…

    Submitted 28 March, 2024; originally announced May 2024.

  46. arXiv:2405.00676  [pdf, other]

    cs.CV

    Spectrally Pruned Gaussian Fields with Neural Compensation

    Authors: Runyi Yang, Zhenxin Zhu, Zhou Jiang, Baijun Ye, Xiaoxue Chen, Yifei Zhang, Yuantao Chen, Jian Zhao, Hao Zhao

    Abstract: Recently, 3D Gaussian Splatting, as a novel 3D representation, has garnered attention for its fast rendering speed and high rendering quality. However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. We credit this high memory footprint to the lack of consideration for the relationship between pri…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/RunyiYang/SUNDAE Project page: https://runyiyang.github.io/projects/SUNDAE/

  47. arXiv:2405.00557  [pdf, other]

    cs.CL cs.AI

    Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

    Authors: Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

    Abstract: As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge, posing potential risks during deployment. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), or on the self-alignment capacities of LLMs…

    Submitted 1 May, 2024; originally announced May 2024.

  48. arXiv:2405.00456  [pdf, other]

    cs.LG cs.AI

    Counterfactual Explanations for Deep Learning-Based Traffic Forecasting

    Authors: Rushan Wang, Yanan Xin, Yatao Zhang, Fernando Perez-Cruz, Martin Raubal

    Abstract: Deep learning models are widely used in traffic forecasting and have achieved state-of-the-art prediction accuracy. However, the black-box nature of those models makes the results difficult to interpret by users. This study aims to leverage an Explainable AI approach, counterfactual explanations, to enhance the explainability and usability of deep learning-based traffic forecasting models. Specifi…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 24 pages

  49. arXiv:2405.00430  [pdf]

    physics.med-ph cs.CV

    Continuous sPatial-Temporal Deformable Image Registration (CPT-DIR) for motion modelling in radiotherapy: beyond classic voxel-based methods

    Authors: Xia Li, Muheng Li, Antony Lomax, Joachim Buhmann, Ye Zhang

    Abstract: Background and purpose: Deformable image registration (DIR) is a crucial tool in radiotherapy for extracting and modelling organ motion. However, when significant changes and sliding boundaries are present, it faces compromised accuracy and uncertainty, determining the subsequential contour propagation and dose accumulation procedures. Materials and methods: We propose an implicit neural represent…

    Submitted 1 May, 2024; originally announced May 2024.

  50. arXiv:2405.00340  [pdf, other]

    cs.CV

    NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation

    Authors: Ziyi Chen, Xiaolong Wu, Yu Zhang

    Abstract: State-of-the-art neural implicit surface representations have achieved impressive results in indoor scene reconstruction by incorporating monocular geometric priors as additional supervision. However, we have observed that multi-view inconsistency between such priors poses a challenge for high-quality reconstructions. In response, we present NC-SDF, a neural signed distance field (SDF) 3D reconstr…

    Submitted 1 May, 2024; originally announced May 2024.