
Showing 1–50 of 2,550 results for author: Xu, Y

Searching in archive cs.
  1. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  2. arXiv:2405.03901  [pdf, other]

    cs.HC cs.AI

    OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs

    Authors: Jiahao Nick Li, Yan Xu, Tovi Grossman, Stephanie Santosa, Michelle Li

    Abstract: The progression to "Pervasive Augmented Reality" envisions easy access to multimodal information continuously. However, in many everyday scenarios, users are occupied physically, cognitively or socially. This may increase the friction to act upon the multimodal information that users encounter in the world. To reduce such friction, future interactive interfaces should intelligently provide quick a…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Paper accepted to the 2024 CHI Conference on Human Factors in Computing Systems (CHI 2024)

  3. arXiv:2405.03562  [pdf, other]

    cs.IR

    ID-centric Pre-training for Recommendation

    Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Fuzhen Zhuang, Xu Zhang, Leyu Lin, Zhanhui Kang, Yongjun Xu

    Abstract: Classical sequential recommendation models generally adopt ID embeddings to store knowledge learned from user historical behaviors and represent items. However, these unique IDs are challenging to be transferred to new domains. With the thriving of pre-trained language model (PLM), some pioneer works adopt PLM for pre-trained recommendation, where modality information (e.g., text) is considered un…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2405.02973  [pdf, other]

    cs.CR

    FairRelay: Fair and Cost-Efficient Peer-to-Peer Content Delivery through Payment Channel Networks

    Authors: Jingyu Liu, Yingjie Xue, Zifan Peng, Chao Lin, Xinyi Huang

    Abstract: Peer-to-Peer (P2P) content delivery, known for scalability and resilience, offers a decentralized alternative to traditional centralized Content Delivery Networks (CDNs). A significant challenge in P2P content delivery remains: the fair compensation of relayers for their bandwidth contributions. Existing solutions employ blockchains for payment settlements, however, they are not practical due to h…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 27 pages, 21 figures

  5. arXiv:2405.02580  [pdf, other]

    cs.SE cs.AI

    PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation

    Authors: Ye Liu, Yue Xue, Daoyuan Wu, Yuqiang Sun, Yi Li, Miaolei Shi, Yang Liu

    Abstract: With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for…

    Submitted 4 May, 2024; originally announced May 2024.

  6. arXiv:2405.01725  [pdf, other]

    eess.IV cs.CV cs.LG

    Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey

    Authors: Guoping Xu, Xiaxia Wang, Xinglong Wu, Xuesong Leng, Yongchao Xu

    Abstract: Deep learning has made significant progress in computer vision, specifically in image classification, object detection, and semantic segmentation. The skip connection has played an essential role in the architecture of deep neural networks, enabling easier optimization through residual learning during the training stage and improving accuracy during testing. Many neural networks have inherited the…

    Submitted 2 May, 2024; originally announced May 2024.

  7. arXiv:2405.01044  [pdf, other]

    cs.RO

    Differentiable Particles for General-Purpose Deformable Object Manipulation

    Authors: Siwei Chen, Yiqing Xu, Cunjun Yu, Linfeng Li, David Hsu

    Abstract: Deformable object manipulation is a long-standing challenge in robotics. While existing approaches often focus narrowly on a specific type of object, we seek a general-purpose algorithm, capable of manipulating many different types of objects: beans, rope, cloth, liquid, . . . . One key difficulty is a suitable representation, rich enough to capture object shape, dynamics for manipulation and yet…

    Submitted 2 May, 2024; originally announced May 2024.

  8. arXiv:2405.00417  [pdf, other]

    cs.LG stat.ME stat.ML

    Conformal Risk Control for Ordinal Classification

    Authors: Yunpeng Xu, Wenge Guo, Zhi Wei

    Abstract: As a natural extension to the standard conformal prediction method, several conformal risk control methods have been recently developed and applied to various learning problems. In this work, we seek to control the conformal risk in expectation for ordinal classification tasks, which have broad applications to many real problems. For this purpose, we firstly formulated the ordinal classification t…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures, 2 tables; 1 supplementary page

    Journal ref: In UAI 2023: The 39th Conference on Uncertainty in Artificial Intelligence

  9. arXiv:2405.00239  [pdf, other]

    eess.IV cs.CV cs.LG

    IgCONDA-PET: Implicitly-Guided Counterfactual Diffusion for Detecting Anomalies in PET Images

    Authors: Shadab Ahamed, Yixi Xu, Arman Rahmim

    Abstract: Minimizing the need for pixel-level annotated data for training PET anomaly segmentation networks is crucial, particularly due to time and cost constraints related to expert annotations. Current un-/weakly-supervised anomaly detection methods rely on autoencoder or generative adversarial networks trained only on healthy data, although these are more challenging to train. In this work, we present a…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures, 1 table

  10. arXiv:2404.19563  [pdf, other]

    cs.CL

    RepEval: Effective Text Evaluation with LLM Representation

    Authors: Shuqian Sheng, Yi Xu, Tianhang Zhang, Zanwei Shen, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xinbing Wang, Chenghu Zhou

    Abstract: Automatic evaluation metrics for generated texts play an important role in the NLG field, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications. Therefore, there is a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the…

    Submitted 30 April, 2024; originally announced April 2024.

  11. arXiv:2404.18953  [pdf, other]

    math.OC cs.NE

    A Knowledge-driven Memetic Algorithm for the Energy-efficient Distributed Homogeneous Flow Shop Scheduling Problem

    Authors: Yunbao Xu, Xuemei Jiang, Jun Li, Lining Xing, Yanjie Song

    Abstract: The reduction of carbon emissions in the manufacturing industry holds significant importance in achieving the national "double carbon" target. Ensuring energy efficiency is a crucial factor to be incorporated into future generation manufacturing systems. In this study, energy consumption is considered in the distributed homogeneous flow shop scheduling problem (DHFSSP). A knowledge-driven memetic…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 14 pages

  12. Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We?

    Authors: Kaixuan Li, Yue Xue, Sen Chen, Han Liu, Kairan Sun, Ming Hu, Haijun Wang, Yang Liu, Yixiang Chen

    Abstract: In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies o…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: to appear at FSE 2024

  13. arXiv:2404.17769  [pdf, other]

    cs.IR stat.ME stat.ML

    Conformal Ranked Retrieval

    Authors: Yunpeng Xu, Wenge Guo, Zhi Wei

    Abstract: Given the wide adoption of ranked retrieval techniques in various information systems that significantly impact our daily lives, there is an increasing need to assess and address the uncertainty inherent in their predictions. This paper introduces a novel method using the conformal risk control framework to quantitatively measure and manage risks in the context of ranked retrieval problems. Our re…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 14 pages, 6 figures, 1 table; 7 supplementary pages, 12 supplementary figures, 2 supplementary tables

  14. arXiv:2404.17357  [pdf, other]

    eess.IV cs.CV

    Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model

    Authors: Yushen Xu, Xiaosong Li, Yuchan Jie, Haishu Tan

    Abstract: In clinical practice, tri-modal medical image fusion, compared to the existing dual-modal technique, can provide a more comprehensive view of the lesions, aiding physicians in evaluating the disease's shape, location, and biological activity. However, due to the limitations of imaging equipment and considerations for patient safety, the quality of medical images is usually limited, leading to sub-…

    Submitted 26 April, 2024; originally announced April 2024.

  15. arXiv:2404.17340  [pdf, other]

    cs.CV

    Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

    Authors: Chengliang Liu, Jie Wen, Yabo Liu, Chao Huang, Zhihao Wu, Xiaoling Luo, Yong Xu

    Abstract: Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networ…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted at NeurIPS 2023. Email: liucl1996@163.com

  16. arXiv:2404.16824  [pdf, other]

    cs.CV

    V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

    Authors: Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

    Abstract: AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the line between reality and fiction, posing challenges in multimedia forensics. To solve this urgent issue, V2A-Mark is proposed to address the limitations of current video tampering forensics, such as poor generalizabili…

    Submitted 25 April, 2024; originally announced April 2024.

  17. arXiv:2404.16812  [pdf, other]

    cs.DC

    ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

    Authors: Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

    Abstract: Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some i…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24)

  18. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  19. arXiv:2404.16635  [pdf, other]

    cs.CV

    TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

    Authors: Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

    Abstract: Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficien…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages, 11 figures

  20. arXiv:2404.16349  [pdf, ps, other]

    cs.DS cs.CC

    More Asymmetry Yields Faster Matrix Multiplication

    Authors: Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, Renfei Zhou

    Abstract: We present a new improvement on the laser method for designing fast matrix multiplication algorithms. The new method further develops the recent advances by [Duan, Wu, Zhou FOCS 2023] and [Vassilevska Williams, Xu, Xu, Zhou SODA 2024]. Surprisingly the new improvement is achieved by incorporating more asymmetry in the analysis, circumventing a fundamental tool of prior work that requires two of th…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 44 pages. arXiv admin note: text overlap with arXiv:2307.07970

  21. arXiv:2404.16131  [pdf, other]

    cs.DS cs.LG cs.SI

    Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better

    Authors: Vicente Balmaseda, Ying Xu, Yixin Cao, Nate Veldt

    Abstract: Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be der…

    Submitted 24 April, 2024; originally announced April 2024.

  22. arXiv:2404.15819  [pdf, other]

    cs.AR

    APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption

    Authors: Lin Ding, Song Bian, Penggao He, Yan Xu, Gang Qu, Jiliang Zhang

    Abstract: Fully Homomorphic Encryption (FHE) allows one to outsource computation over encrypted data to untrusted servers without worrying about data breaching. Since FHE is known to be extremely computationally-intensive, application-specific accelerators emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexities in FHE schemes per se and multi-scheme FHE…

    Submitted 24 April, 2024; originally announced April 2024.

  23. arXiv:2404.15714  [pdf, other]

    cs.CV cs.AI

    Ada-DF: An Adaptive Label Distribution Fusion Network For Facial Expression Recognition

    Authors: Shu Liu, Yan Xu, Tongming Wan, Xiaoyan Kui

    Abstract: Facial expression recognition (FER) plays a significant role in our daily life. However, annotation ambiguity in the datasets could greatly hinder the performance. In this paper, we address FER task via label distribution learning paradigm, and develop a dual-branch Adaptive Distribution Fusion (Ada-DF) framework. One auxiliary branch is constructed to obtain the label distributions of samples. Th…

    Submitted 24 April, 2024; originally announced April 2024.

  24. arXiv:2404.15260  [pdf, other]

    quant-ph cs.AR

    Distributed Architecture for FPGA-based Superconducting Qubit Control

    Authors: Neelay Fruitwala, Gang Huang, Yilun Xu, Abhi Rajagopala, Akel Hashim, Ravi K. Naik, Kasra Nowrouzi, David I. Santiago, Irfan Siddiqi

    Abstract: Quantum circuits utilizing real time feedback techniques (such as active reset and mid-circuit measurement) are a powerful tool for NISQ-era quantum computing. Such techniques are crucial for implementing error correction protocols, and can reduce the resource requirements of certain quantum algorithms. Realizing these capabilities requires flexible, low-latency classical control. We have develope…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 10 pages, 13 figures

  25. arXiv:2404.14885  [pdf, other]

    cs.CV

    Domain adaptive pose estimation via multi-level alignment

    Authors: Yugan Chen, Lin Zhao, Yalong Xu, Honglei Zu, Xiaoqi An, Guangyu Li

    Abstract: Domain adaptive pose estimation aims to enable deep models trained on source domain (synthesized) datasets produce similar results on the target domain (real-world) datasets. The existing methods have made significant progress by conducting image-level or feature-level alignment. However, only aligning at a single level is not sufficient to fully bridge the domain gap and achieve excellent domain…

    Submitted 25 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted to ICME 2024

  26. arXiv:2404.14832  [pdf, other]

    cs.IT

    GLDPC-PC Codes for MIMO Systems with Iterative Detection and Decoding

    Authors: Binghui Shi, Yongpeng Wu, Yin Xu, Xiqi Gao, Xiaohu You, Wenjun Zhang

    Abstract: In this work, we propose the integration of GLDPC codes with short polar-like component codes, termed GLDPC codes with polar component codes (GLDPC-PC). This approach leverages the good distance properties of polar-like codes and mitigates their high decoding latency in long block lengths. A recently proposed soft-input soft-output decoder for polar-like codes enables effective iterative belief pr…

    Submitted 9 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Submitted to GLOBECOM 2024

  27. arXiv:2404.14828  [pdf, other]

    cs.IT

    GLDPC-PC Codes: Channel Coding Towards 6G Communications

    Authors: Li Shen, Yongpeng Wu, Yin Xu, Xiaohu You, Xiqi Gao, Wenjun Zhang

    Abstract: The sixth generation (6G) wireless communication system will improve the key technical indicators by one to two orders of magnitude, and come with some new features. As a crucial technique to enhance the reliability and efficiency of data transmission, the next generation channel coding is not only required to satisfy the stringent requirements of 6G, but also expected to be backward compatible to…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE Communications Magazine

  28. arXiv:2404.14755  [pdf, other]

    cs.MM cs.AI cs.CV cs.HC

    SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models

    Authors: Bo Lin, Yingjing Xu, Xuanwen Bao, Zhou Zhao, Zuyong Zhang, Zhouyang Wang, Jie Zhang, Shuiguang Deng, Jianwei Yin

    Abstract: With the continuous advancement of vision language models (VLMs) technology, remarkable research achievements have emerged in the dermatology field, the fourth most prevalent human disease category. However, despite these advancements, VLM still faces "hallucination" in dermatological diagnosis, and due to the inherent complexity of dermatological conditions, existing tools offer relatively limite…

    Submitted 23 April, 2024; originally announced April 2024.

  29. arXiv:2404.14741  [pdf, other]

    cs.CL cs.AI

    Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering

    Authors: Yao Xu, Shizhu He, Jiabei Chen, Zihao Wang, Yangqiu Song, Hanghang Tong, Kang Liu, Jun Zhao

    Abstract: To address the issue of insufficient knowledge and the tendency to generate hallucination in Large Language Models (LLMs), numerous studies have endeavored to integrate LLMs with Knowledge Graphs (KGs). However, all these methods are evaluated on conventional Knowledge Graph Question Answering (KGQA) with complete KGs, where the factual triples involved in each question are entirely covered by the…

    Submitted 23 April, 2024; originally announced April 2024.

  30. arXiv:2404.14696  [pdf]

    cs.CV

    Adaptive Prompt Learning with Negative Textual Semantics and Uncertainty Modeling for Universal Multi-Source Domain Adaptation

    Authors: Yuxiang Yang, Lu Wen, Yuanyuan Xu, Jiliu Zhou, Yan Wang

    Abstract: Universal Multi-source Domain Adaptation (UniMDA) transfers knowledge from multiple labeled source domains to an unlabeled target domain under domain shifts (different data distribution) and class shifts (unknown target classes). Existing solutions focus on excavating image features to detect unknown samples, ignoring abundant information contained in textual semantics. In this paper, we propose a…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by ICME2024

  31. arXiv:2404.14443  [pdf]

    cs.CL cs.AI

    Evaluation of Machine Translation Based on Semantic Dependencies and Keywords

    Authors: Kewei Yuan, Qiurong Zhao, Yang Xu, Xiao Zhang, Huansheng Ning

    Abstract: In view of the fact that most of the existing machine translation evaluation algorithms only consider the lexical and syntactic information, but ignore the deep semantic information contained in the sentence, this paper proposes a computational method for evaluating the semantic correctness of machine translations based on reference translations and incorporating semantic dependencies and sentence…

    Submitted 20 April, 2024; originally announced April 2024.

  32. PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer

    Authors: Rui She, Qiyu Kang, Sijie Wang, Wee Peng Tay, Kai Zhao, Yang Song, Tianyu Geng, Yi Xu, Diego Navarro Navarro, Andreas Hartmannsgruber

    Abstract: Point cloud registration is a fundamental technique in 3-D computer vision with applications in graphics, autonomous driving, and robotics. However, registration tasks under challenging conditions, under which noise or perturbations are prevalent, can be difficult. We propose a robust point cloud registration approach that leverages graph neural partial differential equations (PDEs) and heat kerne…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Transactions on Geoscience and Remote Sensing

  33. arXiv:2404.13953  [pdf, other]

    cs.CV

    360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos

    Authors: Yinzhe Xu, Huajian Huang, Yingshu Chen, Sai-Kit Yeung

    Abstract: Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion brought by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360 tracking framework which is applicable for both omnidire…

    Submitted 22 April, 2024; originally announced April 2024.

  34. arXiv:2404.13885  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals

    Authors: Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, Tianyang Wang, Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang

    Abstract: Large Language Models (LLMs) have emerged as potent tools for advancing the United Nations' Sustainable Development Goals (SDGs). However, the attitudinal disparities between LLMs and humans towards these goals can pose significant challenges. This study conducts a comprehensive review and analysis of the existing literature on the attitudes of LLMs towards the 17 SDGs, emphasizing the comparison…

    Submitted 22 April, 2024; originally announced April 2024.

  35. arXiv:2404.12388  [pdf, other]

    cs.CV

    VideoGigaGAN: Towards Detail-rich Video Super-Resolution

    Authors: Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu

    Abstract: Video super-resolution (VSR) approaches have shown impressive temporal consistency in upsampled videos. However, these approaches tend to generate blurrier results than their image counterparts as they are limited in their generative capability. This raises a fundamental question: can we extend the success of a generative image upsampler to the VSR task while preserving the temporal consistency? W…

    Submitted 1 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: project page: https://videogigagan.github.io/

  36. arXiv:2404.12186  [pdf, other]

    cs.LG cs.CR

    Privacy-Preserving UCB Decision Process Verification via zk-SNARKs

    Authors: Xikun Jiang, He Lyu, Chenhao Ying, Yibin Xu, Boris Düdder, Yuan Luo

    Abstract: With the increasingly widespread application of machine learning, how to strike a balance between protecting the privacy of data and algorithm parameters and ensuring the verifiability of machine learning has always been a challenge. This study explores the intersection of reinforcement learning and data privacy, specifically addressing the Multi-Armed Bandit (MAB) problem with the Upper Confidenc…

    Submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2404.11475  [pdf, other]

    cs.CV cs.AI

    AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters

    Authors: Hao-Wei Chen, Yu-Syuan Xu, Kelvin C. K. Chan, Hsien-Kai Kuo, Chun-Yi Lee, Ming-Hsuan Yang

    Abstract: Existing image restoration approaches typically employ extensive networks specifically trained for designated degradations. Despite being effective, such methods inevitably entail considerable storage costs and computational overheads due to the reliance on task-specific networks. In this work, we go beyond this well-established framework and exploit the inherent commonalities among image restorat…

    Submitted 17 April, 2024; originally announced April 2024.

  38. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  39. arXiv:2404.10490  [pdf, other]

    cs.CV

    Enhancing Sign Language Teaching: A Mixed Reality Approach for Immersive Learning and Multi-Dimensional Feedback

    Authors: Hongli Wen, Yang Xu, Lin Li, Xudong Ru, Xingce Wang, Zhongke Wu

    Abstract: Traditional sign language teaching methods face challenges such as limited feedback and diverse learning scenarios. Although 2D resources lack real-time feedback, classroom teaching is constrained by a scarcity of teachers. Methods based on VR and AR have relatively primitive interaction feedback mechanisms. This study proposes an innovative teaching model that uses real-time monocular vision and m…

    Submitted 6 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures

  40. arXiv:2404.10441  [pdf, other]

    cs.CV

    1st Place Solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction

    Authors: Hang Du, Yaping Xue, Weidong Dai, Xuejun Yan, Jingjing Wang

    Abstract: In this report, we present the 1st place solution for ICCV 2023 OmniObject3D Challenge: Sparse-View Reconstruction. The challenge aims to evaluate approaches for novel view synthesis and surface reconstruction using only a few posed images of each object. We utilize Pixel-NeRF as the basic model, and apply depth supervision as well as coarse-to-fine positional encoding. The experiments demonstrate…

    Submitted 16 April, 2024; originally announced April 2024.

  41. arXiv:2404.10383  [pdf, other]

    cs.CV

    Learning to Score Sign Language with Two-stage Method

    Authors: Hongli Wen, Yang Xu

    Abstract: Human action recognition and performance assessment have been hot research topics in recent years. Recognition problems have mature solutions in the field of sign language, but past research in performance analysis has focused on competitive sports and medical training, overlooking the scoring assessment, which is an important part of sign language teaching digitalization. In this paper, we analyz…

    Submitted 16 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 9 pages, 7 figures

  42. arXiv:2404.10260  [pdf, other]

    q-bio.BM cs.AI

    HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights

    Authors: Xiaomin Fang, Jie Gao, Jing Hu, Lihang Liu, Yang Xue, Xiaonan Zhang, Kunrui Zhu

    Abstract: While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Limited by the accuracy of complex predictio…

    Submitted 15 April, 2024; originally announced April 2024.

  43. arXiv:2404.10253  [pdf, other]

    cs.DC

    Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

    Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, et al. (16 additional authors not shown)

    Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is urgently needed to improve model resolution and reduce modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries t…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 18 pages, 13 figures

  44. arXiv:2404.10229  [pdf, other]

    cs.CL

    Generative Text Steganography with Large Language Model

    Authors: Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng

    Abstract: Recent advances in large language models (LLMs) have blurred the boundary of high-quality text generation between humans and machines, which is favorable for generative text steganography. However, current advanced steganographic mapping is not suitable for LLMs, since most users are restricted to accessing only the black-box API or user interface of the LLMs, thereby lacking access to the training v…

    Submitted 15 April, 2024; originally announced April 2024.

  45. arXiv:2404.09512  [pdf, other]

    cs.CV

    Magic Clothing: Controllable Garment-Driven Image Synthesis

    Authors: Weifeng Chen, Tao Gu, Yuhao Xu, Chengcai Chen

    Abstract: We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for an unexplored garment-driven image synthesis task. When generating customized characters wearing the target garments under diverse text prompts, image controllability is the most critical issue, i.e., preserving the garment details while maintaining faithfulness to the text prompts. To this end, we introdu…

    Submitted 15 April, 2024; originally announced April 2024.

  46. arXiv:2404.09496  [pdf, other]

    cs.CV

    Towards Collaborative Autonomous Driving: Simulation Platform and End-to-End System

    Authors: Genjia Liu, Yue Hu, Chenxin Xu, Weibo Mao, Junhao Ge, Zhengxiang Huang, Yifan Lu, Yinda Xu, Junkai Xia, Yafei Wang, Siheng Chen

    Abstract: Vehicle-to-everything-aided autonomous driving (V2X-AD) has huge potential to provide a safer driving solution. Despite extensive research in transportation and communication to support V2X-AD, the actual utilization of these infrastructures and communication resources in enhancing driving performance remains largely unexplored. This highlights the necessity of collaborative autonomous drivin…

    Submitted 15 April, 2024; originally announced April 2024.

  47. arXiv:2404.09492  [pdf, other]

    cs.CL

    Bridging the Gap between Different Vocabularies for LLM Ensemble

    Authors: Yangyifan Xu, Jinliang Lu, Jiajun Zhang

    Abstract: Ensembling different large language models (LLMs) to unleash their complementary potential and harness their individual strengths is highly valuable. Nevertheless, vocabulary discrepancies among various LLMs have constrained previous studies to either selecting or blending completely generated outputs. This limitation hinders the dynamic correction and enhancement of outputs during the generation…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to the main conference of NAACL 2024

  48. arXiv:2404.09468  [pdf, other]

    cs.AI

    MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion

    Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Huajun Chen, Wen Zhang

    Abstract: Multi-modal knowledge graphs (MMKG) store structured world knowledge containing rich multi-modal descriptive information. To overcome their inherent incompleteness, multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given MMKGs, leveraging both structural information from the triples and multi-modal information of the entities. Existing MMKGC methods usually…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Work in progress; Repo is available at https://github.com/zjukg/MyGO

  49. arXiv:2404.09243  [pdf, other]

    cs.LG cs.NE

    LSROM: Learning Self-Refined Organizing Map for Fast Imbalanced Streaming Data Clustering

    Authors: Yongqi Xu, Yujian Lee, Rong Zou, Yiqun Zhang, Yiu-Ming Cheung

    Abstract: Streaming data clustering is a popular research topic in the fields of data mining and machine learning. Compared to static data, streaming data, which is usually analyzed in data chunks, is more susceptible to dynamic cluster imbalance. That is, the degree of cluster imbalance varies across streaming data chunks, leading to corruption in either the accuracy or the…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures

  50. arXiv:2404.08347  [pdf, other]

    cs.CV cs.LG

    Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

    Authors: Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang

    Abstract: Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approa…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 17 pages, 6 figures