Showing 1–50 of 786 results for author: Zhao, S

Searching in archive cs.
  1. arXiv:2405.04932 [pdf, other]

    cs.NI

    FIGRET: Fine-Grained Robustness-Enhanced Traffic Engineering

    Authors: Ximeng Liu, Shizhen Zhao, Yong Cui

    Abstract: Traffic Engineering (TE) is critical for improving network performance and reliability. A key challenge in TE is the management of sudden traffic bursts. Existing TE schemes often struggle to accurately determine the extent of focus required for these surges, thereby facing difficulties in achieving a balance between performance under normal and peak traffic conditions. To address this issue, we i…

    Submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2405.04812 [pdf, other]

    cs.RO cs.CV

    General Place Recognition Survey: Towards Real-World Autonomy

    Authors: Peng Yin, Jianhao Jiao, Shiqi Zhao, Lingyun Xu, Guoquan Huang, Howie Choset, Sebastian Scherer, Jianda Han

    Abstract: In the realm of robotics, the quest for achieving real-world autonomy, capable of executing large-scale and long-term operations, has positioned place recognition (PR) as a cornerstone technology. Despite the PR community's remarkable strides over the past two decades, garnering attention from fields like computer vision and robotics, the development of PR methods that sufficiently support real-wo…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 20 pages, 12 figures, under review

  3. arXiv:2405.04101 [pdf, other]

    cs.LG cs.AI

    Continual Learning in the Presence of Repetition

    Authors: Hamed Hemati, Lorenzo Pellegrini, Xiaotian Duan, Zixuan Zhao, Fangfang Xia, Marc Masana, Benedikt Tscheschner, Eduardo Veas, Yuxiang Zheng, Shiji Zhao, Shao-Yuan Li, Sheng-Jun Huang, Vincenzo Lomonaco, Gido M. van de Ven

    Abstract: Continual learning (CL) provides a framework for training models in ever-evolving environments. Although re-occurrence of previously seen objects or tasks is common in real-world problems, the concept of repetition in the data stream is not often considered in standard benchmarks for CL. Unlike with the rehearsal mechanism in buffer-based strategies, where sample repetition is controlled by the st…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Preprint; Challenge Report of the 4th Workshop on Continual Learning in Computer Vision at CVPR

  4. arXiv:2405.04007 [pdf, other]

    cs.CV

    SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

    Authors: Yuying Ge, Sijie Zhao, Chen Li, Yixiao Ge, Ying Shan

    Abstract: In this technical report, we introduce SEED-Data-Edit: a unique hybrid dataset for instruction-guided image editing, which aims to facilitate image manipulation using open-form language. SEED-Data-Edit is composed of three distinct types of data: (1) High-quality editing data produced by an automated pipeline, ensuring a substantial volume of diverse image editing pairs. (2) Real-world scenario da…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Technical Report; Dataset released in https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit

  5. arXiv:2405.03879 [pdf, other]

    stat.ML cs.LG q-bio.GN stat.AP

    Scalable Amortized GPLVMs for Single Cell Transcriptomics Data

    Authors: Sarah Zhao, Aditya Ravuri, Vidhi Lalchand, Neil D. Lawrence

    Abstract: Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with sp…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.03565 [pdf, other]

    cs.CV

    Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing

    Authors: Han Liu, Siyang Zhao, Xiaotong Zhang, Feng Zhang, Wei Wang, Fenglong Ma, Hongyang Chen, Hong Yu, Xianchao Zhang

    Abstract: Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all. While prevailing methods have shown promising performance via transferring knowledge from seen classes to unseen classes, they are still limited by (1) Inherent dissimilarities among classes make the transformation of features learned from seen classes t…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to AAAI 2024

  7. arXiv:2405.00749 [pdf, other]

    cs.CV cs.LG

    More is Better: Deep Domain Adaptation with Multiple Sources

    Authors: Sicheng Zhao, Hui Chen, Hu Huang, Pengfei Xu, Guiguang Ding

    Abstract: In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to d…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024. arXiv admin note: text overlap with arXiv:2002.12169

  8. arXiv:2405.00700 [pdf]

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr…

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  9. arXiv:2405.00622 [pdf, other]

    cs.CL cs.AI cs.LG

    Causal Evaluation of Language Models

    Authors: Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

    Abstract: Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive ben…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 315 pages, 230 figures, 21 tables. Project website: https://opencausalab.github.io/CaLM

  10. arXiv:2404.19534 [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  11. arXiv:2404.19449 [pdf, other]

    cs.IT

    AoI-aware Sensing Scheduling and Trajectory Optimization for Multi-UAV-assisted Wireless Backscatter Networks

    Authors: Yusi Long, Songhan Zhao, Shimin Gong, Bo Gu, Dusit Niyato, Xuemin Shen

    Abstract: This paper considers multiple unmanned aerial vehicles (UAVs) to assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs and then forwards the sensing data to the remote BS. The GUs first backscatter their data to the UAVs and then all UAVs forward data to the BS by the nonorthogonal multiple access (NOMA) transmissio…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE TVT

  12. arXiv:2404.17546 [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

    Authors: Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse

    Abstract: Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference probl…

    Submitted 26 April, 2024; originally announced April 2024.

  13. arXiv:2404.17433 [pdf, other]

    cs.CV

    PromptCIR: Blind Compressed Image Restoration with Prompt Learning

    Authors: Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

    Abstract: Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often seek assistance from a quality factor prediction network to facilitate their network to restore compressed images. However, the predicted numerical…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Winner of NTIRE 2024 Blind Compressed Image Enhancement Challenge

  14. arXiv:2404.16825 [pdf, other]

    cs.CV eess.IV

    ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

    Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

    Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content…

    Submitted 25 April, 2024; originally announced April 2024.

  15. arXiv:2404.16687 [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  16. arXiv:2404.16493 [pdf, other]

    cs.CV

    Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

    Authors: Hai Wu, Shijia Zhao, Xun Huang, Chenglu Wen, Xin Li, Cheng Wang

    Abstract: The prevalent approaches of unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training processes. However, the challenge arises due to the sparsity of LiDAR scans, which leads to pseudo-labels with erroneous size and position, resulting in subpar detection performance. To tackle this problem, this paper introduces a Commonsense Prototype-based Detecto…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  17. arXiv:2404.14396 [pdf, other]

    cs.CV

    SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

    Authors: Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan

    Abstract: The rapid evolution of multimodal foundation model has demonstrated significant progresses in vision-language understanding and generation, e.g., our previous work SEED-LLaMA. However, there remains a gap between its capability and the real-world applicability, primarily due to the model's limited capacity to effectively respond to various user instructions and interact with diverse visual data. I…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Project released at: https://github.com/AILab-CVC/SEED-X

  18. arXiv:2404.14248 [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  19. arXiv:2404.13207 [pdf, other]

    cs.IR cs.LG

    STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

    Authors: Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec

    Abstract: Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 25 pages, 7 figures

  20. arXiv:2404.11605 [pdf, other]

    cs.CV cs.AI cs.RO

    VG4D: Vision-Language Model Goes 4D Video Recognition

    Authors: Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu

    Abstract: Understanding the real world through point cloud video is a crucial aspect of robotics and autonomous driving systems. However, prevailing methods for 4D point cloud recognition have limitations due to sensor resolution, which leads to a lack of detailed information. Recent advances have shown that Vision-Language Models (VLM) pre-trained on web-scale text-image datasets can learn fine-grained vis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: ICRA 2024

  21. arXiv:2404.10343 [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  22. arXiv:2404.10321 [pdf, other]

    cs.IR

    Cluster-based Graph Collaborative Filtering

    Authors: Fan Liu, Shuai Zhao, Zhiyong Cheng, Liqiang Nie, Mohan Kankanhalli

    Abstract: Graph Convolution Networks (GCNs) have significantly succeeded in learning user and item representations for recommendation systems. The core of their efficacy is the ability to explicitly exploit the collaborative signals from both the first- and high-order neighboring nodes. However, most existing GCN-based methods overlook the multiple interests of users while performing high-order graph convol…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 22 pages, 8 figures

    ACM Class: H.3.3

  23. arXiv:2404.09790 [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution (×4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution (×4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  24. arXiv:2404.09529 [pdf, other]

    cs.LG cs.AI cs.CL

    Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

    Authors: Siyan Zhao, Daniel Israel, Guy Van den Broeck, Aditya Grover

    Abstract: During inference for transformer-based large language models (LLM), prefilling is the computation of the key-value (KV) cache for input tokens in the prompt prior to autoregressive generation. For longer input prompt lengths, prefilling will incur a significant overhead on decoding time. In this work, we highlight the following pitfall of prefilling: for batches containing high-varying prompt leng…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 18 pages, code in https://github.com/siyan-zhao/prepacking

  25. arXiv:2404.07972 [pdf, other]

    cs.AI cs.CL

    OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    Authors: Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

    Abstract: Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 51 pages, 21 figures

  26. arXiv:2404.06777 [pdf, other]

    cs.NI

    Responsible Federated Learning in Smart Transportation: Outlooks and Challenges

    Authors: Xiaowen Huang, Tao Huang, Shushi Gu, Shuguang Zhao, Guanglin Zhang

    Abstract: Integrating artificial intelligence (AI) and federated learning (FL) in smart transportation has raised critical issues regarding their responsible use. Ensuring responsible AI is paramount for the stability and sustainability of intelligent transportation systems. Despite its importance, research on the responsible application of AI and FL in this domain remains nascent, with a paucity of in-dept…

    Submitted 10 April, 2024; originally announced April 2024.

  27. arXiv:2404.06690 [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge in the field. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speak…

    Submitted 9 April, 2024; originally announced April 2024.

  28. arXiv:2404.05145 [pdf, other]

    cs.CV

    UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather

    Authors: Haimei Zhao, Jing Zhang, Zhuo Chen, Shanshan Zhao, Dacheng Tao

    Abstract: LiDAR semantic segmentation (LSS) is a critical task in autonomous driving and has achieved promising progress. However, prior LSS methods are conventionally investigated and evaluated on datasets within the same domain in clear weather. The robustness of LSS models in unseen scenes and all weather conditions is crucial for ensuring safety and reliability in real applications. To this end, we prop…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  29. arXiv:2404.04818 [pdf, other]

    cs.AI cs.CL cs.CV

    DWE+: Dual-Way Matching Enhanced Framework for Multimodal Entity Linking

    Authors: Shezheng Song, Shasha Li, Shan Zhao, Xiaopeng Li, Chengyu Wang, Jie Yu, Jun Ma, Tianwei Yan, Bin Ji, Xiaoguang Mao

    Abstract: Multimodal entity linking (MEL) aims to utilize multimodal information (usually textual and visual information) to link ambiguous mentions to unambiguous entities in knowledge base. Current methods facing main issues: (1) treating the entire image as input may contain redundant information. (2) the insufficient utilization of entity-related information, such as attributes in images. (3) semantic inco…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: under review on TOIS. arXiv admin note: substantial text overlap with arXiv:2312.11816

  30. arXiv:2404.03204 [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

    Authors: Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

    Abstract: We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. Th…

    Submitted 6 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  31. arXiv:2404.02806 [pdf, other]

    cs.SE cs.AI cs.HC

    The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

    Authors: Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag

    Abstract: Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests. As LLMs are increasingly used as programmer assistants, we study whether gains on existing benchmarks translate to gains in programmer productivity when coding with LLMs, including…

    Submitted 3 April, 2024; originally announced April 2024.

  32. arXiv:2404.02668 [pdf, other]

    cs.CV

    RS-Mamba for Large Remote Sensing Image Dense Prediction

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: Context modeling is critical for remote sensing image dense prediction tasks. Nowadays, the growing size of very-high-resolution (VHR) remote sensing images poses challenges in effectively modeling context. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional…

    Submitted 10 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

  33. arXiv:2404.00386 [pdf, other]

    cs.CL

    Jetsons at FinNLP 2024: Towards Understanding the ESG Impact of a News Article using Transformer-based Models

    Authors: Parag Pravin Dakle, Alolika Gon, Sihan Zha, Liang Wang, SaiKrishna Rallabandi, Preethi Raghavan

    Abstract: In this paper, we describe the different approaches explored by the Jetsons team for the Multi-Lingual ESG Impact Duration Inference (ML-ESG-3) shared task. The shared task focuses on predicting the duration and type of the ESG impact of a news article. The shared task dataset consists of 2,059 news titles and articles in English, French, Korean, and Japanese languages. For the impact duration cla…

    Submitted 30 March, 2024; originally announced April 2024.

  34. arXiv:2403.19318 [pdf, other]

    cs.CL

    TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

    Authors: Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-Li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang

    Abstract: We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning process extension strategy, aiding in training LLMs to un…

    Submitted 1 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: https://tablellm.github.io/

  35. arXiv:2403.18300 [pdf, other]

    cs.CR cs.DC

    HotStuff-2 vs. HotStuff: The Difference and Advantage

    Authors: Siyuan Zhao, Yanqi Wu, Zheng Wang

    Abstract: Byzantine consensus protocols are essential in blockchain technology. The widely recognized HotStuff protocol uses cryptographic measures for efficient view changes and reduced communication complexity. Recently, the main authors of HotStuff introduced an advanced iteration named HotStuff-2. This paper aims to compare the principles and analyze the effectiveness of both protocols, hoping to depict…

    Submitted 27 March, 2024; originally announced March 2024.

  36. arXiv:2403.17373 [pdf, other]

    cs.CV cs.AI cs.LG

    AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving

    Authors: Mingfu Liang, Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Shiyu Zhao, Ying Wu, Manmohan Chandraker

    Abstract: Autonomous vehicle (AV) systems rely on robust perception models as a cornerstone of safety assurance. However, objects encountered on the road exhibit a long-tailed distribution, with rare or unseen categories posing challenges to a deployed perception model. This necessitates an expensive process of continuously curating and annotating data with significant human effort. We propose to leverage r…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR-2024

  37. arXiv:2403.17006 [pdf, other]

    cs.CV

    Invertible Diffusion Models for Compressed Sensing

    Authors: Bin Chen, Zhenyu Zhang, Weiqi Li, Chen Zhao, Jiwen Yu, Shijie Zhao, Jie Chen, Jian Zhang

    Abstract: While deep neural networks (NN) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch constrains their effectiveness and hampers rapid deployment. Although recent methods utilize pre-trained diffusion models for image reconstruction, they struggle with slow inference and restricted adaptability to CS. To tackl…

    Submitted 25 March, 2024; originally announced March 2024.

  38. Domain Adaptive Detection of MAVs: A Benchmark and Noise Suppression Network

    Authors: Yin Zhang, Jinhong Deng, Peidong Liu, Wen Li, Shiyu Zhao

    Abstract: Visual detection of Micro Air Vehicles (MAVs) has attracted increasing attention in recent years due to its important application in various tasks. The existing methods for MAV detection assume that the training set and testing set have the same distribution. As a result, when deployed in new domains, the detectors would have a significant performance degradation due to domain discrepancy. In this…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 17 pages, 11 figures. Accepted by IEEE Transactions on Automation Science and Engineering

    Journal ref: IEEE Transactions on Automation Science and Engineering, 2024

  39. arXiv:2403.16159 [pdf, other]

    cs.HC

    Designing Child-Centric AI Learning Environments: Insights from LLM-Enhanced Creative Project-Based Learning

    Authors: Siyu Zha, Yuehan Qiao, Qingyu Hu, Zhongsheng Li, Jiangtao Gong, Yingqing Xu

    Abstract: Project-based learning (PBL) is an instructional method that is very helpful in nurturing students' creativity, but it requires significant time and energy from both students and teachers. Large language models (LLMs) have been proven to assist in creative tasks, yet much controversy exists regarding their role in fostering creativity. This paper explores the potential of LLMs in PBL settings, wit…

    Submitted 5 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  40. arXiv:2403.15740 [pdf, other]

    cs.CL cs.CR cs.IR cs.LG

    Ghost Sentence: A Tool for Everyday Users to Copyright Data from Large Language Models

    Authors: Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang

    Abstract: Web user data plays a central role in the ecosystem of pre-trained large language models (LLMs) and their fine-tuned variants. Billions of data are crawled from the web and fed to LLMs. How can everyday web users confirm if LLMs misuse their data without permission? In this work, we suggest that users repeatedly insert personal passphrases into their documents, enabling LLMs to m…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Preprint, work in progress

  41. arXiv:2403.13233

    cs.CL

    Technical Report: Competition Solution For BetterMixture

    Authors: Shuaijiang Zhao, Xiaoquan Fang

    Abstract: In the era of flourishing large-scale models, the challenge of selecting and optimizing datasets from the vast and complex sea of data, to enhance the performance of large language models within the constraints of limited computational resources, has become paramount. This paper details our solution for the BetterMixture challenge, which focuses on the fine-tuning data mixing for large language mo…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 6 pages

  42. arXiv:2403.12473

    cs.CV

    PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery

    Authors: Wendi Yang, Zihang Jiang, Shang Zhao, S. Kevin Zhou

    Abstract: With the recent advancements in single-image-based human mesh recovery, there is a growing interest in enhancing its performance in certain extreme scenarios, such as occlusion, while maintaining overall model accuracy. Although obtaining accurately annotated 3D human poses under occlusion is challenging, there is still a wealth of rich and precise 2D pose annotations that can be leveraged. Howeve…

    Submitted 19 March, 2024; originally announced March 2024.

  43. arXiv:2403.12327

    cs.CV cs.LG

    GT-Rain Single Image Deraining Challenge Report

    Authors: Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Luping Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng

    Abstract: This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained o…

    Submitted 18 March, 2024; originally announced March 2024.

  44. arXiv:2403.12035

    cs.CV

    CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

    Authors: Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang

    Abstract: Recent advancements in video generation have been remarkable, yet many existing methods struggle with issues of consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, a stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achie…

    Submitted 18 March, 2024; originally announced March 2024.

  45. arXiv:2403.11607

    cs.RO

    AGRNav: Efficient and Energy-Saving Autonomous Navigation for Air-Ground Robots in Occlusion-Prone Environments

    Authors: Junming Wang, Zekai Sun, Xiuxian Guan, Tianxiang Shen, Zongyuan Zhang, Tianyang Duan, Dong Huang, Shixiong Zhao, Heming Cui

    Abstract: The exceptional mobility and long endurance of air-ground robots are raising interest in their usage to navigate complex environments (e.g., forests and large buildings). However, such environments often contain occluded and unknown regions, and without accurate prediction of unobserved obstacles, the movement of the air-ground robot often suffers a suboptimal trajectory under existing mapping-bas…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to ICRA 2024

  46. arXiv:2403.11373

    cs.CV

    Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration

    Authors: Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu

    Abstract: Pre-trained large multi-modal models (LMMs) exploit fine-tuning to adapt diverse user applications. Nevertheless, fine-tuning may face challenges due to deactivated sensors (e.g., cameras turned off for privacy or technical issues), yielding modality-incomplete data and leading to inconsistency in training data and the data for inference. Additionally, continuous training leads to catastrophic for…

    Submitted 17 March, 2024; originally announced March 2024.

  47. arXiv:2403.11113

    cs.CV

    Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis

    Authors: Yiyang Chen, Lunhao Duan, Shanshan Zhao, Changxing Ding, Dacheng Tao

    Abstract: Rotation invariance is an important requirement for point shape analysis. To achieve this, current state-of-the-art methods attempt to construct the local rotation-invariant representation through learning or defining the local reference frame (LRF). Although efficient, these LRF-based methods suffer from perturbation of local geometric relations, resulting in suboptimal local rotation invariance.…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  48. arXiv:2403.10805

    cs.SD cs.AI cs.CV cs.GR cs.HC eess.AS

    Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference

    Authors: Fan Zhang, Zhaohan Wang, Xin Lyu, Siyuan Zhao, Mengjian Li, Weidong Geng, Naye Ji, Hui Du, Fuxing Gao, Hao Wu, Shunman Li

    Abstract: Speech-driven gesture generation is an emerging field within virtual human creation. However, a significant challenge lies in accurately determining and processing the multitude of input features (such as acoustic, semantic, emotional, personality, and even subtle unknown features). Traditional approaches, reliant on various explicit feature inputs and complex multimodal processing, constrain the…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 12 pages

  49. arXiv:2403.09551

    cs.CV

    WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity

    Authors: Qiyuan Wang, Yanzhe Liu, Shang Zhao, Rong Liu, S. Kevin Zhou

    Abstract: Weakly supervised surgical instrument segmentation with only instrument presence labels has been rarely explored in surgical domain. To mitigate the highly under-constrained challenges, we extend a two-stage weakly supervised segmentation paradigm with temporal attributes from two perspectives. From a temporal equivariance perspective, we propose a prototype-based temporal equivariance regulation…

    Submitted 14 March, 2024; originally announced March 2024.

  50. arXiv:2403.08414

    cs.LG cs.AI

    Causal Graph Neural Networks for Wildfire Danger Prediction

    Authors: Shan Zhao, Ioannis Prapas, Ilektra Karasante, Zhitong Xiong, Ioannis Papoutsis, Gustau Camps-Valls, Xiao Xiang Zhu

    Abstract: Wildfire forecasting is notoriously hard due to the complex interplay of different factors such as weather conditions, vegetation types and human activities. Deep learning models show promise in dealing with this complexity by learning directly from data. However, to inform critical decision making, we argue that we need models that are right for the right reasons; that is, the implicit rules lear…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024 Machine Learning for Remote Sensing (ML4RS) Workshop