Showing 1–50 of 1,032 results for author: Zhou, S

Searching in archive cs.
  1. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  2. arXiv:2405.03905  [pdf, other]

    cs.AR cs.CV cs.SD eess.AS

    A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

    Authors: Qinyu Chen, Kwantae Kim, Chang Gao, Sheng Zhou, Taekwang Jang, Tobi Delbruck, Shih-Chii Liu

    Abstract: This paper introduces, to the best of the authors' knowledge, the first fine-grained temporal sparsity-aware keyword spotting (KWS) IC leveraging temporal similarities between neighboring feature vectors extracted from input frames and network hidden states, eliminating unnecessary operations and memory accesses. This KWS IC, featuring a bio-inspired delta-gated recurrent neural network (ΔRNN) cla…

    Submitted 6 May, 2024; originally announced May 2024.

  3. arXiv:2405.02068  [pdf, other]

    cs.CV

    Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

    Authors: Canhui Tang, Sanping Zhou, Yizhe Li, Yonghao Dong, Le Wang

    Abstract: With the wide application of knowledge distillation between an ImageNet pre-trained teacher model and a learnable student model, industrial anomaly detection has witnessed a significant achievement in the past few years. The success of knowledge distillation mainly relies on how to keep the feature discrepancy between the teacher and student model, in which it assumes that: (1) the teacher model c…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: The paper is under review

  4. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  5. arXiv:2404.19146  [pdf, other]

    cs.AI cs.IR

    Automated Construction of Theme-specific Knowledge Graphs

    Authors: Linyi Ding, Sizhe Zhou, Jinfeng Xiao, Jiawei Han

    Abstract: Despite widespread applications of knowledge graphs (KGs) in various tasks such as question answering and intelligent conversational systems, existing KGs face two major challenges: information granularity and deficiency in timeliness. These hinder considerably the retrieval and analysis of in-context, fine-grained, and up-to-date knowledge from KGs, particularly in highly specialized themes (e.g.…

    Submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.17890  [pdf, other]

    eess.IV cs.AI cs.CV

    DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction

    Authors: Chenhe Du, Xiyue Lin, Qing Wu, Xuanyu Tian, Ying Su, Zhe Luo, Hongjiang Wei, S. Kevin Zhou, Jingyi Yu, Yuyao Zhang

    Abstract: Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging recon…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages, 10 figures

    ACM Class: I.2.10; I.4.5

  7. arXiv:2404.16619  [pdf, other]

    cs.SD eess.AS

    The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

    Authors: Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu

    Abstract: This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by THU-HCSI team for LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. For further improving speaker similarity and speech quality, we introduce speaker-aware text encoder…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted in Grand Challenge of ICASSP 2024

  8. arXiv:2404.16416  [pdf, other]

    cs.CV

    Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition

    Authors: Yu Wang, Sanping Zhou, Kun Xia, Le Wang

    Abstract: Semi-supervised action recognition aims to improve spatio-temporal reasoning ability with a few labeled data in conjunction with a large amount of unlabeled data. Despite recent advancements, existing powerful methods are still prone to making ambiguous predictions under scarce labeled data, embodied as the limitation of distinguishing different actions with similar spatio-temporal information. In…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures, 6 tables, 56 conferences

    MSC Class: 68U10; 68T45 ACM Class: I.2.10

  9. arXiv:2404.16366  [pdf, other]

    cs.LG cs.AI

    Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection

    Authors: Yuanchen Bei, Sheng Zhou, Jinke Shi, Yao Ma, Haishuai Wang, Jiajun Bu

    Abstract: Unsupervised graph anomaly detection aims at identifying rare patterns that deviate from the majority in a graph without the aid of labels, which is important for a variety of real-world applications. Recent advances have utilized Graph Neural Networks (GNNs) to learn effective node representations by aggregating information from neighborhoods. This is motivated by the hypothesis that nodes in the…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures

  10. arXiv:2404.16233  [pdf, other]

    cs.LG cs.AI

    AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

    Authors: Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

    Abstract: AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite…

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at AutoML 2024 Conference

  11. arXiv:2404.14546  [pdf, other]

    cs.RO

    Closing the Perception-Action Loop for Semantically Safe Navigation in Semi-Static Environments

    Authors: Jingxing Qian, Siqi Zhou, Nicholas Jianrui Ren, Veronica Chatrath, Angela P. Schoellig

    Abstract: Autonomous robots navigating in changing environments demand adaptive navigation strategies for safe long-term operation. While many modern control paradigms offer theoretical guarantees, they often assume known extrinsic safety constraints, overlooking challenges when deployed in real-world environments where objects can appear, disappear, and shift over time. In this paper, we present a closed-l…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Manuscript accepted to ICRA 2024

  12. arXiv:2404.12377  [pdf, other]

    cs.RO

    RoboDreamer: Learning Compositional World Models for Robot Imagination

    Authors: Siyuan Zhou, Yilun Du, Jiaben Chen, Yandong Li, Dit-Yan Yeung, Chuang Gan

    Abstract: Text-to-video models have demonstrated substantial potential in robotic decision-making, enabling the imagination of realistic plans of future actions as well as accurate environment simulation. However, one major issue in such models is generalization -- models are limited to synthesizing videos subject to language instructions similar to those seen at training time. This is heavily limiting in d…

    Submitted 18 April, 2024; originally announced April 2024.

  13. arXiv:2404.12008  [pdf, other]

    cs.IR cs.AI

    How Do Recommendation Models Amplify Popularity Bias? An Analysis from the Spectral Perspective

    Authors: Siyi Lin, Chongming Gao, Jiawei Chen, Sheng Zhou, Binbin Hu, Can Wang

    Abstract: Recommendation Systems (RS) are often plagued by popularity bias. Specifically, when recommendation models are trained on long-tailed datasets, they not only inherit this bias but often exacerbate it. This effect undermines both the precision and fairness of RS and catalyzes the so-called Matthew Effect. Despite the wide recognition of this issue, the fundamental causes remain largely elusive. In…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 23 pages, 7 figures

  14. SIGformer: Sign-aware Graph Transformer for Recommendation

    Authors: Sirui Chen, Jiawei Chen, Sheng Zhou, Bohao Wang, Shen Han, Chanfei Su, Yuqing Yuan, Can Wang

    Abstract: In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process pos…

    Submitted 6 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR2024

  15. arXiv:2404.10675  [pdf, other]

    cs.RO

    SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation

    Authors: Chang Chen, Yuecheng Liu, Yuzheng Zhuang, Sitong Mao, Kaiwen Xue, Shunbo Zhou

    Abstract: Although visual navigation has been extensively studied using deep reinforcement learning, online learning for real-world robots remains a challenging task. Recent work learned directly from offline datasets to achieve broader generalization in real-world tasks, which, however, faces the out-of-distribution (OOD) issue and potential robot localization failures in a given map for unseen observat…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 7 pages, 5 figures, 2024 IEEE International Conference on Robotics and Automation

  16. arXiv:2404.10499  [pdf, other]

    cs.CV cs.AI

    Robust Noisy Label Learning via Two-Stream Sample Distillation

    Authors: Sihan Bai, Sanping Zhou, Zheng Qin, Le Wang, Nanning Zheng

    Abstract: Noisy label learning aims to learn robust networks under the supervision of noisy labels, which plays a critical role in deep learning. Existing work either conducts sample selection or label correction to deal with noisy labels during the model training process. In this paper, we design a simple yet effective sample selection framework, termed Two-Stream Sample Distillation (TSSD), for noisy labe…

    Submitted 16 April, 2024; originally announced April 2024.

  17. arXiv:2404.10378  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated by several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  18. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  19. arXiv:2404.10202  [pdf, other]

    cs.LG cs.AI

    Towards a Novel Perspective on Adversarial Examples Driven by Frequency

    Authors: Zhun Zhang, Yi Zeng, Qihe Liu, Shijie Zhou

    Abstract: Enhancing our understanding of adversarial examples is crucial for the secure application of machine learning models in real-world scenarios. A prevalent method for analyzing adversarial examples is through a frequency-based approach. However, existing research indicates that attacks designed to exploit low-frequency or high-frequency information can enhance attack performance, leading to an uncle…

    Submitted 15 April, 2024; originally announced April 2024.

  20. arXiv:2404.10201  [pdf, other]

    cs.DS cs.CR cs.IT cs.LG

    Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

    Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar, Samson Zhou

    Abstract: We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in\mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2,d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each use…

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Fixed author ordering

  21. arXiv:2404.09170  [pdf, other]

    cs.CL

    Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models

    Authors: Xiaoshu Chen, Sihang Zhou, Ke Liang, Xinwang Liu

    Abstract: Chain of thought finetuning aims to endow small student models with reasoning capacity to improve their performance towards a specific task by allowing them to imitate the reasoning procedure of large language models (LLMs) beyond simply predicting the answer to the question. However, the existing methods 1) generate rationale before the answer, making their answer correctness sensitive to the hal…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  22. arXiv:2404.08978  [pdf, other]

    cs.LG cs.AI

    Incremental Residual Concept Bottleneck Models

    Authors: Chenming Shang, Shiji Zhou, Hengyuan Zhang, Xinzhe Ni, Yujiu Yang, Yuwang Wang

    Abstract: Concept Bottleneck Models (CBMs) map the black-box visual representations extracted by deep neural networks onto a set of interpretable concepts and use the concepts to make predictions, enhancing the transparency of the decision-making process. Multimodal pre-trained models can match visual representations with textual concept embeddings, allowing for obtaining the interpretable concept bottlenec…

    Submitted 17 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  23. arXiv:2404.07972  [pdf, other]

    cs.AI cs.CL

    OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    Authors: Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

    Abstract: Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 51 pages, 21 figures

  24. arXiv:2404.07707  [pdf, ps, other]

    cs.GT

    Tree Splitting Based Rounding Scheme for Weighted Proportional Allocations with Subsidy

    Authors: Xiaowei Wu, Shengwei Zhou

    Abstract: We consider the problem of allocating $m$ indivisible items to a set of $n$ heterogeneous agents, aiming at computing a proportional allocation by introducing subsidy (money). It has been shown by Wu et al. (WINE 2023) that when agents are unweighted a total subsidy of $n/4$ suffices (assuming that each item has value/cost at most $1$ to every agent) to ensure proportionality. When agents have gen…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 30 pages, 11 figures

  25. arXiv:2404.07495  [pdf, other]

    cs.CV

    PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds

    Authors: Weisheng Xu, Sifan Zhou, Zhihang Yuan

    Abstract: LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving. It aims to obtain accurate 3D BBox from the search area based on similarity or motion. However, existing 3D SOT methods usually follow the point-based pipeline, where the sampling operation inevitably leads to redundant or lost information, resulting in unexpected performance. To address these is…

    Submitted 11 April, 2024; originally announced April 2024.

  26. arXiv:2404.06903  [pdf, other]

    cs.CV cs.AI

    DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

    Authors: Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi

    Abstract: The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement…

    Submitted 10 April, 2024; originally announced April 2024.

  27. arXiv:2404.05960  [pdf, other]

    cs.CV

    EasyTrack: Efficient and Compact One-stream 3D Point Clouds Tracker

    Authors: Baojie Fan, Wuyang Zhou, Kai Wang, Shijun Zhou, Fengyu Xu, Jiandong Tian

    Abstract: Most 3D single object trackers (SOT) in point clouds follow the two-stream multi-stage 3D Siamese or motion tracking paradigms, which process the template and search area point clouds with two parallel branches, built on supervised point cloud backbones. In this work, beyond typical 3D Siamese or motion tracking, we propose a neat and compact one-stream transformer 3D SOT paradigm from the nove…

    Submitted 12 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  28. arXiv:2404.05781  [pdf, other]

    q-bio.NC cs.LG

    Group-specific discriminant analysis reveals statistically validated sex differences in lateralization of brain functional network

    Authors: Shuo Zhou, Junhao Luo, Yaya Jiang, Haolin Wang, Haiping Lu, Gaolang Gong

    Abstract: Lateralization is a fundamental feature of the human brain, where sex differences have been observed. Conventional studies in neuroscience on sex-specific lateralization are typically conducted on univariate statistical comparisons between male and female groups. However, these analyses often lack effective validation of group specificity. Here, we formulate modeling sex differences in lateralizat…

    Submitted 8 April, 2024; originally announced April 2024.

  29. arXiv:2404.05320  [pdf, other]

    cs.CR

    Reflected Search Poisoning for Illicit Promotion

    Authors: Sangyi Wu, Jialong Xue, Shaoxuan Zhou, Xianghang Mi

    Abstract: As an emerging black hat search engine optimization (SEO) technique, reflected search poisoning (RSP) allows a miscreant to free-ride the reputation of high-ranking websites, poisoning search engines with illicit promotion texts (IPTs) in an efficient and stealthy manner, while avoiding the burden of continuous website compromise as required by traditional promotion infections. However, little is…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  30. arXiv:2404.04718  [pdf, other]

    cs.CV cs.AI

    Interpretable Multimodal Learning for Cardiovascular Hemodynamics Assessment

    Authors: Prasun C Tripathi, Sina Tabakhi, Mohammod N I Suvon, Lawrence Schöb, Samer Alabed, Andrew J Swift, Shuo Zhou, Haiping Lu

    Abstract: Pulmonary Arterial Wedge Pressure (PAWP) is an essential cardiovascular hemodynamics marker to detect heart failure. In clinical practice, Right Heart Catheterization is considered a gold standard for assessing cardiac hemodynamics while non-invasive methods are often needed to screen high-risk patients from a large population. In this paper, we propose a multimodal learning pipeline to predict PA…

    Submitted 6 April, 2024; originally announced April 2024.

  31. arXiv:2404.04244  [pdf, other]

    cs.CV

    Fast Diffeomorphic Image Registration using Patch based Fully Convolutional Networks

    Authors: Jiong Wu, Shuang Zhou, Li Lin, Xin Wang, Wenxue Tan

    Abstract: Diffeomorphic image registration is a fundamental step in medical image analysis, owing to its capability to ensure the invertibility of transformations and preservation of topology. Currently, unsupervised learning-based registration techniques primarily extract features at the image level, potentially limiting their efficacy. This paper proposes a novel unsupervised learning-based fully convolut…

    Submitted 3 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  32. arXiv:2404.02570  [pdf, other]

    cs.CL

    MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

    Authors: Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko

    Abstract: This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual. The task aims to detect semantic relatedness of two sentences in a given target language without access to direct supervision (i.e. zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies on two different pre-trained…

    Submitted 3 April, 2024; originally announced April 2024.

  33. arXiv:2404.00358  [pdf, other]

    cs.CV

    Spread Your Wings: A Radial Strip Transformer for Image Deblurring

    Authors: Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang

    Abstract: Exploring motion information is important for the motion deblurring task. Recently, window-based transformer approaches have achieved decent performance in image deblurring. Note that the motion causing blurry results is usually composed of translation and rotation movements and the window-shift operation in the Cartesian coordinate system by the window-based transformer approaches only directly…

    Submitted 30 March, 2024; originally announced April 2024.

  34. arXiv:2404.00313  [pdf, other]

    cs.CV

    Harmonizing Light and Darkness: A Symphony of Prior-guided Data Synthesis and Adaptive Focus for Nighttime Flare Removal

    Authors: Lishen Qu, Shihao Zhou, Jinshan Pan, Jinglei Shi, Duosheng Chen, Jufeng Yang

    Abstract: Intense light sources often produce flares in captured images at night, which deteriorate the visual quality and negatively affect downstream applications. In order to train an effective flare removal network, a reliable dataset is essential. The mainstream flare removal datasets are semi-synthetic to reduce human labour, but these datasets do not cover typical scenarios involving multiple scatt…

    Submitted 30 March, 2024; originally announced April 2024.

  35. arXiv:2404.00288  [pdf, other]

    cs.CV

    Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration

    Authors: Shihao Zhou, Jinshan Pan, Jinglei Shi, Duosheng Chen, Lishen Qu, Jufeng Yang

    Abstract: How to explore useful features from images as prompts to guide the deep image restoration models is an effective way to solve image restoration. In contrast to mining spatial relations within images as prompt, which leads to characteristics of different frequencies being neglected and further remaining subtle or undetectable artifacts in the restored image, we develop a Frequency Prompting image r…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 18 pages, 10 figures

  36. arXiv:2404.00279  [pdf, other]

    cs.CV

    Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration

    Authors: Shihao Zhou, Duosheng Chen, Jinshan Pan, Jufeng Yang

    Abstract: Transformer-based approaches have achieved superior performance in image restoration, since they can model long-term dependencies well. However, the limitation in capturing local information restricts their capacity to remove degradations. While existing approaches attempt to mitigate this issue by incorporating convolutional operations, the core component in Transformer, i.e., self-attention, whi…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures

  37. arXiv:2403.18342  [pdf, other]

    cs.CV

    Learning Inclusion Matching for Animation Paint Bucket Colorization

    Authors: Yuekun Dai, Shangchen Zhou, Qinyue Li, Chongyi Li, Chen Change Loy

    Abstract: Colorizing line art is a pivotal task in the production of hand-drawn cel animation. This typically involves digital painters using a paint bucket tool to manually color each segment enclosed by lines, based on RGB values predetermined by a color designer. This frame-by-frame process is both arduous and time-intensive. Current automated methods mainly focus on segment matching. This technique migr…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: accepted to CVPR 2024. Project Page: https://ykdai.github.io/projects/InclusionMatching

  38. arXiv:2403.18160  [pdf, other]

    cs.HC

    Eternagram: Probing Player Attitudes in Alternate Climate Scenarios Through a ChatGPT-Driven Text Adventure

    Authors: Suifang Zhou, Latisha Besariani Hendra, Qinshi Zhang, Jussi Holopainen, RAY LC

    Abstract: Conventional methods of assessing attitudes towards climate change are limited in capturing authentic opinions, primarily stemming from a lack of context-specific assessment strategies and an overreliance on simplistic surveys. Game-based Assessments (GBA) have demonstrated the ability to overcome these issues by immersing participants in engaging gameplay within carefully crafted, scenario-based…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 22 pages, 6 figures, Accepted by CHI Conference on Human Factors in Computing Systems 2024

    ACM Class: H.5.2

  39. arXiv:2403.17760  [pdf, other]

    cs.CL

    Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons

    Authors: Shijia Zhou, Leonie Weissweiler, Taiqi He, Hinrich Schütze, David R. Mortensen, Lori Levin

    Abstract: In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias. We then create further challenging sub-tasks in an effort…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  40. arXiv:2403.16699  [pdf, other]

    cs.IT eess.SP

    Resonant Beam Communications: A New Design Paradigm and Challenges

    Authors: Yuanming Tian, Dongxu Li, Chuan Huang, Qingwen Liu, Shengli Zhou

    Abstract: Resonant beam communications (RBCom), which adopt oscillating photons between two separate retroreflectors for information transmission, exhibit potential advantages over other types of wireless optical communications (WOC). However, echo interference generated by the modulated beam reflected from the receiver affects the transmission of the desired information. To tackle this challenge, a synchro…

    Submitted 25 March, 2024; originally announced March 2024.

  41. arXiv:2403.16694  [pdf, other]

    cs.IT eess.SP

    Design and Performance of Resonant Beam Communications -- Part II: Mobile Scenario

    Authors: Dongxu Li, Yuanming Tian, Chuan Huang, Qingwen Liu, Shengli Zhou

    Abstract: This two-part paper focuses on the system design and performance analysis for a point-to-point resonant beam communication (RBCom) system under both the quasi-static and mobile scenarios. Part I of this paper proposes a synchronization-based information transmission scheme and derives the capacity upper and lower bounds for the quasi-static channel case. In Part II, we address the mobile scenario,…

    Submitted 25 March, 2024; originally announced March 2024.

  42. arXiv:2403.16676  [pdf, other]

    cs.IT eess.SP

    Design and Performance of Resonant Beam Communications -- Part I: Quasi-Static Scenario

    Authors: Dongxu Li, Yuanming Tian, Chuan Huang, Qingwen Liu, Shengli Zhou

    Abstract: This two-part paper studies a point-to-point resonant beam communication (RBCom) system, where two separately deployed retroreflectors are adopted to generate the resonant beam between the transmitter and the receiver, and analyzes the transmission rate of the considered system under both the quasi-static and mobile scenarios. Part I of this paper focuses on the quasi-static scenario where the loc…

    Submitted 25 March, 2024; originally announced March 2024.

  43. arXiv:2403.16450  [pdf, other]

    cs.CV

    Camera-aware Label Refinement for Unsupervised Person Re-identification

    Authors: Pengna Li, Kangyi Wu, Wenli Huang, Sanping Zhou, Jinjun Wang

    Abstract: Unsupervised person re-identification aims to retrieve images of a specified person without identity labels. Many recent unsupervised Re-ID approaches adopt clustering-based methods to measure cross-camera feature similarity to roughly divide images into clusters. They ignore the feature distribution discrepancy induced by camera domain gap, resulting in the unavoidable performance degradation. Ca…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: submitted to IEEE TMM

  44. arXiv:2403.16428  [pdf, other]

    cs.CV

    Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

    Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

    Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the…

    Submitted 25 March, 2024; originally announced March 2024.

  45. arXiv:2403.15156  [pdf, other]

    cs.RO cs.CV eess.SY

    Infrastructure-Assisted Collaborative Perception in Automated Valet Parking: A Safety Perspective

    Authors: Yukuan Jia, Jiawen Zhang, Shimeng Lu, Baokang Fan, Ruiqing Mao, Sheng Zhou, Zhisheng Niu

    Abstract: Environmental perception in Automated Valet Parking (AVP) has been a challenging task due to severe occlusions in parking garages. Although Collaborative Perception (CP) can be applied to broaden the field of view of connected vehicles, the limited bandwidth of vehicular communications restricts its application. In this work, we propose a BEV feature-based CP network architecture for infrastructur…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 7 figures, 4 tables, accepted by IEEE VTC2024-Spring

  46. arXiv:2403.13658  [pdf, other]

    cs.LG cs.CV

    Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection

    Authors: Mohammod N. I. Suvon, Prasun C. Tripathi, Wenrui Fan, Shuo Zhou, Xianyuan Liu, Samer Alabed, Venet Osmani, Andrew J. Swift, Chen Chen, Haiping Lu

    Abstract: Recent advancements in non-invasive detection of cardiac hemodynamic instability (CHDI) primarily focus on applying machine learning techniques to a single data modality, e.g. cardiac magnetic resonance imaging (MRI). Despite their potential, these approaches often fall short especially when the size of labeled patient data is limited, a common challenge in the medical domain. Furthermore, only a…

    Submitted 20 March, 2024; originally announced March 2024.

  47. arXiv:2403.12473  [pdf, other]

    cs.CV

    PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery

    Authors: Wendi Yang, Zihang Jiang, Shang Zhao, S. Kevin Zhou

    Abstract: With the recent advancements in single-image-based human mesh recovery, there is a growing interest in enhancing its performance in certain extreme scenarios, such as occlusion, while maintaining overall model accuracy. Although obtaining accurately annotated 3D human poses under occlusion is challenging, there is still a wealth of rich and precise 2D pose annotations that can be leveraged. Howeve…

    Submitted 19 March, 2024; originally announced March 2024.

  48. arXiv:2403.12457  [pdf, other]

    cs.CV

    Privacy-Preserving Face Recognition Using Trainable Feature Subtraction

    Authors: Yuxi Mi, Zhizhou Zhong, Yuge Huang, Jiazhen Ji, Jianqing Xu, Jun Wang, Shaoming Wang, Shouhong Ding, Shuigeng Zhou

    Abstract: The widespread adoption of face recognition has led to increasing privacy concerns, as unauthorized access to face images can expose sensitive personal information. This paper explores face image protection against viewing and recovery attacks. Inspired by image compression, we propose creating a visually uninformative face image through feature subtraction between an original face and its model-p…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  49. arXiv:2403.12019  [pdf, other]

    cs.CV

    LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

    Authors: Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy

    Abstract: The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harn…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: project webpage: https://nirvanalan.github.io/projects/ln3diff/

  50. arXiv:2403.11189  [pdf, other]

    cs.CV

    Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes

    Authors: Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang

    Abstract: The crux of semi-supervised temporal action localization (SS-TAL) lies in excavating valuable information from abundant unlabeled videos. However, current approaches predominantly focus on building models that are robust to the error-prone target class (i.e., the predicted class with the highest confidence) while ignoring informative semantics within non-target classes. This paper approaches SS-TAL…

    Submitted 17 March, 2024; originally announced March 2024.