Search | arXiv e-print repository

The Third Monocular Depth Estimation Challenge

Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora , et al. (16 additional authors not shown)

Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su… ▽ More This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 submissions outperforming the baseline on the test set: 10 among them submitted a report describing their approach, highlighting a diffused use of foundational models such as Depth Anything at the core of their method. The challenge winners drastically improved 3D F-Score performance, from 17.51% to 23.72%. △ Less

Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: To appear in CVPRW2024

arXiv:2306.11977 [pdf]

Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Authors: Zimeng Li, Sa Xiao, Cheng Wang, Haidong Li, Xiuchao Zhao, Caohui Duan, Qian Zhou, Qiuchen Rao, Yuan Fang, Junshuai Xie, Lei Shi, Fumin Guo, Chaohui Ye, Xin Zhou

Abstract: Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) direc… ▽ More Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost the reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized 129Xe and 1H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully-sampled image. These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and clinical lung disease patient care. △ Less

Submitted 13 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2304.02089 [pdf, other]

doi 10.1145/3511808.3557351

Hierarchically Fusing Long and Short-Term User Interests for Click-Through Rate Prediction in Product Search

Authors: Qijie Shen, Hong Wen, Jing Zhang, Qi Rao

Abstract: Estimating Click-Through Rate (CTR) is a vital yet challenging task in personalized product search. However, existing CTR methods still struggle in the product search settings due to the following three challenges including how to more effectively extract users' short-term interests with respect to multiple aspects, how to extract and fuse users' long-term interest with short-term interests, how t… ▽ More Estimating Click-Through Rate (CTR) is a vital yet challenging task in personalized product search. However, existing CTR methods still struggle in the product search settings due to the following three challenges including how to more effectively extract users' short-term interests with respect to multiple aspects, how to extract and fuse users' long-term interest with short-term interests, how to address the entangling characteristic of long and short-term interests. To resolve these challenges, in this paper, we propose a new approach named Hierarchical Interests Fusing Network (HIFN), which consists of four basic modules namely Short-term Interests Extractor (SIE), Long-term Interests Extractor (LIE), Interests Fusion Module (IFM) and Interests Disentanglement Module (IDM). Specifically, SIE is proposed to extract user's short-term interests by integrating three fundamental interests encoders within it namely query-dependent, target-dependent and causal-dependent interest encoder, respectively, followed by delivering the resultant representation to the module LIE, where it can effectively capture user long-term interests by devising an attention mechanism with respect to the short-term interests from SIE module. In IFM, the achieved long and short-term interests are further fused in an adaptive manner, followed by concatenating it with original raw context features for the final prediction result. Last but not least, considering the entangling characteristic of long and short-term interests, IDM further devises a self-supervised framework to disentangle long and short-term interests. Extensive offline and online evaluations on a real-world e-commerce platform demonstrate the superiority of HIFN over state-of-the-art methods. △ Less

Submitted 4 April, 2023; originally announced April 2023.

Comments: accpeted by CIKM'22 as a Full Paper

arXiv:2105.10697 [pdf, other]

ADNet: Attention-guided Deformable Convolutional Network for High Dynamic Range Imaging

Authors: Zhen Liu, Wenjie Lin, Xinpeng Li, Qing Rao, Ting Jiang, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu

Abstract: In this paper, we present an attention-guided deformable convolutional network for hand-held multi-frame high dynamic range (HDR) imaging, namely ADNet. This problem comprises two intractable challenges of how to handle saturation and noise properly and how to tackle misalignments caused by object motion or camera jittering. To address the former, we adopt a spatial attention module to adaptively… ▽ More In this paper, we present an attention-guided deformable convolutional network for hand-held multi-frame high dynamic range (HDR) imaging, namely ADNet. This problem comprises two intractable challenges of how to handle saturation and noise properly and how to tackle misalignments caused by object motion or camera jittering. To address the former, we adopt a spatial attention module to adaptively select the most appropriate regions of various exposure low dynamic range (LDR) images for fusion. For the latter one, we propose to align the gamma-corrected images in the feature-level with a Pyramid, Cascading and Deformable (PCD) alignment module. The proposed ADNet shows state-of-the-art performance compared with previous methods, achieving a PSNR-$l$ of 39.4471 and a PSNR-$μ$ of 37.6359 in NTIRE 2021 Multi-Frame HDR Challenge. △ Less

Submitted 22 May, 2021; originally announced May 2021.

Comments: Accepted by CVPRW 2021

arXiv:1910.14453 [pdf, other]

LiDAR-Flow: Dense Scene Flow Estimation from Sparse LiDAR and Stereo Images

Authors: Ramy Battrawy, René Schuster, Oliver Wasenmüller, Qing Rao, Didier Stricker

Abstract: We propose a new approach called LiDAR-Flow to robustly estimate a dense scene flow by fusing a sparse LiDAR with stereo images. We take the advantage of the high accuracy of LiDAR to resolve the lack of information in some regions of stereo images due to textureless objects, shadows, ill-conditioned light environment and many more. Additionally, this fusion can overcome the difficulty of matching… ▽ More We propose a new approach called LiDAR-Flow to robustly estimate a dense scene flow by fusing a sparse LiDAR with stereo images. We take the advantage of the high accuracy of LiDAR to resolve the lack of information in some regions of stereo images due to textureless objects, shadows, ill-conditioned light environment and many more. Additionally, this fusion can overcome the difficulty of matching unstructured 3D points between LiDAR-only scans. Our LiDAR-Flow approach consists of three main steps; each of them exploits LiDAR measurements. First, we build strong seeds from LiDAR to enhance the robustness of matches between stereo images. The imagery part seeks the motion matches and increases the density of scene flow estimation. Then, a consistency check employs LiDAR seeds to remove the possible mismatches. Finally, LiDAR measurements constraint the edge-preserving interpolation method to fill the remaining gaps. In our evaluation we investigate the individual processing steps of our LiDAR-Flow approach and demonstrate the superior performance compared to image-only approach. △ Less

Submitted 13 December, 2019; v1 submitted 31 October, 2019; originally announced October 2019.

Comments: Accepted in IROS19

arXiv:1707.07795 [pdf, ps, other]

Anti-Forensics of Camera Identification and the Triangle Test by Improved Fingerprint-Copy Attack

Authors: Haodong Li, Weiqi Luo, Quanquan Rao, Jiwu Huang

Abstract: The fingerprint-copy attack aims to confuse camera identification based on sensor pattern noise. However, the triangle test shows that the forged images undergone fingerprint-copy attack would share a non-PRNU (Photo-response nonuniformity) component with every stolen image, and thus can detect fingerprint-copy attack. In this paper, we propose an improved fingerprint-copy attack scheme. Our main… ▽ More The fingerprint-copy attack aims to confuse camera identification based on sensor pattern noise. However, the triangle test shows that the forged images undergone fingerprint-copy attack would share a non-PRNU (Photo-response nonuniformity) component with every stolen image, and thus can detect fingerprint-copy attack. In this paper, we propose an improved fingerprint-copy attack scheme. Our main idea is to superimpose the estimated fingerprint into the target image dispersedly, via employing a block-wise method and using the stolen images randomly and partly. We also develop a practical method to determine the strength of the superimposed fingerprint based on objective image quality. In such a way, the impact of non-PRNU component on the triangle test is reduced, and our improved fingerprint-copy attack is difficultly detected. The experiments evaluated on 2,900 images from 4 cameras show that our scheme can effectively fool camera identification, and significantly degrade the performance of the triangle test simultaneously. △ Less

Submitted 24 July, 2017; originally announced July 2017.

Showing 1–6 of 6 results for author: Rao, Q