Showing 1–50 of 53 results for author: Poggi, M

  1. arXiv:2404.16831

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  2. arXiv:2402.13255

    cs.CV cs.RO

    How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey

    Authors: Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi

    Abstract: Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Spla…

    Submitted 11 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  3. arXiv:2401.14401

    cs.CV

    Range-Agnostic Multi-View Depth Estimation With Keyframe Selection

    Authors: Andrea Conti, Matteo Poggi, Valerio Cambareri, Stefano Mattoccia

    Abstract: Methods for 3D reconstruction from posed frames require prior knowledge about the scene metric range, usually to recover matching cues along the epipolar lines and narrow the search range. However, such prior might not be directly available or estimated inaccurately in real scenarios -- e.g., outdoor 3D reconstruction from video sequences -- therefore heavily hampering performance. In this paper,…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 3DV 2024. Project page: https://andreaconti.github.io/projects/range_agnostic_multi_view_depth - GitHub page: https://github.com/andreaconti/ramdepth.git

  4. arXiv:2312.09254

    cs.CV

    Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization

    Authors: Luca Bartolomei, Matteo Poggi, Andrea Conti, Fabio Tosi, Stefano Mattoccia

    Abstract: This paper proposes a new framework for depth completion robust against domain-shifting issues. It exploits the generalization capability of modern stereo networks to face depth completion, by processing fictitious stereo pairs obtained through a virtual pattern projection paradigm. Any stereo network or traditional stereo matcher can be seamlessly plugged into our framework, allowing for the depl…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 3DV 2024. Code: https://github.com/bartn8/vppdc - Project page: https://vppdc.github.io/

  5. arXiv:2309.16019

    cs.CV

    GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes

    Authors: Chaoqiang Zhao, Matteo Poggi, Fabio Tosi, Lei Zhou, Qiyu Sun, Yang Tang, Stefano Mattoccia

    Abstract: This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture. We ease the learning process by obtaining coarse camera poses from monocular sequences through multi-view geometry to deal with the former. However, we found that limited by the scale ambiguity across different scenes in the training dataset, a n…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Code: https://github.com/zxcqlf/GasMono

  6. arXiv:2309.12315

    cs.CV

    Active Stereo Without Pattern Projector

    Authors: Luca Bartolomei, Matteo Poggi, Fabio Tosi, Andrea Conti, Stefano Mattoccia

    Abstract: This paper proposes a novel framework integrating the principles of active stereo in standard passive camera systems without a physical pattern projector. We virtually project a pattern over the left and right images according to the sparse measurements obtained from a depth sensor. Any such devices can be seamlessly plugged into our framework, allowing for the deployment of a virtual active stere…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Code: https://github.com/bartn8/vppstereo - Project page: https://vppstereo.github.io

  7. arXiv:2309.02436

    cs.CV cs.RO

    GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction

    Authors: Youmin Zhang, Fabio Tosi, Stefano Mattoccia, Matteo Poggi

    Abstract: Neural implicit representations have recently demonstrated compelling results on dense Simultaneous Localization And Mapping (SLAM) but suffer from the accumulation of errors in camera tracking and distortion in the reconstruction. Purposely, we present GO-SLAM, a deep-learning-based dense visual SLAM framework globally optimizing poses and 3D reconstruction in real-time. Robust pose estimation is…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Code: https://github.com/youmi-zym/GO-SLAM - Project Page: https://youmi-zym.github.io/projects/GO-SLAM/

  8. arXiv:2308.14108

    cs.CV cs.AI cs.LG

    Depth self-supervision for single image novel view synthesis

    Authors: Giovanni Minelli, Matteo Poggi, Samuele Salti

    Abstract: In this paper, we tackle the problem of generating a novel image from an arbitrary viewpoint given a single frame as input. While existing methods operating in this setup aim at predicting the target view depth map to guide the synthesis, without explicit supervision over such a task, we jointly optimize our framework for both novel view synthesis and depth estimation to unleash the synergy betwee…

    Submitted 27 August, 2023; originally announced August 2023.

  9. arXiv:2307.15063

    cs.CV

    To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation

    Authors: Marc Botet Colomer, Pier Luigi Dovesi, Theodoros Panagiotakopoulos, Joao Frederico Carvalho, Linus Härenstam-Nielsen, Hossein Azizpour, Hedvig Kjellström, Daniel Cremers, Matteo Poggi

    Abstract: The goal of Online Domain Adaptation for semantic segmentation is to handle unforeseeable domain changes that occur during deployment, like sudden weather events. However, the high computational costs associated with brute-force adaptation make this paradigm unfeasible for real-world applications. In this paper we propose HAMLET, a Hardware-Aware Modular Least Expensive Training framework for real…

    Submitted 7 August, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: ICCV 2023. The first two authors contributed equally. Project page: https://marcbotet.github.io/hamlet-web/

  10. arXiv:2307.15052

    cs.CV

    Learning Depth Estimation for Transparent and Mirror Surfaces

    Authors: Alex Costanzino, Pierluigi Zama Ramirez, Matteo Poggi, Fabio Tosi, Stefano Mattoccia, Luigi Di Stefano

    Abstract: Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for either sensors, algorithms, or deep networks. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. We unveil how to obtain reliable pseudo labels by in-painting ToM objects in images and processing them wi…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV 2023. Project Page: https://cvlab-unibo.github.io/Depth4ToM

  11. arXiv:2304.07051

    cs.CV cs.AI

    The Second Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, C. Stella Qian, Michaela Trescakova, Chris Russell, Simon Hadfield, Erich W. Graf, Wendy J. Adams, Andrew J. Schofield, James Elder, Richard Bowden, Ali Anwar, Hao Chen, Xiaozhi Chen, Kai Cheng, Yuchao Dai, Huynh Thai Hoa, Sadat Hossain, Jianmian Huang, Mohan Jing, Bo Li, Chao Li, Baojun Li, Zhiwen Liu, Stefano Mattoccia, Siegfried Mercelis, et al. (18 additional authors not shown)

    Abstract: This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes…

    Submitted 26 April, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Published at CVPRW2023

  12. arXiv:2303.17603

    cs.CV cs.RO

    NeRF-Supervised Deep Stereo

    Authors: Fabio Tosi, Alessio Tonioni, Daniele De Gregorio, Matteo Poggi

    Abstract: We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth. By leveraging state-of-the-art neural rendering solutions, we generate stereo training data from image sequences collected with a single handheld camera. On top of them, a NeRF-supervised training procedure is carried out, from which we exploit rendered stereo triplets to compensate for occl…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project page: https://nerfstereo.github.io/ Code: https://github.com/fabiotosi92/NeRF-Supervised-Deep-Stereo

  13. arXiv:2303.09307

    cs.CV

    Depth Super-Resolution from Explicit and Implicit High-Frequency Features

    Authors: Xin Qiao, Chenyang Ge, Youmin Zhang, Yanhui Zhou, Fabio Tosi, Matteo Poggi, Stefano Mattoccia

    Abstract: We propose a novel multi-stage depth super-resolution network, which progressively reconstructs high-resolution depth maps from explicit and implicit high-frequency features. The former are extracted by an efficient transformer processing both local and global contexts, while the latter are obtained by projecting color images into the frequency domain. Both are combined together with depth feature…

    Submitted 30 May, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

  14. arXiv:2302.02450

    cs.LG

    Regularization and Optimization in Model-Based Clustering

    Authors: Raphael Araujo Sampaio, Joaquim Dias Garcia, Marcus Poggi, Thibaut Vidal

    Abstract: Due to their conceptual simplicity, k-means algorithm variants have been extensively used for unsupervised cluster analysis. However, one main shortcoming of these algorithms is that they essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution. In comparison, general Gaussian Mixture Models (GMMs) can fit richer structures but require estima…

    Submitted 5 February, 2024; v1 submitted 5 February, 2023; originally announced February 2023.

  15. arXiv:2301.08245

    cs.CV

    Booster: a Benchmark for Depth from Images of Specular and Transparent Surfaces

    Authors: Pierluigi Zama Ramirez, Alex Costanzino, Fabio Tosi, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

    Abstract: Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. Purposely, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featu…

    Submitted 30 January, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: Extension of the paper "Open Challenges in Deep Stereo: the Booster Dataset" presented at CVPR 2022. Accepted at TPAMI

  16. arXiv:2212.10806

    cs.CV

    MaskingDepth: Masked Consistency Regularization for Semi-supervised Monocular Depth Estimation

    Authors: Jongbeom Baek, Gyeongnyeon Kim, Seonghoon Park, Honggyu An, Matteo Poggi, Seungryong Kim

    Abstract: We propose MaskingDepth, a novel semi-supervised learning framework for monocular depth estimation to mitigate the reliance on large ground-truth depth quantities. MaskingDepth is designed to enforce consistency between the strongly-augmented unlabeled data and the pseudo-labels derived from weakly-augmented unlabeled data, which enables learning depth without supervision. In this framework, a nov…

    Submitted 23 March, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: Project page: https://ku-cvlab.github.io/MaskingDepth/

  17. arXiv:2212.00790

    cs.CV

    Sparsity Agnostic Depth Completion

    Authors: Andrea Conti, Matteo Poggi, Stefano Mattoccia

    Abstract: We present a novel depth completion approach agnostic to the sparsity of depth points, that is very likely to vary in many practical applications. State-of-the-art approaches yield accurate results only when processing a specific density and distribution of input points, i.e. the one observed during training, narrowing their deployment in real use cases. On the contrary, our solution is robust to…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: This paper has been accepted for publication at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, 2023

  18. arXiv:2211.13762

    cs.CV

    ScanNeRF: a Scalable Benchmark for Neural Radiance Fields

    Authors: Luca De Luigi, Damiano Bolognini, Federico Domeniconi, Daniele De Gregorio, Matteo Poggi, Luigi Di Stefano

    Abstract: In this paper, we propose the first-ever real benchmark thought for evaluating Neural Radiance Fields (NeRFs) and, in general, Neural Rendering (NR) frameworks. We design and implement an effective pipeline for scanning real objects in quantity and effortlessly. Our scan station is built with less than 500$ hardware budget and can collect roughly 4000 images of a scanned object in just 5 minutes.…

    Submitted 20 December, 2022; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: WACV 2023. The first three authors contributed equally. Project page: https://eyecan-ai.github.io/scannerf/

  19. arXiv:2211.13755

    cs.CV

    TemporalStereo: Efficient Spatial-Temporal Stereo Matching Network

    Authors: Youmin Zhang, Matteo Poggi, Stefano Mattoccia

    Abstract: We present TemporalStereo, a coarse-to-fine stereo matching network that is highly efficient, and able to effectively exploit the past geometry and context information to boost matching accuracy. Our network leverages sparse cost volume and proves to be effective when a single stereo pair is given. However, its peculiar ability to use spatio-temporal information across stereo sequences allows Temp…

    Submitted 3 August, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: Accepted by IROS 2023, Project page: https://youmi-zym.github.io/projects/TemporalStereo/

  20. arXiv:2211.12174

    cs.CV

    The Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, C. Stella Qian, Chris Russell, Simon Hadfield, Erich Graf, Wendy Adams, Andrew J. Schofield, James Elder, Richard Bowden, Heng Cong, Stefano Mattoccia, Matteo Poggi, Zeeshan Khan Suri, Yang Tang, Fabio Tosi, Hao Wang, Youmin Zhang, Yusheng Zhang, Chaoqiang Zhao

    Abstract: This paper summarizes the results of the first Monocular Depth Estimation Challenge (MDEC) organized at WACV2023. This challenge evaluated the progress of self-supervised monocular depth estimation on the challenging SYNS-Patches dataset. The challenge was organized on CodaLab and received submissions from 4 valid teams. Participants were provided a devkit containing updated reference implementati…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: WACV-Workshops 2023

  21. arXiv:2210.11467

    cs.CV

    Multi-View Guided Multi-View Stereo

    Authors: Matteo Poggi, Andrea Conti, Stefano Mattoccia

    Abstract: This paper introduces a novel deep framework for dense 3D reconstruction from multiple image frames, leveraging a sparse set of depth measurements gathered jointly with image acquisition. Given a deep multi-view stereo network, our framework uses sparse depth hints to guide the neural network by modulating the plane-sweep cost volume built during the forward step, enabling us to infer constantly m…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: IROS 2022. First two authors contributed equally. Project page: https://github.com/andreaconti/multi-view-guided-multi-view-stereo

  22. arXiv:2210.03118

    cs.CV

    Unsupervised confidence for LiDAR depth maps and applications

    Authors: Andrea Conti, Matteo Poggi, Filippo Aleotti, Stefano Mattoccia

    Abstract: Depth perception is pivotal in many fields, such as robotics and autonomous driving, to name a few. Consequently, depth sensors such as LiDARs rapidly spread in many applications. The 3D point clouds generated by these sensors must often be coupled with an RGB camera to understand the framed scene semantically. Usually, the former is projected over the camera image plane, leading to a sparse depth…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: IROS 2022. Code available at https://github.com/andreaconti/lidar-confidence

  23. arXiv:2209.00648

    cs.CV

    Cross-Spectral Neural Radiance Fields

    Authors: Matteo Poggi, Pierluigi Zama Ramirez, Fabio Tosi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

    Abstract: We propose X-NeRF, a novel method to learn a Cross-Spectral scene representation given images captured from cameras with different light spectrum sensitivity, based on the Neural Radiance Fields formulation. X-NeRF optimizes camera poses across spectra during training and exploits Normalized Cross-Device Coordinates (NXDC) to render images of different modalities from arbitrary viewpoints, which a…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: 3DV 2022. Project page: https://cvlab-unibo.github.io/xnerf-web/

  24. MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer

    Authors: Chaoqiang Zhao, Youmin Zhang, Matteo Poggi, Fabio Tosi, Xianda Guo, Zheng Zhu, Guan Huang, Yang Tang, Stefano Mattoccia

    Abstract: Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training. Convolutional neural networks (CNNs) have recently achieved great success in this task. However, their limited receptive field constrains existing network architectures to reason only locally, dampening the effectiveness of the self-supervised paradigm. In the light…

    Submitted 6 August, 2022; originally announced August 2022.

    Comments: Accepted by 3DV 2022

  25. arXiv:2207.10667

    cs.CV

    Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions

    Authors: Theodoros Panagiotakopoulos, Pier Luigi Dovesi, Linus Härenstam-Nielsen, Matteo Poggi

    Abstract: Unsupervised Domain Adaptation (UDA) aims at reducing the domain gap between training and testing data and is, in most cases, carried out in offline manner. However, domain changes may occur continuously and unpredictably during deployment (e.g. sudden weather changes). In such conditions, deep neural networks witness dramatic drops in accuracy and offline adaptation may not be enough to contrast…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. Project page: https://theo2021.github.io/onda-web/

  26. arXiv:2206.07047

    cs.CV

    RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation

    Authors: Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

    Abstract: We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolution by solving stereo matching correspondences. Purposely, we introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels in the form of disparity…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: CVPR 2022, New Orleans. Project page: https://cvlab-unibo.github.io/rgb-ms-web/

  27. arXiv:2206.04671

    cs.CV

    Open Challenges in Deep Stereo: the Booster Dataset

    Authors: Pierluigi Zama Ramirez, Fabio Tosi, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

    Abstract: We present a novel high-resolution and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, i.e. the main causes of failures for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework which allows for ea…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: CVPR 2022, New Orleans. Project page: https://cvlab-unibo.github.io/booster-web/

  28. arXiv:2206.02714

    cs.CV cs.AI cs.LG

    FuSS: Fusing Superpixels for Improved Segmentation Consistency

    Authors: Ian Nunes, Matheus B. Pereira, Hugo Oliveira, Jefersson A. Dos Santos, Marcus Poggi

    Abstract: In this work, we propose two different approaches to improve the semantic consistency of Open Set Semantic Segmentation. First, we propose a method called OpenGMM that extends the OpenPCS framework using a Gaussian Mixture of Models to model the distribution of pixels for each class in a multimodal manner. The second approach is a post-processing which uses superpixels to enforce highly homogeneou…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: submitted to IEEEACCESS. 19 pages

  29. arXiv:2204.01693

    cs.CV

    Monitoring social distancing with single image depth estimation

    Authors: Alessio Mingozzi, Andrea Conti, Filippo Aleotti, Matteo Poggi, Stefano Mattoccia

    Abstract: The recent pandemic emergency raised many challenges regarding the countermeasures aimed at containing the virus spread, and constraining the minimum distance between people resulted in one of the most effective strategies. Thus, the implementation of autonomous systems capable of monitoring the so-called social distance gained much interest. In this paper, we aim to address this task leveraging a…

    Submitted 29 April, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)

  30. arXiv:2203.01368

    cs.CV cs.AI cs.LG

    Conditional Reconstruction for Open-set Semantic Segmentation

    Authors: Ian Nunes, Matheus B. Pereira, Hugo Oliveira, Jefersson A. dos Santos, Marcus Poggi

    Abstract: Open set segmentation is a relatively new and unexplored task, with just a handful of methods proposed to model such tasks. We propose a novel method called CoReSeg that tackles the issue using class conditional reconstruction of the input images according to their pixelwise mask. Our method conditions each input pixel to all known classes, expecting higher errors for pixels of unknown classes. It was obs…

    Submitted 2 March, 2022; originally announced March 2022.

  31. arXiv:2110.15367

    cs.CV

    Neural Disparity Refinement for Arbitrary Resolution Stereo

    Authors: Filippo Aleotti, Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Luigi Di Stefano

    Abstract: We introduce a novel architecture for neural disparity refinement aimed at facilitating deployment of 3D computer vision on cheap and widespread consumer devices, such as mobile phones. Our approach relies on a continuous formulation that enables to estimate a refined disparity map at any arbitrary output resolution. Thereby, it can handle effectively the unbalanced camera setup typical of nowaday…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: 3DV 2021 Oral paper. Project page: https://cvlab-unibo.github.io/neural-disparity-refinement-web

  32. arXiv:2109.15321

    cs.CV

    Sensor-Guided Optical Flow

    Authors: Matteo Poggi, Filippo Aleotti, Stefano Mattoccia

    Abstract: This paper proposes a framework to guide an optical flow network with external cues to achieve superior accuracy either on known or unseen domains. Given the availability of sparse yet accurate optical flow hints from an external source, these are injected to modulate the correlation scores computed by a state-of-the-art optical flow network and guide it towards more accurate predictions. Although…

    Submitted 30 September, 2021; originally announced September 2021.

    Comments: ICCV 2021

  33. arXiv:2104.03965

    cs.CV

    Learning optical flow from still images

    Authors: Filippo Aleotti, Matteo Poggi, Stefano Mattoccia

    Abstract: This paper deals with the scarcity of data for training optical flow networks, highlighting the limitations of existing sources such as labeled synthetic datasets or unlabeled real videos. Specifically, we introduce a framework to generate accurate ground-truth optical flow annotations quickly and in large amounts from any readily available single real picture. Given an image, we use an off-the-sh…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: CVPR 2021. Project page with supplementary and code: https://mattpoggi.github.io/projects/cvpr2021aleotti/

  34. arXiv:2101.00431

    cs.CV

    On the confidence of stereo matching in a deep-learning era: a quantitative evaluation

    Authors: Matteo Poggi, Seungryong Kim, Fabio Tosi, Sunok Kim, Filippo Aleotti, Dongbo Min, Kwanghoon Sohn, Stefano Mattoccia

    Abstract: Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels on two, synchronized and rectified images. Alongside with the development of more accurate algorithms, the research community focused on finding good strategies to estimate the reliability, i.e. the confidence, of estimated disparity maps. This information proves to b…

    Submitted 30 March, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: TPAMI final version

  35. arXiv:2010.07347

    cs.CV

    Matching-space Stereo Networks for Cross-domain Generalization

    Authors: Changjiang Cai, Matteo Poggi, Stefano Mattoccia, Philippos Mordohai

    Abstract: End-to-end deep networks represent the state of the art for stereo matching. While excelling on images framing environments similar to the training set, major drops in accuracy occur in unseen domains (e.g., when moving from synthetic to real scenes). In this paper we introduce a novel family of architectures, namely Matching-Space Networks (MS-Nets), with improved generalization properties. By re…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 14 pages, 8 figures, International Conference on 3D Vision (3DV'2020), Github code at https://github.com/ccj5351/MS-Nets

  36. arXiv:2008.07130

    cs.CV

    Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation

    Authors: Filippo Aleotti, Fabio Tosi, Li Zhang, Matteo Poggi, Stefano Mattoccia

    Abstract: In many fields, self-supervised learning solutions are rapidly evolving and filling the gap with supervised approaches. This fact occurs for depth estimation based on either monocular or stereo, with the latter often providing a valid source of self-supervision for the former. In contrast, to soften typical stereo artefacts, we propose a novel self-supervised paradigm reversing the link between th…

    Submitted 17 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  37. arXiv:2008.06447

    cs.CV

    Self-adapting confidence estimation for stereo

    Authors: Matteo Poggi, Filippo Aleotti, Fabio Tosi, Giulio Zaccaroni, Stefano Mattoccia

    Abstract: Estimating the confidence of disparity maps inferred by a stereo algorithm has become a very relevant task in the years, due to the increasing number of applications leveraging such cue. Although self-supervised learning has recently spread across many computer vision tasks, it has been barely considered in the field of confidence estimation. In this paper, we propose a flexible and lightweight so…

    Submitted 24 November, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: ECCV 2020 (errata corrige: eq.6, k domain)

  38. arXiv:2007.05233

    cs.CV cs.LG eess.IV

    Continual Adaptation for Deep Stereo

    Authors: Matteo Poggi, Alessio Tonioni, Fabio Tosi, Stefano Mattoccia, Luigi Di Stefano

    Abstract: Depth estimation from stereo images is carried out with unmatched results by convolutional neural networks trained end-to-end to regress dense disparities. Like for most tasks, this is possible if large amounts of labelled samples are available for training, possibly covering the whole data distribution encountered at deployment time. Being such an assumption systematically unmet in real applicati…

    Submitted 3 May, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

    Comments: Extended version of CVPR 2019 paper "Real-time self-adaptive deep stereo" - Accepted to TPAMI

  39. arXiv:2006.05724

    cs.CV cs.GR

    Real-time single image depth perception in the wild with handheld devices

    Authors: Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi, Fabio Tosi, Stefano Mattoccia

    Abstract: Depth perception is paramount to tackle real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image represents the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit its practical deployment: i) the low reliability when deployed in-the-wild and ii) th…

    Submitted 10 June, 2020; originally announced June 2020.

    Comments: 11 pages, 9 figures

  40. arXiv:2005.06209

    cs.CV

    On the uncertainty of self-supervised monocular depth estimation

    Authors: Matteo Poggi, Filippo Aleotti, Fabio Tosi, Stefano Mattoccia

    Abstract: Self-supervised paradigms for monocular depth estimation are very appealing since they do not require ground truth annotations at all. Despite the astonishing results yielded by such methodologies, learning to reason about the uncertainty of the estimated depth maps is of paramount importance for practical applications, yet uncharted in the literature. Purposely, we explore for the first time how…

    Submitted 13 May, 2020; originally announced May 2020.

    Comments: CVPR 2020. Code will be available at https://github.com/mattpoggi/mono-uncertainty

  41. arXiv:2004.08566  [pdf, other

    cs.CV

    On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

    Authors: Matteo Poggi, Fabio Tosi, Konstantinos Batsos, Philippos Mordohai, Stefano Mattoccia

    Abstract: Stereo matching is one of the longest-standing problems in computer vision with close to 40 years of studies and research. Throughout the years the paradigm has shifted from local, pixel-level decision to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning enhanced stereo m…

    Submitted 31 March, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

    Comments: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial: "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges" (https://sites.google.com/view/cvpr-2019-depth-from-image/home)

  42. arXiv:2003.14030  [pdf, other

    cs.CV cs.LG

    Distilled Semantics for Comprehensive Scene Understanding from Videos

    Authors: Fabio Tosi, Filippo Aleotti, Pierluigi Zama Ramirez, Matteo Poggi, Samuele Salti, Luigi Di Stefano, Stefano Mattoccia

    Abstract: Whole understanding of the surroundings is paramount to autonomous systems. Recent works have shown that deep neural networks can learn geometry (depth) and motion (optical flow) from a monocular video without any explicit supervision from ground truth annotations, particularly hard to source for these two tasks. In this paper, we take an additional step toward holistic scene understanding with mo…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: CVPR 2020. Code will be available at https://github.com/CVLAB-Unibo/omeganet

  43. arXiv:1911.10090  [pdf, other

    cs.CV cs.RO

    Learning End-To-End Scene Flow by Distilling Single Tasks Knowledge

    Authors: Filippo Aleotti, Matteo Poggi, Fabio Tosi, Stefano Mattoccia

    Abstract: Scene flow is a challenging task aimed at jointly estimating the 3D structure and motion of the sensed environment. Although deep learning solutions achieve outstanding performance in terms of accuracy, these approaches divide the whole problem into standalone tasks (stereo and optical flow) addressing them with independent networks. Such a strategy dramatically increases the complexity of the tra…

    Submitted 22 November, 2019; originally announced November 2019.

    Comments: Accepted to AAAI 2020. Project page: https://vision.disi.unibo.it/~faleotti/dwarf.html

  44. arXiv:1910.00541  [pdf, other

    cs.CV cs.RO

    Real-Time Semantic Stereo Matching

    Authors: Pier Luigi Dovesi, Matteo Poggi, Lorenzo Andraghetti, Miquel Martí, Hedvig Kjellström, Alessandro Pieropan, Stefano Mattoccia

    Abstract: Scene understanding is paramount in robotics, self-navigation, augmented reality, and many other fields. To fully accomplish this task, an autonomous agent has to infer the 3D structure of the sensed scene (to know where it looks at) and its content (to know what it sees). To tackle the two tasks, deep neural networks trained to infer semantic segmentation and depth from stereo images are often th…

    Submitted 24 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: 8 pages, 3 figures. Accepted to ICRA 2020

  45. arXiv:1909.03943  [pdf, other

    cs.CV

    Unsupervised Domain Adaptation for Depth Prediction from Images

    Authors: Alessio Tonioni, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano

    Abstract: State-of-the-art approaches to infer dense depth measurements from images rely on CNNs trained end-to-end on a vast amount of data. However, these approaches suffer a drastic drop in accuracy when dealing with environments much different in appearance and/or context from those observed at training time. This domain shift issue is usually addressed by fine-tuning on smaller sets of images from the…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: 14 pages, 7 pages. Accepted to TPAMI

  46. arXiv:1908.03127  [pdf, other

    cs.CV

    Enhancing self-supervised monocular depth estimation with traditional visual odometry

    Authors: Lorenzo Andraghetti, Panteleimon Myriokefalitakis, Pier Luigi Dovesi, Belen Luque, Matteo Poggi, Alessandro Pieropan, Stefano Mattoccia

    Abstract: Estimating depth from a single image represents an attractive alternative to more traditional approaches leveraging multiple cameras. In this field, deep learning yielded outstanding results at the cost of needing large amounts of data labeled with precise depth measurements for training. An issue softened by self-supervised approaches leveraging monocular sequences or stereo pairs in place of exp…

    Submitted 12 August, 2019; v1 submitted 8 August, 2019; originally announced August 2019.

    Comments: Accepted to 3DV 2019

  47. arXiv:1905.10107  [pdf, other

    cs.CV cs.LG

    Guided Stereo Matching

    Authors: Matteo Poggi, Davide Pallotti, Fabio Tosi, Stefano Mattoccia

    Abstract: Stereo is a prominent technique to infer dense depth maps from images, and deep learning further pushed forward the state-of-the-art, making end-to-end architectures unrivaled when enough data is available for training. However, deep networks suffer from significant drops in accuracy when dealing with new environments. Therefore, in this paper, we introduce Guided Stereo Matching, a novel paradigm…

    Submitted 24 May, 2019; originally announced May 2019.

    Comments: CVPR 2019

  48. arXiv:1904.04144  [pdf, other

    cs.CV

    Learning monocular depth estimation infusing traditional stereo knowledge

    Authors: Fabio Tosi, Filippo Aleotti, Matteo Poggi, Stefano Mattoccia

    Abstract: Depth estimation from a single image represents a fascinating, yet challenging problem with countless applications. Recent works proved that this task could be learned without direct supervision from ground truth labels leveraging image synthesis on sequences or stereo pairs. Focusing on this second case, in this paper we leverage stereo matching in order to improve monocular depth estimation. To…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: accepted at CVPR 2019. Code available at https://github.com/fabiotosi92/monoResMatch-Tensorflow

  49. arXiv:1812.08277  [pdf, other

    eess.SP cs.DS math.OC

    Two-Dimensional Phase Unwrapping via Balanced Spanning Forests

    Authors: Ian Herszterg, Marcus Poggi, Thibaut Vidal

    Abstract: Phase unwrapping is the process of recovering a continuous phase signal from an original signal wrapped in the $(-\pi, \pi]$ interval. It is a critical step of coherent signal processing, with applications such as synthetic aperture radar, acoustic imaging, magnetic resonance, X-ray crystallography, and seismic processing. In the field of computational optics, this problem is classically treated as a…

    Submitted 30 May, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

  50. arXiv:1810.05424  [pdf, other

    cs.CV

    Real-time self-adaptive deep stereo

    Authors: Alessio Tonioni, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano

    Abstract: Deep convolutional neural networks trained end-to-end are the state-of-the-art methods to regress dense disparity maps from stereo pairs. These models, however, suffer from a notable decrease in accuracy when exposed to scenarios significantly different from the training set (e.g., real vs synthetic images, etc.). We argue that it is extremely unlikely to gather enough samples to achieve effective…

    Submitted 5 April, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

    Comments: Accepted at CVPR 2019 as oral presentation. Code available at https://github.com/CVLAB-Unibo/Real-time-self-adaptive-deep-stereo