-
The Third Monocular Depth Estimation Challenge
Authors:
Jaime Spencer,
Fabio Tosi,
Matteo Poggi,
Ripudaman Singh Arora,
Chris Russell,
Simon Hadfield,
Richard Bowden,
GuangYuan Zhou,
ZhengXin Li,
Qiang Rao,
YiPing Bao,
Xiao Liu,
Dohyeong Kim,
Jinseong Kim,
Myunghyun Kim,
Mykola Lavreniuk,
Rui Li,
Qing Mao,
Jiang Wu,
Yu Zhu,
Jinqiu Sun,
Yanning Zhang,
Suraj Patni,
Aradhye Agarwal,
Chetan Arora,
et al. (16 additional authors not shown)
Abstract:
This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 submissions outperforming the baseline on the test set; 10 of them submitted a report describing their approach, highlighting the widespread use of foundational models such as Depth Anything at the core of their methods. The challenge winners drastically improved 3D F-Score performance, from 17.51% to 23.72%.
Submitted 27 April, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Interactive Learning of Physical Object Properties Through Robot Manipulation and Database of Object Measurements
Authors:
Andrej Kruzliak,
Jiri Hartvich,
Shubhan P. Patni,
Lukas Rustler,
Jan Kristof Behrens,
Fares J. Abu-Dakka,
Krystian Mikolajczyk,
Ville Kyrki,
Matej Hoffmann
Abstract:
This work presents a framework for automatically extracting physical object properties, such as material composition, mass, volume, and stiffness, through robot manipulation and a database of object measurements. The framework involves exploratory action selection to maximize learning about objects on a table. A Bayesian network models conditional dependencies between object properties, incorporating prior probability distributions and uncertainty associated with measurement actions. The algorithm selects optimal exploratory actions based on expected information gain and updates object properties through Bayesian inference. Experimental evaluation demonstrates effective action selection compared to a baseline and correct termination of the experiments if there is nothing more to be learned. The algorithm proved to behave intelligently when presented with trick objects with material properties in conflict with their appearance. The robot pipeline integrates with a logging module and an online database of objects, containing over 24,000 measurements of 63 objects with different grippers. All code and data are publicly available, facilitating automatic digitization of objects and their physical properties through exploratory manipulations.
Submitted 10 April, 2024;
originally announced April 2024.
-
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Authors:
Suraj Patni,
Aradhye Agarwal,
Chetan Arora
Abstract:
In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, it is necessary to train such models on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures more relevant information for SIDE than the usual route of generating pseudo image captions, followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone which is conditioned on ViT embeddings. Our proposed design establishes a new state-of-the-art (SOTA) for SIDE on the NYUv2 dataset, achieving an Abs Rel error of 0.059 (14% improvement) compared to 0.069 by the current SOTA (VPD). On the KITTI dataset, it achieves a Sq Rel error of 0.139 (2% improvement) compared to 0.142 by the current SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvements of (20%, 23%, 81%, 25%) over NeWCRFs on the (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The project page is available at https://ecodepth-iitd.github.io
Submitted 17 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Online Elasticity Estimation and Material Sorting Using Standard Robot Grippers
Authors:
Shubhan P. Patni,
Pavel Stoudek,
Hynek Chlup,
Matej Hoffmann
Abstract:
Standard robot grippers are not designed for material recognition. We experimentally evaluated the accuracy with which material properties can be estimated through object compression by two standard parallel jaw grippers and a force/torque sensor mounted at the robot wrist, with a professional biaxial compression device used as reference. Gripper effort versus position curves were obtained and transformed into stress/strain curves. The modulus of elasticity was estimated at different strain points and the effect of multiple compression cycles (precycling), compression speed, and the gripper surface area on estimation was studied. Viscoelasticity was estimated using the energy absorbed in a compression/decompression cycle, the Kelvin-Voigt, and Hunt-Crossley models. We found that: (1) slower compression speeds improved elasticity estimation, while precycling or surface area did not; (2) the robot grippers, even after calibration, were found to have a limited capability of delivering accurate estimates of absolute values of Young's modulus and viscoelasticity; (3) relative ordering of material characteristics was largely consistent across different grippers; (4) despite the nonlinear characteristics of deformable objects, fitting linear stress/strain approximations led to more stable results than local estimates of Young's modulus; (5) the Hunt-Crossley model worked best to estimate viscoelasticity, from a single object compression. A two-dimensional space formed by elasticity and viscoelasticity estimates obtained from a single grasp is advantageous for the discrimination of the object material properties. We demonstrated the applicability of our findings in a mock single stream recycling scenario, where plastic, paper, and metal objects were correctly separated from a single grasp, even when compressed at different locations on the object. The data and code are publicly available.
Submitted 8 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation
Authors:
Sudarshan S Harithas,
Gurkirat Singh,
Aneesh Chavan,
Sarthak Sharma,
Suraj Patni,
Chetan Arora,
K. Madhava Krishna
Abstract:
We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, the absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In this original approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust and generalizable than the current SOTA. We report significant performance gains in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on KITTI, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique compresses the original point cloud by over 830 times. To further test the robustness of our technique, we create and open-source a custom dataset called Lidar-UrbanFly Dataset (LUF), which consists of point clouds obtained from a LiDAR mounted on a quadrotor.
Submitted 3 April, 2023;
originally announced April 2023.
-
Single-grasp deformable object discrimination: the effect of gripper morphology, sensing modalities, and action parameters
Authors:
Michal Pliska,
Shubhan Patni,
Michal Mares,
Pavel Stoudek,
Zdenek Straka,
Karla Stepanova,
Matej Hoffmann
Abstract:
In haptic object discrimination, the effect of gripper embodiment, action parameters, and sensory channels has not been systematically studied. We used two anthropomorphic hands and two 2-finger grippers to grasp two sets of deformable objects. On the object classification task, we found: (i) among classifiers, SVM on sensory features and LSTM on raw time series performed best across all grippers; (ii) faster compression speeds degraded performance; (iii) generalization to different grasping configurations was limited; transfer to different compression speeds worked well for the Barrett Hand only. Visualization of the feature spaces using PCA showed that the gripper morphology and the action parameters were the main source of variance, rendering generalization across embodiment or grasp configurations very hard. On the highly challenging dataset consisting of polyurethane foams alone, only the Barrett Hand achieved excellent performance. Tactile sensors can thus provide a key advantage even if recognition is based on stiffness rather than shape. The dataset with 24000 measurements is publicly available.
Submitted 2 February, 2024; v1 submitted 13 April, 2022;
originally announced April 2022.
-
Controlling the Information Flow in Spreadsheets
Authors:
Vipin Samar,
Sangeeta Patni
Abstract:
There is no denying that spreadsheets have become critical for all operational processes including financial reporting, budgeting, forecasting, and analysis. Microsoft Excel has essentially become a scratch pad and a data browser that can quickly be put to use for information gathering and decision-making. However, there is little control over how data comes into Excel and how it gets updated. The information supply chain feeding into Excel remains ad hoc and without any centralized IT control. This paper discusses some of the pitfalls of the data collection and maintenance process in Excel. It then suggests service-oriented architecture (SOA) based information gathering and control techniques to ameliorate the pitfalls of this scratch pad while improving the integrity of data, boosting the productivity of the business users, and building controls to satisfy the requirements of Section 404 of the Sarbanes-Oxley Act.
Submitted 17 March, 2008;
originally announced March 2008.