ZoeDepth: Combining Relative and Metric Depth for Zero-shot Transfer

Author: Valar_Morghulis | Published: 2023-03-01 10:45

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Feb 2023

Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller

[KAUST, Intel]

https://arxiv.org/abs/2302.12288

https://github.com/isl-org/ZoeDepth

This paper tackles the problem of depth estimation from a single image. Existing work focuses either on generalization performance while disregarding metric scale (relative depth estimation) or on state-of-the-art results on specific datasets (metric depth estimation). We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while maintaining metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. We use a lightweight head with a novel bin adjustment design, called the metric bins module, for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier. Our framework admits multiple configurations depending on the datasets used for relative depth pre-training and metric fine-tuning. Without pre-training, we can already significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. By pre-training on 12 datasets and fine-tuning on the NYU Depth v2 indoor dataset, we can further improve on the SOTA, for a total improvement of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can be trained jointly on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance, and it achieves unprecedented zero-shot generalization performance on eight unseen datasets from both indoor and outdoor domains.
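To make the distinction between relative and metric depth concrete, here is a minimal sketch of the REL metric mentioned in the abstract, using its usual definition: the mean of |prediction - ground truth| / ground truth over valid pixels. The function names (`abs_rel`, `median_align`, `relative_improvement`) and all depth values are hypothetical toy choices for illustration; they are not taken from the paper or the ZoeDepth code. Median alignment is shown as one common way to evaluate scale-ambiguous predictions, not as the paper's protocol.

```python
from statistics import median


def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def median_align(pred, gt):
    """Rescale pred so that its median matches the median of gt.

    Assumes every ground-truth value is valid and median(pred) is non-zero.
    """
    scale = median(gt) / median(pred)
    return [p * scale for p in pred]


def relative_improvement(baseline, new):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - new) / baseline


# Toy ground truth in meters (hypothetical values).
gt = [1.0, 2.0, 4.0, 8.0]

# A "metric" prediction: every pixel is within 10% of the true depth.
metric_pred = [1.1, 1.8, 4.4, 7.2]

# A "relative" prediction: same structure, but the global scale is unknown
# (here it happens to be half the true scale).
relative_pred = [0.5 * p for p in metric_pred]

print("REL, metric prediction:      ", abs_rel(metric_pred, gt))
print("REL, relative prediction:    ", abs_rel(relative_pred, gt))
print("REL, relative after aligning:", abs_rel(median_align(relative_pred, gt), gt))

# How a relative improvement in REL is computed (toy numbers, not paper results).
print("relative improvement:", relative_improvement(0.10, 0.08))
```

The relative prediction scores poorly against metric ground truth until it is rescaled using the ground truth itself. A model that predicts metric depth directly does not need that step, which is why keeping metric scale matters for zero-shot use on unseen data.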