`rmvd` contains implementations of depth estimation models. The following provides an overview of the available models and describes their usage.
`rmvd` contains two types of model implementations:
- Models that are (re-)implemented natively within the `rmvd` framework. These models can be used for training and for inference/evaluation.
- Model wrappers around existing implementations. These models can only be used for inference/evaluation. Wrapped models are indicated by names that end with `_wrapped`.
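Both types of models are created through the same interface, which is described in detail in the usage section below. A minimal sketch, assuming the wrapped model has already been set up as described in the following:

from rmvd import create_model

# native model: can be used for training and for inference/evaluation
native_model = create_model("robust_mvd", pretrained=True)

# wrapped model (note the _wrapped suffix): inference/evaluation only;
# requires the setup steps described below
wrapped_model = create_model("monodepth2_mono_stereo_wrapped", pretrained=True)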
The setup of these models is usually a bit more involved, as it is required to download the original implementation (e.g. cloning the respective repository from GitHub). Usually the setup requires the following steps:
- clone the original repository to a local directory
- specify the path to the local directory in the `paths.toml` file
- install required model-specific dependencies
The `paths.toml` file contains paths to the local directories of wrapped models. It can be located at:
- `rmvd/models/wrappers/paths.toml` (prioritized)
- `~/rmvd_model_paths.toml` (useful when working on the `rmvd` framework and using `git`)
To use wrapped models, it is required to create such a `paths.toml` file at one of the above locations. The file should contain the paths to the local directories of the wrapped models. The format is as follows:
[monodepth2]
root = '/tmp/monodepth2'
...
A template `paths.toml` file is provided at `rmvd/models/wrappers/_paths.toml`. Further details can be found in the respective model descriptions below.
--
The following provides an overview of all available native models.
To train these models, use the commands from the `train_all.sh` script. The weights that are used when these models are created with `pretrained=True` are the weights obtained after training with the `train_all.sh` script.
To evaluate these models with the provided pretrained weights, use the commands from the `eval_all.sh` script.
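As an alternative to the `eval_all.sh` commands, a native model can also be evaluated programmatically via the Python interface described at the end of this `README`. A minimal sketch (the exact settings used for the published numbers are those in `eval_all.sh`):

import rmvd

model = rmvd.create_model("robust_mvd", pretrained=True, num_gpus=1)
eval = rmvd.create_evaluation(evaluation_type="mvd", out_dir="/tmp/eval_output", inputs=["intrinsics", "poses"])
dataset = rmvd.create_dataset("kitti", "mvd", input_size=(384, 1280))
results = eval(dataset=dataset, model=model)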
This is the Robust MVD Baseline Model presented in the publication "A Benchmark and a Baseline for Robust Multi-View Depth Estimation" by Schröppel et al.
This is the same Robust MVD Baseline Model, but trained for 5M iterations instead of the 600k iterations in the paper. The longer training slightly improves results.
The following provides an overview of all available wrapped models, including their respective setup instructions.
To evaluate these models, use the commands from the `eval_all.sh` script.
This is the "Monodepth2 MS-trained" model presented in the publication "Digging into Self-Supervised Monocular Depth Estimation" by Godard et al.
The model is wrapped around the original implementation from https://github.com/nianticlabs/monodepth2, where it is indicated as `mono+stereo_640x192`.
From the directory of this `README` file, execute the script `scripts/setup_monodepth2.sh` and specify the local directory to clone the original repository:
./scripts/setup_monodepth2.sh /path/to/monodepth2
Then specify the local directory `/path/to/monodepth2` in the `paths.toml` file (relative to the directory of this `README`).
It is not necessary to install additional dependencies.
The model is applied at a fixed input size of `width=640` and `height=192`. It therefore does not make sense to load data at a specific downsampled resolution. Thus, do not use the `input_size` parameters of the `Dataset` classes and of the `eval.py` and `inference.py` scripts when using this model.
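For example, when running this model programmatically, create the dataset without the `input_size` parameter (a minimal sketch, assuming the wrapped model has been set up as described above):

import rmvd

model = rmvd.create_model("monodepth2_mono_stereo_wrapped", pretrained=True, num_gpus=1)
dataset = rmvd.create_dataset("kitti", "mvd")  # note: no input_size; the wrapper resizes to its fixed 640x192 input size internally
sample = dataset[0]
pred, aux = model.run(**sample)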
In the original publication, the model is reported to have an Abs Rel of 0.080 on the KITTI Eigen split with the improved ground truth depths (see Table 7 in the paper). In the codebase of the original implementation, this result can be reproduced via:
python evaluate_depth.py --eval_mono --load_weights_folder path/to/monostereo_640x192/weights --eval_split eigen_benchmark
(Even though it seems like an error, the model is evaluated with the `--eval_mono` flag.)
Within `rmvd`, this result can be reproduced as follows:
python eval.py --output /tmp/eval_output --model monodepth2_mono_stereo_wrapped --dataset kitti.eigen_dense_depth_test.mvd --eval_type mvd --max_source_views 0 --clipping 1e-3 80 --alignment median
This command gives an Abs Rel of 8.11% (i.e. 0.0811).
This is the "Monodepth2 (1024x320) MS-trained" model presented in the publication "Digging into Self-Supervised Monocular Depth Estimation" by Godard et al.
The model is wrapped around the original implementation from https://github.com/nianticlabs/monodepth2, where it is indicated as `mono+stereo_1024x320`.
Same as for the `monodepth2_mono_stereo_wrapped` model.
The model is applied at a fixed input size of `width=1024` and `height=320`. It therefore does not make sense to load data at a specific downsampled resolution. Thus, do not use the `input_size` parameters of the `Dataset` classes and of the `eval.py` and `inference.py` scripts when using this model.
This is the "Monodepth2-Post MS-trained" model presented in the publication "On the Uncertainty of Self-supervised Monocular Depth Estimation" by Poggi et al.
The model is wrapped around the implementation from https://github.com/nianticlabs/monodepth2, where it is indicated as `mono+stereo_640x192`. It uses the "Post" uncertainty estimation (via image flipping) as described by Poggi et al.
Same as for the `monodepth2_mono_stereo_wrapped` model.
The model is applied at a fixed input size of `width=640` and `height=192`. It therefore does not make sense to load data at a specific downsampled resolution. Thus, do not use the `input_size` parameters of the `Dataset` classes and of the `eval.py` and `inference.py` scripts when using this model.
In the original publication from Poggi et al., the model is reported to have an Abs Rel of 0.082 and an AUSE of 0.036 on the KITTI Eigen split with the improved ground truth depths (see Table 3 in the paper).
Within `rmvd`, this result can be reproduced as follows:
python eval.py --output /tmp/eval_output --model monodepth2_postuncertainty_mono_stereo_wrapped --dataset kitti.eigen_dense_depth_test.mvd --eval_type mvd --max_source_views 0 --clipping 1e-3 80 --eval_uncertainty
This command gives an Abs Rel of 8.31% (i.e. 0.0831) and an unnormalized AUSE of 0.038.
This is an unofficial implementation of the MVSNet model presented in the publication "MVSNet: Depth Inference for Unstructured Multi-view Stereo" by Yao et al. The model is wrapped around the unofficial implementation from https://github.com/kwea123/MVSNet_pl.
From the directory of this `README` file, execute the script `scripts/setup_mvsnet_pl.sh` and specify the local directory to clone the original repository:
./scripts/setup_mvsnet_pl.sh /path/to/mvsnet_pl
Then specify the local directory `/path/to/mvsnet_pl` in the `paths.toml` file (relative to the directory of this `README`).
It is required to install additional dependencies. You might want to set up a new virtual environment for this:
pip install git+https://github.com/mapillary/[email protected]
pip install kornia
This is the "MiDaS" model presented in the publication "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer" by Ranftl et al.
The model is wrapped around the original implementation from https://github.com/isl-org/MiDaS, where it is indicated as `Big models: MiDaS v2.1`.
From the directory of this `README` file, execute the script `scripts/setup_midas.sh` and specify the local directory to clone the original repository:
./scripts/setup_midas.sh /path/to/midas
Then specify the local directory `/path/to/midas` in the `paths.toml` file (relative to the directory of this `README`).
It is not necessary to install additional dependencies.
This is the Vis-MVSNet model presented in the publication "Visibility-aware Multi-view Stereo Network" by Zhang et al. The model is wrapped around the original implementation from https://github.com/jzhangbs/Vis-MVSNet.git.
From the directory of this `README` file, execute the script `scripts/setup_vis_mvsnet.sh` and specify the local directory to clone the original repository:
./scripts/setup_vis_mvsnet.sh /path/to/vis_mvsnet
Then specify the local directory `/path/to/vis_mvsnet` in the `paths.toml` file (relative to the directory of this `README`).
It is not necessary to install additional dependencies.
This is the CVP-MVSNet model presented in the publication "Cost Volume Pyramid Based Depth Inference for Multi-View Stereo" by Yang et al. The model is wrapped around the original implementation from https://github.com/JiayuYANG/CVP-MVSNet.
From the directory of this `README` file, execute the script `scripts/setup_cvp_mvsnet.sh` and specify the local directory to clone the original repository:
./scripts/setup_cvp_mvsnet.sh /path/to/cvp_mvsnet
Then specify the local directory `/path/to/cvp_mvsnet` in the `paths.toml` file (relative to the directory of this `README`).
It is not necessary to install additional dependencies.
With the original implementation, the number of calculated depth hypotheses is sometimes too small. We apply a small patch (see the `cvp_mvsnet.patch` file) to fix this.
Further, the implementation does not support running the model with a single source view. It is therefore not possible to evaluate the model with the `quasi-optimal` view selection, but only with the `nearest` view selection strategy.
This is the PatchmatchNet model presented in the publication "PatchmatchNet: Learned Multi-View Patchmatch Stereo" by Wang et al. The model is wrapped around the original implementation from https://github.com/FangjinhuaWang/PatchmatchNet.
From the directory of this `README` file, execute the script `scripts/setup_patchmatchnet.sh` and specify the local directory to clone the original repository:
./scripts/setup_patchmatchnet.sh /path/to/patchmatchnet
Then specify the local directory `/path/to/patchmatchnet` in the `paths.toml` file (relative to the directory of this `README`).
It is not necessary to install additional dependencies.
This is the GMDepth depth estimation model that was trained on the DeMoN datasets (RGBD-SLAM, SUN3D, Scenes11), as presented in the publication "Unifying Flow, Stereo and Depth Estimation" by Xu et al. (Section 4.5.3 "Depth Estimation", paragraph "RGBD-SLAM, SUN3D, and Scenes11", evaluation in Table 17).
The model is wrapped around the original implementation from https://github.com/autonomousvision/unimatch, where it is indicated as `GMDepth-scale1-regrefine1-resumeflowthings-demon` (see https://github.com/autonomousvision/unimatch/blob/master/MODEL_ZOO.md).
From the directory of this `README` file, execute the script `scripts/setup_gmdepth.sh` and specify the local directory to clone the original repository:
./scripts/setup_gmdepth.sh /path/to/gmdepth
Then specify the local directory `/path/to/gmdepth` in the `paths.toml` file (relative to the directory of this `README`).
It is not necessary to install additional dependencies.
This is the GMDepth depth estimation model that was trained on the ScanNet dataset (using the BA-Net splits), as presented in the publication "Unifying Flow, Stereo and Depth Estimation" by Xu et al. (Section 4.5.3 "Depth Estimation", paragraph "ScanNet", evaluation in Table 16).
The model is wrapped around the original implementation from https://github.com/autonomousvision/unimatch, where it is indicated as `GMDepth-scale1-regrefine1-resumeflowthings-scannet` (see https://github.com/autonomousvision/unimatch/blob/master/MODEL_ZOO.md).
Same as for the `gmdepth_scale1_regrefine1_resumeflowthings_demon_wrapped` model.
The original implementation of `gmdepth` (see https://github.com/autonomousvision/unimatch/blob/master/MODEL_ZOO.md) provides several other models that are not mentioned in the publication. Those models are available as follows:
- `gmdepth_scale1_resumeflowthings_demon_wrapped` is a wrapper around the official `GMDepth-scale1-resumeflowthings-demon` model
- `gmdepth_scale1_demon_wrapped` is a wrapper around the official `GMDepth-scale1-demon` model
- `gmdepth_scale1_resumeflowthings_scannet_wrapped` is a wrapper around the official `GMDepth-scale1-resumeflowthings-scannet` model
- `gmdepth_scale1_scannet_wrapped` is a wrapper around the official `GMDepth-scale1-scannet` model
Same as for the `gmdepth_scale1_regrefine1_resumeflowthings_demon_wrapped` model.
All models can be used with the same interface. The following describes the usage of the models.
To initialize a model, use the `create_model` function:
from rmvd import create_model
model_name = "robust_mvd" # available models: see above (e.g. "monodepth2_mono_stereo_1024x320_wrapped", etc.)
model = create_model(model_name, pretrained=True, weights=None, train=False, num_gpus=1) # optional: model-specific parameters
If `pretrained` is set to `True`, the default pretrained weights for the model will be used.
Alternatively, custom weights can be loaded by providing the path to the weights with the `weights` parameter. The `weights` parameter overrides the `pretrained` parameter.
If `train` is set to `True`, the model is created in training mode.
If `num_gpus` is `>0`, the model will be executed on the GPU.
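For example, these parameters can be combined as follows (a minimal sketch; the checkpoint path is hypothetical):

from rmvd import create_model

# default pretrained weights, evaluation mode, executed on the GPU
model = create_model("robust_mvd", pretrained=True, train=False, num_gpus=1)

# custom weights (hypothetical path); the weights parameter overrides pretrained
model = create_model("robust_mvd", weights="/path/to/checkpoint.pt", train=False, num_gpus=1)

# training mode, executed on the CPU
model = create_model("robust_mvd", pretrained=False, train=True, num_gpus=0)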
The interface to do inference with the model is:
pred, aux = model.run(images=images, keyview_idx=keyview_idx, poses=poses, intrinsics=intrinsics,
depth_range=depth_range) # alternatively: run(**sample)
The inputs can be:
- numpy arrays with a prepended batch dimension (e.g. images are `N3HW` and of type `np.ndarray`)
- numpy arrays without a batch dimension (e.g. images are `3HW` and of type `np.ndarray`)
The formats of specific inputs are described in the data readme.
The `pred` output is a dictionary which contains:
- `depth`: predicted depth map for the reference view
- `depth_uncertainty`: predicted uncertainty for the predicted depth map (optional)
The output types and shapes correspond to the input types and shapes, i.e.:
- numpy arrays with a prepended batch dimension (e.g. `depth` has shape `N1HW` and type `np.ndarray`)
- numpy arrays without a batch dimension (e.g. `depth` has shape `1HW` and type `np.ndarray`)
The `aux` output is a dictionary which contains additional, model-specific outputs. These are only used for training or debugging and not further described here.
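Putting the pieces together, inference on a dataset sample could look as follows (a minimal sketch; the dataset and input size are taken from the custom-model example at the end of this `README`):

import rmvd

model = rmvd.create_model("robust_mvd", pretrained=True, num_gpus=1)
dataset = rmvd.create_dataset("eth3d", "mvd", input_size=(384, 576))

sample = dataset[0]             # inputs without a batch dimension
pred, aux = model.run(**sample)

depth = pred["depth"]           # np.ndarray of shape 1HW (no batch dimension, matching the inputs)
uncertainty = pred.get("depth_uncertainty")  # optional; None if the model does not predict uncertainty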
Most models cannot handle input images at arbitrary resolutions. Models therefore internally upsize the images to the next resolution that can be handled. The model output is often at a lower resolution than the input data.
Internally, all models have the following functions:
- an `input_adapter` function that converts input data into the model-specific format
- a `forward` function that runs a forward pass with the model (in non-pytorch models, this is the `__call__` function)
- an `output_adapter` function that converts predictions from the model-specific format to the `rmvd` format
The `run` function mentioned above for doing inference uses those three functions as follows:
def run(images, keyview_idx, poses=None, intrinsics=None, depth_range=None):
    no_batch_dim = (images[0].ndim == 3)
    if no_batch_dim:
        images, keyview_idx, poses, intrinsics, depth_range = \
            add_batch_dim(images, keyview_idx, poses, intrinsics, depth_range)

    sample = model.input_adapter(images=images, keyview_idx=keyview_idx, poses=poses,
                                 intrinsics=intrinsics, depth_range=depth_range)
    model_output = model(**sample)
    pred, aux = model.output_adapter(model_output)

    if no_batch_dim:
        pred, aux = remove_batch_dim(pred, aux)

    return pred, aux
In the following, we further describe the `input_adapter`, `forward`/`__call__`, and `output_adapter` functions.
The `input_adapter` function has the following interface:
def input_adapter(self, images, keyview_idx, poses=None, intrinsics=None, depth_range=None):
    # construct a sample dict that contains all inputs in the model-specific format: sample = {..}
    return sample
The inputs to the `input_adapter` function are all numpy arrays with a batch dimension (e.g. images are `N3HW` and of type `np.ndarray`). The function then converts all inputs to the format that is required by the model and returns this converted data as a dictionary where the keys are the parameter names of the model's `forward`/`__call__` function. This allows calling `model(**sample)`, where `sample` is the dictionary that is returned from the `input_adapter` function.
The conversion may for example include converting the inputs to `torch.Tensor`, moving them to the GPU if required, normalizing the images, etc.
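As an illustration, an `input_adapter` of a pytorch-based model could look roughly as follows. This is a sketch, not the implementation of any specific `rmvd` model; the device handling and the normalization by 255 are assumptions:

import numpy as np
import torch

def input_adapter(self, images, keyview_idx, poses=None, intrinsics=None, depth_range=None):
    device = "cuda" if torch.cuda.is_available() else "cpu"  # assumption: where the model runs

    # convert the N3HW numpy images to normalized float torch tensors on the target device
    images = [torch.from_numpy(image).float().to(device) / 255.0 for image in images]
    keyview_idx = torch.from_numpy(np.asarray(keyview_idx)).to(device)

    # the keys must match the parameter names of the model's forward/__call__ function;
    # further inputs (poses, intrinsics, depth_range) would be converted analogously
    sample = {"images": images, "keyview_idx": keyview_idx}
    return sample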
The `forward` function of each model expects data in the model-specific format and returns model-specific outputs. Hence, if all input data is already in the format required by the model, you can also call `model(**sample)` directly. This is used in the `rmvd` training code. Note that the `forward` function also expects input data to have a resolution that is supported by the model.
The `output_adapter` function has the following interface:
def output_adapter(self, model_output):
    # construct pred and aux dicts from model_output
    # pred needs to have an item with key "depth" and value of type np.ndarray and shape N1HW
    return pred, aux
The output adapter converts model-specific outputs to the `pred` and `aux` dictionaries. The output types and shapes need to be numpy arrays with a batch dimension (i.e. `depth` has shape `N1HW` and type `np.ndarray`).
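As an illustration, a matching `output_adapter` of a pytorch-based model could look roughly as follows. This is a sketch under the assumption that the model returns a dict with a `depth` tensor of shape N1HW:

def output_adapter(self, model_output):
    # convert the predicted depth from a torch.Tensor to an np.ndarray of shape N1HW
    depth = model_output["depth"].detach().cpu().numpy()
    pred = {"depth": depth}  # required key

    # pass the remaining model-specific outputs through as aux (only used for training/debugging)
    aux = {k: v for k, v in model_output.items() if k != "depth"}
    return pred, aux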
If you want to use your own model within the framework, e.g. for evaluation, your model needs to have the `input_adapter`, `forward`/`__call__`, and `output_adapter` functions as described above.
Note: you don't have to add a `run` function to your model. This function will be added automatically by calling `rmvd.prepare_custom_model(model)`.
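A minimal skeleton of such a custom model could look as follows (a sketch; the constant depth prediction is a placeholder for the actual network):

import numpy as np

class CustomModel:
    def input_adapter(self, images, keyview_idx, poses=None, intrinsics=None, depth_range=None):
        # convert the numpy inputs (with batch dimension) into the format expected by __call__
        return {"images": images, "keyview_idx": keyview_idx}

    def __call__(self, images, keyview_idx):
        # placeholder: run the actual network here; this sketch just predicts a constant depth map
        N, _, H, W = images[0].shape
        return {"depth": np.ones((N, 1, H, W), dtype=np.float32)}

    def output_adapter(self, model_output):
        # pred must contain "depth" as an np.ndarray of shape N1HW; aux carries any additional outputs
        pred = {"depth": model_output["depth"]}
        aux = {}
        return pred, aux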
You can then use your custom model within the `rmvd` framework, for example to run inference, e.g.:
import rmvd
model = CustomModel()
model = rmvd.prepare_custom_model(model)
dataset = rmvd.create_dataset("eth3d", "mvd", input_size=(384, 576))
sample = dataset[0]
pred, aux = model.run(**sample)
or to run evaluation, e.g.:
import rmvd
model = CustomModel()
model = rmvd.prepare_custom_model(model)
eval = rmvd.create_evaluation(evaluation_type="mvd", out_dir="/tmp/eval_output", inputs=["intrinsics", "poses"])
dataset = rmvd.create_dataset("kitti", "mvd", input_size=(384, 1280))
results = eval(dataset=dataset, model=model)