Update geo-inference parameters #579
Changes from all commits: a14fe9e, 74b8059, 28db82c, 6afd245, 6efd4f1, 7e9d082, 94f2dea, 0fc1a0e, 75063d3, 1e429fa, e8d411c, ccba099, 23a8589, e8c9e0c, dddeafb, 708c78d
**`.dockerignore`** (new file)

```diff
@@ -0,0 +1,8 @@
+tests
+.github
+.git
+.pytest_cache
+.vscode
+__pycache__
+*.md
+docs/*
```
**`Dockerfile`**

```diff
@@ -1,46 +1,38 @@
-FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04
+FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04

 ARG CONDA_PYTHON_VERSION=3
 ARG CONDA_DIR=/opt/conda
 ARG USERNAME=gdl_user
 ARG USERID=1000
-ARG GIT_TAG=develop

+ENV PATH=$CONDA_DIR/bin:$PATH
 # RNCAN certificate; uncomment (with right .cer name) if you are building behind a FW
-#COPY NRCan-RootCA.cer /usr/local/share/ca-certificates/cert.crt
-#RUN chmod 644 /usr/local/share/ca-certificates/cert.crt && update-ca-certificates
+# COPY NRCan-RootCA.cer /usr/local/share/ca-certificates/cert.crt
+# RUN chmod 644 /usr/local/share/ca-certificates/cert.crt && update-ca-certificates

 RUN apt-get update \
     && apt-get install -y --no-install-recommends git wget unzip bzip2 build-essential sudo \
-    && apt-key del 7fa2af80 \
-    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb \
-    && sudo dpkg -i cuda-keyring_1.0-1_all.deb \
-    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004-keyring.gpg \
-    && sudo mv cuda-ubuntu2004-keyring.gpg /usr/share/keyrings/cuda-archive-keyring.gpg \
-    && rm -f cuda-keyring_1.0-1_all.deb && rm -f /etc/apt/sources.list.d/cuda.list
-
-# Install Mamba directly
-ENV PATH $CONDA_DIR/bin:$PATH
-RUN wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh -O /tmp/mamba.sh && \
-    /bin/bash /tmp/mamba.sh -b -p $CONDA_DIR && \
-    rm -rf /tmp/* && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
-
-ENV LD_LIBRARY_PATH $CONDA_DIR/lib:$LD_LIBRARY_PATH
-
-# Create the user
-RUN useradd --create-home -s /bin/bash --no-user-group -u $USERID $USERNAME && \
-    chown $USERNAME $CONDA_DIR -R && \
-    adduser $USERNAME sudo && \
-    echo "$USERNAME ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
+    && wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh -O /tmp/mamba.sh \
+    && /bin/bash /tmp/mamba.sh -b -p $CONDA_DIR \
+    && rm -rf /tmp/* \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/* \
+    && useradd --create-home -s /bin/bash --no-user-group -u $USERID $USERNAME \
+    && chown $USERNAME $CONDA_DIR -R \
+    && adduser $USERNAME sudo \
+    && echo "$USERNAME ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

+ENV LD_LIBRARY_PATH=$CONDA_DIR/lib:$LD_LIBRARY_PATH
 USER $USERNAME
-WORKDIR /home/$USERNAME/
-
-RUN cd /home/$USERNAME && git clone --depth 1 "https://github.com/NRCan/geo-deep-learning.git" --branch $GIT_TAG
-RUN conda config --set ssl_verify no
-RUN mamba env create -f /home/$USERNAME/geo-deep-learning/environment.yml
-
-ENV PATH $CONDA_DIR/envs/geo_deep_env/bin:$PATH
-RUN echo "source activate geo_deep_env" > ~/.bashrc
+WORKDIR /usr/app
+
+COPY environment.yml /usr/app
+RUN cd /home/$USERNAME && \
+    conda config --set ssl_verify no && \
+    mamba env create -f /usr/app/environment.yml && \
+    mamba clean --all \
+    && pip uninstall -y pip
+
+COPY . /usr/app/geo-deep-learning
+ENV PATH=$CONDA_DIR/envs/geo_ml_env/bin:$PATH
+RUN echo "source activate geo_ml_env" > ~/.bashrc
```
**`environment.yml`**

```diff
@@ -1,16 +1,27 @@
-name: geo_deep_env
+name: geo_ml_env
 channels:
   - pytorch
+  - nvidia
   - conda-forge
 dependencies:
-  - python==3.11.5
   - coverage>=6.3.1
+  - geopandas>=0.14.4
   - hydra-core>=1.2.0
   - pip
+  - gdal
-  - pystac>=0.3.0
   - pynvml>=11.0
+  - pystac>=1.10.1
   - pytest>=7.1
+  - python>=3.11
+  - pytorch>=2.3
+  - pytorch-cuda>=12.1
   - rich>=11.1
   - ruamel_yaml>=0.15
   - scikit-image>=0.18
+  - torchgeo>=0.5.2
+  - torchvision>=0.13
   - pip:
-    - geo-inference>=2.0.7
+    - git+https://github.com/NRCan/geo-inference.git
     - hydra-colorlog>=1.1.0
     - hydra-optuna-sweeper>=1.1.0
     - ttach>=0.0.3
     - mlflow>=1.2 # causes env solving to hang if not with pip
```

> **Review comment (on `pytorch-cuda>=12.1`):** Why do we need pytorch and pytorch-cuda at the same time? pytorch-cuda can work on CPU if needed. Won't having two different versions cause conflicts?
>
> **Reply:** It is the recommended way to install PyTorch in a conda env: https://pytorch.org/
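As the reply notes, the `pytorch`/`pytorch-cuda` pairing is the conda install pattern documented on pytorch.org: `pytorch-cuda` is a meta-package that pins the CUDA runtime the `pytorch` build links against, not a second copy of PyTorch. A minimal sanity check for the resulting `geo_ml_env` environment (illustrative, not part of the PR):

```python
import torch

# Confirm the conda-installed build matches what environment.yml requested
# and that the CUDA 12.x runtime pulled in by pytorch-cuda is visible.
print(torch.__version__)          # expect >= 2.3
print(torch.version.cuda)         # CUDA version the build was compiled against
print(torch.cuda.is_available())  # False on CPU-only hosts; the env still works
```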
**`inference_segmentation.py`**

```diff
@@ -1,15 +1,16 @@
 import csv
-from math import sqrt
 import rasterio

 from tqdm import tqdm
+from shutil import move
 from pathlib import Path
 from numbers import Number
 from tempfile import mkstemp
 from omegaconf import DictConfig
 from typing import Dict, Sequence, Union
-from dataset.stacitem import SingleBandItemEO

 from utils.aoiutils import aois_from_csv
+from dataset.stacitem import SingleBandItemEO
 from utils.logger import get_logger, set_tracker
 from geo_inference.geo_inference import GeoInference
 from utils.utils import get_device_ids, get_key_def, set_device
```

> **Review comment (on the import block):** it should look like this: […] but I think that PEP8 refactoring can be done later in a separate PR.
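The reviewer's suggested layout was not preserved in the page extraction. A plausible reading of the suggestion, following PEP 8's grouping (standard library, then third-party, then project-local imports, one blank line between groups):

```python
# Hypothetical PEP 8 ordering (the reviewer's actual suggestion was lost):
# stdlib, then third-party, then project-local imports.
import csv
from numbers import Number
from pathlib import Path
from shutil import move
from tempfile import mkstemp
from typing import Dict, Sequence, Union

import rasterio
from geo_inference.geo_inference import GeoInference
from omegaconf import DictConfig
from tqdm import tqdm

from dataset.stacitem import SingleBandItemEO
from utils.aoiutils import aois_from_csv
from utils.logger import get_logger, set_tracker
from utils.utils import get_device_ids, get_key_def, set_device
```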
```diff
@@ -24,24 +25,6 @@ def stac_input_to_temp_csv(input_stac_item: Union[str, Path]) -> Path:
     csv.writer(fh).writerow([str(input_stac_item), None, "inference", Path(input_stac_item).stem])
     return Path(stac_temp_csv)

-def calc_inference_chunk_size(gpu_devices_dict: dict, max_pix_per_mb_gpu: int = 200, default: int = 512) -> int:
-    """
-    Calculate maximum chunk_size that could fit on GPU during inference based on thumb rule with hardcoded
-    "pixels per MB of GPU RAM" as threshold. Threshold based on inference with a large model (Deeplabv3_resnet101)
-    :param gpu_devices_dict: dictionary containing info on GPU devices as returned by lst_device_ids (utils.py)
-    :param max_pix_per_mb_gpu: Maximum number of pixels that can fit on each MB of GPU (better to underestimate)
-    :return: returns a downgraded evaluation batch size if the original batch size is considered too high
-    """
-    if not gpu_devices_dict:
-        return default
-    # get max ram for smallest gpu
-    smallest_gpu_ram = min(gpu_info['max_ram'] for _, gpu_info in gpu_devices_dict.items())
-    # rule of thumb to determine max chunk size based on approximate max pixels a gpu can handle during inference
-    max_chunk_size = sqrt(max_pix_per_mb_gpu * smallest_gpu_ram)
-    max_chunk_size_rd = int(max_chunk_size - (max_chunk_size % 256))  # round to the closest multiple of 256
-    logging.info(f'Data will be split into chunks of {max_chunk_size_rd} if chunk_size is not specified.')
-    return max_chunk_size_rd
-

 def main(params:Union[DictConfig, Dict]):
```
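For reference, the deleted helper sized chunks by a rule of thumb: pixels-per-MB of GPU RAM times the smallest GPU's RAM gives a pixel budget, and the chunk side length is its square root, rounded down to a multiple of 256. A standalone sketch of that arithmetic (the 16 GB figure is illustrative):

```python
from math import sqrt

def rule_of_thumb_chunk_size(smallest_gpu_ram_mb: int,
                             max_pix_per_mb_gpu: int = 200) -> int:
    # Pixel budget: pixels per MB of GPU RAM times available RAM (MB).
    budget = max_pix_per_mb_gpu * smallest_gpu_ram_mb
    # Square chunk side length, rounded down to a multiple of 256.
    side = sqrt(budget)
    return int(side - (side % 256))

print(rule_of_thumb_chunk_size(16_000))  # 16 GB card -> 1536
```

The PR drops this heuristic in favour of an explicit `patch_size` parameter (default 1024, set below), which behaves the same across heterogeneous GPUs.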
```diff
@@ -51,9 +34,10 @@ def main(params:Union[DictConfig, Dict]):
                              params['inference'],
                              to_path=True,
                              validate_path_exists=True,
-                             wildcard='*.pt')
-    mask_to_vector = get_key_def('mask_to_vector', params['inference'], default=False, expected_type=bool)
+                             wildcard='*pt')

+    prep_data_only = get_key_def('prep_data_only', params['inference'], default=False, expected_type=bool)

     # Set the device
     num_devices = get_key_def('gpu', params['inference'], default=0, expected_type=(int, bool))
     if num_devices > 1:
```
```diff
@@ -64,25 +48,27 @@ def main(params:Union[DictConfig, Dict]):
         raise ValueError(f'\nMax used ram parameter should be a percentage. Got {max_used_ram}.')
     max_used_perc = get_key_def('max_used_perc', params['inference'], default=25, expected_type=int)
     gpu_devices_dict = get_device_ids(num_devices, max_used_ram_perc=max_used_ram, max_used_perc=max_used_perc)
-    max_pix_per_mb_gpu = get_key_def('max_pix_per_mb_gpu', params['inference'], default=25, expected_type=int)
-    auto_chunk_size = calc_inference_chunk_size(gpu_devices_dict=gpu_devices_dict,
-                                                max_pix_per_mb_gpu=max_pix_per_mb_gpu, default=512)
-    chunk_size = get_key_def('chunk_size', params['inference'], default=auto_chunk_size, expected_type=int)
     batch_size = get_key_def('batch_size', params['inference'], default=8, expected_type=int)
+    patch_size = get_key_def('patch_size', params['inference'], default=1024, expected_type=int)
+    workers = get_key_def('workers', params['inference'], default=0, expected_type=int)
+    prediction_threshold = get_key_def('prediction_threshold', params['inference'], default=0.3, expected_type=float)
     device = set_device(gpu_devices_dict=gpu_devices_dict)

     # Dataset params
     bands_requested = get_key_def('bands', params['dataset'], default=[1, 2, 3], expected_type=Sequence)
     classes_dict = get_key_def('classes_dict', params['dataset'], expected_type=DictConfig)
     download_data = get_key_def('download_data', params['inference'], default=False, expected_type=bool)
     data_dir = get_key_def('raw_data_dir', params['dataset'], default="data", to_path=True, validate_path_exists=True)
     clahe_clip_limit = get_key_def('clahe_clip_limit', params['tiling'], expected_type=Number, default=0)
     raw_data_csv = get_key_def('raw_data_csv', params['inference'], expected_type=str, to_path=True,
                                validate_path_exists=True)
     input_stac_item = get_key_def('input_stac_item', params['inference'], expected_type=str, to_path=True,
                                   validate_path_exists=True)
+    num_classes = get_key_def('num_classes', params['inference'], expected_type=int, default=5)
+    vectorize = get_key_def('ras2vec', params['inference'], expected_type=bool, default=False)
+    transform_flip = get_key_def('flip', params['inference'], expected_type=bool, default=False)
+    transform_rotate = get_key_def('rotate', params['inference'], expected_type=bool, default=False)
+    transforms = True if transform_flip or transform_rotate else False

     if raw_data_csv and input_stac_item:
         raise ValueError(f"Input imagery should be either a csv of stac item. Got inputs from both \"raw_data_csv\" "
```
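`get_key_def` (from `utils.utils`) is the repo's config-lookup helper; its implementation isn't shown in this diff. A simplified, hypothetical re-implementation, inferred purely from the call sites above, illustrates the pattern:

```python
from pathlib import Path

# Hypothetical sketch of the get_key_def pattern, inferred from its call sites:
# look up a key in a config section, fall back to a default, optionally
# type-check, and optionally convert to a validated Path.
def get_key_def(key, config, default=None, expected_type=None,
                to_path=False, validate_path_exists=False, wildcard=None):
    val = config.get(key, default)
    if val is None:
        return None
    if expected_type is not None and not isinstance(val, expected_type):
        raise TypeError(f"'{key}' should be {expected_type}, got {type(val)}")
    if to_path:
        val = Path(val)
        if wildcard and val.is_dir():
            # e.g. wildcard='*pt' picks a checkpoint file inside a directory
            val = next(val.glob(wildcard))
        if validate_path_exists and not val.exists():
            raise FileNotFoundError(f"'{key}' path does not exist: {val}")
    return val

print(get_key_def('batch_size', {'batch_size': 8}, default=4, expected_type=int))  # 8
```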
```diff
@@ -109,22 +95,41 @@ def main(params:Union[DictConfig, Dict]):
                              data_dir=data_dir,
                              equalize_clahe_clip_limit=clahe_clip_limit,
                              )

+    if prep_data_only:
+        logging.info(f"[prep_data_only mode] Data preparation for inference is complete. Exiting...")
+        exit()

     # Create the inference object
     device_str = "gpu" if device.type == 'cuda' else "cpu"
     gpu_index = device.index if device.type == 'cuda' else 0

     geo_inference = GeoInference(model=str(model_path),
                                  work_dir=str(working_folder),
                                  batch_size=batch_size,
-                                 mask_to_vec=mask_to_vector,
+                                 mask_to_vec=vectorize,
                                  device=device_str,
                                  gpu_id=gpu_index,
+                                 num_classes=num_classes,
+                                 prediction_threshold=prediction_threshold,
+                                 transformers=transforms,
+                                 transformer_flip=transform_flip,
+                                 transformer_rotate=transform_rotate,
                                  )

     # LOOP THROUGH LIST OF INPUT IMAGES
     for aoi in tqdm(list_aois, desc='Inferring from images', position=0, leave=True):
         logging.info(f'\nReading image: {aoi.aoi_id}')
-        raster = aoi.raster
-        geo_inference(raster, tiff_name=aoi.aoi_id, patch_size=chunk_size)
+        input_path = str(aoi.raster.name)
+        mask_name = geo_inference(input_path, patch_size=patch_size, workers=workers)
+        mask_path = working_folder / mask_name
+
+        # update metadata info and rename mask tif.
+        if classes_dict is not None:
+            meta_data_dict = {"checkpoint": str(model_path),
+                              "classes_dict": classes_dict}
+            with rasterio.open(mask_path, 'r+') as raster:
+                raster.update_tags(**meta_data_dict)
+        output_path = get_key_def('output_path', params['inference'], expected_type=str, to_path=True,
+                                  default=mask_path)
+        move(mask_path, output_path)
+        logging.info(f"finished inferring image: {aoi.aoi_id} ")
```
> **Review comment (on the Dockerfile):** don't we need these cuda keyring steps anymore?
>
> **Reply:** Nope. Tested with a local Docker image, the CI test image, and on HPC. All works fine without these.