Optional Torch Multiprocessing in nnUNet for Improved Security and Compatibility #2614

LennyN95 · 2024-11-22T12:08:19Z

This PR resolves #2556.

Implementation of preprocessing based on a single process
Introduce new environment variables to set -npp and -nps

We did some internal testing and got identical results between the latest nnunetv2==2.5.1 and our proposed patch.

New environment variables:
- nnUNet_npp
- nnUNet_nps

Default values remain unchanged; the CLI parameters -npp and -nps override the environment variables if set.
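The precedence described above (CLI value first, then environment variable, then the unchanged default) could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `resolve_num_processes`, the `environ` parameter, and the default of 3 are assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# Assumed default for illustration only; check the CLI help for the real value.
DEFAULT_NUM_PROCESSES = 3


def resolve_num_processes(
    cli_value: Optional[int],
    env_name: str,
    default: int = DEFAULT_NUM_PROCESSES,
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick the process count: CLI value, else environment variable, else default."""
    environ = os.environ if environ is None else environ

    # Compare against None rather than truthiness: 0 is a meaningful value here.
    if cli_value is not None:
        return cli_value

    raw = environ.get(env_name)
    if raw is not None and raw.strip() != "":
        return int(raw)

    return default
```

With this shape, `resolve_num_processes(None, "nnUNet_npp")` falls back to the environment and then to the default, while an explicit `-npp 0` on the command line still wins over whatever the environment says.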

LennyN95 · 2024-11-22T12:08:26Z

@FabianIsensee I have noticed some (significant) differences between nnunetv2==2.0 and nnunetv2==2.5.1. This is beyond the scope of this topic / PR, but since some of our contributors have used the older version, we'd like to see if and to what extent we can offer them a solution as well. In general, I am curious whether you have any idea what changes might have led to these differences (we are talking about a Dice score of 0.9451 between the outputs of these versions for a lung nodule segmentation task).

FabianIsensee · 2025-01-15T09:31:33Z

Hey, thanks for the PR and sorry for being slow. I have too much on my plate.
One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list, and then yield from the existing list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This doesn't become quite clear from your message.

LennyN95 · 2025-01-15T10:02:01Z

Hey @FabianIsensee thanks for the reply and no worries!

> One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list, and then yield from the existing list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Good point! @surajpaib Looks like we can combine it to output each item right after preprocessing.
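To make the difference concrete, here is a minimal sketch of the two patterns being discussed. The function names and the `preprocess` callable are placeholders for this example, not the actual functions in the PR.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Case = TypeVar("Case")
Item = TypeVar("Item")


def preprocess_eager(
    cases: Iterable[Case], preprocess: Callable[[Case], Item]
) -> Iterator[Item]:
    """Preprocess everything first, then yield: all items sit in memory at once."""
    results = [preprocess(case) for case in cases]
    yield from results


def preprocess_lazy(
    cases: Iterable[Case], preprocess: Callable[[Case], Item]
) -> Iterator[Item]:
    """Yield each item as soon as it is ready: only one item is held at a time."""
    for case in cases:
        yield preprocess(case)
```

Both generators produce the same items in the same order. The lazy variant only preprocesses the next case when the consumer asks for it, so peak memory stays at roughly one preprocessed case instead of the whole dataset.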

> Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This doesn't become quite clear from your message.

I ran some tests (with this submission model). All MHub models come with test and reference data, so I manually updated the version from nnunetv2==2.0 to nnunetv2==2.5.1 and compared the generated segmentation with the reference. Normally, we would expect a Dice score of ~1 (with slight variations due to rounding errors caused by different graphics card architectures). However, in this case I got a Dice score of 0.9451, so the generated masks differ when using the latest version. I was wondering if you could link this to a specific change.
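For reference, the Dice score used in this comparison is 2|A∩B| / (|A| + |B|) for the voxels of one label in the generated mask A and the reference mask B. A minimal pure-Python sketch follows; it assumes both masks have already been loaded and flattened into equally long sequences of integer labels, and the function name `dice_score` is made up for this example (it is not the MHub test code).

```python
from typing import Sequence


def dice_score(pred: Sequence[int], ref: Sequence[int], label: int = 1) -> float:
    """Dice overlap for one label between two flattened, equally sized masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")

    pred_count = sum(1 for v in pred if v == label)
    ref_count = sum(1 for v in ref if v == label)
    overlap = sum(1 for p, r in zip(pred, ref) if p == label and r == label)

    # Convention chosen for this sketch: two empty masks count as a perfect match.
    if pred_count + ref_count == 0:
        return 1.0

    return 2.0 * overlap / (pred_count + ref_count)
```

Identical masks give 1.0, so a value of 0.9451 means the two versions disagree on a noticeable share of the labelled voxels.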

FabianIsensee · 2025-01-15T15:16:26Z

Is there a way I can reproduce this locally to investigate? For example, can you share both checkpoints plus the reference data that gives Dice=1 in one case and 0.94 in the other? That way I can track down where things diverge.

LennyN95 · 2025-01-23T18:03:18Z

Thank you @FabianIsensee for looking into this.

You can use the BAMF NNUnet Lung and Nodules V2 (MHubAI/models#92) model for testing.

The weights are available for download here.
The sample input data and reference output can be downloaded here.

You can also build and run the model via MHub in a self-contained environment by following the steps below:

# path to the local checkout of the patched nnunet (no leading $ when assigning)
LOCAL_NNUNET_PATCH_DIR=/absolute/path/to/local/nnunet/patch

# build the model container
docker build \
    -t mhubai-nnunet-test/bamf_nnunet_ct_lungnodules:latest \
    --build-arg MHUB_MODELS_REPO=https://github.com/bamf-health/mhub-models.git::bamf_nnunet_ct_lung_v2 \
    https://github.com/bamf-health/mhub-models.git#bamf_nnunet_ct_lung_v2:models/bamf_nnunet_ct_lungnodules/dockerfiles

# run the model container (the image tag must match the one used in the build step)
docker run --rm -it --entrypoint bash --gpus all -v "$LOCAL_NNUNET_PATCH_DIR":/nnunet-src mhubai-nnunet-test/bamf_nnunet_ct_lungnodules:latest

# install nnunet in the container
uv pip install -e /nnunet-src

# update NNUnetRunnerV2 Module
sed -i 's/bash_command += \["-c", self.nnunet_config\]/bash_command += \["-c", self.nnunet_config, "-npp", "0", "-nps", "0"\]/' /app/models/bamf_nnunet_ct_lungnodules/utils/NNUnetRunnerV2.py

# run mhub test
mhub.test srmteyvx

Let me know if you need anything else or if I can assist you in any way!

FabianIsensee · 2025-02-03T10:35:34Z

Hey, thanks for sharing. I will try to find time this week to look into this. Since I will be running this locally (no docker): All I need to do is run the prediction on the provided reference sample with both versions and compare?

LennyN95 · 2025-02-03T13:15:48Z

Hi @FabianIsensee, thank you!

> All I need to do is run the prediction on the provided reference sample with both versions and compare?

Correct!

I'm curious what you will find. Let me know if there is anything I can help with!

FabianIsensee · 2025-02-04T09:05:00Z

Hey, so I looked into this. Yes, the segmentations generated by the two versions differ. Here are the predictions I generated:
testimage_v20.nii.gz
testimagev252.nii.gz
The difference is to be expected because the inference pipeline was rebuilt in the meantime and comes with a few improvements. These are mostly quality of life, but some also affect the predictions.
When we made those changes, we extensively evaluated that they would not result in a measurable performance difference. Specifically, we reran the validations of our models and confirmed that the Dice scores were comparable to the ones generated with the old setup. So the new results are different, but equivalently good. Have you tried running the validations of the 5-fold cross-validation with v2.0 and v2.5.2 and compared the Dice scores? If you observe a substantial difference in this setup, that would be very interesting and require more investigation on my end.
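One way to carry out the suggested comparison, once the mean validation Dice per fold has been collected for both versions, is sketched below. The helper name `folds_with_dice_gap`, the dict-per-version input format, and the tolerance value are all assumptions for illustration; how the per-fold scores are read from the validation output is left out.

```python
from typing import Dict, List, Tuple


def folds_with_dice_gap(
    dice_old: Dict[int, float],
    dice_new: Dict[int, float],
    tolerance: float = 0.005,  # hypothetical threshold, pick one that fits the task
) -> List[Tuple[int, float]]:
    """Return (fold, new - old) for folds whose mean Dice differs by more than tolerance."""
    gaps = []
    # Only folds evaluated with both versions can be compared.
    for fold in sorted(set(dice_old) & set(dice_new)):
        diff = dice_new[fold] - dice_old[fold]
        if abs(diff) > tolerance:
            gaps.append((fold, diff))
    return gaps
```

An empty result would match the expectation stated above (different but equivalently good predictions), while any reported fold would be a candidate for closer investigation.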

surajpaib and others added 6 commits November 21, 2024 08:15

Non-mp predict support [WIP]

69fd9a7

add torchification

a595f64

update preprocess_fromfiles_noqueue

6cb4e7a

dev-only: remove enforced num_processes = 1

b570e8a

add env to overwrite npp and nps

3628055

New environment variables:
- nnUNet_npp
- nnUNet_nps

Default values remain unchanged; the CLI parameters -npp and -nps override the environment variables if set.

Fix preprocessor initialization order

6bcec21

FabianIsensee self-assigned this Nov 22, 2024

LennyN95 mentioned this pull request Nov 29, 2024

[PW41] Add MRSegmentator Model MHubAI/models#90

Open
