dlib: use cuda and blas properly #279927

Merged
merged 1 commit into from
Jan 23, 2024
Merged

Conversation

RobbieBuxton
Contributor

@RobbieBuxton RobbieBuxton commented Jan 9, 2024

Changed dlib to actually use CUDA properly, switched back to blas and added lapack to make it easier to override, and removed fftw because dlib hasn't supported it since 2015.

Fixes this Issue / closes #279924

Not building with CUDA

When you set cudaSupport = true, the CMake flag is set correctly, but the resulting package isn't actually built with CUDA: CMake can't find CUDA, so it unsets the flag. This is due to the incorrect use of pkgs.stdenv.mkDerivation instead of pkgs.cudaPackages.backendStdenv.mkDerivation when building the CUDA-related packages. The correct usage can be seen in the OpenCV package. The package also tries to build with CUDA by default, which is wrong.
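A minimal sketch of the stdenv selection described above, assuming a callPackage-style expression. The `effectiveStdenv` name is illustrative and most attributes (src, inputs, other flags) are omitted, so this is not the merged dlib/default.nix. Defaulting `cudaSupport` to the global `config.cudaSupport` also avoids building with CUDA by default.

```nix
# Sketch only: src, build inputs and most CMake flags are omitted.
{ config, stdenv, cudaPackages, cudaSupport ? config.cudaSupport }:

let
  # backendStdenv provides a host compiler that nvcc supports, so CMake can
  # actually detect CUDA; the plain stdenv is fine for CPU-only builds.
  effectiveStdenv = if cudaSupport then cudaPackages.backendStdenv else stdenv;
in
effectiveStdenv.mkDerivation {
  pname = "dlib";
  version = "19.24.2";

  # DLIB_USE_CUDA is dlib's CMake option; follow cudaSupport instead of
  # turning it on unconditionally.
  cmakeFlags = [
    "-DDLIB_USE_CUDA=${if cudaSupport then "ON" else "OFF"}"
  ];
}
```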

dlib takes fftw as a build input

Dlib hasn't supported fftw since 2015; however, there are leftover references to it in its CMake files, which might lead people to think it still does. fftw should be removed.

Taking in openblas rather than blas and adding lapack

As mentioned in this PR, it makes more sense for dlib to take blas as an input rather than openblas, so that people can more easily override it with other BLAS implementations (e.g. MKL) if they want to. The package was also missing lapack, which should be added.
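With dlib taking the generic `blas` and `lapack` inputs, the implementation can be swapped globally. A sketch following the BLAS/LAPACK override pattern from the nixpkgs manual, with MKL as the example provider (MKL is unfree, so `allowUnfree` has to be set):

```nix
# Overlay: every package that depends on blas/lapack (dlib included)
# is rebuilt against MKL.
final: prev: {
  blas = prev.blas.override { blasProvider = final.mkl; };
  lapack = prev.lapack.override { lapackProvider = final.mkl; };
}
```

For a single package, `dlib.override { blas = ...; lapack = ...; }` should work the same way, assuming the expression's arguments are named `blas` and `lapack`.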

Description of changes

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 run on x86_64-linux

15 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python310Packages.dlib
  • python310Packages.dlib.dist
  • python310Packages.face-recognition
  • python310Packages.face-recognition.dist
  • python310Packages.openturns
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns

@hacker1024
Member

hacker1024 commented Jan 15, 2024

cudatoolkit should not be used anymore, as we now have split derivations that are more granular. The given CUDA capabilities are also not being taken into account here, and CUDA is still broken for Python as well.

I have fixed these things in #273665, but I haven't had the time to act on the feedback.

@RobbieBuxton
Contributor Author

Ah, thanks for the feedback; I don't know how I completely missed that you had basically already put in this PR! I'll have a look into adding the CUDA granularity fixes to dlib/default.nix tonight. I've only been testing this against the C++ dlib library, so unfortunately I can't comment much on the Python side.

@RobbieBuxton
Contributor Author

So, I've been looking into this a bit more and trying to split cudatoolkit into a minimal set of required libs, but I've been running into some issues (I might just be being dumb, though). One of the dependencies dlib needs is libcudart, which is used by, for example, dlib::cuda::get_num_devices() (helpful for debugging whether this PR works). However, I can't find it packaged individually anywhere or provided by anything other than cudatoolkit. I'd appreciate being pointed in the right direction if you know anything, @hacker1024! I thought I might ping you as well, @SomeoneSerge, if that's alright, because you reviewed @hacker1024's similar PR. Cheers!
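For reference, in the split CUDA package set libcudart ships in `cudaPackages.cuda_cudart` (the store paths quoted later in this thread show it next to the other pieces). A sketch of granular inputs in place of cudatoolkit, limited to the packages that appear in those paths; exactly which ones dlib needs is an assumption:

```nix
# Sketch: granular CUDA inputs instead of the monolithic cudatoolkit.
{ lib, cudaPackages, cudaSupport ? false }:

{
  # nvcc is a build-time tool, so it goes in nativeBuildInputs.
  nativeBuildInputs = lib.optionals cudaSupport [ cudaPackages.cuda_nvcc ];

  buildInputs = lib.optionals cudaSupport (with cudaPackages; [
    cuda_cudart # libcudart, used e.g. by dlib::cuda::get_num_devices()
    libcublas
    libcurand
    libcusolver
    cudnn
  ]);
}
```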

@SomeoneSerge SomeoneSerge changed the title changed dlib to actually use cuda properly dlib: use cuda and blas properly Jan 16, 2024
@RobbieBuxton
Contributor Author

RobbieBuxton commented Jan 16, 2024

I was banging my head against a wall trying to get it to work without cudatoolkit, then realized your suggestions on #273665, @SomeoneSerge, already did it, so I've just implemented those here. I'm running nixpkgs-review again locally and will post the results when done; in the meantime, it works in my test repo.

@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 --extra-nixpkgs-config '{ cudaSupport = true; }' run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
6 packages failed to build:
  • openturns
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist
8 packages built:
  • dlib
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python312Packages.dlib
  • python312Packages.dlib.dist

@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@RobbieBuxton
Contributor Author

RobbieBuxton commented Jan 16, 2024

This is the error I'm getting for openturns; I assume the other packages fail in a similar way. I'll have a proper look tomorrow.

openturns>   The link interface of target "dlib::dlib" contains:
openturns>     CUDA::cublas
openturns>   but the target was not found.  Possible reasons include:
openturns>     * There is a typo in the target name.
openturns>     * A find_package call is missing for an IMPORTED target.
openturns>     * An ALIAS target is missing.
openturns> Call Stack (most recent call first):
openturns>   /nix/store/llmknxnpvhr6h6cwzla5f8mlsfwh3xpx-dlib-19.24.2/lib/cmake/dlib/dlibConfig.cmake:21 (include)
openturns>   CMakeLists.txt:331 (find_package)

@SomeoneSerge
Contributor

cudaPackages.libcublas

@RobbieBuxton
Contributor Author

Yeah, the lib is there on line 67; I think I need to tell CMake more explicitly where it is.

Comment on lines 79 to 81
patches = [ ] ++ lib.optional
cudaSupport
useNewCMakePatch;
Contributor

nit: move `patches` up to just after `src`. nit2: `optionals` over `optional`.
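The second nit in concrete form: `lib.optional cond x` wraps a single element, while `lib.optionals cond [ ... ]` takes a list, so the `[ ] ++` prefix becomes unnecessary. `useNewCMakePatch` is the patch value from the reviewed lines, taken as an argument here only so the sketch evaluates on its own.

```nix
{ lib, cudaSupport ? false, useNewCMakePatch }:

{
  # Same result as `[ ] ++ lib.optional cudaSupport useNewCMakePatch`.
  patches = lib.optionals cudaSupport [ useNewCMakePatch ];
}
```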

@RobbieBuxton
Contributor Author

RobbieBuxton commented Jan 17, 2024

Still broken for python311Packages.face-recognition, but openturns should be working now. The patch seems broken, at least with Nix: there was some very strange behavior where, for example, it couldn't find cublas_v2.h even though it was looking in the correct place:

Unable to find cublas_v2.h in either "/nix/store/i2bxh58nwq37qik00dpji6g6plf91khg-libcusolver-11.4.1.48/include;/nix/store/x6m8r8624npgx63r4fkgzd1xfzxnxsnx-libcurand-10.3.0.86/include;/nix/store/23y28zvyr5dp9zfvqa2zi5nayrrrins9-libcublas-11.11.3.6/include;/nix/store/6xbwsr4z38idr6c0kv62rjy8fdjxjz4y-cuda_cudart-11.8.89/include;/nix/store/rzmh6kycxiimqpbbzwwwrd1gvwiq7qm7-cuda_nvcc-11.8.89-dev/include;/nix/store/m2hf9m9y14aaj4hmx8s5731pf3p2h1k4-cudnn-8.9.7.29/include;/nix/store/inx3laxlhl3ji57g6fbd7l5bx1kzh3v6-cuda_nvcc-11.8.89/include" or "/nix/math_libs/include"

and the path definitely contained the header it wanted:

[robbieb@nixos:~/Random/dlib-nixpkgs-test]$ ls /nix/store/23y28zvyr5dp9zfvqa2zi5nayrrrins9-libcublas-11.11.3.6/include
cublas_api.h  cublas.h  cublasLt.h  cublas_v2.h  cublasXt.h  nvblas.h

face-recognition is failing with this error:

python3.11-face-recognition> face_recognition/api.py:26: in <module>
python3.11-face-recognition>     cnn_face_detector = dlib.cnn_face_detection_model_v1(cnn_face_detection_model)
python3.11-face-recognition> E   RuntimeError: Error while calling cudaGetDevice(&the_device_id) in file /build/source/dlib/cuda/gpu_data.cpp:204. code: 35, reason: CUDA driver version is insufficient for CUDA runtime version

I wonder whether it's somehow inheriting a different CUDA dependency from somewhere else, because when I run the same C++ CUDA call in my test flake it's fine.

I'll have a proper investigation tomorrow.
Edit: broke it with a bad commit; will fix tomorrow.
Edit: this issue is weird. It builds fine locally in my flake but not when I run the review tests; I'll have a proper look at the weekend.

@RobbieBuxton
Contributor Author

After doing a bit more research today, I've concluded the following:

Working fine

dlib
openturns
php81Extensions.pdlib
php82Extensions.pdlib
php83Extensions.pdlib
python311Packages.dlib
python311Packages.dlib.dist
python311Packages.openturns
python312Packages.dlib
python312Packages.dlib.dist

Broken by this fix when run with {config.cudaSupport = true;}

python311Packages.face-recognition
python311Packages.face-recognition.dist

Broken before this PR, with and without {config.cudaSupport = true;}

python312Packages.openturns
python312Packages.face-recognition
python312Packages.face-recognition.dist

The error python311Packages.face-recognition gets is:
[Screenshot_20240120_231559]
This sounds like it's either missing a CUDA dependency or picking up the wrong one while testing. I had a look at fixing it, but I'm not very familiar with how the Python packaging works and didn't get very far.

Since python311Packages.face-recognition was arguably already broken (when run with CUDA it wasn't actually using it), would it be insane to suggest we merge this PR anyway and mark that package as broken? Quite a few people seem to be waiting on this fix.

Would appreciate your input @SomeoneSerge, thanks!

@SomeoneSerge
Contributor

cuda driver version insufficient for cuda runtime ...

This just means you need to deploy a newer NVIDIA driver or build with an older cudaPackages, such that cudaRuntimeGetVersion() <= cudaDriverGetVersion() (the former corresponds to cudaPackages.cuda_cudart.version).
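A sketch of the second option as an overlay. It assumes the dlib expression takes `cudaPackages` as an argument (as callPackage-style expressions usually do) and that a `cudaPackages_11_8` set exists in the nixpkgs revision in use; substitute whichever release the deployed driver supports.

```nix
# Overlay: pin dlib to an older CUDA release so that the runtime version
# (cudaPackages.cuda_cudart.version) does not exceed what the driver supports.
final: prev: {
  dlib = prev.dlib.override {
    cudaPackages = final.cudaPackages_11_8;
  };
}
```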

@SomeoneSerge
Contributor

Result of nixpkgs-review pr 279927 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "8.6" ]; cudaEnableForwardCompat = false; }' run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages failed to build:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@RobbieBuxton
Contributor Author

I noticed this before as well: for some reason, when I run nixpkgs-review everything fails to build, whereas when I use the new fix as an overlay in my test flake it's fine (I suspect I've misunderstood something about how nixpkgs works). I'll look into this more after I fix the python311Packages.face-recognition CUDA issue.

@SomeoneSerge
Contributor

Result of nixpkgs-review pr 279927 run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@RobbieBuxton
Contributor Author

RobbieBuxton commented Jan 21, 2024

@SomeoneSerge, am I right in reading from this issue that accessing the GPU in tests is not currently supported in nixpkgs? python311Packages.face-recognition is trying to run GPU tests. It also seems like people are disabling the check here too?

@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; }' run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@RobbieBuxton
Contributor Author

Provided disabling tests for python311Packages.face-recognition is acceptable, I think this is ready for review/merge.

Contributor

@SomeoneSerge SomeoneSerge left a comment

am I right in reading into this #225912 that accessing the GPU in tests is not currently supported in nixpkgs

Yes. Since we want to be able to build derivations on machines that have no GPUs as well, we disable the GPU-related checks during the build. There's an idea to declare them in, e.g., dedicated passthru derivations marked with requiredSystemFeatures, so that a selection of builders could be set up to expose GPUs in the sandbox.
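A hypothetical sketch of that idea, not something this PR adds: a test derivation that only builders advertising a GPU feature would pick up. The feature name `cuda`, the `python3` wiring and the device-count check are assumptions; `dlib.cuda.get_num_devices()` is assumed to be the Python bindings' counterpart of the C++ call mentioned earlier in the thread.

```nix
{ runCommand, python3 }:

# Would be attached as e.g. passthru.tests.gpu on the dlib package.
runCommand "dlib-gpu-check"
  {
    nativeBuildInputs = [ (python3.withPackages (ps: [ ps.dlib ])) ];
    # Only builders that declare this system feature (and expose a GPU
    # inside the sandbox) will schedule this build.
    requiredSystemFeatures = [ "cuda" ];
  }
  ''
    python3 -c 'import dlib; assert dlib.cuda.get_num_devices() > 0'
    touch $out
  ''
```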

Provided disabling tests for python311Packages.face-recognition is acceptable, I think this is ready for review/merge.

Yes. It's preferable to keep as many tests enabled as possible, and to do a pythonImportsCheck. Some packages might fail even that if they eagerly load the driver...
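For face-recognition specifically, the earlier traceback shows face_recognition/api.py constructing a CNN detector at import time, which calls into the CUDA runtime, so it is one of the packages that would fail even the import check. A sketch of keeping both checks on for CPU-only builds, assuming `config` is among the package's arguments; the import-check part goes beyond the `doCheck` change made in this PR.

```nix
{ lib, config }:

{
  # GPU tests cannot run in the build sandbox.
  doCheck = !config.cudaSupport;

  # Importing face_recognition already touches the GPU, so the import
  # check is restricted to CPU-only builds as well.
  pythonImportsCheck = lib.optionals (!config.cudaSupport) [ "face_recognition" ];
}
```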

symlinkJoin

It's expensive and discouraged, but I think you could only get rid of it once dlib is patched to support FindCUDAToolkit.cmake.
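For context, the pattern under discussion looks roughly like this: the split CUDA packages are merged into a single prefix and handed to dlib's FindCUDA-based CMake through `CUDA_TOOLKIT_ROOT_DIR`. The package list and the flag are assumptions; depending on how the packages are split into outputs, `lib.getDev`/`lib.getLib` may be needed in `paths`.

```nix
{ symlinkJoin, cudaPackages }:

let
  # One directory tree with the headers and libraries of every listed
  # package, for build scripts that expect a monolithic CUDA install.
  cudaMerged = symlinkJoin {
    name = "cuda-merged";
    paths = with cudaPackages; [ cuda_nvcc cuda_cudart libcublas libcurand libcusolver cudnn ];
  };
in
{
  cmakeFlags = [ "-DCUDA_TOOLKIT_ROOT_DIR=${cudaMerged}" ];
}
```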


I'll probably have one more look a bit later, but overall looks good!

@RobbieBuxton
Copy link
Contributor Author

am I right in reading into this #225912 that accessing the GPU in tests is not currently supported in nixpkgs

Yes, since we want to be able to build derivations on machines that have no GPUs as well, we disable the GPU-related checks during the build. There's an idea to declare them in e.g. dedicated passthru derivations marked with requiredSystemFeatures so that a selection of builders could be set up to expose GPUs in the sandbox

Provided disabling tests for python311Packages.face-recognition is acceptable, I think this is ready for review/merge.

Yes. It's preferable to keep as many tests as possible on, and to do pythonImportsCheck. Some packages might fail even that if they eager-load the driver...

symlinkJoin

Is expensive and discouraged, but I think you could only get rid of it once we had patched dlib to support FindCUDAToolkit.cmake

I'll probably have one more look a bit later, but overall looks good!

symlinkJoin isn't strictly necessary (I think); I just saw other packages doing it during my research and thought it was the expected pattern. Happy to remove it if it's expensive.

You also mentioned here that it might be worth splitting the CUDA packages into .lib, .dev and .static outputs; I just wanted to double-check that this is still the case.

@SomeoneSerge
Contributor

You also mentioned #273665 (comment) that it might be worth splitting cuda packages up into .lib, .dev and .static and I just wanted to double check that is still the case?

Yes, cf the linked issue

I just saw other packages examples doing it during my research and thought it was the expected pattern, happy to remove it if it's expensive.

No, it's a necessity created by upstream build scripts (e.g. expecting all CUDA libraries in the same merged directory).
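As a sketch of how a consumer would pick individual outputs once the CUDA packages are split that way: `lib.getDev` and `lib.getLib` select the `dev` and `lib` outputs and fall back to a default output when one doesn't exist (`lib.getOutput "static"` works the same way for static archives). Which outputs `libcublas` actually provides is an assumption here.

```nix
{ lib, cudaPackages }:

let
  cublas = cudaPackages.libcublas;
in
{
  # getDev: headers such as cublas_v2.h, needed while compiling.
  # getLib: the shared libraries the built binaries link against.
  buildInputs = [ (lib.getDev cublas) (lib.getLib cublas) ];
}
```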

@RobbieBuxton
Contributor Author

Sounds good, I'll add those changes tonight.

@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; }' run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@@ -36,6 +36,9 @@ buildPythonPackage rec {
pytestCheckHook
];

# Disable tests
doCheck = false;
Contributor

Last nit! Can you make this doCheck = !cudaSupport, please?

Contributor

Oh, I thought this was in dlib. Well, dlib.cudaSupport would still have made sense, but that's maybe out of scope.

Contributor Author

I can still add it. I should probably put a reference to the issue as well; it might help someone else who comes across it.

@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@RobbieBuxton RobbieBuxton force-pushed the dlib-cuda-fix branch 2 times, most recently from f21e71d to d2fe369 on January 23, 2024 00:39
…lt blas and added lapack to make it easier to override, removed fftw because dlib hasn't supported it since 2015

Signed-off-by: Robbie Buxton <[email protected]>
@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; }' run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@RobbieBuxton
Contributor Author

Result of nixpkgs-review pr 279927 run on x86_64-linux

1 package marked as broken and skipped:
  • python312Packages.openturns
14 packages built:
  • dlib
  • openturns
  • php81Extensions.pdlib
  • php82Extensions.pdlib
  • php83Extensions.pdlib
  • python311Packages.dlib
  • python311Packages.dlib.dist
  • python311Packages.face-recognition
  • python311Packages.face-recognition.dist
  • python311Packages.openturns
  • python312Packages.dlib
  • python312Packages.dlib.dist
  • python312Packages.face-recognition
  • python312Packages.face-recognition.dist

@@ -36,6 +38,9 @@ buildPythonPackage rec {
pytestCheckHook
];

# Disables tests when running with cuda due to https://github.com/NixOS/nixpkgs/issues/225912
doCheck = !config.cudaSupport;
Contributor

Note for the future: I think it's more flexible to expose a passthru attribute in dlib and test that instead of the global config, because one might override dlib = prev.dlib.override { cudaSupport = true; } specifically. But this is good enough as it is, IMO.
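A sketch of that suggestion: dlib records the flag it was actually built with in `passthru`, and the consumer reads it from the dlib it receives, so a targeted `dlib.override { cudaSupport = true; }` is respected too. This assumes the Python dlib that face-recognition receives carries the same passthru attribute.

```nix
# In dlib's expression (sketch):
#   passthru = { inherit cudaSupport; };
#
# In the consumer, key the check off dlib instead of the global config:
{ dlib }:

{
  # `or false` keeps evaluation working if dlib lacks the attribute.
  doCheck = !(dlib.cudaSupport or false);
}
```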

Contributor

@SomeoneSerge SomeoneSerge left a comment

Hey, thanks a lot @RobbieBuxton, nice job! I'm honestly surprised it just worked without symlinkJoin. And thanks @hacker1024 for all the preceding effort!

@SomeoneSerge SomeoneSerge merged commit c8457ba into NixOS:master Jan 23, 2024
19 checks passed
Status: Done

Successfully merging this pull request may close these issues.

Dlib Build Issue: Doesn't actually build with cuda + others
5 participants