Importing Exllamav2 taking so much time #342
-
Why am I frequently getting stuck while importing exllamav2?
If I try to interrupt it, this error shows up; it seems like there's a time.sleep holding it up(?):
The import works just fine if I cancel the first import and then import it again. Is there anything I can do to prevent this?
-
The first time you import the library it's going to build the C++ extension, which can take a little while depending on your CPU. After building it once it should be cached in ~/.cache/torch_extensions, and subsequent imports should be very quick. If something's wrong with the build system (ninja), the caching might not work correctly, I guess? One solution is to install ExLlamaV2 with the extension already built: either get one of the prebuilt wheels from the releases, or build and install from the repo directory.
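A minimal sketch of that source install, assuming the standard pip invocation (the exact command from the original reply isn't reproduced in this excerpt):

# After a successful first import, the JIT-built extension is cached here:
ls ~/.cache/torch_extensions
# Alternatively, build the extension at install time rather than at import time.
# Run from a clone of the exllamav2 repo (assumed standard pip source install):
pip install .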
2+ minutes sounds excessive, but I guess it's possible on a slow CPU?
I don't know if copying the cache works, but if you can't use any of the prebuilt wheels you can also build your own wheel with something like:
pip wheel --no-deps -w dist .
This should create a .whl file in the dist directory, containing both the exllamav2 and exllamav2_ext packages. Then install it as part of the Docker image build with pip install whatever.whl.
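As a concrete sketch of the Docker side (the dist/ location and wheel filename pattern are assumptions; adjust them to whatever pip wheel actually produced):

# Inside the image build, e.g. in a RUN step after copying dist/ into the image,
# install the prebuilt wheel so importing exllamav2 never triggers a JIT build:
pip install dist/exllamav2-*.whl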