
Llama Chat Next.js

[Screenshot: Llama Chat running locally]

Hey all, this project was a little experiment in running Llama 2 locally on my own machine using the Next.js App Router, node-llama-cpp, SQLite and more! See the full write-up of the project and its technical considerations/decisions here: https://harry.is-a.dev/projects/llama-chat/.

Getting Started

  1. First, download a quantised GGUF Llama 2 Chat model from TheBloke on Hugging Face: https://huggingface.co/TheBloke. I used https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF, but choose whichever model best suits your needs.

  2. Then, create a folder in the root directory called "llama", and add the downloaded model to it.

  3. Now, create a .env file in the root directory and set LLAMA_MODEL_PATH to the model's path, for example:

LLAMA_MODEL_PATH=llama/llama-2-7b-chat.Q2_K.gguf
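
At runtime the app points node-llama-cpp at this path. As a rough sketch of what that looks like (based on node-llama-cpp's v2 API; the actual code in this repository may differ), assuming LLAMA_MODEL_PATH is set as above:

import { LlamaModel, LlamaContext, LlamaChatSession } from "node-llama-cpp";

// Load the GGUF model from the path configured in .env
const model = new LlamaModel({ modelPath: process.env.LLAMA_MODEL_PATH! });
const context = new LlamaContext({ model });
const session = new LlamaChatSession({ context });

// Ask the model something and print the reply
const answer = await session.prompt("Hello! Who are you?");
console.log(answer);
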
  4. Now, run pnpm install.

  5. (Optional) If you want to enable CUDA to speed up Llama 2, run the following command:

pnpm dlx node-llama-cpp download --cuda

For this to work, you must have the CUDA Toolkit, the cmake-js dependencies, and CMake version 3.26 or higher installed.

To get this working on Windows, I had to modify the compileLlamaCpp.js file in the node-llama-cpp package, changing:

if (cuda && process.env.CUDA_PATH != null && await fs.pathExists(process.env.CUDA_PATH))
            cmakeCustomOptions.push("CMAKE_GENERATOR_TOOLSET=" + process.env.CUDA_PATH);

to this:

if (cuda && process.env.CUDA_PATH != null && await fs.pathExists(process.env.CUDA_PATH))
            cmakeCustomOptions.push("CMAKE_VS_PLATFORM_TOOLSET_CUDA=" + process.env.CUDA_PATH);

I also had to switch the npm Visual Studio build tools version to 2022, as 2017 wasn't working for me:

npm config set msvs_version 2022 --global

For more information on how to install CUDA and to check the other requirements, see the official docs for node-llama-cpp.

  6. Now, you can run the development server:
pnpm dev

Open http://localhost:3000 with your browser to see the result.

The API route to interact with Llama 2 is at /api/chat.
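
As a rough sketch of how a client might call this route (the request and response shape here, a JSON body with a prompt field and a plain-text reply, is an assumption; check the route handler for the real contract):

// Hypothetical client-side helper for the chat route.
// Assumption: the route accepts { prompt: string } as JSON and returns the reply as text.
export async function askLlama(prompt: string): Promise<string> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  if (!res.ok) {
    throw new Error(`Chat request failed with status ${res.status}`);
  }

  return res.text();
}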
