The goal of this project is to be an all-in-one, easy-to-install solution for running AI. It is a native app that runs a server handling all of the basic building blocks of AI: inference, memory, model file management, agent building, app installation, and a GUI.
The Obrew Engine is a Python server built with FastAPI. We provide a Web UI called Obrew Studio to access this server. You can also access it programmatically via the API.
Launch the desktop app locally, then point your browser at any web app that supports this project's API and start using AI locally with your own private data, for free:
- Run locally
- Provide easy-to-use desktop installers
- Save chat history
- CPU & GPU support
- Windows OS installer
- MacOS/Linux installer
- Docker config for easy server deployment
- Support deployment to a hosted server with Admin login support
- Production ready: this project is under active development, so there may be bugs
- Inference: Run open-source AI models for free
- Embeddings: Create vector embeddings from text or document files
- Search: Use a vector database and LlamaIndex to make semantic or similarity queries
- Build custom bots
- Agents: Bots with tools
- Workloads: Agent jobs
- Support multi-modal & vision models
- Source citations in retrieved responses
- Chat conversations
- Infinite context & long-term memory across conversations (personal memory)
- Voice-to-text (user queries) and text-to-speech (AI responses)
This is a local-first project. The ultimate goal is to support all providers via one API.
- Open-Source
- Google Gemini
- OpenAI
- Anthropic
- Mistral AI
- Groq
Install the Python dependencies listed in the requirements.txt file:
Be sure to run this command with admin privileges. This command is optional, as it is also run on each yarn build.
pip install -r requirements.txt
# or
yarn python-deps
If you get a "Permission Denied" error, try running the executable with Admin privileges.
Right-click src/backends/main.py and choose "Run Python File in Terminal" to start the server:
Or
# from working dir
python src/backends/main.py
Or, using yarn (recommended)
yarn server:dev
# or
yarn server:local-prod
# or
yarn server:hosted-prod
The Obrew API server will be running at https://localhost:8008
*Note: if the server fails to start, be sure to run the yarn makecert command to create the certificate files necessary for https (these go into the _deps/public folder).
These steps outline the process of supporting GPUs. If all you need is CPU support, you can skip this.
When you do the normal pip install llama-cpp-python, it installs with only CPU support by default. If you want GPU support for your platform, you must build llama.cpp from source and then reinstall it with pip --force-reinstall.
Follow these steps to build llama-cpp-python for your hardware and platform.
- Install Visual Studio (Community 2019 is fine) with components:
- C++ CMake tools for Windows
- C++ core features
- Windows 10/11 SDK
- Visual Studio Build Tools
- Install the CUDA Toolkit:
- Download CUDA Toolkit from https://developer.nvidia.com/cuda-toolkit
- Install only components for CUDA
- If the installation fails, uncheck everything and install only visual_studio_integration. Then proceed to install the remaining packages one at a time or in batches until everything is installed.
- Add CUDA_PATH (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2) to your environment variables
- llama-cpp-python build steps:
If on Windows, run the following using the "Command Prompt" tool. If you are developing in a Python virtual or Anaconda environment, be sure you have the environment activated first and then run from the Windows cmd prompt.
set FORCE_CMAKE=1 && set CMAKE_ARGS=-DLLAMA_CUBLAS=on && pip install llama-cpp-python --force-reinstall --ignore-installed --upgrade --no-cache-dir --verbose
- If CUDA is detected but you get a "No CUDA toolset found" error, copy all files from:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\extras\visual_studio_integration\MSBuildExtensions
into
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Microsoft\VC\v160\BuildCustomizations
(Adjust the path/version as necessary)
- Once everything is installed, be sure to set n_gpu_layers to an integer higher than 0 to offload inference layers to the GPU. You will need to tune this number depending on your VRAM and the model's context size.
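For reference, a minimal llama-cpp-python sketch showing where n_gpu_layers is set (the model path and layer count are placeholders to adjust for your own model and VRAM):

```python
from llama_cpp import Llama

# n_gpu_layers > 0 offloads that many layers to the GPU; -1 offloads all of them.
# Lower the number if you run out of VRAM. The model path is a placeholder.
llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",
    n_gpu_layers=35,
    n_ctx=4096,
)
result = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(result["choices"][0]["text"])
```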
See here https://github.com/ggerganov/llama.cpp#build
for steps to compile to other targets.
Be sure to generate self-signed certs for easy SSL setup in a local environment.
If you already have the required toolkit files installed and have built for GPU, then the necessary GPU drivers/DLLs should be detected by PyInstaller and included in the _deps dir. This is handled automatically by the npm scripts, so you do not need to execute these commands manually. The -F flag bundles everything into one .exe file.
To install the pyinstaller tool:
pip install -U pyinstaller
Then use it to bundle a python script:
pyinstaller -c -F your_program.py
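If you also need to bundle extra data folders with the executable (similar to the "Additional Files" setting in auto-py-to-exe below), PyInstaller's --add-data flag can include them. The folder names here are only illustrative:

```sh
# On Windows the source and destination are separated by a semicolon
# (use a colon, "src:dest", on macOS/Linux). "public" is an example folder.
pyinstaller -c -F --add-data "public;public" your_program.py
```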
This is a GUI tool that greatly simplifies the process. You can also save and load configs. It uses PyInstaller under the hood and requires it to be installed. Please note: if using a conda or virtual environment, be sure to install both PyInstaller and auto-py-to-exe in that environment and run them from there, otherwise one or both will build from incorrect deps.
*Note: you will need to edit the paths for the following in auto-py-to-exe to point to your base project directory:
- Settings -> Output directory
- Additional Files
- Script Location
To run:
auto-py-to-exe
This utility will take your exe and dependencies, compress the files, then wrap them in a user-friendly executable that guides the user through installation.
- Download Inno Setup from [here](https://jrsoftware.org/isinfo.php)
- Install and run the setup wizard for a new script
- Follow the instructions, and before it asks to compile the script, cancel and inspect the script where it points to your included files/folders
- Be sure to append /[your_included_folder_name] after the DestDir: "{app}". So instead of {app} we have {app}/assets. This ensures it points to the correct paths of the added files you told PyInstaller to include.
- After that, compile the script and it should output your setup file where you specified (or in the project root).
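As a rough illustration of that edit (the Source path below is an assumption based on a typical PyInstaller output folder), the relevant [Files] entry in the generated .iss script would look something like:

```
[Files]
Source: "dist\assets\*"; DestDir: "{app}\assets"; Flags: ignoreversion recursesubdirs
```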
For production deployments you will want to run the server behind a reverse proxy using something like Traefik Hub (free, and it exposes your self-hosted server to the public internet over the encrypted https protocol).
If you wish to deploy this on your private network for local access from any device on that network, you will need to run the server over https, which requires SSL certificates. Be sure to set the .env var ENABLE_SSL.
Rename the included .env.example file to .env in the /_deps folder and modify the vars accordingly.
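For example, a minimal .env might look like the following. ENABLE_SSL is the only variable named in this guide; the value shown is an assumption, so check .env.example for the full list of vars and accepted values:

```sh
# _deps/.env (illustrative; see .env.example for the real options)
ENABLE_SSL=True
```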
The following command will create self-signed key and cert files in your current dir that are good for 100 years. These files should go in the _deps/public folder. You should generate your own and overwrite the files in _deps/public; do not use the provided certs in a production environment.
openssl req -x509 -newkey rsa:4096 -nodes -out public/cert.pem -keyout public/key.pem -days 36500
# OR (an alias for same command as above)
yarn makecert
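To double-check the generated certificate (for example, its subject and expiry dates), you can inspect it with openssl:

```sh
openssl x509 -in public/cert.pem -noout -subject -dates
```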
This should be enough for any webapp served over https to access the server. If you see "Warning: Potential Security Risk Ahead" in your browser when using the webapp, you can bypass it by clicking "Advanced" and then the "Accept the Risk" button to continue.
- Create a tag with:
Increase the patch version by 1 (x.x.1 to x.x.2)
yarn version --patch
Increase the minor version by 1 (x.1.x to x.2.x)
yarn version --minor
Increase the major version by 1 (1.x.x to 2.x.x)
yarn version --major
- Create a new release in Github and choose the tag just created or enter a new tag name for Github to make.
- Drag & drop the binary file you wish to bundle with the release, then hit done.
- If the project is public then the latest release's binary should be available on the web to anyone with the link:
https://github.com/[github-user]/[project-name]/releases/latest/download/[installer-file-name]
This project deploys several servers/processes (databases, inference, etc.) exposed under the /v1 endpoint. The goal is to separate all OS-level logic and processing from the client apps. This makes deploying new apps and swapping out functionality easier.
A complete list of endpoint documentation can be found at http://localhost:8000/docs after Obrew Server is started.
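Since the server is built with FastAPI, you can also fetch the machine-readable schema at /openapi.json (a FastAPI default) to discover endpoints programmatically. A minimal sketch; adjust the scheme, host, and port to match how you launched the server:

```python
import requests

# The URL below assumes a local dev server; for the https server with a
# self-signed cert you would use https://localhost:8008 and verify=False.
schema = requests.get("http://localhost:8000/openapi.json").json()

for path, methods in schema["paths"].items():
    print(path, sorted(methods.keys()))
```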
There is currently a JavaScript library under development and being used by Obrew Studio. Once the project becomes stable, it will be broken out into its own module and repo. Stay tuned.
Development: Put your .env file in the base directory of the project.
Installed App: Put your .env file in the _deps folder in the executable's root directory.
It is highly recommended to use a package/environment manager like Anaconda to manage Python installations and the versions of dependencies they require. This allows you to create virtual environments from which you can install different versions of software and build/deploy from within this sandboxed environment.
To update the pip package installer:
conda update pip
The following commands should be done in the Anaconda Prompt terminal. If on Windows, run as Admin.
- Create a new environment. This project uses Python 3.12:
conda create --name env1 python=3.12
- To work in this env, activate it:
conda activate env1
- When you are done using it, deactivate it:
conda deactivate
- If using an IDE like VSCode, you must apply your newly created virtual environment by selecting the "Python interpreter" button at the bottom when inside your project directory.
Some notes on how to create a new tool:
- File name and function name should be the same
- 1 function per file
- Functions must be written in Python:
function_name.py
- Each function needs a description to help the LLM
- Each function needs a Pydantic class (named "Params") assigned to input args
Where to store the function code:
- From the project's root: tools\functions
- OR from the installation directory, create a new folder: tools\functions
Take a look at the calculator.py example for reference.
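As a rough sketch of those conventions (this file is hypothetical and only illustrates the shape; treat calculator.py as the authoritative reference for any fields Obrew actually reads):

```python
# tools/functions/add_numbers.py - hypothetical example tool.
# File name and function name match, one function per file,
# with a Pydantic "Params" class describing the input args for the LLM.
from pydantic import BaseModel, Field


class Params(BaseModel):
    a: float = Field(description="First number to add")
    b: float = Field(description="Second number to add")


def add_numbers(a: float, b: float) -> float:
    """Add two numbers together and return the sum."""
    return a + b
```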
- Server: FastAPI - learn about FastAPI features and API.
- Inference: llama-cpp-python for AI inference.
- Memory: LlamaIndex for data retrieval and ChromaDB for the vector database.
- Web UI: Next.js for the front-end and Vercel for hosting.