Requirements
Ensure your AWS account has quota for:
- At least ${gpu_count}*4 vCPUs under "Running On-Demand G and VT instances" in your chosen region
- At least one load balancer per model you want to serve (not per running server)
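The first quota can be checked from the command line. The sketch below assumes the AWS CLI is installed and configured, and that the quota is listed in Service Quotas under the exact name "Running On-Demand G and VT instances". The helper names `required_vcpus` and `current_gvt_quota` are illustrative and not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: compare the vCPU quota you need with the one your account has.
# Helper names are illustrative and not part of this repository.

# vCPUs needed according to the rule above: gpu_count * 4.
required_vcpus() {
  local gpu_count="$1"
  echo $(( gpu_count * 4 ))
}

# Current value of the G and VT On-Demand quota in the given region.
# Assumes the quota is listed under this exact name in Service Quotas.
current_gvt_quota() {
  local region="$1"
  aws service-quotas list-service-quotas \
    --service-code ec2 \
    --region "$region" \
    --query "Quotas[?QuotaName=='Running On-Demand G and VT instances'].Value" \
    --output text
}

# Example (uncomment to run against your account; the region is a placeholder):
# echo "need $(required_vcpus 2) vCPUs, have $(current_gvt_quota us-east-1)"
```

If the reported value is lower than what you need, request an increase for that quota in the same region before creating the cluster.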
Modify lines 7 to 12 of install-embeddings/create_cluster.sh (as of commit d55047e) before running it.
To get your account ID, run
aws sts get-caller-identity
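The command above prints a JSON document with several fields. If you only want the account ID, for example to paste into create_cluster.sh, a small wrapper like the following may help. It assumes the AWS CLI is installed and configured; `get_account_id` is an illustrative name, not something the repository provides.

```shell
#!/usr/bin/env bash
# Sketch: print only the account ID instead of the full JSON document.
# get_account_id is an illustrative helper name.
get_account_id() {
  aws sts get-caller-identity --query Account --output text
}

# Example (uncomment to run against your account):
# ACCOUNT_ID="$(get_account_id)"
# echo "Using AWS account ${ACCOUNT_ID}"
```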
Run ./create_cluster.sh to create the cluster.
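Before installing anything, it can be worth confirming that the cluster is reachable and has nodes in the Ready state. The sketch below assumes that kubectl is installed and that your current kubectl context points at the new cluster, which the script may or may not set up for you. `ready_node_count` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: count the nodes that report a plain "Ready" status.
# Assumes the current kubectl context points at the new cluster.
ready_node_count() {
  # Column 2 of `kubectl get nodes` is STATUS; n + 0 prints 0 when no node matches.
  kubectl get nodes --no-headers 2>/dev/null \
    | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example (uncomment to run against your cluster):
# echo "Ready nodes: $(ready_node_count)"
```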
- Specify your embedding models
Edit embedding_models.yaml to list the models that you want to use
- Install the helm chart
helm upgrade -i embedding-release oci://registry-1.docker.io/trieve/embeddings-helm -f embedding_models.yaml
- Get your model endpoints
kubectl get ing
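To pull just the endpoint addresses out of the ingress listing, a JSONPath query can be used. This is a sketch that assumes each ingress is fronted by a load balancer that publishes a hostname (as AWS load balancers typically do) and that the ingresses live in your current namespace. `list_model_endpoints` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: print one "<ingress name><TAB><load balancer hostname>" line per ingress.
# Assumes the load balancer publishes a hostname rather than an IP address.
list_model_endpoints() {
  kubectl get ingress \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.loadBalancer.ingress[0].hostname}{"\n"}{end}'
}

# Example (uncomment to run against your cluster):
# list_model_endpoints
```

The hostname column can stay empty for a while after the install, until the load balancer has been provisioned.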
- Clean up when you are done
helm uninstall embedding-release
./delete_cluster.sh