
VMware-generative-ai-reference-architecture

Overview

This repository contains a series of Python scripts and configuration files that complement the white paper Deploying Enterprise-Ready Generative AI on VMware Cloud Foundation.

Disclaimer

The scripts provided in this repository are intended for educational purposes, not for production applications. Be aware that LLMs carry inherent vulnerabilities and risks, as illustrated by the OWASP Top 10 for Large Language Model Applications. We strongly encourage customers to follow the OWASP guidance and the NIST AI Risk Management Framework to build safe and robust AI systems.

Directory Structure

The repository is organized as follows:

  • The vSphere-and-TKG-config-files directory provides configuration files to set up the Tanzu Kubernetes cluster and the NVIDIA GPU and Network Operators that provide hardware-acceleration services to VMware Tanzu Kubernetes clusters.
  • The Examples/LLM-fine-tuning-example directory provides the steps to configure a Python virtual environment suitable for LLM fine-tuning tasks based on a series of Hugging Face libraries. It also includes a Python notebook that walks through every step required to fine-tune a Falcon LLM on a custom dataset so the model learns to follow instructions (a minimal fine-tuning sketch follows this list).
  • The Examples/LLM-serving-wt-vLLM-and-RayServe-example directory provides the configuration steps, configuration files, and Python scripts to set up a Ray cluster that serves the Falcon LLMs via vLLM running as a Ray Serve application (a serving sketch follows this list). The Ray cluster is deployed on Tanzu Kubernetes using KubeRay.
  • We also include Starter Packs, which provide code examples implementing the following use cases (illustrative sketches for these use cases follow this list):
    • Improved RAG v2.0.0 (powered by NVIDIA NIMs, vLLM, LlamaIndex, PGVector, and DeepEval)
    • Intro to RAG (retrieval-augmented generation with LangChain and Gradio)
    • AI coding assistance via StarCoder (Code_Assistant)
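
To give a sense of what the fine-tuning notebook does, here is a minimal sketch of instruction fine-tuning a Falcon model with the Hugging Face transformers, datasets, and PEFT libraries. The base checkpoint, dataset, prompt format, and hyperparameters below are illustrative assumptions, not necessarily the values used in Examples/LLM-fine-tuning-example.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "tiiuae/falcon-7b"                                          # assumed base checkpoint
dataset = load_dataset("databricks/databricks-dolly-15k", split="train") # assumed instruction dataset

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Train small LoRA adapters instead of all model weights to fit on a single GPU.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["query_key_value"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

def tokenize(example):
    # Build a simple instruction-following prompt from each record.
    text = (f"### Instruction:\n{example['instruction']}\n"
            f"### Response:\n{example['response']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

train_data = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=TrainingArguments(output_dir="falcon-dolly-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1, learning_rate=2e-4,
                           logging_steps=25),
)
trainer.train()
model.save_pretrained("falcon-dolly-lora")  # saves only the LoRA adapter weights
```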
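
The serving example follows the same pattern as this minimal Ray Serve sketch, which wraps vLLM in a deployment class so Ray can place replicas on GPUs. The model name, sampling parameters, and request format are assumptions for illustration; the full application and its KubeRay manifests live in Examples/LLM-serving-wt-vLLM-and-RayServe-example.

```python
from ray import serve
from starlette.requests import Request
from vllm import LLM, SamplingParams

@serve.deployment(ray_actor_options={"num_gpus": 1})
class FalconGenerator:
    """Serves text completions from a Falcon model through vLLM."""

    def __init__(self):
        # vLLM loads the model once per replica and batches incoming prompts.
        self.llm = LLM(model="tiiuae/falcon-7b")               # assumed model
        self.sampling = SamplingParams(temperature=0.7, max_tokens=256)

    async def __call__(self, request: Request) -> dict:
        prompt = (await request.json())["prompt"]
        outputs = self.llm.generate([prompt], self.sampling)
        return {"text": outputs[0].outputs[0].text}

# Bind the deployment graph; start it locally with `serve run <module>:app`
# or hand it to a KubeRay-managed Ray cluster on Tanzu Kubernetes.
app = FalconGenerator.bind()
```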
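
For the Improved RAG starter pack, the core ingest-and-query flow resembles the LlamaIndex and PGVector sketch below. The connection details, document path, and embedding dimension are placeholders, and the sketch relies on LlamaIndex's default LLM and embedding backends, whereas the starter pack points them at NVIDIA NIM and vLLM endpoints and evaluates the answers with DeepEval.

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore

# Store embeddings in a Postgres table through the pgvector extension.
vector_store = PGVectorStore.from_params(
    host="localhost", port=5432, database="rag_db",   # placeholder connection details
    user="postgres", password="postgres",
    table_name="doc_chunks", embed_dim=1536,          # must match the embedding model
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Ingest: load documents, chunk and embed them, and persist the vectors to Postgres.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Query: retrieve the most similar chunks and let the configured LLM answer from them.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What workloads does the reference architecture target?"))
```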
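
The Intro to RAG starter pack follows the classic retrieve-then-generate loop sketched below with LangChain and Gradio: embed document chunks, look up the chunks most similar to a question, and pass them to an LLM as grounding context. The document path, embedding model, FAISS store, and Falcon-Instruct generator are illustrative assumptions, and LangChain import paths may differ across versions.

```python
import gradio as gr
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from transformers import pipeline

# 1. Index: split the source document into chunks and embed them into a vector store.
with open("docs/whitepaper.txt") as f:                    # placeholder document
    chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_text(f.read())
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_texts(chunks, embeddings)

# Any completion endpoint works here; a local pipeline keeps the sketch self-contained.
generator = pipeline("text-generation", model="tiiuae/falcon-7b-instruct", device_map="auto")

def answer(question: str) -> str:
    # 2. Retrieve the chunks most similar to the question.
    context = "\n\n".join(doc.page_content for doc in store.similarity_search(question, k=3))
    # 3. Generate an answer grounded in the retrieved context.
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    completion = generator(prompt, max_new_tokens=256, return_full_text=False)
    return completion[0]["generated_text"]

gr.Interface(fn=answer, inputs="text", outputs="text", title="Intro to RAG demo").launch()
```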
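
Finally, the Code_Assistant starter pack is built around code completion of the kind sketched below with StarCoder and Hugging Face transformers. The prompt is just an example, and the bigcode/starcoder checkpoint is gated, so access must be requested on the Hugging Face Hub first.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"       # gated model; accept its license on the Hub first
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

# Complete a partially written function, the core interaction of a coding assistant.
prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```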

Contributing

The VMware-generative-ai-reference-architecture project team welcomes contributions from the community. Before you start working with VMware-generative-ai-reference-architecture, please read our Contributor License Agreement. All contributions to this repository must be signed as described on that page. Your signature certifies that you wrote the patch or have the right to pass it on as an open-source patch. For more detailed information, refer to CONTRIBUTING.md.

License

The project is licensed under the terms of the Apache 2.0 license.

About

VMware GenAI reference architecture. A set of companion assets (Python scripts and YAML config files) intended to help customers set up compute and networking accelerators in vSphere and Tanzu Kubernetes to run GenAI workloads.
