Python: Add example and document for NIM Semantic kernel plugin #8628
base: main
Conversation
@josephwnv please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

Contributor License Agreement

This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”), …
nice one! small note
```python
while True:
    # The result is a list of ChatMessageContent objects, grab the first one
    result = await chat.get_chat_message_contents(chat_history=chat_history, settings=settings, kernel=kernel)
```
You can also use `get_chat_message_content`; then you do not get a list!
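A minimal sketch of that variant, assuming the same `chat`, `chat_history`, `settings`, and `kernel` objects as in the sample:

```python
# get_chat_message_content returns a single ChatMessageContent (or None)
# instead of a list, so there is no need to index into the result.
result = await chat.get_chat_message_content(
    chat_history=chat_history, settings=settings, kernel=kernel
)
if result:
    print(result.content)
```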
@@ -0,0 +1,35 @@

# NVidia NIM Plugin
Suggested change:

```diff
- NVidia Inference Microservice is fully optimized and has wide variety. Perfect tools to enpower Copilot's semantic kernel.
+ NVidia Inference Microservice is fully optimized and has a wide variety. Perfect tools to empower Copilot's Semantic Kernel.
```
This sample shows how to incorporate NIM into Semantic Kernel.

Suggested change:

```diff
- This sample is based on llama-3.1-8b-instruct:latest which is version 1.1.2 at this time. Please check the the documentation of the NIM you plan to sue to see whethere there is any additional change you need to make.
+ This sample is based on llama-3.1-8b-instruct:latest, which is version 1.1.2 at this time. Please check the documentation of the NIM you plan to use to see whether there are any additional changes you need to make.
```
## Deploy NIM to Azure
Suggested change:

```diff
- NIM can deploy to anyplace include but not limited to Azure ML. AKS and Azure VM. Just do one of the following to prepare NIM endpoint for next step.
+ NIM can be deployed anywhere, including but not limited to Azure ML, AKS, and Azure VMs. Just perform one of the following to prepare a NIM endpoint for the next step.
```
1. **Azure ML Deployment:**

Suggested change:

```diff
- - Detail instruction can be found [here](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/azureml)
+ - Detailed instructions can be found [here](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/azureml)
```
2. **Azure Kubernetes Service Deployment**

Suggested change:

```diff
- - Detail instruction can be found [here](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/aks)
+ - Detailed instructions can be found [here](https://github.com/NVIDIA/nim-deploy/tree/main/cloud-service-providers/azure/aks)
```
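Once a deployment is up, one quick way to confirm the endpoint is reachable is to list its models through the OpenAI-compatible API. A minimal sketch; the URL below is a placeholder for your own endpoint:

```python
from openai import OpenAI

# Placeholder endpoint; substitute the URL from your Azure ML / AKS deployment.
nim_url = "http://your-nim-endpoint:8000/v1"

# Self-hosted NIM endpoints expose an OpenAI-compatible API and ignore the key.
client = OpenAI(base_url=nim_url, api_key="not-used")
for model in client.models.list().data:
    print(model.id)  # e.g. meta/llama-3.1-8b-instruct
```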
## NVidia NIM Plugin
Suggested change:

```diff
- We use llama-3.1-8b-instruct as example. We assume there is an expert called nllama3. We create a plugin called nllama3 and a magic word called nllama3. Any question asked nllama3 will redirect to this plugin and other questions will use default llm
+ We use llama-3.1-8b-instruct in this sample. We assume there is an expert called nllama3. We create a plugin called nllama3 and a magic word nllama3. Any question asked to nllama3 will be redirected to this plugin; other questions will use the default LLM.
```
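A hedged sketch of how such a plugin might be wired up; the class name and wiring here are illustrative, not necessarily what the sample uses:

```python
from typing import Annotated

from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function


class NLlama3Plugin:
    """Illustrative plugin class; the sample's actual name may differ."""

    @kernel_function(name="get_nllama3_opinion", description="Ask the nllama3 expert a question.")
    def get_nllama3_opinion(
        self, question: Annotated[str, "The input question"]
    ) -> Annotated[str, "The output is a string"]:
        # Forward the question to the NIM endpoint (body shown further below).
        ...


kernel = Kernel()
kernel.add_plugin(NLlama3Plugin(), plugin_name="nllama3")
```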
- Update the nim_url with the endpoint created in the previous step.

Suggested change:

```diff
- - run nvidia_nim_plugin.py and see how it work.
+ - Run nvidia_nim_plugin.py and see how it works.
```
```python
def get_nllama3_opinion(self, question: Annotated[str, "The input question"]) -> Annotated[str, "The output is a string"]:
    prompt = question.replace("nllama3", "you")
    # Make sure model name match the model of NIM you deploy
    client = OpenAI(base_url=nim_url, api_key="not-used")
```

Suggested change:

```diff
- # Make sure model name match the model of NIM you deploy
+ # Make sure the model name matches the model of the NIM you deploy
```
nit: What is the benefit of using a non-async client here?
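If the function is made async, the call could use the async client instead. A sketch of that idea, reusing the sample's `nim_url` and `prompt`; the model name is an assumption and must match the deployed NIM:

```python
from openai import AsyncOpenAI

# Async variant of the sample's call so the kernel function does not block the event loop.
client = AsyncOpenAI(base_url=nim_url, api_key="not-used")

async def ask_nim(prompt: str) -> str:
    completion = await client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # assumed; must match the deployed NIM model
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```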
```python
settings: OpenAIChatPromptExecutionSettings = kernel.get_prompt_execution_settings_from_service_id(
    service_id=service_id
)
settings.function_call_behavior = FunctionCallBehavior.EnableFunctions(
```
`FunctionCallBehavior` has been deprecated. We recommend using `FunctionChoiceBehavior` instead.
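A sketch of the replacement, assuming a semantic-kernel release recent enough to ship `FunctionChoiceBehavior` and the same `settings` object as above:

```python
from semantic_kernel.connectors.ai import FunctionChoiceBehavior

# Let the model decide when to call the registered functions.
settings.function_choice_behavior = FunctionChoiceBehavior.Auto()
```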