
How to run phi3 on NPU via OnnxRuntime+DirectML #679

Open
Gusha-nye opened this issue Jan 7, 2025 · 5 comments

Comments

@Gusha-nye

1. I can now successfully run Phi3 on the GPU via ORT+DML, but I want it to run on the NPU. How should I do that, and what steps are required?
2. Does it have hardware configuration requirements for the machine?
3. My current computer configuration is:
Processor: Intel(R) Core(TM) Ultra 7 165H, 1.40 GHz
GPU: Intel(R) Arc(TM) Graphics
NPU: Intel(R) AI Boost
Installed RAM: 32.0 GB
System type: 64-bit operating system, x64-based processor
Windows edition: Windows 11 Pro
Version: 24H2
I also have a Copilot+ PC with an Arm-based Snapdragon X Elite chip built by Qualcomm.

@ashumish-QCOM

Hi @Gusha-nye,

To run Phi3 on an NPU via OnnxRuntime and DirectML, follow these steps:

  1. Ensure Hardware Compatibility: Verify that your NPU (Intel AI Boost) is supported by DirectML. Your current configuration looks good, but make sure your NPU drivers are up to date.

  2. Install ONNX Runtime with DirectML:

    • Install ONNX Runtime with DirectML support:
      pip install onnxruntime-directml

  3. Download the Phi3 Model:

    • Use the Hugging Face CLI to download the Phi3 model optimized for DirectML:
      huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include directml/* --local-dir .

  4. Run the Model:

    • Use the ONNX Runtime API to load and run the model, for example:

      import onnxruntime as ort
      import numpy as np

      # Load the model with the DirectML execution provider.
      # ("model.onnx" is a placeholder for the downloaded model file;
      # DirectML uses the system's default adapter unless a device_id is given.)
      session = ort.InferenceSession("model.onnx", providers=["DmlExecutionProvider"])

      # Prepare your input data (names, shapes and dtypes depend on the model).
      inputs = {session.get_inputs()[0].name: np.zeros((1, 128), dtype=np.int64)}

      # Run inference.
      outputs = session.run(None, inputs)

Regarding your second question, the hardware configuration you provided should be sufficient. Just ensure that your NPU is properly configured.
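
For reference, if you end up driving ONNX Runtime from C# rather than Python (as the later comments do), a rough equivalent of steps 2 and 4 is to install the Microsoft.ML.OnnxRuntime.DirectML NuGet package and append the DirectML execution provider to the session options. The sketch below is only illustrative: "model.onnx" is a placeholder for the downloaded model file, and device 0 is simply the default DirectML adapter (usually the GPU), so whether the NPU is reachable this way depends on your drivers and adapter enumeration.

      using System;
      using Microsoft.ML.OnnxRuntime;

      class DmlSetupCheck
      {
          static void Main()
          {
              // Confirm the DirectML execution provider is present in the installed package.
              foreach (var ep in OrtEnv.Instance().GetAvailableProviders())
                  Console.WriteLine(ep); // expect "DmlExecutionProvider" in the list

              // Create a session that uses DirectML. Device 0 is the default adapter
              // (usually the GPU); which index, if any, maps to the NPU depends on the
              // driver and on how DirectML enumerates adapters on this machine.
              using var options = new SessionOptions();
              options.AppendExecutionProvider_DML(0);

              // "model.onnx" is a placeholder for the downloaded Phi-3 model file.
              using var session = new InferenceSession("model.onnx", options);
              Console.WriteLine($"Loaded model with {session.InputMetadata.Count} inputs.");
          }
      }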

@Gusha-nye
Author


Thank you for your answer

@Gusha-nye
Author

Hi @ashumish-QCOM,
I load the model and tokenizer through the Model and Tokenizer classes, but in the C# API from the official ONNX Runtime docs (https://onnxruntime.ai/docs/genai/api/csharp.html) I can't find any API for targeting the NPU. Could you explain in more detail how to load the model onto the NPU?
Alternatively, do you know of any demos that load Phi3 or other LLM models onto an NPU?
Looking forward to your reply!
Below is the code I use to load the model and tokenizer with the OnnxRuntimeGenAI-DirectML library and pass in the question to be answered.

[screenshots: C# code that loads the model and tokenizer and submits the prompt]

@ashumish-QCOM

Hi @Gusha-nye,

To load the model onto the NPU using the C# API, you can refer to the DirectMLNpuInference sample on Microsoft Learn. This sample demonstrates how to perform ONNX Runtime inference via DirectML on an NPU, including selecting an NPU device, creating an ONNX Runtime session, executing the model on the NPU, and retrieving the results.

Additionally, the ONNX Runtime C# API documentation provides detailed guidance on using the C# API. While it might not explicitly mention setting up the NPU, the DirectML sample should help you understand the process.
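
To make the C# side concrete: as you observed, the GenAI C# API does not expose an explicit NPU/device selector; for the DirectML-optimized model folders the execution provider is normally taken from the genai_config.json shipped with the model. A minimal generation loop with the Microsoft.ML.OnnxRuntimeGenAI.DirectML package might look like the sketch below. Treat it as an assumption-laden example: the model path is a placeholder, and the method names follow the published C# samples around the 0.4/0.5 releases, so they may differ in newer package versions.

      using System;
      using Microsoft.ML.OnnxRuntimeGenAI;

      class Phi3DmlDemo
      {
          static void Main()
          {
              // Placeholder path to the DirectML model folder downloaded with huggingface-cli.
              var modelPath = @".\directml\directml-int4-awq-block-128";

              using var model = new Model(modelPath);
              using var tokenizer = new Tokenizer(model);

              var prompt = "<|user|>\nWhat is an NPU?<|end|>\n<|assistant|>";
              var sequences = tokenizer.Encode(prompt);

              using var generatorParams = new GeneratorParams(model);
              generatorParams.SetSearchOption("max_length", 256);
              generatorParams.SetInputSequences(sequences);

              // Generate token by token until the model finishes.
              using var generator = new Generator(model, generatorParams);
              while (!generator.IsDone())
              {
                  generator.ComputeLogits();
                  generator.GenerateNextToken();
              }

              Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
          }
      }

Which device DirectML actually runs on (GPU vs. NPU) is still decided at the DirectML level, which is what the DirectMLNpuInference sample mentioned above covers.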

@Gusha-nye
Author


Okay. Thank you very much.
