
How to run phi3 on NPU via OnnxRuntime+DirectML #679

Open
Gusha-nye opened this issue Jan 7, 2025 · 5 comments

Comments

@Gusha-nye

1. I can now successfully run Phi3 on the GPU via ORT+DML, but I want it to run on the NPU. How should I do that, and what steps are required?
2. Does it have hardware configuration requirements for the machine?
3. My current computer configuration is:
Processor: Intel(R) Core(TM) Ultra 7 165H, 1.40 GHz
GPU: Intel(R) Arc(TM) Graphics
NPU: Intel(R) AI Boost
Installed RAM: 32.0 GB
System type: 64-bit operating system, x64-based processor
Windows edition: Windows 11 Pro
Version: 24H2
I also have a Copilot+ PC with an Arm-based Snapdragon X Elite chip built by Qualcomm.

@ashumish-QCOM

Hi @Gusha-nye,

To run Phi3 on an NPU via OnnxRuntime and DirectML, follow these steps:

  1. Ensure Hardware Compatibility: Verify that your NPU (Intel AI Boost) is supported by DirectML. Your current configuration looks good, but make sure your NPU drivers are up to date.

  2. Install ONNX Runtime with DirectML:

    • Install ONNX Runtime with DirectML support:
      pip install onnxruntime-directml

  3. Download the Phi3 Model:

    • Use the Hugging Face CLI to download the Phi3 model optimized for DirectML:
      huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include directml/* --local-dir .

  4. Run the Model:

    • Use the ONNX Runtime API to load and run the model, for example:

      import onnxruntime as ort
      import numpy as np

      # Load the model with the DirectML execution provider.
      # ("model.onnx" is a placeholder for the downloaded model file;
      # DirectML uses the system's default adapter unless a device_id is given.)
      session = ort.InferenceSession("model.onnx", providers=["DmlExecutionProvider"])

      # Prepare your input data (names, shapes and dtypes depend on the model).
      inputs = {session.get_inputs()[0].name: np.zeros((1, 128), dtype=np.int64)}

      # Run inference.
      outputs = session.run(None, inputs)

Regarding your second question, the hardware configuration you provided should be sufficient. Just ensure that your NPU is properly configured.
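
For reference, if you end up driving ONNX Runtime from C# rather than Python (as the later comments do), a rough equivalent of steps 2 and 4 is to install the Microsoft.ML.OnnxRuntime.DirectML NuGet package and append the DirectML execution provider to the session options. The sketch below is only illustrative: "model.onnx" is a placeholder for the downloaded model file, and device 0 is simply the default DirectML adapter (usually the GPU), so whether the NPU is reachable this way depends on your drivers and adapter enumeration.

      using System;
      using Microsoft.ML.OnnxRuntime;

      class DmlSetupCheck
      {
          static void Main()
          {
              // Confirm the DirectML execution provider is present in the installed package.
              foreach (var ep in OrtEnv.Instance().GetAvailableProviders())
                  Console.WriteLine(ep); // expect "DmlExecutionProvider" in the list

              // Create a session that uses DirectML. Device 0 is the default adapter
              // (usually the GPU); which index, if any, maps to the NPU depends on the
              // driver and on how DirectML enumerates adapters on this machine.
              using var options = new SessionOptions();
              options.AppendExecutionProvider_DML(0);

              // "model.onnx" is a placeholder for the downloaded Phi-3 model file.
              using var session = new InferenceSession("model.onnx", options);
              Console.WriteLine($"Loaded model with {session.InputMetadata.Count} inputs.");
          }
      }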

@Gusha-nye
Author


Thank you for your answer

@Gusha-nye
Author

Hi @ashumish-QCOM,
I load the model and tokenizer through the Model and Tokenizer classes, but in the C# API from the official ONNX Runtime docs (https://onnxruntime.ai/docs/genai/api/csharp.html) I can't find any API for targeting the NPU. Could you explain in more detail how to load the model onto the NPU?
Alternatively, do you know of any demos that load Phi3 or other LLM models onto an NPU?
Looking forward to your reply!
Below is the code I use to load the model and tokenizer with the OnnxRuntimeGenAI-DirectML library and pass in the question to be answered.

[screenshots: C# code that loads the model and tokenizer and submits the prompt]

@ashumish-QCOM

Hi @Gusha-nye,

To load the model onto the NPU using the C# API, you can refer to the DirectMLNpuInference sample on Microsoft Learn. This sample demonstrates how to perform ONNX Runtime inference via DirectML on an NPU, including selecting an NPU device, creating an ONNX Runtime session, executing the model on the NPU, and retrieving the results.

Additionally, the ONNX Runtime C# API documentation provides detailed guidance on using the C# API. While it might not explicitly mention setting up the NPU, the DirectML sample should help you understand the process.
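
To make the C# side concrete: as you observed, the GenAI C# API does not expose an explicit NPU/device selector; for the DirectML-optimized model folders the execution provider is normally taken from the genai_config.json shipped with the model. A minimal generation loop with the Microsoft.ML.OnnxRuntimeGenAI.DirectML package might look like the sketch below. Treat it as an assumption-laden example: the model path is a placeholder, and the method names follow the published C# samples around the 0.4/0.5 releases, so they may differ in newer package versions.

      using System;
      using Microsoft.ML.OnnxRuntimeGenAI;

      class Phi3DmlDemo
      {
          static void Main()
          {
              // Placeholder path to the DirectML model folder downloaded with huggingface-cli.
              var modelPath = @".\directml\directml-int4-awq-block-128";

              using var model = new Model(modelPath);
              using var tokenizer = new Tokenizer(model);

              var prompt = "<|user|>\nWhat is an NPU?<|end|>\n<|assistant|>";
              var sequences = tokenizer.Encode(prompt);

              using var generatorParams = new GeneratorParams(model);
              generatorParams.SetSearchOption("max_length", 256);
              generatorParams.SetInputSequences(sequences);

              // Generate token by token until the model finishes.
              using var generator = new Generator(model, generatorParams);
              while (!generator.IsDone())
              {
                  generator.ComputeLogits();
                  generator.GenerateNextToken();
              }

              Console.WriteLine(tokenizer.Decode(generator.GetSequence(0)));
          }
      }

Which device DirectML actually runs on (GPU vs. NPU) is still decided at the DirectML level, which is what the DirectMLNpuInference sample mentioned above covers.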

@Gusha-nye
Author


Okay. Thank you very much.
