[Docs][Serve] Add model serve using AWS NeuronCore #38811

Merged: 3 commits into ray-project:master from docs-serve on Aug 25, 2023

@chappidim chappidim commented Aug 23, 2023

Why are these changes needed?

This PR adds a documentation example that compiles a model for AWS NeuronCore and deploys it with Ray Serve.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Manual testing

Manual Testing

Tested on an Inferentia (inf2.8xlarge) instance with 2 neuron_cores.

Serve deployment

2023-08-23 22:33:58,267 INFO worker.py:1640 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
(HTTPProxyActor pid=51626) INFO 2023-08-23 22:33:59,693 http_proxy 10.0.1.234 http_proxy.py:1328 - Proxy actor 442c87516b8f2fb93876c13e01000000 starting on node 7695235f0ea52a3784bfcd9df9e7860f32589150d8137dd3a4cf6ce0.
(ServeController pid=51586) INFO 2023-08-23 22:33:59,769 controller 51586 deployment_state.py:1372 - Deploying new version of deployment default_BertBaseModel.
(ServeController pid=51586) INFO 2023-08-23 22:33:59,770 controller 51586 deployment_state.py:1372 - Deploying new version of deployment default_APIIngress.
(HTTPProxyActor pid=51626) INFO:     Started server process [51626]
(ServeController pid=51586) INFO 2023-08-23 22:33:59,874 controller 51586 deployment_state.py:1654 - Adding 1 replica to deployment default_BertBaseModel.
(ServeController pid=51586) INFO 2023-08-23 22:33:59,876 controller 51586 deployment_state.py:1654 - Adding 1 replica to deployment default_APIIngress.
2023-08-23 22:34:11,767 SUCC scripts.py:462 -- Deployed Serve app successfully.
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:35:11,030 default_BertBaseModel default_BertBaseModel#ypldKT fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - INFER OK 63.1ms
(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:35:11,032 default_APIIngress default_APIIngress#WPhXHi fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - __CALL__ OK 82.4ms
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:36:33,731 default_BertBaseModel default_BertBaseModel#ypldKT 6a456015-9789-404e-9cff-9a2b69df6156 /infer default replica.py:727 - INFER OK 1.7ms
(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:36:33,732 default_APIIngress default_APIIngress#WPhXHi 6a456015-9789-404e-9cff-9a2b69df6156 /infer default replica.py:727 - __CALL__ OK 6.7ms
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:36:51,046 default_BertBaseModel default_BertBaseModel#ypldKT ec8feb78-0805-46be-be1a-ffcd0775b998 /infer default replica.py:727 - INFER OK 1.1ms
(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:36:51,046 default_APIIngress default_APIIngress#WPhXHi ec8feb78-0805-46be-be1a-ffcd0775b998 /infer default replica.py:727 - __CALL__ OK 5.9ms
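
The replica log lines above end with a per-request latency, and the first /infer request (63.1ms) is noticeably slower than the later ones (1.7ms, 1.1ms). The following is a minimal sketch for pulling those numbers out of a captured log. It assumes the line format shown in this log, which may differ between Ray versions; `LINE_RE` and `replica_latencies` are hypothetical helper names, not part of Ray.

```python
import re

# Assumed format, taken from the log above:
# "(ServeReplica:<deployment> pid=<n>) ... - <METHOD> OK <latency>ms"
LINE_RE = re.compile(
    r"\(ServeReplica:(?P<deployment>\w+) pid=\d+\).* - "
    r"(?P<method>\w+) OK (?P<ms>\d+(?:\.\d+)?)ms\s*$"
)

# Three lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:35:11,030 default_BertBaseModel default_BertBaseModel#ypldKT fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - INFER OK 63.1ms
(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:35:11,032 default_APIIngress default_APIIngress#WPhXHi fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - __CALL__ OK 82.4ms
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:36:33,731 default_BertBaseModel default_BertBaseModel#ypldKT 6a456015-9789-404e-9cff-9a2b69df6156 /infer default replica.py:727 - INFER OK 1.7ms
"""


def replica_latencies(log_text):
    """Return {deployment name: [latency in ms, ...]} in log order."""
    latencies = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:  # Skip controller and proxy lines, which carry no latency.
            latencies.setdefault(match["deployment"], []).append(float(match["ms"]))
    return latencies


summary = replica_latencies(SAMPLE_LOG)
```

Lines that do not match the assumed pattern are ignored, so the full Serve log can be passed in unchanged.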
>>> resp = requests.get(f"http://127.0.0.1:8000/infer?sentence=Ray is super cool")
>>> print(resp.status_code, resp.json())
200 joy
>>>
neuron-top 2.12.2.0 running on i-xxx (inf2.8xlarge)

NeuronCore v2 Utilization (Avg: 0.00%)
  ND0    NC0: 0.00%    NC1: 0.00%

vCPU Utilization
  System vCPU Usage:   0.19%, 0.03%
  Runtime vCPU Usage:  0.00%, 0.00%

Memory Usage Summary
  Host Used Memory     Total: 527.4MB   Tensors: 0.0B   Constants: 0.0B      DMA Buffers: 256.0KB   App. Memory: 527.1MB
  Device Used Memory   Total: 380.3MB   Tensors: 0.0B   Constants: 282.7MB   Model Code: 97.6MB     Runtime Memory: 1.1KB   Model Scratchpad: 0.0B

Memory Usage Details
                                          Model ID   Device Memory   Host Memory
  [-] ND 0                                           380.3MB         20.0KB
      [-] NC 0                                       380.3MB         20.0KB
          [+] /tmp/tmp_9ord0xt/graph.neff   10001    283.3MB         20.0KB
          Model Code                                 97.0MB          0.0B
          Runtime Memory                             1.1KB           0.0B
      NC 1
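
For scripted checks, the request from the REPL session above can also be sent with the standard library alone. This is a minimal sketch, assuming the Serve app is reachable at http://127.0.0.1:8000 and exposes /infer with a `sentence` query parameter, as in the session; `build_infer_url` and `infer` are hypothetical helper names. It encodes the query string explicitly instead of relying on the client to escape the spaces.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: default local Serve HTTP address, as used in the REPL session.
DEFAULT_BASE_URL = "http://127.0.0.1:8000"


def build_infer_url(sentence: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Return the /infer URL with the sentence encoded as a query parameter."""
    return f"{base_url}/infer?{urlencode({'sentence': sentence})}"


def infer(sentence: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 10.0) -> tuple[int, str]:
    """Send a GET request to the running Serve app; return (status code, body text)."""
    with urlopen(build_infer_url(sentence, base_url), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

With the deployment running, `infer("Ray is super cool")` sends the same request as the session above. `build_infer_url` can be used on its own without a running server.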

Signed-off-by: maheedhar reddy chappidi <[email protected]>
@chappidim chappidim changed the title from "[Docs] Add model serve using AWS NeuronCore" to "[Docs][Serve] Add model serve using AWS NeuronCore" on Aug 23, 2023

@shrekris-anyscale shrekris-anyscale left a comment

Stamp cc @akshay-anyscale

@akshay-anyscale

@zhe-thoughts tagging for merge

@akshay-anyscale akshay-anyscale added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") on Aug 25, 2023

@zhe-thoughts zhe-thoughts left a comment

Clean docs change

@zhe-thoughts zhe-thoughts merged commit 99bf189 into ray-project:master Aug 25, 2023
@chappidim chappidim deleted the docs-serve branch August 29, 2023 17:10
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
* [Docs] Add model serve using AWS NeuronCore

Signed-off-by: maheedhar reddy chappidi <[email protected]>
Signed-off-by: e428265 <[email protected]>
LeonLuttenberger pushed a commit to jaidisido/ray that referenced this pull request Sep 5, 2023
* [Docs] Add model serve using AWS NeuronCore

Signed-off-by: maheedhar reddy chappidi <[email protected]>
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
* [Docs] Add model serve using AWS NeuronCore

Signed-off-by: maheedhar reddy chappidi <[email protected]>
Signed-off-by: Jim Thompson <[email protected]>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
* [Docs] Add model serve using AWS NeuronCore

Signed-off-by: maheedhar reddy chappidi <[email protected]>
Signed-off-by: Victor <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Serve] Add inference serve example using AWS NeuronCore
4 participants