[Docs][Serve] Add model serve using AWS NeuronCore #38811

chappidim · 2023-08-23T23:46:52Z

Why are these changes needed?

Adds a documentation example of AWS NeuronCore usage: compiling a model and deploying it with Ray Serve.

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Manual testing

Manual Testing

Tested on an Inferentia instance (inf2.8xlarge) with 2 NeuronCores.

Serve deployment

2023-08-23 22:33:58,267 INFO worker.py:1640 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
(HTTPProxyActor pid=51626) INFO 2023-08-23 22:33:59,693 http_proxy 10.0.1.234 http_proxy.py:1328 - Proxy actor 442c87516b8f2fb93876c13e01000000 starting on node 7695235f0ea52a3784bfcd9df9e7860f32589150d8137dd3a4cf6ce0.
(ServeController pid=51586) INFO 2023-08-23 22:33:59,769 controller 51586 deployment_state.py:1372 - Deploying new version of deployment default_BertBaseModel.
(ServeController pid=51586) INFO 2023-08-23 22:33:59,770 controller 51586 deployment_state.py:1372 - Deploying new version of deployment default_APIIngress.
(HTTPProxyActor pid=51626) INFO:     Started server process [51626]
(ServeController pid=51586) INFO 2023-08-23 22:33:59,874 controller 51586 deployment_state.py:1654 - Adding 1 replica to deployment default_BertBaseModel.
(ServeController pid=51586) INFO 2023-08-23 22:33:59,876 controller 51586 deployment_state.py:1654 - Adding 1 replica to deployment default_APIIngress.
2023-08-23 22:34:11,767 SUCC scripts.py:462 -- Deployed Serve app successfully.
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:35:11,030 default_BertBaseModel default_BertBaseModel#ypldKT fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - INFER OK 63.1ms
(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:35:11,032 default_APIIngress default_APIIngress#WPhXHi fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - __CALL__ OK 82.4ms
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:36:33,731 default_BertBaseModel default_BertBaseModel#ypldKT 6a456015-9789-404e-9cff-9a2b69df6156 /infer default replica.py:727 - INFER OK 1.7ms
(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:36:33,732 default_APIIngress default_APIIngress#WPhXHi 6a456015-9789-404e-9cff-9a2b69df6156 /infer default replica.py:727 - __CALL__ OK 6.7ms
(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:36:51,046 default_BertBaseModel default_BertBaseModel#ypldKT ec8feb78-0805-46be-be1a-ffcd0775b998 /infer default replica.py:727 - INFER OK 1.1ms
(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:36:51,046 default_APIIngress default_APIIngress#WPhXHi ec8feb78-0805-46be-be1a-ffcd0775b998 /infer default replica.py:727 - __CALL__ OK 5.9ms
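To compare the per-request latencies in the replica logs above, the following minimal, stdlib-only sketch pairs the BertBaseModel and APIIngress entries by request ID and subtracts the model's INFER time from the ingress __CALL__ time. The regular expression is written against the exact log format shown here (an assumption; other Ray versions may format replica logs differently), and reading the difference as time spent outside the model call assumes the ingress latency includes the awaited model call. `latency_by_request` and `ingress_overhead` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Replica log lines copied verbatim from the Serve deployment output above.
LOG_LINES = [
    "(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:35:11,030 default_BertBaseModel default_BertBaseModel#ypldKT fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - INFER OK 63.1ms",
    "(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:35:11,032 default_APIIngress default_APIIngress#WPhXHi fb7a88ea-fae6-48a8-b9fb-8abce5d9ddaa /infer default replica.py:727 - __CALL__ OK 82.4ms",
    "(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:36:33,731 default_BertBaseModel default_BertBaseModel#ypldKT 6a456015-9789-404e-9cff-9a2b69df6156 /infer default replica.py:727 - INFER OK 1.7ms",
    "(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:36:33,732 default_APIIngress default_APIIngress#WPhXHi 6a456015-9789-404e-9cff-9a2b69df6156 /infer default replica.py:727 - __CALL__ OK 6.7ms",
    "(ServeReplica:default_BertBaseModel pid=51656) INFO 2023-08-23 22:36:51,046 default_BertBaseModel default_BertBaseModel#ypldKT ec8feb78-0805-46be-be1a-ffcd0775b998 /infer default replica.py:727 - INFER OK 1.1ms",
    "(ServeReplica:default_APIIngress pid=51657) INFO 2023-08-23 22:36:51,046 default_APIIngress default_APIIngress#WPhXHi ec8feb78-0805-46be-be1a-ffcd0775b998 /infer default replica.py:727 - __CALL__ OK 5.9ms",
]

# Assumption: replica log lines follow the exact format shown above.
LINE_RE = re.compile(
    r"\(ServeReplica:(?P<deployment>\S+) pid=\d+\)"
    r".*? (?P<request_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) "
    r".*? - (?P<method>\S+) OK (?P<ms>[0-9.]+)ms$"
)


def latency_by_request(lines):
    """Map request ID -> {deployment name: latency in ms}."""
    table = defaultdict(dict)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            table[match["request_id"]][match["deployment"]] = float(match["ms"])
    return dict(table)


def ingress_overhead(table, ingress="default_APIIngress", model="default_BertBaseModel"):
    """Per request: ingress latency minus model latency, rounded to 0.1 ms."""
    return {
        request_id: round(times[ingress] - times[model], 1)
        for request_id, times in table.items()
        if ingress in times and model in times
    }


table = latency_by_request(LOG_LINES)
for request_id, overhead in ingress_overhead(table).items():
    print(request_id[:8], overhead)
# prints:
# fb7a88ea 19.3
# 6a456015 5.0
# ec8feb78 4.8
```

On these three requests, the first is markedly slower inside the model replica (63.1 ms versus 1.1 to 1.7 ms afterwards), while the remaining ingress overhead settles at roughly 5 ms.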

>>> import requests
>>> resp = requests.get("http://127.0.0.1:8000/infer", params={"sentence": "Ray is super cool"})
>>> print(resp.status_code, resp.json())
200 joy
>>>
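The sentence sent to /infer contains spaces, so it has to be encoded in the query string. A minimal stdlib sketch of that encoding, assuming the same local Serve address (http://127.0.0.1:8000/infer) and the sentence query parameter used in the session above; build_infer_url is a hypothetical helper. Passing params= to requests.get produces the same form encoding.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Serve HTTP address and route from the manual test above.
BASE_URL = "http://127.0.0.1:8000/infer"


def build_infer_url(sentence: str) -> str:
    """Hypothetical helper: form-encode the sentence into the query string."""
    return f"{BASE_URL}?{urlencode({'sentence': sentence})}"


url = build_infer_url("Ray is super cool")
print(url)  # http://127.0.0.1:8000/infer?sentence=Ray+is+super+cool

# Round trip: a standard query-string parser decodes '+' back into spaces.
assert parse_qs(urlsplit(url).query)["sentence"] == ["Ray is super cool"]
```

Reserved characters such as & or = are percent-escaped as well, so arbitrary sentences survive the round trip.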

neuron-top 2.12.2.0 running on i-xxx (inf2.8xlarge)

 NeuronCore v2 Utilization (Avg: 0.00%)
 ND0   NC0 [0.00%]   NC1 [0.00%]

 vCPU Utilization
 System vCPU Usage   [0.19%, 0.03%]
 Runtime vCPU Usage  [0.00%, 0.00%]

 Memory Usage Summary
 Host Used Memory     Total: 527.4MB   Tensors: 0.0B   Constants: 0.0B     DMA Buffers: 256.0KB   App. Memory: 527.1MB
 Device Used Memory   Total: 380.3MB   Tensors: 0.0B   Constants: 282.7MB  Model Code: 97.6MB     Runtime Memory: 1.1KB   Model Scratchpad: 0.0B

 Memory Usage Details
                                            Model ID   Device Memory   Host Memory
  [-] ND 0                                             380.3MB         20.0KB
      [-] NC 0                                         380.3MB         20.0KB
          [+] /tmp/tmp_9ord0xt/graph.neff   10001      283.3MB         20.0KB
          Model Code                                   97.0MB          0.0B
          Runtime Memory                               1.1KB           0.0B
      NC 1
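As a sanity check on the neuron-top snapshot, the device memory components add up to the reported 380.3MB total, both in the summary (Constants + Model Code + Runtime Memory) and in the per-model details (graph.neff + Model Code + Runtime Memory). The sketch below assumes binary units (1KB = 1024B), which is an assumption about neuron-top's display; at this precision the choice does not change the result. to_bytes and total_mb are hypothetical helpers.

```python
import re

# Assumption: neuron-top sizes use binary multiples (1KB = 1024B).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(size: str) -> float:
    """Convert a neuron-top style size string such as '282.7MB' to bytes."""
    match = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)(B|KB|MB|GB)", size)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


def total_mb(parts: list[str]) -> float:
    """Sum a list of size strings and express the total in MB."""
    return sum(to_bytes(p) for p in parts) / UNITS["MB"]


# Device Used Memory summary: Tensors, Constants, Model Code, Runtime Memory, Model Scratchpad
summary = ["0.0B", "282.7MB", "97.6MB", "1.1KB", "0.0B"]
# Memory Usage Details for NC 0: graph.neff, Model Code, Runtime Memory
details = ["283.3MB", "97.0MB", "1.1KB"]

print(round(total_mb(summary), 1))  # 380.3
print(round(total_mb(details), 1))  # 380.3
```

All of the device memory sits on NC 0 while NC 1 lists no models, which is consistent with the single BertBaseModel replica in the Serve logs occupying one NeuronCore.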

Signed-off-by: maheedhar reddy chappidi <[email protected]>

shrekris-anyscale

Stamp cc @akshay-anyscale

akshay-anyscale · 2023-08-25T00:28:15Z

@zhe-thoughts tagging for merge

zhe-thoughts

Clean docs change


[Docs] Add model serve using AWS NeuronCore

7fcd7b9

Signed-off-by: maheedhar reddy chappidi <[email protected]>

chappidim requested review from edoakes, shrekris-anyscale, sihanwang41, zcin, architkulkarni and a team as code owners August 23, 2023 23:46

chappidim changed the title ~~[Docs] Add model serve using AWS NeuronCore~~ [Docs][Serve] Add model serve using AWS NeuronCore Aug 23, 2023

[Docs] Add model serve using AWS NeuronCore - fix comma

a22e8ee

Signed-off-by: maheedhar reddy chappidi <[email protected]>

akshay-anyscale approved these changes Aug 24, 2023


[Docs] Add model serve using AWS NeuronCore - fix notes

ef7f9f4

Signed-off-by: maheedhar reddy chappidi <[email protected]>

chappidim requested a review from akshay-anyscale August 24, 2023 22:19

shrekris-anyscale approved these changes Aug 24, 2023


akshay-anyscale assigned zhe-thoughts Aug 25, 2023

akshay-anyscale added the tests-ok label Aug 25, 2023 (label description: "The tagger certifies test failures are unrelated and assumes personal liability.")

zhe-thoughts approved these changes Aug 25, 2023


zhe-thoughts merged commit 99bf189 into ray-project:master Aug 25, 2023

chappidim deleted the docs-serve branch August 29, 2023 17:10

LeonLuttenberger pushed a commit to jaidisido/ray that referenced this pull request Sep 5, 2023

[Docs][Serve] Add model serve using AWS NeuronCore (ray-project#38811)

fb413cd

* [Docs] Add model serve using AWS NeuronCore Signed-off-by: maheedhar reddy chappidi <[email protected]>
