
fix: Split L0_perf_nomodel into 2 tests to ensure better debug-ability and resource util for PA #7705

Status: Draft (wants to merge 3,450 commits into base: main)
Conversation

indrajit96
Contributor

What does the PR do?

Split L0_perf_nomodel into 2 tests to enable better debugging and to run Perf Analyzer (PA) for custom backends. Currently a single test; this PR splits it into 2.
Other fixes as suggested by the ops team and tools team:

  1. Remove the tee utility
  2. Toggle PA args

This PR is intended to drive discussion around the PA args and the fixes above.
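The PA-args toggle described above could be sketched as follows. This is a minimal illustration, not the PR's actual script: the `TEST_SPLIT` variable, model name, and the specific flag values are assumptions, though `--concurrency-range` and `--measurement-interval` are standard perf_analyzer options.

```shell
# Hypothetical sketch: select a distinct Perf Analyzer configuration per
# test split, so each of the 2 tests runs with its own PA arguments.
TEST_SPLIT="${TEST_SPLIT:-custom_backend}"

if [ "$TEST_SPLIT" = "custom_backend" ]; then
    # Lighter load for the custom-backend split (values are illustrative)
    PA_ARGS="--concurrency-range 1:4 --measurement-interval 10000"
else
    # Heavier load for the default split (values are illustrative)
    PA_ARGS="--concurrency-range 1:16 --measurement-interval 5000"
fi

# The actual invocation would look something like:
echo "perf_analyzer -m my_model $PA_ARGS"
```

Note the caveat below: because PA argument values strongly affect the reported measurements, results from the two splits would not be directly comparable.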

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field.
  • Added a test plan and verified the test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.

Commit Type:

Check the conventional commit type box below and add the corresponding label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

NA

Where should the reviewer start?

L0_perf_nomodel_new

Test plan:

None

Caveats:

Altering PA argument values greatly alters the measurements.

Background

Kibana dashboard
https://gpuwa.nvidia.com/kibana/app/dashboards#/view/e18ad380-79e8-11ef-9f55-436af67f73cb?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-90d,to:now))

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

None

nealvaidya and others added 30 commits December 14, 2023 18:33
* vLLM Benchmarking Test (#6639)

* Add ability to configure GRPC max connection age and max connection age grace
* Allow passing GRPC connection age args when they are set from the command line
----------
Co-authored-by: Katherine Yang <[email protected]>
…d test (#6713)

* Modify HTTP frontend to return error code reflecting Triton error

* Add test for dedicated HTTP error. Relax existing test on HTTP code

* Address comment. Fix copyright
* Update README and versions for 23.12 branch

* Bring back the README (#6671)

* Bring back the README

* main -> r23.12

* Remove L0_libtorch_nvfuser (#6674)

* iGPU build refactor (#6684)

* Fix iGPU CMakeFile tags (#6695) (#6698)

* Unify iGPU test build with x86 ARM

* adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1

* re-organizing some copies in Dockerfile.QA to fix igpu devel build

* Pre-commit fix

---------

Co-authored-by: kyle <[email protected]>

* Update windows Dockerfile versions (#6672)

Changing version to the latest one

Co-authored-by: Misha Chornyi <[email protected]>

* Remove README banner (#6719)

* Update README

---------

Co-authored-by: tanmayv25 <[email protected]>
Co-authored-by: Jacky <[email protected]>
Co-authored-by: kyle <[email protected]>
* testing approach with pre-built image

* Build TensorRT-LLM

* Disable Triton Build

* Remove file

* Update config

* Change PATH variables

* Update path

* Update configuration for CMake

* Getting back TRITON_BUILD flag

* Revert missing files creation

* Update configuration for the PyTorch installation

* Update configuration for docker

* Change the location

* Update configuration

* update config

* Set CMake version to 3.27.7

* Fix double slash typo

* remove unused strings

* restore typo (#6680)

* remove old line

* fix line indentation

* Update LD_LIBRARY_PATH for TensorRT-LLM

* Adding TRT-LLM changes

* remove TRT-LLM container from the argument list

* Update indentation
* Update RE2 package location

* Use only 1 parallel thread for build

* Revert "Use only 1 parallel thread for build"

This reverts commit 93eab3a.
* Add testing for zero tensors in PyTorch backend

* Fix up

* Review edit
* Do not fail test on insufficient hardware concurrency

* Track instead of fail test if cannot replicate load while async unload

* Add some TODOs for the sub-test
* Simplify cmake install command

* Fix up

* Review comment
* Add cmdline option to set model load retry. Add test

* Fix copyright

* Minor change on testing model

* Remove unused import
- Extend L0_storage_S3 test timeout
* Patch L0_model_config with runtime

* Add L0_pytorch_python_runtime

* Update expected runtime field

* Add test for escaping runtime

* Add comments on unit test imports

* Add invalid runtime test

* User to build PyTorch env

* Update copyright
* Test case

* Update metrics.md

* Fix alert

* Add copyright

* Update test

* Improve pinned_memory_metrics_test.py

* Update qa/L0_metrics/pinned_memory_metrics_test.py

Co-authored-by: Ryan McCormick <[email protected]>

* Update pinned_memory_metrics_test.py

---------

Co-authored-by: Ryan McCormick <[email protected]>
Added support for OTel context propagation

---------

Co-authored-by: Markus Hennerbichler <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
This validates the change made to ../core wrt how model configuration mtime is handled.
* Run all cases with shm probe

* Warmup test and then run multiple iterations

* Log free shared memory on enter/exit of probe

* Add shm probe to all tests

* Add debug_str to shm_util

* Refactor ensemble_io test, modify probe to check for growth rather than inequality

* Improve stability of bls_tensor_lifecycle gpu memory tests

* Add more visibility into failing model/case in python_unittest helper

* [FIXME] Skip probe on certain subtests for now

* [FIXME] Remove shm probe from test_restart on unhealthy stub

* Start clean server run for each bls test case

* Don't exit early on failure so logs can be properly collected

* Restore bls test logic

* Fix shm size compare

* Print region name that leaked

* Remove special handling on unittest

* Remove debug str

* Add enter and exit delay to shm leak probe

---------

Co-authored-by: Ryan McCormick <[email protected]>
* Update trace_summary script

* Remove GRPC_WAITREAD and Overhead
* Add gsutil cp retry helper function

* Add max retry to GCS upload

* Use simple sequential upload
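The `gsutil cp` retry helper mentioned in the commits above could look like the sketch below. This is an illustrative assumption, not the actual helper from the PR: the function name, retry count, and back-off interval are all hypothetical; only `gsutil cp` itself is the real CLI being wrapped.

```shell
# Hypothetical retry wrapper: run a command up to <max_attempts> times,
# returning 0 on the first success and 1 if every attempt fails.
retry() {
    # Usage: retry <max_attempts> <command> [args...]
    max="$1"; shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0          # success: stop retrying
        attempt=$((attempt + 1))
        sleep 1                   # brief back-off between attempts
    done
    return 1                      # all attempts failed
}

# Example: retry a GCS upload up to 3 times
# retry 3 gsutil cp results.tgz gs://my-bucket/perf/
```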
pvijayakrish force-pushed the ibhosale_nomodel_perf branch from f3ba200 to 827deb5 on January 15, 2025 17:13