
fix: Split L0_perf_nomodel into 2 tests to ensure better debug-ability and resource util for PA #7705

Status: Draft (wants to merge 3,450 commits into base: main)
Conversation

indrajit96
Contributor

What does the PR do?

Split L0_perf_nomodel into 2 tests to enable better debugging and to run Perf Analyzer (PA) for custom backends. Currently a single test; this PR splits it into 2.
Other fixes as suggested by the ops team and tools team:

  1. Remove the tee utility
  2. Toggle PA args

This PR is intended to drive discussion around the PA args and the fixes above.
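The PA-args toggle described above could be sketched as follows. This is a minimal illustration, not the PR's actual script: the `TEST_SPLIT` variable, model name, and the specific flag values are assumptions, though `--concurrency-range` and `--measurement-interval` are standard perf_analyzer options.

```shell
# Hypothetical sketch: select a distinct Perf Analyzer configuration per
# test split, so each of the 2 tests runs with its own PA arguments.
TEST_SPLIT="${TEST_SPLIT:-custom_backend}"

if [ "$TEST_SPLIT" = "custom_backend" ]; then
    # Lighter load for the custom-backend split (values are illustrative)
    PA_ARGS="--concurrency-range 1:4 --measurement-interval 10000"
else
    # Heavier load for the default split (values are illustrative)
    PA_ARGS="--concurrency-range 1:16 --measurement-interval 5000"
fi

# The actual invocation would look something like:
echo "perf_analyzer -m my_model $PA_ARGS"
```

Note the caveat below: because PA argument values strongly affect the reported measurements, results from the two splits would not be directly comparable.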

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field.
  • Added a test plan and verified the test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.

Commit Type:

Check the conventional commit type box below and add the corresponding label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

NA

Where should the reviewer start?

L0_perf_nomodel_new

Test plan:

None

Caveats:

Altering PA argument values greatly alters the measurements.

Background

Kibana dashboard
https://gpuwa.nvidia.com/kibana/app/dashboards#/view/e18ad380-79e8-11ef-9f55-436af67f73cb?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-90d,to:now))

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

None

nealvaidya and others added 30 commits December 14, 2023 18:33
* vLLM Benchmarking Test (#6639)

* Add ability to configure GRPC max connection age and max connection age grace
* Allow passing GRPC connection age args when they are set from the command line
----------
Co-authored-by: Katherine Yang <[email protected]>
…d test (#6713)

* Modify HTTP frontend to return error code reflecting Triton error

* Add test for dedicated HTTP error. Relax existing test on HTTP code

* Address comment. Fix copyright
* Update README and versions for 23.12 branch

* Bring back the README (#6671)

* Bring back the README

* main -> r23.12

* Remove L0_libtorch_nvfuser (#6674)

* iGPU build refactor (#6684)

* Fix iGPU CMakeFile tags (#6695) (#6698)

* Unify iGPU test build with x86 ARM

* adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1

* re-organizing some copies in Dockerfile.QA to fix igpu devel build

* Pre-commit fix

---------

Co-authored-by: kyle <[email protected]>

* Update windows Dockerfile versions (#6672)

Changing version to the latest one

Co-authored-by: Misha Chornyi <[email protected]>

* Remove README banner (#6719)

* Update README

---------

Co-authored-by: tanmayv25 <[email protected]>
Co-authored-by: Jacky <[email protected]>
Co-authored-by: kyle <[email protected]>
* testing approach with pre-built image

* Build TensorRT-LLM

* Disable Triton Build

* Remove file

* Update config

* Change PATH variables

* Update path

* Update configuration for CMake

* Getting back TRITON_BUILD flag

* Revert missing files creation

* Update configuration for the PyTorch installation

* Update configuration for docker

* Change the location

* Update configuration

* update config

* Set CMake version to 3.27.7

* Fix double slash typo

* remove unused strings

* restore typo (#6680)

* remove old line

* fix line indentation

* Update LD_LIBRARY_PATH for TensorRT-LLM

* Adding TRT-LLM changes

* remove TRT-LLM container from the argument list

* Update indentation
* Update RE2 package location

* Use only 1 parallel thread for build

* Revert "Use only 1 parallel thread for build"

This reverts commit 93eab3a.
* Add testing for zero tensors in PyTorch backend

* Fix up

* Review edit
* Do not fail test on insufficient hardware concurrency

* Track instead of fail test if cannot replicate load while async unload

* Add some TODOs for the sub-test
* Simplify cmake install command

* Fix up

* Review comment
* Add cmdline option to set model load retry. Add test

* Fix copyright

* Minor change on testing model

* Remove unused import
- Extend L0_storage_S3 test timeout
* Patch L0_model_config with runtime

* Add L0_pytorch_python_runtime

* Update expected runtime field

* Add test for escaping runtime

* Add comments on unit test imports

* Add invalid runtime test

* User to build PyTorch env

* Update copyright
* Test case

* Update metrics.md

* Fix alert

* Add copyright

* Update test

* Improve pinned_memory_metrics_test.py

* Update qa/L0_metrics/pinned_memory_metrics_test.py

Co-authored-by: Ryan McCormick <[email protected]>

* Update pinned_memory_metrics_test.py

---------

Co-authored-by: Ryan McCormick <[email protected]>
Added support for OTel context propagation

---------

Co-authored-by: Markus Hennerbichler <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
This validates the change made to ../core wrt how model configuration mtime is handled.
* Run all cases with shm probe

* Warmup test and then run multiple iterations

* Log free shared memory on enter/exit of probe

* Add shm probe to all tests

* Add debug_str to shm_util

* Refactor ensemble_io test, modify probe to check for growth rather than inequality

* Improve stability of bls_tensor_lifecycle gpu memory tests

* Add more visibility into failing model/case in python_unittest helper

* [FIXME] Skip probe on certain subtests for now

* [FIXME] Remove shm probe from test_restart on unhealthy stub

* Start clean server run for each bls test case

* Don't exit early on failure so logs can be properly collected

* Restore bls test logic

* Fix shm size compare

* Print region name that leaked

* Remove special handling on unittest

* Remove debug str

* Add enter and exit delay to shm leak probe

---------

Co-authored-by: Ryan McCormick <[email protected]>
* Update trace_summary script

* Remove GRPC_WAITREAD and Overhead
* Add gsutil cp retry helper function

* Add max retry to GCS upload

* Use simple sequential upload
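The `gsutil cp` retry helper mentioned in the commits above could look like the sketch below. This is an illustrative assumption, not the actual helper from the PR: the function name, retry count, and back-off interval are all hypothetical; only `gsutil cp` itself is the real CLI being wrapped.

```shell
# Hypothetical retry wrapper: run a command up to <max_attempts> times,
# returning 0 on the first success and 1 if every attempt fails.
retry() {
    # Usage: retry <max_attempts> <command> [args...]
    max="$1"; shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0          # success: stop retrying
        attempt=$((attempt + 1))
        sleep 1                   # brief back-off between attempts
    done
    return 1                      # all attempts failed
}

# Example: retry a GCS upload up to 3 times
# retry 3 gsutil cp results.tgz gs://my-bucket/perf/
```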
pvijayakrish force-pushed the ibhosale_nomodel_perf branch from f3ba200 to 827deb5 on January 15, 2025 17:13