fix: Split L0_nomodel_perf into 2 tests to ensure better debug-ability and resource utilization for PA #7705
Draft
indrajit96 wants to merge 3,450 commits into main from ibhosale_nomodel_perf
* vLLM Benchmarking Test
#6639) * Add ability to configure GRPC max connection age and max connection age grace * Allow pass GRPC connection age args when they are set from command ---------- Co-authored-by: Katherine Yang <[email protected]>
…d test (#6713) * Modify HTTP frontend to return error code reflecting Triton error * Add test for dedicated HTTP error. Relax existing test on HTTP code * Address comment. Fix copyright
* Update README and versions for 23.12 branch * Bring back the README (#6671) * Bring back the README * main -> r23.12 * Remove L0_libtorch_nvfuser (#6674) * iGPU build refactor (#6684) * Fix iGPU CMakeFile tags (#6695) (#6698) * Unify iGPU test build with x86 ARM * adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1 * re-organizing some copies in Dockerfile.QA to fix igpu devel build * Pre-commit fix --------- Co-authored-by: kyle <[email protected]> * Update windows Dockerfile versions (#6672) Changing version to the latest one Co-authored-by: Misha Chornyi <[email protected]> * Remove README banner (#6719) * Update README --------- Co-authored-by: tanmayv25 <[email protected]> Co-authored-by: Jacky <[email protected]> Co-authored-by: kyle <[email protected]>
* testing approach with pre-built image * Build TensorRT-LLM * Disable Triton Build * Remove file * Update config * Change PATH variables * Update path * Update configuration for CMake * Getting back TRITON_BUILD flag * Revert missing files creation * Update configuration for the PyTorch installation * Update configuration for docker * Change the location * Update configuration * update config * Set CMake version to 3.27.7 * Fix double slash typo * remove unused strings * restore typo (#6680) * remove old line * fix line indentation * Update LD_LIBRARY_PATH for TensorRT-LLM * Adding TRT-LLM changes * remove TRT-LLM container from the argument list * Update indentation
* Update RE2 package location * Use only 1 parallel thread for build * Revert "Use only 1 parallel thread for build" This reverts commit 93eab3a.
* Add testing for zero tensors in PyTorch backend * Fix up * Review edit
* Do not fail test on insufficient hardware concurrency * Track instead of fail test if cannot replicate load while async unload * Add some TODOs for the sub-test
* Simplify cmake install command * Fix up * Review comment
* Add cmdline option to set model load retry. Add test * Fix copyright * Minor change on testing model * Remove unused import
- Extend L0_storage_S3 test timeout
* Patch L0_model_config with runtime * Add L0_pytorch_python_runtime * Update expected runtime field * Add test for escaping runtime * Add comments on unit test imports * Add invalid runtime test * User to build PyTorch env * Update copyright
* Test case * Update metrics.md * Fix alert * Add copyright * Update test * Improve pinned_memory_metrics_test.py * Update qa/L0_metrics/pinned_memory_metrics_test.py Co-authored-by: Ryan McCormick <[email protected]> * Update pinned_memory_metrics_test.py --------- Co-authored-by: Ryan McCormick <[email protected]>
Added support for OTel context propagation --------- Co-authored-by: Markus Hennerbichler <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>
This validates the change made to ../core wrt how model configuration mtime is handled.
* Run all cases with shm probe * Warmup test and then run multiple iterations * Log free shared memory on enter/exit of probe * Add shm probe to all tests * Add debug_str to shm_util * Refactor ensemble_io test, modify probe to check for growth rather than inequality * Improve stability of bls_tensor_lifecycle gpu memory tests * Add more visibility into failing model/case in python_unittest helper * [FIXME] Skip probe on certain subtests for now * [FIXME] Remove shm probe from test_restart on unhealthy stub * Start clean server run for each bls test case * Don't exit early on failure so logs can be properly collected * Restore bls test logic * Fix shm size compare * Print region name that leaked * Remove special handling on unittest * Remove debug str * Add enter and exit delay to shm leak probe --------- Co-authored-by: Ryan McCormick <[email protected]>
* Update trace_summary script * Remove GRPC_WAITREAD and Overhead
* Add gsutil cp retry helper function * Add max retry to GCS upload * Use simple sequential upload
Co-authored-by: Sai Kiran Polisetty <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: nnshah1
pvijayakrish force-pushed the ibhosale_nomodel_perf branch from f3ba200 to 827deb5 (January 15, 2025 17:13)
What does the PR do?
Splits L0_perf_nomodel into 2 tests to make debugging easier and to run PA for the custom backend.
What is currently a single test is split into 2.
Includes other fixes suggested by the ops team and the tools team.
This PR is meant for discussing the PA args and the fixes above.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
NA
Where should the reviewer start?
L0_perf_nomodel_new
Test plan:
None
https://gitlab-master.nvidia.com/dl/dgx/tritonserver/-/pipelines/19337854
Caveats:
Changing the PA argument values changes the measurements significantly.
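To make the caveat concrete for the PA args discussion, here is a minimal sketch of the measurement-related `perf_analyzer` options that are usually the ones in question. It is not taken from this PR: the helper `build_pa_command`, the model name `identity_model`, and every numeric value are hypothetical, and the actual test scripts may pass different options. The sketch only builds the command lines so two settings can be compared side by side; it does not run PA.

```python
import shlex


def build_pa_command(model, *, protocol="grpc", concurrency=1,
                     measurement_interval_ms=5000,
                     stability_pct=10, max_trials=10):
    """Build a perf_analyzer argument list (hypothetical helper, not from the PR)."""
    return [
        "perf_analyzer",
        "-m", model,                                   # model to load-test
        "-i", protocol,                                # grpc or http
        "--concurrency-range", f"{concurrency}:{concurrency}",
        # Length of each measurement window, in milliseconds.
        "--measurement-interval", str(measurement_interval_ms),
        # Allowed variation between windows before a result counts as stable.
        "--stability-percentage", str(stability_pct),
        # Maximum number of windows tried while looking for a stable result.
        "--max-trials", str(max_trials),
    ]


# Two illustrative settings for the same (hypothetical) model. Only the
# measurement options differ, so any change in reported numbers would come
# from the PA configuration rather than from the server.
short_window = build_pa_command("identity_model", measurement_interval_ms=1000)
long_window = build_pa_command("identity_model", measurement_interval_ms=10000,
                               stability_pct=5)

for cmd in (short_window, long_window):
    print(shlex.join(cmd))
```

Keeping these values in one place per test would make it easier to see which PA settings produced a given data point when comparing runs.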
Background
Kibana dashboard
https://gpuwa.nvidia.com/kibana/app/dashboards#/view/e18ad380-79e8-11ef-9f55-436af67f73cb?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-90d,to:now))
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
None