docs: update to fix autoscaling example command #7883

mattwittwer · 2024-12-16T00:21:00Z

What does the PR do?

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

Related PRs:

Where should the reviewer start?

Test plan:

CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

* Run all cases with shm probe
* Warmup test and then run multiple iterations
* Log free shared memory on enter/exit of probe
* Add shm probe to all tests
* Add debug_str to shm_util
* Refactor ensemble_io test, modify probe to check for growth rather than inequality
* Improve stability of bls_tensor_lifecycle gpu memory tests
* Add more visibility into failing model/case in python_unittest helper
* [FIXME] Skip probe on certain subtests for now
* [FIXME] Remove shm probe from test_restart on unhealthy stub
* Start clean server run for each bls test case
* Don't exit early on failure so logs can be properly collected
* Restore bls test logic
* Fix shm size compare
* Print region name that leaked
* Remove special handling on unittest
* Remove debug str
* Add enter and exit delay to shm leak probe

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Update README and versions for 2.42.0 / 24.01 (#6789)
* Update versions
* Update README and versions for 2.42.0 / 24.01
* Fix documentation generation (#6801)
* Set version of Sphinx to 5.0
* Set version 5.0.0
* Update README.md and versions post 24.01

* Add response statistics
* Add L0_response_statistics
* Enable http vs grpc statistics comparison
* Add docs for response statistics protocol
* Add more comments for response statistics test
* Remove model name from config
* Improve docs wordings
* [Continue] Improve docs wordings
* [Continue] Add more comments for response statistics test
* [Continue 2] Improve docs wordings
* Fix typo
* Remove mentioning decoupled from docs
* [Continue 3] Improve docs wordings
* [Continue 4] Improve docs wordings

Co-authored-by: Ryan McCormick <[email protected]>

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Modify "header_forward_pattern" to match headers case-insensitively. Add unit tests.
* Fix indentation
* Fix pre-commit errors
* Update doc
* Update copyright
* Add test case for "(?-i)", which disables regex case-insensitive mode.
* Fix pre-commit
* Name each test. Remove support of disabling --http-header-forward-pattern case-insensitive mode on http python client.
* Update .md file.
* Fix typo
* Reformat args.
* Fix pre-commit
* Fix test name issue.
* Fix pre-commit.
* Update md file and copyright.

kthui and others added 30 commits January 19, 2024 15:34

Remove boost::filesystem (#6810)

3bff367

Generate unittest xml reports from L0_python_api (#6822)

bc71da0

Add unit test reports to L0_json, L0_metrics, L0_response_cache, L0_b…

6192c6e

…ackend_python (#6823)

Update trace summary script (#6758)

a514a05

* Update trace_summery script * Remove GRPC_WAITREAD and Overhead

Add gsutil upload retry helper function (#6817)

28f497c

* Add gsutil cp retry helper function * Add max retry to GCS upload * Use simple sequential upload

Add test for shutdown while unloading in background (#6835)

ddfdb2a

Handle 0 dimension output for generate endpoint (#6833)

56e4232

* Handle empty output * Add test case for 0 dimension output * Fix up number of tests

tensorrt-llm benchmarking test (#6771)

d98a59c

* tensorrt-llm benchmarking test

Use libmamba solver for L0_backend_python env test. Fix pytest not fo…

d0e2653

…und (#6834) * Update miniconda version * Install pytest for different py version * Install pytest

Add test for shutdown while loading model (#6837)

f92732d

* Add test for shutdown while loading * Fix intermittent failure on test_model_config_overwrite

Adding OpenTelemetry Batch Span Processor (#6842)

776e641

Adding OpenTelemetry Batch Span Processor --------- Co-authored-by: Theo Clark <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>

Support Double-Type Inference Request/Response Parameters (#6755)

b0a495a

* Support Double-Type Infer/Response Parameters

Updating vllm version to 0.3.0 (#6858)

508929a

Python Backend Windows Support (#6830)

738c98f

* Base Python Backend Support for Windows

Add support for Oracle Cloud in deploy (#6850)

3d79568

Add link to TRTLLM metrics docs (#6874)

1df73dc

Add unit test reports to L0_dlpack_multi_gpu and L0_warmup (#6873)

4294cc6

* Add unit test reports to L0_dlpack_multi_gpu * Add unit test reports to L0_warmup

Set OV version to 2023.3.0 (#6880)

f078bfb

Fixing StringTo uint32_t used only by tracing (#6883)

80fc56c

Update 'main' to track development of 2.44.0 / 24.03 (#6892)

8a2a229

Fix busyop test for L0_memory_growth (#6900)

21a7fc5

* Switch to Python model for busyop test * Clean up * Address comment * Remove unused import

Add cancellation into response statistics (#6904)

60872b9

* Add cancellation into response statistics * Add test for response statistics cancel * Remove debugging print * Use is None comparison * Fix docs * Use default args None * Refactor RegisterModelStatistics()

Install required pip pkgs (#6906)

8d8b607

Add note on --cache-config spacing and fix typos (#6929)

551978b

Remove ignore files that are not in use by repository (#6893)

246f46c

Update README and versions for 2.43.0 / 24.02 (#6886)

1dcf2cf

* Update README and versions for 2.43.0 / 24.02 * Update Dockerfile to reduce image size. * Update path in patch file for model generation Update README.md post-24.02

yinggeh and others added 27 commits November 5, 2024 20:28

test: Test per-model metric customization and document custom histogr…

6191c67

…am metric buckets (#7752)

fix: Fixing pip installation as a system package (#7768)

4725600

fix: Adding copyright support for .pyi files (#7769)

5f8f07b

fix: Skip copyrights check for "expected" files in L0_model_config (#…

0269a3c

…7770)

Update 'main' to track development of 2.53.0 / 24.12 (#7771)

51b304f

test: OpenAI frontend invalid chat tokenizer network issue WAR (#7779)

d2ecac1

Update ONNX version for generated models (#7785)

60f22e4

test: RHEL Filesystem Tests (#7788)

3c7a263

Update model generation scenario (#7793) (#7797)

66026e5

fix: Fix L0_input_validation (#7800)

d4d9ebc

build: Support RHEL ORT TensorRT Execution Provider (#7812)

3815390

Co-authored-by: Kyle McGill <[email protected]>

ci: modifying stat count for L0_server_status (#7820)

2eb481d

build: update build.py to pass versions as input parameter and conver…

fb89be7

…t version map to dictionary (#7500) Co-authored-by: Olga Andreeva <[email protected]> Co-authored-by: Kyle McGill <[email protected]>

fix: Resolve integer overflow in Load API file decoding (#7787)

16154f2

Co-authored-by: GuanLuo <[email protected]>

feat: Enable deferred unregistering of shared memory regions after in…

eb1d290

…ference (#7743)

ci: Fix L0_cuda_shared_memory (#7832)

9e181b9

Update main branch post 24.11 (#7829)

3ac229e

Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: Jacky <[email protected]> Co-authored-by: Kris Hung <[email protected]>

build: Update OpenVINO model generation script with new API (#7811)

82bcdc4

fix: L0_sequence_batcher_cudashm (#7852)

cffd318

fix: gRPC segfault due to Low Request Cancellation Timeout (#7840)

788802c

ci: RHEL8 L0_backend_python Support (#7859)

0d6b9b4

Co-authored-by: Kyle McGill <[email protected]>

fix: Lock httpx version to fix L0_openai--trtllm test failures (#7870)

c87259a

fix: Remove .Server subclass to reflect 24.12 tritonfrontend version (#…

440c827

…7871)

test: Fix requested output deleting extra outputs (#7866)

11af829

Update generated Dockerfile (#7876)

fc0fe6b

build: Adding b64 dependency to relevant targets (fix L0_build_varian…

e8a6090

…ts) (#7855)

update autoscaling example command

5d35e70

mattwittwer marked this pull request as draft December 16, 2024 00:21

mattwittwer changed the title ~~docs update to fix autoscaling example command~~ docs: update to fix autoscaling example command Dec 16, 2024

pvijayakrish force-pushed the mwittwer/docs-update-autoscaling-tag branch from 2963d69 to 5d35e70 Compare January 15, 2025 17:13
