Build: Fixing model generation #7763
Open · pvijayakrish wants to merge 3,446 commits into main from pvijayakrish-model-generation
Conversation
* Unify iGPU test build with x86 ARM
* adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1
* re-organizing some copies in Dockerfile.QA to fix igpu devel build
* Pre-commit fix
---------
Co-authored-by: kyle <[email protected]>
* adding default value for TRITON_IGPU_BUILD=OFF * fix newline --------- Co-authored-by: kyle <[email protected]>
* Add test case for decoupled model raising exception * Remove unused import * Address comment
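For context, the scenario that test exercises is a decoupled Python-backend model raising from `execute()`. Below is a minimal, hedged sketch of such a model; it follows the usual `triton_python_backend_utils` conventions but is an illustration, not code from this PR:

```python
# Hypothetical decoupled model that raises an exception for the test to catch.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # Close the decoupled response stream first so the client is not
            # left waiting, then raise to simulate a model-side failure.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
            raise pb_utils.TritonModelException("intentional failure for the test")
```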
* vLLM Benchmarking Test
#6639) * Add ability to configure GRPC max connection age and max connection age grace * Allow passing GRPC connection age args when they are set from the command line ---------- Co-authored-by: Katherine Yang <[email protected]>
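The knobs behind this are standard gRPC channel arguments; the sketch below shows them on a plain grpcio server. The Triton command-line flag names added in #6639 are not reproduced here, only the underlying gRPC options, which do exist under these names in gRPC core:

```python
# Sketch: cap connection age on a gRPC server (not Triton's actual wiring).
from concurrent import futures
import grpc

server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=8),
    options=[
        # Cycle client connections after 5 minutes...
        ("grpc.max_connection_age_ms", 5 * 60 * 1000),
        # ...but give in-flight RPCs a 30-second grace period to finish.
        ("grpc.max_connection_age_grace_ms", 30 * 1000),
    ],
)
server.add_insecure_port("0.0.0.0:8001")
server.start()
```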
…d test (#6713) * Modify HTTP frontend to return error code reflecting Triton error * Add test for dedicated HTTP error. Relax existing test on HTTP code * Address comment. Fix copyright
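A hedged sketch of the kind of check such a test might make: infer against a model that does not exist and assert the HTTP status reflects the Triton error rather than a generic failure code. The exact error-to-status mapping is assumed here, not taken from the PR:

```python
import requests

# Inference on a nonexistent model; "no_such_model" is a made-up name.
resp = requests.post(
    "http://localhost:8000/v2/models/no_such_model/infer",
    json={"inputs": []},
)
# Assumption: a NOT_FOUND Triton error now surfaces as HTTP 404.
assert resp.status_code == 404, resp.status_code
```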
* Update README and versions for 23.12 branch
* Bring back the README (#6671)
* Bring back the README
* main -> r23.12
* Remove L0_libtorch_nvfuser (#6674)
* iGPU build refactor (#6684)
* Fix iGPU CMakeFile tags (#6695) (#6698)
* Unify iGPU test build with x86 ARM
* adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1
* re-organizing some copies in Dockerfile.QA to fix igpu devel build
* Pre-commit fix
--------- Co-authored-by: kyle <[email protected]>
* Update windows Dockerfile versions (#6672) Changing version to the latest one Co-authored-by: Misha Chornyi <[email protected]>
* Remove README banner (#6719)
* Update README
---------
Co-authored-by: tanmayv25 <[email protected]>
Co-authored-by: Jacky <[email protected]>
Co-authored-by: kyle <[email protected]>
* testing approach with pre-built image
* Build TensorRT-LLM
* Disable Triton Build
* Remove file
* Update config
* Change PATH variables
* Update path
* Update configuration for CMake
* Getting back TRITON_BUILD flag
* Revert missing files creation
* Update configuration for the PyTorch installation
* Update configuration for docker
* Change the location
* Update configuration
* update config
* Set CMake version to 3.27.7
* Fix double slash typo
* remove unused strings
* restore typo (#6680)
* remove old line
* fix line indentation
* Update LD_LIBRARY_PATH for TensorRT-LLM
* Adding TRT-LLM changes
* Remove TRT-LLM container from the argument list
* Update indentation
* Update RE2 package location * Use only 1 parallel thread for build * Revert "Use only 1 parallel thread for build" This reverts commit 93eab3a.
* Add testing for zero tensors in PyTorch backend * Fix up * Review edit
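For illustration, a zero-element tensor round trip through the client API might look like the sketch below. The model name `identity_fp32` and the echoing behavior are assumptions, not part of this PR:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
zero = np.zeros((1, 0), dtype=np.float32)  # valid shape with zero elements

inp = httpclient.InferInput("INPUT0", list(zero.shape), "FP32")
inp.set_data_from_numpy(zero)

# Assumes an identity model that echoes INPUT0 back as OUTPUT0.
result = client.infer("identity_fp32", [inp])
assert result.as_numpy("OUTPUT0").size == 0
```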
* Do not fail the test on insufficient hardware concurrency * Track, rather than fail the test, if the load cannot be replicated during an async unload * Add some TODOs for the sub-test
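The skip-instead-of-fail pattern is straightforward; a minimal sketch, with an illustrative core-count threshold that is not taken from the PR:

```python
import os
import unittest

REQUIRED_CORES = 4  # assumed requirement for the load-replication sub-test


class ConcurrencyTest(unittest.TestCase):
    @unittest.skipIf(
        (os.cpu_count() or 1) < REQUIRED_CORES,
        "insufficient hardware concurrency to replicate load",
    )
    def test_load_while_async_unload(self):
        ...
```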
* Simplify cmake install command * Fix up * Review comment
* Add cmdline option to set model load retry. Add test * Fix copyright * Minor change on testing model * Remove unused import
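The retry behavior itself amounts to a bounded reload loop. The sketch below is illustrative only: the real change is a tritonserver command-line option whose exact flag name is not reproduced here, and `load_model` stands in for the server's internal load call:

```python
import time


def load_with_retry(load_model, retries, delay_sec=1.0):
    """Attempt load_model up to retries+1 times before giving up."""
    for attempt in range(retries + 1):
        try:
            return load_model()
        except RuntimeError:  # assumed failure type for the sketch
            if attempt == retries:
                raise
            time.sleep(delay_sec)
```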
- Extend L0_storage_S3 test timeout
* Patch L0_model_config with runtime
* Add L0_pytorch_python_runtime
* Update expected runtime field
* Add test for escaping runtime
* Add comments on unit test imports
* Add invalid runtime test
* User to build PyTorch env
* Update copyright
* Test case
* Update metrics.md
* Fix alert
* Add copyright
* Update test
* Improve pinned_memory_metrics_test.py
* Update qa/L0_metrics/pinned_memory_metrics_test.py Co-authored-by: Ryan McCormick <[email protected]>
* Update pinned_memory_metrics_test.py
---------
Co-authored-by: Ryan McCormick <[email protected]>
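A test like this typically scrapes Triton's Prometheus endpoint. A hedged sketch follows; the `nv_pinned_memory_pool` metric-name prefix is assumed from Triton's metrics.md conventions rather than taken from this diff:

```python
import requests

# Triton serves Prometheus metrics on port 8002 by default.
text = requests.get("http://localhost:8002/metrics").text
for line in text.splitlines():
    if line.startswith("nv_pinned_memory_pool"):
        print(line)  # e.g. pinned-memory pool total/used byte gauges
```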
Added support for OTel context propagation --------- Co-authored-by: Markus Hennerbichler <[email protected]> Co-authored-by: Ryan McCormick <[email protected]>
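Client-side, OTel context propagation means injecting the current span context into request headers so the server can continue the same trace. A minimal sketch with the standard opentelemetry-api/sdk; the Triton server-side wiring from this commit is not shown, and passing the headers to the inference client is left as a comment:

```python
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())  # minimal SDK setup
tracer = trace.get_tracer("client")

with tracer.start_as_current_span("infer"):
    headers = {}
    inject(headers)  # writes a W3C `traceparent` header into the dict
    # Pass `headers` with the HTTP/GRPC inference request so the server
    # can extract the context and attach its spans to the same trace.
```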
This validates the change made to ../core with respect to how model configuration mtime is handled.
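The mtime idea being validated is the usual one: reload a model only when its configuration file's modification time has changed. A hypothetical sketch (paths and helper names are illustrative, not the core implementation):

```python
import os

config_path = "models/my_model/config.pbtxt"  # hypothetical model layout
last_mtime = os.path.getmtime(config_path)


def needs_reload():
    # True only if config.pbtxt was modified since we last recorded it.
    return os.path.getmtime(config_path) != last_mtime
```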
* Run all cases with shm probe
* Warmup test and then run multiple iterations
* Log free shared memory on enter/exit of probe
* Add shm probe to all tests
* Add debug_str to shm_util
* Refactor ensemble_io test, modify probe to check for growth rather than inequality
* Improve stability of bls_tensor_lifecycle gpu memory tests
* Add more visibility into failing model/case in python_unittest helper
* [FIXME] Skip probe on certain subtests for now
* [FIXME] Remove shm probe from test_restart on unhealthy stub
* Start clean server run for each bls test case
* Don't exit early on failure so logs can be properly collected
* Restore bls test logic
* Fix shm size compare
* Print region name that leaked
* Remove special handling on unittest
* Remove debug str
* Add enter and exit delay to shm leak probe
---------
Co-authored-by: Ryan McCormick <[email protected]>
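The enter/exit probe pattern described above can be sketched as a context manager that records free space in `/dev/shm` around a test and flags growth in usage (rather than any inequality, which would also trip on shrinkage). This is an assumed illustration of the idea, not the shm_util code:

```python
import shutil


class ShmProbe:
    """Flag shared-memory growth across a test body (Linux /dev/shm)."""

    def __enter__(self):
        self.free_before = shutil.disk_usage("/dev/shm").free
        return self

    def __exit__(self, *exc):
        free_after = shutil.disk_usage("/dev/shm").free
        leaked = self.free_before - free_after
        # Positive `leaked` means usage grew, i.e. a region was not freed.
        assert leaked <= 0, f"shared memory grew by {leaked} bytes"
```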
Co-authored-by: Ryan McCormick <[email protected]>
… outputs are not created (#7701)
Co-authored-by: Ryan McCormick <[email protected]>
…ent startup timeouts (#7730)
…er each model reload (#7735)
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Misha Chornyi <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Is this PR still active?
pvijayakrish force-pushed the pvijayakrish-model-generation branch from fa1b084 to a217efe on January 15, 2025, 17:13
What does the PR do?
Fixes model generation.
Checklist
<commit_type>: <Title>
Commit Type: check the conventional commit type box here and add the label to the GitHub PR.