
Decoupled Async Execute #7062

Merged: 8 commits merged into main from the jacky-py-aio branch on Apr 11, 2024

Conversation

@kthui (Contributor) commented on Apr 1, 2024

Related PR: triton-inference-server/python_backend#350

Tests for the Python backend decoupled async execute function. Covered cases (an illustrative model sketch follows this list):

  • An async execute function may start multiple coroutines, and they run concurrently.
  • Multiple async execute coroutines run concurrently.
  • An async execute coroutine that starts late but finishes early responds before another that starts early but finishes late.
  • A coroutine may raise an exception, and the exception is logged.
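
For context, a minimal sketch of what a decoupled async execute model could look like is shown below. It is an illustration only, not the actual test model from this PR: the tensor names ("WAIT_SECONDS", "DUMMY_OUT") and the sleep-based delay are assumptions, and it relies on the async execute support added in triton-inference-server/python_backend#350.

```python
# model.py -- illustrative sketch only, not the test model from this PR.
import asyncio

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    async def execute(self, requests):
        # Handle every request in its own coroutine so that a slow request
        # does not block a fast one (decoupled transaction policy).
        await asyncio.gather(*[self._respond(r) for r in requests])
        # Decoupled models return None; responses flow through the sender.
        return None

    async def _respond(self, request):
        sender = request.get_response_sender()
        # Hypothetical "WAIT_SECONDS" input controlling how long to wait.
        wait = pb_utils.get_input_tensor_by_name(request, "WAIT_SECONDS")
        await asyncio.sleep(float(wait.as_numpy().flatten()[0]))
        out = pb_utils.Tensor("DUMMY_OUT", np.array([1], dtype=np.float32))
        sender.send(
            pb_utils.InferenceResponse(output_tensors=[out]),
            flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL,
        )
```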

@kthui marked this pull request as ready for review on Apr 3, 2024 at 18:28
@Tabrizian (Member) left a comment:

Nice testing. Could you also profile this model with perf analyzer with high concurrencies just to make sure everything works fine?

@kthui (Contributor, Author) commented on Apr 8, 2024

> Nice testing. Could you also profile this model with perf analyzer with high concurrencies just to make sure everything works fine?

Yes, the perf_analyzer run on the "async_execute_decouple" model works great:

# perf_analyzer -i grpc -m async_execute_decouple --async --streaming true -b 8 --concurrency-range 128:128 --input-data zero
*** Measurement Settings ***
  Batch size: 8
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Latency limit: 0 msec
  Concurrency limit: 128 concurrent requests
  Using asynchronous calls for inference
  Detected decoupled model, using the first response for measuring latency
  Stabilizing using average latency

Request concurrency: 128
  Client: 
    Request count: 44624
    Throughput: 19743.8 infer/sec
    Response Throughput: 2467.97 infer/sec
    Avg latency: 51762 usec (standard deviation 4416 usec)
    p50 latency: 53131 usec
    p90 latency: 54035 usec
    p95 latency: 54909 usec
    p99 latency: 55042 usec
    
  Server: 
    Inference count: 357016
    Execution count: 44627
    Successful request count: 44627
    Avg request latency: 49176 usec (overhead 1 usec + queue 48797 usec + compute input 25 usec + compute infer 349 usec + compute output 3 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 128, throughput: 19743.8 infer/sec, latency 51762 usec
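
For anyone driving the model outside of perf_analyzer, a rough client-side sketch using tritonclient's gRPC streaming API follows; the model name matches "async_execute_decouple" from the run above, but the tensor name, shape, and datatype are assumptions carried over from the model sketch earlier in this thread.

```python
# Rough tritonclient gRPC streaming sketch; tensor details are assumptions.
import queue

import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()

def callback(result, error):
    # Every decoupled response (or error) arrives through this callback.
    responses.put(error if error is not None else result)

client = grpcclient.InferenceServerClient("localhost:8001")
client.start_stream(callback=callback)

inp = grpcclient.InferInput("WAIT_SECONDS", [1], "FP32")
inp.set_data_from_numpy(np.array([0.1], dtype=np.float32))
client.async_stream_infer(model_name="async_execute_decouple", inputs=[inp])

print(responses.get())  # the single (final) response from the model
client.stop_stream()
client.close()
```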

@kthui merged commit f1a515d into main on Apr 11, 2024
3 checks passed
@kthui deleted the jacky-py-aio branch on Apr 11, 2024 at 17:55
kthui added a commit that referenced this pull request Apr 11, 2024
* Add async execute decoupled test

* Add decoupled bls async exec test

* Enhance test with different durations for concurrent executes
mc-nv pushed a commit that referenced this pull request Apr 11, 2024
* Add async execute decoupled test

* Add decoupled bls async exec test

* Enhance test with different durations for concurrent executes