Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Delete response factory after sending complete final #373

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

kthui
Copy link
Contributor

@kthui kthui commented Aug 2, 2024

What does the PR do?

Delete the response factory after the response sender sends a response with complete final flag. If the Python process did not get a chance to garbage collect the response sender before unloading the model, the response factory would be destructed which allows the model to gracefully unload.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

triton-inference-server/server#7504
triton-inference-server/vllm_backend#55

Where should the reviewer start?

N/A

Test plan:

The PR refactors how a response factory object can be deleted, it is neither a feature nor a bug fix, so existing tests should be sufficient to cover any regression.

A test enhancement is added for response sender to ensure the deleted response factory cannot be accidentally dereferenced. See triton-inference-server/server#7504 for more details.

  • CI Pipeline ID: 17156719

Caveats:

N/A

Background

N/A

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

@kthui kthui added the PR: refactor A code change that neither fixes a bug nor adds a feature label Aug 2, 2024
@kthui kthui marked this pull request as ready for review August 2, 2024 22:51
@GuanLuo
Copy link
Contributor

GuanLuo commented Aug 3, 2024

Should have test to show that the user will not be able to send any response via response sender after sending a final response

@kthui
Copy link
Contributor Author

kthui commented Aug 3, 2024

Should have test to show that the user will not be able to send any response via response sender after sending a final response

Have an indirect test, which sends the first response with complete final flag by returning a response on a non-decoupled model, and then send another response and complete final flag afterward. The client should only see the first returned response.

I can add a more direct test on decoupled model sending complete final flag two times.

@kthui
Copy link
Contributor Author

kthui commented Aug 6, 2024

Merging the refactor changes first, so the cancellation on response thread can be rebased on top. The test enhancement can be merged separately.

@kthui kthui merged commit 2203a5b into main Aug 6, 2024
3 checks passed
@kthui kthui deleted the jacky-res-factory-del branch August 6, 2024 23:40
@kthui kthui restored the jacky-res-factory-del branch August 7, 2024 01:10
@kthui kthui deleted the jacky-res-factory-del branch August 7, 2024 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: refactor A code change that neither fixes a bug nor adds a feature
Development

Successfully merging this pull request may close these issues.

4 participants