Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ai/live: Add limited retries for segment publish #3315

Merged
merged 2 commits into from
Dec 17, 2024
Merged

Conversation

j0sh
Copy link
Collaborator

@j0sh j0sh commented Dec 13, 2024

Only retry if the error occurs before sending any data, and if the next segment hasn't arrived yet.

Depends on #3308

@github-actions github-actions bot added go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Dec 13, 2024
Base automatically changed from ja/check-slow-orchs to master December 16, 2024 22:51
Only retry if the error occurs before sending any data, and
if the next segment hasn't arrived yet.
@j0sh j0sh marked this pull request as ready for review December 16, 2024 23:54
@j0sh j0sh requested review from victorges, leszko and mjh1 December 16, 2024 23:54
Copy link

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 0% with 86 lines in your changes missing coverage. Please review.

Project coverage is 33.86599%. Comparing base (a842a0d) to head (2ea2080).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
trickle/trickle_publisher.go 0.00000% 48 Missing ⚠️
server/ai_live_video.go 0.00000% 33 Missing ⚠️
trickle/trickle_server.go 0.00000% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #3315         +/-   ##
===================================================
- Coverage   33.92247%   33.86599%   -0.05648%     
===================================================
  Files            141         141                 
  Lines          37173       37235         +62     
===================================================
  Hits           12610       12610                 
- Misses         23843       23905         +62     
  Partials         720         720                 
Files with missing lines Coverage Δ
trickle/trickle_server.go 0.00000% <0.00000%> (ø)
server/ai_live_video.go 0.00000% <0.00000%> (ø)
trickle/trickle_publisher.go 0.00000% <0.00000%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ac5972...2ea2080. Read the comment docs.

Files with missing lines Coverage Δ
trickle/trickle_server.go 0.00000% <0.00000%> (ø)
server/ai_live_video.go 0.00000% <0.00000%> (ø)
trickle/trickle_publisher.go 0.00000% <0.00000%> (ø)

Copy link
Member

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

}
clog.Infof(ctx, "Error publishing segment before writing; retrying err=%v", err)
// Clone in case read head was incremented somewhere, which cloning ressets
r = reader.Clone()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this seems maybe not 100% guaranteed, since it could happen that something was read from the reader and then failed to be sent to the writer (and we got 0 on L89).

I think there's no really easy option here apart from maybe an additional layer to check if the reader was used at some point, like a wrapped reader that flags if it was read. A built-in approach could be using a io.TeeReader to read into a buffer as we read from the input (which would also allow us to retry when some bytes have been read).

Anyway, if you think this is too much of a corner case, this LGTM as it's much simpler. But I think that conceptually it could still have this issue 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it could happen that something was read from the reader and then failed to be sent to the writer (and we got 0 on L89).

That is basically why we call Clone() (which resets the read head for the clone) so when we try to read it during the next iteration, it is starting from the beginning, regardless of the actual read head. Note that this isn't a "standard" IO reader but rather something specialized that allows for clean resets, nonblocking reads/writes, multiple readers, etc, without having to go through hoops like TeeReader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh that's cool! Didn't know we already had that custom impl. Pretty neat :)

@j0sh j0sh merged commit 706fda1 into master Dec 17, 2024
18 checks passed
@j0sh j0sh deleted the ja/retry-publish branch December 17, 2024 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AI Issues and PR related to the AI-video branch. go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants