-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix concurrent map panic on metadata #2
Fix concurrent map panic on metadata #2
Conversation
9f57df4
to
97fca02
Compare
metadata_supplier.go
Outdated
func inject(ctx context.Context, propagators propagation.TextMapPropagator, req *ttrpc.Request) context.Context { | ||
md, ok := ttrpc.GetMetadata(ctx) | ||
if !ok { | ||
md = make(ttrpc.MD) | ||
} else { | ||
// make a copy to avoid concurrent read/write panic | ||
md = copyMD(md) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be implemented in ttrpc.GetMetadata
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel letting caller decide if it should copy (if modification is needed) or not will be more flexible? and it's more similar to http.Header
.
E.g., if ttrpc.GetMetadata
returns a copy directly, then every extract
call will also make a copy (which is unnecessary since it's only read IIUC).
otelttrpc/metadata_supplier.go
Lines 99 to 103 in ea5083f
func extract(ctx context.Context, propagators propagation.TextMapPropagator) context.Context { | |
md, ok := ttrpc.GetMetadata(ctx) | |
if !ok { | |
md = make(ttrpc.MD) | |
} |
And also we need to modify other places (e.g., in containerd) where the caller already makes copy itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we'll anyway need to be doing cross-package synchronization (at least merge this, preferably tag a patch-release, then update containerd to point to the new release), maybe once we're at it, we should go all the way and
- review and merge ttrpc #177, and tag a new patch release
- switch here to using the new
ttrpc.MD.Clone()
introduced by #177, merge this, and tag a new patch release - point containerd to the newly tagged releases and update the
ttrpc.MD
-copying code in containerd to use the samettrpc.MD.Clone()
@cpuguy83 WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds great to me 👍 , happy to update this PR after containerd/ttrpc#177 is merged/released, and update containerd code once both PR are merged/released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this PR to use the new MD.Clone
from containerd/ttrpc#177
Thanks @fuweid for helping make the quick release! (https://github.com/containerd/ttrpc/releases/tag/v1.2.7)
97fca02
to
b9ec520
Compare
Signed-off-by: Jin Dong <[email protected]>
b9ec520
to
eb0c89c
Compare
312dbd7
to
0d7ed46
Compare
kindly ping @klihub, PTAL when you have some time 🙏 thanks! |
Signed-off-by: Jin Dong <[email protected]>
0d7ed46
to
2ba3be1
Compare
I'm fine in general with this PR. It is a necessary fix, and now that we export I have two small nits remaining. One is about getting rid of that post-go1.22 comment by changing the code so that its behavior is independent of the golang version used. The other is a more general one about bumping dependencies... and I'm not sure about this one at the deps-detail level. I haven't looked at all the changes there and, for instance, compare the bumps to the versions used in containerd itself, or something similar. @cpuguy83 Any thoughts on that one ? |
Hi @klihub thanks for the comments.
Good point, updated as you suggested.
Also replied inline, only ttrpc 1.2.2 -> 1.2.7 is manuall updated and all others are from On a side note, my personal preference (on deps) is to bump up regularly (i.e., not keep at minimum version like go version in go.mod):
But this is only my view :) happy to discuss more and follow the guidance of the project/community, thanks! |
Seems like there isn't really a choice here since the bumps are coming from ttrpc itself. |
Hi @klihub 👋 kindly ping, let me know if you have any other comments/suggestions on this PR, thank you! |
@djdongjin I think this is fine now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see any locking in MD.clone so we would still fall to concurrent read/write operations during the cloning func.. IOW do we need to serialize read/write to the metadata here?
Is there any similar chance for a parallel problem on calls to "extract?"
The solution is to clone if you need to modify anything. |
Yeah as Brian mentioned we will clone if we need to make modifications, so write always happens after read (clone).
IIUC, the case you describe is:
This PR fixes the issue on parallel step 2. (Otherwise N client all writes to MD_0). I think there won't be read/write map panics, since 1 and concurrent 2s are all read on MD_0; 3 can only happen after 2 completes the clone (write MD_I). (Let me know if this makes sense, maybe sounds a little confusing :)) |
:-) on the n clients case... inject A starts cloneA of metadata in ctx address "prior" and change occurs on cloneA's address which is set in ctx after performing inject... in parallel inject B started cloneB of address "prior" before inject A wrote overwrote "prior" with cloneA ... inject B finishes... overwrites cloneA with cloneB... Is that ^ possible? |
I think it's not possible, because neither inject A nor inject B will So in this case:
WDYT @mikebrow |
Ah I see now.. so each inject creates a new context thus there are N different copies in a context "tree" if you will and any reader will be in their branch of the tree... "func WithValue(parent Context, key, val any) Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fix (require release and update otelttrpc in containerd) containerd/containerd#11138
If the
ctx
has attrpc.MD
,inject
will reuse the samettrpc.MD
across requests, causing concurrent map write panic.This PR adds a UT for concurrent client/server request to capture the issue, and fix this by cloning
ttrpc.MD
oninject
.Running the UT without the fix (commit2) can repro the panic:
With the fix (test pass and CI green):