-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Silently drop PDUs we fail to parse from federation, instead of failing entire transaction #17893
base: develop
Are you sure you want to change the base?
Changes from 4 commits
95cbc99
730822c
e89add8
07578bd
9953a36
f55e645
464e9cf
b6b5d81
89f91aa
3ca943a
b8ce471
b2916f1
7b3a695
78e04dd
df07930
f61de20
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir | ||
MadLittleMods marked this conversation as resolved.
Show resolved
Hide resolved
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,7 +56,11 @@ | |
SynapseError, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @morguldir Are you up for adding a test for this? Probably something in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is b6b5d81 what you had in mind? See also matrix-org/complement#743 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahh, I completely missed your Complement test in my initial review. The Synapse test is looking good 👍 |
||
UnsupportedRoomVersionError, | ||
) | ||
from synapse.api.room_versions import KNOWN_ROOM_VERSIONS, RoomVersion | ||
from synapse.api.room_versions import ( | ||
KNOWN_ROOM_VERSIONS, | ||
EventFormatVersions, | ||
RoomVersion, | ||
) | ||
from synapse.crypto.event_signing import compute_event_signature | ||
from synapse.events import EventBase | ||
from synapse.events.snapshot import EventContext | ||
|
@@ -432,6 +436,8 @@ async def _handle_pdus_in_txn( | |
|
||
newest_pdu_ts = 0 | ||
|
||
pdu_results = {} | ||
|
||
for p in transaction.pdus: | ||
# FIXME (richardv): I don't think this works: | ||
# https://github.com/matrix-org/synapse/issues/8429 | ||
|
@@ -469,14 +475,27 @@ async def _handle_pdus_in_txn( | |
logger.info("Ignoring PDU: %s", e) | ||
continue | ||
|
||
event = event_from_pdu_json(p, room_version) | ||
if possible_event_id != "<Unknown>": | ||
morguldir marked this conversation as resolved.
Show resolved
Hide resolved
morguldir marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if room_version.event_format != EventFormatVersions.ROOM_V1_V2: | ||
logger.info(f"Rejecting event {possible_event_id} from {origin} " | ||
f"because the event was made for a v1 room, " | ||
f"while {room_id} is a v{room_version.identifier} room") | ||
pdu_results[possible_event_id] = {"error": "Event ID incorrectly supplied in non-v1/v2 room"} | ||
morguldir marked this conversation as resolved.
Show resolved
Hide resolved
|
||
continue | ||
|
||
try: | ||
event = event_from_pdu_json(p, room_version) | ||
except Exception as e: | ||
if possible_event_id != "<Unknown>": | ||
pdu_results[possible_event_id] = {"error": f"Failed to convert json into event, {e}"} | ||
morguldir marked this conversation as resolved.
Show resolved
Hide resolved
|
||
logger.warning("Failed to parse event {possible_event_id} in transaction from {origin}, due to {e}") | ||
continue | ||
|
||
pdus_by_room.setdefault(room_id, []).append(event) | ||
|
||
if event.origin_server_ts > newest_pdu_ts: | ||
newest_pdu_ts = event.origin_server_ts | ||
|
||
pdu_results = {} | ||
|
||
# we can process different rooms in parallel (which is useful if they | ||
# require callouts to other servers to fetch missing events), but | ||
# impose a limit to avoid going too crazy with ram/cpu. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there more context for the issue you ran into specifically? Some issue that should be linked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think there's an issue for it, but basically unless the room is v11, the version key is actually unprotected. Unfortunately there was no cs api protection for redacting the create event in conduwuit, so a silly bot creator accidentally redacted everything in the room, making that particular room version fall back to "1" like the spec says
Once i finally returned 200 OK, all the pdus and edus that had been failing started to trickle in again at last
I think matrix.org was also in the room at the time, so it's similarly affected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you're talking about the behavior described here:
And the
m.room.create
event defines thecontent.room_version
.For my own reference, this was added in MSC2176 and included in room version 11 via MSC3820.
I see how this PR allows valid events from a transaction to be received while ignoring invalid ones.
In the scenario where a room falls back to a v1 room, does the
room_version
we have stored in Synapse get updated?If the
room_version
doesn't get updated, this just makes sure that the server can receive the final "valid" events in the room before it broke (people started sending v1 events). Is there an issue tracking whether Synapse should update theroom_version
after them.room.create
redaction?If the
room_version
does get updated to v1, then events are only accepted according to the order received (before/after receiving the redaction forcing the fallback).Perhaps we should have a test for this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this exist in Synapse somewhere already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cs api errors with
synapse/synapse/handlers/message.py
Lines 1855 to 1856 in 4be3bd4
i tried to send a redaction from construct (conduwuit would update the room version and fail to send the redaction as well), when it was a partial state room it seemed to error somewhat, but for a normal room it didn't seem like anything happened, the event didn't show in my timeline, but it was persisted on disk without errors.
I did an initial sync too, and the create event was still there, so i assume it just gets ignored? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also to be clear, the scenario is mostly on the conduwuit side in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing the sytest failure now there actually is an for it already, matrix-org/synapse#7543 / #7543
https://github.com/matrix-org/sytest/blob/9bff8eba1bdbdcdfcdbf8277c1515e07e356e37b/tests/50federation/51transactions.pl#L50-L58
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find! I've linked the issues in the PR description at the top
The Sytest says this:
I feel like given the nature of how
PUT /_matrix/federation/v1/send/{txnId}
is spec'ed, it makes sense that some PDU's can fail to process while others succeed. The response is literally a map of events to any potential errors when processing that specific event.Since the Sytest test also says "It is unclear if this is the correct behavior or not.", my interpretation could be correct. Probably should get a few more eyeballs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@morguldir I had this on the To-Discuss for the weekly meeting today but we ran out of time to discuss. This is the first thing on the board to discuss next week though 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the reviews so far 🙏