-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Silently drop PDUs we fail to parse from federation, instead of failing entire transaction #17893
base: develop
Are you sure you want to change the base?
Changes from 14 commits
95cbc99
730822c
e89add8
07578bd
9953a36
f55e645
464e9cf
b6b5d81
89f91aa
3ca943a
b8ce471
b2916f1
7b3a695
78e04dd
df07930
f61de20
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @morguldir it seems that this sytest test is currently failing with this PR. That is due to the test expecting the homeserver to return a 400 when a bad event is included in a transaction, whereas we're now just silently dropping them (and returning a 200). Could you create a PR to update that test please? I consider that a blocking change to merging this PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, perl didn't live up to the bad rumors so far :D, added in matrix-org/sytest#1391 Although according to halfie it sounds like you need to push the branch to sytest if you want CI to pass matrix-org/synapse#12754 👀 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -405,7 +405,14 @@ def __init__( | |
for name, sigs in event_dict.pop("signatures", {}).items() | ||
} | ||
|
||
assert "event_id" not in event_dict | ||
# An event should only have an event_id at this point if it's for a v1/v2 like room. | ||
# In future room versions, the `event_id` is derived from the event canonical JSON. | ||
|
||
# So if we see a `event_id` but the room version doesn't support | ||
# v1/v2 events, then it's invalid and we should reject it. | ||
assert ( | ||
"event_id" not in event_dict | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The assert has always been there, this just adds an error message and a comment |
||
), "Event ID should not be supplied in non-v1/v2 room" | ||
morguldir marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
unsigned = dict(event_dict.pop("unsigned", {})) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,7 +56,10 @@ | |
SynapseError, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @morguldir Are you up for adding a test for this? Probably something in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is b6b5d81 what you had in mind? See also matrix-org/complement#743 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahh, I completely missed your Complement test in my initial review. The Synapse test is looking good 👍 |
||
UnsupportedRoomVersionError, | ||
) | ||
from synapse.api.room_versions import KNOWN_ROOM_VERSIONS, RoomVersion | ||
from synapse.api.room_versions import ( | ||
KNOWN_ROOM_VERSIONS, | ||
RoomVersion, | ||
) | ||
from synapse.crypto.event_signing import compute_event_signature | ||
from synapse.events import EventBase | ||
from synapse.events.snapshot import EventContext | ||
|
@@ -129,6 +132,8 @@ | |
# federation. | ||
_INBOUND_EVENT_HANDLING_LOCK_NAME = "federation_inbound_pdu" | ||
|
||
_UNKNOWN_EVENT_ID = "<Unknown>" | ||
|
||
|
||
class FederationServer(FederationBase): | ||
def __init__(self, hs: "HomeServer"): | ||
|
@@ -432,6 +437,8 @@ async def _handle_pdus_in_txn( | |
|
||
newest_pdu_ts = 0 | ||
|
||
pdu_results = {} | ||
|
||
for p in transaction.pdus: | ||
# FIXME (richardv): I don't think this works: | ||
# https://github.com/matrix-org/synapse/issues/8429 | ||
|
@@ -446,7 +453,7 @@ async def _handle_pdus_in_txn( | |
# We try and pull out an event ID so that if later checks fail we | ||
# can log something sensible. We don't mandate an event ID here in | ||
# case future event formats get rid of the key. | ||
possible_event_id = p.get("event_id", "<Unknown>") | ||
possible_event_id = p.get("event_id", _UNKNOWN_EVENT_ID) | ||
|
||
# Now we get the room ID so that we can check that we know the | ||
# version of the room. | ||
|
@@ -469,14 +476,26 @@ async def _handle_pdus_in_txn( | |
logger.info("Ignoring PDU: %s", e) | ||
continue | ||
|
||
event = event_from_pdu_json(p, room_version) | ||
try: | ||
event = event_from_pdu_json(p, room_version) | ||
except Exception as e: | ||
# We can only provide feedback to the federating server if we can determine what the event_id is | ||
# but since we failed to parse the event, we can't derive the `event_id` so there is nothing | ||
# to use as the `pdu_results` key. Best we can do is just log for our own record and move on. | ||
morguldir marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if possible_event_id != _UNKNOWN_EVENT_ID: | ||
anoadragon453 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
pdu_results[possible_event_id] = { | ||
"error": f"Failed to convert JSON into event: {e}" | ||
} | ||
logger.warning( | ||
f"Failed to parse event {possible_event_id} in transaction from {origin}, due to: {e}" | ||
) | ||
continue | ||
|
||
pdus_by_room.setdefault(room_id, []).append(event) | ||
|
||
if event.origin_server_ts > newest_pdu_ts: | ||
newest_pdu_ts = event.origin_server_ts | ||
|
||
pdu_results = {} | ||
|
||
# we can process different rooms in parallel (which is useful if they | ||
# require callouts to other servers to fetch missing events), but | ||
# impose a limit to avoid going too crazy with ram/cpu. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a spec issue to clarify whether events for v3+ rooms containing an
event_id
field should be considered valid: matrix-org/matrix-spec#2027