OF-2954: New feature: Spam Reporting #2661
This isn't quite ready yet, but seems to be functional. Pending work:
This commit provides a basic implementation of XEP-0377: Spam Reporting.
The changes include:
- persistent storage of spam reports
- a provider implementation to allow other spam reporting persistence providers
- an event listening mechanism
- an optional notification of admins
The XML element representation that's in a SpamReport should be a detached copy. This allows it to be used elsewhere, without one consumer's modifications affecting another's.
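A minimal sketch of the detached-copy idea. Openfire works with dom4j elements, where (as far as I know) `Element.createCopy()` serves this purpose; to stay self-contained, this sketch uses the JDK's `org.w3c.dom` API and `cloneNode(true)` instead. The class name, helper names and the sample element are illustrative assumptions, not code from this PR.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DetachedCopy {
    /** Deep copy with no parent; edits to the copy never reach the original. */
    static Element detachedCopy(Element original) {
        return (Element) original.cloneNode(true);
    }

    /** Builds an illustrative report element that is attached to a parent element. */
    static Element sampleReport() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element parent = doc.createElement("block");
            doc.appendChild(parent);
            Element report = doc.createElement("report");
            report.setAttribute("reason", "urn:xmpp:reporting:spam");
            parent.appendChild(report);
            return report;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element original = sampleReport();
        Element copy = detachedCopy(original);
        // One consumer modifies its copy; the original keeps its own value.
        copy.setAttribute("reason", "modified by one consumer");
        System.out.println(original.getAttribute("reason"));
        System.out.println(copy.getParentNode());
    }
}
```

Handing each consumer its own copy costs an allocation per consumer, but it removes any need to coordinate who may modify the element.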
When admins are notified of a new spam report, the report itself should be included. Supporting clients can then render it.
reporter VARCHAR(1024) NOT NULL,
reported VARCHAR(1024) NOT NULL,
reason VARCHAR(255) NOT NULL,
created BIGINT NOT NULL,
We could use a TIMESTAMP column here.
That's a good idea, but not for now: We've had trouble before finding a consistent way to represent a timestamp in all of the databases that we support, which is why we use a number instead. I'm not sure if this is still as impossible as it was in 2004, by the way, but I'd like Openfire to be consistent.
If we do change number for timestamp (which would be a good thing), we should do it for all columns that currently use a number.
Also see the remark at the top of https://download.igniterealtime.org/openfire/docs/latest/documentation/database-guide.html which describes this.
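With `created` stored as a number, the Java side has to convert between the column value and a time type. A minimal sketch of that round trip, assuming the BIGINT holds milliseconds since the Unix epoch (the unit is my assumption, and the class and method names are illustrative, not Openfire API):

```java
import java.time.Instant;

final class CreatedColumn {
    /** Value to write into the BIGINT column (assumed: epoch milliseconds). */
    static long toColumn(Instant created) {
        return created.toEpochMilli();
    }

    /** Reads the BIGINT value back into an Instant. */
    static Instant fromColumn(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        long stored = toColumn(now);
        System.out.println(stored + " -> " + fromColumn(stored));
    }
}
```

A plain number behaves the same on every supported database, which is the consistency argument made above. The cost is that the values are not human-readable in ad-hoc SQL queries.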
reporter VARCHAR(1024) NOT NULL,
reported VARCHAR(1024) NOT NULL,
reason VARCHAR(255) NOT NULL,
created BIGINT NOT NULL,
DB2 has a TIMESTAMP type. Other places in this file use CHAR for dates, which is very strange.
As above: we currently never use a TIMESTAMP field, which is certainly something I'd like to improve on. If we do improve on that, we should do it consistently, everywhere.
reported NVARCHAR(1024) NOT NULL,
reason NVARCHAR(255) NOT NULL,
created INTEGER NOT NULL,
"raw" LONG VARCHAR NOT NULL
Maybe rename the raw field?
It contains the "XML representation of the report", but maybe we only need the reported message, so that we can easily parse the spam and train spamd.
That column is not great, indeed. I'm in the middle of refactoring the database structure. I think this column will be dropped, and instead we'll have a table that can be JOINed with, to have zero-to-many reported stanzas. More on this later!
xmppserver/src/main/java/org/jivesoftware/openfire/spamreporting/SpamReport.java
@@ -130,13 +133,18 @@ else if ( iq.getType().equals( IQ.Type.set ) && "block".equals( iq.getChildEleme
}

final List<JID> toBlocks = new ArrayList<>();
final Set<SpamReport> reports = new HashSet<>();
It looks like SpamReport doesn't override equals(), so the Set won't detect duplicate reports. Maybe we can use a List here?
In general, a HashSet uses a lot of memory. If we can avoid it and use an ArrayList instead, that will reduce pressure on the GC.
Meh. Resizing ArrayLists when they are populated can be costly too. All such arguments border on premature optimization, which I'd like to avoid.
Instead, I prefer to look at the semantics:
- usage of Set, unlike List, suggests that the entities are all different.
- usage of List, unlike Set, suggests that entities are ordered.
In that sense, Set is a better fit for this use-case than List.
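The point about equals() can be made concrete. Without an equals()/hashCode() override, a HashSet compares elements by object identity, so two reports with identical fields both end up in the set. The classes below are hypothetical stand-ins, not the PR's SpamReport:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class SetSemantics {
    /** No equals()/hashCode(): HashSet falls back to object identity. */
    static final class IdentityReport {
        final String reporter;
        final String reported;

        IdentityReport(String reporter, String reported) {
            this.reporter = reporter;
            this.reported = reported;
        }
    }

    /** Value-based equality: reports with the same fields are equal. */
    static final class ValueReport {
        final String reporter;
        final String reported;

        ValueReport(String reporter, String reported) {
            this.reporter = reporter;
            this.reported = reported;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValueReport)) return false;
            ValueReport other = (ValueReport) o;
            return reporter.equals(other.reporter) && reported.equals(other.reported);
        }

        @Override
        public int hashCode() {
            return Objects.hash(reporter, reported);
        }
    }

    /** Adds two field-identical reports without equals(); both are kept. */
    static int identityCount() {
        Set<IdentityReport> set = new HashSet<>();
        set.add(new IdentityReport("a@example.org", "spammer@example.com"));
        set.add(new IdentityReport("a@example.org", "spammer@example.com"));
        return set.size();
    }

    /** Adds two field-identical reports with equals(); the duplicate is dropped. */
    static int valueCount() {
        Set<ValueReport> set = new HashSet<>();
        set.add(new ValueReport("a@example.org", "spammer@example.com"));
        set.add(new ValueReport("a@example.org", "spammer@example.com"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(identityCount()); // 2
        System.out.println(valueCount());    // 1
    }
}
```

Either behaviour is compatible with the semantic argument for Set. The override only matters if field-identical reports are meant to collapse into one.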
public synchronized Set<Text> getContext()
{
    if (context == null) {
        context = new HashSet<>(Text.allFromChildren(reportElement));
Here we have an allocation and a copy, two heavy operations.
public static List<Text> allFromChildren(final Element parentElement)
{
    final List<Text> result = new ArrayList<>();
We can set the initial capacity to texts.size().
listeners.forEach(listener -> {
    try {
        listener.receivedSpamReport(report);
    } catch (Throwable t) {
Maybe catch Exception rather than Throwable.
I'm interested in learning the rationale for that. I've always been going back and forth between the two without a clear reason. I sometimes wish to catch Error instances, I think - but maybe that's not always appropriate.
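The practical difference between the two can be sketched as follows. `Error` and `Exception` are both subclasses of `Throwable`, so `catch (Exception e)` lets an `Error` thrown by one listener escape the loop, and the remaining listeners are never called. The interface and method names below are illustrative stand-ins, not the PR's listener API:

```java
import java.util.ArrayList;
import java.util.List;

final class ListenerDispatch {
    interface Listener {
        void receivedSpamReport(String report);
    }

    /** Catches everything, including Errors. Returns the number of failed listeners. */
    static int dispatchCatchingThrowable(List<Listener> listeners, String report) {
        int failures = 0;
        for (Listener listener : listeners) {
            try {
                listener.receivedSpamReport(report);
            } catch (Throwable t) {
                failures++;
            }
        }
        return failures;
    }

    /** Catches Exceptions only; an Error from a listener propagates to the caller. */
    static int dispatchCatchingException(List<Listener> listeners, String report) {
        int failures = 0;
        for (Listener listener : listeners) {
            try {
                listener.receivedSpamReport(report);
            } catch (Exception e) {
                failures++;
            }
        }
        return failures;
    }

    /** Two misbehaving listeners followed by one that records what it receives. */
    static List<Listener> demoListeners(List<String> seen) {
        List<Listener> listeners = new ArrayList<>();
        listeners.add(r -> { throw new IllegalStateException("bad listener"); });
        listeners.add(r -> { throw new Error("fatal in listener"); });
        listeners.add(r -> { seen.add(r); });
        return listeners;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        System.out.println(dispatchCatchingThrowable(demoListeners(seen), "report-1"));
        System.out.println(seen);
    }
}
```

The trade-off is that catching `Throwable` also swallows conditions such as `OutOfMemoryError`, from which the JVM may not recover cleanly. Catching only `Exception` isolates listeners from ordinary failures while letting those conditions surface.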
notification.setBody(body);
notification.getElement().add(reportElement);

XMPPServer.getInstance().getAdmins().forEach(jid -> {
(minor, style) You could use a for-each loop, which is easier to understand and faster for compilers and linters to analyze.
Style is in the eye of the beholder. :) The linter in Intellij is perfectly able to reason about this.
return null;
final List<StanzaID> sids = StanzaID.allFromChildren(packet.getElement());
return sids.stream()
    .filter(stanzaID -> stanzaID.getBy().toString().equals(by))
The filter will be executed for all elements, so this will be slower than before.
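Whether the filter runs for every element depends on the terminal operation, which is not visible in the hunk above. Java streams are evaluated lazily, so if the pipeline ends in a short-circuiting operation such as findFirst() (an assumption about the rest of this method), a sequential stream stops calling the predicate at the first match. A minimal sketch that counts predicate calls, using plain strings in place of StanzaID:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyFilter {
    /** Returns how many times the predicate ran before findFirst() returned. */
    static int predicateCalls(List<String> bys, String wanted) {
        AtomicInteger calls = new AtomicInteger();
        Optional<String> first = bys.stream()
                .filter(by -> {
                    calls.incrementAndGet();
                    return by.equals(wanted);
                })
                .findFirst();
        return calls.get();
    }

    public static void main(String[] args) {
        List<String> bys = Arrays.asList("a.example", "b.example", "c.example", "d.example");
        System.out.println(predicateCalls(bys, "b.example")); // 2
    }
}
```

With a non-short-circuiting terminal operation such as collect(), the predicate does run once per element.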
Thanks for your feedback, @stokito! I've commented on most of them. Most of the others for me fall in similar categories of what I wrote elsewhere. This PR is still very much a work in progress, so do expect changes!
The data structure defined in XEP-0377 is only usable in context of IQ Blocking. It's not really usable for more generic incident reporting. The refactoring moves the database structure towards a more re-usable structure:
- it no longer contains the XEP-0377 element as raw XML
- it references 0 to many stanzas in a different table
- the stanzas are minimally represented by a StanzaID, but may include the full XMPP stanza.
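The refactored shape described above can be sketched as a small object model: a report that owns zero or more reported stanzas, each of which always has a stanza ID and optionally carries the full stanza. All class, field and method names here are hypothetical and do not come from this PR:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class ReportModel {
    /** A reported stanza: always a stanza ID, optionally the full stanza as XML text. */
    static final class ReportedStanza {
        private final String stanzaId;
        private final String rawStanza; // null when only the ID was reported

        ReportedStanza(String stanzaId, String rawStanza) {
            if (stanzaId == null) {
                throw new IllegalArgumentException("stanzaId is required");
            }
            this.stanzaId = stanzaId;
            this.rawStanza = rawStanza;
        }

        String getStanzaId() { return stanzaId; }

        Optional<String> getRawStanza() { return Optional.ofNullable(rawStanza); }
    }

    /** A report that references zero-to-many stanzas instead of embedding raw report XML. */
    static final class Report {
        private final String reporter;
        private final String reported;
        private final String reason;
        private final long created;
        private final List<ReportedStanza> stanzas;

        Report(String reporter, String reported, String reason, long created,
               List<ReportedStanza> stanzas) {
            this.reporter = reporter;
            this.reported = reported;
            this.reason = reason;
            this.created = created;
            // Defensive, read-only copy so callers cannot change the report afterwards.
            this.stanzas = Collections.unmodifiableList(new ArrayList<>(stanzas));
        }

        String getReporter() { return reporter; }
        String getReported() { return reported; }
        String getReason() { return reason; }
        long getCreated() { return created; }
        List<ReportedStanza> getStanzas() { return stanzas; }
    }
}
```

In a relational schema, the stanza list would correspond to a second table keyed by the report, which is what makes the zero-to-many JOIN possible.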
This adds a simple page to the admin console on which spam reports can be viewed. The page is heavily based on the audit log viewer.
I have now applied a major refactoring.
Another thing that I can't get working yet is for Conversations (the client that I'm using to report spam) to include a stanza-ID that identifies a spam message. This should be done by long-pressing a message from a stranger, as implemented here: https://codeberg.org/iNPUTmice/Conversations/src/branch/master/src/main/java/eu/siacs/conversations/ui/ConversationFragment.java#L1323-L1328 That fails for me. My theory is that Openfire doesn't add a stanza-ID to the messages. Does this need (changing in) the monitoring plugin?