EventStreams: Allow title matching using regular expressions #13

Open
stjohann opened this issue Dec 4, 2021 · 1 comment

Comments

@stjohann
Owner

stjohann commented Dec 4, 2021

Some server owners have long asked for a way to stream a defined set of pages through the bot.

I used to think the best way to do this would be something like glob patterns, but that has several problems. For one, you would have to either re-implement glob matching or pull in a library that does it. There is also the question of whether glob syntax would clash with actual MediaWiki titles. After researching this for a bit, I decided that simply allowing regular expressions (regexps) is good enough to meet this need.

Here are the theoretical requirements for any potential implementation:

  1. Regexps can be passed only to the --title parameter of the configuration.
  2. Regexps should be passed using the --title /.*/ syntax (i.e. always wrapped in //), since this keeps the number of parameters to a minimum and gives a simple way to tell what is a regexp and what is not (str.StartsWith('/')). This needs to account for articles like https://en.wikipedia.org/wiki//b/, which are unlikely to have their own stream feeds but probably still need some way to be referenced in EventStreams (e.g. :/b/?).
  3. The code should define a reasonable MatchTimeout (0.5 seconds?) and try/catch errors from slow regexps to prevent ReDoS attacks.
  4. Passed regexps should be tested against the timeout, and slow regexps should be rejected by the bot at the configuration step (!openStream).
  5. Passed regexps should match the whole string for clarity (^…$) and should be case-sensitive.
  6. (If we can find a way) Regexps should be kept as simple as possible in terms of the features allowed.
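
To make requirements 2 and 5 a bit more concrete, here is a rough sketch of how a --title value could be classified and matched. It is written in Python purely for illustration and is not the bot's actual code. The names TitleFilter and parse_title are hypothetical, and the leading ":" escape for literal titles such as /b/ is only the tentative idea from point 2, not a settled syntax. The timeout from points 3 and 4 is not covered here, since Python's standard re module has no match timeout.

```python
import re
from dataclasses import dataclass
from typing import Optional, Pattern


@dataclass
class TitleFilter:
    """Hypothetical holder for either a literal title or a compiled regexp."""
    literal: Optional[str] = None
    pattern: Optional[Pattern[str]] = None

    def matches(self, title: str) -> bool:
        if self.pattern is not None:
            # fullmatch requires the whole title to match, roughly like
            # wrapping the pattern in ^...$; no IGNORECASE flag is set,
            # so matching stays case-sensitive.
            return self.pattern.fullmatch(title) is not None
        return title == self.literal


def parse_title(value: str) -> TitleFilter:
    """Classify a --title value as a regexp (/.../) or a literal title."""
    # Assumed escape: a leading ':' forces a literal, e.g. ':/b/' -> '/b/'.
    if value.startswith(":"):
        return TitleFilter(literal=value[1:])
    # Anything wrapped in slashes with a non-empty body is treated as a regexp.
    if len(value) > 2 and value.startswith("/") and value.endswith("/"):
        try:
            return TitleFilter(pattern=re.compile(value[1:-1]))
        except re.error as exc:
            # Reject broken patterns at configuration time.
            raise ValueError(f"invalid regexp in --title: {exc}") from exc
    return TitleFilter(literal=value)
```

With this sketch, parse_title("/User:.*/") matches "User:Example" but not "Talk:User:Example", while parse_title(":/b/") is treated as the literal title /b/.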

There may be other notable things I have forgotten; if you read this issue and can think of any, please report them.

@stjohann
Owner Author

stjohann commented Aug 19, 2023

Another idea: add a --title-matches key (the name is open to discussion; --in-title?) that works only for --namespace streams, for simplicity. This would be easier to process and would require fewer changes to the current shaky structure of the code.
