-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factoids Rewrite #36
base: master
Are you sure you want to change the base?
Factoids Rewrite #36
Conversation
const [, key, value] = learnMatch; | ||
if (key.length > 50) { | ||
msg.respondWithMention( | ||
`Is anyone going to remember a ${key.length} character trigger? Try something shorter (max 50)`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a less sassy message; I dunno.
let i = 0; | ||
i < MAX_ALIAS_DEPTH && cursor && cursor.type === 'alias'; | ||
i += 1 | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please recommend a better way to do this while loop with an iteration limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to resolve aliases at save-time, so read is always O(1)? a command could have an array of aliases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can update the thing the alias points to.
me: !learn alias foo = bar
me: !learn bar = new value
me: !foo
jelllbot: new value
We could do depth and circular reference checking at !learn
time, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this output?
me: !learn alias foo = bar
me: !foo
I think it's not necessary to store aliases apart, other than an array around the command, and the above !foo
command can either fail, or succeed and create a bar command with aliases: ["foo"], and empty content (so no output by jellobot, until someone adds content to bar), both choices make sense
I think that's simpler no? in term of storage structure
Or even if aliases are stored apart, it's possible to resolves them at save time, what I mean is for example:
!learn foo=bar
!learn alias foo1=foo
!learn alias foo2=foo1
!learn alias foo3=foo2
then foo3
should directly point to foo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted O(1) lookup, we'd need to maintain a mapping of aliases to factoids in memory, which is extra synchronization work. I'm not worried about the cost of resolving a few aliases on factoid usage; especially considering the cost is only paid when it is an actual alias.
Storing aliases on the thing they point to is much more expensive, and the data model would allow creating the same alias twice without special checks for it existing anywhere else, as well as !learn conflicting with an alias. The way both the previous and new file format deal with aliases seems to solve all of these problems fairly well.
I think I'd like to prevent a dangling alias simply because it would typically be a mistake, and make it easier to end up with a circular alias.
Limiting alias depth also limits complexity (to users of the bot), and I'm not convinced we even need more than one level of aliases (i.e. it might be fine to forbid an alias to an alias).
); | ||
} | ||
|
||
if (value.length > 400) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In IRC, what is the message limit actually measured in? Is it bytes? Might need to correct this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it hardcoded in the protocol, or determined by the server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in node-irc and splitlong.pl, the max length is calculated as
497 - nick length - hostmask length
the protocol gives a limit of 510 /bytes/ but things like PRIVMSG
decrease this number
you can see how 497 is calculated here: https://github.com/irssi/scripts.irssi.org/blob/master/scripts/splitlong.pl#L35
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. In that case, we should do utf-16 string -> utf8 byte length and check, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, hadn't realised this - have just been checking the JS string length
I doubt the edge cases that affect it will crop up in factoids, but it definitely makes sense to do so.
using this to calculate it now;
function byteCount(s) {
return encodeURI(s).split(/%..|./).length - 1;
}
this._dirty = true; | ||
} | ||
|
||
async writeToDisk() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a sanity check here. The goal is to not have two of these running concurrently.
return !this._loaded; | ||
} | ||
|
||
async loadFromDisk() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a sanity check here. The goal is to not have two of these running concurrently.
What's the overarching goal here? What's wrong with the current factoids system and format that needs changing? |
I would like to work on the new features mentioned above, and be able to add new features in the future. I actually intend to change the format a little bit more. It'll be simpler if even the first !learn is in the changes array (with This allows implementing access control + personal factoids with There's quite a bit of duplication in the current format, but only after the first edit. It's just pretty weird how the data is laid out. It would be quite awkward to implement various things in the current format. |
I'm quite interested in access controls. When you say "personal factoids", however, that seems rife for abuse. I actively would not want individuals to be able to store their own private factoids in the bot. |
Sure, makes sense. Then the check for retrieving a factoid is reduced to |
} | ||
|
||
if (cursor && cursor.type === 'alias') { | ||
throw new Error(`Alias depth exceeded when looking up a factoid.`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a depth limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can replace this with circular alias checking at insertion time.
const entries = JSON.parse(await readFile(FILE_PATH, 'utf-8')); | ||
for (const key of Object.keys(entries)) { | ||
map.set(key, entries[key]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const map = new Map(Object.entries(entries));
Some updates... The config file now has an entry for factoid moderators. {
"plugins": {
"jsEval": { "timeout": 5000 },
"factoid": {
"moderators": ["GreenJello", "Other", "Names", "2B", "Added"]
} Anyone can
The JSON file format has been reduced to the following: "classes": {
"type": "factoid",
"popularity": 5,
"editors": [
"ljharb",
"some-user"
],
"changes": [
{
"date": "2019-10-07T01:38:06.850Z",
"editor": "some-user",
"value": "is this java?",
"live": false
},
{
"date": "2019-01-29T00:18:51.526Z",
"editor": "ljharb",
"value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
"live": true
},
{
"editor": "mjcd",
"date": "2019-01-28T12:43:13.327Z",
"value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
"live": true
}
]
}, I might also get rid of "editors". Given that config (note the first item is a proposed change, i.e.
Also, !forget is implemented for both moderators and other users. In the store it simply adds a change where the I still need to work out the best way to review factoids others have proposed. Suggestions welcome on that. Maybe we just take the most recent proposals and put them in a markdown file in a gist. |
@brigand moderator-only text should only appear in PM |
You're free to run the commands in PM. Is that insufficient? |
Yes - it would be pretty bad if "being a moderator" meant that factoids i triggered (or that were triggered pointing at me) contained noise that was irrelevant to the majority of the people that saw it. People without permissions should ideally not even know that those commands exist. |
Triggering factoids doesn't consider the moderator status of anyone; only For those commands, I think it's important that we signal to users that e.g. We could require these to be run in private messages, but I think we can leave the agency of where to invoke them in the hands of individual users. Edit: to clarify my previous message, "Anyone" means any user on IRC, in the moderator list or not. |
If I wanted to truly hide being a moderator, I'd have to use |
That actually doesn't really solve it, because there will be a difference if you do The only way around that is for the bot to make your edit live at a random time, simulating an approval from a human. I don't think we want to go that route. What is your cause and level of concern with people knowing someone is a moderator? |
I suspect it will immediately and frequently generate complaints about "i want to be a moderator", "why is that person a moderator", etc; it will also generate confusion and frequent re-explanation about how permissions work. |
I think you're right on that, but I'm not clear on what the solution is. Should we only allow commands like |
Maybe even moderator-created |
I like that idea. I'll implement it that way. |
|
||
const moderators = (msg.selfConfig && msg.selfConfig.moderators) || []; | ||
console.log({ self: msg.selfConfig }); | ||
const isModerator = moderators.includes(msg.from); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the only check done? people can circumvent this by changing their nicks to a mod's nick when the mod is not online
checking if the user is identified can resolve this. on freenode you can send ACC username
to NickServ to see if they are. most other services / servers use STATUS username
instead
There isnt a standard for this but using ACC or STATUS has convered every server I've tried it on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, would this work?
- cache mapping msg.from (user sending the message) to account name
- invalidate cache for a user on part, quit, nick change
- use ACC or STATUS (based on config) as needed to populate the cache
- failure of ACC/STATUS causes permission to be denied
"part" is overly aggressive invalidation as they may be in multiple channels, but I don't know how to work around that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if the cache will buy you much at the cost of complexity and runtime cost
In my implementation I simply check on every elevated request. The difference is basically unnoticable so I had no desire to try and improve the speed.
That said, if you wanted to add caching I cant think of a way to circumvent what you've outlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the bot misses a part, quit, or nick change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
- no cache (for now)
- check username on !learn
- if not logged in, reject the !learn
- otherwise, store the proposed change with the account name rather than nick
- check the username on !publish
- if not logged in, reject the publish
- if not in moderator list, reject the publish
- otherwise make the change live
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why reject the learn from an anon user? we don't get that problem with ecmabot, and anyone on freenode could edit the factoids directly in a PM, auth or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to track the account name in the proposals to have a more reliable history of changes. That requires the user to be logged in.
We could alternatively allow anonymous !learn and do something having an editorNick
, editorAccount
, with the latter potentially being null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking both nick and account seems important regardless; it seems useful to then allow account to be null?
We can always lock it down further later if it becomes a problem.
date: toISOString(input.date), | ||
value: input.value || input.alias || null, | ||
live: true, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a
entry.changes = [
{
date: toISOString(input.date),
editor: (input.creator || 'unknown').toLowerCase(),
value: input.value || input.alias || null,
live: true,
},
...entry.changes.map((change) => ({
date: toISOString(change.date),
editor: (change.editor || 'unknown').toLowerCase(),
value: change['new-value'],
live: true,
}))
]
if (msg.from === 'ecmabot') { | ||
fs.writeFile('/tmp/disable-factoids', 'x', () => {}); | ||
const readFile = promisify(fs.readFile); | ||
const writeFile = promisify(fs.writeFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use fs.promises.readFile/writeFile rather?
return (key && key.toLowerCase()) || null; | ||
} | ||
|
||
class Store { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be worth using something like https://github.com/simonlast/node-persist or redis/sqlite as a key-value DB rather than implementing one?
This is initial work for the reimplementation of factoids so we can sunset ecmabot in the near future (a bot we can't update isn't healthy). Please open issues or message me if other things need to be done before then.
Future plans (maybe tomorrow):
We recommend {!airbnb}
In case it's confusing, the
.persistent.js
file is immune to HMR, while.internal.js
can be evicted from the cache.Edit: will add some tests tomorrow.