Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factoids Rewrite #36

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open

Factoids Rewrite #36

wants to merge 30 commits into from

Conversation

brigand
Copy link
Owner

@brigand brigand commented Oct 6, 2019

This is initial work for the reimplementation of factoids so we can sunset ecmabot in the near future (a bot we can't update isn't healthy). Please open issues or message me if other things need to be done before then.

Future plans (maybe tomorrow):

  • proper alias support
  • access control (users can propose factoids, elevated users can directly set factoids or approve proposals)
  • inline factoid usage, e.g. We recommend {!airbnb}

In case it's confusing, the .persistent.js file is immune to HMR, while .internal.js can be evicted from the cache.

Edit: will add some tests tomorrow.

const [, key, value] = learnMatch;
if (key.length > 50) {
msg.respondWithMention(
`Is anyone going to remember a ${key.length} character trigger? Try something shorter (max 50)`,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a less sassy message; I dunno.

let i = 0;
i < MAX_ALIAS_DEPTH && cursor && cursor.type === 'alias';
i += 1
) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please recommend a better way to do this while loop with an iteration limit.

Copy link
Contributor

@caub caub Oct 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to resolve aliases at save-time, so read is always O(1)? a command could have an array of aliases

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can update the thing the alias points to.

me: !learn alias foo = bar
me: !learn bar = new value
me: !foo
jelllbot: new value

We could do depth and circular reference checking at !learn time, though.

Copy link
Contributor

@caub caub Oct 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this output?

me: !learn alias foo = bar
me: !foo

I think it's not necessary to store aliases apart, other than an array around the command, and the above !foo command can either fail, or succeed and create a bar command with aliases: ["foo"], and empty content (so no output by jellobot, until someone adds content to bar), both choices make sense
I think that's simpler no? in term of storage structure

Or even if aliases are stored apart, it's possible to resolves them at save time, what I mean is for example:

!learn foo=bar
!learn alias foo1=foo
!learn alias foo2=foo1
!learn alias foo3=foo2

then foo3 should directly point to foo

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted O(1) lookup, we'd need to maintain a mapping of aliases to factoids in memory, which is extra synchronization work. I'm not worried about the cost of resolving a few aliases on factoid usage; especially considering the cost is only paid when it is an actual alias.

Storing aliases on the thing they point to is much more expensive, and the data model would allow creating the same alias twice without special checks for it existing anywhere else, as well as !learn conflicting with an alias. The way both the previous and new file format deal with aliases seems to solve all of these problems fairly well.

I think I'd like to prevent a dangling alias simply because it would typically be a mistake, and make it easier to end up with a circular alias.

Limiting alias depth also limits complexity (to users of the bot), and I'm not convinced we even need more than one level of aliases (i.e. it might be fine to forbid an alias to an alias).

);
}

if (value.length > 400) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In IRC, what is the message limit actually measured in? Is it bytes? Might need to correct this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it hardcoded in the protocol, or determined by the server?

Copy link

@kirjavascript kirjavascript Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in node-irc and splitlong.pl, the max length is calculated as

497 - nick length - hostmask length

the protocol gives a limit of 510 /bytes/ but things like PRIVMSG decrease this number

you can see how 497 is calculated here: https://github.com/irssi/scripts.irssi.org/blob/master/scripts/splitlong.pl#L35

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. In that case, we should do utf-16 string -> utf8 byte length and check, right?

Copy link

@kirjavascript kirjavascript Oct 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, hadn't realised this - have just been checking the JS string length

I doubt the edge cases that affect it will crop up in factoids, but it definitely makes sense to do so.

using this to calculate it now;

function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}

this._dirty = true;
}

async writeToDisk() {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a sanity check here. The goal is to not have two of these running concurrently.

return !this._loaded;
}

async loadFromDisk() {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a sanity check here. The goal is to not have two of these running concurrently.

@brigand
Copy link
Owner Author

brigand commented Oct 6, 2019

@ljharb @caub please review this if you have time.

Changes without the extra lint fixes

@ljharb
Copy link
Contributor

ljharb commented Oct 6, 2019

What's the overarching goal here? What's wrong with the current factoids system and format that needs changing?

@brigand
Copy link
Owner Author

brigand commented Oct 6, 2019

I would like to work on the new features mentioned above, and be able to add new features in the future.

I actually intend to change the format a little bit more. It'll be simpler if even the first !learn is in the changes array (with previous: null). The first item in the array is "most recent" and the last item is "the initial value".

This allows implementing access control + personal factoids with changes.find(item => item.editor === msg.from || item.global), where global is set by an elevated user on !learn, or on someone else's edit via a to-be-implemented command.

There's quite a bit of duplication in the current format, but only after the first edit. It's just pretty weird how the data is laid out. It would be quite awkward to implement various things in the current format.

@ljharb
Copy link
Contributor

ljharb commented Oct 6, 2019

I'm quite interested in access controls.

When you say "personal factoids", however, that seems rife for abuse. I actively would not want individuals to be able to store their own private factoids in the bot.

@brigand
Copy link
Owner Author

brigand commented Oct 6, 2019

Sure, makes sense. Then the check for retrieving a factoid is reduced to changes.find(item => item.global), and the pending changes will only appear in the factoid file, waiting to be approved.

}

if (cursor && cursor.type === 'alias') {
throw new Error(`Alias depth exceeded when looking up a factoid.`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a depth limit?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can replace this with circular alias checking at insertion time.

const entries = JSON.parse(await readFile(FILE_PATH, 'utf-8'));
for (const key of Object.keys(entries)) {
map.set(key, entries[key]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const map = new Map(Object.entries(entries));

@brigand
Copy link
Owner Author

brigand commented Oct 7, 2019

Some updates...

The config file now has an entry for factoid moderators.

{
  "plugins": {
    "jsEval": { "timeout": 5000 },
    "factoid": {
      "moderators": ["GreenJello", "Other", "Names", "2B", "Added"]
    }

Anyone can !learn and they get a different message depending on if they're a moderator or not.

NormalUser: !learn foo = bar
bot: NormalUser, change proposed to "foo"

Moderator: !learn foo = bar
bot: Moderator, got it. I'll remember this for when "!foo" is used.

Moderator: !publish foo
bot: Moderator, done. Everyone will now see the previous draft when "!foo" is used.

The JSON file format has been reduced to the following:

  "classes": {
    "type": "factoid",
    "popularity": 5,
    "editors": [
      "ljharb",
      "some-user"
    ],
    "changes": [
      {
        "date": "2019-10-07T01:38:06.850Z",
        "editor": "some-user",
        "value": "is this java?",
        "live": false
      },
      {
        "date": "2019-01-29T00:18:51.526Z",
        "editor": "ljharb",
        "value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
        "live": true
      },
      {
        "editor": "mjcd",
        "date": "2019-01-28T12:43:13.327Z",
        "value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
        "live": true
      }
    ]
  },

I might also get rid of "editors".

Given that config (note the first item is a proposed change, i.e. live: false):

Anyone: !classes
bot: Anyone, Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)

Moderator: !publish classes
bot: Moderator, done. Everyone will now see the previous draft when "!classes" is used.

Anyone: !classes
bot: Anyone, is this java?

Also, !forget is implemented for both moderators and other users. In the store it simply adds a change where the value is null.

I still need to work out the best way to review factoids others have proposed. Suggestions welcome on that. Maybe we just take the most recent proposals and put them in a markdown file in a gist.

@ljharb
Copy link
Contributor

ljharb commented Oct 7, 2019

@brigand moderator-only text should only appear in PM

@brigand
Copy link
Owner Author

brigand commented Oct 7, 2019

You're free to run the commands in PM. Is that insufficient?

@ljharb
Copy link
Contributor

ljharb commented Oct 7, 2019

Yes - it would be pretty bad if "being a moderator" meant that factoids i triggered (or that were triggered pointing at me) contained noise that was irrelevant to the majority of the people that saw it.

People without permissions should ideally not even know that those commands exist.

@brigand
Copy link
Owner Author

brigand commented Oct 7, 2019

Triggering factoids doesn't consider the moderator status of anyone; only !learn and !publish consider this. You're always free to execute those in private messages, and if you do then no one will even know you're a moderator.

For those commands, I think it's important that we signal to users that e.g. !learn foo = bar won't be immediately visible when !foo is used, and !publish should present an error, even if only for the cases where a human that is a moderator is presenting with a nick not in the moderator list.

We could require these to be run in private messages, but I think we can leave the agency of where to invoke them in the hands of individual users.

Edit: to clarify my previous message, "Anyone" means any user on IRC, in the moderator list or not.

@ljharb
Copy link
Contributor

ljharb commented Oct 7, 2019

If I wanted to truly hide being a moderator, I'd have to use !learn in a channel and not have the moderator-specific commands show up.

@brigand
Copy link
Owner Author

brigand commented Oct 7, 2019

That actually doesn't really solve it, because there will be a difference if you do !learn foo = bar in a channel and someone immediately runs !foo. It'll either return the old value if you're not a moderator, or the new value if you are.

The only way around that is for the bot to make your edit live at a random time, simulating an approval from a human. I don't think we want to go that route.

What is your cause and level of concern with people knowing someone is a moderator?

@ljharb
Copy link
Contributor

ljharb commented Oct 7, 2019

I suspect it will immediately and frequently generate complaints about "i want to be a moderator", "why is that person a moderator", etc; it will also generate confusion and frequent re-explanation about how permissions work.

@brigand
Copy link
Owner Author

brigand commented Oct 7, 2019

I think you're right on that, but I'm not clear on what the solution is. Should we only allow commands like !learn in private messages?

@ljharb
Copy link
Contributor

ljharb commented Oct 8, 2019

Maybe even moderator-created learn commands should require a PM-only publish command?

@brigand
Copy link
Owner Author

brigand commented Oct 8, 2019

I like that idea. I'll implement it that way.


const moderators = (msg.selfConfig && msg.selfConfig.moderators) || [];
console.log({ self: msg.selfConfig });
const isModerator = moderators.includes(msg.from);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only check done? people can circumvent this by changing their nicks to a mod's nick when the mod is not online

checking if the user is identified can resolve this. on freenode you can send ACC username to NickServ to see if they are. most other services / servers use STATUS username instead

There isnt a standard for this but using ACC or STATUS has convered every server I've tried it on

Copy link
Owner Author

@brigand brigand Oct 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, would this work?

  • cache mapping msg.from (user sending the message) to account name
  • invalidate cache for a user on part, quit, nick change
  • use ACC or STATUS (based on config) as needed to populate the cache
  • failure of ACC/STATUS causes permission to be denied

"part" is overly aggressive invalidation as they may be in multiple channels, but I don't know how to work around that.

Copy link

@kirjavascript kirjavascript Oct 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if the cache will buy you much at the cost of complexity and runtime cost

In my implementation I simply check on every elevated request. The difference is basically unnoticable so I had no desire to try and improve the speed.

That said, if you wanted to add caching I cant think of a way to circumvent what you've outlined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the bot misses a part, quit, or nick change?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

  • no cache (for now)
  • check username on !learn
    • if not logged in, reject the !learn
    • otherwise, store the proposed change with the account name rather than nick
  • check the username on !publish
    • if not logged in, reject the publish
    • if not in moderator list, reject the publish
    • otherwise make the change live

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why reject the learn from an anon user? we don't get that problem with ecmabot, and anyone on freenode could edit the factoids directly in a PM, auth or not.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to track the account name in the proposals to have a more reliable history of changes. That requires the user to be logged in.

We could alternatively allow anonymous !learn and do something having an editorNick, editorAccount, with the latter potentially being null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking both nick and account seems important regardless; it seems useful to then allow account to be null?

We can always lock it down further later if it becomes a problem.

date: toISOString(input.date),
value: input.value || input.alias || null,
live: true,
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a

entry.changes = [
  {
    date: toISOString(input.date),
    editor: (input.creator || 'unknown').toLowerCase(),
    value: input.value || input.alias || null,
    live: true,
  },
  ...entry.changes.map((change) => ({
    date: toISOString(change.date),
    editor: (change.editor || 'unknown').toLowerCase(),
    value: change['new-value'],
    live: true,
  }))
]

if (msg.from === 'ecmabot') {
fs.writeFile('/tmp/disable-factoids', 'x', () => {});
const readFile = promisify(fs.readFile);
const writeFile = promisify(fs.writeFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use fs.promises.readFile/writeFile rather?

return (key && key.toLowerCase()) || null;
}

class Store {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be worth using something like https://github.com/simonlast/node-persist or redis/sqlite as a key-value DB rather than implementing one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants