Factoids Rewrite #36

brigand · 2019-10-06T01:16:33Z

This is initial work for the reimplementation of factoids so we can sunset ecmabot in the near future (a bot we can't update isn't healthy). Please open issues or message me if other things need to be done before then.

Future plans (maybe tomorrow):

proper alias support
access control (users can propose factoids, elevated users can directly set factoids or approve proposals)
inline factoid usage, e.g. We recommend {!airbnb}

In case it's confusing, the .persistent.js file is immune to HMR, while .internal.js can be evicted from the cache.

Edit: will add some tests tomorrow.


brigand · 2019-10-06T01:18:41Z

src/plugins/factoids/factoidsPlugin.js

+    const [, key, value] = learnMatch;
+    if (key.length > 50) {
+      msg.respondWithMention(
+        `Is anyone going to remember a ${key.length} character trigger? Try something shorter (max 50)`,


Maybe a less sassy message; I dunno.

brigand · 2019-10-06T01:19:43Z

src/plugins/factoids/storage.internal.js

+      let i = 0;
+      i < MAX_ALIAS_DEPTH && cursor && cursor.type === 'alias';
+      i += 1
+    ) {


Please suggest a better way to write this loop with an iteration limit.
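One way to express the lookup as a plain `while` loop with an explicit counter, sketched under assumptions: the store is a `Map` from lowercase keys to entries shaped like `{ type: 'alias', value: targetKey }` or `{ type: 'factoid', value: text }`, and `MAX_ALIAS_DEPTH` is set to an arbitrary 5 because the diff does not show its value. `resolveFactoid` is a hypothetical name.

```javascript
// Hypothetical limit; the real MAX_ALIAS_DEPTH value is not shown in the diff.
const MAX_ALIAS_DEPTH = 5;

// Assumed entry shape: { type: 'alias', value: targetKey } or { type: 'factoid', value: text }.
// Returns the first non-alias entry reached from `key`, or null if the chain ends at a missing key.
function resolveFactoid(store, key) {
  let cursor = store.get(key);
  let hops = 0;
  while (cursor && cursor.type === 'alias') {
    if (hops >= MAX_ALIAS_DEPTH) {
      throw new Error('Alias depth exceeded when looking up a factoid.');
    }
    cursor = store.get(cursor.value);
    hops += 1;
  }
  return cursor || null;
}
```

With the limit check inside the loop body, the loop condition only describes what is being followed, and a circular alias ends in the same error instead of looping forever.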

would it be possible to resolve aliases at save-time, so read is always O(1)? a command could have an array of aliases

You can update the thing the alias points to.

me: !learn alias foo = bar
me: !learn bar = new value
me: !foo
jellobot: new value

We could do depth and circular reference checking at !learn time, though.

What does this output?

me: !learn alias foo = bar
me: !foo

I don't think it's necessary to store aliases separately, other than as an array on the command. The !foo command above could either fail, or succeed and create a bar command with aliases: ["foo"] and empty content (so jellobot outputs nothing until someone adds content to bar); both choices make sense.
I think that's simpler, no? In terms of storage structure, at least.

Or even if aliases are stored separately, it's possible to resolve them at save time. What I mean is, for example:

!learn foo = bar
!learn alias foo1 = foo
!learn alias foo2 = foo1
!learn alias foo3 = foo2

then foo3 should directly point to foo
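A sketch of that save-time resolution, using the same assumed `{ type, value }` entry shape as the lookup loop (the `learnAlias` helper is hypothetical). It keeps every alias exactly one hop from a real factoid by storing the target's destination when the target is itself an alias, and it refuses dangling aliases.

```javascript
// Assumed entry shape: { type: 'alias', value: targetKey } or { type: 'factoid', value: text }.
// Invariant maintained here: an alias always points directly at a factoid.
function learnAlias(store, aliasKey, targetKey) {
  const target = store.get(targetKey);
  if (!target) {
    throw new Error(`Cannot alias "${aliasKey}" to unknown factoid "${targetKey}".`);
  }
  // Follow one hop if the target is an alias; by the invariant, that lands on a factoid.
  const finalKey = target.type === 'alias' ? target.value : targetKey;
  if (finalKey === aliasKey) {
    throw new Error(`"${aliasKey}" cannot be an alias of itself.`);
  }
  store.set(aliasKey, { type: 'alias', value: finalKey });
}
```

With the commands above, `foo1`, `foo2`, and `foo3` all end up stored as aliases of `foo`. The invariant breaks if an existing factoid is later turned into an alias while other aliases still point at it; handling that case is extra bookkeeping that resolving at read time avoids.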

If we wanted O(1) lookup, we'd need to maintain a mapping of aliases to factoids in memory, which is extra synchronization work. I'm not worried about the cost of resolving a few aliases on factoid usage; especially considering the cost is only paid when it is an actual alias.

Storing aliases on the thing they point to is much more expensive, and the data model would allow creating the same alias twice without special checks for it existing anywhere else, as well as !learn conflicting with an alias. The way both the previous and new file format deal with aliases seems to solve all of these problems fairly well.

I think I'd like to prevent dangling aliases, simply because they would typically be a mistake and would make it easier to end up with a circular alias.

Limiting alias depth also limits complexity (to users of the bot), and I'm not convinced we even need more than one level of aliases (i.e. it might be fine to forbid an alias to an alias).

brigand · 2019-10-06T01:20:53Z

src/plugins/factoids/storage.internal.js

+      );
+    }
+
+    if (value.length > 400) {


In IRC, what is the message limit actually measured in? Is it bytes? Might need to correct this.

is it hardcoded in the protocol, or determined by the server?

in node-irc and splitlong.pl, the max length is calculated as

497 - nick length - hostmask length

the protocol gives a limit of 510 /bytes/ but things like PRIVMSG decrease this number

you can see how 497 is calculated here: https://github.com/irssi/scripts.irssi.org/blob/master/scripts/splitlong.pl#L35

Thanks for clarifying. In that case, we should do utf-16 string -> utf8 byte length and check, right?

ah, hadn't realised this - have just been checking the JS string length

I doubt the edge cases that affect it will crop up in factoids, but it definitely makes sense to do so.

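using this to calculate it now:

function byteCount(s) {
  // encodeURI turns every non-ASCII UTF-8 byte into a %XX escape, so counting
  // each escape (or each remaining single character) gives the byte length.
  // Note: encodeURI throws a URIError on a lone surrogate.
  return encodeURI(s).split(/%..|./).length - 1;
}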
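In Node.js the UTF-8 length can also be taken directly with `Buffer.byteLength`, which avoids the `URIError` that `encodeURI` raises on a lone surrogate. A sketch that combines it with the `497 - nick length - hostmask length` formula quoted above; `maxMessageBytes` and `fitsInOneMessage` are hypothetical helpers, and where the bot obtains its current nick and hostmask is left to the caller.

```javascript
// Node.js: number of bytes the string occupies once encoded as UTF-8.
function utf8ByteLength(text) {
  return Buffer.byteLength(text, 'utf8');
}

// Limit formula quoted in the thread from node-irc / splitlong.pl.
function maxMessageBytes(nick, hostmask) {
  return 497 - utf8ByteLength(nick) - utf8ByteLength(hostmask);
}

// True if `text` can be sent as a single message without being split.
function fitsInOneMessage(text, nick, hostmask) {
  return utf8ByteLength(text) <= maxMessageBytes(nick, hostmask);
}
```

The fixed 400 limit in the diff could then be checked against this byte count rather than against `value.length`, which counts UTF-16 code units.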

brigand · 2019-10-06T01:22:12Z

src/plugins/factoids/storage.internal.js

+    this._dirty = true;
+  }
+
+  async writeToDisk() {


Need a sanity check here. The goal is to not have two of these running concurrently.

brigand · 2019-10-06T01:22:40Z

src/plugins/factoids/storage.internal.js

+    return !this._loaded;
+  }
+
+  async loadFromDisk() {


Need a sanity check here. The goal is to not have two of these running concurrently.
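Both sanity checks could share one small guard that keeps the in-flight promise and hands it to any caller arriving while the operation is still running. A minimal sketch; `singleFlight`, the injected `writeFn`/`readFn` callbacks, and this trimmed-down `Store` are hypothetical and only mirror the `_dirty` flag visible in the diff. The write loop repeats while changes arrived mid-write, so a caller that joins a running write does not lose its change.

```javascript
// Wrap an async function so concurrent callers share a single in-flight run.
function singleFlight(fn) {
  let inFlight = null;
  return function run(...args) {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(() => fn.apply(this, args))
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  };
}

class Store {
  // writeFn(entries) and readFn() are hypothetical async I/O callbacks.
  constructor(writeFn, readFn) {
    this._entries = new Map();
    this._dirty = false;

    this.writeToDisk = singleFlight(async () => {
      // Write again if set() was called while the previous write was in progress.
      do {
        this._dirty = false;
        await writeFn(this._entries);
      } while (this._dirty);
    });

    this.loadFromDisk = singleFlight(async () => {
      this._entries = await readFn();
      this._dirty = false;
    });
  }

  set(key, value) {
    this._entries.set(key, value);
    this._dirty = true;
  }
}
```

This only prevents two writes (or two loads) from overlapping; it does not order a load against a write. That would need a single shared queue for both operations.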

brigand · 2019-10-06T01:24:01Z

@ljharb @caub please review this if you have time.

Changes without the extra lint fixes

ljharb · 2019-10-06T17:26:55Z

What's the overarching goal here? What's wrong with the current factoids system and format that needs changing?

brigand · 2019-10-06T18:22:59Z

I would like to work on the new features mentioned above, and be able to add new features in the future.

I actually intend to change the format a little bit more. It'll be simpler if even the first !learn is in the changes array (with previous: null). The first item in the array is "most recent" and the last item is "the initial value".

This allows implementing access control + personal factoids with changes.find(item => item.editor === msg.from || item.global), where global is set by an elevated user on !learn, or on someone else's edit via a to-be-implemented command.

There's quite a bit of duplication in the current format, but only after the first edit. It's just pretty weird how the data is laid out. It would be quite awkward to implement various things in the current format.

ljharb · 2019-10-06T20:43:08Z

I'm quite interested in access controls.

When you say "personal factoids", however, that seems ripe for abuse. I actively would not want individuals to be able to store their own private factoids in the bot.

brigand · 2019-10-06T21:04:08Z

Sure, makes sense. Then the check for retrieving a factoid is reduced to changes.find(item => item.global), and the pending changes will only appear in the factoid file, waiting to be approved.


ljharb · 2019-10-06T21:12:43Z

src/plugins/factoids/storage.internal.js

+    }
+
+    if (cursor && cursor.type === 'alias') {
+      throw new Error(`Alias depth exceeded when looking up a factoid.`);


why is there a depth limit?

I can replace this with circular alias checking at insertion time.
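A sketch of that insertion-time check under the same assumed `{ type, value }` entry shape: before storing `aliasKey -> targetKey`, walk the chain starting at the target and reject the change if it leads back to the alias being defined. `wouldCreateCycle` is a hypothetical helper; the `seen` set also guarantees the walk terminates even if the stored data already contains a loop.

```javascript
// Returns true if storing `aliasKey` as an alias of `targetKey` would create a loop.
function wouldCreateCycle(store, aliasKey, targetKey) {
  const seen = new Set([aliasKey]);
  let key = targetKey;
  while (key !== null) {
    if (seen.has(key)) {
      return true;
    }
    seen.add(key);
    const entry = store.get(key);
    // Keep walking only through aliases; a factoid or a missing key ends the chain.
    key = entry && entry.type === 'alias' ? entry.value : null;
  }
  return false;
}
```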

ljharb · 2019-10-06T21:14:25Z

src/plugins/factoids/storage.internal.js

+    const entries = JSON.parse(await readFile(FILE_PATH, 'utf-8'));
+    for (const key of Object.keys(entries)) {
+      map.set(key, entries[key]);
+    }


const map = new Map(Object.entries(entries));


brigand · 2019-10-07T01:53:10Z

Some updates...

The config file now has an entry for factoid moderators.

{
  "plugins": {
    "jsEval": { "timeout": 5000 },
    "factoid": {
      "moderators": ["GreenJello", "Other", "Names", "2B", "Added"]
    }
  }
}

Anyone can !learn, and they get a different message depending on whether they're a moderator or not.

NormalUser: !learn foo = bar
bot: NormalUser, change proposed to "foo"

Moderator: !learn foo = bar
bot: Moderator, got it. I'll remember this for when "!foo" is used.

Moderator: !publish foo
bot: Moderator, done. Everyone will now see the previous draft when "!foo" is used.

The JSON file format has been reduced to the following:

  "classes": {
    "type": "factoid",
    "popularity": 5,
    "editors": [
      "ljharb",
      "some-user"
    ],
    "changes": [
      {
        "date": "2019-10-07T01:38:06.850Z",
        "editor": "some-user",
        "value": "is this java?",
        "live": false
      },
      {
        "date": "2019-01-29T00:18:51.526Z",
        "editor": "ljharb",
        "value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
        "live": true
      },
      {
        "editor": "mjcd",
        "date": "2019-01-28T12:43:13.327Z",
        "value": "Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)",
        "live": true
      }
    ]
  },

I might also get rid of "editors".

Given that data (note the first item is a proposed change, i.e. live: false):

Anyone: !classes
bot: Anyone, Class hierarchies? Don't do that! http://raganwald.com/2014/03/31/class-hierarchies-dont-do-that.html (See also, !inheritance)

Moderator: !publish classes
bot: Moderator, done. Everyone will now see the previous draft when "!classes" is used.

Anyone: !classes
bot: Anyone, is this java?

Also, !forget is implemented for both moderators and other users. In the store it simply adds a change where the value is null.
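The proposal and publish flow above could map onto the `changes` array roughly as follows. A sketch assuming `changes[0]` is the most recent entry and that `!publish` makes the newest unpublished entry live; the helper names are hypothetical, and whether a moderator's `!forget` also needs a `!publish` is not specified here.

```javascript
// What `!key` shows: the most recent live change, or null if there is none or it was forgotten.
function liveValue(factoid) {
  const live = factoid.changes.find((change) => change.live);
  return live ? live.value : null;
}

// !learn (value is a string) or !forget (value is null): record an unpublished change.
function proposeChange(factoid, editor, value, date = new Date()) {
  factoid.changes.unshift({
    date: date.toISOString(),
    editor,
    value,
    live: false,
  });
}

// !publish: make the newest change live. Returns false if it already is.
function publish(factoid) {
  const latest = factoid.changes[0];
  if (!latest || latest.live) {
    return false;
  }
  latest.live = true;
  return true;
}
```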

I still need to work out the best way to review factoids others have proposed. Suggestions welcome on that. Maybe we just take the most recent proposals and put them in a markdown file in a gist.

ljharb · 2019-10-07T17:53:00Z

@brigand moderator-only text should only appear in PM

brigand · 2019-10-07T18:01:45Z

You're free to run the commands in PM. Is that insufficient?

ljharb · 2019-10-07T18:30:02Z

Yes - it would be pretty bad if "being a moderator" meant that factoids I triggered (or that were triggered pointing at me) contained noise that was irrelevant to the majority of the people that saw it.

People without permissions should ideally not even know that those commands exist.

brigand · 2019-10-07T18:40:46Z

Triggering factoids doesn't consider the moderator status of anyone; only !learn and !publish consider this. You're always free to execute those in private messages, and if you do then no one will even know you're a moderator.

For those commands, I think it's important that we signal to users that e.g. !learn foo = bar won't be immediately visible when !foo is used. Likewise, !publish should present an error, even if only for the cases where someone who is a moderator is using a nick that isn't in the moderator list.

We could require these to be run in private messages, but I think we can leave the agency of where to invoke them in the hands of individual users.

Edit: to clarify my previous message, "Anyone" means any user on IRC, in the moderator list or not.

ljharb · 2019-10-07T19:07:06Z

If I wanted to truly hide being a moderator, I'd have to use !learn in a channel and not have the moderator-specific commands show up.

brigand · 2019-10-07T19:50:58Z

That actually doesn't really solve it, because there will be a difference if you do !learn foo = bar in a channel and someone immediately runs !foo. It'll either return the old value if you're not a moderator, or the new value if you are.

The only way around that is for the bot to make your edit live at a random time, simulating an approval from a human. I don't think we want to go that route.

What is your cause and level of concern with people knowing someone is a moderator?

ljharb · 2019-10-07T21:23:56Z

I suspect it will immediately and frequently generate complaints about "i want to be a moderator", "why is that person a moderator", etc; it will also generate confusion and frequent re-explanation about how permissions work.

brigand · 2019-10-07T23:41:54Z

I think you're right on that, but I'm not clear on what the solution is. Should we only allow commands like !learn in private messages?

ljharb · 2019-10-08T20:27:00Z

Maybe even moderator-created learn commands should require a PM-only publish command?

brigand · 2019-10-08T20:59:36Z

I like that idea. I'll implement it that way.

kirjavascript · 2019-10-13T22:41:37Z

src/plugins/factoids/factoidsPlugin.js

+
+  const moderators = (msg.selfConfig && msg.selfConfig.moderators) || [];
+  console.log({ self: msg.selfConfig });
+  const isModerator = moderators.includes(msg.from);


is this the only check done? people can circumvent this by changing their nicks to a mod's nick when the mod is not online

checking if the user is identified can resolve this. on freenode you can send ACC username to NickServ to see if they are. most other services / servers use STATUS username instead

There isn't a standard for this, but using ACC or STATUS has covered every server I've tried it on.

Yeah, would this work?

cache mapping msg.from (user sending the message) to account name

invalidate cache for a user on part, quit, nick change

use ACC or STATUS (based on config) as needed to populate the cache

failure of ACC/STATUS causes permission to be denied

"part" is overly aggressive invalidation as they may be in multiple channels, but I don't know how to work around that.
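A sketch of the cache described in the list above. Everything here is hypothetical: `AccountCache`, `resolveAccount`, and `lookupAccount(nick)`, an async function assumed to send ACC or STATUS to NickServ and resolve to the account name, or null when the nick is not identified. As a design choice, only positive results are cached, so someone who identifies later is not stuck with a stale "not identified" entry; a failed lookup denies permission as proposed.

```javascript
class AccountCache {
  constructor() {
    this._accountByNick = new Map();
  }

  get(nick) {
    return this._accountByNick.get(nick); // undefined means "not cached"
  }

  set(nick, account) {
    this._accountByNick.set(nick, account);
  }

  // Call on part and quit.
  forget(nick) {
    this._accountByNick.delete(nick);
  }

  // Call on nick change: neither nick can be trusted until it is looked up again.
  rename(oldNick, newNick) {
    this._accountByNick.delete(oldNick);
    this._accountByNick.delete(newNick);
  }
}

// Resolve `nick` to an account name, or null (permission denied) if unidentified or the lookup fails.
async function resolveAccount(cache, nick, lookupAccount) {
  const cached = cache.get(nick);
  if (cached !== undefined) {
    return cached;
  }
  try {
    const account = await lookupAccount(nick);
    if (account) {
      cache.set(nick, account);
    }
    return account || null;
  } catch (err) {
    return null;
  }
}
```

If the bot misses a part, quit, or nick change, the entry stays stale, which is exactly the risk raised below.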

not sure if the cache will buy you much at the cost of complexity and runtime cost

In my implementation I simply check on every elevated request. The difference is basically unnoticeable, so I had no desire to try and improve the speed.

That said, if you wanted to add caching, I can't think of a way to circumvent what you've outlined.

What happens if the bot misses a part, quit, or nick change?

How about this?

no cache (for now)

check username on !learn

if not logged in, reject the !learn

otherwise, store the proposed change with the account name rather than nick

check the username on !publish

if not logged in, reject the publish

if not in moderator list, reject the publish

otherwise make the change live
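The `!publish` side of that flow reduces to a small decision function. A sketch, assuming the caller has already resolved `account` (null when not identified) and knows whether the command arrived in a private message, per the PM-only `!publish` agreed earlier in the thread. It also assumes the moderator list would hold account names rather than nicks, and that names compare case-insensitively; `checkPublish` and the reply texts are hypothetical.

```javascript
// Returns { ok: true } or { ok: false, reason } for a !publish attempt.
function checkPublish({ account, isPrivateMessage }, moderators) {
  if (!isPrivateMessage) {
    return { ok: false, reason: 'Please use !publish in a private message.' };
  }
  if (!account) {
    return { ok: false, reason: 'You need to be identified with NickServ to publish.' };
  }
  // Assumed case-insensitive comparison of account names.
  const wanted = account.toLowerCase();
  const isModerator = moderators.some((name) => name.toLowerCase() === wanted);
  if (!isModerator) {
    return { ok: false, reason: 'Only moderators can publish.' };
  }
  return { ok: true };
}
```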

why reject the learn from an anon user? we don't get that problem with ecmabot, and anyone on freenode could edit the factoids directly in a PM, auth or not.

I would like to track the account name in the proposals to have a more reliable history of changes. That requires the user to be logged in.

We could alternatively allow anonymous !learn and store both an editorNick and an editorAccount, with the latter potentially being null.

Tracking both nick and account seems important regardless; it seems useful to then allow account to be null?
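Recording both, with a nullable account, could look like this minimal sketch. The `editorNick` / `editorAccount` field names come from the suggestion above; `makeChange` is hypothetical, and the remaining fields follow the JSON format shown earlier.

```javascript
// Build a proposed change; editorAccount is null for users not identified with services.
function makeChange({ nick, account }, value, date = new Date()) {
  return {
    date: date.toISOString(),
    editorNick: nick,
    editorAccount: account || null,
    value,
    live: false,
  };
}
```

A history view could then prefer `editorAccount` when present and fall back to `editorNick`.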

We can always lock it down further later if it becomes a problem.

caub · 2019-11-03T11:00:58Z

scripts/convert-factoids

+    date: toISOString(input.date),
+    value: input.value || input.alias || null,
+    live: true,
+  });


I'd prefer:

entry.changes = [
  {
    date: toISOString(input.date),
    editor: (input.creator || 'unknown').toLowerCase(),
    value: input.value || input.alias || null,
    live: true,
  },
  ...entry.changes.map((change) => ({
    date: toISOString(change.date),
    editor: (change.editor || 'unknown').toLowerCase(),
    value: change['new-value'],
    live: true,
  })),
];

caub · 2019-11-03T11:02:01Z

src/plugins/factoids/factoidsPlugin.js

-  if (msg.from === 'ecmabot') {
-    fs.writeFile('/tmp/disable-factoids', 'x', () => {});
+const readFile = promisify(fs.readFile);
+const writeFile = promisify(fs.writeFile);


can you use fs.promises.readFile/writeFile instead?
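With `fs.promises` (available in the Node 12.x used by CI) the `promisify` wrappers go away. A sketch that also folds in the `new Map(Object.entries(...))` suggestion for loading and uses `Object.fromEntries` (Node 12+) for saving; `loadEntries` and `saveEntries` are hypothetical names.

```javascript
const fs = require('fs');

// Read the factoid JSON file into a Map keyed by factoid name.
async function loadEntries(filePath) {
  const json = await fs.promises.readFile(filePath, 'utf8');
  return new Map(Object.entries(JSON.parse(json)));
}

// Serialize the Map back into a plain JSON object.
async function saveEntries(filePath, entries) {
  const json = JSON.stringify(Object.fromEntries(entries), null, 2);
  await fs.promises.writeFile(filePath, `${json}\n`, 'utf8');
}
```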

caub · 2019-11-03T11:11:51Z

src/plugins/factoids/storage.internal.js

+  return (key && key.toLowerCase()) || null;
+}
+
+class Store {


would it be worth using something like https://github.com/simonlast/node-persist or redis/sqlite as a key-value DB rather than implementing one?

brigand added 5 commits October 5, 2019 16:16

style(require): use .persistent. for more clear naming (15321a5)
chore: update factoids (8e2b5a3)
chore: factoid scripts and remove toml (64b095d)
feat(factoids): reimplement factoids and adds !learn (4029374)
chore(factoids): cleanup (ab311ba)


brigand added 3 commits October 5, 2019 18:24

chore(factoids): cleanup (ab38c84)
chore(lint): resolves all lint errors (2bb0a4b)
build(actions): run lint in pr action (6b80b03)


brigand added 5 commits October 6, 2019 14:35

ci: switch to npm, use node 12.x (e8d15af)
feat(factoids): wip trying to use private fields (7ee1e7e)
fix(factoids): use class fields properly (8222499)
enhance(factoids): use slugify for channel file name (3e38084)
feat: adds validation for non-diacritic text (b2f2cb0)

brigand added 8 commits October 6, 2019 17:20

updated factoids (0ea66fa)
feat(factoids): simplify the factoids data structure (fccfbb2)
refactor(factoids): separate parsing and behavior (8247cfb)
feat: better error handling/reporting (f5deeae)
enhance(factoids): improved messages and bug fixes (8caaba9)
feat(factoids): !publish <key> (07cf070)
fix(factoids): minor fixes (631920a)
chore: lint (58e794d)

brigand added 2 commits October 6, 2019 18:53

fix(factoids): misc fixes (4975a94)
chore(factoids): updates data file (00cfff5)

brigand added 4 commits October 10, 2019 10:58

updated factoids (a097c31)
feat(factoids): !learn always requires a separate !publish, and related (7a12579)
updated factoids (c98043a)
fix(factoids): use default name for very old factoids (0b57a42)


test(jseval): increase timeout for OSX hosts (0c07b50)
