Further tweaks to bookloupe checks #726

Open
windymilla opened this issue Feb 2, 2025 · 4 comments
Labels
core feature Required for basic PPing

Comments

@windymilla
Collaborator

  1. Should a colon following a footnote number or letter be identified by "Query punctuation after"? (e.g. Footnote A:). Mentioned here
  2. Is "unspaced bracket" doing what is wanted (e.g. for footnote anchors)? Also here
@tangledhelix
Collaborator

I'm not sure how valuable "unspaced bracket" is for footnote anchors, but there are other cases where it will identify unprocessed DP markup:

  • Diacriticals: [=e] for ē
  • Scripts not handled via character picker: [Cyrillic: **]
  • RTL languages: [Hebrew: **], [Arabic: **]
  • Not-quite-right proofer notes PP'er may have missed otherwise: [* only one asterisk]

@tangledhelix
Collaborator

... about 60 seconds after I made that comment, I thought "should I have read the code first?"

Now I see that the code is looking for a bracket with a letter directly on both sides of it, so it probably won't find the 2nd or 3rd items in my list above (the script and RTL-language markup). It would still locate diacriticals, and potentially bad proofer notes.
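
To check that reading against the list above, here is a minimal sketch in Python. It assumes the check amounts to "any bracket with a letter directly on each side", and it uses the stdlib `re` module with `[^\W\d_]` standing in for `\p{Letter}`, which `re` does not support. The sample strings and the names `UNSPACED_BRACKET` and `flagged` are made up for illustration; they are not taken from the actual code.

```python
import re

# Stand-in for \p{Letter}: stdlib `re` has no \p{...} classes, so use
# "word character that is not a digit or underscore" instead.
LETTER = r"[^\W\d_]"

# Assumed shape of the check: any bracket with a letter on both sides.
UNSPACED_BRACKET = re.compile(rf"(?<={LETTER})[\[\]{{}}()](?={LETTER})")


def flagged(text: str) -> list[str]:
    """Return each flagged bracket together with the letter on either side."""
    return [
        text[m.start() - 1 : m.end() + 1]
        for m in UNSPACED_BRACKET.finditer(text)
    ]


# Hypothetical sample lines, one per item in the list above.
samples = {
    "diacritical": "a m[=e]re trifle",
    "cyrillic": "the word [Cyrillic: **] appears",
    "hebrew": "the word [Hebrew: **] appears",
    "proofer note": "some text[* only one asterisk]more text",
}
for name, text in samples.items():
    print(f"{name}: {flagged(text)}")
```

In this sketch the diacritical is only caught through its closing bracket (the `]` sits between `e` and `r`), and the proofer note is only caught because it runs straight into the next word. The script markup is not flagged at all, since neither of its brackets has a letter on both sides.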

@tangledhelix
Collaborator

Footnote anchors seem like something we could code around. They're caught in a case like this:

and they lived happily[A] ever after   ---> matches y[A

Footnote anchors follow a predictable pattern, so they're perfect for regex. We could detect and ignore them by inserting a negative lookahead into the positive lookahead (I've marked the new bit below):

(?<=\p{Letter})[][}{)(](?=\p{Letter}(?![\]]))
                                    ^------^

The negative lookahead is limited to ] intentionally, to more accurately match the use-case of ignoring a footnote.

There are edge cases this could miss. Imagine a really nasty scanno passing all the rounds, like windm]l]. That would defeat this regex (as long as the bad bracketing pattern occurred at the end of a word... otherwise it would still be caught, e.g. mi[l]meter). Further regex tomfoolery, or a second pass in code, might fix that edge case too.
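
The behaviour described above can be sanity-checked with a short Python sketch. It is an approximation rather than the real check: it uses the stdlib `re` module, so `[^\W\d_]` stands in for `\p{Letter}`, and the bracket class is written with escapes (`[\[\]{}()]`), which matches the same six characters. The names `PROPOSED` and `flagged` are hypothetical.

```python
import re

LETTER = r"[^\W\d_]"  # stand-in for \p{Letter}, which stdlib `re` lacks

# Proposed pattern: a bracket between two letters, unless the following
# letter is itself followed by "]" (the footnote-anchor shape, e.g. "[A]").
PROPOSED = re.compile(rf"(?<={LETTER})[\[\]{{}}()](?={LETTER}(?!\]))")


def flagged(text: str) -> list[str]:
    """Return each flagged bracket together with the letter on either side."""
    return [
        text[m.start() - 1 : m.end() + 1]
        for m in PROPOSED.finditer(text)
    ]


for text in ("happily[A] ever after", "windm]l]", "mi[l]meter", "happily[AA] ever"):
    print(text, "->", flagged(text))
```

Under these assumptions the single-letter anchor and `windm]l]` both come back empty, while `mi[l]meter` is still reported through its closing bracket (`l]m`). The last sample shows one more limit of the single-character lookahead: a two-letter anchor such as `[AA]` is still flagged.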

@windymilla
Collaborator Author

Also consider whether to allow "etc.," as a permissible case of double punctuation.
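
As a rough illustration of what such an exemption could look like, here is a Python sketch. It is not the existing double-punctuation logic: the set of punctuation marks, the pattern, and the function name `double_punctuation` are all assumptions made for the example.

```python
import re

# Hypothetical check: two adjacent marks from a small punctuation set.
DOUBLE_PUNCT = re.compile(r"[.,;:!?]{2}")

# The text immediately before an exempt "., " pair must end in the word "etc".
ENDS_WITH_ETC = re.compile(r"\betc$", re.IGNORECASE)


def double_punctuation(text: str) -> list[tuple[int, str]]:
    """Return (offset, marks) for each double punctuation, skipping "etc.,"."""
    hits = []
    for m in DOUBLE_PUNCT.finditer(text):
        # Exempt only the exact pair ".," when it directly follows "etc".
        if m.group() == ".," and ENDS_WITH_ETC.search(text[: m.start()]):
            continue
        hits.append((m.start(), m.group()))
    return hits


print(double_punctuation("apples, pears, etc., were sold"))
print(double_punctuation("He stopped,. then went on"))
```

Restricting the exemption to the exact pair `.,` keeps other combinations after "etc" (for example `etc.;`) reportable; whether that is the desired behaviour is part of the question raised above.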

@windymilla windymilla added the core feature Required for basic PPing label Feb 2, 2025

2 participants