Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#64] Implement copy paste protection #245
base: YuriRomanowski/#64-refactor-markdown-scanner
Are you sure you want to change the base?
[#64] Implement copy paste protection #245
Changes from 3 commits
44f21e5
d497a1b
2e0d0f6
a844583
fd71f97
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmhm, I'm worried about how it's integrated into existing verification:
Also currently this logic is quite hanging at the end, we could extract copy-paste checking to a separate function outside of
verifyRepo
and mostly avoid any duplication.But I suppose that simultaneously resolving all the 3 things I mentioned above is impossible.
So let's discuss what is worth doing here. How would you go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I somehow missed your first comment in this PR. Let's probably discuss it here.
Regarding the progress bar, options that came to my mind:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 3: I think we can't combine reference checks and copypaste checks in one progress bar, because the number of links for the latter check should be much fewer. We could of course say that check already passed for those links, but it's strange. And we will need the common storage for storing statuses of links, this will cause extra problems for the concurrency. It seems that there will be much more overhead than worth.
I mostly like option 4, most of the time the progress bar simply won't appear because this check is much faster than checking external links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice, checking on subsequences is singificantly better than checking on substrings I believe.
The downside of this is that we may now detect complete nonsense like
and the smaller the text is, the higher will be the chance of false triggering.
What do you think?
My thoughts - spoiler
Out of my head, we could split the text into subwords, letters of each subword will have to be encountered in a row in the link, while different subwords may appear separated.
Perhaps subwords don't even need to appear in the link+anchor in the same order, this should be recognized:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect 👍 I like this, let's do this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, there are some issues with this approach:
If link name is something like "file 3", how should we handle '3' digit? I think that we won't find a separate digit in the link. We also can't just search a word that contains a given word from the link name, because every git commit hash will likely contain digits. Because of this it's also unclear how to handle 's suffix in your example. Well, there is something to think about 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair 🤔 🤔