Text Censoring #2518

Open: wants to merge 2 commits into main


MotoMatt5040

Text censoring has been implemented so that users can filter out words from a list they create themselves. The list is a JSON file of the form {lang: [words]}, which allows censoring in multiple languages. For censoring to work, censor must be set to True and a path to the file must be provided. If the path is incorrect, or the file cannot be found or opened, the censor turns itself off and the program runs as normal.
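
For reviewers, the fallback behaviour described above could look roughly like the sketch below. This is not the code in this PR: the helper name load_forbidden_words, its return shape, and the extra check on the JSON structure are assumptions made for illustration. Only censor, censor_path, and the {lang: [words]} layout come from the description.

```python
import json
import os


def load_forbidden_words(censor, censor_path):
    """Hypothetical helper: return (censor_enabled, words_by_language).

    Any problem with the path or file disables censoring instead of raising.
    """
    if not censor or not censor_path:
        return False, {}
    # Only accept an existing file with a .json extension.
    if not os.path.isfile(censor_path) or not censor_path.lower().endswith(".json"):
        return False, {}
    try:
        with open(censor_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Unreadable or malformed file: fall back to uncensored output.
        return False, {}
    # Assumed validation: expect {lang: [words]} with string entries.
    if not isinstance(data, dict) or not all(
        isinstance(words, list) and all(isinstance(w, str) for w in words)
        for words in data.values()
    ):
        return False, {}
    return True, data
```

A censor file in this layout would contain something like {"en": ["darn"], "es": ["caramba"]}; the words shown are placeholders.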

The re module has been added to the imports and censor_path has been added to the parameters. The goal is to let users create their own censor JSON file rather than have one supplied to them. If the censor flag is set, a check verifies that the file exists; if it does not exist or is not the proper file type, the censor is disabled. Both the segments and the full text are censored. The returned dict was assigned to a variable called "data" to make this possible. The alternative would be text=tokenizer.decode(all_tokens[len(initial_prompt_tokens) :]) if not censor else censor_text(tokenizer.decode(all_tokens[len(initial_prompt_tokens) :]), forbidden_words).... which is much more difficult to read.
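
As a rough illustration of censoring both the full text and the segments, here is a minimal sketch. The name censor_text and its (text, forbidden_words) arguments appear in the description above, but the body is an assumption: it treats forbidden_words as a flat list for one language, matches whole words case-insensitively, and masks them with asterisks. The result dict is made-up sample data, not real transcription output.

```python
import re


def censor_text(text, forbidden_words):
    """Replace each whole-word match with asterisks of the same length."""
    # Skip empty entries, which would otherwise match everywhere.
    words = [w for w in forbidden_words if w]
    if not words:
        return text
    # Longest first so a longer entry wins over an entry that is its prefix.
    words.sort(key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(w) for w in words) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)


# Sample data shaped like a dict with full text plus per-segment text.
result = {"text": " Well darn it.", "segments": [{"text": " Well darn it."}]}
forbidden = ["darn"]  # assumed: the word list for the detected language

result["text"] = censor_text(result["text"], forbidden)
for segment in result["segments"]:
    segment["text"] = censor_text(segment["text"], forbidden)
```

Escaping each entry with re.escape keeps punctuation in the user's list from being read as regex syntax, which is one way to reduce the risk of the censor misbehaving on unusual input.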

BREAKING CHANGE: I have not confirmed any issues yet, but the censor may misbehave if the JSON file is malformed or does not follow the expected structure.

Signed-off-by: matt@aero <[email protected]>
Removed data variable as it was redundant