My colleague was working with this library for some NLP stuff, and he was trying to manipulate the CENSOR_WORDS for reasons that aren't particularly important to this question.
It got me wondering: wouldn't this all go a lot faster if CENSOR_WORDS were a set()? Forgive me if I'm wasting your time; I didn't fully trace the code.
It seems to me that a membership lookup against a very large collection of words or phrases would almost always be faster with a set, because a set works as a hash table under the Python covers: membership tests are O(1) on average, versus an O(n) scan for a list.
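To put a rough number on the claim, here's a minimal, illustrative benchmark (made-up words, not the library's actual CENSOR_WORDS), timing a miss, which is the worst case for the list since every element gets scanned:

import timeit

words_list = [f"word{i}" for i in range(100_000)]
words_set = set(words_list)

# "missing" is in neither collection, so the list scans all 100k entries.
t_list = timeit.timeit(lambda: "missing" in words_list, number=100)
t_set = timeit.timeit(lambda: "missing" in words_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")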
You are right that a lookup in a list is far slower than in a set. I did the following to work around the issue, assuming you don't edit the censor list afterwards:
from better_profanity import varying_string

# VaryingString defines __eq__ but not __hash__, which makes instances
# unhashable; patch in a hash so they can live in a frozenset.
varying_string.VaryingString.__hash__ = lambda self: hash(self._original)

import better_profanity

# make your edits to the censor list here
better_profanity.profanity.CENSOR_WORDSET = frozenset(better_profanity.profanity.CENSOR_WORDSET)
If you want everything to work, you are going to need to make every use of CENSOR_WORDSET work with sets rather than lists; a rough sketch of the kind of change is below. The code in the main file is only ~250 lines, so it would be easy enough. Otherwise, this gets the job done.
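For context, the change described above boils down to swapping linear scans for set membership tests. A hypothetical before/after, with made-up names rather than the library's actual internals:

# Illustrative only; not the library's real code.
def is_censored_before(word, censor_words):
    return word in censor_words    # censor_words is a list: O(n) scan

def is_censored_after(word, censor_wordset):
    return word in censor_wordset  # censor_wordset is a set: O(1) on average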