My colleague was working with this library for some NLP stuff, and he was trying to manipulate the CENSOR_WORDS for reasons that aren't particularly important to this question.
It got me wondering: wouldn't this all go a lot faster if CENSOR_WORDS were a set()? Forgive me if I'm wasting your time; I didn't fully trace the code.
It seems to me that a membership lookup against a very large collection of words or phrases would almost always be faster with a set, because a set works as a hash table under the Python covers: membership tests are O(1) on average, versus an O(n) scan for a list.
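To put a rough number on the claim, here's a minimal, illustrative benchmark (made-up words, not the library's actual CENSOR_WORDS), timing a miss, which is the worst case for the list since every element gets scanned:

import timeit

words_list = [f"word{i}" for i in range(100_000)]
words_set = set(words_list)

# "missing" is in neither collection, so the list scans all 100k entries.
t_list = timeit.timeit(lambda: "missing" in words_list, number=100)
t_set = timeit.timeit(lambda: "missing" in words_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")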
You are right that a lookup in a list is far slower than in a set. I did the following to work around the issue, assuming you don't edit the censor list afterwards:
from better_profanity import varying_string

# VaryingString defines __eq__ but not __hash__, which makes instances
# unhashable; patch in a hash so they can live in a frozenset.
varying_string.VaryingString.__hash__ = lambda self: hash(self._original)

import better_profanity

# make your edits to the censor list here
better_profanity.profanity.CENSOR_WORDSET = frozenset(better_profanity.profanity.CENSOR_WORDSET)
If you want everything to work, you are going to need to make every use of CENSOR_WORDSET work with sets rather than lists; a rough sketch of the kind of change is below. The code in the main file is only ~250 lines, so it would be easy enough. Otherwise, this gets the job done.
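For context, the change described above boils down to swapping linear scans for set membership tests. A hypothetical before/after, with made-up names rather than the library's actual internals:

# Illustrative only; not the library's real code.
def is_censored_before(word, censor_words):
    return word in censor_words    # censor_words is a list: O(n) scan

def is_censored_after(word, censor_wordset):
    return word in censor_wordset  # censor_wordset is a set: O(1) on average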