Use the regex package in replacement of the built-in re module #128
Comments
From 81edc42 we have the following observations after instrumenting and measuring the time for building a regex object (
Despite showing the worst performance regression, |
At the moment, regex will not be integrated in
It could be possible (maybe) to integrate part of #139 to add a layer of abstraction and make |
Use a home-made regex module to replace Python's re. This home-made module serves as a layer to "hide" the use of re: its only purpose is to be a thin layer between client code and the real regex engine. It supports only compile and escape, to promote the use of regex/pattern objects instead of the global functions search/match/find/sub/... This layer can be used later to do advanced caching or to use a different regex engine. This commit was originally developed for #128. (cherry picked from commit 5de8a41)
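A minimal sketch of what such a thin layer could look like, assuming it simply forwards to the standard re module. The name RegexLayer is hypothetical and not taken from the commit; the real layer is described as a module, so a class with static methods is used here only to keep the example self-contained.

```python
import re as _engine  # the only place where the real engine is referenced


class RegexLayer:
    """Hypothetical thin layer: exposes only compile and escape."""

    @staticmethod
    def compile(pattern, flags=0):
        # Forward to the underlying engine. Swapping the engine or adding
        # a cache later would only require changes here.
        return _engine.compile(pattern, flags)

    @staticmethod
    def escape(text):
        # Escape special characters so the text matches literally.
        return _engine.escape(text)


# Client code works with pattern objects, not with global functions.
word = RegexLayer.compile(r"\w+")
first = word.search("hello world").group(0)

literal = RegexLayer.compile(RegexLayer.escape("a.b"))
```

Because search, match, sub and friends are deliberately not exposed, client code has to compile a pattern first and then call methods on the resulting object.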
Describe the feature you'd like
The idea is to replace the standard re module with the third-party regex module.
There are a few reasons to do it:
- re does not support multithreading because it holds the GIL while matching, whereas regex can release it (and this can be enforced with concurrent=True). This lack of multithreading support in re prevented byexample from using threads instead of multiple processes, which are, obviously, more expensive.
- byexample uses the regex engine heavily, but all the regexes used in a single run are different and unique. This means that the traditional cache of re is pointless because it disappears after each run (byexample is restarted and, as with any process, the OS frees its memory). So in each run byexample needs to recompile every single regex, which is very expensive. Pickling is pointless because re currently pickles only the expression and compiles it again when it loads the pickle, so we don't save any time. regex, however, supports pickling the bytecode directly. Note: we should test how much we win with this.
- byexample may run into a catastrophic collapse (endless high CPU usage). re does not support atomic groups or possessive quantifiers, which could reduce the impact of catastrophic backtracking (see Use a Thompson NFA instead of the Python default re (regex) engine if it is possible #16). re does not support timeouts either. regex, on the other hand, supports all of them.
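The pickling point can be observed with the standard library alone. The following is a small sketch, not byexample's code, and the sample pattern and variable names are made up for illustration. It shows that the pickle of a compiled re pattern carries the pattern source text, and that loading it still yields a working pattern, which re has to compile again once its in-process cache is empty.

```python
import pickle
import re

# Hypothetical sample pattern, chosen only for illustration.
pat = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")

# Serialize the compiled pattern.
payload = pickle.dumps(pat)

# The pickle contains the pattern source text itself.
source_in_payload = pat.pattern.encode() in payload

# Drop re's in-process cache to mimic a freshly started process.
re.purge()

# Loading the pickle rebuilds the pattern from its source text.
restored = pickle.loads(payload)
value = restored.match("answer=42").group("value")
```

Timing pickle.loads against a plain re.compile call for the patterns that byexample actually builds would be one way to measure how much there is to win.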