Some files appear to get stuck in the “while True” loop of parse_table that deals with footnotes, never hitting break. Took me a while to debug this.
The infinite loops I ran into call get_footnote forever; that call sits inside the “while True” statement. To validate, I commented out the code block that deals with footnotes and set footnote = None right above the while loop... problem solved.
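One way to make such a loop safe is to break whenever no footnote is found or the read position fails to advance. This is only a sketch: the get_footnote below is a hypothetical stand-in (the real function's signature is not shown in this issue), and collect_footnotes is a name made up for illustration.

```python
def get_footnote(lines, index):
    """Hypothetical stand-in for the real get_footnote.

    Returns (footnote, next_index), or (None, index) when the line at
    `index` does not look like a footnote such as "1/ some text".
    """
    if index < len(lines):
        marker = lines[index].lstrip().split("/", 1)[0]
        if marker.isdigit():
            return lines[index], index + 1
    return None, index


def collect_footnotes(lines, start):
    """Collect consecutive footnotes beginning at `start`."""
    footnotes = []
    index = start
    while True:
        footnote, next_index = get_footnote(lines, index)
        # Stop when nothing was found OR the cursor did not move forward.
        # The second check is what prevents an infinite loop.
        if footnote is None or next_index <= index:
            break
        footnotes.append(footnote)
        index = next_index
    return footnotes, index
```

The progress check costs one comparison per iteration and turns a hang into an early exit, which is much easier to notice and debug.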
Note: the code would perform A LOT better if the regexes were pre-compiled rather than compiled again for each line during parsing. It may not matter once the system is up and only has to parse 1 file a day, but for the initial bulk parsing it makes a big difference :).
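A minimal sketch of what I mean by pre-compiling, assuming a made-up footnote pattern (FOOTNOTE_RE and find_footnotes are illustrative names, not the ones used in the parser):

```python
import re

# Compiled once at import time instead of on every line.
# Hypothetical pattern: a numeric marker like "1/" followed by text.
FOOTNOTE_RE = re.compile(r"^\s*(\d+)/\s+(.*)$")


def find_footnotes(lines):
    """Return (marker, text) pairs for lines that look like footnotes."""
    matches = []
    for line in lines:
        m = FOOTNOTE_RE.match(line)  # reuses the compiled pattern
        if m:
            matches.append((m.group(1), m.group(2)))
    return matches
```

Python's re module does cache recently compiled patterns, so how much this saves depends on how many distinct patterns the parser uses; I'd expect it to matter most in the per-line hot path during bulk parsing.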
There is also an assignment of an undefined variable to the key “raw_table”, also within this function. At the start, this key is populated with a variable called “text”. But a few lines below, it's reassigned with an undefined variable. This happens 2 times within the function.
Caught this while compiling the code with Cython (I was debugging the issue thinking it was performance-related).
Cython flagged that, and also flagged the unused function at the bottom of this file.