Regular Expression Engine research

Michiel Eghuizen 7 years ago 0

Would it be possible to replace the regular expression engine with a faster one? I guess you use Python's "re" module, but there are faster engines available.

Find/replace on large files is rather slow and can even crash. For example:

Take a file of 1,000,000 rows that contains empty lines, and replace "^\n" with "" (without the quotes) to remove the empty lines. This takes a long time or crashes.
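For reference, here is a minimal sketch of what that replacement does, written against Python's "re" module on a tiny string. The function name remove_empty_lines and the sample text are made up for illustration, and this is not the editor's actual code path, so it says nothing about how fast or slow the editor itself is.

```python
import re

def remove_empty_lines(text):
    # With re.MULTILINE, "^" matches at the start of every line,
    # so "^\n" matches a newline that begins a line, i.e. an empty line.
    return re.sub(r"^\n", "", text, flags=re.MULTILINE)

# Hypothetical sample: three non-empty lines separated by empty ones.
sample = "a\n\nb\n\n\nc\n"
cleaned = remove_empty_lines(sample)
print(repr(cleaned))
```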

This is because "re" uses backtracking, which can be rather slow. There are also regular expression engines that use a finite state automaton (FSA) instead. See this page for more information: http://swtch.com/~rsc/regexp/regexp1.html
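To make the automaton idea a bit more concrete, here is a toy sketch of the state-set simulation that the linked page describes. It is my own simplification, not a real engine: it only understands literal characters, each optionally followed by "?", and the names parse, closure and full_match are invented for this example. Instead of trying one alternative and backtracking, it tracks every pattern position that could be active at once, so the work grows with len(text) * len(pattern). The linked page uses the pattern a?a?...a?aa...a as its example of input on which a backtracking matcher can take exponential time.

```python
def parse(pattern):
    """Split a pattern of literals, each optionally followed by '?',
    into a list of (char, is_optional) items."""
    items = []
    i = 0
    while i < len(pattern):
        optional = i + 1 < len(pattern) and pattern[i + 1] == "?"
        items.append((pattern[i], optional))
        i += 2 if optional else 1
    return items

def closure(items, states):
    """Add every position reachable by skipping optional items."""
    result = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(items) and items[s][1]:
            stack.append(s + 1)  # an optional item may match nothing
    return result

def full_match(pattern, text):
    """Return True if the whole text matches the whole pattern."""
    items = parse(pattern)
    states = closure(items, {0})
    for ch in text:
        # Advance every active position that accepts this character.
        advanced = {s + 1 for s in states
                    if s < len(items) and items[s][0] == ch}
        states = closure(items, advanced)
        if not states:
            return False  # no position survived, so no match is possible
    return len(items) in states

n = 30
hard_pattern = "a?" * n + "a" * n
print(full_match(hard_pattern, "a" * n))
```

The set of active positions never holds more than len(pattern) + 1 entries, which is why the running time stays bounded no matter how many ways the optional items could be combined.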

Possible options include (but are not limited to):

Judging by the benchmarks on the internet, it could make a big difference.