Python: faster regex replace
I have a large set of large files and a set of "phrases" that need to be replaced in each file.

The "business logic" imposes several restrictions:

- Matching must be case-insensitive.
- The two words of a phrase may be separated by spaces, tabs, or newlines.

My solution (see below) is a little bit slow. How can it be optimized, in terms of string search and replacement?
    import re

    data = open("INPUT_FILE").read()
    o = open("OUTPUT_FILE", "w")
    for phrase in phrases:  # these are the set of words I'm talking about
        b1, b2 = phrase.strip().split(" ")  # each phrase is two words
        regex = re.compile(r"%s\ *\t*\n*%s" % (b1, b2), re.IGNORECASE)
        data = regex.sub(b1 + "_" + b2, data)
    o.write(data)

Update: by converting all of the text to lower case I was able to drop re.IGNORECASE.
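A minimal sketch of that lowercasing variant (hypothetical, since the update above doesn't show the code; phrases here is a placeholder for the real word list): lowercase the text and the phrases once, then compile without re.IGNORECASE.

    import re

    phrases = ["foo bar", "spam eggs"]  # placeholder two-word phrases

    data = open("INPUT_FILE").read().lower()  # normalize the text once...
    o = open("OUTPUT_FILE", "w")
    for phrase in phrases:
        b1, b2 = phrase.strip().lower().split(" ")
        # ...so the per-phrase regexes no longer need re.IGNORECASE
        regex = re.compile(r"%s\ *\t*\n*%s" % (b1, b2))
        data = regex.sub(b1 + "_" + b2, data)
    o.write(data)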
You could avoid recompiling your regexes for every file:

    import re

    precompiled = []
    for phrase in phrases:
        b1, b2 = str(phrase).strip().split(" ")
        precompiled.append((b1 + "_" + b2,
                            re.compile(r"%s\ *\t*\n*%s" % (b1, b2), re.IGNORECASE)))

    for input, output in ...:  # fill in your (input, output) file name pairs
        with open(output, "w") as o:
            with open(input) as i:
                data = i.read()
                for pattern, regex in precompiled:
                    data = regex.sub(pattern, data)
                o.write(data)
This is the same amount of work for a single file, but if you are repeating it over several files then you get to reuse the compiled regexes.
Disclaimer: untested, may contain typos.
[update] Also, you could simplify the regexp by replacing the various whitespace escapes with \s*. I suspect you have a bug there, in that you would want to match "\t " (a tab followed by a space) and the current pattern doesn't.
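A small sketch of that simplification (with placeholder phrase words): \s matches spaces, tabs, and newlines in any order, so it also covers the tab-then-space case the original pattern misses.

    import re

    b1, b2 = "hello", "world"  # placeholder phrase words

    # \s* accepts any mix of spaces, tabs and newlines between the words,
    # including "\t " (tab then space), which \ *\t*\n* does not match
    regex = re.compile(r"%s\s*%s" % (b1, b2), re.IGNORECASE)

    print(regex.sub(b1 + "_" + b2, "Hello \t world"))  # -> hello_world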