Python: faster regex replace
I have a large set of large files and a set of "phrases" that need to be replaced in each file.

The "business logic" imposes several restrictions:

- Matching must be case-insensitive.
- The two words of a phrase may be separated by spaces, tabs, or newlines.

My solution (see below) is a little bit slow. How can it be optimized, in terms of string search and replacement?
    import re

    data = open("INPUT_FILE").read()
    o = open("OUTPUT_FILE", "w")
    for phrase in phrases:  # these are the set of words I'm talking about
        b1, b2 = phrase.strip().split(" ")  # each phrase is two words
        regex = re.compile(r"%s\ *\t*\n*%s" % (b1, b2), re.IGNORECASE)
        data = regex.sub(b1 + "_" + b2, data)
    o.write(data)

Update: by converting all of the text to lower case I was able to drop re.IGNORECASE.
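A minimal sketch of that lowercasing variant (hypothetical, since the update above doesn't show the code; phrases here is a placeholder for the real word list): lowercase the text and the phrases once, then compile without re.IGNORECASE.

    import re

    phrases = ["foo bar", "spam eggs"]  # placeholder two-word phrases

    data = open("INPUT_FILE").read().lower()  # normalize the text once...
    o = open("OUTPUT_FILE", "w")
    for phrase in phrases:
        b1, b2 = phrase.strip().lower().split(" ")
        # ...so the per-phrase regexes no longer need re.IGNORECASE
        regex = re.compile(r"%s\ *\t*\n*%s" % (b1, b2))
        data = regex.sub(b1 + "_" + b2, data)
    o.write(data)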
You could avoid recompiling your regexes for every file:

    import re

    precompiled = []
    for phrase in phrases:
        b1, b2 = str(phrase).strip().split(" ")
        precompiled.append((b1 + "_" + b2,
                            re.compile(r"%s\ *\t*\n*%s" % (b1, b2), re.IGNORECASE)))

    for input, output in ...:  # fill in your (input, output) file name pairs
        with open(output, "w") as o:
            with open(input) as i:
                data = i.read()
                for pattern, regex in precompiled:
                    data = regex.sub(pattern, data)
                o.write(data)
This is the same amount of work for a single file, but if you are repeating it over several files then you get to reuse the compiled regexes.
Disclaimer: untested, may contain typos.
[update] Also, you could simplify the regexp by replacing the various whitespace escapes with \s*. I suspect you have a bug there, in that you would want to match "\t " (a tab followed by a space) and the current pattern doesn't.
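A small sketch of that simplification (with placeholder phrase words): \s matches spaces, tabs, and newlines in any order, so it also covers the tab-then-space case the original pattern misses.

    import re

    b1, b2 = "hello", "world"  # placeholder phrase words

    # \s* accepts any mix of spaces, tabs and newlines between the words,
    # including "\t " (tab then space), which \ *\t*\n* does not match
    regex = re.compile(r"%s\s*%s" % (b1, b2), re.IGNORECASE)

    print(regex.sub(b1 + "_" + b2, "Hello \t world"))  # -> hello_world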