Python string manipulation -- performance problems
I have the following piece of code that I execute around 2 million times in my application to parse that many records. This part seems to be the bottleneck, and I was wondering if anyone could suggest some nifty tricks that would make these simple string manipulations faster.
    try:
        data = []
        start = 0
        end = 0
        for info in self.Columns():
            end = start + info.columnLength
            piece = line[start:end]
            if piece == '' or len(piece) != info.columnLength:
                raise ValueError('Wrong Input')
            if info.hasSignage:
                if piece[0:1].strip() != '+' and piece[0:1].strip() != '-':
                    raise ValueError('Wrong Input')
            if not info.skipColumn:
                data.append(piece)
            start = end
        parsedLine = data
    except ValueError:
        parsedLine = False

Edit: I'm changing this answer a bit. I will leave the original answer below.

In my other answer, I commented that the best thing would be to find a built-in Python module that does the unpacking. I could not think of one, but perhaps I should have searched Google for one. @John Machin gave an answer that showed how to do it: use the Python struct module. Since it is written in C, it should be faster than my pure Python solution. (I have not actually measured anything, so this is a guess.)

I agree that the logic in the original code is "un-Pythonic". Returning a sentinel value is not best; it is better to either return a valid value or raise an exception. Another way to do it is to return a list of valid values plus a second list of invalid values. Since @John Machin offered code that yields valid values, I thought I would write a version here that returns two lists.

Note: Perhaps the best possible answer would be to take @John Machin's answer and modify it to save the invalid values to a file for later review. His answer yields results one at a time, so there is no need to build a large list of parsed records; and saving bad lines to disk means there is no need to build a possibly large list of bad lines.

    import struct

    def parse_records(self):
        """
        Returns a tuple: (good, bad)
        good is a list of valid records (as tuples)
        bad is a list of tuples: (line_num, line, err)
        """
        cols = self.Columns()
        unpack_fmt = ""
        sign_checks = []
        start = 0
        for colx, info in enumerate(cols, 1):
            clen = info.columnLength
            if clen < 1:
                raise ValueError("Column %d: Bad columnLength %r" % (colx, clen))
            if info.skipColumn:
                unpack_fmt += str(clen) + "x"   # pad bytes: skipped on unpack
            else:
                unpack_fmt += str(clen) + "s"   # fixed-width string field
            if info.hasSignage:
                sign_checks.append(start)
            start += clen
        expected_len = start
        unpack = struct.Struct(unpack_fmt).unpack

        good = []
        bad = []
        for line_num, line in enumerate(self.whatever_the_list_of_lines_is, 1):
            if len(line) != expected_len:
                bad.append((line_num, line, "bad length"))
                continue
            if not all(line[i] in '+-' for i in sign_checks):
                # The source is cut off inside this call; the message text and
                # the lines below are reconstructed from the docstring.
                bad.append((line_num, line, "bad sign"))
                continue
            good.append(unpack(line))

        return good, bad

Original answer text: This answer should be a lot faster if the self.Columns() information is the same for all records. We process the self.Columns() information exactly once and build two lists that contain just what we need to process a record.

This code shows how to compute parsedLine, but does not actually yield it, return it, or do anything with it. Obviously you will need to change that.
    def parse_records(self):
        cols = self.Columns()

        slices = []
        sign_checks = []
        start = 0
        for info in cols:
            if info.columnLength < 1:
                raise ValueError("bad columnLength")
            end = start + info.columnLength
            if not info.skipColumn:
                tup = (start, end)
                slices.append(tup)
            if info.hasSignage:
                sign_checks.append(start)
            start = end  # advance to the next column

        expected_len = end  # or try (end - 1)

        for line in self.whatever_the_list_of_lines_is:
            # try per line, so one bad line does not stop the loop
            try:
                if len(line) != expected_len:
                    raise ValueError("wrong input")
                if not all(line[i] in '+-' for i in sign_checks):
                    raise ValueError("wrong input")
                parsedLine = [line[s:e] for s, e in slices]
            except ValueError:
                parsedLine = False
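The note in the edited answer suggests yielding records one at a time and saving bad lines for later review. Here is a rough sketch of that idea for Python 3, not a tested drop-in replacement. The names ColumnInfo, compile_layout, iter_records and bad_out are hypothetical stand-ins (the real column objects come from self.Columns()), and the "bad sign" label is made up for illustration. Note that in Python 3 the struct module unpacks bytes, so the lines are assumed to be read in binary mode.

```python
import io
import struct
from collections import namedtuple

# Hypothetical stand-in for the objects returned by self.Columns().
ColumnInfo = namedtuple("ColumnInfo", "columnLength hasSignage skipColumn")


def compile_layout(cols):
    """Walk the column layout once; return (unpack, sign_checks, expected_len)."""
    fmt = ""
    sign_checks = []
    start = 0
    for colx, info in enumerate(cols, 1):
        clen = info.columnLength
        if clen < 1:
            raise ValueError("Column %d: bad columnLength %r" % (colx, clen))
        # "x" skips pad bytes, "s" returns the bytes of the field.
        fmt += "%d%s" % (clen, "x" if info.skipColumn else "s")
        if info.hasSignage:
            sign_checks.append(start)
        start += clen
    return struct.Struct(fmt).unpack, sign_checks, start


def iter_records(lines, cols, bad_out):
    """Yield a tuple of bytes fields per valid line; log bad lines to bad_out."""
    unpack, sign_checks, expected_len = compile_layout(cols)
    for line_num, line in enumerate(lines, 1):
        if len(line) != expected_len:
            reason = "bad length"
        # Indexing bytes gives an int in Python 3, so compare 1-byte slices.
        elif not all(line[i:i + 1] in (b"+", b"-") for i in sign_checks):
            reason = "bad sign"  # assumed label, not from the original answer
        else:
            yield unpack(line)
            continue
        bad_out.write("%d\t%r\t%s\n" % (line_num, line, reason))


# Made-up layout: signed 3-byte field, 2 skipped bytes, 4-byte field.
cols = [
    ColumnInfo(3, True, False),
    ColumnInfo(2, False, True),
    ColumnInfo(4, False, False),
]
lines = [b"+12xxABCD", b"-99  WXYZ", b"012xxABCD", b"short"]
bad_log = io.StringIO()  # a real run would pass an open text file instead
records = list(iter_records(lines, cols, bad_log))
print(records)
print(bad_log.getvalue())
```

Because iter_records is a generator, neither the good records nor the bad lines have to be held in memory all at once; the caller decides whether to collect them. Whether this is actually faster than the slicing version is still a guess until both are timed on real data.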