Comparing CSV files - Python 3 (beginner)
I have two CSV files. I am reading them without the csv reader because there are inconsistencies in the lines: some lines have quotation marks and some do not, which throws off the csv reader. Both files are in the same format but have different entries, like this:
a b c d e f g h j h j k "a b c d e f g h j h j k j" "a b c d e f g h j h j k j"
What I need is to find the lines in file 1 and file 2 that have the same value in the third column (c). Note that the rest of the values can be quite different, so I don't think difflib would work, unless I've missed something.
At first I tried using a nested loop:
    for line in fileone:
        entry = line.split()
        print("a")
        for row in filetwo:
            space = row.split()
            print("b")
            if space[2] == entry[2]:
                outputhandle.write(line)
But using the print statements, I found it was outputting:
b b b
I need the script to check through all the lines of the second file for each line in the first file, like this:
b b b b b b....etc
(This is expensive, I know. I'm just starting out and not sure how to do it more efficiently, sadly.)
I also tried using a function:
    def file_check(variablename):
        for row in filetwo:
            return("b")
            if entry in row:
                return ("found")
        return("not found")

    for line in fileone:
        entry = line.split()
        print("a")
        var = file_check(entry[2])
        print(var)
this outputs: ('not found') ('not found') ('not found')
Since I am using test files, I know there are matching entries, so it is not looping through the second file but rather checking only the first line.
Sorry to ask such a basic question, Stack Overflowians, but I'm stuck this time. Any advice is welcome and appreciated!!!
Note: this question has been asked before, but the answers work in Python 2, and the csv module in Python 3 seems different. Here is the previous version of the question: Comparing 2 CSV files based on specific data in 2 columns
I'm not sure whether you mean you want to find how many lines in B have the same value in field 3 as each line in file A does, or just to match up the lines where both files share the same value in field 3... I'm going to assume the latter.
How about sorting each file's lines by the third column before you start?
If you do that, you can read down file A, and each time file A's value in field 3 changes, print all the records with the new value and then switch to handling file B:
    arecord = read file A
    brecord = read file B
    while not EOF on file A:
        currentkey = field 3 of arecord
        print "\n" + arecord
        arecord = read file A
        while field 3 of arecord == currentkey:
            print arecord
            arecord = read file A
        while field 3 of brecord < currentkey:
            brecord = read file B
        while field 3 of brecord == currentkey:
            print brecord
            brecord = read file B
Because you sorted both files by field 3, this results in one quick pass over each.
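As a rough Python 3 sketch of that single pass (not a definitive implementation): the helper names `third_field` and `matching_lines` are made up for illustration, and the code assumes every line has at least three whitespace-separated fields and that any quotes wrap the whole line. Unlike the pseudocode, it only reports keys that occur in both files.

```python
def third_field(line):
    """Return the third whitespace-separated field (hypothetical helper).

    Assumes quotes, if present, wrap the whole line.
    """
    return line.strip().strip('"').split()[2]


def matching_lines(lines_a, lines_b):
    """Yield (key, a_group, b_group) for each key found in both inputs."""
    a = sorted(lines_a, key=third_field)
    b = sorted(lines_b, key=third_field)
    i = j = 0
    while i < len(a) and j < len(b):
        key = third_field(a[i])
        # Find the end of the run of A records sharing the current key.
        a_end = i
        while a_end < len(a) and third_field(a[a_end]) == key:
            a_end += 1
        # Skip B records whose key sorts before the current key.
        while j < len(b) and third_field(b[j]) < key:
            j += 1
        # Find the end of the run of B records sharing the current key.
        b_end = j
        while b_end < len(b) and third_field(b[b_end]) == key:
            b_end += 1
        if b_end > j:
            yield key, a[i:a_end], b[j:b_end]
        i, j = a_end, b_end
```

You would call it with the lines of each file, e.g. `matching_lines(open("one.csv"), open("two.csv"))`, where the file names are placeholders.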
If for some reason you need the lines in their original order at the end, add the original record number as an additional field before you start, sort by it afterwards, and then remove the field.
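In Python the record-number trick might look something like the sketch below. The function name `sort_then_restore` is invented for illustration, and it assumes each line has at least three whitespace-separated fields.

```python
def sort_then_restore(lines):
    """Sort lines by field 3, then recover the original order (illustrative)."""
    # Attach the original record number to each line.
    numbered = list(enumerate(lines))
    # Sort by the third field; the matching pass would run on this list.
    by_key = sorted(numbered, key=lambda pair: pair[1].split()[2])
    # Sort by record number again and drop the number.
    restored = [line for _, line in sorted(by_key, key=lambda pair: pair[0])]
    return by_key, restored
```

`by_key` keeps the record numbers alongside the lines, so anything you select during the matching pass can be put back in file order the same way.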
If you add a field that says which file each line came from, you can put the files together and sort by two keys: field 3 and the "which file it came from" field, and get the results in one shot.
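A minimal sketch of the combined-file idea in Python 3, assuming unquoted lines with at least three whitespace-separated fields. The name `shared_keys` and the "A"/"B" tags are placeholders chosen here, not anything from the question.

```python
from itertools import groupby


def shared_keys(lines_a, lines_b):
    """Map each field-3 value found in both inputs to its tagged lines."""
    # Tag every line with its key and the file it came from.
    tagged = [(line.split()[2], "A", line) for line in lines_a]
    tagged += [(line.split()[2], "B", line) for line in lines_b]
    # Sort by two keys: field 3 first, then the source tag.
    tagged.sort(key=lambda rec: (rec[0], rec[1]))
    result = {}
    for key, group in groupby(tagged, key=lambda rec: rec[0]):
        group = list(group)
        # Keep the group only if both files contributed to it.
        if {tag for _, tag, _ in group} == {"A", "B"}:
            result[key] = [(tag, line) for _, tag, line in group]
    return result
```

Because the sort puts the "A" lines of each key ahead of the "B" lines, every group comes out with file A's records first.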
Caveat: the usual *nix "sort" command (like most/all other *nix "field"-related commands) can't deal with quoted fields. You may have to get rid of the quoting first. Also, "sort" isn't happy with Unicode, so if there are non-ASCII characters in the data, use "msort" or something similar instead.
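If you would rather stay inside Python than shell out to "sort", one possible sketch is to drop the quotes and sort there. This assumes the double quotes carry no meaning in your data (removing them does not merge or split fields), and note that Python compares strings by code point, which is not locale-aware collation. The function name is made up for illustration.

```python
def sorted_without_quotes(lines):
    """Remove double quotes, then sort by the third whitespace-separated field."""
    # Assumption: quotes are noise and can simply be deleted.
    cleaned = [line.replace('"', '').strip() for line in lines]
    # str comparison is by Unicode code point, so non-ASCII keys sort after ASCII.
    return sorted(cleaned, key=lambda line: line.split()[2])
```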
Hope this helps.