PowerShell, R, Import-Csv, Select-Object, Export-Csv
I'm running several tests with different approaches to cleaning a big CSV file before importing it into R.
This time I'm playing with PowerShell on Windows.
While things work and the results are as accurate as when I was using cut() with pipe(), the process is horribly slow.
This is my command:
shell(shell = "powershell", "import-csv in.csv | select-object col1, col2, etc | export-csv new.csv")
And these are the system.time() results:
   user  system elapsed
   0.61    0.42 1568.51
I've seen other posts that use C# with streaming and take a couple of dozen seconds, but I don't know C#.
My question is: how can I improve the PowerShell command to make it faster?
Thanks,
Diego
There's a fair amount of overhead in reading in the CSV, converting the rows to PowerShell objects, and then converting them back to CSV. Doing it through the pipeline that way also processes 1 record at a time. You should be able to speed that up considerably if you switch to using Get-Content with the -ReadCount parameter and extract your data using a regular expression in a -replace operator, e.g.:
shell(shell = "powershell", "get-content in.csv -readcount 1000 | foreach { $_ -replace '^(.+?,.+?),.*$','$1' | add-content new.csv }")
This will reduce the number of disk reads, and -replace will be functioning as an array operator, doing 1000 records at a time.
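To see what the -replace step does to one batch, here is a small sketch you can paste into a PowerShell prompt. The three sample lines are made up for illustration, and the pattern shown is one way to keep only the first two columns. It assumes that no field is empty and that no field contains a quoted comma, so check it against a few lines of your real file first.

```powershell
# Three made-up CSV lines standing in for one batch from Get-Content -ReadCount.
$batch = @(
    'id,name,city,score',
    '1,ann,lima,90',
    '2,bob,quito,85'
)

# Applied to an array, -replace runs on every element and returns an array.
# The pattern captures the first two comma-separated fields and discards the rest.
$trimmed = $batch -replace '^(.+?,.+?),.*$', '$1'

# Show the trimmed lines.
$trimmed
```

Because the whole batch goes through -replace in one call, there is no per-row object conversion, which is where Import-Csv and Export-Csv spend most of their time.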