PowerShell, R, Import-Csv, select-object, Export-csv


I'm performing several tests using different approaches for cleaning a big CSV file before importing it into R.

This time I'm playing with PowerShell in Windows.

While things work, and are as accurate as when using cut() with pipe(), the process is horribly slow.

I ran this command:

shell(shell = "powershell",
      "import-csv in.csv | select-object col1, col2, etc | export-csv new.csv")

and got these system.time() results:

   user  system elapsed 
   0.61    0.42 1568.51 

I've seen other posts that use C# with streaming and take a couple of dozen seconds, but I don't know C#.
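For comparison, a streaming column filter (the general idea behind those C# answers) parses and writes one row at a time without building a full object per record. Below is a rough Python sketch of that idea, not taken from those posts; the function name select_columns and the sample data are hypothetical, and it assumes the requested column names appear exactly once in the header.

```python
import csv
import io


def select_columns(src, dst, wanted):
    """Stream rows from src to dst, keeping only the columns named in wanted.

    Each row is parsed and written immediately, so memory use stays flat
    regardless of file size. The csv module handles quoted fields that
    contain commas.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    # Map the requested column names to their positions in the header.
    idx = [header.index(name) for name in wanted]
    writer.writerow(wanted)
    for row in reader:
        writer.writerow([row[i] for i in idx])


if __name__ == "__main__":
    # Hypothetical in-memory input; for real files pass handles such as
    # open("in.csv", newline="") and open("new.csv", "w", newline="").
    src = io.StringIO('col1,col2,col3\n1,"a,b",x\n2,c,y\n')
    dst = io.StringIO()
    select_columns(src, dst, ["col1", "col2"])
    print(dst.getvalue(), end="")
```

Selecting by header name mirrors what select-object col1, col2 does, while avoiding the per-row object overhead.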

My question is: how can I improve the PowerShell command in order to make it faster?

Thanks,

Diego

There's a fair amount of overhead in reading in the CSV, converting the rows to PowerShell objects, and then converting them back to CSV. Doing it through the pipeline that way also processes one record at a time. You should be able to speed it up considerably if you switch to using Get-Content with the -ReadCount parameter and extract the data using a regular expression in the -replace operator, e.g.:

shell(shell = "powershell",
      # keep only the first two comma-separated fields of each line
      "get-content in.csv -readcount 1000 | foreach { $_ -replace '^([^,]*,[^,]*),.*','$1' } | add-content new.csv")

This reduces the number of disk reads, and -replace works as an array operator, processing 1000 records at a time.
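The same batching idea can be sketched in Python: read a fixed number of lines at a time (like -ReadCount 1000) and apply one regex substitution to each line of the batch, with no per-record object construction. This is an illustration under assumptions, not the answer's exact code: the function name first_two_columns is hypothetical, it assumes the wanted columns are the first two, and, like any plain regex on raw lines, it does not understand quoted fields that contain commas (a real CSV parser is needed for those).

```python
import io
import re
from itertools import islice

# Keep the first two comma-separated fields and drop everything after the
# second comma. "." does not match "\n", so the line ending is preserved.
# Lines with two or fewer fields do not match and pass through unchanged.
FIRST_TWO = re.compile(r"^([^,]*,[^,]*),.*")


def first_two_columns(src, dst, batch_size=1000):
    """Copy src to dst in batches of batch_size lines, trimming each line."""
    while True:
        batch = list(islice(src, batch_size))
        if not batch:
            break
        # One write call per batch rather than one per record.
        dst.writelines(FIRST_TWO.sub(r"\1", line) for line in batch)


if __name__ == "__main__":
    # Hypothetical in-memory input; for real files pass text-mode handles
    # such as open("in.csv") and open("new.csv", "w").
    src = io.StringIO("col1,col2,col3\n1,a,x\n2,,y\n3,b\n")
    dst = io.StringIO()
    first_two_columns(src, dst, batch_size=2)
    print(dst.getvalue(), end="")
```

Using [^,]* rather than a lazy .+? keeps each captured field from spanning a comma, so empty fields such as 2,,y are trimmed correctly.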

