Friday, 27 January 2017

Speed of PowerShell script: optimisation sought

I have a working script that processes a 450 MB CSV file with more than 1 million rows and 8 columns in a little over 2.5 hours. It maxes out a single CPU core. Small files complete quickly (in seconds).

Oddly, a 350 MB file with a similar number of rows but 40 columns takes only 30 minutes.

My issue is that the files will grow over time, and 2.5 hours tying up a CPU core isn't good. Can anyone recommend a code optimisation? A similarly titled post recommended using local paths, which I'm already doing.

$file = "\Your.csv"

$path = "C:\Folder"

$csv  = Get-Content "$path$file"

# Count number of file headers
$count = ($csv[0] -split ',').count

# http://ift.tt/2jbm1OS
$stream1 = [System.IO.StreamWriter] "$path\Passed$file-Pass.txt"
$stream2 = [System.IO.StreamWriter] "$path\Failed$file-Fail.txt"

# Two validation checks: (1) the row has at least as many columns as the header;
# (2) everything after the first column must total at least 40 characters.
$csv | Select -Skip 1 | % {
  if( ($_ -split ',').count -ge $count -And ($_.split(',',2)[1]).Length -ge 40) {
     $stream1.WriteLine($_)
  } else {
     $stream2.WriteLine($_) 
  }
}
$stream1.close()
$stream2.close()
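
One direction I'm considering (a rough sketch, untested on the full-size file): read the input with a StreamReader instead of loading it all through Get-Content, process the rows in a plain while loop rather than the ForEach-Object pipeline, and use the .Split() method instead of the regex-based -split operator. The paths and the 40-character rule are the same as above.

$file = "\Your.csv"
$path = "C:\Folder"

$reader  = [System.IO.StreamReader] "$path$file"
$stream1 = [System.IO.StreamWriter] "$path\Passed$file-Pass.txt"
$stream2 = [System.IO.StreamWriter] "$path\Failed$file-Fail.txt"

# Read the header line once and count its columns
$count = $reader.ReadLine().Split(',').Count

while (($line = $reader.ReadLine()) -ne $null) {
  # Same two checks as above, using .Split() instead of -split
  if ($line.Split(',').Count -ge $count -and ($line.Split(',', 2)[1]).Length -ge 40) {
    $stream1.WriteLine($line)
  } else {
    $stream2.WriteLine($line)
  }
}

$reader.Close()
$stream1.Close()
$stream2.Close()

Would this kind of change make a meaningful difference, or is there a better approach?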
