I have a working script that processes a 450 MB CSV file (more than 1 million rows, 8 columns) in a little over 2.5 hours. It maxes out a single CPU core. Small files complete in seconds.
Oddly, a 350 MB file with a similar number of rows but 40 columns takes only 30 minutes.
My issue is that the files will grow over time, and tying up a CPU core for 2.5 hours isn't sustainable. Can anyone recommend any code optimisations? A similarly titled post recommended using local paths, which I'm already doing.
$file = "\Your.csv"
$path = "C:\Folder"
$csv = Get-Content "$path$file"
# Count the number of columns in the header row
$count = ($csv[0] -split ',').count
# http://ift.tt/2jbm1OS
$stream1 = [System.IO.StreamWriter] "$path\Passed$file-Pass.txt"
$stream2 = [System.IO.StreamWriter] "$path\Failed$file-Fail.txt"
# Two validation steps: (1) the row must have at least as many columns as the header row,
# and (2) after splitting off the first column, the remainder of the row must be at least 40 characters long.
$csv | Select -Skip 1 | % {
    if (($_ -split ',').Count -ge $count -and ($_.Split(',', 2)[1]).Length -ge 40) {
        $stream1.WriteLine($_)
    } else {
        $stream2.WriteLine($_)
    }
}
$stream1.Close()
$stream2.Close()
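One direction that might help, sketched below on the assumption that the validation rules stay exactly as above: read the file with a System.IO.StreamReader so rows are streamed one at a time instead of being loaded into memory by Get-Content, replace the Select/ForEach-Object pipeline with a plain while loop, and use the String.Split method instead of the regex-based -split operator inside the loop. The file and output paths simply mirror the original script. This is untested against the real data, so treat it as a starting point rather than a drop-in replacement.

$file = "\Your.csv"
$path = "C:\Folder"

$reader  = [System.IO.StreamReader] "$path$file"
$stream1 = [System.IO.StreamWriter] "$path\Passed$file-Pass.txt"
$stream2 = [System.IO.StreamWriter] "$path\Failed$file-Fail.txt"

try {
    # Read the header row once and count its columns
    $count = $reader.ReadLine().Split(',').Count

    while (($line = $reader.ReadLine()) -ne $null) {
        # String.Split avoids the regex engine that the -split operator uses
        $cols = $line.Split(',')
        if ($cols.Count -ge $count -and ($line.Split(',', 2)[1]).Length -ge 40) {
            $stream1.WriteLine($line)
        } else {
            $stream2.WriteLine($line)
        }
    }
}
finally {
    $reader.Close()
    $stream1.Close()
    $stream2.Close()
}

Timing both versions over the same input with Measure-Command should show whether the change is worth it; the suspicion is that most of the cost sits in the per-row pipeline overhead and the regex split rather than in the file I/O itself.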