Tuesday, 4 September 2018

Reorganizing R dataframe

I have several decades of twice-daily data with the following structure:

str(Raw.Data)
'data.frame':   709400 obs. of  7 variables:
 $ V1: int  254 1 2 3 9 4 4 4 4 4 ...
 $ V2: Factor w/ 448 levels "0","100","1000",..: 1 40 11 448 286 4 24 23 20 17 ...
 $ V3: Factor w/ 18039 levels "","-1","-10",..: 99 15749 6714 18039 13326 4244 4221 12375 14708 16000 ...
 $ V4: Factor w/ 3509 levels "","-1","-10",..: 3503 3034 3496 1 2176 3496 1219 2878 33 149 ...
 $ V5: Factor w/ 1295 levels "","-1","-10",..: 1092 1273 1019 1 992 1295 1254 40 187 192 ...
 $ V6: int  NA 353 99999 NA 230 99999 163 202 238 262 ...
 $ V7: int  NA 99999 0 NA 40 99999 50 40 70 60 ...

In a spreadsheet-like format, the first day of data looks like this:

254 0   1   JUN 1957    NA  NA
1   94823   72520   40.50N  80.22W  353 99999
2   2000    2000    99999   13  99999   0
3   PIT ms          NA  NA
9   9780    353 234 105 230 40
4   10000   157 99999   99999   99999   99999
4   8500    1566    143 64  163 50
4   7000    3168    34  -133    202 40
4   5000    5815    -127    -266    238 70
4   4000    7483    -231    -270    262 60
4   3000    9517    -414    99999   258 150
4   2500    10726   -530    99999   260 170
4   2000    12128   -638    99999   271 230
254 12  1   JUN 1957    NA  NA
1   94823   72520   40.50N  80.22W  353 99999
2   1000    1500    1690    15  7   0
3   PIT ms          NA  NA
9   9770    353 168 113 135 40
4   10000   153 99999   99999   99999   99999
4   8500    1537    119 89  216 80
4   7000    3133    16  4   221 70
4   5000    5779    -132    -182    249 90
4   4000    7444    -240    -314    262 90
4   3000    9469    -414    99999   272 120
4   2500    10682   -511    99999   289 130
4   2000    12097   -608    99999   291 150
4   1500    13868   -630    99999   291 160
4   1000    16400   -611    99999   298 110

I want to reorganize the data so that the first day of data is reduced to this:

0   1   JUN 1957    9780    353 234 105 230 40
12  1   JUN 1957    9770    353 168 113 135 40

To do this I need cells 2:5 for rows that begin with "254" and cells 2:7 for rows that begin with "9".
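The selection described above can be sketched without a loop. This is only a sketch on a toy stand-in for Raw.Data: the names `raw`, `header`, `surface` and `processed`, and the column names `hour`, `day`, `month`, `year` and `sfc1` to `sfc6`, are hypothetical. It assumes the columns were read as character rather than factor, and that every "254" row is paired with exactly one "9" row.

```r
# Toy stand-in for Raw.Data: two soundings, a few rows each.
# Columns are character/integer here, not factor (an assumption).
raw <- data.frame(
  V1 = c(254L, 1L, 9L, 4L, 254L, 1L, 9L, 4L),
  V2 = c("0", "94823", "9780", "10000", "12", "94823", "9770", "10000"),
  V3 = c("1", "72520", "353", "157", "1", "72520", "353", "153"),
  V4 = c("JUN", "40.50N", "234", "99999", "JUN", "40.50N", "168", "99999"),
  V5 = c("1957", "80.22W", "105", "99999", "1957", "80.22W", "113", "99999"),
  V6 = c(NA, 353L, 230L, 99999L, NA, 353L, 135L, 99999L),
  V7 = c(NA, 99999L, 40L, 99999L, NA, 99999L, 40L, 99999L),
  stringsAsFactors = FALSE
)

# Cells 2:5 of the rows that begin with 254, cells 2:7 of the rows that begin with 9
header  <- raw[raw$V1 == 254, 2:5]
surface <- raw[raw$V1 == 9, 2:7]

# The side-by-side bind below is only valid if the rows pair up one to one
stopifnot(nrow(header) == nrow(surface))

processed <- cbind(header, surface)
# Hypothetical column names, chosen only for readability
names(processed) <- c("hour", "day", "month", "year", paste0("sfc", 1:6))
rownames(processed) <- NULL
print(processed)
```

If some soundings lack a "9" row, the one-to-one check fails and the pairing has to be done per sounding instead.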

I developed the following code, but it doesn't even make it through the first if statement in the first iteration of the for loop. Maybe this is a problem with data type or indexing?

leng <- dim(Raw.Data)[1]
Processed.Data <- as.data.frame(matrix(0,ncol = 10, nrow = 42000))
i <- 1:leng
count <- 1
for (i in 1:leng){
  if(Raw.for.R[i,1]==254){
    Surface.Obs[count,1:4]<-Raw.for.R[i,2:5]
  } else if(Raw.or.R$V1[i,1]==9){
    Surface.Obs[count,5:10]<-Raw.for.R[i,2:7]
  }
  count <- count +1
}
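For comparison, here is a sketch of how a loop of this shape might be made to run. It is not a drop-in fix: `raw` and `out` are hypothetical names on toy data, one name is used consistently for the input and one for the output, the index is a scalar, and `count` advances only when a new "254" row starts. It assumes character (not factor) columns and that the first row is a "254" row.

```r
# Toy stand-in for the raw table: two soundings
raw <- data.frame(
  V1 = c(254L, 9L, 4L, 254L, 9L),
  V2 = c("0", "9780", "10000", "12", "9770"),
  V3 = c("1", "353", "157", "1", "353"),
  V4 = c("JUN", "234", "99999", "JUN", "168"),
  V5 = c("1957", "105", "99999", "1957", "113"),
  V6 = c(NA, 230L, 99999L, NA, 135L),
  V7 = c(NA, 40L, 99999L, NA, 40L),
  stringsAsFactors = FALSE
)

# One output row per sounding, pre-filled with NA; all cells kept as character
n_out <- sum(raw$V1 == 254)
out <- as.data.frame(matrix(NA_character_, nrow = n_out, ncol = 10),
                     stringsAsFactors = FALSE)

count <- 0
for (i in seq_len(nrow(raw))) {      # i is a single number on each pass
  if (raw$V1[i] == 254) {
    count <- count + 1               # a 254 row opens a new output row
    out[count, 1:4] <- unlist(raw[i, 2:5])
  } else if (raw$V1[i] == 9) {
    # the 9 row fills the rest of the same output row
    out[count, 5:10] <- as.character(unlist(raw[i, 2:7]))
  }
}
print(out)
```

On several hundred thousand rows, a row-by-row loop over a data frame is likely to be slow, so logical indexing on V1 may be preferable.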

When the code is run I get the following warning messages:

1: In if (Raw.Data[i, 1] == 254) { :
  the condition has length > 1 and only the first element will be used
2: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 1 has 709400 rows to replace 1 rows
3: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 2 has 709400 rows to replace 1 rows
4: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 3 has 709400 rows to replace 1 rows
5: In `[<-.data.frame`(`*tmp*`, count, 1:4, value = list(V2 = c(1L,  :
  replacement element 4 has 709400 rows to replace 1 rows
6: In `[<-.factor`(`*tmp*`, iseq, value = 99L) :
  invalid factor level, NA generated
7: In `[<-.factor`(`*tmp*`, iseq, value = 3503L) :
  invalid factor level, NA generated
8: In `[<-.factor`(`*tmp*`, iseq, value = 1092L) :
  invalid factor level, NA generated
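Two of these warnings can be reproduced on toy values, which may help to narrow down the cause. In the sketch below all values and names are hypothetical. The first part assumes the `if` was evaluated while `i` still held the whole vector `1:leng`. The second part shows how factor columns behave on assignment. As far as I know, a condition of length greater than one is an error rather than a warning from R 4.2.0 onward.

```r
# Hypothetical toy values, not the real data
v1 <- c(254L, 1L, 9L)

i <- 1:3                     # vector index, as left behind by `i <- 1:leng`
cond_vec <- v1[i] == 254     # logical of length 3: not a valid if() condition
length(cond_vec)

i <- 1                       # scalar index, as inside for (i in 1:leng)
cond_one <- v1[i] == 254     # logical of length 1: safe in if()

# A factor stores integer codes; as.character() recovers the original labels
f <- factor(c("JUN", "1957", "-133"))
codes  <- as.integer(f)      # internal codes, not the data values
labels <- as.character(f)

# Assigning a value that is not an existing level yields NA plus the
# "invalid factor level" warning (suppressed here to keep the output clean)
g <- factor(c("a", "b"))
suppressWarnings(g[1] <- "z")
is.na(g[1])

# One way to convert every factor column of a data frame to character
df <- data.frame(x = factor(c("1957", "JUN")), y = 1:2)
df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)
str(df)
```

Reading the file with `stringsAsFactors = FALSE` in the first place would avoid the conversion step, assuming it was read with `read.table` or a similar function.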

Any help resolving just one of my many problems will be greatly appreciated!

P.S. If you have any ideas on how to insert blank rows for missing dates, that might save me an extra question later.
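One possible approach to the blank-row idea, sketched under assumptions: build a timestamp for each reduced row, generate every expected 12-hourly slot, and left-join the data onto that sequence. The table `obs`, its column names, and the values for 2 June are hypothetical. The sketch also assumes the soundings fall exactly on 00 and 12 hours.

```r
# Hypothetical reduced table; the 12-hour sounding on 1 June is deliberately missing
obs <- data.frame(
  hour  = c(0L, 0L, 12L),
  day   = c(1L, 2L, 2L),
  month = c("JUN", "JUN", "JUN"),
  year  = c(1957L, 1957L, 1957L),
  sfc1  = c(9780, 9770, 9765),
  stringsAsFactors = FALSE
)

# month.abb is built into R ("Jan", "Feb", ...); matching against it
# avoids locale-dependent month-name parsing
obs$time <- ISOdatetime(obs$year, match(obs$month, toupper(month.abb)),
                        obs$day, obs$hour, 0, 0, tz = "UTC")

# Every expected 12-hourly slot between the first and last sounding
full <- data.frame(time = seq(min(obs$time), max(obs$time), by = "12 hours"))

# Left join: slots without a sounding get NA in every data column
filled <- merge(full, obs, by = "time", all.x = TRUE)
print(filled)
```

Using UTC keeps the 12-hour steps free of daylight-saving shifts.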

Thank you!
Evan
