In R, I want to add a column named "starts" to a data frame that encodes the bin of the numbers at the start of each row, with one digit for each leading number that falls into the same bin as the first one. The bins of the remaining numbers in that row should not be included, which might be the key to fixing the code given below.
The bins are:
- ones (numbers 0-9), encoded as 0
- tens (10-19), encoded as 1
- twenties (20-29), encoded as 2
- thirties (30-39), encoded as 3
- forties (40-46), encoded as 4

For example:
- If the row starts with three numbers in the range 0-9, like 1 3 5 16 34 43, the starts column should hold 000, because the row starts with three "ones".
- If the row is 12, 16, 32, 42, 45, 47, the starts column should hold 11.
- If the row is 32, 36, 30, 42, 45, 48, the starts column should hold the string 333.

I know that the individual functional pieces of the code work by themselves; my problem is that I cannot figure out how to modify them when they are inside the for-loop with the nested if-else statement. To test the code, I created the following example data frame:
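The rule for a single row can be sketched without any if-else cascade. This is a minimal sketch, assuming every value is a non-negative whole number below 50, so that integer division by 10 gives the bin code directly; `encode_start` is a hypothetical helper name, not something from the original code. `rle()` splits the bin codes into runs of equal consecutive values, and only the first run is kept.

```r
# Hypothetical helper: encode the leading run of same-bin numbers in one row.
# Assumes whole numbers in 0-49, so %/% 10 maps 0-9 -> 0, 10-19 -> 1, ..., 40-46 -> 4.
encode_start <- function(row) {
  bins <- as.integer(row) %/% 10L   # bin code of every number in the row
  runs <- rle(bins)                 # runs of consecutive equal bin codes
  # keep only the first run: repeat its bin code once per number in that run
  strrep(as.character(runs$values[1]), runs$lengths[1])
}

encode_start(c(1, 3, 5, 16, 34, 43))     # "000"
encode_start(c(12, 16, 32, 42, 45, 47))  # "11"
encode_start(c(32, 36, 30, 42, 45, 48))  # "333"
```

For a data frame like `x` whose only columns are the six numbers, one option is `x$starts <- apply(x, 1, encode_start)`; `apply()` converts the data frame to a numeric matrix and passes each row as a numeric vector.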
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14); n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)
x
n1 n2 n3 n4 n5 n6
1 1 2 10 23 37 45
2 7 11 14 32 37 41
#starts <- character(nrow(x)) # might be helpful to convert to string
for(i in nrow(x)){
  # match the numbers at the start of the row
  ones <- grep("^[0-9]$", x)
  tens <- grep("^[1][0-9]$", x)
  twenties <- grep("^[2][0-9]$", x)
  thirties <- grep("^[3][0-9]$", x)
  forties <- grep("^[4][0-9]$", x)
  # classifying starts
  # use rep() to repeat "0", "1", "2", "3" or "4" as many times as there are ones, tens, twenties, thirties or forties, respectively, and paste() with collapse="" to join them into one string:
  if(any(ones)){
    x[i]$starts <- paste(rep("0", each=length(ones)), collapse="")
  } else if(any(tens)){
    x[i]$starts <- paste(rep("1", each=length(tens)), collapse="")
  } else if(any(twenties)){
    x[i]$starts <- paste(rep("2", each=length(twenties)), collapse="")
  } else if(any(thirties)){
    x[i]$starts <- paste(rep("3", each=length(thirties)), collapse="")
  } else if(any(forties)){
    x[i]$starts <- paste(rep("4", each=length(forties)), collapse="")
  } else(stop("error"))
}
x # print x
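A corrected version of the loop could keep the same `grep`-style patterns but apply them to one row at a time. This is a sketch, assuming the six number columns are the only columns of `x` and all values are whole numbers; the `patterns`, `in_bin` and `run` names are illustrative, not part of the original code. The main changes: `for (i in nrow(x))` runs only once (with `i` equal to the last row number), so `seq_len(nrow(x))` is used instead; each pattern is tested against the numbers of row `i` rather than the whole data frame; only the leading run of matches is counted; and the results are collected in a vector and assigned with `x$starts <- starts`, since `x[i]$starts` selects a column rather than a row. The if-else cascade is replaced by looking up which pattern matches the first number of the row.

```r
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14); n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)

# One regular expression per bin; each name is the code written to "starts"
patterns <- c("0" = "^[0-9]$", "1" = "^1[0-9]$", "2" = "^2[0-9]$",
              "3" = "^3[0-9]$", "4" = "^4[0-9]$")

starts <- character(nrow(x))               # one result string per row
for (i in seq_len(nrow(x))) {              # visit rows 1, 2, ..., nrow(x)
  row <- as.character(unlist(x[i, ]))      # numbers of row i as strings
  # which bin does the first number of the row belong to?
  code <- names(patterns)[vapply(patterns, grepl, logical(1), x = row[1])]
  if (length(code) != 1) stop("row ", i, ": first number is not in any bin")
  in_bin <- grepl(patterns[[code]], row)   # TRUE where a number is in that bin
  # length of the leading run of TRUE values (stop at the first FALSE)
  run <- if (all(in_bin)) length(row) else which(!in_bin)[1] - 1
  starts[i] <- strrep(code, run)           # e.g. strrep("0", 2) is "00"
}
x$starts <- starts
x
```

With the example data this gives "00" for the first row (1 and 2 are both ones) and "0" for the second row (7 is followed by 11).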
I expect the output to be:
n1 n2 n3 n4 n5 n6 starts
1 1 2 10 23 37 45 00
2 7 11 14 32 37 41 0
But the program just prints the "Error: error" message from the last branch of the if-else statement. My guess is that this happens because the grep lines in the code above match not only the numbers at the beginning of the row but all the remaining numbers up to the end of the row whenever the regular expression finds a match, so the if-else statement just cascades down to the final else(stop("error")) branch.
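There may be a more basic reason than the one guessed above. `grep()` works on a character vector, and when it is given a data frame it converts it with `as.character()`, which turns each whole column into a single deparsed string such as "c(1, 7)". None of those strings is a lone digit (or a two-digit number), so every `grep()` call returns `integer(0)`, `any(integer(0))` is `FALSE`, and the cascade reaches `stop()`. The snippet below rebuilds the example data frame to check this. Running `grep()` on a single row instead returns column positions, although it still counts every match in the row rather than only the leading ones, which is the problem described in the question.

```r
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14); n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)

# grep() coerces a data frame with as.character(): one deparsed string per column
as.character(x)
# "c(1, 7)" "c(2, 11)" "c(10, 14)" "c(23, 32)" "c(37, 37)" "c(45, 41)"

ones <- grep("^[0-9]$", x)
length(ones)   # 0: no column string is a single digit
any(ones)      # FALSE, so the if-else falls through to stop("error")

# On one row, the pattern is tested against individual numbers (column positions)
grep("^[0-9]$", unlist(x[1, ]))   # 1 2
# ...but it matches every single-digit number, not only the leading ones
grep("^[0-9]$", c(1, 12, 3))      # 1 3
```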