In R, I want to classify each row of a data frame by binning its values and using the count of values in each bin to assign the row to one of 2 classes with if-else logic.
Within a for-loop, I used cut and split to bin the values of each row.
The bins (ranges) are: 1..9, 10..19, 20..29, 30..39, 40..49.
If a row contains 1 pair of values falling in the same bin (range), say 10..19, then it should be classified as "P". If it contains 2 pairs falling into 2 different bins (ranges), then it should be classified as "PP".
Then I wrote 2 logical expressions that count how many bins hold a given number of values, and stored the results (TRUE or FALSE) in 2 new variables named p and pp. Finally, I used p and pp as conditions in the if-else statement to assign each row to either class P (1st row) or class PP (2nd row).
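One way to check that the breaks really correspond to the ranges listed above is to run cut on the boundary values. This is only a sketch: the breaks vector is the one used in the loop further down, while edge_values and edge_bins are names made up for illustration.

```r
# Breaks taken from the question. cut() builds right-closed intervals by default,
# so these breaks give (0,9], (9,19], (19,29], (29,39], (39,49].
breaks <- c(0, 9, 19, 29, 39, 49)

# Boundary values of the stated ranges (hypothetical test input).
edge_values <- c(1, 9, 10, 19, 20, 29, 30, 39, 40, 49)
edge_bins <- cut(edge_values, breaks)

# Show each value next to the interval it was assigned to.
print(data.frame(value = edge_values, bin = edge_bins))
```

Each lower and upper boundary should land in the same interval as its partner, which suggests the breaks match the ranges 1..9, 10..19, 20..29, 30..39 and 40..49 for integer input.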
First, I created a data frame x:
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14); n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)
x
  n1 n2 n3 n4 n5 n6
1  1  2 10 23 37 45
2  7 11 14 32 37 41
The 1st row should be classified as "P", because it has 1 pair of values (1, 2) falling in the same bin, 1..9.
The 2nd row should be classified as "PP", because it has 2 pairs of values (11, 14 and 32, 37) falling in 2 bins: 10..19 and 30..39, respectively.
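The expected pairs can be checked by counting the values per bin for each row. The sketch below assumes the same data and breaks as in the question; bin_counts is a hypothetical name, and tabulate is swapped in for split plus lengths simply because it returns a plain count vector.

```r
x <- data.frame(n1 = c(1, 7), n2 = c(2, 11), n3 = c(10, 14),
                n4 = c(23, 32), n5 = c(37, 37), n6 = c(45, 41))
breaks <- c(0, 9, 19, 29, 39, 49)

# Count the values per bin for every row. apply() returns the per-row
# results as columns, so t() turns them back into one row per row of x.
bin_counts <- t(apply(x, 1, function(row) {
  tabulate(cut(row, breaks), nbins = length(breaks) - 1)
}))
colnames(bin_counts) <- levels(cut(1, breaks))  # label columns with the intervals

print(bin_counts)
```

Row 1 should show a single bin holding 2 values, and row 2 should show two such bins plus one empty bin, which is the pattern the p and pp conditions test for.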
So, after creating the data frame x, I created a for-loop:
for(i in nrow(x)){
  # binning the data:
  bins <- split(as.numeric(x[i, ]), cut(as.numeric(x[i, ]), c(0, 9, 19, 29, 39, 49)))
  p <- (sum(lengths(bins) == 2) == 1 & sum(lengths(bins) == 1) == 4) # P - pair of one color
  pp <- (sum(lengths(bins) == 2) == 2 & sum(lengths(bins) == 1) == 2 & sum(lengths(bins) == 0) == 1) # PP - pair of two colors
  if(p){
    x$types <- "P"
  } else if(pp){
    x$types <- "PP"
  } else{
    stop("error")
  }
}
print(x)
I want to create a new column named types, holding the class P or PP:
  n1 n2 n3 n4 n5 n6 types
1  1  2 10 23 37 45     P
2  7 11 14 32 37 41    PP
Instead, the code returned PP for both rows:
  n1 n2 n3 n4 n5 n6 types
1  1  2 10 23 37 45    PP
2  7 11 14 32 37 41    PP
I thought this was because the loop runs twice over the rows. But if I run it only once, all the rows are classified as "P" instead of "PP". I expect it's something very simple, but I have not been able to figure it out so far.
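Reading the loop as posted, the cause looks different from the loop running twice. `for(i in nrow(x))` iterates over the single value nrow(x), here 2, so only the last row is ever examined, and `x$types <- "PP"` assigns to the whole column rather than to row i. Below is a minimal sketch of a corrected loop, assuming the intent is to visit every row and write one class per row. The value_cols vector, the NA_character_ initialisation, the use of unlist and the more specific stop message are additions for illustration, not part of the original code.

```r
n1 <- c(1, 7); n2 <- c(2, 11); n3 <- c(10, 14)
n4 <- c(23, 32); n5 <- c(37, 37); n6 <- c(45, 41)
x <- data.frame(n1, n2, n3, n4, n5, n6)

# Columns holding the values to bin (hypothetical helper, so that the
# new types column is never fed into cut()).
value_cols <- c("n1", "n2", "n3", "n4", "n5", "n6")

# Pre-allocate the result column; this is an addition to the original code.
x$types <- NA_character_

# seq_len(nrow(x)) yields 1, 2, ...; plain nrow(x) yields only the last index.
for (i in seq_len(nrow(x))) {
  values <- unlist(x[i, value_cols], use.names = FALSE)
  bins <- split(values, cut(values, c(0, 9, 19, 29, 39, 49)))
  n <- lengths(bins)  # number of values in each of the 5 bins

  p  <- sum(n == 2) == 1 && sum(n == 1) == 4                      # one pair
  pp <- sum(n == 2) == 2 && sum(n == 1) == 2 && sum(n == 0) == 1  # two pairs

  if (p) {
    x$types[i] <- "P"   # assign to row i only, not to the whole column
  } else if (pp) {
    x$types[i] <- "PP"
  } else {
    stop("row ", i, " matches neither P nor PP")
  }
}

print(x)
```

With the example data this should label the 1st row "P" and the 2nd row "PP". Indexing the assignment with `[i]` is the key change: without it, each pass overwrites every row with the class of the row currently being examined.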