Saturday, 10 July 2021

Loop function in R to compare values of different data frames

Introduction

Hi everyone,

for a small project, I am trying to write a function that compares the values of Data Frame 1 with the values of Data Frame 2. Afterwards, Data Frames 3 and 4 should be printed with the results of the comparison.

Data Frame 1:

ID x1i x2i x3i
a 1 2 4
b 1 4 1

Data Frame 2:

    Data_Frame_2 <- c(1:4)

Read x1a (the value of x1i for ID a) and compare it with Data Frame 2. The value 1 is in Data Frame 2, so print the value 1 and the name of the variable (x1i) in Data Frame 3 and cross out the value 1 from Data Frame 2.

Read x1b (the value of x1i for ID b) and compare it with Data Frame 2. The value 1 is no longer in Data Frame 2, so read x2b. The value 4 is in Data Frame 2, so print the value 4 and the name of the variable (x2i) in Data Frame 3 and cross out the value 4 from Data Frame 2.

Data Frame 3 should then look something like this:

Data Frame 3:

ID Value Variable
a 1 x1i
b 4 x2i

Data Frame 4 (the remaining numbers of Data Frame 2):

Remaining numbers
2
3
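
The procedure described above can be sketched as a nested loop: for every row, walk through the x columns from left to right, take the first value that is still available, and cross it out. This is only a minimal sketch under assumptions: the function name `match_first_free` and the argument names `df`, `pool`, and `value_cols` are hypothetical, and Data Frame 2 is treated as a plain vector rather than a data frame.

```r
# Hypothetical helper: for each row of `df`, pick the first value (left to
# right over `value_cols`) that is still in `pool`, then remove it from `pool`.
match_first_free <- function(df, pool, value_cols) {
  n <- nrow(df)
  # Pre-allocate the result so every row has a slot (NA means "no match").
  result <- data.frame(ID = df$ID,
                       Value = rep(NA_integer_, n),
                       Variable = rep(NA_character_, n),
                       stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    for (col in value_cols) {
      candidate <- df[i, col]
      if (candidate %in% pool) {
        result$Value[i] <- candidate
        result$Variable[i] <- col
        pool <- pool[pool != candidate]  # "cross out" the value
        break                            # stop at the first hit in this row
      }
    }
  }
  list(matched = result,
       remaining = data.frame(`Remaining numbers` = pool, check.names = FALSE))
}

# The toy data from the tables above.
df1 <- data.frame(ID = c("a", "b"),
                  x1i = c(1L, 1L), x2i = c(2L, 4L), x3i = c(4L, 1L),
                  stringsAsFactors = FALSE)
out <- match_first_free(df1, pool = 1:4, value_cols = c("x1i", "x2i", "x3i"))
print(out$matched)    # corresponds to Data Frame 3
print(out$remaining)  # corresponds to Data Frame 4
```

On the toy data this picks the value 1 (x1i) for ID a and the value 4 (x2i) for ID b, and leaves 2 and 3 as remaining numbers.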

Example in R to solve this theoretical problem

So far, I have worked out the following code, which does the job:

    b <- data.frame(b = 1:4) # data frame 2, with a single column named "b"
    View(b)

    # data frame 1, using the values from the table above
    a <- data.frame(ID = c("a","b"), x1i = c(1,1), x2i = c(2,4), x3i = c(4,1))
    View(a)

    `%notin%` <- Negate(`%in%`) #got this one from <https://www.marsja.se/how-to-use-in-in-r/>
    Read_Info <- function(a,b)
    {
      if (a[1,2] %in% b[1:4,1]) {c_1<-c(a[1,1:2],names(a)[2]); b1<-subset(b,b %notin% a[1,2])} 
      if (a[2,2] %in% b1[1:3,1]) {c_2<-c(a[2,1:2],names(a)[2]); b2<-subset(b,b %notin% c(a[1,2],a[2,2]))} 
      else if (a[2,3] %in% b1[1:3,1]) {c_2<-c(a[2,1],a[2,3],names(a)[3]); b2<-subset(b,b %notin% c(a[1,2],a[2,3]))} 
      if (a[3,2] %in% b1[1:2,1]) {c_3<-c(a[3,1],a[3,2],names(a)[2]); b3<-subset(b,b %notin% c(a[1,2],a[2,3],a[3,2]))} 
      else if (a[3,2] %notin% b1[1:2,1]) {c_3<-c(NA,NA,NA); b3<-b2} 
      c<-rbind(c_1,c_2,c_3)
      colnames(c, do.NULL = FALSE)
      colnames(c) <- c("ID","Value","Variable")
      bx<-b3
      colnames(bx, do.NULL = FALSE)
      colnames(bx) <- c("Remaining numbers")
      print(c)
      print(bx)
    }

    Read_Info(a,b)

    # In this example, c is data frame 3 and bx is data frame 4
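
In `Read_Info`, the remaining numbers are rebuilt from the full `b` in every branch, so each branch has to list all values crossed out so far. A possible alternative, sketched here under the assumption that Data Frame 2 can be kept as a plain vector (the name `pool` is hypothetical), is to shrink one running vector step by step:

```r
pool <- 1:4                      # stands in for Data Frame 2

# %in% answers "is this value still available?"
1 %in% pool                      # TRUE

# Cross out the value 1, then the value 4, always from the current pool.
# setdiff() also drops duplicates, which is harmless here because the
# pool values are unique.
pool <- setdiff(pool, 1)
pool <- setdiff(pool, 4)

# The value 1 is gone now, so a second lookup fails.
1 %in% pool                      # FALSE
pool                             # 2 3
```

With a running vector, each step only needs to know the value it crosses out, not the history of earlier steps.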

Actual Task at hand - If, else if Loop Function in R

I am facing the following obstacle: my actual data is somewhat larger than the example above, although it follows the same structure:

    b <- as.data.frame(c(1:20)) # this would be Data Frame 2 in the theoretical considerations
    colnames(b) <- c("b")
    View(b)

    # This would be Data Frame 1 in the theoretical considerations
    # Note: between "ID" and "x1i", there are now two additional variables which were not in the example above
    # Although these two variables are part of the data, they are not of interest right now
    a2 <- cbind(letters[1:20], c(0), c(1)) # IDs "a" to "t" plus the two extra variables
    a1 <- data.frame(replicate(14, sample(1:20, replace = TRUE))) # 14 columns to match x1i ... x14i
    a <- cbind(a2, a1)
    colnames(a) <- c("ID","variable1","variable2","x1i","x2i","x3i","x4i","x5i","x6i","x7i","x8i","x9i","x10i","x11i","x12i","x13i","x14i")
    View(a)

I am trying to write a function with a "for" loop and an "if" / "else if" chain that performs this reading task by itself. So far, I have written the following code, which does not work yet.

    `%notin%` <- Negate(`%in%`) # got this one from <https://www.marsja.se/how-to-use-in-in-r/>
    Read_Info_Loop <- function(a,b)
      {for (i in 1:20) 
    { if (a[i,4] %in% b[1:(21-i),1]) {x[i]<-c(a[i,1],a[i,4],names(a)[4]); b[i]<-subset(b,b %notin% a[i,4])} 
      if (a[i,5] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,5],names(a)[5]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,5]))
      } else if (a[i,6] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,6],names(a)[6]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,6]))
      } else if (a[i,7] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,7],names(a)[7]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,7]))
      } else if (a[i,8] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,8],names(a)[8]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,8]))
      } else if (a[i,9] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,9],names(a)[9]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,9]))
      } else if (a[i,10] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,10],names(a)[10]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,10]))
      } else if (a[i,11] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,11],names(a)[11]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,11]))
      } else if (a[i,12] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,12],names(a)[12]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,12]))
      } else if (a[i,13] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,13],names(a)[13]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,13]))
      } else if (a[i,14] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,14],names(a)[14]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,14]))
      } else if (a[i,15] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,15],names(a)[15]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,15]))
      } else if (a[i,16] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,16],names(a)[16]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,16]))
      } else if (a[i,17] %in% b[i-1][1:(21-i),1]) {x[i]<-c(a[i,1],a[i,17],names(a)[17]); b[i]<-subset(b,b %notin% c(a[1,4],a[i,17]))
      } else if (a[i,17] %notin% b[1:(21-i),1]) {x[i]<-c(NA,NA,NA); b[i]<-c(b[i-1])}
    y<-rbind(x[i[1:20]]) 
              colnames(y, do.NULL = FALSE)
              colnames(y) <- c("ID","Value","Variable")
    u<-rbind(b[i=20])
              colnames(u, do.NULL = FALSE)
              colnames(u) <- c("Remaining numbers")
        print(y)
        print(u)

      }
      }
    # y is supposed to be data frame 3 and u is supposed to be data frame 4 
    # in the above theoretical considerations 
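
The branches above differ only in the column index, so the chain can in principle be collapsed into one lookup per row. The following is a minimal sketch of that idea, not a drop-in fix for the function above: the names `read_info_rows`, `pool`, and `first_col` are hypothetical, the data is random, and Data Frame 2 is assumed to be a plain vector.

```r
set.seed(1)  # arbitrary seed so the random example data is reproducible

ids  <- letters[1:20]
vals <- data.frame(replicate(14, sample(1:20, replace = TRUE)))
names(vals) <- paste0("x", 1:14, "i")
a <- cbind(data.frame(ID = ids, variable1 = 0, variable2 = 1,
                      stringsAsFactors = FALSE), vals)

read_info_rows <- function(a, pool, first_col = 4) {
  cols <- first_col:ncol(a)
  out  <- vector("list", nrow(a))          # one slot per row, created up front
  for (i in seq_len(nrow(a))) {
    row_vals <- unlist(a[i, cols])         # named vector: x1i, x2i, ...
    hit <- which(row_vals %in% pool)[1]    # first column whose value is free
    if (is.na(hit)) {
      # No value of this row is left in the pool.
      out[[i]] <- data.frame(ID = a$ID[i], Value = NA_integer_,
                             Variable = NA_character_,
                             stringsAsFactors = FALSE)
    } else {
      out[[i]] <- data.frame(ID = a$ID[i], Value = row_vals[[hit]],
                             Variable = names(row_vals)[hit],
                             stringsAsFactors = FALSE)
      pool <- setdiff(pool, row_vals[[hit]])  # cross the value out
    }
  }
  list(matched = do.call(rbind, out), remaining = pool)
}

res <- read_info_rows(a, pool = 1:20)
head(res$matched)
res$remaining
```

Here `which(...)[1]` plays the role of the whole "if" / "else if" chain: it returns the position of the first column whose value is still available, or `NA` if there is none.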

Errors

I now get the following errors:

    Error in `[<-.data.frame`(`*tmp*`, i, value = c("a", "1", "x3i")) : 
      replacement has 3 rows, data has 4

    Error in Read_Info_Loop(test, l) : object 'x' not found

I got the first error yesterday. Today, after restarting R, the second error occurred instead, which seems to point to structural problems inside the function. I am also fairly sure that further errors are currently hidden behind these two and will show up as soon as they are dealt with.
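
Both messages would be consistent with `x` never being created inside `Read_Info_Loop`: if an object named `x` happens to exist in the workspace, R picks it up, and after a restart nothing is found. The following sketch reproduces the two situations in isolation and shows one common way around them. It is an illustration under assumptions (the leftover `x` is assumed to have been a data frame with four rows, and the names `demo_leftover`, `demo_missing`, `demo_initialised`, and `x_never_created` are hypothetical), not a diagnosis confirmed against the original session.

```r
# Situation 1: an `x` with 4 rows already exists; assigning a 3-element
# vector as one of its columns fails because the lengths do not fit.
demo_leftover <- function() {
  x <- data.frame(v = 1:4)             # stands in for a leftover workspace object
  x[1] <- c("a", "1", "x3i")
  x
}
msg_leftover <- tryCatch(demo_leftover(), error = function(e) conditionMessage(e))

# Situation 2: the object was never created, so there is nothing to modify.
demo_missing <- function() {
  x_never_created[1] <- "a"            # hypothetical name, defined nowhere
}
msg_missing <- tryCatch(demo_missing(), error = function(e) conditionMessage(e))

# One way around both: create the container inside the function, as a list,
# and store each 3-element result with [[ ]].
demo_initialised <- function() {
  x <- vector("list", 2)
  x[[1]] <- c("a", "1", "x1i")
  x[[2]] <- c("b", "4", "x2i")
  y <- do.call(rbind, x)               # stack the results into a matrix
  colnames(y) <- c("ID", "Value", "Variable")
  y
}

print(msg_leftover)
print(msg_missing)
print(demo_initialised())
```

Creating `x` inside the function also makes the result independent of whatever happens to be in the workspace.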

However, I am not asking you to simply solve the problem for me. Rather, I would like to ask whether you have ideas on how I can solve these two specific errors, and maybe a hint to get the function a little closer to working properly. For me, the focus is clearly on learning a thing or two in general.

A few disclaimers: I have little programming experience, so the code and my descriptions are probably rather messy. If you have any questions for clarification, please feel free to ask; I will try to respond as quickly as possible. English is not my first language, so please excuse any language mistakes.

I am looking forward to learning from you and hearing your ideas about the code itself, the theoretical considerations, or the approach to the loop function.

Kind Regards

Paul
