Friday, July 21, 2017

Relatively Referencing Observations in a function in R

When writing a function that computes a value for each observation in a vector, how do I refer to observations that are a fixed number of positions away from the one currently being processed? If the rows are indexed by i = 1, 2, ..., how do I reference a column in row i - 1?

Here is a sample data-set that mimics my dilemma:

> letters <- c('a', 'b', 'c', 'b', 'e')
> numbers <- c('1', '', '2', '', '3')
> sample <- cbind(letters, numbers)
> sample
     letters numbers
[1,] "a"     "1"    
[2,] "b"     ""     
[3,] "c"     "2"    
[4,] "b"     ""     
[5,] "e"     "3"  

I would like to fill each empty cell in sample$numbers with the value of sample$numbers from the previous observation. How do I reference the previous observation while the column is being rebuilt? For example, I've tried:

> sample$numbers <- ifelse(sample$numbers == "", sample$numbers[as.numeric(rownames(sample)) - 1], sample$numbers)
Error in sample$numbers : $ operator is invalid for atomic vectors
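
The error message points at the container rather than at the indexing: cbind() on two character vectors returns a character matrix, and $ is not defined for matrices. Below is a minimal sketch of the difference, assuming a data frame is acceptable for this data. The names letters_col, numbers_col, m, and dat are illustrative only, chosen to avoid masking the built-in letters and sample.

```r
# Illustrative names (not from the original post)
letters_col <- c("a", "b", "c", "b", "e")
numbers_col <- c("1", "", "2", "", "3")

# cbind() on character vectors builds a character matrix, not a data frame
m <- cbind(letters = letters_col, numbers = numbers_col)
is.matrix(m)        # TRUE
# m$numbers         # would fail: $ operator is invalid for atomic vectors
m[, "numbers"]      # matrix columns are selected with [ , "name"]

# data.frame() keeps each column addressable with $
dat <- data.frame(letters = letters_col, numbers = numbers_col,
                  stringsAsFactors = FALSE)
dat$numbers
```

With the data held in a data frame, expressions such as dat$numbers work as the attempts in this post expect.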

I've also tried using the repeated 'b' in sample$letters to locate the rows with missing values:

> f1 <- function(df, cols, match_with, to_x = 'b'){
+   df[cols] <- lapply(df[cols], function(i) 
+     ifelse(grepl(to_x, match_with, fixed = TRUE), sample$numbers[as.numeric(rownames(sample)) - 1], 
+            i))
+   return(df)
+ }
> sample = f1(sample, cols = c('numbers'), match_with = sample$letters)
Error in sample$letters : $ operator is invalid for atomic vectors
5. grepl(to_x, match_with, fixed = TRUE)
4. ifelse(grepl(to_x, match_with, fixed = TRUE), sample$numbers[as.numeric(rownames(sample)) -
       1], i)
3. FUN(X[[i]], ...)
2. lapply(df[cols], function(i) ifelse(grepl(to_x, match_with, fixed = TRUE),
       sample$numbers[as.numeric(rownames(sample)) - 1], i))
1. f1(sample, cols = c("numbers"), match_with = sample$letters)

In both cases, the trouble seems to be that I'm using sample$numbers[as.numeric(rownames(sample)) - 1] to reference the value of sample$numbers in the previous observation. Is there a better way to do this?
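
One possible approach, sketched under the assumption that the data lives in a data frame and contains no NA values: build a shifted copy of the column so that position i holds the value from row i - 1, then pick from it wherever the cell is empty. Subtracting 1 from the row numbers does not do this, because R silently drops index 0 and the result comes out one element short and misaligned. The names dat, prev, and fill_forward are hypothetical.

```r
# Illustrative data frame (names are not from the original post)
dat <- data.frame(letters = c("a", "b", "c", "b", "e"),
                  numbers = c("1", "", "2", "", "3"),
                  stringsAsFactors = FALSE)

# Shifted copy: element i holds the value from row i - 1 (NA for the first row)
prev <- c(NA, head(dat$numbers, -1))

# Take the previous row's value wherever the current cell is empty
dat$numbers <- ifelse(dat$numbers == "", prev, dat$numbers)
dat$numbers

# Hypothetical helper for runs of several empty cells in a row:
# each position is mapped to the most recent non-empty value.
fill_forward <- function(x, empty = "") {
  filled <- x != empty
  idx <- cumsum(filled)   # how many non-empty values have been seen so far
  idx[idx == 0] <- NA     # leading empties have nothing to copy from
  x[filled][idx]
}
fill_forward(c("1", "", "", "2"))
```

The shifted-copy version only looks one row back, so it fits data like the sample above where empty cells never occur twice in a row. The helper covers longer gaps at the cost of being slightly less obvious.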
