Tuesday, 27 January 2015

How to apply a function with an if statement in ddply or any kind of apply()?

First, let's generate some sample data and load the plyr and data.table packages:



library(plyr)         # ddply(), summarize()
library(data.table)   # data.table(), setkey()
x <- 1:12
y <- rep(1:4, 3)
z <- c(rep("a", 6), rep("b", 6))
t <- rep(2005:2010, 2)
df <- data.table(t, x, y, z)
setkey(df, z, t)      # sort by group z, then by year t


This yields the following table:



       t  x y z
 1: 2005  1 1 a
 2: 2006  2 2 a
 3: 2007  3 3 a
 4: 2008  4 4 a
 5: 2009  5 1 a
 6: 2010  6 2 a
 7: 2005  7 3 b
 8: 2006  8 4 b
 9: 2007  9 1 b
10: 2008 10 2 b
11: 2009 11 3 b
12: 2010 12 4 b


Now the job is: split this data.frame into two smaller data sets according to z. Within each set, if y > lag(y, k=1) (i.e. y is greater than the previous y), compute i = y/lag(y, k=1); otherwise, compute i = -y/lag(y, k=1).
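The rule above can be sketched for a single subset in base R. The helper name `signed_lag_ratio` is hypothetical, and the "previous y" is built by hand with `c(NA, head(y, -1))` rather than with `lag()`; the first row of a subset has no previous y, so it is left as `NA`. When y equals the previous y, the "otherwise" branch applies. For the z == "a" values this should give NA, 2, 1.5, 4/3, -0.25, 2.

```r
# Hypothetical helper (not from the original post): signed ratio of each
# value to the one before it. The first value has no predecessor, so it gives NA.
signed_lag_ratio <- function(y) {
  prev <- c(NA, head(y, -1))              # manual one-step lag: shift values down by one position
  ifelse(y > prev, y / prev, -y / prev)   # vectorised if/else: one decision per element
}

y_a <- c(1, 2, 3, 4, 1, 2)   # y values of the z == "a" subset in the table above
i_a <- signed_lag_ratio(y_a)
print(i_a)
```

Because `ifelse()` works element by element, there is no need for an explicit loop over the rows.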


The approach I tried is the following:



##### define a function f
f <- function(x, y) {
  if (y > lag(y, k = 1)) {
    i <- y / lag(y, k = 1)
  } else {
    i <- -y / lag(y, k = 1)
  }
  return(i)
}
####### use ddply to apply the function to each subset
v <- ddply(df, .(z), summarize, i = f(x, y))


However, this returns the following error messages:



Error in attributes(column) <- a :
invalid time series parameters specified
In addition: Warning messages:
1: In if (y > lag(y, k = 1)) { :
the condition has length > 1 and only the first element will be used
2: In if (y > lag(y, k = 1)) { :
the condition has length > 1 and only the first element will be used


I think I made a mistake somewhere in the code and, more importantly, it seems that my if statement does not loop over the elements inside the function. Does anyone have an idea how to fix this?
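One possible fix, sketched in base R under the assumption that rows are already sorted by t within each z (which is what `setkey(df, z, t)` ensures): replace `lag()` with a manual one-step shift, replace `if` with the vectorised `ifelse()`, and apply the function per group with `ave()`. The helper `signed_lag_ratio` and the data frame `dat` are hypothetical names; `dat` rebuilds the sample data as a plain data.frame so no packages are needed.

```r
# Hypothetical helper: signed ratio of each y to the previous y in the same group
signed_lag_ratio <- function(y) {
  prev <- c(NA, head(y, -1))              # previous value within the group (NA for the first row)
  ifelse(y > prev, y / prev, -y / prev)   # vectorised: no explicit loop needed
}

# Same sample data as in the post, as a plain data.frame
# (rows already ordered by z, then by t)
dat <- data.frame(
  t = rep(2005:2010, 2),
  x = 1:12,
  y = rep(1:4, 3),
  z = rep(c("a", "b"), each = 6)
)

# ave() splits y by z, applies the function to each piece,
# and puts the results back in the original row order
dat$i <- ave(as.numeric(dat$y), dat$z, FUN = signed_lag_ratio)
print(dat)
```

With plyr and data.table loaded as in the post, the same vectorised helper should also work as `ddply(df, .(z), mutate, i = signed_lag_ratio(y))` or as the grouped assignment `df[, i := signed_lag_ratio(y), by = z]`.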


Thank you very much in advance for your help!

