vendredi 15 novembre 2019

Merge more than 2 dataframes together depending on column value in R

I would like to merge a dataframe with multiple dataframes depending on a value in two columns. I know I can merge two dataframes based on an element in a column using the merge() function, but I dont know how to do it when there are more than 2 dataframes.

For example, take this as the primary dataframe:

yr.col <- c(rep("2018",3), rep("2017",4), rep("2016",5))
mnth.col <- sample.int(4,12, replace = TRUE)
lon <- c(paste(1:12,"x"))
lat <- c(paste(1:12,"y"))

df <- data.frame(yr.col,lon,lat)

These are the other dataframes, which have the temperature for the set of lon and lat in different years.

tmp_18 <- sample.int(8,12,replace = TRUE)
tmp_17 <- sample.int(8,12,replace = TRUE)
tmp_16 <- sample.int(8,12,replace = TRUE)

env_18 <- data.frame(tmp_18,lon,lat)
env_17 <- data.frame(tmp_17, lon, lat)
env_16 <- data.frame(tmp_16, lon, lat)

Aim: I want to merge df with either env_18 env_17 or env_16 depending on df$yr.col

Expected result: A dataframe with a new column called tmp where the number from the correct env datasets are found

Previously tried:

1)

if (df$yr.col=="2018"){
  df.new$tmp <- merge(df,env_18, by=c("lon", "lat"))
  df.new$tmp.yr <- "2018"
}else if (df$yr.col=="2017"){
  df.new$tmp <- merge(df, env_17, by=c("lon", "lat"))
  df.new$tmp.yr <- "2017"
} else {
  df.new$tmp <- merge(df, env_16, by=c("lon", "lat"))
  df.new$tmp.yr <- "2016"}

produces this warning:

Warning message:
In if (df$yr.col == "2018") { :
  the condition has length > 1 and only the first element will be used

It only takes the first dataframe env_18 and merges that with df

I have also tried 2)

df.new <- ifelse(df$yr.col=="2018", merge(df, env_18, by=c("lon", "lat")), 
                       ifelse(df$yr.col=="2017", merge(df, env_17, by=c("lon", "lat")),
                              ifelse(df$yr.col=="2016", merge(df, env_16, by=c("lon", "lat")), "NA")))
df.new <- data.frame(matrix(unlist(df.new), nrow=length(df.new)))

but this does not give the desired outcome.

Is there some magic way to do this that I have not condisered or have I made an error? Perhaps a for-loop or function?
Thank you so much for your help in advance! I really appreciate it :))

Aucun commentaire:

Enregistrer un commentaire