samedi 14 août 2021

Matching date range from to different data frames in R

I am struggling days now to find a way to match two different tables (unbalanced in row number) in R with the matching range of dates.

The first data frame represents what someone has declared and the second data.frame represents what actually is.The problem is that an id say A might have kept for a week or more a amount of product in kilos in his/her storage and then appear it one week after.

So i have two data frames with different rows. Both number for rows exceed 500.000.The first has 500.000 and the second 520.000.(so some NAs might occur but this doesn't matter).

I want to seach and match from these 2 different data frames:

  1. the ids to are the same,
  2. the product code and the man code to be the same, and
  3. to check if the date in the second data frame is in the range of a month from the date in the first data frame (for example he/she declares a weight (100 kg) in 2020/1/1 and appears it in 2020/1/15 but the actual weight is 105).

and if all these conditions are true to sum the two weights (from the data frames 1 and 2)

The desired output must be a data frame with the matched id, range of date, the product , the man, the 2 columns weights and the last column to be the values that will be the sum of the weights.

the example data frame are the two data frames in R below :


id = rep("A",10)
date = seq(as.Date("2000/1/1"), by = "day", length.out = 10)
product = c("E1","E1","E2","E3","E3","E4","E1","E1","E5","E5")
man = c("PS","PS","PS","DDR","DDR","FFR","PS","PS","BA","BA")
weight = c(1505,300,259,231,140,150,300,112,203,2200) 
data1 = data.frame(id,date,product,man,weight);data1


id2 = rep("A",10)
date2 = sample(seq(as.Date("2000/1/10"), by = "day", length.out = 10),10)
product2 = c("E3","E3","E1","E1","E1","E1","E5","E4","E5","E2")
man2 = c("DDR","DDR","PS","PS","PS","PS","BA","FFR","BA","PS")
weight2 = c(259,295,260,240,135,145,250,110,2000,210) 
data2 = data.frame(id2,date2,product2,man2,weight2);data2

How can I match them with the ginned date latency condition ?

Any help ?

Aucun commentaire:

Enregistrer un commentaire