jeudi 20 avril 2017

R calculate value based on another row's

I have a question about calculating some values based on another row's.
First, I show my part of data:

Type    Time            PM
Ac      2016-01         7710
Ac      2016-02         27500
Ac      2016-03         12840
Ac      2016-04         20060
Ac      2016-05         9430
Ac      2016-06         9830
Ac      2016-07         3870
Ac      2016-08         12040
Ac      2016-09         5590
Ac      2016-10         4300
Ac      2016-11         3870
Ac      2016-12         2580
As      2016-01         315052
As      2016-02         478542
As      2016-03         491472
As      2016-04         662238
As      2016-05         464728
As      2016-06         530424
As      2016-07         443122
As      2016-08         603512
As      2016-09         564438
As      2016-10         440068
As      2016-11         462846
As      2016-12         525394
HW      2015-12         7250

Second, what I want is like following(I subset Type Ac, Time 2016-01 & 2016-03 & 2016-10 as example):

Type    Time    Time2   LapseMonth  PM      What I want
Ac  2016-01     2016-01     0       7710    315.9836066
Ac  2016-01     2016-02     1       0       610.9016393
Ac  2016-01     2016-03     2       0       653.0327869
Ac  2016-01     2016-04     3       0       631.9672131
Ac  2016-01     2016-05     4       0       653.0327869
Ac  2016-01     2016-06     5       0       631.9672131
Ac  2016-01     2016-07     6       0       653.0327869
Ac  2016-01     2016-08     7       0       653.0327869
Ac  2016-01     2016-09     8       0       631.9672131
Ac  2016-01     2016-10     9       0       653.0327869
Ac  2016-01     2016-11     10      0       631.9672131
Ac  2016-01     2016-12     11      0       653.0327869
Ac  2016-03     2016-03     0       12840   526.2295082
Ac  2016-03     2016-04     1       0       1052.459016
Ac  2016-03     2016-05     2       0       1087.540984
Ac  2016-03     2016-06     3       0       1052.459016
Ac  2016-03     2016-07     4       0       1087.541
Ac  2016-03     2016-08     5       0       1087.541
Ac  2016-03     2016-09     6       0       1052.459
Ac  2016-03     2016-10     7       0       1087.541
Ac  2016-03     2016-11     8       0       1052.459
Ac  2016-03     2016-12     9       0       1087.541
Ac  2016-10     2016-10     0       4300    176.2295082
Ac  2016-10     2016-11     1       0       352.4590164
Ac  2016-10     2016-12     2       0       364.2076503
HW  2015-12     2015-12     0       7250    297.1311475
HW  2015-12     2016-01     1       0       614.0710382
HW  2015-12     2016-02     2       0       574.4535519
HW  2015-12     2016-03     3       0       614.0710382
HW  2015-12     2016-04     4       0       594.2622951
HW  2015-12     2016-05     5       0       614.0710382
HW  2015-12     2016-06     6       0       594.2622951
HW  2015-12     2016-07     7       0       614.0710382
HW  2015-12     2016-08     8       0       614.0710382
HW  2015-12     2016-09     9       0       594.2622951
HW  2015-12     2016-10     10      0       614.0710382
HW  2015-12     2016-11     11      0       594.2622951
HW  2015-12     2016-12     12      0       316.9398907    

I explain the result I want:

  1. Time2 is a new column which is the month equal and after Time.

  2. LapseMonth is also a new column which is equal to month(Time2) - month(Time). Thus, it generates as above.

  3. Column What I want is complicated.
    Look at row1(Time:2016-01, Time2:2016-01, LapseMonth:0, PM:7710, What I want:315.9836066)
    When Time = Time1, the formula of What I want is 15/366*PM, which PM=7710
    And the following is the day of that month/366*PM (i.e. row2:610.9016393 = 29/366*7710, riow3:653.0327869 = 31/366*7710)

  4. There are somethimg very special.
    Look at my last obs. of data: HW 2015-12 7250
    When Time is 2015, the result I need is different with Time in 2016.
    I know it is very complex, so I explain it in the following.

  5. When Time is 2015, no matter what month, LapseMonth is not the same when Time is 2016. LapseMonth must has 0~12 when Time in 2015.(see above)

  6. Column What I want is the same calculation when Time in 2016. However, there is one special case. When LapseMonth is equal to 12, the value of What I want is original value minus the value of What I want when LapseMonth=0. Here is weird.
    original value in this example is 614.0710382, because Dec has 31 days. Thus, 614.0710382 = PM*31/366 = 7250*31/366.
    The value of What I want when LapseMonth is equal to 12 is 614.0710382 - 297.1311475.

My question is how to calculate What I want when Time is 2015 & LapseMonth is 12.(316.9398907)

I add my code below. Since my data contains multiple excel, so I read it at one time:

library(readxl)
library(data.table)
library(lubridate)
file.list <- dir(path = "filename", pattern='\\.xlsx', full.names = T)
df.list <- lapply(file.list, read_excel)
df <- rbindlist(df.list)
df <-as.data.table(df)
df[,Time:=as.Date(Time,"%Y-%m")]   
df$Time <- ymd( paste( df$Time, 01, sep = "-"))
df[,Time:=as.Date(Time,"%Y/%m/%d")]
dtapp <- df[,rep:=ifelse(year(Time)==2016, 12-month(Time)+1, 13)][rep(1:.N,rep)]
dtapp[,LapseMonth := seq_len(.N)-1, by =.(Type,Time,PM) ]
dtapp[,Time2:=Time-days(mday(Time)-1)+months(LapseMonth)]
dtapp[,`What I want`:=ifelse(Time==Time2,PM*15/366,PM*days_in_month(Time2)/366)]
dtapp[,`PM`:=ifelse(Time==Time2,PM,0)]

Aucun commentaire:

Enregistrer un commentaire