jeudi 24 septembre 2020

data.table manipulations yielding unexpected results using ifelse()

I have a very long data.table consisting of claims and clients data:

head(claimsData)
                                   claimID   subCoverageKey       clientID amount     clDate                 subCoverageKeyClaims   product gender incurred censored
1: IP_00297141-5f49-4979-8f20-4d929bf42504 IP_Accommodation client-98655-2   2281 2018-04-09 ffdef3f3-2996-4d1a-bf51-a78b43029079 Product 1      M     2281     TRUE
2: IP_00297141-5f49-4979-8f20-4d929bf42504       IP_Upgrade client-98655-2   2281 2018-04-09 4390be79-dc1e-4f7a-a0c0-f548c0b9ffcb Product 1      M     2281     TRUE
3: IP_00297141-5f49-4979-8f20-4d929bf42504       IP_Upgrade client-98655-2   2281 2018-04-09 4390be79-dc1e-4f7a-a0c0-f548c0b9ffcb Product 1      M     2281     TRUE
4: IP_0032b73e-6371-40d6-a971-b8eed7943b94       Daily_cash client-47058-1   4805 2017-11-09 0a9ee55e-31b1-46f8-a0d4-91154e6c0998 Product 1      M 12011725     TRUE
5: IP_0032b73e-6371-40d6-a971-b8eed7943b94 IP_Accommodation client-47058-1   4805 2017-11-09 ffdef3f3-2996-4d1a-bf51-a78b43029079 Product 1      M     4805     TRUE
6: IP_0032b73e-6371-40d6-a971-b8eed7943b94       IP_Upgrade client-47058-1   4805 2017-11-09 4390be79-dc1e-4f7a-a0c0-f548c0b9ffcb Product 1      M     4805     TRUE

Now i want to aggregate the data by clientID and i want to flag those clients whose overall claims have been more than the predefine tresholdHCC:

  results <- claimsData[, .("NOclaims" = length(amount), "claimsAmount" = sum(amount),"claimsIncurred" = sum(incurred)), by = clientID
                                  ][, HCC := ifelse(claimsIncurred > tresholdHCC,1,0)
                                    ][ , claimsCappedHCC := as.numeric(claimsIncurred)][ HCC == 1, claimsCappedHCC := tresholdHCC]

I get an warning message:

Warning message: In [.data.table(claimsData[, .(NOclaims = length(amount), claimsAmount = sum(amount), : Coercing 'character' RHS to 'double' to match the type of the target column (column 6 named 'claimsCappedHCC').

and the results are not as expected, since client-47058-1 should have been flagged as HCC. What is going on?

results
               clientID NOclaims claimsAmount claimsIncurred HCC claimsCappedHCC
     1:  client-98655-2        9         9953        2600716   0         2600716
     2:  client-47058-1       30        41783       12291356   0        12291356
     3:  client-76465-1       33        74122       49882641   1           40000
     4: client-101595-2       24        69960       20643477   0        20643477
     5:  client-79164-3        3         2492        2078186   0         2078186

DATA:

structure(list(claimID = c("IP_00297141-5f49-4979-8f20-4d929bf42504", 
"IP_00297141-5f49-4979-8f20-4d929bf42504", "IP_00297141-5f49-4979-8f20-4d929bf42504", 
"IP_0032b73e-6371-40d6-a971-b8eed7943b94", "IP_0032b73e-6371-40d6-a971-b8eed7943b94", 
"IP_0032b73e-6371-40d6-a971-b8eed7943b94"), subCoverageKey = c("IP_Accommodation", 
"IP_Upgrade", "IP_Upgrade", "Daily_cash", "IP_Accommodation", 
"IP_Upgrade"), clientID = c("client-98655-2", "client-98655-2", 
"client-98655-2", "client-47058-1", "client-47058-1", "client-47058-1"
), amount = c(2281, 2281, 2281, 4804.69, 4804.69, 4804.69), clDate = structure(c(17630L, 
17630L, 17630L, 17479L, 17479L, 17479L), class = c("IDate", "Date"
)), subCoverageKeyClaims = c("ffdef3f3-2996-4d1a-bf51-a78b43029079", 
"4390be79-dc1e-4f7a-a0c0-f548c0b9ffcb", "4390be79-dc1e-4f7a-a0c0-f548c0b9ffcb", 
"0a9ee55e-31b1-46f8-a0d4-91154e6c0998", "ffdef3f3-2996-4d1a-bf51-a78b43029079", 
"4390be79-dc1e-4f7a-a0c0-f548c0b9ffcb"), product = c("Product 1", 
"Product 1", "Product 1", "Product 1", "Product 1", "Product 1"
), gender = c("M", "M", "M", "M", "M", "M"), incurred = c(2281, 
2281, 2281, 12011725, 4804.69, 4804.69), censored = c(TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE)), row.names = c(NA, -6L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x7ffbf78114e0>)

Aucun commentaire:

Enregistrer un commentaire