if-statement: Remove rows that have inexact duplicates in R

Saturday, June 29, 2019

Remove rows that have inexact duplicates in R

I have some sales data in which mistakes recorded at the point of sale are corrected afterward. The data set therefore still contains the record of the initial mistake, followed by a duplicate of that record with negative values (a negative COUNT and TOTAL). However, some sales legitimately appear on several identical lines; those are valid and must be retained.

DATE     MODEL TYPE COUNT PRICE WEIGHT  TOTAL ABS_COUNT ABS_WEIGHT ABS_TOTAL replicate
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         1
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         2
20140211 JBL   B        1   6.5     27  175.5         1         27     175.5         1
20140211 JBL   A        1     4     11     44         1         11        44         1
20140211 JBL   B        1  11.2     44  492.8         1         44     492.8         1
20140211 JBL   B        1   6.5     27  175.5         1         27     175.5         2
20140211 JBL   B        1  11.2     44  492.8         1         44     492.8         2
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         3
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         4
20140211 JBL   B       -1 -11.2     44 -492.8         1         44     492.8         3
20140211 JBL   B        1  10.9     82  893.8         1         82     893.8         1
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         5
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         6
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         7
20140211 JBL   B        1  11.2     44  492.8         1         44     492.8         4
20140211 JBL   A        1   3.2     15     48         1         15        48         1
20140211 JBL   B        1  11.2     44  492.8         1         44     492.8         5
20140211 JBL   B        1  11.2     44  492.8         1         44     492.8         6
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         8
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5         9
20140211 JBL   B        1  11.2    104 1164.8         1        104    1164.8         1
20140211 JBL   A       -1   4.5    -15  -67.5         1         15      67.5        10
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5        11
20140211 JBL   A        1   4.5     15   67.5         1         15      67.5        12
20140211 JBL   B        1  11.2     44  492.8         1         44     492.8         7
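To make the problem reproducible, the point-of-sale columns of the table can be read into a data frame named `test` (the name the code in this question uses) with base R's read.table(). This is a transcription sketch; the derived ABS_* and replicate columns are left out on the assumption that they are recomputed afterwards.

```r
# Point-of-sale columns transcribed from the table above.
# ABS_* and replicate are derived columns, so they are omitted here.
test <- read.table(header = TRUE, text = "
DATE     MODEL TYPE COUNT PRICE WEIGHT  TOTAL
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   B        1   6.5     27  175.5
20140211 JBL   A        1     4     11     44
20140211 JBL   B        1  11.2     44  492.8
20140211 JBL   B        1   6.5     27  175.5
20140211 JBL   B        1  11.2     44  492.8
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   B       -1 -11.2     44 -492.8
20140211 JBL   B        1  10.9     82  893.8
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   B        1  11.2     44  492.8
20140211 JBL   A        1   3.2     15     48
20140211 JBL   B        1  11.2     44  492.8
20140211 JBL   B        1  11.2     44  492.8
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   B        1  11.2    104 1164.8
20140211 JBL   A       -1   4.5    -15  -67.5
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   A        1   4.5     15   67.5
20140211 JBL   B        1  11.2     44  492.8
")
```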

So far, I have calculated abs() of the COUNT, WEIGHT, and TOTAL columns and numbered the replicates within each group of otherwise-identical rows. I am now trying to figure out how to remove each negative observation together with its corresponding duplicate, i.e. the row in the same group whose replicate value is n - 1, where n is the replicate value of the negative row.
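The n - 1 rule described above can be sketched in base R. This is a minimal sketch, not a known answer to the question. `drop_reversals` is a hypothetical helper name, and it assumes that a correction is any row with COUNT < 0 and that PRICE should also be compared by absolute value, because one correction row in the table carries PRICE = -11.2.

```r
# Hypothetical helper (not from the question): remove every correction row
# (COUNT < 0) plus the row in the same group whose replicate number is one less.
drop_reversals <- function(df) {
  # Group key built from absolute values, so a correction row lands in the
  # same group as the sale it reverses. Assumption: PRICE is compared by
  # absolute value too, because one correction row has PRICE = -11.2.
  key <- paste(df$DATE, df$MODEL, df$TYPE, abs(df$PRICE),
               abs(df$COUNT), abs(df$WEIGHT), abs(df$TOTAL), sep = "|")
  # Running number within each group, like replicate = seq(n()) in dplyr
  rep_no <- ave(seq_len(nrow(df)), key, FUN = seq_along)

  keep <- rep(TRUE, nrow(df))
  for (i in which(df$COUNT < 0)) {
    keep[i] <- FALSE                                       # the correction
    keep[key == key[i] & rep_no == rep_no[i] - 1] <- FALSE  # its n - 1 partner
  }
  # Logical indexing stays safe when there are no corrections at all
  df[keep, , drop = FALSE]
}

# Small example built from rows of the question's table
sales <- data.frame(
  DATE   = 20140211,
  MODEL  = "JBL",
  TYPE   = c("B", "B", "B", "A", "A", "A"),
  COUNT  = c(1, 1, -1, 1, 1, -1),
  PRICE  = c(11.2, 11.2, -11.2, 4.5, 4.5, 4.5),
  WEIGHT = c(44, 44, 44, 15, 15, -15),
  TOTAL  = c(492.8, 492.8, -492.8, 67.5, 67.5, -67.5)
)
cleaned <- drop_reversals(sales)  # rows 1 and 4 remain
```

Applied to the 25 rows in the question's table, this rule removes rows 7, 10, 20 and 22 (counting from the top) and keeps the other 21. It also assumes each correction is recorded after the sale it reverses. If a correction can come first, it has no n - 1 partner, and only the negative row is dropped.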

library(dplyr)

test$ABS_COUNT <- abs(test$COUNT)
test$ABS_WEIGHT <- abs(test$WEIGHT)
test$ABS_TOTAL <- abs(test$TOTAL)

# Number the replicates within each group of otherwise-identical rows
test2 <- test %>%
  group_by(DATE, MODEL, TYPE, PRICE, ABS_COUNT, ABS_WEIGHT, ABS_TOTAL) %>%
  mutate(replicate = row_number()) %>%
  ungroup()
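One detail to watch in the grouping step: the correction row for the 11.2 sale also has a negative PRICE (-11.2), and group_by() receives the raw PRICE column. That row would therefore fall into a group of its own and be numbered 1. The following base-R sketch compares grouping on PRICE with grouping on abs(PRICE). It uses ave() as a stand-in for group_by() + mutate(), and the helper name `number_replicates` is hypothetical.

```r
# Three 11.2 sales and the correction row (PRICE = -11.2), in the question's order
b_rows <- data.frame(
  DATE   = 20140211,
  MODEL  = "JBL",
  TYPE   = "B",
  COUNT  = c(1, 1, -1, 1),
  PRICE  = c(11.2, 11.2, -11.2, 11.2),
  WEIGHT = 44,
  TOTAL  = c(492.8, 492.8, -492.8, 492.8)
)

# Hypothetical helper: replicate numbers within groups, with the price column
# passed in explicitly; ave() plays the role of group_by() + mutate() here
number_replicates <- function(df, price) {
  key <- paste(df$DATE, df$MODEL, df$TYPE, price,
               abs(df$COUNT), abs(df$WEIGHT), abs(df$TOTAL), sep = "|")
  ave(seq_len(nrow(df)), key, FUN = seq_along)
}

raw_rep <- number_replicates(b_rows, b_rows$PRICE)       # 1 2 1 3
abs_rep <- number_replicates(b_rows, abs(b_rows$PRICE))  # 1 2 3 4
```

Only the abs(PRICE) grouping numbers the correction row 3, which matches the replicate value the table shows for it. So grouping on an absolute-value price column is presumably needed for the n - 1 rule to line up.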
