I am working with a very large data.frame that has several numbers for each individual. What I need is the range (minimum and maximum) of each consecutive run of numbers for each individual; one individual can have more than one run. My data set is about 500,000 rows by 6 columns, and it has extra information attached that I would like to carry over. Below is a simplified version of my data.frame. Thanks in advance!
What I have is a column of IDs and a column of numbers (e.g. 1-500). Each ID has a different count of numbers associated with it:
ID Number Group Date
A 1 K 1-19-2019
A 2 K 1-19-2019
A 3 K 1-19-2019
A 4 K 1-19-2019
A 5 K 1-19-2019
A 6 K 1-19-2019
B 10 K 1-19-2019
B 11 K 1-19-2019
C 12 J 1-19-2019
C 13 J 1-19-2019
C 14 J 1-19-2019
C 15 J 1-19-2019
C 16 J 1-19-2019
A 20 K 1-20-2019
A 21 K 1-20-2019
A 22 K 1-20-2019
A 23 K 1-20-2019
What I need:
ID Min Max Group Date
A 1 6 K 1-19-2019
A 20 23 K 1-20-2019
B 10 11 K 1-19-2019
C 12 16 J 1-19-2019
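Because A appears twice with separate ranges, grouping by ID alone cannot produce this table; each consecutive run needs its own key. A minimal base-R sketch, assuming the rows are already ordered so that each run is contiguous and ascending, and assuming a new range starts whenever the ID changes or Number is not exactly one more than the previous row's Number. The names `dat`, `new_run`, `run`, and `res` are illustrative only:

```r
# Sample data from the question (17 rows).
dat <- data.frame(
  ID     = c(rep("A", 6), rep("B", 2), rep("C", 5), rep("A", 4)),
  Number = c(1:6, 10:11, 12:16, 20:23),
  Group  = c(rep("K", 8), rep("J", 5), rep("K", 4)),
  Date   = c(rep("1-19-2019", 13), rep("1-20-2019", 4)),
  stringsAsFactors = FALSE
)

# Assumed rule: a new run starts at row 1, and wherever the ID changes
# or Number does not follow the previous row's Number by exactly 1.
n <- nrow(dat)
new_run <- c(TRUE, dat$ID[-1] != dat$ID[-n] | diff(dat$Number) != 1)
run <- cumsum(new_run)  # run key: 1, 1, ..., 2, 2, 3, ..., 4

# Carry over the extra columns from the first row of each run...
res <- dat[new_run, c("ID", "Group", "Date")]
# ...and compute min/max of Number per run (tapply returns runs in order 1, 2, ...).
res$Min <- as.vector(tapply(dat$Number, run, min))
res$Max <- as.vector(tapply(dat$Number, run, max))

# Sort by ID, then by the start of each range, and reorder the columns.
res <- res[order(res$ID, res$Min), c("ID", "Min", "Max", "Group", "Date")]
rownames(res) <- NULL
print(res)
```

Every step is vectorized (no loop over groups), which should keep it practical at around 500,000 rows. If the real data is not already in this order, sorting first with `dat[order(dat$ID, dat$Number), ]` would be needed; the run key still splits A into two ranges after sorting, because 6 to 20 is not a step of 1.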
I have tried a few things including:
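- Grouping in dplyr:

      library(dplyr)
      test <- data %>% group_by(ID) %>% top_n(n = 1)

- tapply and combining:

      id_max <- tapply(data$Number, data$ID, max)
      id_min <- tapply(data$Number, data$ID, min)
      test2 <- full_join(data.frame(ID = names(id_min), Min = as.vector(id_min)), data.frame(ID = names(id_max), Max = as.vector(id_max)), by = "ID")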
From `test` (the dplyr attempt) I get back the original data set. `test2` works, but it fails when an ID repeats: A's two ranges (1 to 6 and 20 to 23) are merged into one.
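The tapply attempt merges the ranges because its only grouping variable is ID, so both of A's runs land in one group and A comes back as 1 to 23. (Separately, `top_n()` returns rows rather than a summary and keeps every row tied for the top value, and `full_join()` expects data frames rather than the named vectors `tapply()` returns.) A small sketch using only the ID and Number columns from the sample, assuming a new run starts whenever the ID changes or Number skips; the names `run` and `key` are hypothetical:

```r
# ID and Number columns from the question's sample data.
ID     <- c(rep("A", 6), rep("B", 2), rep("C", 5), rep("A", 4))
Number <- c(1:6, 10:11, 12:16, 20:23)

# Grouping by ID alone: both of A's ranges fall into the same group.
id_min <- tapply(Number, ID, min)  # A is 1
id_max <- tapply(Number, ID, max)  # A is 23, so the 1-6 / 20-23 split is lost

# Add a run key: a new run starts where the ID changes or Number is not previous + 1.
n <- length(Number)
run <- cumsum(c(TRUE, ID[-1] != ID[-n] | diff(Number) != 1))
key <- paste(ID, run, sep = "_")  # "A_1", "B_2", "C_3", "A_4"

# Grouping by the run key keeps the two A ranges apart.
run_min <- tapply(Number, key, min)
run_max <- tapply(Number, key, max)
print(cbind(Min = run_min, Max = run_max))
```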