Tuesday, January 22, 2019

Taking a list of individual numbers and getting a range without a common grouping variable

I am working with a very large data.frame that holds several numbers for each individual. What I need is the range (min and max) of the numbers for each individual. My data set is about 500,000 rows by 6 columns, and the rows carry extra information that I would like to carry over. I have provided a summarized version of my data.frame setup. Thanks in advance!

What I have is a column of IDs and a column of numbers (e.g. 1-500). Each ID has a different number of numbers associated with it:

ID        Number    Group    Date

A          1          K      1-19-2019
A          2          K      1-19-2019
A          3          K      1-19-2019
A          4          K      1-19-2019
A          5          K      1-19-2019
A          6          K      1-19-2019
B          10         K      1-19-2019
B          11         K      1-19-2019
C          12         J      1-19-2019
C          13         J      1-19-2019 
C          14         J      1-19-2019
C          15         J      1-19-2019
C          16         J      1-19-2019
A          20         K      1-20-2019
A          21         K      1-20-2019
A          22         K      1-20-2019
A          23         K      1-20-2019

What I need:

ID    Min  Max   Group   Date
A     1    6      K      1-19-2019
A     20   23     K      1-20-2019
B     10   11     K      1-19-2019
C     12   16     J      1-19-2019

I have tried a few things including:

  • grouping in dplyr

    test1 <- data %>%
      group_by(ID) %>%
      top_n(n = 1)  # no wt given, so top_n() ranks by the last column (Date)
    
    
  • tapply and combining

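    max_num <- tapply(data$Number, data$ID, max)
    min_num <- tapply(data$Number, data$ID, min)
    # tapply() returns named arrays, not data frames, so build the table directly
    test2 <- data.frame(ID = names(min_num), Min = as.vector(min_num), Max = as.vector(max_num))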
    
    

What I get from test1 is the original data set. test2 works, but it misses the separate ranges when an ID repeats (for example, A on 1-19-2019 and again on 1-20-2019).
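
One possible direction, sketched here rather than offered as a definitive answer: label each consecutive run of the same ID and summarise per run, so a repeated ID such as A gets one row per run. This assumes the rows are already ordered so that each run is contiguous, and that Group and Date are constant within a run. The column name `run` and the toy data frame are made up for illustration.

```r
library(dplyr)

# Toy data mirroring the example table (not the real 500,000-row data set)
data <- data.frame(
  ID     = c(rep("A", 6), rep("B", 2), rep("C", 5), rep("A", 4)),
  Number = c(1:6, 10:11, 12:16, 20:23),
  Group  = c(rep("K", 8), rep("J", 5), rep("K", 4)),
  Date   = c(rep("1-19-2019", 13), rep("1-20-2019", 4)),
  stringsAsFactors = FALSE
)

ranges <- data %>%
  # Assumption: rows are ordered so that each individual's run is contiguous.
  # `run` (hypothetical helper column) goes up by 1 whenever ID changes.
  mutate(run = cumsum(ID != lag(ID, default = first(ID)))) %>%
  # Grouping by Group and Date as well carries those columns into the result;
  # if they vary inside a run, that run is split further.
  group_by(run, ID, Group, Date) %>%
  summarise(Min = min(Number), Max = max(Number), .groups = "drop") %>%
  select(ID, Min, Max, Group, Date)

print(ranges)
```

On the toy data this gives four rows, in order of first appearance: A (1 to 6), B (10 to 11), C (12 to 16), and A again (20 to 23). If a new range should also start whenever the numbers stop being consecutive, the `run` definition would need an extra condition on `Number - lag(Number)`.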
