Sunday, 2 September 2018

R sampling with an if statement and a similar number of samples

I need to create a sample from my data frame, and to do so I am using the code below.

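    library(dplyr)
    library(tidyr)
    library(purrr)

    name <- sample(c("Adam","John","Henry","Mike"), 100, replace = TRUE)
    area <- sample(c("run","develop","test"), 100, replace = TRUE)
    id <- sample(100:200, 100, replace = FALSE)

    # data.frame() keeps id numeric; as.data.frame(cbind(...)) turns every column into text
    mydata <- data.frame(id, area, name)

    qcsample <- mydata %>%
      group_by(area) %>%
      nest() %>%
      ungroup() %>%                   # newer tidyr keeps the grouping after nest()
      mutate(n = c(20, 15, 15)) %>%   # sizes follow the row order of the nested areas
      mutate(samp = map2(data, n, sample_n)) %>%
      select(area, samp) %>%
      unnest(samp)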

Now, I am getting these results.

table(qcsample$area) 

develop     run    test 
     15      15      20 

--

table(qcsample$name)

Adam Henry  John  Mike 
    9     9    16    16 

I would like to create a sample that has more or less the same number of rows for each name, e.g. Adam - 12, Henry - 12, John - 13, Mike - 13. How can I achieve that? Can I somehow request that the sample be equally distributed across names?
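One way to picture the "equally distributed" requirement is a round-robin pick: shuffle the rows, then take the first shuffled row of every name, then the second of every name, and so on until the requested size is reached. The following is only a minimal base R sketch of that idea; `balanced_pick` and `demo_names` are hypothetical names introduced for illustration and are not part of dplyr or of the code above.

```r
# Hypothetical helper (not part of dplyr): return n row positions whose
# names are as evenly spread as the data allows.
balanced_pick <- function(names_vec, n) {
  shuffled <- sample(seq_along(names_vec))   # random order of row positions
  # 1 for the first shuffled row of each name, 2 for the second, and so on
  round_no <- ave(seq_along(shuffled), names_vec[shuffled], FUN = seq_along)
  # take every name's first row, then every name's second row, ...
  shuffled[order(round_no)][seq_len(min(n, length(names_vec)))]
}

set.seed(42)  # only to make the illustration repeatable
demo_names <- rep(c("Adam", "John", "Henry", "Mike"), each = 25)  # made-up data
picked <- balanced_pick(demo_names, 50)
table(demo_names[picked])  # two names get 12 rows and two get 13
```

The same helper could presumably be applied within each area, using that area's sample size as `n`; the names would then be balanced inside each area, and only approximately balanced over the whole sample.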

Also, in this example I used the function sample_n and specified the number of samples.

I anticipate that sometimes a given group will not contain the required number of rows. In my example I am taking 20 samples from the area called "test", but sometimes there will be only, say, 10 rows containing "test". The total has to stay at 50, so if there are only 10 "test" rows the code has to increase the other areas automatically, giving "test" - 10, "run" - 20 and "develop" - 20. This can happen to any of the areas, so I need to check whether each area has enough rows for its sample and, if not, increase the other areas. If the shortfall is only 1, it can be added to either of the remaining areas; if the shortfall is 3, we add 1 to one area and 2 to the other.

How could I check that while taking all the possibilities into account? I believe there are eight combinations in this case (each of the three areas either has enough rows or does not).
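The reallocation rule described above can be sketched in base R without enumerating every case. This is only a minimal sketch under assumptions: `allocate_quota` is a hypothetical helper, `available` is assumed to be a named vector with the number of rows per area, and `target` is the wished sample size per area with the same names in the same order. Each target is first capped at what is available, and the shortfall is then handed out one row at a time to the areas that still have spare rows, always choosing the area that has received the fewest extra rows so far.

```r
# Hypothetical helper: cap each target at the rows available, then spread
# the shortfall one row at a time over the areas that still have spare rows.
allocate_quota <- function(available, target) {
  stopifnot(identical(names(available), names(target)))
  quota <- pmin(target, available)        # never ask for more rows than exist
  added <- quota * 0                      # extra rows given to each area so far
  shortfall <- sum(target) - sum(quota)   # rows that must come from other areas
  while (shortfall > 0 && any(available > quota)) {
    open <- which(available > quota)      # areas that can still give a row
    pick <- open[which.min(added[open])]  # the one topped up least so far
    quota[pick] <- quota[pick] + 1
    added[pick] <- added[pick] + 1
    shortfall <- shortfall - 1
  }
  quota
}

# Made-up counts: only 10 "test" rows exist although 20 are wanted.
available <- c(test = 10, run = 45, develop = 45)
target    <- c(test = 20, run = 15, develop = 15)
quota <- allocate_quota(available, target)
quota  # test = 10, run = 20, develop = 20
```

The resulting vector could then replace the hard-coded `c(20, 15, 15)`, looked up by area name, for example `quota[as.character(area)]`. If the data hold fewer rows in total than the requested sample, the loop simply stops and the quota sums to less than the target.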

Thanks in advance, A.
