I am looking to select a subset of my data based on two conditions.
First, here is my data:
Gene AreaID Label
DNAJC12 rs1111111 unlikely
HERC4 rs1111111 unlikely
RP11-57G10.8 rs2222222 possible
RPL12P8 rs1111111 unlikely
SIRT1 rs3333333 certain
RP11-57G10.8 rs3333333 possible
RPL12P8 rs3333333 unlikely
SIRT1 rs3333333 unlikely
I want to subset this to the genes that have an 'unlikely' label and share the same AreaID. However, that AreaID must not also appear for any gene with a different label.
So for example my output would only select this:
Gene AreaID Label
DNAJC12 rs1111111 unlikely
HERC4 rs1111111 unlikely
RPL12P8 rs1111111 unlikely
It should not include the rs3333333 AreaID, which has duplicate 'unlikely' rows but also has genes with other labels.
I have tried the following, based on reading similar questions on here, but it does not seem to work:
loci <- read.csv('dataset.csv')
sub_list <- lapply(1:length(loci), function(i) loci %>% filter(loci$AreaID==duplicated(loci) & loci$Label =='unlikely'))
do.call(rbind, sub_list)
I have also tried:
prediction_snps = loci$AreaID[loci$label == 'unlikely']
result = loci[prediction_snps, ]
I am not sure how else to approach this, as I am currently new to R.
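For reference, here is a minimal sketch of one way the two conditions could be expressed in base R. It is only an illustration under some assumptions: the data is rebuilt inline with `read.table` rather than read from `dataset.csv`, "same area ID" is taken to mean an AreaID that occurs on more than one row, and the helper names `all_unlikely` and `group_size` are made up for this example.

```r
# Example data copied from the question (inline instead of dataset.csv)
loci <- read.table(text = "
Gene AreaID Label
DNAJC12 rs1111111 unlikely
HERC4 rs1111111 unlikely
RP11-57G10.8 rs2222222 possible
RPL12P8 rs1111111 unlikely
SIRT1 rs3333333 certain
RP11-57G10.8 rs3333333 possible
RPL12P8 rs3333333 unlikely
SIRT1 rs3333333 unlikely
", header = TRUE, stringsAsFactors = FALSE)

# Per row: TRUE only if every row sharing this AreaID is labelled 'unlikely'
all_unlikely <- ave(loci$Label == "unlikely", loci$AreaID, FUN = all)

# Per row: number of rows sharing this AreaID
# (assumption: "same area ID" means the ID appears more than once)
group_size <- ave(seq_along(loci$AreaID), loci$AreaID, FUN = length)

# Keep rows whose AreaID is shared and carries no label other than 'unlikely'
result <- loci[all_unlikely & group_size > 1, ]
print(result)
```

The key idea is to test the condition per AreaID group rather than per row, so a single 'certain' or 'possible' row disqualifies the whole group. With dplyr, the same grouping logic could probably be written as `group_by(AreaID)` followed by `filter(all(Label == "unlikely"), n() > 1)`.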