I am trying to go through each value in a data frame and, based on that value, extract information from another data frame. I have working code that uses nested for loops, but my datasets are large and the loops take far too long to run for that to be feasible.
To simplify, I will provide sample data with initially only one row:
> ind_1 <- data.frame("V01" = "pp", "V02" = "pq", "V03" = "pq")
> ind_1
# V01 V02 V03
#1 pp pq pq
I also have this data frame:
> stratum <- rep(c("A", "A", "B", "B", "C", "C"), 3)
> locus <- rep(c("V01", "V02", "V03"), each = 6)
> allele <- rep(c("p", "q"), 9)
> value <- rep(c(0.8, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.6), 2)
> df <- as.data.frame(cbind(stratum, locus, allele, value))
> head(df)
# stratum locus allele value
#1 A V01 p 0.8
#2 A V01 q 0.2
#3 B V01 p 0.6
#4 B V01 q 0.4
#5 C V01 p 0.3
#6 C V01 q 0.7
There are two allele values for each locus and three stratum values for each locus, so there are six different values for each locus. The column names of ind_1 correspond to the locus column in df. For each entry in ind_1, I want to return a list of values extracted from the value column in df, based on the locus (the column name in ind_1) and the data entry (pp or pq). For each entry in ind_1 there will be three returned values in the list, one for each stratum in df.
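To make the lookup concrete, here is a minimal base R sketch of the extraction step for a single locus. It rebuilds the sample table with data.frame() so that value stays numeric (an assumption on my part; the cbind() call above stores it as character), and p_V01 is just an illustrative name.

```r
# Sample lookup table, rebuilt with value kept numeric (assumption)
df <- data.frame(
  stratum = rep(c("A", "A", "B", "B", "C", "C"), 3),
  locus   = rep(c("V01", "V02", "V03"), each = 6),
  allele  = rep(c("p", "q"), 9),
  value   = rep(c(0.8, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.6), 2)
)

# Values for allele "p" at locus V01: one per stratum, in the order A, B, C
p_V01 <- df$value[df$locus == "V01" & df$allele == "p"]
p_V01
# 0.8 0.6 0.3
```

These three numbers are what would then be transformed depending on whether the entry in ind_1 is pp or pq.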
My attempted code is as follows:
library(dplyr)
library(magrittr)
pop.prob <- function(df, ind_1){
  p <- df %>%
    filter( locus == colnames(ind_1), allele == "p") %>%
    select(value) %>%
    unlist() %>%
    as.numeric()
  if( ind_1 == "pp") {
    prob <- (2 * p * (1-p))
    return(prob)
  } else if ( ind_1 == "pq") {
    prob <- (p^2)
    return(prob)
  }
}
test <- sapply(ind_1, function(x) {pop.prob(df, ind_1)} )
This code just provides null values. Ideally, I would have the following output:
> test
# $V01
# 0.32
# 0.48
# 0.42
#
# $V02
# 0.25
# 0.36
# 0.04
#
# $V03
# 0.16
# 0.49
# 0.25
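For reference, one possible loop-free direction is sketched below in base R. It is a rough sketch under stated assumptions, not a tuned solution for large data: value is stored as numeric, only the first row of ind_1 is used, the formulas are copied unchanged from pop.prob above, and pop_prob_one is a hypothetical helper name. The idea is to pass each column name together with its entry, instead of passing the whole ind_1 data frame into the function.

```r
# Sample data; value is kept numeric here (assumption)
ind_1 <- data.frame(V01 = "pp", V02 = "pq", V03 = "pq")
df <- data.frame(
  stratum = rep(c("A", "A", "B", "B", "C", "C"), 3),
  locus   = rep(c("V01", "V02", "V03"), each = 6),
  allele  = rep(c("p", "q"), 9),
  value   = rep(c(0.8, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.6), 2)
)

# Hypothetical helper: per-stratum values for one locus and one entry
pop_prob_one <- function(df, locus_name, genotype) {
  # Allele "p" values for this locus, one per stratum
  p <- df$value[df$locus == locus_name & df$allele == "p"]
  switch(genotype,
         pp = 2 * p * (1 - p),       # same formula as in pop.prob
         pq = p^2,                   # same formula as in pop.prob
         rep(NA_real_, length(p)))   # any other entry: NA per stratum
}

# Pair each column name with its entry in the first row of ind_1
test <- Map(function(locus_name, genotype) pop_prob_one(df, locus_name, genotype),
            names(ind_1), as.character(unlist(ind_1[1, ])))
test
```

With the sample data this returns a list named V01, V02 and V03, each holding three values (one per stratum). For an ind_1 with several rows, the same helper could presumably be applied row by row, though I have not checked how that performs on large data.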
I've been trying to figure out how to NOT use for loops in my code because it's not feasible for my data. Any help in figuring out how to do this for this simplified data set would be greatly appreciated. Once I do that I can work on applying this to a data frame such as ind_1 that has multiple rows.
Thank you all. Please let me know if the example data are not clear.