if-statement: R using if statements in apply instead of for loop

Thursday, March 1, 2018


I am trying to go through each value in a data frame and, based on that value, extract information from another data frame. I have working code that uses nested for loops, but my datasets are large and the loops take far too long to be feasible.

To simplify, I will provide sample data with initially only one row:

> ind_1 <- data.frame("V01" = "pp", "V02" = "pq", "V03" = "pq")
> ind_1
#  V01 V02 V03
#1 pp  pq  pq

I also have this data frame:

> stratum <- rep(c("A", "A", "B", "B", "C", "C"), 3)
> locus <- rep(c("V01", "V02", "V03"), each = 6)
> allele <- rep(c("p", "q"), 9)
> value <- rep(c(0.8, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.6), 2)
> df <- data.frame(stratum, locus, allele, value, stringsAsFactors = FALSE)  # keeps value numeric; cbind() would coerce every column to character
> head(df)
#   stratum locus allele value
#1        A   V01      p   0.8
#2        A   V01      q   0.2
#3        B   V01      p   0.6
#4        B   V01      q   0.4
#5        C   V01      p   0.3
#6        C   V01      q   0.7

There are two alleles at each locus and three strata, so each locus has six rows in df. The column names of ind_1 correspond to the locus column in df. For each entry in ind_1, I want to return a list of values extracted from the value column in df, based on the locus (the column name in ind_1) and the entry itself (pp or pq). Each entry in ind_1 should yield three values, one for each stratum in df.
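The lookup described above can be sketched in base R as follows. This is only an illustration: the helper name `p_by_stratum` is hypothetical, and the sample data are rebuilt with `data.frame()` so that `value` is numeric.

```r
# Rebuild the sample data; data.frame() keeps value numeric.
df <- data.frame(
  stratum = rep(c("A", "A", "B", "B", "C", "C"), 3),
  locus   = rep(c("V01", "V02", "V03"), each = 6),
  allele  = rep(c("p", "q"), 9),
  value   = rep(c(0.8, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.6), 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the value of allele "p" in each stratum for one locus.
p_by_stratum <- function(df, locus_name) {
  rows <- df$locus == locus_name & df$allele == "p"
  setNames(df$value[rows], df$stratum[rows])
}

# One value per stratum (A, B, C) for locus V01.
p_by_stratum(df, "V01")
```

Each call returns three numbers, one per stratum, which is the input the per-genotype formula needs.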

My attempted code is as follows:

library(dplyr)
library(magrittr)
pop.prob <- function(df, ind_1){
  p <-  df %>%
    filter( locus == colnames(ind_1), allele == "p") %>%
    select(value) %>%
    unlist() %>%
    as.numeric()
  if( ind_1 == "pp") {
    prob <- (2 * p * (1-p))
    return(prob)
  } else if ( ind_1 == "pq") {
    prob <- (p^2)
    return(prob)
  } 
}
test <- sapply(ind_1, function(x) {pop.prob(df, ind_1)} )

This code just provides null values. Ideally, I would have the following output:

> test
# $V01
# 0.32
# 0.48
# 0.42
#
# $V02
# 0.25
# 0.36
# 0.04
#
# $V03
# 0.16
# 0.49
# 0.25
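One possible way to get this output, offered as a sketch rather than a definitive fix: in the attempt, the anonymous function ignores `x` and passes the whole `ind_1`, so `pop.prob` never sees a single column name or a single genotype. Iterating over the column names and the entries together with `Map()` avoids that. The function name `pop_prob` and its arguments are hypothetical, the formulas are copied unchanged from the attempt, and returning `NA` for any other genotype is an assumption.

```r
# Sample data, rebuilt so the block runs on its own (value is numeric).
df <- data.frame(
  stratum = rep(c("A", "A", "B", "B", "C", "C"), 3),
  locus   = rep(c("V01", "V02", "V03"), each = 6),
  allele  = rep(c("p", "q"), 9),
  value   = rep(c(0.8, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.6), 2),
  stringsAsFactors = FALSE
)
ind_1 <- data.frame(V01 = "pp", V02 = "pq", V03 = "pq",
                    stringsAsFactors = FALSE)

# Hypothetical rewrite of pop.prob: takes ONE locus name and ONE genotype.
pop_prob <- function(df, locus_name, genotype) {
  # Values of allele "p" for this locus, one per stratum.
  p <- df$value[df$locus == locus_name & df$allele == "p"]
  if (genotype == "pp") {
    2 * p * (1 - p)           # same formula as in the original attempt
  } else if (genotype == "pq") {
    p^2                       # same formula as in the original attempt
  } else {
    rep(NA_real_, length(p))  # assumption: unknown genotypes give NA
  }
}

# Walk over column names and genotypes in parallel; Map() returns a
# list named after the loci.
test <- Map(function(locus_name, genotype) pop_prob(df, locus_name, genotype),
            names(ind_1), unlist(ind_1[1, ]))
test
```

`Map()` is used instead of `sapply()` because `sapply()` over a data frame hands the function only the column values, not the column names that the lookup depends on.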

I've been trying to figure out how to avoid for loops because they are not feasible for my data. Any help with this simplified data set would be greatly appreciated. Once that works, I can apply it to a data frame like ind_1 with multiple rows.

Thank you all. Please let me know if the example data are not clear.
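For the multi-row case mentioned above, one option is to compute both formulas once for every stratum and locus and then pick columns per individual. The following is a minimal sketch under stated assumptions: the two-row data frame `ind` is invented for illustration, `prob_for_row` is a hypothetical helper, and any entry other than "pp" or "pq" is assumed to yield `NA`.

```r
df <- data.frame(
  stratum = rep(c("A", "A", "B", "B", "C", "C"), 3),
  locus   = rep(c("V01", "V02", "V03"), each = 6),
  allele  = rep(c("p", "q"), 9),
  value   = rep(c(0.8, 0.2, 0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.6), 2),
  stringsAsFactors = FALSE
)
# Hypothetical multi-row input; row 1 is ind_1 from the question.
ind <- data.frame(V01 = c("pp", "pq"), V02 = c("pq", "pp"),
                  V03 = c("pq", "pq"), stringsAsFactors = FALSE)

# Strata x loci matrix of the allele "p" values (one value per cell).
p_only <- df[df$allele == "p", ]
p_mat  <- tapply(p_only$value, list(p_only$stratum, p_only$locus),
                 function(v) v[1])

# Apply both formulas from the question to the whole matrix once.
pp_mat <- 2 * p_mat * (1 - p_mat)
pq_mat <- p_mat^2

# Hypothetical helper: choose, locus by locus, the column that matches
# the individual's genotype.
prob_for_row <- function(genotypes) {
  out <- pq_mat
  is_pp <- genotypes == "pp"
  out[, is_pp] <- pp_mat[, is_pp]
  out[, !(genotypes %in% c("pp", "pq"))] <- NA  # assumption: others give NA
  out
}

# One strata x loci matrix per individual.
result <- lapply(seq_len(nrow(ind)),
                 function(i) prob_for_row(unlist(ind[i, colnames(p_mat)])))
result[[1]]
```

This still iterates over individuals with `lapply()`, but the filtering and arithmetic happen once up front, so each row only involves column selection.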
