Thursday, 29 September 2016

R: nested for loop with multiple if statements to generate a new vector

I have 20 workers, each doing 100 tasks. I generated the true answer for each task, which is one of five possible answers, with:

answers <- c("liver", "blood", "lung", "brain", "heart")
truth <- sample(answers, no.tasks, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))

My dataSet contains the columns workerID, taskID and truth. Now I need to generate another vector that simulates what each worker will answer, based on a certain probability. For example, if the truth for task 1, worker 1 is "liver", I want worker 1 to answer "liver" for task 1 with high probability. The same should hold for each of the five answers across all 2000 rows (20 workers × 100 tasks). For that I am using the following for and if loops.

for (i in nrow(dataSet)) {
  if (dataSet$truth[i] == "liver") {
    df <- (rep(sample(answers, no.tasks, prob = c(0.9, 0.02, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "blood") {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.9, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "lung") {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.9, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "brain") {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.9, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "heart") {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.02, 0.9), no.workers)))
  } else {
    df <- (rep(sample(answers, no.tasks, prob = c(0.2, 0.2, 0.2, 0.2, 0.2), no.workers)))
  }
}

However, because the truth for task 1 is "brain", the output vector df consists mostly of "brain" answers. Can someone please hint at what is going wrong here?
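
A few things in the loop above look like likely causes, though this is only a reading of the posted code. `for (i in nrow(dataSet))` iterates over a single value (the last row index) rather than over every row; `seq_len(nrow(dataSet))` would visit each row. Each branch also overwrites `df` with `no.tasks` fresh draws instead of storing one answer for row `i`, so `df` only ever reflects one truth value. In addition, because `prob` is named, the trailing `no.workers` is matched positionally to the `replace` argument of `sample()`, and `rep()` with a single argument returns its input unchanged. The sketch below is one possible way to get one simulated answer per row. It assumes 20 workers and 100 tasks as described, an assumed layout for `dataSet`, and hypothetical names `simulate_answer`, `p.correct` and `workerAnswer`. It also replaces the if/else chain with a probability vector built from the truth, and splits the remaining 0.1 evenly across the wrong answers (0.025 each) rather than using 0.02.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

answers    <- c("liver", "blood", "lung", "brain", "heart")
no.workers <- 20   # taken from the description in the question
no.tasks   <- 100  # taken from the description in the question

# Assumed layout: one true answer per task, shared by all workers.
truth <- sample(answers, no.tasks, replace = TRUE)
dataSet <- data.frame(
  workerID = rep(seq_len(no.workers), each = no.tasks),
  taskID   = rep(seq_len(no.tasks), times = no.workers),
  truth    = rep(truth, times = no.workers),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw one answer given the true answer.
# The true answer gets probability p.correct; the other answers
# share the remaining 1 - p.correct equally.
simulate_answer <- function(true.answer, p.correct = 0.9) {
  probs <- ifelse(answers == true.answer,
                  p.correct,
                  (1 - p.correct) / (length(answers) - 1))
  sample(answers, size = 1, prob = probs)
}

# vapply calls the helper once per row, so every row gets its own draw.
dataSet$workerAnswer <- vapply(dataSet$truth, simulate_answer, character(1),
                               USE.NAMES = FALSE)

# Share of rows where the simulated answer matches the truth.
mean(dataSet$workerAnswer == dataSet$truth)
```

The last line gives the fraction of matching answers, which should land near `p.correct` if the simulation behaves as intended.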
