I have the following problem: I need to process each subset of a data frame (the subsets are defined by the value of one variable) and create new entries for another variable depending on two conditions.
The data frame (dt3) has 4 variables: birth_year, last name (Name), role in the household (role) and household (hh). The whole set is divided by the hh variable, which groups all the individuals who belong to the same household. For instance, in my example below, the first 4 rows belong to household "1". Under the variable role, only the head of the household is stated. The other roles are empty and must be derived, and this is what I'm trying to do.
My first step is to assign the role "children". I was thinking of doing it by looping over the whole data set and over each subset (each hh value). Whenever a row has a person with the same last name as the head of the household and a birth year at least 15 years later than the head's, that person is inferred to be "children".
The original dataframe is:
birth_year  Name      role               hh
1877        Snijders  Head of household  1
1885        Marteen   NA                 1
1897        Snijders  NA                 1
1892        Zelstra   NA                 1
1878        Kuipers   Head of household  2
1870        Marteen   NA                 2
1897        Wals      NA                 2
1900        Venstra   NA                 2
1900        Lippe     Head of household  3
1905        Flachs    NA                 3
1920        Lippe     NA                 3
1922        Lippe     NA                 3
So I need to go through the whole set, household by household, and check the following two conditions for each person:
a. the person's name == the name of the head, and
b. the person's birth year is at least 15 years later than the head's.
If both hold, this person is "children".
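For illustration, here is a minimal sketch of how the two conditions could be applied per household with dplyr's group_by() and mutate(), without an explicit loop. It assumes that the head is always the first row of each household (as stated in the post) and that the empty roles are real NA values. The helper columns head_name and head_year and the object name result are illustrative only.

```r
library(dplyr)

# Example data copied from the table above
dt3 <- data.frame(
  birth_year = c(1877, 1885, 1897, 1892,
                 1878, 1870, 1897, 1900,
                 1900, 1905, 1920, 1922),
  Name = c("Snijders", "Marteen", "Snijders", "Zelstra",
           "Kuipers", "Marteen", "Wals", "Venstra",
           "Lippe", "Flachs", "Lippe", "Lippe"),
  role = rep(c("Head of household", NA, NA, NA), times = 3),
  hh = rep(1:3, each = 4),
  stringsAsFactors = FALSE
)

result <- dt3 %>%
  group_by(hh) %>%
  mutate(
    # Assumption: the first row of each household is the head
    head_name = first(Name),
    head_year = first(birth_year),
    role = case_when(
      !is.na(role) ~ role,  # keep roles that are already filled in
      Name == head_name & birth_year - head_year >= 15 ~ "children",
      TRUE ~ NA_character_  # everything else stays undetermined
    )
  ) %>%
  ungroup() %>%
  select(-head_name, -head_year)

print(as.data.frame(result))
```

Because the data are grouped by hh, first() is evaluated within each household, so every row is compared with its own head rather than with the first row of the whole data set.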
So far I've tried several things. Since I place the head in the first row of each household, I was doing the following:
a) A nested loop, where I try to iterate over the data set and then over each hh. For each hh I check the conditions by comparing each row's name and birth year with those of the first row of the hh (the head):
for (n in 1:unique(dt3$hh)){
  for (i in 1:length(which(dt3$hh==n)) ){
    mutate(dt3, role = ifelse( dt3$Name[[1,2]] == dt3$Name[[n,1]]
      & dt3$birth_year[[n,i]] > dt3$birth_year[[n,1]], "children","NoA"))
  }
}
b) I have also tried to do the same with lists. I first split dt3 by the hh variable:
dt3 <- split(dt3, f = dt3$hh)
And then
for (n in 1:dt3){
  mutate(dt3, role = ifelse( dt3$name [[n,i]] == dt3$name[[n,1]] &
    dt3$birth_year[[n,i]] > dt3$birth_year[[n,1]],"children","NoA"))
}
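For comparison, the split-based idea could be sketched in base R as follows. This is only one possible approach, again assuming the head is the first row of each household. The function name assign_children and the object names households and result are hypothetical, and the original dt3 is kept as a data frame instead of being overwritten by the list.

```r
# Example data copied from the table in the post
dt3 <- data.frame(
  birth_year = c(1877, 1885, 1897, 1892,
                 1878, 1870, 1897, 1900,
                 1900, 1905, 1920, 1922),
  Name = c("Snijders", "Marteen", "Snijders", "Zelstra",
           "Kuipers", "Marteen", "Wals", "Venstra",
           "Lippe", "Flachs", "Lippe", "Lippe"),
  role = rep(c("Head of household", NA, NA, NA), times = 3),
  hh = rep(1:3, each = 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: labels the children within ONE household
assign_children <- function(household) {
  head_row <- household[1, ]  # assumption: the head is the first row
  is_child <- is.na(household$role) &
    household$Name == head_row$Name &
    household$birth_year - head_row$birth_year >= 15
  household$role[is_child] <- "children"
  household
}

households <- split(dt3, dt3$hh)                   # list with one data frame per hh
households <- lapply(households, assign_children)  # apply the rule to each household
result <- unsplit(households, dt3$hh)              # reassemble in the original row order

print(result)
```

The comparisons inside assign_children are vectorised, so no inner loop over rows is needed; lapply() takes the place of the outer loop over households.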
What I was expecting is an outcome like this:
birth_year  Name      role               hh
1877        Snijders  Head of household  1
1885        Marteen   NA                 1
1897        Snijders  children           1
1892        Zelstra   NA                 1
1878        Kuipers   Head of household  2
1870        Marteen   NA                 2
1897        Wals      NA                 2
1900        Venstra   NA                 2
1900        Lippe     Head of household  3
1905        Flachs    NA                 3
1920        Lippe     children           3
1922        Lippe     children           3
Any tips will be welcome.
Thank you in advance.