Sunday, 1 July 2018

Creating a new variable that indicates a specific condition using two preexisting variables in a dataframe.

I have an individual-level dataset with demographic information for each person. Each row also carries a household id, along with other variables:

id     if_adult (>18 yrs old)     marital_status
1          1                       Single
1          1                       Single
2          1                       Married
2          1                       Married
2          0                       Married

Each household has either one adult who is single, or two adults who are either married or single. Some households also have children. I am trying to create a dummy variable called "unmarried couple" that flags households with exactly two single adults. Because several rows share the same household id, I want every row of a household to be labeled consistently. Currently, the code I have is:

individual_data$`unmarried couple` <- ifelse(
  (individual_data$if_adult == "1" & individual_data$id == individual_data$id) &
    individual_data$marital_status == "Single",
  "1", "0")

But this incorrectly categorizes single-adult households (i.e. single moms and single dads with children) as unmarried couples. Fixing this is the key to making the variable accurate. To rectify the issue, I am attempting to create a new variable that indicates the total number of adults per household:

id     if_adult (>18 yrs old)     marital_status   total_adults
1          1                       Single          2
1          1                       Single          2
2          1                       Married         2
2          1                       Married         2
2          0                       Married         2
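
One possible way to build such a per-household count in base R is `ave()`, which applies a function within groups and returns a vector aligned with the original rows. The following is only a minimal sketch: it assumes the columns are named `id`, `if_adult` and `marital_status` as in the tables above, and that `if_adult` is stored as numeric 0/1 (the code above compares it to the string "1", so a conversion may be needed first).

```r
# Toy data mirroring the example table above
individual_data <- data.frame(
  id = c(1, 1, 2, 2, 2),
  if_adult = c(1, 1, 1, 1, 0),  # assumed numeric 0/1
  marital_status = c("Single", "Single", "Married", "Married", "Married"),
  stringsAsFactors = FALSE
)

# Sum the adult indicator within each household id; ave() repeats the
# household total on every row belonging to that household
individual_data$total_adults <- ave(
  individual_data$if_adult,
  individual_data$id,
  FUN = sum
)

print(individual_data)
```

Because the grouping is done by `ave()` itself, no comparison such as `id == id` is needed to keep rows of the same household together.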

Then I would create my desired variable by excluding the single-adult households, i.e. by requiring exactly two adults:

individual_data$`unmarried couple` <- ifelse(
  (individual_data$total_adults == 2 & individual_data$id == individual_data$id) &
    individual_data$marital_status == "Single",
  "1", "0")

Ultimately, I want the result to look like this, both for these rows and for the rest of the data:

id     if_adult     marital_status   total_adults  unmarried couple  
1          1           Single          2             1
1          1           Single          2             1
2          1           Married         2             0    
2          1           Married         2             0
2          0           Married         2             0
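
A sketch of how this target could be produced, under the same assumptions about column names and a numeric 0/1 `if_adult`: count adults and single adults per household, then flag households where both counts equal 2. Household 3 below is a hypothetical single-parent household added only to illustrate the problem case; it is not part of the data shown above. The flag is stored as an integer here rather than the "1"/"0" strings used in the code above.

```r
# Example rows from the tables above, plus a hypothetical household 3
# (one single adult with one child) to illustrate the problem case
individual_data <- data.frame(
  id = c(1, 1, 2, 2, 2, 3, 3),
  if_adult = c(1, 1, 1, 1, 0, 1, 0),  # assumed numeric 0/1
  marital_status = c("Single", "Single", "Married", "Married", "Married",
                     "Single", "Single"),
  stringsAsFactors = FALSE
)

# Row-level indicator: 1 if this person is an adult and single
single_adult <- as.integer(
  individual_data$if_adult == 1 & individual_data$marital_status == "Single"
)

# Household-level counts, repeated on every row of the household
individual_data$total_adults <- ave(individual_data$if_adult,
                                    individual_data$id, FUN = sum)
single_adults <- ave(single_adult, individual_data$id, FUN = sum)

# Flag households with exactly two adults, both of them single
individual_data$`unmarried couple` <- as.integer(
  individual_data$total_adults == 2 & single_adults == 2
)

print(individual_data)
```

Counting single adults at the household level, rather than testing each row's own `marital_status`, is what keeps the single-parent household at 0 and gives children the same label as the adults in their household.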

Thanks in advance for any feedback and suggestions.
