Thursday, March 8, 2018

R: replacing NAs by referencing the parent item's value

I am trying to replicate an Excel formula that fills in missing Topic information using INDEX and MATCH on two columns.

The columns that are relevant are Topic (factor), Post.ID (int) and Parent.ID (int).

Every row has a Post.ID. Parent.ID is populated only if the post is a child of another post, in which case it equals the parent's Post.ID. The hierarchy is only ever one level deep. I am trying to add a Topic to each child by matching its Parent.ID against Post.ID and taking the parent's Topic.

Here is the Excel formula:

=IF([Topic]=0,IFERROR(IF([Parent ID]=0,[Topic],(INDEX([Topic]$[StartCell]:[Topic]$[EndCell], MATCH([Parent ID],[Post ID]$[StartCell]:[Post ID]$[EndCell],0)))),[Topic]),[Topic])

I've tried the following ifelse call, and it works, except that it converts the factors to ints.

ifelse(is.na(df$Topic), 
         df$Topic[df$Post.id %in% df$Parent.id], 
         df$Topic)

I tried converting the exact same thing to if_else, but I get the following error:

Evaluation error: `true` must be length 1995 (length of `condition`) or one, not 429.
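
The length mismatch can be reproduced on a small example. This is only a sketch on made-up data: the column names follow the code above, but the values are hypothetical. It suggests that %in% returns a logical mask, so subsetting Topic with it keeps only the topics of posts that are referenced as a parent, which is shorter than the full column.

```r
# Hypothetical toy data: posts 1 and 2 are parents, posts 3-5 are children.
df <- data.frame(
  Post.id   = 1:5,
  Parent.id = c(NA, NA, 1L, 2L, 1L),
  Topic     = factor(c("cars", "food", NA, NA, NA))
)

# Logical mask: TRUE where a post is referenced as some row's parent.
is_parent <- df$Post.id %in% df$Parent.id

# Subsetting by the mask keeps one value per parent post,
# not one value per row, so it cannot line up with the condition.
parent_topics <- df$Topic[is_parent]
length(parent_topics)  # shorter than nrow(df)
```

If that is what happens on the real data, ifelse silently recycles the shorter vector while if_else refuses to, which would explain why only the second one complains.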

I also tried creating a separate dataframe of the distinct Topics and Post ID, renaming the post ID and then doing a left_join, but that is really clunky and I know there must be a cleaner way to do it.
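
For reference, the closest base R analogue of INDEX/MATCH that I am aware of is match(). Below is a minimal sketch of how that lookup might look, assuming every Post.id is unique and never NA, and using hypothetical toy data rather than my real data frame. It is one possible approach, not a tested solution for the full data.

```r
# Hypothetical toy data: posts 1 and 2 are parents, posts 3-5 are children.
df <- data.frame(
  Post.id   = 1:5,
  Parent.id = c(NA, NA, 1L, 2L, 1L),
  Topic     = factor(c("cars", "food", NA, NA, NA))
)

# For each row, the row position of its parent post (NA when there is none).
parent_row <- match(df$Parent.id, df$Post.id)

# Fill only rows whose Topic is missing and whose parent was found.
fill <- is.na(df$Topic) & !is.na(parent_row)
df$Topic[fill] <- df$Topic[parent_row[fill]]

df$Topic
```

Because the assignment goes through factor subsetting rather than ifelse, Topic stays a factor in this sketch instead of being converted to integer codes.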
