I have a large data frame with names and a "classifying" variable named sequence. sequence indicates each row's position relative to the other rows and takes two values: first and additional. The problem is that these values are not evenly distributed, i.e., there is not exactly one additional for every first (a first may be followed by zero, one, or several additional rows). Every letters value is unique. The data frame looks like this (simplified version):
letters <- sample(LETTERS, 20)
sequence <- c("first","additional","first","first","first","first","first","additional","additional","additional","first","first","additional","first","additional","additional","first","additional","first","first")
df <- data.frame(sequence, letters)
Now, what I want to do is take every additional value in letters and paste it onto its corresponding first value in letters. So, for example, the value in the second row of the letters column would be pasted onto the value in the first row, since it's the corresponding additional. Likewise, the eighth, ninth and tenth values in letters should be pasted next to the seventh value of letters (e.g., first; additional; additional; additional).
I've tried the following, with the obvious limitation that it only looks at the immediately preceding value,
library(dplyr)
df <- df %>% mutate(letters_ok = if_else(sequence == "additional",
paste(letters, lag(letters), sep = "; "), letters))
highlighting my problem: How do I manage to lag conditionally on the values in sequence, so that I can paste the values in letters according to the first or additional classification?
Since every letters value is unique and tied to a specific sequence value, I didn't use group_by. Every other solution eludes my current knowledge of string/character wrangling, so I would very much appreciate any help.
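One possible approach, offered as a minimal sketch rather than a definitive answer: build a running group id with cumsum(sequence == "first"), so that each first row and the additional rows that follow it share the same id, then paste within each group. The names group_id and result and the small fixed data frame below are made up for illustration (the original sample is random). The sketch assumes the data starts with a first row and that every additional row belongs to the nearest first above it.

```r
library(dplyr)

# Small deterministic example (hypothetical data, not the random sample above)
df <- data.frame(
  sequence = c("first", "additional", "first", "first", "additional", "additional"),
  letters  = c("A", "B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Each "first" increments the counter, so it starts a new group;
  # the "additional" rows after it inherit the same group_id
  mutate(group_id = cumsum(sequence == "first")) %>%
  group_by(group_id) %>%
  # Collapse all letters of the group into a single string
  mutate(letters_ok = paste(letters, collapse = "; ")) %>%
  ungroup() %>%
  # Keep only the "first" rows, which now carry their additional values
  filter(sequence == "first")

print(result$letters_ok)
```

If one row per group is all that is needed, summarise() could replace the second mutate() and the filter(); the version above keeps the original columns alongside letters_ok.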