My data looks something like this:
A toberevised
8: <NA>
9: <NA>
10: Number of returns
11: Number of joint returns
12: Number with paid preparer's signature
13: Number of exemptions
14: Adjusted gross income (AGI) [3]
14: Adjusted gross income (AGI) [3]
**15: Salaries and wages in AGI: [4] Number
16: Amount
17: Taxable interest: Number
18: Amount
19: Ordinary dividends: Number
20: Amount**
21: <NA>
22: <NA>
23: Number of returns
24: Number of joint returns
25: Number with paid preparer's signature
26: Number of exemptions
DF <- structure(list(toberevised = c("[Money amounts are in thousands of dollars]",
NA, NA, NA, "Item", NA, NA, NA, NA, "Number of returns", "Number of joint returns",
"Number with paid preparer's signature", "Number of exemptions",
"Adjusted gross income (AGI) [3]", "Salaries and wages in AGI: [4] Number",
"Amount", "Taxable interest: Number", "Amount", "Ordinary dividends: Number",
"Amount")), row.names = c(NA, -20L), class = c("data.table",
"data.frame"))
I would like to write a piece of code that copies the part before : in lines 15, 17 and 19 before Amount in the other lines, so:
A toberevised
8: <NA>
9: <NA>
10: Number of returns
11: Number of joint returns
12: Number with paid preparer's signature
13: Number of exemptions
14: Adjusted gross income (AGI) [3]
**15: Salaries and wages in AGI: [4] Number
16: Salaries and wages in AGI: Amount
17: Taxable interest: Number
18: Taxable interest: Amount
19: Ordinary dividends: Number
20: Ordinary dividends: Amount**
21: <NA>
22: <NA>
23: Number of returns
24: Number of joint returns
25: Number with paid preparer's signature
26: Number of exemptions
I tried some pretty unwieldy solutions like copying the cells which have : to a new column, filling in that column and then trying to remove Number from that column after which I can concatenate the columns, after which I have to remove all the debree.
DF <- setDT(DF)[grepl(":", DF$toberevised), type:=toberevised]
DF$type <- na.locf(DF$type, na.rm=FALSE)
DF$type <- gsub("[[:punct:]]*Number[[:punct:]]*", "", DF$type)
DF$fullname <- paste(DF$type,DF$toberevised)
Apart from the fact that it does not work it is also a bit cumbersome.
What would be a better way of doing this? I was thinking about something that checks if one cell has : Number and the cell below has Amount paste the substring before : before the string below. But I have no idea how to write something like that..
Aucun commentaire:
Enregistrer un commentaire