jeudi 21 mai 2020

Copying a substring to a string below, conditional on the contents of both strings

My data looks something like this:

A         toberevised
 8:                                        <NA>
 9:                                        <NA>
10:                           Number of returns
11:                     Number of joint returns
12:       Number with paid preparer's signature
13:                        Number of exemptions
14:             Adjusted gross income (AGI) [3]
14:             Adjusted gross income (AGI) [3]
**15:       Salaries and wages in AGI: [4] Number
16:                                      Amount
17:                   Taxable interest:  Number
18:                                      Amount
19:                 Ordinary dividends:  Number
20:                                      Amount**
21:                                        <NA>
22:                                        <NA>
23:                           Number of returns
24:                     Number of joint returns
25:       Number with paid preparer's signature
26:                        Number of exemptions

DF <- structure(list(toberevised = c("[Money amounts are in thousands of dollars]", 
NA, NA, NA, "Item", NA, NA, NA, NA, "Number of returns", "Number of joint returns", 
"Number with paid preparer's signature", "Number of exemptions", 
"Adjusted gross income (AGI) [3]", "Salaries and wages in AGI: [4] Number", 
"Amount", "Taxable interest:  Number", "Amount", "Ordinary dividends:  Number", 
"Amount")), row.names = c(NA, -20L), class = c("data.table", 
"data.frame"))

I would like to write a piece of code that copies the part before : in lines 15, 17 and 19 before Amount in the other lines, so:

 A        toberevised
 8:                                        <NA>
 9:                                        <NA>
10:                           Number of returns
11:                     Number of joint returns
12:       Number with paid preparer's signature
13:                        Number of exemptions
14:             Adjusted gross income (AGI) [3]
**15:       Salaries and wages in AGI: [4] Number
16:           Salaries and wages in AGI: Amount
17:                   Taxable interest:  Number
18:                    Taxable interest: Amount
19:                 Ordinary dividends:  Number
20:                Ordinary dividends:   Amount**
21:                                        <NA>
22:                                        <NA>
23:                           Number of returns
24:                     Number of joint returns
25:       Number with paid preparer's signature
26:                        Number of exemptions

I tried some pretty unwieldy solutions like copying the cells which have : to a new column, filling in that column and then trying to remove Number from that column after which I can concatenate the columns, after which I have to remove all the debree.

DF <- setDT(DF)[grepl(":", DF$toberevised), type:=toberevised]
DF$type <- na.locf(DF$type, na.rm=FALSE)
DF$type <- gsub("[[:punct:]]*Number[[:punct:]]*", "", DF$type)
DF$fullname <- paste(DF$type,DF$toberevised)

Apart from the fact that it does not work it is also a bit cumbersome.

What would be a better way of doing this? I was thinking about something that checks if one cell has : Number and the cell below has Amount paste the substring before : before the string below. But I have no idea how to write something like that..

Aucun commentaire:

Enregistrer un commentaire