Thursday, May 2, 2019

Conditionally remove middle character of string

I figured this would be an easy search, but I haven't been able to find an answer. I have a column in one dataframe whose values have no zero between the first and last character (e.g. "A1"), while my other dataframe contains the same variable with a pointless zero in the middle (e.g. "A01"). I'd like to rbind the two dataframes, but these character values need to match first. I think I need an ifelse statement, because the column also contains other values whose second character is not a zero (e.g. "B10"), and those must stay unchanged.

Example Data

# Dataframe with no zeroes between the characters in column_A
set.seed(123)
df_nozero <- data.frame(column_A = c(rep("A1",5),rep("B10",5)), 
                        column_B = sample(0:100,10),stringsAsFactors = FALSE)
print(df_nozero)

   column_A column_B
1        A1       29
2        A1       78
3        A1       40
4        A1       86
5        A1       91
6       B10        4
7       B10       50
8       B10       83
9       B10       51
10      B10       42

# Dataframe with zeroes between the characters in column_A
set.seed(123)
df_zero <- data.frame(column_A =  c(rep("A01",5),rep("B10",5)),
                      column_B = sample(0:50,5), stringsAsFactors = FALSE)
print(df_zero)

   column_A column_B
1       A01       14
2       A01       39
3       A01       20
4       A01       42
5       A01       44
6       B10       14
7       B10       39
8       B10       20
9       B10       42
10      B10       44

Desired Output

   column_A column_B
1        A1       29
2        A1       78
3        A1       40
4        A1       86
5        A1       91
6       B10        4
7       B10       50
8       B10       83
9       B10       51
10      B10       42
11       A1       14
12       A1       39
13       A1       20
14       A1       42
15       A1       44
16      B10       14
17      B10       39
18      B10       20
19      B10       42
20      B10       44

Failed Attempts

df_corrected <- df_zero
df_corrected$column_A <- ifelse(substr(df_corrected$column_A,2,2)=="0","",df_corrected$column_A)
print(df_corrected)

   column_A column_B
1                 14
2                 39
3                 20
4                 42
5                 44
6       B10       14
7       B10       39
8       B10       20
9       B10       42
10      B10       44

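# Start again from the original data, since the first attempt blanked the "A01" rows
df_corrected <- df_zero
df_corrected$column_A <- ifelse(substr(df_corrected$column_A,2,2)=="0",substr(df_corrected$column_A,1,3),df_corrected$column_A)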
print(df_corrected)

   column_A column_B
1       A01       14
2       A01       39
3       A01       20
4       A01       42
5       A01       44
6       B10       14
7       B10       39
8       B10       20
9       B10       42
10      B10       44

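The second attempt changes nothing because substr(x, 1, 3) returns characters 1 through 3, which is the whole string. If there were a way to select only the first and third characters of column_A, I could use that as the replacement for the values that have a zero in the second position.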
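
One way to pick out just the first and third characters is to call substr() twice and glue the pieces together with paste0(). The sketch below is only an illustration under some assumptions: the helper name drop_middle_zero is hypothetical (it is not from the post), it assumes the padded values are exactly three characters long, and the column_B values are placeholders rather than the sampled numbers shown above.

```r
# Hypothetical helper: drop a "0" that sits between exactly one leading
# character and one trailing character ("A01" -> "A1"); leave the rest alone.
drop_middle_zero <- function(x) {
  is_padded <- nchar(x) == 3 & substr(x, 2, 2) == "0"
  # Keep character 1 and character 3 only where the middle character is "0"
  ifelse(is_padded, paste0(substr(x, 1, 1), substr(x, 3, 3)), x)
}

# Placeholder data with the same column_A values as in the question
df_nozero <- data.frame(column_A = c(rep("A1", 5), rep("B10", 5)),
                        column_B = 1:10, stringsAsFactors = FALSE)
df_zero <- data.frame(column_A = c(rep("A01", 5), rep("B10", 5)),
                      column_B = 11:20, stringsAsFactors = FALSE)

df_corrected <- df_zero
df_corrected$column_A <- drop_middle_zero(df_corrected$column_A)

# Regex alternative: match one character, a literal "0", one character
regex_result <- sub("^(.)0(.)$", "\\1\\2", df_zero$column_A)

df_all <- rbind(df_nozero, df_corrected)
unique(df_all$column_A)
# [1] "A1"  "B10"
```

The regular-expression version avoids ifelse altogether, because sub() leaves any string that does not match the pattern untouched. "B10" does not match since its second character is "1", not "0".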
