A sample of my dataset: https://i.stack.imgur.com/AYLAg.png
I want to create a new variable called nr-axspa (nr_axspa in the code). It encodes a diagnosis, a subform of spondyloarthritis, which is a rheumatic disease. The diagnosis can be inferred from two sets of classification criteria: the ASAS criteria and the New York criteria.
If a patient has 1 for ASAS but 0 for New York, he or she has nr-axspa; otherwise not (in that case it is r-axspa). I coded everyone with nr-axspa as "1", everyone without nr-axspa as "0", and those who have ASAS 0 and New York 0 as "2". This is the code I used:
library(dplyr)  # provides mutate() and if_else()
df_nr_axspa <- mutate(df, nr_axspa = if_else(asas_criteria == 0 & new_york_criteria == 0, 2,
                                     if_else(asas_criteria == 1 & new_york_criteria == 0, 1, 0)))
Interestingly enough, when I look at summary(df_nr_axspa$nr_axspa), I find that there are 1596 patients with a coded diagnosis. However, I would have expected only 1434 cases.
When I create a 2x2 table of ASAS criteria and New York criteria, it gives me these numbers:
<table>
<tbody>
<tr>
<td> </td>
<td>New York</td>
<td> </td>
</tr>
<tr>
<td>ASAS</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>20</td>
<td>50</td>
</tr>
<tr>
<td>1</td>
<td>372</td>
<td>992</td>
</tr>
</tbody>
</table>
So according to this table, there should be 20 patients without a diagnosis (group "2"), 372 patients with diagnosis "1" (nr-axspa) and 1042 patients with "0" (r-axspa).
However, the newly coded variable has a frequency of 372 for "1" and 20 for "2", but 1204 for "0". So groups "1" and "2" have been classified correctly, but group "0" suddenly has a surplus of 162 patients.
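To double-check the arithmetic, here is a small sketch that rebuilds the counts from the 2x2 table and compares them with the frequencies of the recoded variable. The object names (tab, observed, and so on) are only illustrative; the numbers are the ones quoted above. Both differences come out at 162.

```r
# Counts copied from the 2x2 table above (rows: ASAS, columns: New York)
tab <- matrix(c(20, 372, 50, 992), nrow = 2,
              dimnames = list(ASAS = c("0", "1"), New_York = c("0", "1")))

# Frequencies reported for the recoded variable
observed <- c("0" = 1204, "1" = 372, "2" = 20)

expected_total <- sum(tab)         # patients counted in the 2x2 table
observed_total <- sum(observed)    # patients counted in nr_axspa
expected_zero  <- sum(tab[, "1"])  # New York == 1, i.e. expected in group "0"

surplus_total <- observed_total - expected_total   # surplus overall
surplus_zero  <- observed[["0"]] - expected_zero   # surplus within group "0"
```

So the whole surplus sits in group "0", and the recoded variable covers more patients than the 2x2 table does.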
The code I used to determine the frequencies of the newly coded variable:
describe(df_nr_axspa$nr_axspa)
So I am trying to figure out what the hell happened. When I look at the data manually, I can't seem to find any mistake in the way the new variable is coded. Does anyone have an explanation?
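One thing that might be worth checking is whether missing values play a role. This is only an assumption, since I have not confirmed that the criteria columns contain NA. The sketch below uses a made-up data frame (toy, with hypothetical values) and the same recoding logic. It shows two things: table() leaves out rows with NA unless useNA is set, and a row with NA in ASAS but 1 in New York ends up as "0", because NA & FALSE evaluates to FALSE in R.

```r
library(dplyr)

# Toy data with hypothetical values, not the real dataset
toy <- data.frame(
  asas_criteria     = c(0, 1, 1, NA, NA, 1),
  new_york_criteria = c(0, 0, 1, 1, 0, NA)
)

# Same recoding logic as in the question
toy <- mutate(toy, nr_axspa = if_else(asas_criteria == 0 & new_york_criteria == 0, 2,
                              if_else(asas_criteria == 1 & new_york_criteria == 0, 1, 0)))

# table() drops rows with NA by default; useNA = "ifany" keeps them
tab_default <- table(toy$asas_criteria, toy$new_york_criteria)
tab_with_na <- table(toy$asas_criteria, toy$new_york_criteria, useNA = "ifany")

sum(tab_default)  # only rows where both criteria are observed
sum(tab_with_na)  # all rows
toy$nr_axspa      # row 4 (ASAS NA, New York 1) is coded 0
```

If this applies to the real data, table(df$asas_criteria, df$new_york_criteria, useNA = "ifany") should show extra rows or columns labelled NA.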
Thanks in advance!