Sunday, 16 August 2020

Loop over two datasets in R to match the values of all rows from one dataset with only one column of another dataset

I am trying to write a loop in R that iterates over two datasets, "datasetA" and "datasetB". datasetA has 600 entries and datasetB has 200,000 entries. For each entry in datasetA, I want to do the following: if the value of "V2" is equal in both datasets, calculate the ppm as [(datasetA$V3 - datasetB$V3) / datasetA$V3] * 1000000. If |ppm| < 10, write the ppm value into column V4 of datasetB and the corresponding name from datasetA$V1 into column V1 of datasetB.

Say this is datasetA with 600 entries:

datasetA<- read.table(text='Alex    1   50.00042
John    1   60.000423
Janine    3   88.000123
Aline    3   117
Mark    2    79.9999')

and this is an example of "datasetB" with 200000 entries:

datasetB<- read.table(text='NA    1   50.0001    NA
NA    1   50.00032    NA
NA    2   70    NA
NA    2   80    NA
NA    3   88.0004    NA
NA    3   100    NA
NA    3   101    NA
NA    2    102    NA')

The final table should look like this:

datasetC<- read.table(text='Alex    1   50.0001    6.459945
Alex    1   50.00032    2.059983
NA    2   70    NA
Mark    2   80    -1.25
Janine    3   88.0004    -3.14772
NA    3   100    NA
NA    3   101    NA
NA    2    102    NA')

Many thanks for any hints and tips! :-)
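
One possible approach, offered as a minimal sketch rather than a definitive answer: loop over the 600 rows of datasetA only and let R compare each of them against all rows of datasetB in one vectorized step. The variable names tol, ppm and hit are illustrative and not part of the question. The sketch also assumes that when several datasetA rows match the same datasetB row, the last match may overwrite earlier ones, since the question does not say how ties should be handled.

```r
# Sample data from the question (V1..V4 are read.table's default column names).
datasetA <- read.table(text = 'Alex    1   50.00042
John    1   60.000423
Janine  3   88.000123
Aline   3   117
Mark    2   79.9999', stringsAsFactors = FALSE)

datasetB <- read.table(text = 'NA    1   50.0001    NA
NA    1   50.00032    NA
NA    2   70    NA
NA    2   80    NA
NA    3   88.0004    NA
NA    3   100    NA
NA    3   101    NA
NA    2    102    NA', stringsAsFactors = FALSE)

# read.table reads all-NA columns as logical, so set the intended types.
datasetB$V1 <- as.character(datasetB$V1)
datasetB$V4 <- as.numeric(datasetB$V4)

tol <- 10  # assumed tolerance: keep matches with |ppm| < 10

for (i in seq_len(nrow(datasetA))) {
  # ppm of this datasetA row against every datasetB row at once
  ppm <- (datasetA$V3[i] - datasetB$V3) / datasetA$V3[i] * 1e6
  # rows of datasetB with the same V2 and a small enough ppm
  hit <- datasetB$V2 == datasetA$V2[i] & abs(ppm) < tol
  datasetB$V1[hit] <- datasetA$V1[i]
  datasetB$V4[hit] <- ppm[hit]
}

datasetB
```

With the sample values as printed above, the two Alex rows come out at roughly 6.40 and 2.00 ppm; the 6.459945 and 2.059983 shown in the expected table would correspond to a V3 of 50.000423 for Alex. The Mark and Janine rows match the expected table.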
