jeudi 31 janvier 2019

R Create dummy datasets based on reference dataset

Context

I'd like to build a two dummy survey dataframes for a project. One dataframe has responses to a Relationship survey, and another to aPulse survey.

Here are what each look like -

  1. Relationship Dataframe

    #Relationship Data
    rel_data= data.frame(
                TYPE=rep('Relationship',446),
                SURVEY_ID = rep('SURVEY 2018 Z662700',446),
                SITE_ID=rep('Z662700',446),
                START_DATE= rep(as.Date('2018-07-01'),446),
                END_DATE= rep(as.Date('2018-07-04'),446)
                )
    
    
  2. Pulse Dataframe

    #Pulse Data
    pulse_data= data.frame(
                TYPE=rep('Pulse',525),
                SURVEY_ID = rep('SURVEY 2018 W554800',525),
                SITE_ID=rep('W554800',525),
                START_DATE= rep(as.Date('2018-04-01'),525),
                END_DATE= rep(as.Date('2018-04-04'),525)
                )
    
    

My Objective

I'd like to add columns to each of these two dataframes, based on conditions from a reference table.

The reference table consists of the questions to be added to each of the two survey dataframes, along with further details on each question asked. This is what it looks like

  1. Reference Table

    #Reference Table - Question Bank
    qbank= data.frame(QUEST_ID=c('QR1','QR2','QR3','QR4','QR5','QP1','QP2','QP3','QP4','QP5','QP6'),
                  QUEST_TYPE=c('Relationship','Relationship','Relationship','Relationship','Relationship',
                         'Pulse','Pulse','Pulse','Pulse','Pulse','Pulse'),
                  SCALE=c('Preference','Satisfaction','Satisfaction','Satisfaction','Preference','NPS',
                         'Satisfaction','Satisfaction','Satisfaction','Preference','Open-Ended'),
                  FOLLOWUP=c('No','No','No','No','No','No','Yes','No','Yes','No','No'))  
    
    

The Steps

For each survey dataframe( Relationship & Pulse), I'd like to do the following -

1) Lookup their respective question codes in the reference table, and add only those questions to the dataframe. For example, the Relationship dataframe would have only question codes pertaining to TYPE = 'Relationship' from the reference table. And the same for the Pulse dataframe.

2) The responses to each question would be conditionally added to each dataframe. Here are the conditions -

  • If SCALE = 'Preference' in the Reference table, then responses would be either 150,100,50,0 or -50. Also, these numbers would be generated in any random order.
  • If SCALE = 'NPS' in the Reference table, then responses would range from 0 to 10. Numbers would be generated such that the Net Promoter Score (NPS) equals 50%. Reminder: NPS = Percentage of 9s & 10s minus Percentage of 0s to 6s.
  • If SCALE = 'Satisfaction' in the Reference table, then responses would range from 1 (Extremely Dissatisfied) to 5 (Extremely Satisfied). Numbers would be generated such that the percentage of 1s & 2s equal 90%.
  • If SCALE = 'Open-Ended' in the Reference table, then ensure the column is empty (i.e. contains no responses).

My Attempt

Using this previously asked question for the conditional response creation and this one to add columns from the reference table, I attempted to solve the problem. But I haven't got what I was looking for yet.

Any inputs on this would be greatly appreciated

Desired Output

My desired output tables would look like this -

Relationship Dataframe Output

            TYPE            SURVEY_ID SITE_ID START_DATE   END_DATE QR1 QR2 QR3 QR4 QR5
1   Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 150   5   1   2   2
2   Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 100   1   2   2   2
3   Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 100   4   5   2   2
4   Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 150   1   1   2   2

and so on

And the Pulse Dataframe Output

     TYPE           SURVEY_ID SITE_ID START_DATE   END_DATE QP1 QP2 QP3 QP4 QP5 QP6
1   Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04   7   1   3   3 100    
2   Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04   8   5   3   1 100    
3   Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04   3   1   4   3 100    
4   Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04   1   2   4   3 100

and so on

Aucun commentaire:

Enregistrer un commentaire