mercredi 30 janvier 2019

R Build dummy dataframes conditionally based on reference dataframe

Context

I'm trying to build a two dummy survey dataframes for a personal project. One dataframe contains responses to a Relationship survey, and the other to aPulse survey.

Here are what each look like -

  1. Relationship Dataframe

    #Relationship Data
    reldata= data.frame(
                TYPE=rep('Relationship',446),
                SURVEY_ID = rep('SURVEY 2018 AU662700',446),
                SITE_ID=rep('X662700',446),
                START_DATE= rep(as.Date('2018-07-01'),446),
                END_DATE= rep(as.Date('2018-07-04'),446)
                )
    
    
  2. Pulse Dataframe

    #Pulse Data
    pulsedata= data.frame(
                TYPE=rep('Pulse',525),
                SURVEY_ID = rep('SURVEY 2018 Y554800',525),
                SITE_ID=rep('Y554800',525),
                START_DATE= rep(as.Date('2018-04-01'),525),
                END_DATE= rep(as.Date('2018-04-04'),525)
                )
    
    

My Objective

I would like to add columns to each of these two dataframes, based on conditions from a reference table.

The reference table consists of the questions to be added to each of the two survey dataframes, along with further details on each question asked. This is what it looks like

  1. Reference Table

    #Reference Table - Question Bank
    qbank= data.frame(QUEST_ID=c('QR1','QR2','QR3','QR4','QR5','QP1','QP2','QP3','QP4','QP5','QP6'),
                  QUEST_TYPE=c('Relationship','Relationship','Relationship','Relationship','Relationship',
                         'Pulse','Pulse','Pulse','Pulse','Pulse','Pulse'),
                  SCALE=c('Preference','Satisfaction','Satisfaction','Satisfaction','Preference','NPS',
                         'Satisfaction','Satisfaction','Satisfaction','Preference','Open-Ended'),
                  FOLLOWUP=c('Yes','No','No','No','No','No','Yes','No','Yes','No','Yes'))  
    
    

The Steps

For each survey dataframe( Relationship & Pulse), I'd like to do the following -

1) Lookup their respective question codes in the reference table, and add only those questions to the dataframe. For example, the Relationship dataframe would have only question codes pertaining to TYPE = 'Relationship' from the reference table. And the same for the Pulse dataframe.

2) The responses to each question would be conditionally added to each dataframe. Here are the conditions -

  • If SCALE = 'Preference' in the Reference table, then responses would be either 150,100,50,0 or -50. Also, these numbers would be generated in any random order.
  • If SCALE = 'NPS' in the Reference table, then responses would range from 0 to 10. Numbers would be generated such that the Net Promoter Score (NPS) equals 50%. Reminder: NPS = Percentage of 9s & 10s minus Percentage of 0s to 6s.
  • If SCALE = 'Satisfaction' in the Reference table, then responses would range from 1 (Extremely Dissatisfied) to 5 (Extremely Satisfied). Numbers would be generated such that the percentage of 1s & 2s equal 90%.
  • If SCALE = 'Open-Ended' in the Reference table, then ensure the column is empty (i.e. contains no responses).
  • Finally, if FOLLOWUP = 'Yes' AND the response is either 1 or 2, add a blank column and call it QFOLLOWUP.

My Attempt

Using this previously asked question for the conditional response creation and this one to add columns from the reference table, I attempted to solve the problem. But I haven't got what I was looking for yet.

Any inputs on this would be greatly appreciated

Aucun commentaire:

Enregistrer un commentaire