Context
I'd like to build a two dummy survey dataframes for a project. One dataframe has responses to a Relationship
survey, and another to aPulse
survey.
Here are what each look like -
-
Relationship Dataframe
#Relationship Data rel_data= data.frame( TYPE=rep('Relationship',446), SURVEY_ID = rep('SURVEY 2018 Z662700',446), SITE_ID=rep('Z662700',446), START_DATE= rep(as.Date('2018-07-01'),446), END_DATE= rep(as.Date('2018-07-04'),446) )
-
Pulse Dataframe
#Pulse Data pulse_data= data.frame( TYPE=rep('Pulse',525), SURVEY_ID = rep('SURVEY 2018 W554800',525), SITE_ID=rep('W554800',525), START_DATE= rep(as.Date('2018-04-01'),525), END_DATE= rep(as.Date('2018-04-04'),525) )
My Objective
I'd like to add columns to each of these two dataframes, based on conditions from a reference table.
The reference table consists of the questions to be added to each of the two survey dataframes, along with further details on each question asked. This is what it looks like
-
Reference Table
#Reference Table - Question Bank qbank= data.frame(QUEST_ID=c('QR1','QR2','QR3','QR4','QR5','QP1','QP2','QP3','QP4','QP5','QP6'), QUEST_TYPE=c('Relationship','Relationship','Relationship','Relationship','Relationship', 'Pulse','Pulse','Pulse','Pulse','Pulse','Pulse'), SCALE=c('Preference','Satisfaction','Satisfaction','Satisfaction','Preference','NPS', 'Satisfaction','Satisfaction','Satisfaction','Preference','Open-Ended'), FOLLOWUP=c('No','No','No','No','No','No','Yes','No','Yes','No','No'))
The Steps
For each survey dataframe( Relationship & Pulse), I'd like to do the following -
1) Lookup their respective question codes in the reference table, and add only those questions to the dataframe. For example, the Relationship
dataframe would have only question codes pertaining to TYPE = 'Relationship'
from the reference table. And the same for the Pulse
dataframe.
2) The responses to each question would be conditionally added to each dataframe. Here are the conditions -
- If
SCALE = 'Preference'
in the Reference table, then responses would be either150,100,50,0 or -50
. Also, these numbers would be generated in any random order. - If
SCALE = 'NPS'
in the Reference table, then responses would range from0 to 10
. Numbers would be generated such that the Net Promoter Score (NPS) equals 50%. Reminder: NPS = Percentage of 9s & 10s minus Percentage of 0s to 6s. - If
SCALE = 'Satisfaction'
in the Reference table, then responses would range from1 (Extremely Dissatisfied) to 5 (Extremely Satisfied)
. Numbers would be generated such that the percentage of 1s & 2s equal 90%. - If
SCALE = 'Open-Ended'
in the Reference table, then ensure the column is empty (i.e. contains no responses).
My Attempt
Using this previously asked question for the conditional response creation and this one to add columns from the reference table, I attempted to solve the problem. But I haven't got what I was looking for yet.
Any inputs on this would be greatly appreciated
Desired Output
My desired output tables would look like this -
Relationship Dataframe Output
TYPE SURVEY_ID SITE_ID START_DATE END_DATE QR1 QR2 QR3 QR4 QR5
1 Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 150 5 1 2 2
2 Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 100 1 2 2 2
3 Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 100 4 5 2 2
4 Relationship SURVEY 2018 Z662700 Z662700 2018-07-01 2018-07-04 150 1 1 2 2
and so on
And the Pulse Dataframe Output
TYPE SURVEY_ID SITE_ID START_DATE END_DATE QP1 QP2 QP3 QP4 QP5 QP6
1 Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04 7 1 3 3 100
2 Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04 8 5 3 1 100
3 Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04 3 1 4 3 100
4 Pulse SURVEY 2018 W554800 W554800 2018-04-01 2018-04-04 1 2 4 3 100
and so on
Aucun commentaire:
Enregistrer un commentaire