Monday, 15 July 2019

Condition-based distribution sampling in R

I am working on a dataset for students to practice hypothesis tests. The data should contain fictional processing times for producing a construction equipment vehicle. The vehicle comes in different types and with different options that (might) influence the processing time. Based on the processing times and the machine specifications, the students will investigate which factors contribute significantly to the processing times and predict the time required to produce a machine with a specific configuration.

The end goal for the dataset is to generate the total processing time per machine. In essence, the total processing time should be the sum of a base time + Option 1 time + Option 2 time + Option 3 time, and so on. Each option time is to be randomly sampled from a distribution so that the pattern is not too obvious. Only the total time will be provided to the students, but I need the option times to construct the total time.

I know how to do random sampling with rnorm() and other distributions, but I don't know how to generate data conditionally, based on the content of a column.

The dataset looks something like this:

# Machine IDs and the option chosen for each machine
Machine       <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Pump.Option   <- c("30 Liter", "40 Liter", "30 Liter", "30 Liter", "30 Liter", "30 Liter", "50 Liter", "30 Liter", "30 Liter", "40 Liter")
Piping.Option <- c("No special piping", "No special piping", "special piping", "No special piping", "special piping", "No special piping", "No special piping", "special piping", "special piping", "No special piping")
Lights.Option <- c("Std light", "Std & Additional", "Std & Additional", "Std & Additional", "Std & Additional", "Std & Additional", "Std light", "Std & Additional", "Std & Additional", "Std & Additional")
Valve.Option  <- c("Safety valve", "Safety valve", "Normal valve", "Normal valve", "Safety valve", "Normal valve", "Safety valve", "Safety valve", "Normal valve", "Safety valve")

# Time columns to be filled in later (NA is recycled to every row)
Pump.Time     <- NA
Piping.Time   <- NA
Lights.Time   <- NA
Valve.Time    <- NA
Total.Time    <- NA

DF.Sample     <- data.frame(Machine, Pump.Option, Piping.Option, Lights.Option, Valve.Option, Pump.Time, Piping.Time, Lights.Time, Valve.Time, Total.Time)

The times that need to be generated are Pump.Time, Piping.Time and Lights.Time, based on the contents of the columns Pump.Option, Piping.Option and Lights.Option. These times will be used to calculate the total time for that machine.

The times for the options are something like this.

  • Pump.Time
    • 30 Liter (No additional time)
    • 40 Liter (10 minutes mean, 4 minutes standard deviation)
    • 50 Liter (20 minutes mean, 10 minutes standard deviation)
  • Piping.Time
    • No special piping (No additional time)
    • special piping (10 minutes mean, 4 minutes standard deviation)
  • Lights.Time
    • Std light (No additional time)
    • Std & Additional (10 minutes mean, 4 minutes standard deviation)
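
One possible way to do this kind of conditional sampling is to keep the mean and standard deviation of each option level in a lookup list and pass per-row vectors to rnorm(), which is vectorised over mean and sd. The sketch below is only an illustration under several assumptions: the helper sample_option_time() and the *_params lists are hypothetical names, the small data frame is a stand-in for DF.Sample, the base time of 120 minutes is invented for the example, and negative draws are truncated at zero. The level names in the lists must match the strings in the option columns exactly, including case.

```r
set.seed(42)  # arbitrary seed, only to make the draws reproducible

# Small stand-in for DF.Sample with the same column names (illustrative values).
df <- data.frame(
  Machine       = 1:6,
  Pump.Option   = c("30 Liter", "40 Liter", "50 Liter",
                    "30 Liter", "40 Liter", "50 Liter"),
  Piping.Option = c("No special piping", "special piping", "No special piping",
                    "special piping", "No special piping", "special piping"),
  Lights.Option = c("Std light", "Std & Additional", "Std & Additional",
                    "Std light", "Std light", "Std & Additional"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one draw per row, with mean and sd looked up by level.
# `params` is a named list; each element is c(mean = ..., sd = ...).
sample_option_time <- function(option, params) {
  option  <- as.character(option)  # also works if the column is a factor
  unknown <- setdiff(unique(option), names(params))
  if (length(unknown) > 0) {
    stop("No parameters for: ", paste(unknown, collapse = ", "))
  }
  mu    <- unname(vapply(params[option], function(p) p[["mean"]], numeric(1)))
  sigma <- unname(vapply(params[option], function(p) p[["sd"]], numeric(1)))
  # rnorm() accepts vectors for mean and sd; with sd = 0 it returns the mean.
  # pmax() truncates negative draws at zero (an assumption of this sketch).
  pmax(rnorm(length(option), mean = mu, sd = sigma), 0)
}

# Parameters taken from the list above; "no additional time" is mean 0, sd 0.
pump_params <- list(
  "30 Liter" = c(mean = 0,  sd = 0),
  "40 Liter" = c(mean = 10, sd = 4),
  "50 Liter" = c(mean = 20, sd = 10)
)
piping_params <- list(
  "No special piping" = c(mean = 0,  sd = 0),
  "special piping"    = c(mean = 10, sd = 4)
)
lights_params <- list(
  "Std light"        = c(mean = 0,  sd = 0),
  "Std & Additional" = c(mean = 10, sd = 4)
)

df$Pump.Time   <- sample_option_time(df$Pump.Option,   pump_params)
df$Piping.Time <- sample_option_time(df$Piping.Option, piping_params)
df$Lights.Time <- sample_option_time(df$Lights.Option, lights_params)

base_time <- 120  # assumed base time in minutes; not given in the question
df$Total.Time <- base_time + df$Pump.Time + df$Piping.Time + df$Lights.Time
```

Valve.Time is left out because no distribution is given for it; it could be handled the same way with its own parameter list. The base time could likewise be drawn from a distribution instead of being a constant.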

Thank you very much!
