Friday, October 23, 2020

Trying to calculate the number of females/males individually from a 'gender' character field in a data frame in R

    genderTable <- table(churnData$gender)
    genderVector <- churnData$gender
    head(genderVector)
    if (genderVector == 'Female')  {
      churnData_gentotal <- mutate(.data=churnData,gentotal=genderTable[1])
    } else {
      churnData_gentotal <- mutate(.data=churnData,gentotal=genderTable[2])  } %>%
      group_by(gender,Churn)   %>%
      summarise(Count=n(),Proportion=Count/churnData_gentotal$gentotal)
    # head(churnData$gender)
    head(churnData_gentotal$gentotal)
    head(churnData_gentotal)

The results are not correct:

"Female" "Male"   "Male"   "Male"   "Female" "Female"
>      if (genderVector == 'Female')  {
+        churnData_gentotal <- mutate(.data=churnData,gentotal=genderTable[1])  
+       }else {  
+        churnData_gentotal <- mutate(.data=churnData,gentotal=genderTable[2])  } %>%
+     group_by(gender,Churn)   %>%
+     summarise(Count=n(),Proportion=Count/churnData_gentotal$gentotal)
Warning message:
In if (genderVector == "Female") { :
  the condition has length > 1 and only the first element will be used
> #    head(churnData$gender)
>     head(churnData_gentotal$gentotal)
Female Female Female Female Female Female 
  3488   3488   3488   3488   3488   3488 
>     head(churnData_gentotal)
# A tibble: 6 x 22
  customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines
  <chr>      <chr>          <int> <chr>   <chr>       <int> <chr>        <chr>        
1 7590-VHVEG Female             0 Yes     No              1 No           No phone ser~
2 5575-GNVDE Male               0 No      No             34 Yes          No           
3 3668-QPYBK Male               0 No      No              2 Yes          No           
4 7795-CFOCW Male               0 No      No             45 No           No phone ser~
5 9237-HQITU Female             0 No      No              2 Yes          No           
6 9305-CDSKC Female             0 No      No              8 Yes          Yes          
# ... with 14 more variables: InternetService <chr>, OnlineSecurity <chr>,
#   OnlineBackup <chr>, DeviceProtection <chr>, TechSupport <chr>, StreamingTV <chr>,
#   StreamingMovies <chr>, Contract <chr>, PaperlessBilling <chr>, PaymentMethod <chr>,
#   MonthlyCharges <dbl>, TotalCharges <dbl>, Churn <chr>, gentotal <int>
> length(churnData_gentotal)
[1] 22
> length(genderVector)
[1] 7043
> 

I suspect the if condition is incorrect, given the warning it raises, but I am at a total loss as to how to fix it. Please help.
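For what it's worth, the warning appears because if() looks at a single TRUE/FALSE value, while genderVector == 'Female' yields one value per row, so only the first row ("Female") decides the branch and every row receives the Female total. Below is a minimal sketch of one possible way around it, not a definitive fix. It assumes dplyr is loaded and uses a small made-up data frame, churn_demo, as a stand-in for churnData; the names churn_demo, gender_table, churn_gentotal and churn_summary are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for churnData: only gender and Churn matter here.
churn_demo <- tibble(
  gender = c("Female", "Male", "Male", "Male", "Female", "Female"),
  Churn  = c("No", "No", "Yes", "No", "Yes", "Yes")
)

# Per-row gender total without if(): index the table by each row's gender
# value (a vectorised lookup by name) instead of branching once.
gender_table <- table(churn_demo$gender)
churn_gentotal <- churn_demo %>%
  mutate(gentotal = as.integer(gender_table[gender]))

# Count each gender/Churn pair, then divide by the total for that gender.
churn_summary <- churn_demo %>%
  count(gender, Churn, name = "Count") %>%
  group_by(gender) %>%
  mutate(gentotal = sum(Count), Proportion = Count / gentotal) %>%
  ungroup()

print(churn_summary)
```

The second pipeline does not need the gentotal column from the first at all: grouping by gender and dividing by sum(Count) gives the per-gender proportion directly. On the real data, churn_demo would be replaced by churnData.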
