Wednesday, 25 August 2021

Ignoring NAs in an ifelse statement in R applied over a list of data frames

I have a function that calculates the Z score for values in multiple columns across a list of data frames. A snippet of my data frames is below:


df <- list(Al2O3 = structure(list(Determination_No = c(1, 2, 3, 4, 
5, 6, 7, 8, 9, 10), `2` = c(2.04, 2.07, 2.05, 2.07, 2.1, 2.08, 
NA, NA, NA, NA), `3` = c(2.08, 2.1, 2.08, 2.13, 2.1, 2.08, NA, 
NA, NA, NA), `4` = c(2.08, 2.08, 2.09, 2.06, 2.08, 2.07, 2.07, 
2.06, 2.08, 2.08), `5` = c(2.11, 2.09, 2.1, 2.08, 2.09, 2.09, 
NA, NA, NA, NA), `7` = c(2.06, 2.05, 2.04, 2.05, 2.04, 2.03, 
NA, NA, NA, NA), `8` = c(2.078, 2.065, 2.057, 2.063, 2.067, 2.066, 
NA, NA, NA, NA), `10` = c(2.191776681, 2.153987428, 2.153987428, 
2.097303548, 2.116198175, 2.116198175, NA, NA, NA, NA), `12` = c(2.24, 
2.08, 2.12, 2.15, 2.15, 2.15, NA, NA, NA, NA), `36` = c(2.07, 
2.082, 2.048, 2.046, 2.086, 2.069, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-10L)), As = structure(list(Determination_No = c(1, 2, 3, 4, 
5, 6, 7, 8, 9, 10), `2` = c(0.002, 0.001, 0.001, 0.001, 0.002, 
0.001, NA, NA, NA, NA), `3` = c(0.003, 0.002, 0.002, 0.002, 0.001, 
0.002, NA, NA, NA, NA), `4` = c(0.001, 0.002, 0.001, 0.002, 0.002, 
0.002, 0.001, 0.002, 0.002, 0.003), `5` = c(0.002, 0.001, 0.001, 
0.001, 0.001, 0.002, NA, NA, NA, NA), `7` = c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), `8` = c(NA, 0.001, NA, NA, NA, NA, NA, NA, NA, NA), 
    `10` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `12` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), `36` = c(0.0053, 0.0053, 0.0053, 
    0.00454, 0.0053, 0.0053, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-10L)), Ba = structure(list(Determination_No = c(1, 2, 3, 4, 
5, 6, 7, 8, 9, 10), `2` = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
    `3` = c(NA, NA, NA, NA, 0.001, NA, NA, NA, NA, NA), `4` = c(0.004, 
    0.003, 0.003, 0.004, 0.003, 0.002, 0.004, 0.002, 0.005, NA
    ), `5` = c(NA, NA, NA, NA, NA, 0.003, NA, NA, NA, NA), `7` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), `8` = c(0.002, 0.003, NA, 
    NA, NA, 0.002, NA, NA, NA, NA), `10` = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), `12` = c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), `36` = c(0.00089566, 0.00089566, 0.00089566, 0.00089566, 
    0.00089566, 0.00089566, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-10L))

My intent is for the function to go on and calculate further statistics based on the Z score. My challenge is that there are lots of NAs in my data frames, and when I apply my if statement it does not work because of the NA values present. My function is below:

ZMax <- 3.5
FinalStats <- function(x,...){ 
  unlistdata <- unlist(x[-1])
  GrandMean <- mean(unlistdata,na.rm = T)
  GrandSD <- sd(unlistdata,na.rm=T)
  ZScore <- abs(((x[-1])-GrandMean)/GrandSD)

  if(ZScore > ZMax){
    LabMean <- mapply(mean, x[-1], na.rm = T) #Calculate Mean by columns
    SD.All <- unlist(x[-1])
    ConsensusValue <- mean(LabMean)
    Uncertainty <- sd(SD.All, na.rm = T)
  }else{ 
    LabMedian <- mapply(median, x[-1], na.rm = T) #Calculate Median by columns
    LabMedian[is.infinite(LabMedian)] <- NA #convert any Inf values to NA
    SD.All <- unlist(x[-1])
    ConsensusValue <- LabMedian
    Uncertainty <- sd(SD.All, na.rm = T)
  }

  FinalValues <- cbind(ConsensusValue,Uncertainty) #combined the desired Info
  
  return(FinalValues)
}

df.stats <- lapply(df,FinalStats)
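For reference, this is what the first lines of FinalStats compute, written out under the assumption that every column except Determination_No holds one lab's results and that all sums run over the non-missing values only ($N$ of them, across lab columns $j$ and determinations $i$; R's `sd()` uses the $N-1$ denominator):

```latex
\bar{x} = \frac{1}{N}\sum_{j}\sum_{i} x_{ij}, \qquad
s = \sqrt{\frac{1}{N-1}\sum_{j}\sum_{i}\left(x_{ij}-\bar{x}\right)^{2}}, \qquad
Z_{ij} = \frac{\lvert x_{ij}-\bar{x}\rvert}{s}

\text{mean branch: } \frac{1}{L}\sum_{j=1}^{L}\bar{x}_{j}, \qquad
\text{median branch: } \operatorname{median}_{j}\,\tilde{x}_{j}
```

Here $\bar{x}_j$ and $\tilde{x}_j$ are the mean and median of lab column $j$, and $L$ is the number of lab columns (assumed to mean those with at least one value). Because $Z_{ij}$ is defined per value, ZScore is a whole table of numbers, with NA wherever $x_{ij}$ is missing, rather than a single number.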

How do I get the if statement to ignore NA values?
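A likely cause, sketched on a toy example (the data frame `z` below is made up, not the question's data): `ZScore > ZMax` compares every cell, so it yields a logical matrix that also holds NA wherever a value was missing. `if()` needs exactly one non-NA TRUE/FALSE: depending on the R version it either errors or warns and looks only at the first cell, and it errors outright when that cell is NA. One option, assuming the intended rule is "take the mean branch if any |Z| exceeds ZMax", is to collapse the matrix with `any(..., na.rm = TRUE)`:

```r
ZMax <- 3.5

# Toy stand-in for the ZScore data frame (hypothetical values)
z <- data.frame(a = c(NA, 0.4, 4.1), b = c(1.2, NA, 0.3))

cmp <- z > ZMax   # a 3 x 2 logical matrix, NA where z is NA
# if (cmp) ...    # would error in R >= 4.2 (condition has length > 1);
#                 # older R uses only cmp[1], which is NA here, so it errors too

# Collapse to a single TRUE/FALSE, ignoring NA comparisons
any_outlier <- any(cmp, na.rm = TRUE)

if (any_outlier) {
  message("At least one |Z| exceeds ZMax")
} else {
  message("No |Z| exceeds ZMax")
}
```

`isTRUE()` is another common way to make an `if()` condition NA-safe, but it only accepts a single value, so it does not replace the `any()` step here.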

I have tried using base R's ifelse() as follows:

FinalStats <- function(x,...){ 
  unlistdata <- unlist(x[-1])
  GrandMean <- mean(unlistdata,na.rm = T)
  GrandSD <- sd(unlistdata,na.rm=T)
  ZScore <- abs(((x[-1])-GrandMean)/GrandSD)
    
  ConsensusValue   <- ifelse((is.na(ZScore > ZMax)),
                        mean(mapply(mean, x[-1], na.rm = T)),
                       median(mapply(median,x[-1],na.rm=T))) 
  
  return(ConsensusValue)
} 

Unfortunately, my ifelse attempt only partially works: it gives a plausible result for the first data frame and returns only NAs for the other two data frames in my example.
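Two things seem to combine here, shown on toy inputs (the matrix `test` and the small data frame `d` are made up for illustration). First, `ifelse()` is vectorised: its result takes the shape, including dimensions, of its test argument, so a matrix test gives back a matrix of values rather than one number, with NA wherever the test was NA. Second, `mapply(mean, ..., na.rm = T)` returns NaN for an all-NA column (such as columns `7`, `10` and `12` in `As`), and an outer `mean()` or `median()` without `na.rm = TRUE` then turns the whole result into NaN or NA:

```r
# 1. ifelse() keeps the shape of its test argument
test <- matrix(c(TRUE, FALSE, NA, TRUE), nrow = 2)
res <- ifelse(test, "mean", "median")
print(res)    # a 2 x 2 character matrix, NA where test is NA

# 2. An all-NA column gives NaN, which poisons an outer mean()
d <- data.frame(a = c(1, 2), b = c(NA_real_, NA_real_))
col_means <- sapply(d, mean, na.rm = TRUE)   # a = 1.5, b = NaN
mean(col_means)                  # NaN
mean(col_means, na.rm = TRUE)    # 1.5 (na.rm also drops NaN)
```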

The result I am looking for is a single value per data frame: either the mean of the column means or the median of the column medians, depending on the Z score. What I am getting instead is a list of data frames filled either with the mean value (which looks correct) or with a series of NAs.
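If the goal is one number per data frame, one way to get each candidate value is a pair of small helpers (the names `mean_of_col_means` and `median_of_col_medians`, and the `toy` data, are hypothetical). A sketch, assuming the Determination_No column has already been dropped (for example with `x[-1]`) and that all-NA columns should simply be skipped:

```r
# Hypothetical helper: mean of the per-column means, skipping all-NA columns
mean_of_col_means <- function(d) {
  m <- vapply(d, mean, numeric(1), na.rm = TRUE)    # NaN for all-NA columns
  mean(m, na.rm = TRUE)                             # na.rm also drops NaN
}

# Hypothetical helper: median of the per-column medians, skipping all-NA columns
median_of_col_medians <- function(d) {
  m <- vapply(d, median, numeric(1), na.rm = TRUE)  # NA for all-NA columns
  median(m, na.rm = TRUE)
}

toy <- data.frame(a = c(1, 3, NA), b = c(2, 2, 8), c = NA_real_)
mean_of_col_means(toy)        # column means 2, 4, NaN -> 3
median_of_col_medians(toy)    # column medians 2, 2, NA -> 2
```

Both helpers always return a single number, so whichever one the Z-score rule selects can be returned as is; `vapply()` is used instead of `mapply()` so that each column is guaranteed to yield exactly one numeric value.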

I have also tried calculating the mean and median values outside the ifelse statement and then using ifelse only to choose between them, but I still get a data frame of values rather than a single value. However, if I simply return either the mean or the median calculated outside the ifelse, I get the correct result.

FinalStats <- function(x,...){ 
  unlistdata <- unlist(x[-1])
  GrandMean <- mean(unlistdata,na.rm = T)
  GrandSD <- sd(unlistdata,na.rm=T)
  ZScore <- abs(((x[-1])-GrandMean)/GrandSD)
  
  LabMean <- mean(mapply(mean, x[-1], na.rm = T),na.rm=T) #Mean of the column means
  LabMedian <- median(mapply(median, x[-1], na.rm = T),na.rm = T) #Median of the column medians
  LabMedian[is.infinite(LabMedian)] <- NA #convert any Inf values to NA
    

  ConsensusValue <- ifelse(!is.na(ZScore > ZMax),
                       LabMean,
                      LabMedian)
  return(ConsensusValue)
}   
CatergoreisStats <- lapply(df,FinalStats) 
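Putting the pieces together, here is a minimal sketch of a FinalStats variant that returns one consensus value per data frame. It assumes, as in the question's first version, that the mean branch is taken when any non-missing |Z| exceeds ZMax, and it reports the grand standard deviation as the uncertainty (which is what `sd(SD.All, na.rm = T)` computes). The list `toy_list` is made up so the block runs on its own; with the question's data it would be called as `lapply(df, FinalStats)`.

```r
# Sketch only: one consensus value (plus uncertainty) per data frame.
# Assumes column 1 is Determination_No; zmax mirrors the question's ZMax.
FinalStats <- function(x, zmax = 3.5) {
  vals <- x[-1]                                   # drop Determination_No
  all_vals <- unlist(vals, use.names = FALSE)
  grand_mean <- mean(all_vals, na.rm = TRUE)
  grand_sd <- sd(all_vals, na.rm = TRUE)
  z <- abs(all_vals - grand_mean) / grand_sd      # NA stays NA here

  col_means <- vapply(vals, mean, numeric(1), na.rm = TRUE)
  col_medians <- vapply(vals, median, numeric(1), na.rm = TRUE)

  # any(..., na.rm = TRUE) yields one TRUE/FALSE, so a plain if() is safe
  consensus <- if (any(z > zmax, na.rm = TRUE)) {
    mean(col_means, na.rm = TRUE)
  } else {
    median(col_medians, na.rm = TRUE)
  }
  c(ConsensusValue = consensus, Uncertainty = grand_sd)
}

# Hypothetical toy data in the same layout as the question's list
toy_list <- list(
  A = data.frame(Determination_No = 1:4, `2` = c(1, 2, NA, 3),
                 `3` = c(2, 2, 2, NA), check.names = FALSE),
  B = data.frame(Determination_No = 1:4, `2` = NA_real_,
                 `3` = c(1, NA, 3, NA), check.names = FALSE),
  C = data.frame(Determination_No = 1:8, `2` = c(rep(0, 7), 100),
                 `3` = c(rep(0, 7), NA), check.names = FALSE)
)

toy_stats <- lapply(toy_list, FinalStats)
toy_stats
```

The design choice is to keep `ifelse()` for element-wise choices only: here the decision is a single yes/no per data frame, so the Z-score matrix is collapsed first and a scalar `if ... else` picks between two values that are already single numbers. In `C`, the value 100 has |Z| of about 3.6, so the mean branch is taken.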
