Sunday, 28 April 2019

How to create new rows of data based on other rows

I have a data frame (abund) similar to the dune and dune.env datasets used in the vegan package, except that my first three columns record the SampleID and the two methods used to collect the data. The remaining columns are the abundances of each species collected.

 SampleID   MethodA     MethodB     SpA    SpB    SpC   ...
 18001        A1          B1          0      3     4 
 18001        A1          B2          1      5     0
 18001        A2          B1          0      7     0          
 18001        A2          B2          0      11    0
 18002        A1          B1          4      1     0
 18002        A1          B2          0      0     3
 18002        A2          B1          0      0     0
 18002        A2          B2          0      8     2
 18003        A1          B1          0      9     0
 ....

I would like to create a new dataset (whole) from this data in which the B1 and B2 rows are combined into a single B3 row, so that only SampleID and MethodA identify each row:

 SampleID   MethodA     MethodB     SpA    SpB    SpC   ...
 18001        A1          B3          1      8     4 
 18001        A2          B3          0      18    0          
 18002        A1          B3          4      1     3
 18002        A2          B3          0      8     2
 18003        A1          B3          0      9     1
 ....

An extra twist: instead of simply adding B1 + B2 (as in the table above), I first want to multiply B1 by 15, i.e. B3 = 15*B1 + B2.
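As a quick check of the formula, here is B3 = 15*B1 + B2 computed for sample 18001 with MethodA = A1, using the B1 and B2 counts from the first table. R applies the arithmetic element-wise, so each species is weighted separately; the names on the vectors are only there for readability.

```r
# Counts for sample 18001, MethodA = A1, taken from the first table
b1 <- c(SpA = 0, SpB = 3, SpC = 4)  # MethodB = B1
b2 <- c(SpA = 1, SpB = 5, SpC = 0)  # MethodB = B2

# B3 = 15 * B1 + B2, applied to each species separately
b3 <- 15 * b1 + b2
print(b3)
```

This gives 1, 50 and 60 for SpA, SpB and SpC, compared with the plain sums 1, 8 and 4 shown in the second table.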

I have two problems.

1. Multiplying only certain rows of data.

I tried using an if statement:

wholeCalc <- function(MethodB, multiplier=15){
  whole <- MethodB* multiplier
  if(MethodB= "B2") {
    whole <- whole/multiplier
  }
}

-> This produced a bunch of errors, which suggests I am way off the mark!
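One way to do the row-wise multiplication is sketched below. It assumes abund holds the columns shown above, with only the fully listed 18001 and 18002 rows reproduced here, and that the species columns are SpA, SpB and SpC. Three things differ from the attempt. The comparison uses == rather than =. The species counts are multiplied, not the MethodB label. And instead of an if statement, which expects a single TRUE or FALSE, a logical vector selects every B1 row at once. The function name weight_B1 is hypothetical.

```r
# Example data: the fully listed 18001 and 18002 rows from the question
abund <- data.frame(
  SampleID = rep(c(18001, 18002), each = 4),
  MethodA  = rep(rep(c("A1", "A2"), each = 2), times = 2),
  MethodB  = rep(c("B1", "B2"), times = 4),
  SpA = c(0, 1, 0, 0, 4, 0, 0, 0),
  SpB = c(3, 5, 7, 11, 1, 0, 0, 8),
  SpC = c(4, 0, 0, 0, 0, 3, 0, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: multiply the species counts of every B1 row by
# `multiplier` and leave all other rows unchanged
weight_B1 <- function(df, species_cols, multiplier = 15) {
  is_b1 <- df$MethodB == "B1"  # one TRUE/FALSE per row
  df[is_b1, species_cols] <- df[is_b1, species_cols] * multiplier
  df
}

weighted <- weight_B1(abund, c("SpA", "SpB", "SpC"))
print(weighted)
```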

2. Figuring out how to group the data by SampleID and MethodA.

I have tried multiple ways to group the data, without much success.

aggregate(abund, 
          list(Group=replace(rownames(abund$MethodB),
                             rownames(abund$MethodB) 
                             %in% 
                               c("B1","B2"), 
                             "B3")), 
          sum)

-> I get an error saying that the arguments must have the same length (i.e. B1 and B2).
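The aggregate() call most likely fails because abund$MethodB is a plain vector. rownames() of a vector is NULL, so the grouping vector has zero length instead of one entry per row, which triggers the same-length check. A sketch that groups on the columns themselves uses aggregate()'s formula interface with cbind() on the left-hand side. The 15/1 row weights implement B3 = 15*B1 + B2, and the example data again covers only the 18001 and 18002 rows.

```r
abund <- data.frame(
  SampleID = rep(c(18001, 18002), each = 4),
  MethodA  = rep(rep(c("A1", "A2"), each = 2), times = 2),
  MethodB  = rep(c("B1", "B2"), times = 4),
  SpA = c(0, 1, 0, 0, 4, 0, 0, 0),
  SpB = c(3, 5, 7, 11, 1, 0, 0, 8),
  SpC = c(4, 0, 0, 0, 0, 3, 0, 2),
  stringsAsFactors = FALSE
)
species <- c("SpA", "SpB", "SpC")

# Weight per row: 15 for B1 rows, 1 otherwise, so that summing gives 15*B1 + B2
w <- ifelse(abund$MethodB == "B1", 15, 1)
weighted <- abund
weighted[species] <- lapply(abund[species], function(x) x * w)

# Sum every species column within each SampleID x MethodA group
whole <- aggregate(cbind(SpA, SpB, SpC) ~ SampleID + MethodA,
                   data = weighted, FUN = sum)
whole$MethodB <- "B3"
whole <- whole[order(whole$SampleID, whole$MethodA),
               c("SampleID", "MethodA", "MethodB", species)]
print(whole)
```

With many species columns, cbind(SpA, SpB, SpC) can be replaced by a dot (. ~ SampleID + MethodA) once the MethodB column has been dropped from the data passed in.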

whole <- abund %>%
  group_by(SampleID, MethodA)
whole

-> This gives me back the same dataset I started with.
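group_by() on its own only records the grouping. Nothing is collapsed until a summarising verb such as summarise() is applied. The dplyr sketch below assumes dplyr 1.0 or later (for across() and the .groups and .after arguments). It also assumes the species columns can be selected with starts_with("Sp"); with real species names you would need to select them another way.

```r
library(dplyr)

abund <- data.frame(
  SampleID = rep(c(18001, 18002), each = 4),
  MethodA  = rep(rep(c("A1", "A2"), each = 2), times = 2),
  MethodB  = rep(c("B1", "B2"), times = 4),
  SpA = c(0, 1, 0, 0, 4, 0, 0, 0),
  SpB = c(3, 5, 7, 11, 1, 0, 0, 8),
  SpC = c(4, 0, 0, 0, 0, 3, 0, 2),
  stringsAsFactors = FALSE
)

whole <- abund %>%
  # B3 = 15*B1 + B2: scale B1 rows by 15 before summing
  mutate(across(starts_with("Sp"), ~ .x * if_else(MethodB == "B1", 15, 1))) %>%
  # one row per SampleID x MethodA, species summed within each group
  group_by(SampleID, MethodA) %>%
  summarise(across(starts_with("Sp"), sum), .groups = "drop") %>%
  mutate(MethodB = "B3", .after = MethodA)

print(whole)
```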

rbind(abund, 
      c(MethodB = "B3", 
        abund[abund$MethodB == "B1", -3] + 
          abund[abund$MethodB == "B2", -3])) 

-> This gives me an error because the numbers of B1 and B2 rows do not match.
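The rbind() attempt relies on the B1 and B2 rows lining up by position. One way around that is to pair them by key with merge() on SampleID and MethodA. In the sketch below, all = TRUE keeps groups that have only one of the two methods; the lone 18003 A1 B1 row from the question is included to show this. A missing partner row is counted as zero abundance, which is an assumption that may not suit every dataset.

```r
abund <- data.frame(
  SampleID = c(rep(c(18001, 18002), each = 4), 18003),
  MethodA  = c(rep(rep(c("A1", "A2"), each = 2), times = 2), "A1"),
  MethodB  = c(rep(c("B1", "B2"), times = 4), "B1"),
  SpA = c(0, 1, 0, 0, 4, 0, 0, 0, 0),
  SpB = c(3, 5, 7, 11, 1, 0, 0, 8, 9),
  SpC = c(4, 0, 0, 0, 0, 3, 0, 2, 0),
  stringsAsFactors = FALSE
)
species <- c("SpA", "SpB", "SpC")
keys    <- c("SampleID", "MethodA")

b1 <- abund[abund$MethodB == "B1", c(keys, species)]
b2 <- abund[abund$MethodB == "B2", c(keys, species)]

# Match B1 and B2 rows on the keys; species columns get .B1 / .B2 suffixes
paired <- merge(b1, b2, by = keys, suffixes = c(".B1", ".B2"), all = TRUE)

whole <- paired[keys]
whole$MethodB <- "B3"
for (sp in species) {
  x1 <- paired[[paste0(sp, ".B1")]]
  x2 <- paired[[paste0(sp, ".B2")]]
  x1[is.na(x1)] <- 0  # assumption: a missing method row means zero counts
  x2[is.na(x2)] <- 0
  whole[[sp]] <- 15 * x1 + x2
}
print(whole)
```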

As you can see, I'm completely lost and need some help! I've spent the past couple of months in the lab and let my R skills get rusty.
