Sunday, 2 December 2018

tapply based on an if/else condition on a data frame in R

I have this data frame data:

data

  Country  Year  Time.Unit GSA            Unit                  Name  Port.Name        Gear.L3 Species.scientific.name  Quantity
1      ITA 1972      Year GSA 17               GSA                 GSA 17                               Engraulis encrasicolus 7757.6660
2      ITA 1973      Year GSA 17               GSA                 GSA 17                               Engraulis encrasicolus 8102.5740
3      ITA 1974      Year GSA 17               GSA                 GSA 17                               Engraulis encrasicolus 8447.4820
4      ITA 1975      Year GSA 17               GSA                 GSA 17                               Engraulis encrasicolus 7795.0000
5      ITA 1976      Year GSA 17               GSA                 GSA 17                               Engraulis encrasicolus 7509.0000
6      HRV 1970      Year GSA 17               GSA                 GSA 17                Bottom trawls   Merluccius merluccius  322.0000
7      HRV 1971      Year GSA 17               GSA                 GSA 17                Bottom trawls   Merluccius merluccius  263.0000
8      HRV 1972      Year GSA 17               GSA                 GSA 17                Bottom trawls   Merluccius merluccius  342.0000
9      HRV 1973      Year GSA 17               GSA                 GSA 17                Bottom trawls   Merluccius merluccius  408.0000
10     ITA 1972      Year GSA 10            Region               CALABRIA  VIBO VALENTIA                    Sardina pilchardus   92.7310
11     ITA 1973      Year GSA 10            Region               CALABRIA  VIBO VALENTIA                    Sardina pilchardus  140.6450
12     ITA 1974      Year GSA 10            Region               CALABRIA  VIBO VALENTIA                    Sardina pilchardus  117.7287
13     ITA 1975      Year GSA 10            Region               CALABRIA  VIBO VALENTIA                    Sardina pilchardus  135.8510
14     ITA 1955      Year   <NA>            GSA 17                 Region        ABRUZZO                      Acipenser sturio    0.1000
15     ITA 1956      Year   <NA>            GSA 17                 Region        ABRUZZO                      Acipenser sturio    0.1000
16     ITA 1957      Year   <NA>            GSA 17                 Region        ABRUZZO                      Acipenser sturio    0.0000
17     ITA 1953      Year   <NA>            GSA 17                 Region EMILIA ROMAGNA                      Acipenser sturio    0.5450
18     ITA 1954      Year   <NA>            GSA 17                 Region EMILIA ROMAGNA                      Acipenser sturio    0.4000

dim(data)
[1] 18 10



data[,1:9] <- apply(data[,1:9],2,as.character)

data$comb <- apply(data[,c(1,3:9)],1, function(x)paste0(x,collapse="_")) #possible combination

Now I would like to apply a function (with tapply) to my data, chosen according to the number of rows in each combination level (data$comb), like this:

if (tapply(data$Quantity,data$comb,length) <=3){
  tapply(data$Quantity,data$comb,mean)
} else {
  tapply(data$Quantity,data$comb,function (x) (x*1000000))

}

The code runs, but the result is not correct:

$`HRV_Year_GSA 17_GSA_GSA 17__Bottom trawls_Merluccius merluccius`
[1] 3.22e+08 2.63e+08 3.42e+08 4.08e+08

$`ITA_Year_GSA 10_Region_CALABRIA_VIBO VALENTIA__Sardina pilchardus`
[1]  92731000 140645008 117728710 135851024

$`ITA_Year_GSA 17_GSA_GSA 17___Engraulis encrasicolus`
[1] 7757666000 8102574000 8447482000 7795000000 7509000000

$`ITA_Year_NA_GSA 17_Region_ABRUZZO__Acipenser sturio`
[1] 1e+05 1e+05 0e+00

$`ITA_Year_NA_GSA 17_Region_EMILIA ROMAGNA__Acipenser sturio`
[1] 545000 400000

Warning message:
In if (tapply(data$Quantity, data$comb, length) <= 3) { :
  the condition has length > 1 and only the first element will be used

Where am I going wrong? P.S. My real data frame has 100000 rows and more than 3000 combinations (data$comb). I would like to apply more complex functions (such as lm and glm) based on the same length condition. Maybe this is not the correct approach?
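
One way to read the warning: if() expects a single TRUE/FALSE, but tapply(data$Quantity, data$comb, length) <= 3 returns one value per combination, so only the first group's result decides the branch for every group (in recent R versions this is an error rather than a warning). A minimal sketch, assuming the intent is to choose the function separately for each combination, moves the length check inside the function passed to tapply. The vectors quantity and comb, the data frame toy, and the helper per_group are hypothetical stand-ins for the real data, and the lm formula is only an illustration.

```r
# Toy stand-ins for data$Quantity and data$comb (values are illustrative).
quantity <- c(322, 263, 342, 408, 0.1, 0.1, 0)
comb     <- c("A", "A", "A", "A", "B", "B", "B")

# Hypothetical helper: the length check happens inside each group,
# so the condition given to if() is always a single TRUE/FALSE.
per_group <- function(x) {
  if (length(x) <= 3) {
    mean(x)        # small group: one summary value
  } else {
    x * 1000000    # larger group: rescale every observation
  }
}

# simplify = FALSE keeps a list, since groups return results of different lengths.
res <- tapply(quantity, comb, per_group, simplify = FALSE)

# For model fitting, split() the data frame and lapply() over the pieces.
# The formula quantity ~ year is an assumption made for this sketch.
toy <- data.frame(comb = comb,
                  year = c(1970:1973, 1955:1957),
                  quantity = quantity)
fits <- lapply(split(toy, toy$comb), function(d) {
  if (nrow(d) <= 3) NULL else lm(quantity ~ year, data = d)
})
```

Here res[["A"]] holds the four rescaled values and res[["B"]] a single mean, while fits keeps a fitted model only for groups with more than three rows. Whether split() plus lapply() is fast enough for 3000 combinations would need checking on the real data.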
