I have this data frame, data:
data
Country Year Time.Unit GSA Unit Name Port.Name Gear.L3 Species.scientific.name Quantity
1 ITA 1972 Year GSA 17 GSA GSA 17 Engraulis encrasicolus 7757.6660
2 ITA 1973 Year GSA 17 GSA GSA 17 Engraulis encrasicolus 8102.5740
3 ITA 1974 Year GSA 17 GSA GSA 17 Engraulis encrasicolus 8447.4820
4 ITA 1975 Year GSA 17 GSA GSA 17 Engraulis encrasicolus 7795.0000
5 ITA 1976 Year GSA 17 GSA GSA 17 Engraulis encrasicolus 7509.0000
6 HRV 1970 Year GSA 17 GSA GSA 17 Bottom trawls Merluccius merluccius 322.0000
7 HRV 1971 Year GSA 17 GSA GSA 17 Bottom trawls Merluccius merluccius 263.0000
8 HRV 1972 Year GSA 17 GSA GSA 17 Bottom trawls Merluccius merluccius 342.0000
9 HRV 1973 Year GSA 17 GSA GSA 17 Bottom trawls Merluccius merluccius 408.0000
10 ITA 1972 Year GSA 10 Region CALABRIA VIBO VALENTIA Sardina pilchardus 92.7310
11 ITA 1973 Year GSA 10 Region CALABRIA VIBO VALENTIA Sardina pilchardus 140.6450
12 ITA 1974 Year GSA 10 Region CALABRIA VIBO VALENTIA Sardina pilchardus 117.7287
13 ITA 1975 Year GSA 10 Region CALABRIA VIBO VALENTIA Sardina pilchardus 135.8510
14 ITA 1955 Year <NA> GSA 17 Region ABRUZZO Acipenser sturio 0.1000
15 ITA 1956 Year <NA> GSA 17 Region ABRUZZO Acipenser sturio 0.1000
16 ITA 1957 Year <NA> GSA 17 Region ABRUZZO Acipenser sturio 0.0000
17 ITA 1953 Year <NA> GSA 17 Region EMILIA ROMAGNA Acipenser sturio 0.5450
18 ITA 1954 Year <NA> GSA 17 Region EMILIA ROMAGNA Acipenser sturio 0.4000
dim(data)
18 10
data[,1:9] <- apply(data[,1:9],2,as.character)
data$comb <- apply(data[,c(1,3:9)],1, function(x)paste0(x,collapse="_")) #possible combination
Now I would like to apply a function (with tapply) to my data, chosen according to a length condition on each level of the combination (data$comb), like this:
if (tapply(data$Quantity,data$comb,length) <=3){
tapply(data$Quantity,data$comb,mean)
} else {
tapply(data$Quantity,data$comb,function (x) (x*1000000))
}
The code runs, but the result is not correct:
$`HRV_Year_GSA 17_GSA_GSA 17__Bottom trawls_Merluccius merluccius`
[1] 3.22e+08 2.63e+08 3.42e+08 4.08e+08
$`ITA_Year_GSA 10_Region_CALABRIA_VIBO VALENTIA__Sardina pilchardus`
[1] 92731000 140645008 117728710 135851024
$`ITA_Year_GSA 17_GSA_GSA 17___Engraulis encrasicolus`
[1] 7757666000 8102574000 8447482000 7795000000 7509000000
$`ITA_Year_NA_GSA 17_Region_ABRUZZO__Acipenser sturio`
[1] 1e+05 1e+05 0e+00
$`ITA_Year_NA_GSA 17_Region_EMILIA ROMAGNA__Acipenser sturio`
[1] 545000 400000
Warning message:
In if (tapply(data$Quantity, data$comb, length) <= 3) { :
the condition has length > 1 and only the first element will be used
Where am I going wrong? P.S. My real data frame has 100,000 rows and more than 3,000 combinations (data$comb). I would like to apply more complex functions (such as lm and glm) based on the same length condition. Maybe this is not the correct approach?
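For what it's worth, the warning suggests that if() receives one logical value per combination and uses only the first. The groups come out in alphabetical order, so the first is the HRV group with 4 rows; 4 <= 3 is FALSE, and the else branch is then applied to every group. A minimal sketch of one possible fix is to move the length test inside the function passed to tapply, so that it is evaluated once per group. The vectors quantity and comb and the helper per_group below are hypothetical stand-ins for data$Quantity and data$comb, not the real data.

```r
# Toy stand-ins for data$Quantity and data$comb (hypothetical values and labels)
quantity <- c(322, 263, 342, 408, 0.1, 0.1, 0.0, 0.545, 0.4)
comb <- c(rep("HRV_hake", 4), rep("ITA_abruzzo", 3), rep("ITA_emilia", 2))

# Hypothetical helper: the length test runs once per group,
# instead of once for the whole vector of group lengths
per_group <- function(x) {
  if (length(x) <= 3) {
    mean(x)      # small group: a single summary value
  } else {
    x * 1e6      # larger group: the rescaled vector
  }
}

# simplify = FALSE keeps a list, because groups return results of different lengths
res <- tapply(quantity, comb, per_group, simplify = FALSE)
res
```

The same pattern would presumably carry over to functions that need several columns, such as lm or glm, by splitting the data frame with split(data, data$comb) and applying the per-group function with lapply, though that variant is not shown here.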