I have a data set of different vessels in different regions. The report output I get lists the name of each vessel, its type (e.g. fishing/cargo), the time it entered the zone, the time it left, and its duration in the zone. DOS is simply the distance offshore, i.e. the zone I am looking at.
My issue is that fishing vessels often run transects, so on a single day they enter and exit the zone multiple times and are therefore listed multiple times in my report output.
I would like to consolidate the fishing vessel data so that if a ship of the same name (only for type: fishing) is recorded more than once per day, all but one record is removed. For simplicity, maybe just look at the "First seen inside" date, as I think it gets more complicated when a particular visit spans multiple days (I can come back to that thought later).
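The "keep only one record per fishing vessel per day" rule described above could be sketched in base R roughly as follows. This is a minimal sketch, not a tested solution for the real report: the `visits` data frame, its `first_seen` column name and the UTC time zone are assumptions made for illustration, and the sample rows are made up.

```r
# Made-up sample in roughly the same shape as the report (not the real data).
# Column names and the UTC time zone are assumptions for this sketch.
visits <- data.frame(
  Name = c("A", "A", "C", "C", "E", "E", "E"),
  Type = c("Fishing", "Fishing", "Cargo", "Cargo",
           "Fishing", "Fishing", "Fishing"),
  first_seen = as.POSIXct(
    c("2019-04-27 10:16", "2019-04-27 12:34",
      "2019-04-26 04:35", "2019-04-26 06:37",
      "2019-04-25 10:42", "2019-04-26 17:39", "2019-04-26 18:02"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Calendar day of entry; the time zone decides where midnight falls.
visits$day <- as.Date(visits$first_seen, tz = "UTC")

# A row is a repeat if the same Name/Type/day combination appeared earlier.
is_repeat <- duplicated(visits[, c("Name", "Type", "day")])

# Drop repeats for fishing vessels only; every cargo row is kept.
deduped <- visits[!(visits$Type == "Fishing" & is_repeat), ]
```

With this sample, ship A collapses to one row, ship C keeps both rows, and ship E keeps one row per day. The day a visit falls on depends on the time zone passed to `as.Date()`, so that argument would need to match the zone the report is written in.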
Dummy data:
df <- structure(list(Name = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 8L,
8L, 9L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I"
), class = "factor"), Type = structure(c(2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 1L, 2L), .Label = c("Cargo", "Fishing"), class = "factor"),
`First seen inside` = structure(c(1556385360, 1556393640,
1556002200, 1556260260, 1556518860, 1556136660, 1556278500,
1556285820, 1556391480, 1556509620, 1556319480, 1556214120,
1556235600, 1556325540, 1556326920, 1556329500, 1556330220,
1556330580, 1556330880, 1556330940, 1556332980, 1556339880,
1556340900, 1556344140, 1556344500, 1556345220, 1556346420,
1556348220, 1556348520, 1556350860, 1556351460, 1556356620,
1556360220, 1556365920, 1556366520, 1556367180, 1556076420,
1556166900, 1556154840, 1556454900, 1556291220), class = c("POSIXct",
"POSIXt"), tzone = ""), `Last seen inside` = structure(c(34L,
35L, 1L, 8L, 38L, 3L, 7L, 9L, 36L, 38L, 27L, 4L, 5L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 28L, 29L, 30L, 31L, 32L, 33L, 2L, 6L,
37L, 38L, 38L), .Label = c("4/23/2019 14:27", "4/24/2019 21:23",
"4/25/2019 00:00", "4/25/2019 10:47", "4/25/2019 16:59",
"4/25/2019 23:49", "4/26/2019 05:17", "4/26/2019 13:39",
"4/26/2019 15:12", "4/26/2019 17:54", "4/26/2019 18:05",
"4/26/2019 18:51", "4/26/2019 19:00", "4/26/2019 19:06",
"4/26/2019 19:08", "4/26/2019 19:13", "4/26/2019 21:24",
"4/26/2019 21:38", "4/26/2019 22:02", "4/26/2019 22:51",
"4/26/2019 22:55", "4/26/2019 23:22", "4/26/2019 23:51",
"4/27/2019 00:00", "4/27/2019 00:36", "4/27/2019 00:42",
"4/27/2019 01:17", "4/27/2019 02:06", "4/27/2019 03:11",
"4/27/2019 04:30", "4/27/2019 05:00", "4/27/2019 05:03",
"4/27/2019 05:13", "4/27/2019 10:29", "4/27/2019 12:42",
"4/27/2019 17:21", "4/28/2019 03:47", "4/29/2019 09:56"), class =
"factor"),
`Time in zone` = structure(c(5L, 31L, 6L, 7L, 2L, 3L, 23L,
30L, 26L, 4L, 32L, 27L, 9L, 8L, 22L, 28L, 22L, 22L, 1L, 24L,
15L, 1L, 29L, 18L, 1L, 8L, 17L, 22L, 19L, 16L, 14L, 25L,
13L, 31L, 16L, 1L, 12L, 10L, 21L, 11L, 20L), .Label = c("",
"10h 35m", "10h 49m", "13h 9m", "13m", "14h 37m", "14h 8m",
"15m", "19m", "1d 2h 14m", "1d 4h 21m", "1d 56m", "1h 13m",
"1h 15m", "1h 41m", "1m", "24m", "2m", "34m", "3d 1h 49m",
"3d 9h 33m", "3m", "42m", "4m", "54m", "5h 23m", "5m", "6m",
"7m", "8h 35m", "8m", "9h 19m"), class = "factor"), DOS =
structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0-12", class =
"factor")), row.names = c(NA,
-41L), class = "data.frame")
So, for example, in my dummy data set:
Ship "A" is a "Fishing" vessel in DOS 0-12 and it occurs twice on the 27th of April, so I would like to reduce it to one record. If possible, it would be great if the combined record carried the summed "Time in zone" and the final "Last seen inside", but if that is too complex, not to worry. So ship A would show only:
Name  Type     First seen inside  Last seen inside  Time in zone  DOS
A     Fishing  4/27/2019 10:16    4/27/2019 12:42   21m           0-12
But I would be happy with just reducing it to one of the rows; the "Last seen inside" and "Time in zone" values do not have to be corrected if that is too much.
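Summing "Time in zone" first requires turning strings such as "1d 2h 14m" into numbers. A minimal sketch of one way to do that is below; `duration_to_minutes` is a hypothetical helper name, and treating an empty string as 0 minutes is an assumption, since the report does not say what a blank duration means.

```r
# Hypothetical helper: convert strings such as "1d 2h 14m" to minutes.
# Assumption: an empty or missing duration counts as 0 minutes.
duration_to_minutes <- function(x) {
  x <- as.character(x)  # also accepts factors
  # Pull the number in front of a unit letter; 0 when the unit is absent.
  grab <- function(unit) {
    out <- numeric(length(x))
    hit <- grepl(paste0("[0-9]+", unit), x)
    out[hit] <- as.numeric(
      sub(paste0("^.*?([0-9]+)", unit, ".*$"), "\\1", x[hit], perl = TRUE)
    )
    out
  }
  grab("d") * 24 * 60 + grab("h") * 60 + grab("m")
}

duration_to_minutes(c("13m", "8m", "1d 2h 14m", ""))
# 13 8 1574 0

# Ship A's two visits on the 27th of April add up to 21 minutes.
sum(duration_to_minutes(c("13m", "8m")))
# 21
```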
Ship C is a cargo ship, so I do not want to treat it the same way as a fishing vessel; I would like to keep all of its records even when there are multiple records per day.
Ship E is present on three different days, so I would like there to be three entries for it.
I hope that makes some sense. I am not sure whether this is possible with a dplyr filter, or with a mutate keyed on repeated occurrences on the same day. Any suggestions on how to manage this "problem" would be great... or perhaps I need to do some manual work on the data set :(
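For the fuller version (one row per fishing vessel per day, with the earliest entry, latest exit and summed time), a dplyr pipeline along these lines might work. It is only a sketch under assumptions: the sample rows are made up, the column names `first_seen`, `last_seen` and `minutes` are hypothetical, both timestamps are assumed to be POSIXct already, and the duration is assumed to be numeric minutes rather than text.

```r
library(dplyr)

# Made-up sample; column names, values and the UTC zone are assumptions.
visits <- data.frame(
  Name = c("A", "A", "C", "C", "E", "E", "E"),
  Type = c("Fishing", "Fishing", "Cargo", "Cargo",
           "Fishing", "Fishing", "Fishing"),
  first_seen = as.POSIXct(
    c("2019-04-27 10:16", "2019-04-27 12:34",
      "2019-04-26 04:35", "2019-04-26 06:37",
      "2019-04-25 10:42", "2019-04-26 17:39", "2019-04-26 18:02"),
    tz = "UTC"
  ),
  last_seen = as.POSIXct(
    c("2019-04-27 10:29", "2019-04-27 12:42",
      "2019-04-26 05:17", "2019-04-26 15:12",
      "2019-04-25 10:47", "2019-04-26 17:54", "2019-04-26 18:05"),
    tz = "UTC"
  ),
  minutes = c(13, 8, 42, 515, 5, 15, 3),
  stringsAsFactors = FALSE
)

# Collapse fishing vessels to one row per vessel per entry day.
fishing <- visits %>%
  filter(Type == "Fishing") %>%
  group_by(Name, Type, day = as.Date(first_seen, tz = "UTC")) %>%
  summarise(
    first_seen = min(first_seen),  # earliest entry that day
    last_seen  = max(last_seen),   # latest exit among those visits
    minutes    = sum(minutes),     # total time in zone
    .groups = "drop"
  ) %>%
  select(-day)

# Cargo (and any other type) is passed through untouched.
result <- bind_rows(fishing, filter(visits, Type != "Fishing")) %>%
  arrange(Name, first_seen)
```

Grouping on the entry day only means a visit that starts before midnight and ends after it is counted on the day it started, which matches the simplification of looking at the first-seen date alone.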