I'm working with a dataset that has timestamps (POSIXct) for every half hour, though some time slots are duplicated, and not the same number of times each. Three columns, A, B1, and B2, record the location of a ship in a harbor. Two more columns hold each ship's arrival and departure timestamps. I want to carry the A, B1, and B2 values forward until the departure timestamp, so the table shows how long each ship is present, not just when it arrives.
Here's what the table looks like (the real one has many more rows, about 2 million):
. [Timestamp] [A] [B1] [B2] [Arrival] [Departure]
[1,] "2018-06-01 07:00:00" "NA" "B1" "NA" "2018-06-01 07:00:00" "2018-06-01 22:00:00"
[2,] "2018-06-01 07:30:00" "NA" "NA" "NA" "NA" "NA"
[3,] "2018-06-01 08:00:00" "A" "NA" "NA" "2018-06-01 08:00:00" "2018-06-01 17:00:00"
[4,] "2018-06-01 08:30:00" "NA" "NA" "NA" "NA" "NA"
[5,] "2018-06-01 09:00:00" "NA" "NA" "NA" "NA" "NA"
[6,] "2018-06-01 09:30:00" "NA" "NA" "NA" "NA" "NA"
[7,] "2018-06-01 10:00:00" "NA" "NA" "NA" "NA" "NA"
[8,] "2018-06-01 10:30:00" "NA" "NA" "NA" "NA" "NA"
This is the code I have so far:
lastdate <- 1
for (i in 1:length(loopdata$Timestamp)) {
  if (i %% 1000 == 0) print(i)
  if (!is.na(loopdata$Arrival[i])) {
    lastdate <- i
  }
  if (loopdata$Timestamp[i] >= loopdata$Arrival[lastdate] &
      loopdata$Timestamp[i] <= loopdata$Departure[lastdate]) {
    loopdata[i, 2:4] <- loopdata[lastdate, 2:4]
  }
}
The code above RUNS without error messages, but it doesn't WORK. I usually stop it after 5,000 rows to check the result (hence the print(i)). It carries the A's forward but stops 1 hour short (which I think is a daylight saving time issue, and I just realized I can fix that with POSIXct), and it won't carry the B1's or B2's forward at all. Is this because it resets when the first B1 is followed so quickly by an A? Help! Thank you!
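One likely cause: the loop remembers only the most recent arrival row (lastdate) and copies all three location columns from it. In the sample, once the A ship arrives at 08:00, every later row copies that row's NA for B1, even though the B1 ship stays until 22:00. Below is a minimal sketch of one way around this, filling each column independently from its own arrival/departure intervals. The helper name fill_until_departure, the tz = "UTC" setting, and the use of real NA values instead of "NA" strings are assumptions for illustration, not part of the original data.

```r
# Rebuild the sample rows from the question as a data frame.
# tz = "UTC" is an assumption; a fixed zone avoids daylight-saving shifts.
ts <- function(x) as.POSIXct(x, tz = "UTC")

loopdata <- data.frame(
  Timestamp = ts("2018-06-01 07:00:00") + 1800 * (0:7),  # half-hour steps
  A  = c(NA, NA, "A", NA, NA, NA, NA, NA),
  B1 = c("B1", NA, NA, NA, NA, NA, NA, NA),
  B2 = NA_character_,
  Arrival   = ts(c("2018-06-01 07:00:00", NA, "2018-06-01 08:00:00", rep(NA, 5))),
  Departure = ts(c("2018-06-01 22:00:00", NA, "2018-06-01 17:00:00", rep(NA, 5))),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill one location column on its own, so an arrival
# at another location cannot overwrite it.
fill_until_departure <- function(df, col) {
  # Rows where a ship arrives at this location with a known stay.
  visits <- which(!is.na(df[[col]]) & !is.na(df$Arrival) & !is.na(df$Departure))
  filled <- df[[col]]
  for (v in visits) {
    # Every timestamp inside this ship's stay, duplicates included.
    present <- df$Timestamp >= df$Arrival[v] & df$Timestamp <= df$Departure[v]
    filled[present] <- df[[col]][v]
  }
  filled
}

for (col in c("A", "B1", "B2")) {
  loopdata[[col]] <- fill_until_departure(loopdata, col)
}

print(loopdata[, c("Timestamp", "A", "B1", "B2")])
```

This loops over arrivals rather than over rows, so its cost grows with rows times arrivals. With 2 million rows and many arrivals, an interval-based join (for example a non-equi join in data.table) would probably scale better, but that is untested here.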