I have a programming question in R that I've been looking for a solution to but can't seem to find online. I am working with a data set that tracks the location of hospital providers over time. So there is a unique ID identifying the staff and various time stamps with associated locations. I have a distance matrix that describes how far certain locations (A through E) are to each other and am interested in aggregating this across the entire data set (calculating total distance walked).
My distance matrix is a 5x5 matrix describing movement from Section X to Section Y, where X and Y belong to the subset of Sections A through E. The following describes the distance matrix:
dmatrix=matrix(c(1, 2, 2, 3, 4, 2, 1, 2, 3, 4, 2, 2, 1, 2, 3, 3, 3, 2, 1, 2, 4, 4, 3, 2, 1), nrow=5, ncol=5)
colnames(dmatrix)<-c("sectionA", "sectionB", "sectionC", "sectionD", "sectionE")
rownames(dmatrix)<-c("sectionA", "sectionB", "sectionC", "sectionD", "sectionE")
My data frame is a 5 million line data set that contains the locations where all staff members are by the second. The objective is to calculate the total distance walked on any given day (by reading from the distance matrix above) per staff ID.
I am able to successfully do this by sub-setting the data by date AND ID by using the following code:
jun10<-subset(dataframe,format(dataframe$st,'%m/%d')=='06/10')
jun1013<-jun10[jun10$id==13,]
jun1013[with(jun1013, order(st)),]
for(i in 1:nrow(jun1013))
{
jun1013$distance[i]=dmatrix[cbind(as.character(jun1013$section[i-1]),as.character(jun1013$section[i]))]
}
sum(jun1013$distance, na.rm=TRUE)
Note: "dataframe" is the dataframe name, dataframe$st is the POSIXlt time stamp data, in this case I am extracting all entries from June 10th, and then all entries on June 10th for ID number 13. This code gives me the total distance walked on June 10th by Staff #13.
How do I do this for all of the data? I've tried to create if loops to do this by date and by ID but I keep getting errors or NAs in the fields.
The perfect solution would go through the data, and calculate the total distance walked similar to what I've done above, and then output to a matrix where each column describes a unique day (like June 10, 11, 12, etc.) and the rows are the unique IDs working on a given day. The entries in the matrix would be the sums describing total distance walked. Note that the number of staff working on each day differ.
Not sure if this is even possible, I've been stuck on this for over a week now - any help, insight, or advice would be tremendously helpful - thank you!!
Aucun commentaire:
Enregistrer un commentaire