As a new R user, I have been struggling with this problem for a while and could not figure it out on my own. Perhaps the answer is simple and someone can help me. I have thousands of XML files in a folder, and I want to extract the content of a specific node from each of them and save it in a data frame. However, the name of my node of interest is repeated within each file, so I index the parsed list by position instead of by name to extract the data I want.
library(XML)   # provides xmlParse() and xmlToList()
library(xml2)  # provides xml_find_all()

for (i in (1:length(file_list))) {
  test.file <- file_list[i]
  datax <- xmlParse(test.file)  # enter the XML file name you want to analyze
  data <- xmlToList(datax)      # convert the XML document to a list
  serial <- as.vector(unlist(data$.attrs[2]))
  print(serial)
  # Check if the XML file contains the node AuditRules
  n <- ifelse(xml_find_all(test.file, "//AuditRules") == TRUE, 6, 5)
  # Extract waveform values for the Current ECG strip
  waveform <- as.vector(data[[n]][[3]][[1]][[1]][[2]])
  waveform <- as.character(waveform)
  waveform <- strsplit(waveform, split = " ")
  waveform <- as.numeric(unlist(waveform))
  waveform <- as.data.frame(waveform)
  # Extract the serial number to be used as ID for the animal and create a column in the data frame
  serial <- as.vector(unlist(data$.attrs[2]))
  serial <- as.factor(serial)
  waveform$serial <- serial
  # Extract date and time of the Current ECG and save it as a column date
  date <- as.vector(unlist(data$.attrs[n]))
  date <- gsub("T", " ", date)
  waveform$date <- as.POSIXct(date, format = "%Y-%m-%d %H:%M:%S", tz = 'Etc/GMT+5')
  # Extract time offset [the first R-R interval from the Current ECG]
  offset <- as.vector(unlist(data[[n]][[3]][[1]][[1]][[1]][[1]]))
  offset <- gsub("[a-zA-Z]+", "", offset)
  waveform$offset <- offset
  # Create a column for voltage in mV using the amplitudeScaleFactor="0.000815"
  #waveform$mv <- waveform$waveform*0.000815
  # Create a column for time (sec) using the sampleInterval="PT0.0078125S"
  #waveform$time <- as.numeric(waveform$offset)
  # Add a new column to the old data.frame. Set value "offset" as the starting value for row 1.
  # Populate newcol with values starting from row 2.
  #for (i in 4:nrow(waveform)){
  # waveform[i,6] <- waveform[i-1,6] +0.0078125
  # Write data to CSV
  write.csv(waveform, paste0(data_export_dir, "/Savannah_", file_names[i], "_ECG.csv"))
}
My problem: some files have one extra node before the node of interest, which normally sits at position [5]. For those files I need to use position [6] instead. My question: how can I change the code above so that it checks for the presence or absence of the extra node and uses [5] or [6] accordingly? I tried adding something like this to my loop, but it did not work:
for (i in (1:length(file_list))) {
  test.file <- file_list[i]
  datax <- xmlParse(test.file)
  data <- xmlToList(datax)  # convert the XML document to a list
  serial <- as.vector(unlist(data$.attrs[2]))
  print(serial)
  # Check if the XML file contains the node AuditRules
  n <- ifelse(xml_find_all(test.file, "//AuditRules") == TRUE, 6, 5)
  # Extract waveform values for the Current ECG strip
  waveform <- as.vector(data[[n]][[3]][[1]][[1]][[2]])
  waveform <- as.character(waveform)
  waveform <- strsplit(waveform, split = " ")
  waveform <- as.numeric(unlist(waveform))
  waveform <- as.data.frame(waveform)
I would appreciate any help! Thanks in advance.
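One possible direction, offered as a minimal sketch rather than a tested fix for these particular files: the check in the loop passes the file path (a character string) to xml_find_all() from xml2, while the rest of the loop works on a document parsed by the XML package, and the result of an XPath query is a set of nodes rather than TRUE or FALSE. Staying within the XML package, the presence of the node could be tested by counting the matches returned by getNodeSet(). The helper name node_index and the two tiny documents below are made up for illustration, and local-name() is used on the assumption that the real files may declare a default namespace.

```r
library(XML)

# Hypothetical helper: returns 6 when the parsed document contains an
# <AuditRules> element anywhere in the tree, otherwise 5.
# local-name() matches the element regardless of any namespace.
node_index <- function(doc) {
  hits <- getNodeSet(doc, "//*[local-name()='AuditRules']")
  if (length(hits) > 0) 6 else 5
}

# Two made-up documents standing in for the real files
with_audit <- xmlParse(
  "<Report><A/><B/><C/><D/><AuditRules/><Strip/></Report>",
  asText = TRUE
)
without_audit <- xmlParse(
  "<Report><A/><B/><C/><D/><Strip/></Report>",
  asText = TRUE
)

node_index(with_audit)     # 6
node_index(without_audit)  # 5
```

Inside the loop, the line that sets n would then become n <- node_index(datax), using the already parsed document instead of the file path.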