Tuesday, 23 February 2016

VWAP across levels of an orderbook

I'm trying to write R code that computes the Volume Weighted Average Price (VWAP) across different levels (depths) of an order book. I want to go up to level 5, but without hard-coding the level (depth) of the book. My data set has about 500,000 rows and 62 variables. I have written code that does exactly what I want, but with if statements. The code is as follows:

BVWAP = function(file, level = 5){
  whole_data<- read.csv(file = file,header = FALSE,sep = "",col.names = c("DateTime","Seq","BP1","BQ1","BO1","AP1","AQ1","AO1","BP2","BQ2","BO2","AP2","AQ2","AO2","BP3","BQ3","BO3","AP3","AQ3","AO3","BP4","BQ4","BO4","AP4","AQ4","AO4","BP5","BQ5","BO5","AP5","AQ5","AO5","BP6","BQ6","BO6","AP6","AQ6","AO6","BP7","BQ7","BO7","AP7","AQ7","AO7","BP8","BQ8","BO8","AP8","AQ8","AO8","BP9","BQ9","BO9","AP9","AQ9","AO9","BP10","BQ10","BO10","AP10","AQ10","AO10"))
  whole_data<- whole_data[which(whole_data$DateTime != 0),]
  whole_data$DateTime= as.POSIXct(whole_data$DateTime/(10^9), origin="1970-01-01")    #timestamp conversion 
  completecase<- whole_data[complete.cases(whole_data),]
  attach(completecase)
  if(level == 5){
    B = data.frame(DateTime=completecase$DateTime, WAP = ((BP1*BQ1)+(BP2*BQ2)+(BP3*BQ3)+(BP4*BQ4)+(BP5*BQ5))/(BQ1+BQ2+BQ3+BQ4+BQ5))
  }
  if(level == 4){
    B = data.frame(DateTime=completecase$DateTime, WAP = ((BP1*BQ1)+(BP2*BQ2)+(BP3*BQ3)+(BP4*BQ4))/(BQ1+BQ2+BQ3+BQ4))
  }
  if(level == 3){
    B = data.frame(DateTime=completecase$DateTime, WAP = ((BP1*BQ1)+(BP2*BQ2)+(BP3*BQ3))/(BQ1+BQ2+BQ3))
  }
  if(level == 2){
    B = data.frame(DateTime=completecase$DateTime, WAP = ((BP1*BQ1)+(BP2*BQ2))/(BQ1+BQ2))
  }
  B
}

Now, I believe the multiple if statements slow it down significantly, and that is exactly what I need help with. How do I write this code using a for loop or something along those lines? How do I loop across the columns? What would be a more efficient or faster way to get where I want? Any kind of help will be greatly appreciated.
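One possible way to avoid a branch per level is to build the column names from the level and let vectorised row sums do the work. This is a minimal sketch, not a definitive implementation: the function name `bid_vwap` and the small `book` data frame are made up for illustration, and it assumes the price and quantity columns follow the `BP1..BPn` / `BQ1..BQn` naming used in the code above.

```r
# Sketch: VWAP over the first `level` bid levels, with no branch per level.
# Assumes columns named BP1..BPn (bid price) and BQ1..BQn (bid quantity).
bid_vwap <- function(book, level = 5) {
  price_cols <- paste0("BP", seq_len(level))
  qty_cols   <- paste0("BQ", seq_len(level))

  # Matrices allow element-wise multiplication and fast row sums
  prices <- as.matrix(book[, price_cols, drop = FALSE])
  qtys   <- as.matrix(book[, qty_cols, drop = FALSE])

  rowSums(prices * qtys) / rowSums(qtys)
}

# Hypothetical two-row book with three levels (made-up numbers)
book <- data.frame(
  BP1 = c(100, 101), BQ1 = c(10, 5),
  BP2 = c(99, 100),  BQ2 = c(30, 5),
  BP3 = c(98, 99),   BQ3 = c(60, 10)
)

# Row 1 at level 2: (100*10 + 99*30) / (10 + 30) = 99.25
vwap2 <- bid_vwap(book, level = 2)
vwap3 <- bid_vwap(book, level = 3)
```

Because the level only changes which columns are selected, the same function covers any depth the file contains, and the result can be bound to the `DateTime` column with `data.frame()` as in the original function.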

Also, since I am working with fairly large data sets, what would be the best way to read the file while using as little RAM as possible? Running this code a few times slows my system down quite significantly. Any suggestions on which function I should use to reduce RAM usage?
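One way to reduce memory while reading, sketched here with base R only, is to skip the columns that the calculation never touches. In `read.table`, a `colClasses` entry of `"NULL"` drops that column at read time. The helper name `read_bid_levels` is hypothetical, and the sketch assumes the 62-column layout from the code above (DateTime, Seq, then BP, BQ, BO, AP, AQ, AO for levels 1 to 10) in a whitespace-separated file.

```r
# Sketch: read only DateTime plus the bid price/quantity columns up to `level`.
# Assumes the 62-column, whitespace-separated layout described in the question.
read_bid_levels <- function(file, level = 5) {
  fields   <- c("BP", "BQ", "BO", "AP", "AQ", "AO")
  all_cols <- c("DateTime", "Seq",
                paste0(rep(fields, times = 10), rep(1:10, each = 6)))

  keep <- c("DateTime",
            paste0("BP", seq_len(level)),
            paste0("BQ", seq_len(level)))

  # "NULL" tells read.table to skip a column entirely;
  # declaring the rest as numeric avoids type guessing
  classes <- ifelse(all_cols %in% keep, "numeric", "NULL")

  read.table(file, header = FALSE, sep = "",
             col.names = all_cols, colClasses = classes)
}
```

For level 5 this keeps 11 of the 62 columns. If installing a package is acceptable, `data.table::fread()` with its `select` argument is another option that is often faster on files of this size.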

Let me know if any other information is needed.

The formula for VWAP is as follows:

(Bid1*Volume1 + Bid2*Volume2 + ... + Bidn*Volumen)/(Volume1 + Volume2 + ... + Volumen)
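As a quick numeric check of the formula, here is a two-level example with made-up numbers. It also shows that the formula is an ordinary weighted mean, so base R's `weighted.mean()` gives the same value for a single snapshot of the book.

```r
# Hypothetical two-level snapshot (illustrative numbers only)
bids    <- c(100, 99)
volumes <- c(10, 30)

# Direct application of the formula: (100*10 + 99*30) / (10 + 30) = 99.25
vwap_formula <- sum(bids * volumes) / sum(volumes)

# Same quantity via the built-in weighted mean
vwap_builtin <- weighted.mean(bids, volumes)
```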
