Friday, 7 August 2015

Categorising numerical data based on ranges

I'm new to R and Stack Exchange, so go easy. I have had a fairly good search around, and although I have found things that are close, I can't quite mold them into something that works.

My data:

      Line Coord  Alt      Gz
    1   10     1 28.9 1348.71
    2   10     4 22.9 1348.58
    3   10     7 22.9 1348.45
    4   10    10 22.4 1348.33
    5   10    13 22.4 1348.20
    6   10    16 26.8 1348.08

Basically, I want to split by Line, and then group the Coord values into bins of width n, starting at the minimum Coord for that line. Once they are categorised, I can split by category and do some statistics on a cell-by-cell basis.

I am trying to do it in dplyr, but I am struggling with the categorising step. This is my workflow:

  1. group by line
  2. calculate y min, y max and define them as values
  3. input n spacing
  4. define the interval boundary "int" as ymin + i*n, i.e. when i = 1 it's min + 25, when i = 2 it's min + 50
  5. categorise each point by the first boundary it falls below: int(1), otherwise int(2), and so on. The category of a point is that "i" value
  6. Descriptive statistics for each category!
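
Steps 2 to 5 might reduce to a single arithmetic expression, since the bin index is just how many whole widths a point sits above the minimum, plus one. The following is a minimal base R sketch under assumptions, not a tested solution for the real file: the data frame is typed in from the sample rows above, and `n <- 6` is an arbitrary width chosen only so that the six sample rows spread over several bins (the workflow above uses 25).

```r
# Sample rows copied from the question (the real data would come from the CSV)
df <- data.frame(
  Line  = c(10, 10, 10, 10, 10, 10),
  Coord = c(1, 4, 7, 10, 13, 16),
  Alt   = c(28.9, 22.9, 22.9, 22.4, 22.4, 26.8),
  Gz    = c(1348.71, 1348.58, 1348.45, 1348.33, 1348.20, 1348.08)
)

# Bin width: an assumed value for illustration only
n <- 6

# Minimum Coord within each Line, repeated on every row of that line
coord_min <- ave(df$Coord, df$Line, FUN = min)

# Bin i covers the half-open range [coord_min + (i - 1) * n, coord_min + i * n)
df$Category <- floor((df$Coord - coord_min) / n) + 1

print(df)
```

With this formulation there is no need to build the vector of boundaries `int[i]` explicitly; a point exactly on a boundary falls into the higher bin.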

```{r}
#df <- read.csv("Height_data.csv")
#df2 <- df[complete.cases(df),]

#df2 %>%
  #group_by(Line) %>%


#I want to define these here
#coordmin <- min(coord) 
#coordmax <- max(coord) 
#n = 25 (User inputted)
#int[i] <- coordmin + i*n 

#mutate(df2, category = (ifelse(y < coordmin + int[i],) # if not first, the second until int[i] > (coordmax - coordmin)

  #group_by(Category) %>%

   #summarise_each(funs(median(.)))

```

Sorry for the vagueness, I am bashing my head against this a bit!
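
For reference, one way the whole workflow might look in dplyr is sketched below. It is a hedged illustration rather than a definitive answer: the data frame is typed in from the sample rows instead of read from `Height_data.csv`, the width `n <- 6` is an assumed value for the small sample, and the output column names `Alt_median` and `Gz_median` are made up for this example. The `summarise_each(funs(...))` call from the attempt above is swapped for explicit `summarise()` expressions, because `summarise_each()` and `funs()` are deprecated in later dplyr releases.

```r
library(dplyr)

# Sample rows copied from the question (stand-in for the CSV file)
df <- data.frame(
  Line  = c(10, 10, 10, 10, 10, 10),
  Coord = c(1, 4, 7, 10, 13, 16),
  Alt   = c(28.9, 22.9, 22.9, 22.4, 22.4, 26.8),
  Gz    = c(1348.71, 1348.58, 1348.45, 1348.33, 1348.20, 1348.08)
)

# Bin width: an assumed value for illustration only
n <- 6

binned <- df %>%
  group_by(Line) %>%
  # min(Coord) is evaluated per Line because of the grouping above
  mutate(Category = floor((Coord - min(Coord)) / n) + 1) %>%
  group_by(Line, Category) %>%
  # One row of statistics per Line and bin
  summarise(Alt_median = median(Alt), Gz_median = median(Gz)) %>%
  ungroup()

print(binned)
```

Grouping by `Line` before the `mutate()` is what makes each line start its bins at its own minimum; dropping that step would bin every line from the global minimum instead.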
