I have got the following code and I need to adjust it as the output is not 100% correct:
df = read.table(text='colA colB colC
10 11 7
10 34 7
10 89 7
10 21 7
9 8 0
9 11 0
9 21 0
2 23 5
2 21 5
2 56 5
1 45 0
1 23 0
22 14 3
22 19 3
22 90 3
11 19 2
11 45 2
80 13 2
80 17 2
99 15 1
55 20 1', header = TRUE)
df$ID <-NA
df[1,'ID']<-1
val=1
for (i in 2:nrow(df)){
if (df[i,'colC']!=0 & (df[i,'colC']!=df[i-1,'colC'])) {val=val+1}
df[i,'ID']<-val
}
Output as per above code:
> df
colA colB colC ID
1 10 11 7 1
2 10 34 7 1
3 10 89 7 1
4 10 21 7 1
5 9 8 0 1
6 9 11 0 1
7 9 21 0 1
8 2 23 5 2
9 2 21 5 2
10 2 56 5 2
11 1 45 0 2
12 1 23 0 2
13 22 14 3 3
14 22 19 3 3
15 22 90 3 3
16 11 19 2 4
17 11 45 2 4
18 80 13 2 4
19 80 17 2 4
20 99 15 1 5
21 55 20 1 5
What is wrong are the last 4 ID values, which should group the rows based on colC but because there are no 0s between each distinct group, colC values with same number but diverse colA are grouped together and this is not what I want.
Desired output (see last 4 ID values):
> df
colA colB colC ID
1 10 11 7 1
2 10 34 7 1
3 10 89 7 1
4 10 21 7 1
5 9 8 0 1
6 9 11 0 1
7 9 21 0 1
8 2 23 5 2
9 2 21 5 2
10 2 56 5 2
11 1 45 0 2
12 1 23 0 2
13 22 14 3 3
14 22 19 3 3
15 22 90 3 3
16 11 19 2 4
17 11 45 2 4
18 80 13 2 5
19 80 17 2 5
20 99 15 1 6
21 55 20 1 7
How can I solve this? Thanks
Aucun commentaire:
Enregistrer un commentaire