I have been trying to subset data in a more concise way, to avoid excessive verbosity and redundancy. I keep typing long, complicated conditions for subsetting, which is not feasible when I want to create multiple subsets of a data frame.
Data:
> print(assm)
Speaker V1 POA V2
1 JF01 u tt U
2 JF01 o r a
3 JF01 o t a
4 JF01 a r u
5 JF01 e l i
6 JF01 a j o
7 JF01 e s o
8 JF01 u l i
9 JF01 a j i
10 JF01 i y a
11 JF01 o g i
12 JF01 u m O
13 JF01 u l E
14 JF01 a t o
15 JF01 o r u
16 JF01 a l e
17 JF01 u tt o
18 JF01 o r a
19 JF01 o t a
20 JF01 a r u
21 JF01 e l i
22 JF01 i y O
23 JF01 o r i
24 JF01 i l E
25 JF01 u k o
26 JF01 o n e
27 JF01 a t o
28 JF01 o r u
29 JF01 o r a
30 JF01 u m u
31 JF01 u l a
32 JF01 a t u
33 JF01 u tt o
34 JF01 o r a
35 JF01 o t a
36 JF01 a h e
37 JF01 u r e
38 JF01 o l i
39 JF01 i b o
40 JF01 o l e
41 JF01 e j u
42 JF01 a l e
43 JF01 u tt i
44 JF01 o t a
45 JF01 a r u
46 JF01 e l i
47 JF01 i y U
48 JF01 o r i
49 JF01 i l e
50 JF01 u k o
I used subset() and data[] to create three subsets. First, I filtered the data with the following conditions:
assm <- subset(assm, V1=="a" | V1=="e" | V1=="E" | V1=="i" | V1=="o" | V1=="O" | V1=="u" | V1=="U", select = Speaker:V2)
assm <- subset(assm, V2=="a" | V2=="e" | V2=="E" | V2=="i" | V2=="o" | V2=="O" | V2=="u" | V2=="U", select = Speaker:V2)
Is there a more efficient way (using a regex for subsetting, for example) to avoid all the hard-coding (e.g., by storing the values in a variable)? The subsetting conditions for V1 and V2 are identical, but I ended up typing them out twice.
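A minimal sketch of three ways to write the vowel filter only once, assuming `V1` and `V2` are character columns. The sample rows are copied from the printed data (rows 1, 2, 12 and 22), plus one hypothetical row with `V2 = "n"` that exists only so the filter has something to drop. The names `vowels`, `vowel_re`, `cols` and `keep` are my own, for illustration.

```r
# Rows 1, 2, 12 and 22 of the printed `assm`, plus one HYPOTHETICAL row
# (V2 = "n") added only so that the filter has something to remove.
assm <- data.frame(
  Speaker = "JF01",
  V1  = c("u",  "o", "u", "i", "a"),
  POA = c("tt", "r", "m", "y", "t"),
  V2  = c("U",  "a", "O", "O", "n"),
  stringsAsFactors = FALSE
)

# Store the allowed values once, in a variable (illustrative name).
vowels <- c("a", "e", "E", "i", "o", "O", "u", "U")

# Option 1: %in% tests each value against the whole set in one step.
assm_in <- subset(assm, V1 %in% vowels & V2 %in% vowels)

# Option 2: a regular expression; ^ and $ anchor the match so that only a
# single character from the class [aeEioOuU] is accepted.
vowel_re <- "^[aeEioOuU]$"
assm_re <- subset(assm, grepl(vowel_re, V1) & grepl(vowel_re, V2))

# Option 3: run the same test on any number of columns, then combine the
# per-column results with & so that every listed column must pass.
cols <- c("V1", "V2")
keep <- Reduce(`&`, lapply(assm[cols], function(col) col %in% vowels))
assm_cols <- assm[keep, ]

print(assm_in)
```

In the printed data, `select = Speaker:V2` already covers all four columns, so it can be dropped from the `subset()` calls without changing the result.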
Using very rudimentary R, I then created two of the subsets:
assm_h <- subset(assm, (V1=="o" & V2=="i")|
(V1=="e" & V2=="i")|
(V1=="u" & V2=="i")|
(V1=="e" & V2=="u")|
(V1=="o" & V2=="u")|
(V1=="u" & V2=="u"))
assm_nh <- subset(assm, (V1=="i" & V2=="O")|
(V1=="i" & V2=="E")|
(V1=="i" & V2=="U")|
(V1=="u" & V2=="O")|
(V1=="u" & V2=="E")|
(V1=="u" & V2=="U"))
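The pair conditions above can be written as data rather than as chained `|` clauses. This is a sketch using a handful of rows copied from the printed data; `h_pairs`, `nh_pairs` and `key` are illustrative names. The general version pastes `V1` and `V2` into one key per row and tests it with `%in%`. Because each of these two lists happens to contain every combination of a V1 set with a V2 set ({e, o, u} × {i, u} and {i, u} × {O, E, U}), two `%in%` tests per subset give the same rows.

```r
# Rows 1, 2, 5, 12, 22, 24, 15 and 4 of the printed `assm`.
assm <- data.frame(
  Speaker = "JF01",
  V1  = c("u",  "o", "e", "u", "i", "i", "o", "a"),
  POA = c("tt", "r", "l", "m", "y", "l", "r", "r"),
  V2  = c("U",  "a", "i", "O", "O", "E", "u", "u"),
  stringsAsFactors = FALSE
)

# General approach: list each allowed V1/V2 combination once as "V1-V2",
# build the same key for every row, and test membership with %in%.
h_pairs  <- c("o-i", "e-i", "u-i", "e-u", "o-u", "u-u")
nh_pairs <- c("i-O", "i-E", "i-U", "u-O", "u-E", "u-U")
key <- paste(assm$V1, assm$V2, sep = "-")

# `key` is not a column, so subset() finds it in the calling environment.
assm_h  <- subset(assm, key %in% h_pairs)
assm_nh <- subset(assm, key %in% nh_pairs)

# Shortcut valid only because each list is a full cross product of a
# V1 set and a V2 set.
assm_h2  <- subset(assm, V1 %in% c("e", "o", "u") & V2 %in% c("i", "u"))
assm_nh2 <- subset(assm, V1 %in% c("i", "u") & V2 %in% c("O", "E", "U"))
```

The key version works for any arbitrary list of pairs; the shortcut is shorter but breaks as soon as a combination is added or removed from one of the lists.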
I need one more subset (assm_neu) containing the rows that match neither the assm_h conditions nor the assm_nh conditions (i.e. the rest of the data). With my current method, that requires even more typing and multiple steps.
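Once the pair lists are stored in variables, the "rest of the data" is just the negation of a membership test, so nothing has to be retyped. A minimal sketch with rows copied from the printed data; `h_pairs`, `nh_pairs` and `key` are illustrative names, and the lists are the ones from the assm_h and assm_nh conditions above.

```r
# Rows 1, 2, 5, 12, 22, 24, 15 and 4 of the printed `assm`.
assm <- data.frame(
  Speaker = "JF01",
  V1  = c("u",  "o", "e", "u", "i", "i", "o", "a"),
  POA = c("tt", "r", "l", "m", "y", "l", "r", "r"),
  V2  = c("U",  "a", "i", "O", "O", "E", "u", "u"),
  stringsAsFactors = FALSE
)

h_pairs  <- c("o-i", "e-i", "u-i", "e-u", "o-u", "u-u")
nh_pairs <- c("i-O", "i-E", "i-U", "u-O", "u-E", "u-U")
key <- paste(assm$V1, assm$V2, sep = "-")

assm_h  <- assm[key %in% h_pairs, ]
assm_nh <- assm[key %in% nh_pairs, ]
# The rest of the data: rows whose key is in neither list.
assm_neu <- assm[!(key %in% c(h_pairs, nh_pairs)), ]
```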
**Is there a way to make this task more efficient, so that the subsetting returns not just the rows that match the conditions, but also a subset of the rows that do not? I have read multiple posts about functions and conditionals, but none that would help me extract and create more than one dataset with a single command.**
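One way to get all three data sets from a single call is to label each row with its group and then use `split()`, which returns a named list with one data frame per group. This is a sketch under the assumption that the two conditions never overlap (true here, because their V2 sets are disjoint); the names `is_h`, `is_nh`, `group` and `subsets` are illustrative, and the rows are copied from the printed data.

```r
# Rows 1, 2, 5, 12, 22, 24, 15 and 4 of the printed `assm`.
assm <- data.frame(
  Speaker = "JF01",
  V1  = c("u",  "o", "e", "u", "i", "i", "o", "a"),
  POA = c("tt", "r", "l", "m", "y", "l", "r", "r"),
  V2  = c("U",  "a", "i", "O", "O", "E", "u", "u"),
  stringsAsFactors = FALSE
)

is_h  <- with(assm, V1 %in% c("e", "o", "u") & V2 %in% c("i", "u"))
is_nh <- with(assm, V1 %in% c("i", "u") & V2 %in% c("O", "E", "U"))

# Give every row exactly one label; rows matching neither condition get "neu".
group <- factor(ifelse(is_h, "h", ifelse(is_nh, "nh", "neu")),
                levels = c("h", "nh", "neu"))

# split() returns one data frame per level of `group`; fixing the levels
# means a group with no rows still appears, as a 0-row data frame.
subsets <- split(assm, group)

# Optional: copy the list elements into assm_h, assm_nh and assm_neu.
invisible(list2env(setNames(subsets, paste0("assm_", names(subsets))),
                   envir = environment()))
```

Keeping the result as a list (`subsets$h`, `subsets$neu`, ...) is often more convenient than separate variables, for example when the same summary has to be run on every group with `lapply(subsets, ...)`.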
Thanks in advance.