Tuesday, 31 July 2018

R: subsetting data to create three new subsets using subset(), a function, or an if/then conditional

I have been trying to subset data more concisely, without excessive verbosity and redundancy. I end up typing out long, complicated conditions, which is not feasible when I want to create multiple subsets of a data frame.

Data:

> print(assm)
Speaker V1 POA V2
1     JF01  u  tt  U
2     JF01  o   r  a
3     JF01  o   t  a
4     JF01  a   r  u
5     JF01  e   l  i
6     JF01  a   j  o
7     JF01  e   s  o
8     JF01  u   l  i
9     JF01  a   j  i
10    JF01  i   y  a
11    JF01  o   g  i
12    JF01  u   m  O
13    JF01  u   l  E
14    JF01  a   t  o
15    JF01  o   r  u
16    JF01  a   l  e
17    JF01  u  tt  o
18    JF01  o   r  a
19    JF01  o   t  a
20    JF01  a   r  u
21    JF01  e   l  i
22    JF01  i   y  O
23    JF01  o   r  i
24    JF01  i   l  E
25    JF01  u   k  o
26    JF01  o   n  e
27    JF01  a   t  o
28    JF01  o   r  u
29    JF01  o   r  a
30    JF01  u   m  u
31    JF01  u   l  a
32    JF01  a   t  u
33    JF01  u  tt  o
34    JF01  o   r  a
35    JF01  o   t  a
36    JF01  a   h  e
37    JF01  u   r  e
38    JF01  o   l  i
39    JF01  i   b  o
40    JF01  o   l  e
41    JF01  e   j  u
42    JF01  a   l  e
43    JF01  u  tt  i
44    JF01  o   t  a
45    JF01  a   r  u
46    JF01  e   l  i
47    JF01  i   y  U
48    JF01  o   r  i
49    JF01  i   l  e
50    JF01  u   k  o 

I used subset() and data[] to create 3 subsets with the following conditions:

assm <- subset(assm, V1=="a"| V1=="e"| V1=="E"| V1=="i"| V1=="o"| V1=="O"| V1=="u"| V1=="U", select = Speaker:V2)
assm <- subset(assm, V2=="a"| V2=="e"| V2=="E"| V2=="i"| V2=="o"| V2=="O"| V2=="u"| V2=="U", select = Speaker:V2)

Is there a more efficient way (using regex for subsetting, for example) to avoid all the hard-coding (like using a variable)? The subsetting conditions for V1 and V2 are identical, but I ended up typing them out twice.
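To illustrate the kind of shortcut being asked about, here is a minimal sketch that stores the vowel symbols in a variable and tests membership with %in%. The names vowels, assm_demo and assm_vowels are hypothetical and not part of my actual script, and assm_demo is a small made-up stand-in for assm.

```r
# Hypothetical helper vector: the vowel symbols accepted in V1 and V2
vowels <- c("a", "e", "E", "i", "o", "O", "u", "U")

# Small made-up stand-in for assm (rows 3 and 4 contain non-vowel symbols)
assm_demo <- data.frame(
  Speaker = "JF01",
  V1  = c("u", "o", "x", "a"),
  POA = c("tt", "r", "t", "r"),
  V2  = c("U", "a", "a", "q"),
  stringsAsFactors = FALSE
)

# One call replaces both long chains of == comparisons:
# keep rows where V1 and V2 are both in the vowel set
assm_vowels <- subset(assm_demo,
                      V1 %in% vowels & V2 %in% vowels,
                      select = Speaker:V2)

# A regex version of the same test would be grepl("^[aeEioOuU]$", V1)
print(assm_vowels)
```

Because the set lives in one variable, changing the list of vowels only requires editing a single line.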

Using very rudimentary R, this is some more subsetting I did:

assm_h <- subset(assm, (V1=="o" & V2=="i")|
                     (V1=="e" & V2=="i")|
                     (V1=="u" & V2=="i")|
                     (V1=="e" & V2=="u")|
                     (V1=="o" & V2=="u")|
                     (V1=="u" & V2=="u"))

assm_nh <- subset(assm, (V1=="i" & V2=="O")|
                      (V1=="i" & V2=="E")|
                      (V1=="i" & V2=="U")|
                      (V1=="u" & V2=="O")|
                      (V1=="u" & V2=="E")|
                      (V1=="u" & V2=="U"))

I need one more subset (assm_neu) containing the rows that match neither the assm_h nor the assm_nh conditions (i.e. the rest of the data). With my current method, that would require even more painstaking typing and multiple steps.

**Is there a way to make this task more efficient, so that the subsetting returns not just the rows that match the conditions, but also the subset of the data that does not? I have read multiple posts about functions and conditionals, but none that would help me extract and create more than one dataset with a single command.**
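As a sketch of what a single-command version might look like, the code below labels each row by its V1/V2 pair and then uses split() to return all three subsets at once, including the leftover rows. The names h_pairs, nh_pairs, pair, group, parts and assm_demo are hypothetical, and assm_demo is made-up stand-in data. It assumes every vowel symbol is a single character, so that pasting V1 and V2 together gives an unambiguous key.

```r
# Hypothetical lookup vectors: V1/V2 pairs written as two-character keys,
# copied from the conditions used for assm_h and assm_nh
h_pairs  <- c("oi", "ei", "ui", "eu", "ou", "uu")
nh_pairs <- c("iO", "iE", "iU", "uO", "uE", "uU")

# Small made-up stand-in for assm
assm_demo <- data.frame(
  Speaker = "JF01",
  V1  = c("u", "o", "o", "a", "e", "u", "i"),
  POA = c("tt", "r", "t", "r", "l", "m", "y"),
  V2  = c("U", "a", "a", "u", "i", "O", "O"),
  stringsAsFactors = FALSE
)

# Build one key per row, e.g. "uU" (assumes single-character vowel symbols)
pair <- paste0(assm_demo$V1, assm_demo$V2)

# Label each row: "h", "nh", or "neu" for everything else
group <- ifelse(pair %in% h_pairs, "h",
         ifelse(pair %in% nh_pairs, "nh", "neu"))

# split() returns a named list holding all three subsets
parts <- split(assm_demo, factor(group, levels = c("h", "nh", "neu")))

assm_h   <- parts$h
assm_nh  <- parts$nh
assm_neu <- parts$neu   # the rows matching neither set of conditions
```

Fixing the factor levels means all three list elements exist even when one group has no rows.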

Thanks in advance.
