Friday, 11 January 2019

Using ntile from dplyr within an ifelse statement

I'm trying to use ntile (from dplyr) to segment some data into 'n' equal-sized buckets, separately for the negative and the non-negative values in the same data.table column.

I'll demonstrate what I mean via a simple example:

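library(data.table)
library(dplyr)
Buckets <- 3

# 30 distinct integers drawn from -30..30
Check <- data.table(a = sample(-30:30, 30))
# Bucket the negative rows, flipping the sign of the bucket number
Check[a < 0, Test := ntile(a[a < 0], Buckets) * -1]
# Bucket the non-negative rows
Check[a >= 0, Test := ntile(a[a >= 0], Buckets)]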

When I check whether the buckets are OK (i.e. not overlapping), the result checks out:

Check[,range(a),by = Test][order(Test)]
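
The same check can be made explicit instead of being read off by eye. The following is only a minimal sketch and not part of my original code: it uses a small fixed vector in place of the random sample (an assumption, so that the result is reproducible), and the names Ranges, lo, hi and no_overlap are made up for the example.

```r
library(data.table)
library(dplyr)

Buckets <- 3

# Toy data (assumption): fixed values instead of sample(-30:30, 30)
Check <- data.table(a = c(5, -12, 0, 27, -3, -30, 14, -8, 9, -21, 18, -1))
Check[a < 0, Test := ntile(a, Buckets) * -1]
Check[a >= 0, Test := ntile(a, Buckets)]

# One row per bucket with its smallest and largest value, sorted by the
# smallest value
Ranges <- Check[, .(lo = min(a), hi = max(a)), by = Test][order(lo)]

# Buckets do not overlap if each bucket ends before the next one starts
no_overlap <- all(head(Ranges$hi, -1) < tail(Ranges$lo, -1))
no_overlap
```

Sorting by lo rather than by Test matters here, because the negative bucket numbers run in the opposite direction to the values they contain.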

I want to combine the above into a single ifelse statement within the data.table "Check" because, in practice, I will be performing these calculations over multiple columns, and the rows that are above or below 0 will differ by column. I'd therefore prefer to operate on whole columns rather than subsetting rows separately, as in the code above, and repeating that for each column.

When I try the following, the 'ntile' calls don't seem to pick up the rows I expected:

Check[, Test := ifelse(a < 0,
                       ntile(a[a < 0], Buckets) * -1,
                       ntile(a[a >= 0], Buckets))]

Perform check again:

Check[,range(a),by = Test][order(Test)]

This time it doesn't check out: the buckets overlap.
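
One possible explanation, offered as a sketch rather than a definitive answer: ifelse() picks elements of its yes and no arguments by position along the full column, whereas ntile(a[a < 0], Buckets) has only one element per negative row, so it gets recycled and no longer lines up with the rows of 'a'. A possible workaround is a small helper that writes each result back into the positions it came from. The helper signed_ntile, the second column b and the *_bucket column names below are all hypothetical, invented for illustration, and the data is a fixed toy vector rather than the random sample above.

```r
library(data.table)
library(dplyr)

Buckets <- 3

# Toy data (assumption): fixed values instead of sample(), plus a hypothetical
# second column b whose negative rows differ from those of a
Check <- data.table(
  a = c(5, -12, 0, 27, -3, -30, 14, -8, 9, -21, 18, -1),
  b = c(-4, 11, -19, 2, 7, -26, -9, 0, 23, -14, 16, -2)
)

# The subset passed to ntile() is shorter than the column, so ifelse() has to
# recycle it and the bucket numbers stop lining up with the rows
length(ntile(Check$a[Check$a < 0], Buckets))  # one value per negative row
nrow(Check)                                   # rows that ifelse() has to fill

# Hypothetical helper: bucket negatives and non-negatives separately and
# assign each result into the positions it was computed from
signed_ntile <- function(x, n) {
  out <- rep(NA_integer_, length(x))
  neg <- !is.na(x) & x < 0
  pos <- !is.na(x) & x >= 0
  out[neg] <- -ntile(x[neg], n)
  out[pos] <- ntile(x[pos], n)
  out
}

# Apply the helper to several columns at once
cols <- c("a", "b")
new_cols <- paste0(cols, "_bucket")
Check[, (new_cols) := lapply(.SD, signed_ntile, n = Buckets), .SDcols = cols]

# Same range check as before, once per column
Check[, range(a), by = a_bucket][order(a_bucket)]
Check[, range(b), by = b_bucket][order(b_bucket)]
```

Because the helper takes a plain vector, the data.table call itself needs no row subsetting, which is what I was hoping to get from ifelse.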

Can anyone please let me know what I'm missing and whether it's possible to utilise 'ifelse' here? Any other approaches are also welcome as I'm always keen to learn new things.

Any help would be greatly appreciated. Thanks
