Friday, 24 February 2017

Using loops in R to identify unique cases varying on a factor variable

I am still struggling with using if statements and while loops on real-world datasets. Below is an example dataset. It includes customer IDs and where each customer purchases their coffee.

library(data.table)

customers <- data.table(
  customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  store = c("starbucks", "peets", "coffee bean", "drnk", "starbucks",
            "coffee bean", "peets", "coffee bean", "drnk", "starbucks")
)

What I would like to do is write a loop that lets me identify customers who are going elsewhere for their coffee. In this dataset, customer 1 is going to both starbucks and coffee bean.

What I did next was assign a store_id to each shop in case my loop needs to rely on numeric values. Starbucks is 1, Peets is 2, Coffee Bean is 3, and DRNK is 4.

customers <- data.table(
  customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  store = c("starbucks", "peets", "coffee bean", "drnk", "starbucks",
            "coffee bean", "peets", "coffee bean", "drnk", "starbucks"),
  store_id_value = c(1, 2, 3, 4, 1, 3, 2, 3, 4, 1)
)
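Typing the ids by hand gets error-prone as the data grows. As a rough sketch, the same column could probably be derived from the store names with base R's match(). The store_levels vector is a name introduced here for illustration, and its ordering is an assumption chosen to reproduce the ids listed above.

```r
library(data.table)

customers <- data.table(
  customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  store = c("starbucks", "peets", "coffee bean", "drnk", "starbucks",
            "coffee bean", "peets", "coffee bean", "drnk", "starbucks")
)

# Hypothetical lookup vector: the position of each name is its store id
store_levels <- c("starbucks", "peets", "coffee bean", "drnk")

# match() returns, for every row, the position of its store in store_levels
customers[, store_id_value := match(store, store_levels)]
```

A store name missing from store_levels would get NA, which makes typos in the store column easy to spot.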

In my loop, I was hoping to do something like this: for each customer, if the store_id_value of one purchase is equal to the store_id_value of their next purchase, keep going until the end of the data. For customers who purchase coffee at different locations, the comparison would return FALSE. Rather than stopping the loop at that point, I would like to create a column that records these TRUEs and FALSEs.
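A minimal sketch of the comparison described above, not a definitive answer. It assumes the rows appear in purchase order within each customer, and the names same_as_previous, same_store and single_store are made up for illustration. The first purchase of each customer has nothing to compare against, so it is left as NA.

```r
library(data.table)

customers <- data.table(
  customer_id = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  store = c("starbucks", "peets", "coffee bean", "drnk", "starbucks",
            "coffee bean", "peets", "coffee bean", "drnk", "starbucks"),
  store_id_value = c(1, 2, 3, 4, 1, 3, 2, 3, 4, 1)
)

# One result per row; NA marks a customer's first purchase
same_as_previous <- rep(NA, nrow(customers))

for (id in unique(customers$customer_id)) {
  # Row positions of this customer's purchases, in data order (assumed chronological)
  rows <- which(customers$customer_id == id)

  # seq_along(rows)[-1] is empty when a customer has only one purchase
  for (k in seq_along(rows)[-1]) {
    same_as_previous[rows[k]] <-
      customers$store_id_value[rows[k]] == customers$store_id_value[rows[k - 1]]
  }
}

# Store the TRUE/FALSE results as a column instead of stopping at the first FALSE
customers[, same_store := same_as_previous]

# Loop-free alternative: flag customers whose purchases all share one store id
customers[, single_store := uniqueN(store_id_value) == 1, by = customer_id]

# Customers who buy coffee at more than one store
switchers <- customers[single_store == FALSE, unique(customer_id)]
```

With the example data, only customer 1 ends up in switchers. The grouped uniqueN() version is likely the simpler route if only a per-customer flag is needed.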

Any suggestions on how to get this started? Any packages? Thanks for your help everyone!
