Wednesday, April 17, 2019

How to transform a column using a for loop

I am trying to map Amazon product names to specific categories and replace the original values in my data frame with those categories. How do I do this?

I already have regex code and a for loop that find and print the transformed values, but I am having trouble replacing the original values in the column. I also have a nested for loop that seems to work, but it only transforms and replaces one of the categories (Kindle) correctly. I suspect my break conditions aren't working as intended.

Code to define Categories:

fire = unique(grep('^[^Certified].*Fire TV', amz$name, value=TRUE))
kindle = unique(grep('^[^Certified]*Kindle', amz$name, value=TRUE))
echo = unique(grep('[^Certified].*Echo', amz$name, value=TRUE))
tap = unique(grep('[^Certified].*Tap', amz$name, value=TRUE))
tablet = unique(grep('^[^Certified].*Tablet', amz$name, value=TRUE))
refurb = unique(grep('^Certified', amz$name, value=TRUE))
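
One thing worth checking in the patterns above: `[^Certified]` is a negated character class, so it matches any single character other than C, e, r, t, i, f, or d. It does not exclude the word "Certified". The sketch below shows the difference on made-up product names (they are hypothetical, not taken from amz), and one possible alternative that matches the keyword and separately excludes names starting with "Certified".

```r
# Hypothetical product names, for illustration only (not from amz)
names_demo <- c("Amazon Kindle Paperwhite",
                "All-New Kindle E-reader",
                "Certified Refurbished Kindle")

# Pattern from the post: [^Certified]* matches a run of characters that are
# not C, e, r, t, i, f or d, so a name containing one of those letters
# before "Kindle" is rejected
old_hits <- grepl('^[^Certified]*Kindle', names_demo)

# One alternative (an assumption about the intent): match the keyword,
# then exclude names that start with "Certified"
new_hits <- grepl('Kindle', names_demo) & !grepl('^Certified', names_demo)

print(old_hits)
print(new_hits)
```

The second name is dropped by the original pattern only because "New" contains an "e".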

Code to transform and print Categories:

for (x in amz$name) {
    if (x %in% fire) {
        print('Fire TV')
    } else if (x %in% kindle) {
        print('Kindle')
    } else if (x %in% echo) {
        print('Echo')
    } else if (x %in% tap) {
        print('Tap')
    } else if (x %in% tablet) {
        print('Tablet')
    } else if (x %in% refurb) {
        print('Certified Refurbished')
    } else {
        print('Misc')
    }
}
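
The same if/else chain can also be written without an explicit loop over rows, returning the labels as a vector instead of printing them. The following is only a sketch: `categorize` is a hypothetical helper name, it uses plain keyword matching in place of the `[^Certified]` patterns, and it assumes that any name starting with "Certified" should be labelled 'Certified Refurbished' regardless of the product keyword.

```r
# Hypothetical helper: returns one category label per product name
categorize <- function(product_names) {
  # Work on character strings even if the column is stored as a factor
  product_names <- as.character(product_names)

  # Default label for names that match nothing
  labels <- rep('Misc', length(product_names))

  # Assign in reverse priority so that categories checked earlier in the
  # if/else chain overwrite the ones checked later
  for (keyword in c('Tablet', 'Tap', 'Echo', 'Kindle', 'Fire TV')) {
    labels[grepl(keyword, product_names, fixed = TRUE)] <- keyword
  }

  # Assumption: refurbished products take precedence over every keyword
  labels[grepl('^Certified', product_names)] <- 'Certified Refurbished'
  labels
}

# Made-up names for illustration
demo_labels <- categorize(c("Fire TV Stick", "Amazon Kindle Paperwhite",
                            "Echo Dot", "Amazon Tap", "Fire Tablet",
                            "Certified Refurbished Kindle", "USB Cable"))
print(demo_labels)
```

With the data frame from the question, this could be applied as `amz$name <- categorize(amz$name)`, or assigned to a new column to keep the original names.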

Code attempting to replace original values:

for (i in 1:nrow(amz)) {
    for (x in amz$name[i]) {
        if (x %in% fire) {
            amz$name[i] <- 'Fire TV'
            break
        } else if (x %in% kindle) {
            amz$name[i] <- 'Kindle'
            break
        } else if (x %in% echo) {
            amz$name[i] <- 'Echo'
            break
        } else if (x %in% tap) {
            amz$name[i] <- 'Tap'
            break
        } else if (x %in% tablet) {
            amz$name[i] <- 'Tablet'
            break
        } else if (x %in% refurb) {
            amz$name[i] <- 'Certified Refurbished'
            break
        } else {
            amz$name[i] <- 'Misc'
            break
        }
    }
}

In the inner loop, I expect the code to check whether x is in the first list and, if not, to move on to the next one until it finds the list x belongs to, then write that category to amz$name[i]. Once the category has been assigned, I want the inner loop to break and the outer loop to move on to the second iteration, i=2. So far only the first category comes out correctly; the other categories return NA. I should add that the product at amz$name[1] is a Kindle Paperwhite, so the loop seems to be categorizing only Kindle products.
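
One possible explanation for the NA values, which the post does not confirm, is that amz$name is stored as a factor. In R versions before 4.0.0, read.csv() and data.frame() convert strings to factors by default, and assigning a string that is not an existing level to a factor element produces NA with a warning. Under that assumption, 'Kindle' would only succeed if it happens to be one of the existing levels. The sketch below reproduces the behavior on a small made-up data frame (`amz_demo` and `amz_fixed` are hypothetical stand-ins for amz).

```r
# Hypothetical stand-in for amz, with name stored as a factor
amz_demo <- data.frame(
  name = c("Kindle", "Amazon Kindle Paperwhite", "Amazon Echo Dot"),
  stringsAsFactors = TRUE
)

# 'Kindle' is already a level of the factor, so this assignment works
amz_demo$name[2] <- "Kindle"

# 'Echo' is not a level: R issues a warning and stores NA instead
suppressWarnings(amz_demo$name[3] <- "Echo")
print(amz_demo$name)

# Converting the column to character first lets any string be assigned
amz_fixed <- data.frame(
  name = c("Kindle", "Amazon Kindle Paperwhite", "Amazon Echo Dot"),
  stringsAsFactors = TRUE
)
amz_fixed$name <- as.character(amz_fixed$name)
amz_fixed$name[3] <- "Echo"
print(amz_fixed$name)
```

If is.factor(amz$name) returns TRUE, running amz$name <- as.character(amz$name) before the loop may be enough for the replacements to take effect.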
