I am trying to map Amazon product names to specific categories and replace the original values in my data frame. How do I do this?
I already have regex code and a for loop that find and print the category for each name, but I am having trouble replacing the original values in the column. I also have a nested for loop that seems to work, but it only transforms and replaces one of the categories (Kindle) correctly. I suspect my break conditions aren't working the way I intend.
Code to define Categories:
fire = unique(grep('^[^Certified].*Fire TV', amz$name, value=TRUE))
kindle = unique(grep('^[^Certified]*Kindle', amz$name, value=TRUE))
echo = unique(grep('[^Certified].*Echo', amz$name, value=TRUE))
tap = unique(grep('[^Certified].*Tap', amz$name, value=TRUE))
tablet = unique(grep('^[^Certified].*Tablet', amz$name, value=TRUE))
refurb = unique(grep('^Certified', amz$name, value=TRUE))
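One thing worth checking in these patterns: `[^Certified]` is a negated character class, so it matches a single character that is not one of C, e, r, t, i, f, d. It does not mean "does not start with Certified". Because that class consumes one character, a name that begins with "Fire TV" cannot match the first pattern at all. Here is a minimal sketch on hypothetical product names (the `demo` vector is made up for illustration) that compares the original pattern with one possible alternative using `startsWith()`:

```r
# Hypothetical product names, for illustration only (not from the real data)
demo <- c("Fire TV Stick", "Amazon Fire TV", "Certified Refurbished Fire TV")

# Original pattern: the character class consumes one leading character,
# so a name that begins with "Fire TV" cannot match.
old_hits <- grepl('^[^Certified].*Fire TV', demo)

# One possible alternative: test the prefix and the keyword separately.
new_hits <- !startsWith(demo, "Certified") &
  grepl("Fire TV", demo, fixed = TRUE)

print(data.frame(name = demo, old_hits = old_hits, new_hits = new_hits))
```

The Kindle pattern uses `[^Certified]*` (zero or more characters), which is why a name that starts with "Kindle" still matches there.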
Code to transform and print Categories:
for (x in amz$name) {
  if (x %in% fire) {
    print('Fire TV')
  } else if (x %in% kindle) {
    print('Kindle')
  } else if (x %in% echo) {
    print('Echo')
  } else if (x %in% tap) {
    print('Tap')
  } else if (x %in% tablet) {
    print('Tablet')
  } else if (x %in% refurb) {
    print('Certified Refurbished')
  } else {
    print('Misc')
  }
}
Code attempting to replace original values:
for (i in 1:nrow(amz)) {
  for (x in amz$name[i]) {
    if (x %in% fire) {
      (amz$name[i] <- 'Fire TV')
      break
    } else if (x %in% kindle) {
      (amz$name[i] <- 'Kindle')
      break
    } else if (x %in% echo) {
      (amz$name[i] <- 'Echo')
      break
    } else if (x %in% tap) {
      (amz$name[i] <- 'Tap')
      break
    } else if (x %in% tablet) {
      (amz$name[i] <- 'Tablet')
      break
    } else if (x %in% refurb) {
      (amz$name[i] <- 'Certified Refurbished')
      break
    } else {
      (amz$name[i] <- 'Misc')
      break
    }
  }
}
In the inner loop, I expect the code to check whether x is in the first list and, if not, move on to the next one until it finds the list x belongs to, then write that category to amz$name[i]. Once the category is written, I want the inner loop to break and the outer loop to move on to the next iteration, i=2. So far it only gets the first category right; the rest of the categories return NA. I should add that the product at amz$name[1] is a Kindle Paperwhite, so it seems to be selectively categorizing Kindle products.
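The NA values would be consistent with amz$name being a factor (an assumption, since the output of str(amz) is not shown): assigning a string that is not already a level to a factor element yields NA with a warning. Below is a rough sketch of a loop-free alternative that works on a character copy of the column. The `amz_demo` data frame and its product names are hypothetical stand-ins for the real data, and the keyword tests are simplified compared with the regexes above.

```r
# Hypothetical stand-in for amz; the real data frame is not shown here
amz_demo <- data.frame(
  name = c("Kindle Paperwhite", "Amazon Fire TV", "Amazon Echo",
           "Certified Refurbished Amazon Tap", "USB Cable"),
  stringsAsFactors = TRUE  # assumption: mimics a factor column
)

# Work on a character copy so new labels are not coerced to NA
nm <- as.character(amz_demo$name)

# Start with the fallback, then assign from lowest to highest priority;
# later assignments overwrite earlier ones, which mirrors the
# first-match-wins order of the if/else chain.
category <- rep("Misc", length(nm))
category[grepl("Tablet", nm, fixed = TRUE)] <- "Tablet"
category[grepl("Tap", nm, fixed = TRUE)] <- "Tap"
category[grepl("Echo", nm, fixed = TRUE)] <- "Echo"
category[grepl("Kindle", nm, fixed = TRUE)] <- "Kindle"
category[grepl("Fire TV", nm, fixed = TRUE)] <- "Fire TV"

# Refurbished items go last so they override any keyword match
category[startsWith(nm, "Certified")] <- "Certified Refurbished"

# Replace the whole column in one step (it becomes a character vector)
amz_demo$name <- category
print(amz_demo)
```

Because the whole column is replaced at once, no break statements are needed, and there is no lookup against lists built from the original names.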