Whether you are listing products on your own site, uploading them to an online store like Amazon, or advertising them with Google Ads, it’s very important to assign products to categories accurately. The category determines which attributes the product data should have, how people will search and filter for the product on the website, and how well ads perform by targeting the correct audience.
The classification problem can be broken down into two types: mapping products to an existing taxonomy (such as Google’s or Amazon’s) or creating a new one. The first problem is generally easier. I created two approaches for it: one textually compares the product type discovered in the product title to the category node labels, and the other measures the semantic similarity (word-vector distance) of the product type to the average vector of the other products in the category.
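The semantic-similarity approach can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the three-dimensional vectors are made-up toy values standing in for real word embeddings, and the helper names (`cosine`, `centroid`, `best_category`) are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def best_category(product_vec, category_vectors):
    """Score each category by similarity to its average product vector."""
    scores = {
        name: cosine(product_vec, centroid(vecs))
        for name, vecs in category_vectors.items()
    }
    return max(scores, key=scores.get), scores

# Toy vectors (assumed); real ones would come from a word-vector model.
category_vectors = {
    "Footwear": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Kitchen": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
label, scores = best_category([0.85, 0.15, 0.05], category_vectors)
print(label)
```

With these toy values the product vector sits on the "Footwear" centroid, so that category scores highest. In practice the product-type vector would be computed from the words extracted from the title.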
For creating new taxonomies, I created two different approaches. One used word vectors and density-based clustering into trees, which became the categories and nodes of the new taxonomy. The other used the extracted structured product attributes and a custom subdivision by the characteristic that most evenly divides each group of products into similarly sized subgroups. For example, if 99 products are the same and 1 is different, or if all 100 are different, the attribute is not a very good way to divide the group; but if 50 products are similar to each other and the other 50 are also similar to each other, then it’s a very good subdivision attribute.
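One way to make the subdivision idea concrete is a score that is low both when one value dominates and when every product has its own value. The heuristic below (balance times compactness) is an assumption chosen for illustration, not the custom scoring actually used; the function names and sample attributes are hypothetical.

```python
from collections import Counter

def split_score(values):
    """Heuristic quality of splitting a group by one attribute.

    balance     -> 0 when a single value covers almost every product
    compactness -> 0 when every product has a distinct value
    """
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    balance = 1 - max(counts.values()) / n
    compactness = 1 - len(counts) / n
    return balance * compactness

def best_split_attribute(products):
    """Pick the attribute with the highest split score.

    A missing attribute is treated as its own value (None).
    """
    attributes = {key for product in products for key in product}
    scores = {
        attr: split_score([product.get(attr) for product in products])
        for attr in attributes
    }
    return max(scores, key=scores.get), scores

# 100 toy products mirroring the cases described above.
products = [
    {
        "brand": "Acme" if i < 99 else "Other",           # 99 vs 1
        "sku": f"SKU-{i}",                                # all different
        "material": "leather" if i < 50 else "canvas",    # 50 vs 50
    }
    for i in range(100)
]
attr, scores = best_split_attribute(products)
print(attr)
```

Here "material" wins, "brand" scores close to zero, and "sku" scores exactly zero. Applying the same selection recursively to each resulting subgroup would produce a tree of categories.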
