eCommerce Product Attribute Extraction

Building a universal, comprehensive extractor of product attributes took many complex approaches: there were thousands of attribute types, and the words and numbers used to express their values ran into the millions. Before I started on the project, FindWatt relied on manually created and maintained regular expressions to find new attribute values. The approach I developed used n-gram frequencies and word vectors to classify text from product titles and descriptions. This was done both as supervised learning to expand the corpus and to fill in missing attributes that were expected for a given product type.
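
The n-gram frequency side of this can be illustrated with a minimal sketch. It is not the production code: the function names, the toy labeled values, and the `min_share` threshold are all hypothetical, and the word-vector component is left out. The idea is to count how often each n-gram appears under each attribute type in a labeled corpus, then label n-grams in a new title when one attribute type clearly dominates.

```python
from collections import Counter, defaultdict


def ngrams(tokens, max_n=2):
    """Yield every contiguous n-gram (as a string) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def train(labeled_values, max_n=2):
    """Count how often each n-gram appears under each attribute type."""
    counts = defaultdict(Counter)
    for attribute, value in labeled_values:
        for gram in ngrams(value.lower().split(), max_n):
            counts[gram][attribute] += 1
    return counts


def classify(title, counts, max_n=2, min_share=0.6):
    """Label n-grams in a title with the attribute type they most often belong to."""
    found = {}
    for gram in ngrams(title.lower().split(), max_n):
        seen = counts.get(gram)
        if not seen:
            continue  # n-gram never observed in the labeled corpus
        attribute, hits = seen.most_common(1)[0]
        # Accept only when one attribute type accounts for most occurrences.
        if hits / sum(seen.values()) >= min_share:
            found.setdefault(attribute, []).append(gram)
    return found


# Hypothetical toy corpus of (attribute type, known value) pairs.
LABELED = [
    ("color", "navy blue"),
    ("color", "blue"),
    ("color", "red"),
    ("material", "stainless steel"),
    ("material", "steel"),
    ("size", "large"),
]

counts = train(LABELED)
result = classify("Large Navy Blue Stainless Steel Water Bottle", counts)
print(result)
```

In this toy run, "water" and "bottle" stay unlabeled because they never occur in the labeled values. In practice such unlabeled terms are the natural candidates for vector-based comparison against known values.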

I also combined the vectors with cluster analysis to discover new attribute types that needed to be added to our product-type attribute definitions.
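
As a rough illustration of that discovery step, the sketch below groups term vectors by cosine similarity and keeps only the clusters that contain no known attribute value. The clustering algorithm actually used is not stated here, so the greedy threshold clustering is a stand-in, and the three-dimensional vectors, the threshold, and all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(vectors, threshold=0.9):
    """Assign each term to the first cluster whose seed vector is similar enough."""
    clusters = []  # list of (seed_vector, [terms])
    for term, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(term)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append((vec, [term]))
    return [members for _, members in clusters]


def unknown_clusters(clusters, known_terms):
    """Keep clusters in which no term is already a known attribute value."""
    return [c for c in clusters if not any(t in known_terms for t in c)]


# Hypothetical toy vectors; real word vectors would have far more dimensions.
VECTORS = {
    "red": (0.9, 0.1, 0.0),
    "blue": (0.8, 0.2, 0.0),
    "steel": (0.1, 0.9, 0.1),
    "aluminum": (0.2, 0.8, 0.1),
    "bpa-free": (0.0, 0.1, 0.9),
    "dishwasher-safe": (0.1, 0.0, 0.9),
}
KNOWN = {"red", "blue", "steel"}  # values already covered by attribute definitions

clusters = greedy_cluster(VECTORS)
candidates = unknown_clusters(clusters, KNOWN)
print(candidates)
```

A cluster with no known members, such as the "bpa-free" and "dishwasher-safe" group in this toy data, suggests an attribute type that the existing definitions do not yet cover. A cluster with mixed members, such as "steel" and "aluminum", instead suggests a new value for an existing type.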