I built a tool in Python that quickly analyzes every product title in an eCommerce dataset for 20 kinds of problems, then produces a PDF summary report with examples and a detailed Excel report listing all problems and suggested fixes.

Each analysis required a different approach, ranging from simple case checks to advanced regular expressions, heuristic fuzzy matching, and NLP and word-vector parsing.
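To give a feel for the simpler end of that range, here is a minimal sketch of three title checks: a case check, a regular-expression check, and a fuzzy near-duplicate check. The function names, the specific checks, and the 0.9 similarity threshold are illustrative assumptions, not the tool's actual rules, and the fuzzy step uses the standard library's `difflib` as a stand-in for whatever matcher the real tool relies on.

```python
import re
from difflib import SequenceMatcher

# Hypothetical checks for illustration only; the real tool's 20 checks are not shown here.

def is_all_caps(title):
    """Case check: True when every letter in the title is uppercase."""
    letters = [c for c in title if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

# Regex check: two or more consecutive whitespace characters.
REPEATED_WHITESPACE = re.compile(r"\s{2,}")

def has_repeated_whitespace(title):
    """True when the title contains a run of two or more whitespace characters."""
    return bool(REPEATED_WHITESPACE.search(title))

def near_duplicates(titles, threshold=0.9):
    """Fuzzy check: return pairs of titles whose similarity ratio meets the threshold.

    This compares every pair, so it is quadratic in the number of titles
    and only suitable for small samples.
    """
    pairs = []
    for i, first in enumerate(titles):
        for second in titles[i + 1:]:
            ratio = SequenceMatcher(None, first.lower(), second.lower()).ratio()
            if ratio >= threshold:
                pairs.append((first, second))
    return pairs
```

The pairwise comparison would not scale to millions of titles as written; it is only meant to show the shape of a fuzzy check.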
Overall, the project required a fairly complex architecture involving multiprocessing of custom functions with shared memory, automated EC2 instance creation and destruction, and SQL and NoSQL databases handling millions of records with efficient indexing.
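The multiprocessing part can be sketched roughly as follows: split the titles into chunks and map a check function over them with a worker pool. The names `chunked`, `check_chunk`, and `run_parallel`, the chunk size, and the whitespace check itself are assumptions made for illustration. The sketch leaves out shared memory, EC2 orchestration, and the database layer entirely.

```python
from multiprocessing import Pool

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def check_chunk(chunk):
    """Hypothetical check: return titles with leading or trailing whitespace."""
    return [title for title in chunk if title != title.strip()]

def run_parallel(titles, workers=4, chunk_size=1000):
    """Run check_chunk over all chunks in a process pool and flatten the results."""
    with Pool(processes=workers) as pool:
        results = pool.map(check_chunk, chunked(titles, chunk_size))
    return [title for part in results for title in part]

if __name__ == "__main__":
    sample = [" Blue Shirt", "Red Hat", "Green Scarf "]
    print(run_parallel(sample, workers=2, chunk_size=2))
```

Chunking keeps the per-task overhead low compared with sending one title per task, which matters once the dataset reaches millions of records.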