Shoppers rely on Home Depot’s product authority to find and buy the latest products and to get timely solutions to their home improvement needs. From installing a new ceiling fan to remodeling an entire kitchen, with the click of a mouse or tap of the screen, customers expect the correct results to their queries – quickly. Speed, accuracy and delivering a frictionless customer experience are essential.
In this competition, Home Depot is asking Kagglers to help them improve their customers' shopping experience by developing a model that can accurately predict the relevance of search results.
Search relevancy is an implicit measure Home Depot uses to gauge how quickly they can get customers to the right products. Currently, human raters evaluate the impact of potential changes to their search algorithms, which is a slow and subjective process. By removing or minimizing human input in search relevance evaluation, Home Depot hopes to increase the number of iterations their team can perform on the current search algorithms.
This project was my first experience working with text data. Through it, I gained a good grasp of text manipulation and of building models on text.
The following text preprocessing steps were applied:
- Lower casing
- Punctuation removal
- Stopwords removal
- Frequent words removal
- Spelling correction
- Tokenization
- Stemming
- Lemmatization
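
As a rough illustration of how several of the preprocessing steps listed above fit together, here is a minimal Python sketch using only the standard library. It is not the project's code: the stopword list is a tiny made-up sample, `crude_stem` is a simple suffix stripper standing in for a real stemmer, and spelling correction and lemmatization are left out because they need external resources. All function names are hypothetical.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "the", "and", "or", "for", "of", "in", "on", "to", "with", "is"}

# Map every punctuation character to a space so "52-in." splits into "52" and "in".
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, split on whitespace."""
    return text.lower().translate(_PUNCT_TO_SPACE).split()


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(texts, n_frequent=1):
    """Tokenize, drop stopwords, drop the n most frequent corpus words, then stem."""
    tokenized = [[t for t in tokenize(x) if t not in STOPWORDS] for x in texts]
    counts = Counter(t for doc in tokenized for t in doc)
    frequent = {w for w, _ in counts.most_common(n_frequent)}
    return [[crude_stem(t) for t in doc if t not in frequent] for doc in tokenized]


if __name__ == "__main__":
    sample = ["The ceiling fan", "Installing a ceiling fan", "fan brackets"]
    print(preprocess(sample))
```

Frequent-word removal is computed over the whole corpus after stopword removal, so the words dropped are the ones that dominate this particular dataset and not just generic function words.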
The following basic features were extracted from the text:
- Number of words
- Number of characters
- Average word length
- Number of stopwords
- Number of special characters
- Number of numerics
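
The count-based features listed above can be computed with a few lines of standard-library Python. This is a sketch under assumptions, not the project's implementation: the stopword set is a small made-up sample, "special characters" is taken to mean ASCII punctuation, "numerics" is taken to mean whitespace-separated tokens made up entirely of digits, and `text_features` is a hypothetical helper name.

```python
import string

# Small illustrative stopword list (assumption).
STOPWORDS = {"a", "an", "the", "and", "or", "for", "of", "in", "on", "to", "with", "is"}


def text_features(text):
    """Return simple count-based features for one string."""
    words = text.split()
    n_words = len(words)
    return {
        "n_words": n_words,
        # Character count includes spaces and punctuation.
        "n_chars": len(text),
        # Mean token length; 0.0 for empty input to avoid dividing by zero.
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "n_stopwords": sum(w.lower() in STOPWORDS for w in words),
        # Assumption: "special characters" means ASCII punctuation.
        "n_special": sum(c in string.punctuation for c in text),
        # Assumption: a "numeric" is a token consisting only of digits.
        "n_numerics": sum(w.isdigit() for w in words),
    }


if __name__ == "__main__":
    print(text_features("Install the 52 in. fan #2"))
```

These features are computed on the raw text, before punctuation and stopwords are removed, since the counts would otherwise be zero.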
Two algorithms are implemented:
- XGBoost
- Feed-forward neural network
The H2O API was not working, so the neural network is implemented in R.
We achieved an RMSE score of 0.47.
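
For reference, RMSE (root mean squared error) is the square root of the mean squared difference between predicted and true relevance scores, so lower is better. A minimal Python sketch follows; the function name `rmse` and the sample values are illustrative and are not taken from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up values purely for illustration.
    print(rmse([3.0, 1.0], [2.0, 2.0]))
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones.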
Please check out the code. Your feedback is appreciated.