
quora-ques

Summary of our work on finding similar question pairs in the Quora Questions dataset.

As our first Kaggle assignment, we started by reading a top-rated notebook [https://www.kaggle.com/anokas/data-analysis-xgboost-starter-0-35460-lb] by anokas. The notebook analyzed some basic properties of the given data, such as the dataset size, the proportion of positive (duplicate) pairs, the maximum sentence length, word frequencies, TF-IDF weights, and the words shared between the two questions.
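A minimal sketch of a few of these basic statistics, assuming the competition's train.csv layout with question1, question2 and is_duplicate (0/1) columns. The helper basic_stats is hypothetical, the tokenization is a naive whitespace split, and the standard-library csv module is used here rather than pandas.

```python
import csv
import io
from collections import Counter

def basic_stats(csv_file):
    """Summarize a Quora-style question-pair CSV (hypothetical helper).

    Assumes columns named question1, question2 and is_duplicate (0/1).
    """
    rows = list(csv.DictReader(csv_file))
    n_pairs = len(rows)
    n_dup = sum(int(r["is_duplicate"]) for r in rows)
    questions = [r["question1"] for r in rows] + [r["question2"] for r in rows]
    # Word frequencies over all questions, using a naive lowercase whitespace split.
    word_counts = Counter(w for q in questions for w in q.lower().split())
    return {
        "n_pairs": n_pairs,
        "duplicate_fraction": n_dup / n_pairs if n_pairs else 0.0,
        "max_words_per_question": max((len(q.split()) for q in questions), default=0),
        "most_common_words": word_counts.most_common(3),
    }

# Tiny made-up sample in the same column layout as train.csv.
sample = io.StringIO(
    "id,qid1,qid2,question1,question2,is_duplicate\n"
    "0,1,2,How do I learn Python?,What is the best way to learn Python?,1\n"
    "1,3,4,What is AI?,How tall is Mount Everest?,0\n"
)
print(basic_stats(sample))
```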

Before applying a neural network to this problem, we decided to try more basic techniques first. Our plan was:
Step 1: Machine learning algorithms such as Decision Trees, Random Forests, and boosting algorithms.
Step 2: Logistic Regression.
Step 3: RNN.

However, steps 1 and 2 required extracting features. The two features we ended up using are word_share_match and tfidf_word_share_match. Using these two features, we trained Logistic Regression, Decision Tree, Random Forest, and boosting models.
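A minimal sketch of the two features, assuming definitions in the spirit of the anokas notebook: word_share_match is the fraction of non-stopword words that appear in both questions, and tfidf_word_share_match weights each word by an inverse-frequency weight 1 / (count + eps), so very common words count less and words seen fewer than min_count times are ignored. The stopword list, the tokenization, the eps and min_count values, and the helper names content_words and idf_like_weights are illustrative assumptions.

```python
from collections import Counter

# Hypothetical, tiny stopword list for illustration; a real pipeline would use a
# fuller list (for example NLTK's English stopwords).
STOPWORDS = {"a", "an", "the", "is", "what", "how", "do", "i", "to", "of", "in"}

def content_words(question):
    """Lowercase, split on whitespace and drop stopwords (deliberately naive)."""
    return {w for w in question.lower().split() if w not in STOPWORDS}

def word_share_match(q1, q2):
    """Share of non-stopword words that occur in both questions (0.0 to 1.0)."""
    w1, w2 = content_words(q1), content_words(q2)
    if not w1 or not w2:
        return 0.0
    shared = w1 & w2
    # Each shared word is counted once from each side, so identical sets give 1.0.
    return 2 * len(shared) / (len(w1) + len(w2))

def idf_like_weights(corpus, eps=5000, min_count=2):
    """Weight each word by 1 / (count + eps); rarer-than-min_count words get 0."""
    counts = Counter(w for q in corpus for w in q.lower().split())
    return {w: (1.0 / (c + eps) if c >= min_count else 0.0) for w, c in counts.items()}

def tfidf_word_share_match(q1, q2, weights):
    """Like word_share_match, but each word contributes its weight instead of 1."""
    w1, w2 = content_words(q1), content_words(q2)
    shared = w1 & w2
    total = sum(weights.get(w, 0.0) for w in w1) + sum(weights.get(w, 0.0) for w in w2)
    if total == 0:
        return 0.0
    return 2 * sum(weights.get(w, 0.0) for w in shared) / total

corpus = ["learn python fast", "learn python slowly", "python tips"]
weights = idf_like_weights(corpus)
print(word_share_match("learn python fast", "learn python slowly"))
print(tfidf_word_share_match("learn python fast", "learn python slowly", weights))
```

In this toy corpus, "fast" and "slowly" occur only once and receive weight 0, so the weighted feature rates the pair as a full match while the unweighted one does not.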

Gradient Boosting achieved a leaderboard score (log loss, lower is better) of 0.35535.
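For reference, the competition leaderboard ranks submissions by log loss. For N question pairs with labels y_i in {0, 1} and predicted duplicate probabilities p_i:

```latex
\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\Big]
```

A confident wrong prediction (p_i close to 0 when y_i = 1, or close to 1 when y_i = 0) makes its term very large, which is why the score rewards well-calibrated probabilities rather than only correct labels.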

Later, we applied an RNN using a dual-encoder LSTM. Interestingly, while the model was being fitted on the training data, accuracy kept increasing even though log loss was also increasing. Why can accuracy increase when log loss is also increasing? (Arjun, please comment.)
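One possible explanation, sketched with made-up numbers rather than our LSTM's actual predictions: accuracy only checks which side of the 0.5 threshold each prediction falls on, while log loss also measures confidence and punishes confident mistakes heavily. A model that grows more confident can fix some borderline errors (accuracy goes up) while making a few errors very confidently (log loss goes up). The helpers log_loss and accuracy below are illustrative, not taken from our code.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred)) / len(y_true)

y = [1, 1, 1, 0]
# Hypothetical "earlier epoch": cautious predictions, two of four on the wrong side of 0.5.
early = [0.55, 0.45, 0.45, 0.45]
# Hypothetical "later epoch": three confident correct predictions, one confident mistake.
late = [0.9, 0.9, 0.02, 0.1]

print("early:", accuracy(y, early), round(log_loss(y, early), 3))
print("late: ", accuracy(y, late), round(log_loss(y, late), 3))
```

Here accuracy rises from 0.5 to 0.75 while log loss also rises, driven almost entirely by the single prediction of 0.02 for a true duplicate.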

Current state:

We could not yet evaluate the NN on the test set because we ran into technical issues with the submission.

Understanding ROC AUC (just for information)

We also learned about metrics such as ROC AUC and accuracy.

What is ROC? The ROC curve plots the True Positive Rate (y-axis) against the False Positive Rate (x-axis) at every classification threshold.

True Positive Rate = # of true positives / # of all positives
False Positive Rate = # of false positives / # of all negatives

http://www.dataschool.io/roc-curves-and-auc-explained/ gives a very good explanation.
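A minimal sketch of how the ROC points can be computed from scores: sweep each distinct score as a threshold (highest first), predict "positive" when a score is at or above it, and record one (FPR, TPR) point per threshold. The function name roc_points and the toy labels and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct threshold.

    A pair is predicted positive when its score is >= the threshold.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

y = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(roc_points(y, scores))
```

The lowest threshold labels everything positive, so the curve always ends at (1, 1).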

The AUC (area under the ROC curve) tells us how good the classifier is at ranking positive examples above negative ones. An AUC of 0.5 means a very poor classifier (equivalent to random guessing); an AUC of 1 means a perfect classifier.
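Equivalently, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that interpretation, comparing every positive/negative pair and counting ties as half; auc_by_ranking is a hypothetical name, and this O(P·N) loop is meant for illustration rather than large datasets.

```python
def auc_by_ranking(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc_by_ranking([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # every positive outranks every negative
print(auc_by_ranking([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # constant scores, like guessing
```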

Contributors: naik-amey, arjunjauhari
