
text-similarity's Introduction

Text Similarity using BM25 & WordNet

Prerequisites for running the code

  1. Python 2.x - https://askubuntu.com/questions/101591/how-do-i-install-the-latest-python-2-7-x-or-3-x-on-ubuntu
  2. Numpy - pip install numpy
  3. Scipy - pip install scipy
  4. NLTK - pip install nltk
  5. Pattern - pip install pattern
  6. Scikit-learn - pip install scikit-learn
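
Before running the code, it can help to confirm that the prerequisites import cleanly. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list uses the import names of the packages above (scikit-learn is imported as `sklearn`).

```python
import importlib

# Import names of the prerequisites listed above.
REQUIRED = ["numpy", "scipy", "nltk", "pattern", "sklearn"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means every prerequisite was found.
    print(missing_modules(REQUIRED))
```

Note that NLTK's WordNet data is a separate download from the `nltk` package itself.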

Command for running code

python execute.py

OUTPUT

[Image: screenshot of the program output]

Algorithms

Syntactic Similarity -

I used the BM25 (Best Matching) algorithm for syntactic similarity. It generates a similarity score between two sentences.

BM25 Algorithm -

bm25_score(CD,QD) = Σ(i=1 to n) idf(qi)*(f(qi,CD)*(k1+1))/(f(qi,CD)+k1*(1-b+(b*|CD|/avgdl)))
idf(t) = 1 + log(C/(1+df(t)))

Where,

    CD = corpus document, e.g. the list of all the answers
    QD = query document, e.g. the model answer
    qi = the i-th term of QD (i runs from 1 to n)
    idf(qi) = inverse document frequency (IDF) of the term qi in CD
    C = total number of documents in CD
    df(t) = number of documents in which the term t is present
    f(qi, CD) = frequency of the term qi in CD
    |CD| = total length of CD
    avgdl = average document length of CD
    k1, b = constants
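
The formula above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in execute.py: documents are assumed to be pre-tokenized lists of words, each corpus document is scored separately against the query, the function names are hypothetical, and k1=1.5 and b=0.75 are commonly used defaults rather than values taken from this project.

```python
import math
from collections import Counter


def idf(term, corpus):
    """idf(t) = 1 + log(C / (1 + df(t))), where C is the number of documents."""
    doc_freq = sum(1 for doc in corpus if term in doc)
    return 1.0 + math.log(len(corpus) / (1.0 + doc_freq))


def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Return one BM25 score per corpus document for a tokenized query.

    `corpus` is a non-empty list of token lists; `query` is a token list.
    k1 and b are assumed defaults, not values from the original project.
    """
    avgdl = sum(len(doc) for doc in corpus) / float(len(corpus))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        # Length normalization: 1 - b + b * |doc| / avgdl
        length_norm = 1.0 - b + b * len(doc) / avgdl
        score = 0.0
        for term in query:
            tf = counts[term]  # f(qi, doc); 0 when the term is absent
            score += idf(term, corpus) * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    answers = [
        ["the", "cat", "sat"],
        ["the", "dog", "barked"],
        ["a", "cat", "and", "a", "dog"],
    ]
    print(bm25_scores(answers, ["cat"]))
```

In this toy corpus the second answer scores zero because it does not contain the query term, and the first answer outranks the third because it is shorter relative to avgdl.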

Semantic Similarity -

I used NLTK's WordNet corpus to generate the semantic similarity score between two sentences. I used the synsets function to get all the synsets (senses) of a word, calculated the path similarity between words, and took the maximum value among all the synsets for a single word. I then averaged these scores over a single sentence; that average is the semantic similarity score.
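
One possible reading of this procedure is sketched below. It is an assumption-laden illustration, not the repository's code: the function names are hypothetical, sentences are assumed to be pre-tokenized lists of words, and each word in the first sentence is matched against its best counterpart in the second. The only NLTK calls used are `wordnet.synsets` and `Synset.path_similarity`; the WordNet corpus must be downloaded for the NLTK-backed function to work.

```python
def wordnet_word_similarity(word_a, word_b):
    """Maximum WordNet path similarity over all synset pairs of two words."""
    # Imported lazily so the averaging step below can be tried without NLTK.
    from nltk.corpus import wordnet as wn

    best = 0.0
    for syn_a in wn.synsets(word_a):
        for syn_b in wn.synsets(word_b):
            sim = syn_a.path_similarity(syn_b)  # None when no path exists
            if sim is not None and sim > best:
                best = sim
    return best


def sentence_similarity(words_a, words_b, word_similarity=wordnet_word_similarity):
    """Average, over words_a, of each word's best similarity to any word in words_b."""
    if not words_a or not words_b:
        return 0.0
    best_scores = [
        max(word_similarity(a, b) for b in words_b)
        for a in words_a
    ]
    return sum(best_scores) / float(len(best_scores))
```

Passing the word-level measure in as a parameter makes it easy to swap in a different similarity function or to try the averaging step with a toy measure.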

The final score is the average of bm25_score and semantic_similarity_score.
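
As a worked illustration of the averaging step, here is a minimal sketch; the function name `final_score` is hypothetical and the input values are made up for the example.

```python
def final_score(bm25_score, semantic_similarity_score):
    """Arithmetic mean of the syntactic (BM25) and semantic (WordNet) scores."""
    return (bm25_score + semantic_similarity_score) / 2.0


if __name__ == "__main__":
    # Illustrative values only: (1.0 + 0.5) / 2
    print(final_score(1.0, 0.5))
```

Path similarity lies between 0 and 1, while a BM25 score has no fixed upper bound, so with a plain average the BM25 term can dominate for queries with many matching terms.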

Contributors

shubham16394
