Boolean Retrieval system

Project description

The Boolean retrieval model is one of the earliest and most widely used classic information retrieval models. It rests on a few simple assumptions:

  • Each index term is either present in a document (value 1) or absent from it (value 0).
  • Queries are built from the operators AND, OR, and NOT, which combine terms as follows:
    • “X AND Y” returns the documents containing both X and Y.
    • “X OR Y” returns the documents containing X, Y, or both.
    • “NOT X” returns the documents not containing X.
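
The three operators above can be sketched on sorted lists of document IDs. This is a minimal illustration, not the project's actual code: the function names `intersect`, `union`, and `complement`, and the sample document IDs, are assumptions made for the example.

```python
def intersect(p1, p2):
    """X AND Y: walk two sorted lists in step, keeping the shared doc IDs."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def union(p1, p2):
    """X OR Y: every doc ID that appears in either list."""
    return sorted(set(p1) | set(p2))


def complement(p, all_doc_ids):
    """NOT X: every doc ID in the collection that is not in the list."""
    excluded = set(p)
    return [d for d in all_doc_ids if d not in excluded]


# Hypothetical collection of five documents.
all_docs = [1, 2, 3, 4, 5]
x = [1, 2, 4]  # documents containing X
y = [2, 3, 4]  # documents containing Y

print(intersect(x, y))          # [2, 4]
print(union(x, y))              # [1, 2, 3, 4]
print(complement(x, all_docs))  # [3, 5]
```

Note that NOT needs the full list of document IDs, since it is defined relative to the whole collection.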

Preprocessing steps:

  1. Converting all text in the document corpus to lowercase.
  2. Removing stopwords to avoid unnecessary computation while searching.
  3. Stemming each word, i.e., removing suffixes and prefixes to reduce the word to its "stem".
  4. Building a permuterm index and an inverted index that the system uses to process queries.
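
As a rough sketch of steps 1 to 3 and the inverted index from step 4, the following uses only the standard library. Everything named here is an assumption for illustration: the tiny `STOPWORDS` set, the crude `toy_stem` suffix stripper (a stand-in for a real stemmer such as Porter's, and not what the project necessarily uses), and the sample documents. The permuterm index is not covered in this sketch.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on"}


def toy_stem(word):
    """Crude suffix stripper, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each stemmed term to the sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical three-document corpus.
docs = {
    1: "The cat is chasing mice",
    2: "Cats chased the dog",
    3: "A dog barks",
}
index = build_inverted_index(docs)

print(preprocess("Cats chased the dog"))  # ['cat', 'chas', 'dog']
print(index["cat"])                       # [1, 2]
print(index["dog"])                       # [2, 3]
```

Because "cat" and "cats" reduce to the same stem, a query for either form reaches the same list of documents.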

Scope

This Boolean retrieval system takes a query from the user as input. It analyses the query, spell-corrects and stems each recognised word, and then fetches that word's inverted index array (the list of documents containing it). For a wildcard query, it retrieves all matching terms from the permuterm index, fetches the inverted index array of each term, and combines them with an OR operation. Once the inverted index arrays for all query words are available, it applies the Boolean operations given in the query to those arrays.
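
The wildcard path described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the function names, the restriction to a single `*` per pattern, the linear scan over rotations (a real system would typically use a sorted structure or tree for prefix lookup), and the sample index are all hypothetical.

```python
def build_permuterm(vocabulary):
    """Map every rotation of term + '$' back to the original term."""
    permuterm = {}
    for term in vocabulary:
        marked = term + "$"  # '$' marks the end of the term
        for i in range(len(marked)):
            permuterm[marked[i:] + marked[:i]] = term
    return permuterm


def wildcard_terms(pattern, permuterm):
    """Expand a pattern containing exactly one '*' into matching terms."""
    head, tail = pattern.split("*")  # assumes a single '*'
    prefix = tail + "$" + head       # rotate so the '*' falls at the end
    return sorted(
        {term for rotation, term in permuterm.items() if rotation.startswith(prefix)}
    )


def wildcard_postings(pattern, permuterm, index):
    """OR together the document lists of every term matching the pattern."""
    doc_ids = set()
    for term in wildcard_terms(pattern, permuterm):
        doc_ids.update(index.get(term, []))
    return sorted(doc_ids)


# Hypothetical inverted index: term -> sorted doc IDs.
index = {"cat": [1, 2], "cart": [3], "cot": [2, 4], "dog": [2, 3]}
permuterm = build_permuterm(index)

print(wildcard_terms("c*t", permuterm))            # ['cart', 'cat', 'cot']
print(wildcard_terms("ca*", permuterm))            # ['cart', 'cat']
print(wildcard_postings("c*t", permuterm, index))  # [1, 2, 3, 4]
```

Rotating the pattern so that the wildcard ends up last turns every single-wildcard query into a prefix lookup, which is the reason for storing all rotations of each term.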

Credits

This project has been developed as part of the course "Information Retrieval" taught by Dr. Bhanu Murthy at BITS Pilani Hyderabad Campus.

Contributors

learningleopard
