
intelligent_document_finder's Introduction

Intelligent Document Finder 2.0



A tool that can find any of your documents using semantic search.

This is an improved version of Intelligent-Document-Finder.
New features:

  1. Added a document similarity script, which shows the documents most similar or related to a given one.
  2. Revamped the website UI.
  3. Reduced the time complexity of the search functions.

What is Intelligent Document Finder?

How easy is it to remember the exact location of a document that you created last year? Not very easy, right? Large organizations and individuals deal with hundreds of documents daily and forget about most of them.
But what if you need that old document again for some work, and you do not remember its name or its exact content well enough to retrieve it from your computer's storage?
In such cases, an intelligent document finder can make a huge difference, because it searches for the document you need semantically, based on a query. This not only gives faster access to the document, but also helps in grouping similar documents together and analysing them.

Watch Project Demo:

Watch Demo

Note

Currently this repository uses a predefined database of news articles gathered by web scraping. Because of GitHub's restrictions on uploading large files, we cannot upload it here.

We will soon add support for dynamic databases, so that you can use this tool on your own data to build a custom search engine.

Technologies Used

Python 3.6, JavaScript, jQuery, HTML & CSS

Database Used:

SQLite

For implementing searching:

Various NLP (Natural Language Processing) techniques are used.
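
One of these techniques, according to the References section, is BM25 ranking. The following is a minimal, self-contained sketch of Okapi BM25 scoring, not the project's actual code (the project credits an external implementation). The function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the sample corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    Hypothetical helper for illustration; assumes a non-empty corpus.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up sample corpus, already tokenized.
corpus = [
    "stock markets fall as oil prices rise".split(),
    "new york mayor announces housing plan".split(),
    "oil prices hit record high".split(),
]
scores = bm25_scores("oil prices".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best, scores)
```

Documents that share no term with the query score zero, and among matching documents the shorter one ranks higher because of the length normalization.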

For website:

  • Python-based Web framework : Flask
  • JavaScript
  • jQuery

Program Flow

[Program flow diagram]

Compatibility

  • The backend (AI part) is compatible with any machine that has Python and the required dependencies installed.
  • Recommended browsers: Mozilla Firefox and Google Chrome.

How to Install and Use?

> mkdir IntelligentDocumentFinder

> cd IntelligentDocumentFinder

> git clone https://github.com/Sarthakjain1206/Intelligent_Document_Finder_2.0.git

Install virtualenv if it is not already installed

  • On Linux/macOS: > python3 -m pip install --user virtualenv
  • On Windows: > py -m pip install --user virtualenv

Create Virtual Environment

  • On macOS and Linux: > python3 -m venv env
  • On Windows: > py -m venv env

Activate Environment:

  • On macOS and Linux: > source env/bin/activate
  • On Windows: > .\env\Scripts\activate

> pip install -r requirements.txt

Download the GloVe word embeddings from this link, decompress the archive, and copy the glove.6B.100d file into the DataBase folder.

Then run initial_file.py with this command > python initial_file.py

Now you are good to go. Run this command every time you want to use the tool, then open the website in Chrome or Firefox
> python src/app.py
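
For context on what the GloVe file provides: it is plain text, with one word per line followed by its vector components. The sketch below shows one common way such vectors can drive document similarity, by averaging word vectors per document and comparing the averages with cosine similarity. It is an assumption about the general technique, not the project's actual similarity script; the helper names and the tiny 3-dimensional sample vectors are made up for illustration.

```python
import io
import math


def load_glove(lines):
    """Parse GloVe-style text lines ('word v1 v2 ...') into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def doc_vector(tokens, vectors):
    """Average the vectors of the tokens that have one; None if none do."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up 3-d vectors standing in for the real 100-d GloVe file.
sample = io.StringIO("oil 0.9 0.1 0.0\nprices 0.8 0.2 0.1\nmayor 0.0 0.9 0.4\n")
vectors = load_glove(sample)

sim_related = cosine(doc_vector(["oil", "prices"], vectors),
                     doc_vector(["prices"], vectors))
sim_unrelated = cosine(doc_vector(["oil"], vectors),
                       doc_vector(["mayor"], vectors))
print(sim_related > sim_unrelated)
```

To try it on the real embeddings, pass an open file handle for the decompressed GloVe file to `load_glove` instead of the in-memory sample; the exact file name and path depend on how you unpacked it.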

Developers

You can get in touch with us through our LinkedIn profiles.


Sarthak Jain: Machine Learning, NLP, Web Crawling

You can also follow me on GitHub to stay updated about my latest projects.

Rishabh Mishra: Full Stack Web Developer

You can also follow me on GitHub to stay updated about my latest projects.

If you like this repository, please support it by giving it a star.

Contributions

If you find a bug or have a suggestion to improve this project, feel free to open a pull request.

There are a lot of features that can be added to this tool.

  1. Query Segmentation
  2. Query Expansion (Mainly - Pseudo Relevance Feedback technique)
  3. Improving the spell checker
  4. Collocations. For example, the project currently treats "New York" as ["New", "York"], i.e. two different words, but it should be treated as a single entity such as ["New_York"]. This can make a big difference in search results.
  5. Query Logs (a game-changing technique for search engines)
  6. Search result segmentation (as in Lucene)
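
The collocation idea in item 4 can be sketched as follows. This is a minimal illustration that assumes the set of known collocations is already available; the function name `merge_collocations` and the hand-written collocation set are hypothetical, and a real implementation would more likely learn the pairs from corpus statistics.

```python
def merge_collocations(tokens, collocations):
    """Join adjacent token pairs found in `collocations` with an underscore."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in collocations:
            merged.append("_".join(pair))
            i += 2  # both tokens of the pair are consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hand-written collocation set, for illustration only.
known = {("New", "York")}
result = merge_collocations(["Flights", "to", "New", "York", "today"], known)
print(result)  # ['Flights', 'to', 'New_York', 'today']
```

Applying the same merge to both the indexed documents and the incoming query keeps the two tokenizations consistent.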

If you have experience implementing any of these features, please contribute.

References

  1. The Wikipedia article on the BM25 ranking algorithm - Okapi BM25

  2. Read this article on Topic Modeling

  3. Followed this article on SVO tagging to generate tags for this project.

  4. Used the BM25 ranking function implementation from this repository on GitHub by dorianbrown.

intelligent_document_finder's People

Contributors

rishabhm74, sarthakjain1206


intelligent_document_finder's Issues

ModuleNotFoundError: No module named 'smart_open.compression'

Thank you for this wonderful GitHub repository. I am struggling to reproduce your results: when I run
python src/app.py it fails with ModuleNotFoundError: No module named 'smart_open.compression'.
The complete traceback is below.

Traceback (most recent call last):
  File "/home/ehsan-bi/Downloads/Intelligent_Document_Finder-master/src/app.py", line 17, in <module>
    from gensim.parsing.preprocessing import STOPWORDS
  File "/home/ehsan-bi/DF-USF/lib/python3.9/site-packages/gensim/__init__.py", line 11, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
  File "/home/ehsan-bi/DF-USF/lib/python3.9/site-packages/gensim/models/__init__.py", line 7, in <module>
    from .coherencemodel import CoherenceModel  # noqa:F401
  File "/home/ehsan-bi/DF-USF/lib/python3.9/site-packages/gensim/models/coherencemodel.py", line 37, in <module>
    from gensim.topic_coherence import (
  File "/home/ehsan-bi/DF-USF/lib/python3.9/site-packages/gensim/topic_coherence/probability_estimation.py", line 11, in <module>
    from gensim.topic_coherence.text_analysis import (
  File "/home/ehsan-bi/DF-USF/lib/python3.9/site-packages/gensim/topic_coherence/text_analysis.py", line 20, in <module>
    from gensim.models.word2vec import Word2Vec
  File "/home/ehsan-bi/DF-USF/lib/python3.9/site-packages/gensim/models/word2vec.py", line 206, in <module>
    from smart_open.compression import get_supported_extensions
ModuleNotFoundError: No module named 'smart_open.compression'

I have gone through word2vec.py, but I cannot work out how to change it so that the project runs. Any help would be appreciated. Thank you.
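
An error like this often points to a version mismatch between gensim and its smart_open dependency rather than to a problem in word2vec.py itself, although that is an assumption here and not something the traceback proves. A small diagnostic sketch, assuming Python 3.8 or newer (for `importlib.metadata`), prints the installed versions and whether the missing module can be found; the helper names are hypothetical.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_module(name):
    """Return True if the (possibly dotted) module name can be located."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself is not installed.
        return False


report = {
    "gensim": installed_version("gensim"),
    "smart_open": installed_version("smart_open"),
    "smart_open.compression importable": has_module("smart_open.compression"),
}
for key, value in report.items():
    print(f"{key}: {value}")
```

If the last line reports False while smart_open is installed, upgrading smart_open inside the virtual environment (or reinstalling the versions pinned in requirements.txt) is a reasonable next step to try.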

How can i clear the previous documents?

I am making an app that searches documents, but I don't want the documents that are already in the DataBase folder. How do I remove the documents that come preloaded?
