
gist's Introduction

Gist

Gist: Less is more | A text summarizer



Motivation

Gist began as a project built for NMIT Hacks 2021, where it finished as runner-up! *wink wink*
It was then developed further with improved models, an added front-end, and enhanced features. Gist started as a simple prototype for summarizing any news article in 60 words or fewer so it can be grasped quickly, and it has come a long way since then.

Today Gist hosts 3 separate applications, each with its own functionality, yet all derived from the same core.
The Gist Core API summarizes content from text, documents, PDFs, and images, so you can even upload your physical newspaper! It also handles articles from online news websites. This helps you consume important information much more quickly.

Gist, at its heart, is an application where you can read several news articles of 60 words or fewer in one place, similar to Inshorts.

And finally, there is the Gmail Summarizer, built to give quick insights into your email as it becomes harder to keep track of the endless mail we receive every day. Thanks to the devs making Gist open source, it can be easily accessed and updated by users around the world, and its features keep growing every day!

Installation

First, clone the repository:

git clone https://github.com/SVijayB/Gist

Once you have the source code, create a virtual environment using the following command.

python3 -m venv venv

Enter the virtual environment (venv\Scripts\activate on Windows, source venv/bin/activate on Linux or macOS) and install the dependencies using the following command.

pip install -r requirements.txt

Your installation is complete, and you are all set to use the API.

To use the front-end, make sure you have Node.js installed. Once it is installed, install the front-end dependencies by running npm install in the frontend\gist folder.

Usage

Once the dependencies for both the front-end and the back-end are installed, create a .env file in the root directory of the project.
The .env file should contain the same variables as .env.example. Once that is done, launch the back-end server by running the following command in the root directory of the project (py is the Python launcher on Windows; on Linux or macOS, run python3 main.py instead).

py main.py
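
Before launching the server, it may help to confirm that .env defines every variable listed in .env.example. The following is a minimal sketch, not part of the Gist codebase: the function names env_keys and missing_env_keys are hypothetical, and it assumes both files use simple NAME=value lines.

```python
from pathlib import Path


def env_keys(path):
    """Return the set of variable names defined in a .env-style file."""
    keys = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate the optional "export " prefix some .env files use.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_env_keys(example_path=".env.example", env_path=".env"):
    """Return names present in the example file but absent from .env, sorted."""
    return sorted(env_keys(example_path) - env_keys(env_path))


if __name__ == "__main__":
    if Path(".env.example").exists() and Path(".env").exists():
        missing = missing_env_keys()
        if missing:
            print("Missing from .env:", ", ".join(missing))
        else:
            print("All variables from .env.example are present in .env")
    else:
        print("Run this from the project root, where .env and .env.example live")
```

The check compares variable names only; it does not validate the values.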

Once the back-end is running successfully, you can launch the front-end by running the following command in the frontend\gist folder.

npm start

P.S: You can also find the thesis for the project here.

You can also find some of the results generated here and here.

Contributing

To contribute to Gist, fork the repository, create a new branch, and send us a pull request. Make sure you read CONTRIBUTING.md before sending us pull requests.

Also, thanks for contributing to open source!

License

Gist is released under the MIT License. Read the LICENSE file for more information.


gist's People

Contributors

dependabot[bot], engineerscodes, imgbotapp, saipranay47, srisanthoshreddy-medapati, svijayb


gist's Issues

[Bug]: Fix .pdf and .docx file format support

Description

For some reason, .pdf and .docx file format support does not seem to be working. This needs to be fixed.

Expected Behavior

By default, content from .pdf files is read into text and then passed to the engine for summarization. A .docx file is first converted to .pdf, and then the same steps are carried out.

As of right now, the code required for .docx and .pdf file format support is commented out. You can find it at src\components\extraction.py from lines 29 to 42.
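
The expected flow described above can be sketched as follows. This is only an illustration of the routing, not the code in extraction.py: read_pdf_text, convert_docx_to_pdf, and summarize are hypothetical callables supplied by the caller, standing in for whatever the project actually uses.

```python
from pathlib import Path


def summarize_file(path, read_pdf_text, convert_docx_to_pdf, summarize):
    """Route a .pdf or .docx file through extraction and then summarization.

    The three callables are hypothetical stand-ins supplied by the caller.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".docx":
        # .docx is converted to .pdf first, then handled like any other .pdf.
        path = convert_docx_to_pdf(path)
        suffix = Path(path).suffix.lower()
    if suffix != ".pdf":
        raise ValueError(f"Unsupported file format: {suffix or 'none'}")
    # Extract the text and hand it to the summarization engine.
    return summarize(read_pdf_text(path))
```

Passing the helpers in as arguments keeps the routing testable with simple stubs while the real extraction code is still disabled.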

[Feature Request]: Build an automated script for testing the ROUGE score

As of right now, we are using two randomly chosen examples from Inshorts to test our product's ability to automatically generate news summaries.
However, we can automate this process using web scraping and thereby test it on a larger number of articles.
Things to do to sort out this issue:

  • Scrape data from the Inshorts page or use their API (if they have one).
  • Run the original article against our API (Inshorts provides a link to the original article).
  • Use the ROUGE metric to compare our generated summary against the one from Inshorts.
  • Store the original article, the Inshorts summary, our summary, and the ROUGE metric in a .csv file for further analysis.
  • Perform some basic data analysis on the collected data and visualize it to better understand the performance of our API.
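
The scoring and storage steps above could look roughly like the sketch below. It assumes the scraping and API calls have already produced the text, and it uses a simplified ROUGE-1 (clipped unigram overlap) written in plain Python rather than an established ROUGE package, so its numbers may differ slightly from a reference implementation. The names tokenize, rouge_1, and write_report and the CSV column names are hypothetical.

```python
import csv
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(reference, candidate):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def write_report(rows, csv_path):
    """Write (article, reference, generated) rows plus their scores to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["article", "inshorts_summary", "gist_summary",
                         "rouge1_precision", "rouge1_recall", "rouge1_f1"])
        for article, reference, generated in rows:
            writer.writerow([article, reference, generated,
                             *rouge_1(reference, generated)])
```

Each row passed to write_report is a tuple of the original article text, the Inshorts summary used as the reference, and the summary returned by the Gist API.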

[Feature Request]: Complete the categorizer function

The categorizer function has been in the works for quite some time, and the previous model was scrapped entirely because it failed completely :(
However, it's about time we started working on it again and got it completed.
We should be able to sort news articles using the categorizer function.
Articles should be sorted into:

  • Sports
  • Tech
  • Politics
  • Entertainment
  • Business
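
As a starting point, the interface of such a function might look like the sketch below. It is a deliberately naive keyword-count baseline, not the scrapped model and not a proposal for the final one; the keyword lists, the categorize name, and the "Uncategorized" fallback are all assumptions made for illustration.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "Sports": {"match", "team", "tournament", "cricket", "football", "goal"},
    "Tech": {"software", "app", "smartphone", "startup", "ai", "internet"},
    "Politics": {"election", "minister", "parliament", "government", "vote"},
    "Entertainment": {"film", "actor", "music", "movie", "album", "series"},
    "Business": {"market", "shares", "revenue", "profit", "investors", "bank"},
}


def categorize(text, default="Uncategorized"):
    """Return the category whose keywords appear most often in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        category: sum(word in keywords for word in words)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    # On a tie, max() keeps the category listed first in CATEGORY_KEYWORDS.
    best = max(scores, key=scores.get)
    # Fall back to the default label when no keyword matched at all.
    return best if scores[best] > 0 else default
```

A trained classifier could later replace the body of categorize while keeping the same text-in, label-out signature.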

[Documentation]: Document the API using swagger.io

Currently, the API is documented in the README file and with a simple HTML page rendered by the back-end.
We ought to use swagger.io to document the API properly if we plan to scale up the project.
It'll also create a much more presentable look for our first review, so this is moving up the priority scale.
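
For a sense of what this involves, the sketch below builds a minimal OpenAPI 3.0 document, the format Swagger tooling reads, as a Python dictionary and prints it as JSON. The /summarize path, its text and summary fields, and the title and version are placeholders invented for illustration; they are not the actual Gist routes.

```python
import json


def build_openapi_spec():
    """Return a minimal OpenAPI 3.0 document for one hypothetical endpoint."""
    text_field = {"type": "string"}
    return {
        "openapi": "3.0.3",
        "info": {"title": "Gist Core API (illustrative)", "version": "0.0.0"},
        "paths": {
            # "/summarize" and its fields are assumptions, not the real routes.
            "/summarize": {
                "post": {
                    "summary": "Summarize a block of text",
                    "requestBody": {
                        "required": True,
                        "content": {"application/json": {"schema": {
                            "type": "object",
                            "properties": {"text": text_field},
                            "required": ["text"],
                        }}},
                    },
                    "responses": {"200": {
                        "description": "Generated summary",
                        "content": {"application/json": {"schema": {
                            "type": "object",
                            "properties": {"summary": text_field},
                        }}},
                    }},
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the spec as JSON so it can be saved to a file.
    print(json.dumps(build_openapi_spec(), indent=2))
```

Once the real endpoints are filled in, the JSON output can be loaded into a Swagger viewer to render the interactive documentation.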
