

Weekendpedia Django API

This is the source code repository for the Django API server of Weekendpedia, an Insight Data Science project. The source code for the Chrome extension is in a separate repository.

Weekendpedia is a Chrome extension that recommends cultural events (in galleries, museums, etc.) in New York City to Wikipedia users. The extension tracks the Wikipedia topic the user is currently viewing and alerts the user when a relevant cultural event is found.

This YouTube video demonstrates how the extension interacts with users.

This repo contains the Django backend for the API server (in ./insight_api_src). The event data is scraped from nyc.com and stored in ./insight_api_data.

Recommendation algorithm

The server recommends content using keyword extraction and TF-IDF. Keywords are extracted with named entity recognition (NER) and part-of-speech (PoS) tagging. The IDF space is defined by the keywords from the event descriptions. The server computes the cosine similarity between the TF-IDF feature vector of the Wikipedia article and the feature vector of every event description, and returns the event information to the user when the similarity exceeds a threshold.
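The following is a minimal sketch of keyword-based TF-IDF matching as described above. It assumes the keywords have already been extracted (the NER/PoS step is not shown) and uses plain Python rather than whatever library the project actually uses. The smoothed IDF formula, the function names and the 0.2 threshold are illustrative assumptions, not values taken from the project.

```python
import math
from collections import Counter


def fit_idf(event_keywords):
    """Compute IDF weights over the keyword vocabulary of the event descriptions.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1 (an assumed formula,
    not necessarily the one used by the project).
    """
    n_docs = len(event_keywords)
    doc_freq = Counter()
    for keywords in event_keywords:
        doc_freq.update(set(keywords))  # count each keyword once per event
    return {kw: math.log((1 + n_docs) / (1 + df)) + 1 for kw, df in doc_freq.items()}


def tfidf(keywords, idf):
    """Build a sparse TF-IDF vector; keywords outside the IDF space are dropped."""
    counts = Counter(kw for kw in keywords if kw in idf)
    return {kw: tf * idf[kw] for kw, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(wiki_keywords, event_keywords, threshold=0.2):
    """Return indices of events whose similarity to the wiki keywords exceeds the threshold."""
    idf = fit_idf(event_keywords)
    event_vecs = [tfidf(kws, idf) for kws in event_keywords]
    wiki_vec = tfidf(wiki_keywords, idf)
    return [i for i, vec in enumerate(event_vecs) if cosine(wiki_vec, vec) > threshold]


# Hypothetical keyword lists for three events.
events = [
    ["Picasso", "cubism", "painting"],
    ["jazz", "concert", "Harlem"],
    ["dinosaur", "fossil", "museum"],
]
print(recommend(["Picasso", "painting", "Spain", "cubism"], events))  # [0]
```

Because the IDF space contains only event keywords, words from the article that never appear in any event description (here "Spain") contribute nothing to the similarity.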

More details are explained in the notebook and the slides.

Service details

API service

When the user browses Wikipedia, the Chrome extension sends the page URL to the Django server. The server extracts the wiki topic from the URL and retrieves the intro text of the corresponding Wikipedia page through the Wikipedia API. The text is converted into a feature vector using the pre-calculated IDF weights of the keywords from the event descriptions.

The recommender then computes the cosine similarity between this feature vector and each of the pre-calculated event feature vectors. For every event whose similarity exceeds the threshold, the recommender retrieves the event information (name, link, etc.) from the PostgreSQL database linked to the Django server and returns it to the Chrome extension as a JSON string.

Whenever the event data is updated, the IDF weights, the event feature vectors and the PostgreSQL database are updated as well.
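The first step of this flow, turning a Wikipedia URL into a topic and then into an API request, might look like the sketch below. It assumes English Wikipedia article URLs of the form `https://en.wikipedia.org/wiki/<Title>` and uses the MediaWiki `action=query&prop=extracts` endpoint with the `exintro`/`explaintext` parameters; the helper names are hypothetical and the project may query the API differently.

```python
from urllib.parse import unquote, urlencode, urlparse

WIKI_API = "https://en.wikipedia.org/w/api.php"


def topic_from_url(url):
    """Extract the article title from an English Wikipedia article URL.

    Returns None for anything that is not an en.wikipedia.org /wiki/ page.
    """
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        return None
    # Decode percent-escapes (e.g. %C3%A9 -> é) and turn underscores into spaces.
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    return title or None


def intro_query_url(title):
    """Build a MediaWiki API request for the plain-text intro of an article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exintro": 1,      # only the section before the first heading
        "explaintext": 1,  # plain text instead of HTML
        "redirects": 1,    # follow redirects to the canonical article
        "titles": title,
    }
    return WIKI_API + "?" + urlencode(params)


title = topic_from_url("https://en.wikipedia.org/wiki/Claude_Monet")
print(title)  # Claude Monet
print(intro_query_url(title))
```

URL fragments such as `#Early_life` are split off by `urlparse`, so they do not end up in the title.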

Components of the server

The Django API server has three main components: extractor, vectorizer and recommender.

The extractor (./insight_api_src/extractor/) retrieves the plain text of the Wikipedia topic the user is viewing, using the Wikipedia API. Its functions are defined in ./insight_api_src/extractor/views.py.
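A minimal sketch of the retrieval part, assuming the MediaWiki TextExtracts response shape, where pages are nested under `query.pages` keyed by page ID. The function names and the User-Agent string are hypothetical, and `fetch_intro` needs network access, so the example only parses a made-up sample response.

```python
import json
from urllib.request import Request, urlopen


def intro_text_from_response(payload):
    """Pull the plain-text intro out of a TextExtracts query response.

    A missing article has no "extract" field, in which case "" is returned.
    """
    pages = payload.get("query", {}).get("pages", {})
    page = next(iter(pages.values()), {})  # a single-title query returns one page
    return page.get("extract", "")


def fetch_intro(api_url):
    """Download and parse the API response (requires network access)."""
    req = Request(api_url, headers={"User-Agent": "weekendpedia-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        return intro_text_from_response(json.load(resp))


# Made-up sample response, not real API output.
sample = {"query": {"pages": {"1": {"pageid": 1, "title": "Example", "extract": "Example intro text."}}}}
print(intro_text_from_response(sample))  # Example intro text.
```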

The vectorizer (./insight_api_src/vectorizer/) converts the text into a feature vector using the TF-IDF algorithm (details are explained in the notebook in ./recommender_prototype) and passes the vector to the recommender. Its functions are defined in ./insight_api_src/vectorizer/views.py.
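As a rough illustration, the vectorizer step could map text onto the fixed keyword vocabulary with pre-calculated IDF weights, as in the sketch below. It assumes lowercase single-word keywords and swaps the project's NER/PoS keyword extraction for a simple regex tokenizer; `vectorize` and the sample weights are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text, idf):
    """Convert text into an L2-normalised TF-IDF vector over a fixed keyword vocabulary.

    idf maps each event keyword to its pre-calculated IDF weight. A regex
    tokenizer stands in for the NER/PoS keyword extraction used by the project.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in idf)  # ignore words outside the IDF space
    vec = {t: tf * idf[t] for t, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


# Hypothetical pre-calculated IDF weights.
idf = {"impressionism": 2.0, "painting": 1.0, "jazz": 1.5}
print(vectorize("Impressionism is a painting style. Impressionism began in Paris.", idf))
```

Normalising the vector here means the recommender can compute cosine similarity as a plain dot product against event vectors that were normalised the same way.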

The recommender (./insight_api_src/recommender/) computes the cosine similarities between the feature vector of the wiki text and the pre-calculated feature vectors of the events. Events are recommended when their similarity exceeds the threshold. The event information (name, description, link, etc.) is retrieved from the PostgreSQL database and returned as JSON strings. Its functions are defined in ./insight_api_src/recommender/views.py.
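The recommender logic can be sketched as a threshold on dot products of normalised vectors, followed by JSON serialisation. This is an illustration under assumptions: a plain dict stands in for the PostgreSQL event table, and `recommend_events`, the 0.2 threshold and the sample events are hypothetical.

```python
import json


def recommend_events(wiki_vec, event_vecs, events, threshold=0.2):
    """Return a JSON string of events whose cosine similarity exceeds threshold.

    wiki_vec and each entry of event_vecs are L2-normalised sparse vectors
    (dicts), so their dot product equals the cosine similarity. events maps an
    event ID to its stored fields (stand-in for the PostgreSQL table).
    """
    results = []
    for event_id, vec in event_vecs.items():
        score = sum(w * vec.get(k, 0.0) for k, w in wiki_vec.items())
        if score > threshold:
            results.append(dict(events[event_id], score=round(score, 3)))
    results.sort(key=lambda e: e["score"], reverse=True)  # best match first
    return json.dumps(results)


# Hypothetical data for illustration only.
wiki_vec = {"impressionism": 0.8, "painting": 0.6}
event_vecs = {1: {"impressionism": 1.0}, 2: {"jazz": 1.0}}
events = {
    1: {"name": "Monet exhibition", "link": "https://example.com/monet"},
    2: {"name": "Jazz night", "link": "https://example.com/jazz"},
}
print(recommend_events(wiki_vec, event_vecs, events))
```

In a Django view, the same list could be returned with `django.http.JsonResponse(results, safe=False)` instead of serialising it by hand; `safe=False` is required because the payload is a list rather than a dict.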

Contributors

jiananarthurli, dependabot[bot]