Git Product home page Git Product logo

vaibhavkankane / gameofthrones Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 1.0 583 KB

Entity graph (character, location, events, dragons, etc.) created by scraping GOTWiki.

Home Page: https://codefringo.wordpress.com/2018/10/22/webcralwer-in-python/

Jupyter Notebook 82.80% Python 17.20%
scrapy-crawler crawler gameofthrones python3 neo4j entity-graph spider jupyter-notebook fandom-wiki graph

gameofthrones's Introduction

Game Of Thrones full dataset from fandom wiki.

Entity graph created by scraping GOTWiki- https://gameofthrones.fandom.com/wiki

Nodes- ['Organization', 'Person', 'Event', 'Episode', 'Animal', 'Location', 'HistoriesNLore', 'Weapon', 'House', 'PersonType', 'Religion', 'Season']

Relationships- ['SeenOrMentioned', 'Membership', 'Religion', 'Center', 'Location', 'Clergy', 'Allegiance', 'Leader', 'Founder', 'Predecessor', 'Death', 'Culture', 'Conflict', 'Place', 'Outcome', 'AssociatedLocation', 'Father', 'Mother', 'Spouse', 'Siblings', 'Battles', 'Rulers', 'Narratedby', 'Lovers', 'Successor', 'Children', 'Maker', 'Owner', 'Lord', 'Capital', 'Cities', 'Towns', 'Castles', 'Species', 'Range', 'Ruler', 'Population', 'Heir', 'Ancestralweapon', 'PlacesofNote', 'Formerly', 'Placesofnote', 'Military', 'Institutions', 'Villages', 'Placeoforigin', 'Formedfrom', 'Cadetbranches', 'Militarystrength', 'Premiere', 'Finale']

I have written a introductory blog about web scraping - https://codefringo.wordpress.com/2018/10/22/webcralwer-in-python/

Important files-

  1. spiders/GotTGraphSpider.py is the main spider used to scrape fandom wiki
  2. DataProcessor/ScrapyOutputProcessing.ipynb is the jupyter notebook that processes the scrapedOutput and generates tabular data for entities and creates graph in neo4j instance.

Run the whole project-

  1. Run command- "scrapy crawl GotGraphSpider -o Data/ScrapedData.json"
  2. Execute the jupyter notebook - DataProcessor/ScrapyOutputProcessing.ipynb.

gameofthrones's People

Contributors

vaibhavkankane avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Forkers

chitthushine

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.