
cccatalog-dataviz's Introduction

⚠️ Notice: This project is on hold, and not under active development. We are not accepting new issues or pull requests. You can learn more at: Upcoming Changes to the CC Open Source Community — Creative Commons Open Source.

Visualize CC Catalog Data

About

The landscape of openly licensed content is wide and varied. Millions of web pages host and share CC-licensed works—in fact, we estimate that there are over 1.6 billion across the web! With this growth of CC-licensed works, Creative Commons (CC) is increasingly interested in learning how hosts and users of CC-licensed materials are connected, as well as the types of content published under a CC license and how this content is shared. Each month, CC uses Common Crawl data to find all domains that contain CC-licensed content. This dataset contains information about the URL of the websites and the licenses used.

In order to draw conclusions and insights from this dataset, we created the Linked Commons: a visualization that shows how the Commons is digitally connected.

A live demo of the project can be found here.

Getting Started

Directory Structure

src
│   README.md
│   docker-compose.yml # Development docker compose
│
└───GSoC2019
└───data-release # Contains some raw unprocessed tsv files and processed output JSON files
│
└───frontend # Contains react.js app to render the visualization in the browser.
│  │   .env # Contains Backend Server Base Endpoint
│  │   package.json
│  │   package.lock.json
│  │
│  └───src # Contains all React Components
│  
└───backend # Includes Django server source code and scripts to build & update the database. 
   │   requirements.txt
   │   .env # Contains list of environment variables the project needs
   │
   └───scripts # Contains scripts to parse JSON data and upload it to MongoDB server
   └───src # Contains server side Django Apps which defines the API that feeds data to the visualization 

Setting Up Local Development Environment Without Docker

Prerequisites

The frontend application uses React, which requires Node.js v12+ and npm. Node.js can be installed from its official website.

The backend application uses Django, which requires Python v3.7+. Python can be installed from its official website.

Frontend

  1. Navigate to the frontend/ directory.
cd frontend/
  2. Install all dependencies (make sure that a package.json exists in the current path).
npm install
  3. To start the development server, run the following command in the terminal.
npm start
  4. To create an optimized build for production, run the following command in the terminal.
npm run build

Backend and Database

  1. Navigate to the backend/ directory.
cd backend/
  2. Before proceeding further, ensure that all the variables in the .env file are updated and MONGO_HOSTNAME is set to localhost:27017.
  3. Install all dependencies.
pip install -r requirements.txt
  4. Navigate to the src/ directory, where the Django server code lives.
cd src/
  5. To start the development server, use the following command.
python manage.py runserver
  6. The backend should now be live at localhost:8000.
  7. The server needs a running instance of MongoDB. Start the MongoDB server and ensure that the authentication credentials are exactly the same as those defined in the .env file. If you wish to update the data inside the database, head over to the "Add data to MongoDB" section.
  8. Happy Contributing to Linked Commons! 🚀🚀🚀
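The MONGO_HOSTNAME check from the steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the project: the helper names `read_env` and `mongo_hostname_ok` are hypothetical, and the parser assumes the .env file holds plain KEY=VALUE lines.

```python
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def mongo_hostname_ok(env, expected="localhost:27017"):
    """True when MONGO_HOSTNAME matches the value this setup expects."""
    return env.get("MONGO_HOSTNAME") == expected
```

For example, `mongo_hostname_ok(read_env(".env"))` run from backend/ checks the value for the setup without Docker. For the Docker-based setups, pass `expected="mongodb:27017"` instead.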

Setting Up Local Development Environment using Docker

  1. Make sure that the root directory contains docker-compose.yml, and ensure that the backend/.env file is updated and MONGO_HOSTNAME is set to mongodb:27017.
  2. Run the following command to build and start the containers.
docker-compose up
  3. The frontend, backend and database should now be live.
  4. If this is the first time you have built the containers, head over to the "Add data to MongoDB" section to learn how to add data to MongoDB.
  5. Any changes in backend/ and frontend/ will trigger a rebuild, and you will be able to see the changes on the server!
  6. Happy Contributing to Linked Commons! 🚀🚀🚀

Building production version

Important: For simplicity, we will use Docker to build the production version. Please note that changes made to project files after the build are not reflected in the running container; you need to rebuild the image to pick them up.

  1. Before building images, ensure that all the variables in the .env file are updated and MONGO_HOSTNAME is set to mongodb:27017.
  2. Navigate to backend/ and build the django-backend image.
cd backend/
docker build . -f Dockerfile.prod -t linked_commons/backend
  3. Create a new user-defined bridge network.
docker network create --driver=bridge linkedcommons-net
  4. Run the recently built linked_commons/backend image.
docker run --name backend \
   -p 8000:8000 --env-file ./.env \
   --network=linkedcommons-net \
   --rm -d linked_commons/backend
  5. Start the database in an isolated container.
docker run -it --name mongodb \
   --network=linkedcommons-net \
   -p 27017:27017 -v mongodbdata:/data/db \
   --env-file ./.env --rm -d mongo:4.0.8
  6. You can now access the backend at port 8000 and the database at port 27017 of localhost. If you wish to add data, head over to the "Add data to MongoDB" section.

  7. Now, let's build the frontend. Navigate to the frontend directory and build the react-frontend image.

cd frontend
docker build . -f Dockerfile.prod -t linkedcommons/frontend
  8. To start the frontend application, run the following command.
docker run --name frontend \
   -p 3000:80 --rm -d linkedcommons/frontend
  9. The frontend can now be accessed at localhost:3000.
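Once the containers are running, a quick way to confirm that the published ports accept connections is a small TCP probe. The sketch below is illustrative only: `port_is_open` and `report` are hypothetical helper names, and the ports are the ones used in the commands above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(services, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Calling `report({"backend": 8000, "database": 27017, "frontend": 3000})` returns a dictionary of booleans. An open port only shows that something is listening there; it does not prove that the expected service is healthy.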

Add data to MongoDB

  1. Navigate to the directory containing build_db_script.py.
cd backend/scripts
  2. Ensure that the directory contains fdg_input_file.json, or update the INPUT_FILE_PATH variable to point to the file that will be uploaded to the database. A sample fdg_input_file.json can be found inside the data-release/ directory.
  3. Ensure that all the variables in the .env file match the running MongoDB server.
  4. Run build_db_script.py in the terminal.
# It will connect to the database at `localhost:27017` and update the data. 
python build_db_script.py localhost
  5. This may take a while, depending on the size of the JSON file.
  6. Congrats! You have successfully updated the data. 🎉🎉🎉
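Because the upload can take a while, it may be worth confirming that the input file exists and parses as JSON before running the script. The following is a minimal sketch under that assumption: `check_input_file` is a hypothetical helper, not part of build_db_script.py, and it does not validate the structure the script expects.

```python
import json
from pathlib import Path


def check_input_file(path="fdg_input_file.json"):
    """Return (ok, message) describing whether path is readable JSON."""
    p = Path(path)
    if not p.is_file():
        return False, f"{p} not found"
    try:
        with p.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except ValueError as exc:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return False, f"{p} is not valid JSON: {exc}"
    return True, f"{p} parsed as JSON ({type(data).__name__})"
```

Running `check_input_file()` from backend/scripts checks the default file name; pass another path if INPUT_FILE_PATH has been changed.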

Archive

GSoC2019 - Google Summer of Code project by María Belén Guaranda

cccatalog-dataviz's People

Contributors

bharatnischal, dependabot[bot], kgodey, mathemancer, mostafahamedabdelmasoud, pa-w, parth-paradkar, sclachar, soccerdroid, sp35, ssayima, subhamx, zackkrida
