MARADMIN WEB CRAWLER IOT PROVIDE A BETTER SEARCH FUNCTION

The documentation below covers how to set the project up from scratch. Further below is how to just clone it and run it. The website being crawled is https://www.marines.mil/News/Messages/MARADMINS/

Methodology of building the application

Scrapy
Django API backend
Celery worker to automate scraping and inserting new MARADMINs into the database
React frontend

VIRTUAL ENV

Use Bash, either on Windows or natively on Unix machines.
python -m venv mybash
Activate the virtual environment in Bash.
1. On Windows, using Bash: source mybash/bin/activate (a venv created by a native Windows Python places the script at mybash/Scripts/activate instead)
python -m pip install --upgrade pip
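
To confirm that activation worked before installing packages, a small check like the following can help. It is only a sketch; the function name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the same python that was used for pip. If it prints False, the pip install commands below would go into the system interpreter instead of mybash.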

These are the packages installed. They will all roll up into a requirements.txt later.

pip install Scrapy

SCRAPY

Create a new project: scrapy startproject maradmin_scrapy_project
Add CSV export to the settings
Create a spider called maradminspider under the spiders folder
Trial 1: it scrapes just the basic information and not the body. The initial scrape is 50 pages.
Crawl by running scrapy crawl maradminspider inside the root directory of maradmin_scrapy_project
The scraped data is inserted straight into the database via django model

DJANGO

pip install django
django-admin.exe startproject backend
Update settings.py
python manage.py migrate
python manage.py runserver (to test that it runs)
python manage.py startapp search_api
Add search_api to INSTALLED_APPS in settings
Create the models based on the scraped data
python manage.py makemigrations search_api
python manage.py migrate search_api
Set up the admin for testing purposes only
1. python manage.py createsuperuser
Set up the URL routing from the project to the app
Create a bulk insert manager and run it with a Django command
python manage.py maradmin_uploader

Django Rest Framework

pip install djangorestframework
pip install django-filter
pip install markdown
Add rest_framework to INSTALLED_APPS in settings
Create serializer.py
Create a view to display the serialized objects
Update the URLs to route to the view with a SimpleRouter

Pagination and SearchFilter

Update the settings and the view to enable pagination and SearchFilter
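
On the settings side, this step might look like the fragment below. The page size of 25 is an arbitrary assumption, and the search fields named in the comment are hypothetical.

```python
# backend/settings.py (fragment)
REST_FRAMEWORK = {
    # Page-number pagination: clients request ?page=2, ?page=3, ...
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    "PAGE_SIZE": 25,
    # Enables the ?search=<term> query parameter on views that set search_fields.
    "DEFAULT_FILTER_BACKENDS": ["rest_framework.filters.SearchFilter"],
}

# On the view, declare which model fields are searched, for example:
#     search_fields = ["number", "title"]
```

With both in place, a request ending in ?search=leave&page=2 would return the second page of matching results.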

Integrated Scrapy into Django

pip install scrapy-djangoitem
Move maradmin_scrapy_project to the same level as search_api
Update management/commands/maradmin_uploader.py
Update the maradmin_scrapy_project files (add apps.py, and update the items, pipelines, and settings) to integrate the Django models
Run scrapy crawl maradminspider inside maradmin_scrapy_project to scrape and save into the Django database
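
The pipeline part of this integration can be quite small. The sketch below assumes the items are scrapy-djangoitem DjangoItem subclasses, which provide a save() method that writes through the Django model. The class name `MaradminPipeline` and the priority 300 are illustrative.

```python
# maradmin_scrapy_project/pipelines.py (sketch)
class MaradminPipeline:
    """Persist each scraped item through the Django ORM."""

    def process_item(self, item, spider):
        # DjangoItem.save() builds the underlying model instance and saves it.
        item.save()
        # Return the item so any later pipelines still receive it.
        return item


# maradmin_scrapy_project/settings.py (fragment): enable the pipeline.
ITEM_PIPELINES = {
    "maradmin_scrapy_project.pipelines.MaradminPipeline": 300,
}
```

For save() to reach the database, the Scrapy settings also need to point at the Django settings module and call django.setup() before the models are imported.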

Testing

pip install coverage
Create a tests folder
Add test_models and test_view
coverage run manage.py test (coverage run needs a command to measure; manage.py test is assumed here)
coverage html
coverage report

Celery Worker - IW (PAUSE) - workaround is to establish a cron job later on to automate it. Twisted error wins...

This automates the scraping and uploading into the database so that it stays up to date.
pip install celery for the worker and pip install redis for the broker
Create celery.py inside backend/backend
Add the Celery settings at the bottom of settings
Add Celery to backend/backend/__init__.py to ensure it is loaded every time Django starts up
Create tasks.py with a simple task inside the search_api directory
pip install crochet to handle Twisted errors. See the Stack Overflow reference on ReactorNotRestartable.
pause celery worker
Install redis-server
1. sudo apt-get install redis-server
2. sudo service redis-server restart
Run the Redis server in a separate terminal
1. redis-server
Run the Celery worker in a separate terminal
1. celery -A backend worker -l info
2. To test Celery beat: celery -A backend beat -l info
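
One way around ReactorNotRestartable that does not depend on crochet is to run each crawl in a fresh child process, so every run gets its own Twisted reactor. The sketch below is that alternative, not what this project currently does; the function names and the one-hour timeout are assumptions.

```python
import subprocess
import sys
from pathlib import Path


def build_crawl_command(spider="maradminspider"):
    """Build the command line for one crawl."""
    # "python -m scrapy" uses the interpreter of the active virtual environment.
    return [sys.executable, "-m", "scrapy", "crawl", spider]


def run_crawl(project_dir, spider="maradminspider", timeout=3600):
    """Run one crawl in a child process and return its exit code."""
    result = subprocess.run(
        build_crawl_command(spider),
        cwd=Path(project_dir),  # directory that contains scrapy.cfg
        timeout=timeout,
        check=False,
    )
    return result.returncode
```

A cron job or a Celery task could call `run_crawl("maradmin_scrapy_project")` on a schedule. Because the reactor lives and dies with the child process, repeated runs never try to restart it.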

phansiri / maradmin-search
