POPONG Crawlers

Just some minor web crawlers.
Pull requests are always welcome.

License

Affero GPL v3.0

  • Required: License and copyright notice + State Changes + Disclose Source
  • Permitted: Commercial Use + Modification + Distribution
  • Forbidden: Hold Liable + Sublicensing

Descriptions

Production

bills

Get bill data from the National Assembly and structure it as JSON. (See attributes)

pip install -U celery-with-redis    # Install dependencies
cd bills
cp settings.py.sample settings.py   # Input data directory
python main.py
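
To illustrate what storing a bill as JSON could look like, here is a minimal sketch. It is not the code in `bills/`: the helper names `save_bill` and `load_bill`, the field names in the sample record, and the one-file-per-bill layout are all assumptions made for this example, and it targets Python 3.

```python
import json
import os
import tempfile


def save_bill(bill, datadir):
    """Write one bill record to <datadir>/<bill_id>.json and return the path."""
    os.makedirs(datadir, exist_ok=True)
    path = os.path.join(datadir, "{}.json".format(bill["bill_id"]))
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Korean text readable in the output file.
        json.dump(bill, f, ensure_ascii=False, indent=2)
    return path


def load_bill(path):
    """Read a bill record back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical record; the real attribute names are listed under "attributes".
example_bill = {
    "bill_id": "EXAMPLE-0001",
    "title": "예시 법률안",
    "proposers": ["홍길동"],
}
example_path = save_bill(example_bill, tempfile.mkdtemp())
```

Writing with `ensure_ascii=False` and an explicit UTF-8 encoding keeps Korean titles and names legible when the files are inspected by hand.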

commentable_bills

Get commentable bills from the National Assembly Legislative Pre-announcement (국회입법예고).

cd commentable_bills
python crawl.py     # open and set datadir first

committee_list

Get committee list data from the Committee Status page (위원회 현황).

cd committee_list
python get.py       # To get data files

election_commission

Get Korean politicians' data from Korea Election Commission (중앙선거관리위원회).
This data contains the list of all people who have run for office in the National Assembly.

cd election_commission
python main.py -h

glossary

Get and merge data for the POPONG Glossary from:

  • committee: Standing Committees and Special Committees (국회상임위원회 및 특별위원회)
  • likms: Integrated Legislation Knowledge Management System (입법통합지식관리시스템)
  • nas: National Assembly Secretariat (국회사무처)

python get.py       # To get source data files
python merge.py     # To create glossary.csv
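
The merge step can be pictured roughly as follows. This is a sketch only, not the logic in `merge.py`: the column names `korean` and `english`, the added `source` column, the de-duplication rule, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def merge_glossaries(sources):
    """Merge rows from several CSV texts, dropping duplicate term pairs.

    `sources` maps a source name to CSV text with the hypothetical
    columns `korean` and `english`. The first source to supply a pair wins.
    """
    seen = set()
    merged = []
    for name, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["korean"], row["english"])
            if key in seen:
                continue
            seen.add(key)
            merged.append(
                {"korean": row["korean"], "english": row["english"], "source": name}
            )
    return merged


def to_csv(rows):
    """Serialize merged rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["korean", "english", "source"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample input standing in for the downloaded source files.
sample_sources = {
    "committee": "korean,english\n법제사법위원회,Legislation and Judiciary Committee\n",
    "nas": (
        "korean,english\n"
        "법제사법위원회,Legislation and Judiciary Committee\n"
        "국회사무처,National Assembly Secretariat\n"
    ),
}
merged_rows = merge_glossaries(sample_sources)
```

Recording the source name on each row makes it possible to trace a glossary entry back to where it came from after the files are combined.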

google

Get Google search counts.

cd google
python ndocs.py

meetings

Get National Assembly meetings.

cd meetings
python crawl.py

meetings_calendar

Get National Assembly meetings calendar.

cd meetings_calendar
python get.py 2014-11-01 2014-11-11     # To get the meetings schedule from 2014-11-01 to 2014-11-11
python get.py 2014-11-01                # To get the meetings schedule for 2014-11-01
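
The two invocations above suggest that a single date is treated as a one-day range. A minimal sketch of that argument handling, assuming the end date is inclusive (the README does not say so) and using the hypothetical helper names `parse_day` and `days_to_fetch`, might look like this:

```python
from datetime import date, datetime, timedelta


def parse_day(text):
    """Parse a YYYY-MM-DD string into a date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def days_to_fetch(start, end=None):
    """Return every date from start to end, inclusive.

    With no end date, the result is just the start date.
    Treating the end date as inclusive is an assumption of this sketch.
    """
    first = parse_day(start)
    last = parse_day(end) if end else first
    if last < first:
        raise ValueError("end date precedes start date")
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]
```

A crawler could then loop over the returned dates and request one calendar page per day.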

national_assembly

Get member information from the Korean National Assembly.

pip install "Scrapy>=0.22.2"        # Quote the requirement so the shell does not treat > as a redirect
cd national_assembly
python crawl.py

naver_news

Get news articles for recent bills from Naver News.

pip install psycopg2 lxml
cd naver_news
cp settings.py.sample settings.py
vi settings.py                      # fill in values
python crawl.py

peoplepower

Get People Power 21 (열려라국회) webpages. (Currently broken)

cd peoplepower
scrapy crawl peoplepower21

pledges

Get pledges from NEC (선거관리위원회) for 19th National Assembly officials.

cd pledges
python crawler.py

rokps

Get Korean politicians' data from ROKPS (헌정회).

cd rokps
python crawler.py
python parser.py

wikipedia

Get Korean lastnames from Wikipedia.

cd wikipedia
python wiki_lastnames.py

Get Wikipedia links for assembly members.

cd wikipedia
python assembly_members.py

Metrics

twitter

Get Twitter follower lists for specified handles.

make twitter_setup
python twitter/followers.py

Contributors

e9t, cornchz, stray-leone, majorika, sanxiyn, hunkim, lexifdev, dongx3
