Git Product home page Git Product logo

crawl-me's Introduction

crawl-me

中文README

Crawl-me is a light-weight fast plugin based web picture crawler. You can download your favorite pictures via the plugin if the website is supported. For now, the plugins include gamersky and pixiv. If you want to contribute, please just feel free to contact with me.

Fork me on Github :) https://github.com/nyankosama/crawl-me

Features

  • Crawl-me core supports muti-thread downloading using http range-headers, so it's very fast.
  • It's plugin based, so you can free add any plugin you want.

Available plugins

  • pixiv : This plugin allows you to download any author's paintings in pixiv site.
  • gamersky : This plugin supports downloading all pictures in special topic from gamersky site.

Installation

install via pip

Make sure you have already installed python2.7 and pip.

Due to the fact that package relies on lxml, if your platform is linux, please make sure you have installed lib libxslt-devel libxml2-devel. And for windows please select a suitable lxml installer to install.

And then:

$ pip install crawl-me

For windows, please add {$python-home}/Scripts/ to systempath

install via git

1. Ubuntu

Install the prerequisite library first:

sudo apt-get install libxml2-dev
sudo apt-get install libxslt1-dev 

And then you should install setuptools in order to run the setup.py file

sudo apt-get install python-setuptools

Finally, git clone the source, and install:

$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ sudo python setup.py install

2. Windows

Make sure you have already installed python2.7 and pip

You can install python2.7 via windows installer. You can install pip via downloading the get-pip.py, and run it via python:

python get-pip.py

And then install the prerequisite library lxml. please select a suitable lxml installer to install.

Finally git clone the source, and install:

$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ sudo python setup.py install

For windows, please add {$python-home}/Scripts/ to systempath

Usage

Examples

  1. Download 10 pages pictures at the url of http://www.gamersky.com/ent/201404/352055.shtml in gamersky site, and store the pictures into local direcotry.

     crawl-me gamersky http://www.gamersky.com/ent/201404/352055.shtml ./gamersky-crawl 1 10
    
  2. Download all the paintings of 藤原(Fujiwara, Pixiv ID=27517), and store them into local directory.

     crawl-me pixiv 27517 ./pixiv-crawl <your pixiv loginid> <your password>
    

Command line options

  1. general help

     $ crawl-me -h    
    
     usage: crawl-me [-h] plugin
    
     positional arguments:
         plugin      plugin the crawler uses
     
     optional arguments:
         -h, --help  show this help message and exit
    
     available plugins:
     ----gamersky
     ----pixiv
    
  2. gamersky

     $ crawl-me gamersky -h
     
     usage: crawl-me [-h] plugin url savePath beginPage endPage
    
     positional arguments:
         plugin      plugin the crawler uses
         url         your url to crawl
         savePath    the path where the imgs ars saved
         beginPage   the page where we start crawling
         endPage     the page where we end crawling 
     
     optional arguments:
         -h, --help  show this help message and exit
    
  3. pixiv

     $ crawl-me pixiv -h
    
     usage: crawl-me [-h] plugin authorId savePath pixivId password
    
     positional arguments:
         plugin      plugin the crawler uses
         authorId    the author id you want to crawl
         savePath    the path where the imgs ars saved
         pixivId     your pixiv login id
         password    your pixiv login password
     
     optional arguments:
         -h, --help  show this help message and exit
    

TODO

  • Functions:

    • support breakpoint resume
  • Plugins:

    • weibo
    • qq zone

Licenses

MIT

crawl-me's People

Contributors

nyankosama avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.