mmjpg's Introduction

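Model Photo Set Crawler (Part 1)

Target site: http://www.mmjpg.com

Writing code is an art: it comes from life and it serves life.

What do you do when you want to look at pictures of girls? You search online, of course, so I ran it through a certain search engine.

This site ranked first, so it had to be a big name. I choose you!
Just skimming through it did not feel like enough, so I decided to crawl every photo set on the site and enjoy them at leisure later.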

Just do it

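With a cup of coffee, an electronic playlist on NetEase Cloud Music, and a flurry of typing, the code was done. Tested, no problems, off it goes!

Before I knew it, every photo set had been downloaded.

The whole site: 950 photo sets, 3.86 GB in total.

The crawler uses multiple processes and came close to saturating my school's 8M connection.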
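For readers curious what "multiple processes" looks like in practice, here is a minimal sketch. It assumes Python 3, and the names `ALBUM_URL`, `build_album_urls`, `crawl_album`, and `crawl_all` are made up for illustration. The album URL pattern is a guess too, since the README only names the site. It is not the repository's actual code.

```python
import multiprocessing

# Assumed URL pattern; the README only names the site, not its album URLs.
ALBUM_URL = "http://www.mmjpg.com/mm/{}"


def build_album_urls(first, last):
    """Return album page URLs for IDs first..last (inclusive)."""
    return [ALBUM_URL.format(album_id) for album_id in range(first, last + 1)]


def crawl_album(url):
    """Hypothetical worker. A real one would fetch the page and save its
    images; this placeholder only extracts the album ID so the sketch
    runs without network access."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def crawl_all(urls, processes=None):
    """Fan the URLs out over a pool of worker processes.

    processes=None lets Pool pick a worker count based on the CPU count.
    Leaving the with-block terminates the workers.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(crawl_album, urls)
```

On Windows, `crawl_all(...)` should be called from under an `if __name__ == "__main__":` guard, because child processes re-import the main module there.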

Forks and stars are welcome.

mmjpg's People

Contributors

80000v, chenjiandongx

mmjpg's Issues

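Hmm, this error shows up while crawling

("Connection broken: ConnectionResetError(10054, '远程主机强迫关闭了一个现有的连接。', None, 10054, None)", ConnectionResetError(10054, '远程主机强迫关闭了一个现有的连接。', None, 10054, None))

(The Chinese text is the Windows message for error 10054: "An existing connection was forcibly closed by the remote host.")

After the error occurs, the downloaded image numbers are no longer consecutive. Any advice would be appreciated.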
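One common way to cope with a connection reset is to retry the same request after a short pause instead of moving on to the next image, which may also help with gaps in the numbering. The sketch below is an assumption about how that could be done, not a fix taken from the repository. `with_retries` is a hypothetical helper, and it works on any callable so it does not depend on which HTTP library the script uses.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and return its result.

    On an OSError (ConnectionResetError is a subclass), wait and try
    again, doubling the delay each time. The last failure is re-raised.
    Note: OSError is broad and also covers file errors; narrow it if
    that matters for your script.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

If the script happens to use requests, a call might look like `with_retries(lambda: requests.get(url, timeout=10).content)`. Exceptions from requests derive from `IOError`, which is `OSError` on Python 3, so they would be retried too.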

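Stuck at "Please start your performance!"

Windows 7 with Python 2.7.13, all required modules installed, but it keeps printing "Please start your performance!" and downloads nothing.
If I create e:/mmjpg myself, the script deletes it but never creates it again.

It is probably a multithreading problem. My download box has an N2820, a dual-core CPU that does not support multithreading. After I close the script, the CPU stays fully loaded until I reboot.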

Use of Lock

The program uses the multiprocessing module and creates its worker processes with a Pool. Why does the mutex come from the threading module rather than multiprocessing.Lock()?
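Some background on the question: a `threading.Lock` only coordinates threads inside one process, and each Pool worker ends up with its own independent copy, so it cannot provide mutual exclusion across workers. A lock meant to be shared between processes is usually created with `multiprocessing.Lock()` and handed to the workers through the Pool's `initializer`. The sketch below shows that pattern under the assumption of Python 3. `init_worker`, `append_line`, and `write_all` are hypothetical names, not functions from the repository.

```python
import multiprocessing

_lock = None  # set in each worker process by init_worker


def init_worker(lock):
    """Pool initializer: store the shared lock in a module-level global."""
    global _lock
    _lock = lock


def append_line(args):
    """Append one line to a shared file while holding the cross-process lock."""
    path, line = args
    with _lock:
        with open(path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
    return line


def write_all(path, lines, processes=2):
    """Write all lines from a pool of workers that share one lock."""
    lock = multiprocessing.Lock()
    with multiprocessing.Pool(
        processes, initializer=init_worker, initargs=(lock,)
    ) as pool:
        return pool.map(append_line, [(path, line) for line in lines])
```

The lock goes through `initargs` rather than through the `map` arguments because a `multiprocessing.Lock` can only be shared with child processes at the moment they are created.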

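Folder is never created

I'm new to Python (version 3.6.5, Windows 10). While the script runs, the shell keeps showing "Please start your performance!" and no folder is created. I read the closed issue and tried what it suggested, but that did not help. What could the problem be?

Only named, empty folders are produced, and the program ends with an error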

Traceback (most recent call last):
  File "E:\tools\mmjpg-master\mm_crawler.py", line 94, in <module>
    pool.map(urls_crawler, urls)
  File "C:\Python27\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Python27\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
IOError: [Errno 9] Bad file descriptor

The error output is shown above.
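The traceback shows `pool.map` re-raising, in the parent process, an exception that happened inside a worker, which ends the whole run. One way to keep the other albums going and see which URL failed is to catch exceptions inside the worker and return them as data. This is only a sketch of that idea, assuming Python 3. `safe_call` and `crawl_collecting_errors` are hypothetical helpers, and the sketch does not explain why the "Bad file descriptor" error occurs in the first place.

```python
import functools
import multiprocessing


def safe_call(worker, url):
    """Run worker(url); return (url, result, None) on success or
    (url, None, error text) if the worker raised."""
    try:
        return url, worker(url), None
    except Exception as exc:
        return url, None, "{}: {}".format(type(exc).__name__, exc)


def crawl_collecting_errors(worker, urls, processes=None):
    """Map worker over urls in a process pool without aborting on errors.

    worker must be a top-level function so it can be pickled.
    Returns (succeeded, failed) lists of (url, result-or-error) pairs.
    """
    job = functools.partial(safe_call, worker)
    with multiprocessing.Pool(processes=processes) as pool:
        outcomes = pool.map(job, urls)
    succeeded = [(url, res) for url, res, err in outcomes if err is None]
    failed = [(url, err) for url, _, err in outcomes if err is not None]
    return succeeded, failed
```

The `failed` list can then be printed or fed back in for a second pass.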
