Comments (10)
Actually, I see something similar on my i3, except that the CPU doesn't reach 100%.
from mmjpg.
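I'm using Windows 10 + Python 3.5.2.
This crawler uses multiple processes, not multiple threads. The number of processes is set by your CPU count.
I tried multithreading, but it didn't work as well as multiprocessing. Because of the GIL, threads aren't a great fit for this kind of crawler download.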
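To make the process-pool setup concrete, here is a minimal sketch of sizing a pool by CPU count, in the spirit of the crawler's `Pool(processes=cpu_count())`. `fake_download` and `crawl_with_pool` are hypothetical names, and the "download" only derives a file name from the URL, with no network access. The `if __name__ == "__main__":` guard matters on Windows: child processes are started with the spawn method there, and the multiprocessing docs require the main module to be safely importable in that case.

```python
from multiprocessing import Pool, cpu_count


def fake_download(url):
    """Hypothetical stand-in for the real download step: returns a fake file name."""
    return url.rsplit("/", 1)[-1] + ".jpg"


def crawl_with_pool(urls, processes=None):
    """Map the worker over urls, using one process per CPU core by default."""
    processes = processes or cpu_count()
    with Pool(processes=processes) as pool:
        # pool.map blocks until every URL is handled and keeps the input order
        return pool.map(fake_download, urls)


if __name__ == "__main__":
    # Required with the spawn start method (the default on Windows):
    # children re-import this module and must not start new pools themselves
    urls = ["http://example.com/1", "http://example.com/2", "http://example.com/3"]
    print(crawl_with_pool(urls))
```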
Official introduction to the multiprocessing module:
https://docs.python.org/2/library/multiprocessing.html#introduction
It says "New in version 2.6.", so your Python 2.7 should be fine.
The CPU won't reach 100%, and you don't need to create e:/mmjpg yourself.
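The thread doesn't show how the script creates its download folders, so this is only a minimal sketch under the assumption that it uses `os.makedirs`. That call creates any missing parent directories too, which is why a base folder such as e:/mmjpg would not need to exist beforehand. `ensure_dir` is a hypothetical helper name, and `exist_ok` requires Python 3.2+.

```python
import os
import tempfile


def ensure_dir(path):
    """Hypothetical helper: create path and any missing parents; no error if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # throwaway base folder for the demo
    target = os.path.join(base, "mmjpg", "album-1")
    ensure_dir(target)
    ensure_dir(target)  # calling it again is harmless
    print(os.path.isdir(target))
```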
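Alternatively, first change processes in pool = Pool(processes=cpu_count()) to 1 and see whether it runs with a single process.
If that still doesn't work, create the processes with multiprocessing.Process() instead. That requires changing a bit of the code.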
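The single-process check suggested above can be sketched as two steps: run the worker through the built-in `map` (no extra processes), then through `Pool(processes=1)`. If the serial run works but the pool run fails or hangs, the problem lies in starting or talking to child processes rather than in the crawler logic. `fake_download`, `serial_check` and `single_process_pool_check` are hypothetical names; `fake_download` stands in for `urls_crawler(url)`.

```python
from multiprocessing import Pool


def fake_download(url):
    """Hypothetical stand-in for urls_crawler(url): returns the last URL segment."""
    return url.rsplit("/", 1)[-1]


def serial_check(urls):
    # No extra processes: if this fails, the bug is in the crawler logic itself
    return list(map(fake_download, urls))


def single_process_pool_check(urls):
    # Same work through a one-worker pool: if only this fails or hangs,
    # the problem is in creating or communicating with the child process
    with Pool(processes=1) as pool:
        return pool.map(fake_download, urls)


if __name__ == "__main__":
    urls = ["http://example.com/a", "http://example.com/b"]
    print(serial_check(urls))
    print(single_process_pool_check(urls))
```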
processes=1 behaves the same. Interestingly, it doesn't even create the folders.
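I'm not really sure about this one either. Maybe try Python 3, since that's what I tested it under.
Otherwise, rewrite the multiprocessing part of the code using a different way of creating processes.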
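I tried the latest 3.6 and it still doesn't work. The folders aren't created either; it just sits at "Please start your performance!". The script doesn't print any detailed information, so I can't tell which step it dies at.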
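If it prints "Please start your performance!" but doesn't do anything after that, the problem is in the urls_crawler(url) method. Try adding print statements between the lines of that method to find exactly which line it stops at. From your description alone I can't pin down the problem.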
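A minimal sketch of the print-checkpoint approach described above. `urls_crawler_traced` is a hypothetical instrumented worker, and its folder-name step is an assumption for illustration only. Printing the process ID shows which process reached which checkpoint, and `flush=True` writes each line out immediately instead of leaving it in a buffer, which matters when the output comes from a child process that may hang or be killed.

```python
import os


def urls_crawler_traced(url):
    """Hypothetical instrumented per-URL worker with numbered checkpoints."""
    pid = os.getpid()
    print("[{}] checkpoint 1: got {}".format(pid, url), flush=True)
    # Hypothetical step: derive a folder name from the last URL segment
    folder = url.rstrip("/").rsplit("/", 1)[-1]
    print("[{}] checkpoint 2: folder {}".format(pid, folder), flush=True)
    # ... the real request and save steps would follow, each with its own checkpoint ...
    print("[{}] checkpoint 3: done".format(pid), flush=True)
    return folder


if __name__ == "__main__":
    urls_crawler_traced("http://example.com/abc/")
```

The last checkpoint that appears in the output brackets the line that never completes.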
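I added print markers and found that execution never gets past the line below, so it's probably still a problem with creating the processes: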
results = pool.map(urls_crawler, urls)
Skipping the process pool and calling the function directly:
urls_crawler(urls[1])
#results = pool.map(urls_crawler, urls)
the download works.
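When `pool.map` hangs silently like this, one standard-library option is to use `map_async` and wait on the result with a timeout, so the hang becomes a visible error. The sketch below assumes this is only a debugging aid. `fake_download` and `map_with_timeout` are hypothetical names, and the 30-second default is an arbitrary choice.

```python
import multiprocessing
from multiprocessing import Pool


def fake_download(url):
    """Hypothetical stand-in for urls_crawler(url): returns the last URL segment."""
    return url.rsplit("/", 1)[-1]


def map_with_timeout(func, items, processes=2, timeout=30):
    """Like pool.map, but raise an error instead of waiting forever."""
    with Pool(processes=processes) as pool:
        async_result = pool.map_async(func, items)
        try:
            return async_result.get(timeout=timeout)
        except multiprocessing.TimeoutError:
            # Leaving the with-block calls pool.terminate(), killing stuck workers
            raise RuntimeError(
                "pool.map did not finish within {} seconds".format(timeout)
            )


if __name__ == "__main__":
    urls = ["http://example.com/x", "http://example.com/y"]
    print(map_with_timeout(fake_download, urls))
```

A timeout only reports the hang; finding its cause still requires the print checkpoints or the serial check.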
Try a different way of writing the multiprocessing part, a different approach:
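try:
    process = []
    delete_empty_dir(dir_path)
    # results = pool.map(urls_crawler, urls)
    n = cpu_count()
    for i in range(n):
        # Give each process its own slice of urls so no page is downloaded twice
        p = multiprocessing.Process(target=urls_crawler, args=(urls[i::n],))  # create the process
        p.start()  # start the process
        process.append(p)  # add it to the list
    for p in process:
        p.join()  # wait for the process to finish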
Then change the urls_crawler method so it takes a list of URLs:
def urls_crawler(urls):
    """ Crawler entry point: does the main crawling work """
    for url in urls:
        try:
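A self-contained sketch of the `multiprocessing.Process` approach above, under stated assumptions. Each process gets its own round-robin slice of the URL list (`urls[i::n]`), since handing the full list to every process would make each one download everything. `fake_download`, `split_urls` and `run_crawlers` are hypothetical names, and this `urls_crawler` body is illustrative rather than the project's real one.

```python
import multiprocessing
from multiprocessing import cpu_count


def split_urls(urls, n):
    """Deal URLs round-robin into n chunks so no URL is handled twice."""
    return [urls[i::n] for i in range(n)]


def fake_download(url):
    """Hypothetical stand-in for the real per-URL download step."""
    print("downloading", url, flush=True)


def urls_crawler(urls):
    """Illustrative worker: handles every URL in its chunk, one after another."""
    for url in urls:
        try:
            fake_download(url)
        except Exception as exc:  # keep going if one URL fails
            print("failed", url, exc, flush=True)


def run_crawlers(urls):
    processes = []
    for chunk in split_urls(urls, cpu_count()):
        if not chunk:
            continue  # fewer URLs than cores: skip empty chunks
        p = multiprocessing.Process(target=urls_crawler, args=(chunk,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()  # wait for every worker to finish


if __name__ == "__main__":
    run_crawlers(["http://example.com/{}".format(i) for i in range(8)])
```

Unlike `pool.map`, plain `Process` objects don't return results, so this design suits workers that write their output (downloaded files) to disk themselves.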
Thanks, this approach works.
Since the problem is solved, I'll close this issue.