Comments (10)

ShowerXu commented on August 11, 2024

In fact, I see similar behavior on my i3 machine, except that the CPU does not reach 100%.

from mmjpg.

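chenjiandongx commented on August 11, 2024

I am running Windows 10 with Python 3.5.2.

This crawler uses multiple processes, not multiple threads, and the number of processes depends on your CPU count.
I tried multithreading, but it did not perform as well as multiprocessing. Because of the GIL, multithreading was not a good fit for this kind of download crawler.

Official introduction to the multiprocessing module:
https://docs.python.org/2/library/multiprocessing.html#introduction
It is marked "New in version 2.6", so your Python 2.7 should be fine.

The CPU will not go to 100%, and you do not actually need to create e:/mmjpg yourself.
Alternatively, try changing processes in pool = Pool(processes=cpu_count()) to 1 and see whether a single process runs.
If that still fails, switch to multiprocessing.Process() to create the processes, which means modifying the code a little.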
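
One thing that may be worth ruling out on Windows: child processes there are started by re-importing the main script, so the Python documentation asks that pool creation sit under an `if __name__ == "__main__":` guard. Whether that is the cause here is not established in this thread. A minimal sketch, assuming a placeholder `urls_crawler` and made-up URLs (the `crawl` helper and its `processes=0` bypass are hypothetical, not part of mmjpg):

```python
from multiprocessing import Pool, cpu_count


def urls_crawler(url):
    """Placeholder worker; the real one downloads the images behind url."""
    return "crawled " + url


def crawl(urls, processes=None):
    """Run urls_crawler over urls.

    processes=0 bypasses the pool and runs in the current process,
    which helps tell a crawler bug apart from a process-creation problem.
    """
    if processes == 0:
        return [urls_crawler(u) for u in urls]
    with Pool(processes=processes or cpu_count()) as pool:
        return pool.map(urls_crawler, urls)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    urls = ["http://example.com/1", "http://example.com/2"]
    print(crawl(urls, processes=1))
```

If `crawl(urls, processes=0)` works but `processes=1` does not, the problem is more likely in how the worker processes are started than in the crawling code itself.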

ShowerXu commented on August 11, 2024

Same result with processes=1. What is interesting is that it does not even create the folder.

chenjiandongx commented on August 11, 2024

I am not sure what is causing this either. You could try Python 3, since that is what I tested with.
Otherwise, modify the multiprocessing part of the code and use a different process module.

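ShowerXu commented on August 11, 2024

I tried the latest 3.6 and it still does not work. The folder is not created either, and the script just sits at "Please start your performance!". It prints no detailed information, so I cannot tell at which step it hangs.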

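chenjiandongx commented on August 11, 2024

If it prints "Please start your performance!" but then does nothing further, the problem is in the urls_crawler(url) method. Try adding print statements inside that method to find the last line that actually runs. From your description alone I cannot pin down where the problem is.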
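
One way to add such markers is sketched below, using the standard `logging` module so that each line also shows which process emitted it. This is only an illustration: `find_image_links` is a hypothetical stand-in for the real parsing step, and the URL is made up.

```python
import logging

# %(processName)s shows which process wrote each line.
logging.basicConfig(level=logging.INFO,
                    format="%(processName)s: %(message)s")
log = logging.getLogger("mmjpg")


def find_image_links(url):
    """Hypothetical stand-in for the real page-parsing step."""
    return [url + "/1.jpg", url + "/2.jpg"]


def urls_crawler(url):
    """Crawler entry point with a marker before and after each step."""
    log.info("start %s", url)
    links = find_image_links(url)
    log.info("parsed %d image links", len(links))
    for link in links:
        log.info("would download %s", link)
    log.info("done %s", url)
    return links


if __name__ == "__main__":
    urls_crawler("http://example.com/mm")
```

If not even the "start" line appears, the worker function was never entered, which points at process creation rather than at the crawling code.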

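ShowerXu commented on August 11, 2024

I added print markers and found that the following line never executes, so it still looks like a problem with creating the processes:
results = pool.map(urls_crawler, urls)
Without the process pool:
urls_crawler(urls[1])
#results = pool.map(urls_crawler, urls)
the download succeeds.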
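
A call to `pool.map` that never returns can be turned into a visible failure with `map_async(...).get(timeout=...)`, which raises `multiprocessing.TimeoutError` once the timeout expires. The sketch below assumes a hypothetical helper name, `map_with_timeout`, and uses the built-in `abs` as a stand-in worker. It only reports the hang and does not explain it.

```python
import multiprocessing


def map_with_timeout(pool, func, items, timeout):
    """Like pool.map, but give up after timeout seconds instead of blocking."""
    try:
        return pool.map_async(func, items).get(timeout=timeout)
    except multiprocessing.TimeoutError:
        print("pool.map did not finish within %s seconds" % timeout)
        return None


if __name__ == "__main__":
    with multiprocessing.Pool(processes=1) as pool:
        # abs stands in for urls_crawler; the inputs are made up.
        print(map_with_timeout(pool, abs, [-1, 2, -3], timeout=30))
```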

chenjiandongx commented on August 11, 2024

Try a different way of creating the processes:

try:
    process = []
    delete_empty_dir(dir_path)
    # results = pool.map(urls_crawler, urls)
    for i in range(cpu_count()):
        p = multiprocessing.Process(target=urls_crawler, args=(urls,))  # create the process
        p.start()           # start the process
        process.append(p)   # keep track of the process

    for p in process:
        p.join()  # wait for the process to finish

Then change the urls_crawler(urls) method to:

def urls_crawler(urls):
    """ Crawler entry point; performs the main crawling work """
    for url in urls:
        try:
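
Note that the snippet above hands the full urls list to every process, so each worker would visit every URL. A possible variation, sketched under assumptions, gives each process its own slice instead. `split_urls` and `run` are hypothetical helper names, the worker is a placeholder, and the URLs are made up.

```python
from multiprocessing import Process, cpu_count


def split_urls(urls, n):
    """Deal urls into n interleaved slices so no URL is handled twice."""
    return [urls[i::n] for i in range(n)]


def urls_crawler(urls):
    """Placeholder worker; the real one would download each url."""
    for url in urls:
        print("crawling", url)


def run(urls, n):
    """Start one process per non-empty slice and wait for all of them."""
    workers = [Process(target=urls_crawler, args=(chunk,))
               for chunk in split_urls(urls, n) if chunk]
    for p in workers:
        p.start()
    for p in workers:
        p.join()


if __name__ == "__main__":
    run(["http://example.com/%d" % i for i in range(5)], cpu_count())
```

Interleaved slicing (`urls[i::n]`) keeps the slices within one item of each other in length without any index arithmetic.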

ShowerXu commented on August 11, 2024

Thanks, this approach works.

chenjiandongx commented on August 11, 2024

Since the problem is solved, I am closing this issue.
