Comments (7)

dataabc commented on June 7, 2024

Thanks for the report. Could you share the user_id that fails, so I can debug it? Thanks.

from weibo-crawler.

Nathern001 commented on June 7, 2024

I ran into this problem too. The user_id is 1877809031.

Nathern001 commented on June 7, 2024

Also this user_id: 2214855827. The problem starts appearing once the crawl reaches page 7.

Nathern001 commented on June 7, 2024

And also 1650713582.

dataabc commented on June 7, 2024

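@469698742 @Nathern001
I tested these. All of them except the last one can be crawled, so it looks like the requests were too fast and got rate-limited. The last one shows several thousand posts, yet its profile page shows none, so the user seems to have restricted it themselves.
There are roughly two ways to fix this: add a cookie, or slow down the crawl. If you still have problems after adding a cookie, you have probably been rate-limited; the restriction lifts automatically after a while.
To slow down the crawl, modify the following code in the get_pages method: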

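                # After every `random_pages` pages (but not at the last page), pause
                if page - page1 == random_pages and page < page_count:
                    sleep(random.randint(6, 10))  # pause for 6-10 seconds
                    page1 = page  # page at which the most recent pause happened
                    random_pages = random.randint(1, 5)  # pages until the next pause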

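This code makes the program pause for a random 6 to 10 seconds after every 1 to 5 pages; these are the program's defaults. To slow down, you can pause more often (for example, every 1 to 3 pages) or pause longer each time (for example, 10 to 15 seconds). Adjust these to your needs: the slower you crawl, the lower the chance of being rate-limited, but the slower the crawl becomes, so you have to weigh the trade-off.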
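The throttling rule above can be pulled out into a standalone function so that both knobs (pages between pauses, pause length) become parameters instead of hard-coded values. This is a minimal sketch, not weibo-crawler's actual code: `crawl_pages` and the `fetch_page` callback are hypothetical names, and `sleep`/`rng` are injectable only so the pause schedule can be checked without actually waiting.

```python
import random
import time
from typing import Callable, List, Optional, Tuple


def crawl_pages(
    page_count: int,
    fetch_page: Callable[[int], None],
    pages_between_pauses: Tuple[int, int] = (1, 5),
    pause_seconds: Tuple[int, int] = (6, 10),
    sleep: Optional[Callable[[float], None]] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Fetch pages 1..page_count, pausing after each random-sized batch.

    Hypothetical helper mirroring the get_pages logic above; returns the
    page numbers after which a pause was taken.
    """
    if rng is None:
        rng = random.Random()
    if sleep is None:
        sleep = time.sleep
    paused_after: List[int] = []
    last_pause_page = 0  # plays the role of page1
    batch = rng.randint(*pages_between_pauses)  # plays the role of random_pages
    for page in range(1, page_count + 1):
        fetch_page(page)  # hypothetical callback that downloads one page
        # Pause once `batch` pages have passed since the last pause,
        # but not after the final page.
        if page - last_pause_page == batch and page < page_count:
            sleep(rng.randint(*pause_seconds))
            paused_after.append(page)
            last_pause_page = page
            batch = rng.randint(*pages_between_pauses)
    return paused_after
```

With the slower settings suggested above, a call would look like `crawl_pages(page_count, fetch_page, pages_between_pauses=(1, 3), pause_seconds=(10, 15))`.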

469698742 commented on June 7, 2024

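My own use case is crawling user information, so I set it to 0 days. I found that for users who hadn't posted anything, the crawl ran too fast, so I commented out self.get_pages() and added a pause, and I no longer got rate-limited: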

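    def start(self):
        """Run the crawler."""
        try:
            for user_id in self.user_id_list:
                self.initialize_info(user_id)
                self.get_user_info()
                # self.get_pages()  # disabled: only user profiles are needed
                print(u'Information crawl complete')
                print('*' * 100)
                # Pause 6-10 seconds between users to avoid being rate-limited
                sleep(random.randint(6, 10))
        except Exception as e:
            # The excerpt omitted the except clause; a minimal handler is assumed here
            print('Error: ', e)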
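A per-user version of the same idea, for crawls that only need profile information, might look like the following. It is a sketch under assumptions: `crawl_profiles` and `fetch_profile` are hypothetical stand-ins for the `initialize_info`/`get_user_info` calls, and unlike the snippet above it skips the pause after the last user, since no request follows it.

```python
import random
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def crawl_profiles(
    user_ids: Iterable[str],
    fetch_profile: Callable[[str], Dict],
    pause_seconds: Tuple[int, int] = (6, 10),
    sleep: Optional[Callable[[float], None]] = None,
    rng: Optional[random.Random] = None,
) -> List[Dict]:
    """Fetch each user's profile, pausing a random number of seconds between users."""
    if rng is None:
        rng = random.Random()
    if sleep is None:
        sleep = time.sleep
    ids = list(user_ids)
    profiles: List[Dict] = []
    for i, user_id in enumerate(ids):
        profiles.append(fetch_profile(user_id))  # hypothetical profile fetcher
        # Pause only between users; nothing is fetched after the last one.
        if i < len(ids) - 1:
            sleep(rng.randint(*pause_seconds))
    return profiles
```

Keeping the pause inside the per-user loop, rather than per page, matches the approach described in the comment: when pages are not fetched at all, the user loop is the only place requests can pile up.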

dataabc commented on June 7, 2024

@469698742
Thanks for the feedback; this is very helpful.
