Comments (3)

dataabc commented on June 6, 2024

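Setting a time range is not currently supported, but you can get the same result by crawling a range of pages. Specifically, modify the following line in get_weibo_info: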

for page in tqdm(range(1, page_num+1), desc='Progress'):

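Here page_num is the total number of Weibo pages. The loop fetches posts from page 1 through the last page, keeping those that satisfy since_date. You can manually look up which pages cover the period you want, then replace the 1 with the start page and page_num with the end page (the +1 stays, because range excludes its upper bound). The program will then fetch only the posts that lie between the start and end pages and also satisfy since_date, which indirectly gives you the posts from that period.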
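As a rough sketch of that change, the helper below builds the page range to loop over. `page_range`, `start_page` and `end_page` are hypothetical names used only for illustration; they are not part of weibo-crawler, and the page numbers are made up. tqdm is left out so the snippet runs on its own.

```python
def page_range(start_page, end_page, page_num):
    """Return the pages to crawl, clamped to the valid interval [1, page_num]."""
    first = max(1, start_page)
    last = min(end_page, page_num)
    return range(first, last + 1)  # empty when first > last


# Hypothetical values: the account has 120 pages, and the posts of interest
# were found by hand on pages 35 to 42.
pages = page_range(start_page=35, end_page=42, page_num=120)

# Inside get_weibo_info the loop would then read
# (tqdm only draws the progress bar):
#     for page in tqdm(pages, desc='Progress'):
for page in pages:
    print(f"would fetch page {page}")
```

Clamping to page_num means an end page that is too large simply stops at the last existing page instead of requesting pages that do not exist.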

from weibo-crawler.

LuciMessiah commented on June 6, 2024

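After making this change, the crawl returned nothing. Alternatively, would you consider this approach: start downloading from the oldest posts, that is, crawl starting at since_date? Would that be easier to implement? I'd appreciate it if you could give it a try.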

dataabc commented on June 6, 2024

Thanks for the feedback and the suggestion.
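The empty result is probably because the crawl ran too fast and got rate-limited; the limit lifts automatically after a while. By default the program waits at random intervals to avoid this: after every 1 to 5 pages it sleeps for a random 6 to 10 seconds. Changing the start page could interfere with this waiting strategy. That was a bug in the program and has now been fixed, so you can set the start and end pages freely.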
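For illustration, a waiting strategy of this kind could look roughly like the sketch below. It is not the project's actual code: `crawl_pages` and `fetch_page` are hypothetical names, and counting pages fetched rather than using the absolute page number is only one assumed way to keep the pauses independent of the start page.

```python
import random
import time


def crawl_pages(start_page, end_page, fetch_page, sleep=time.sleep):
    """Fetch start_page..end_page, pausing 6-10 s after every 1-5 pages."""
    pages_since_pause = 0
    pause_after = random.randint(1, 5)  # pages to fetch before the next pause
    for page in range(start_page, end_page + 1):
        fetch_page(page)
        pages_since_pause += 1
        # Count pages fetched, not the page number itself, so the schedule
        # is the same wherever the range starts. No pause after the last page.
        if pages_since_pause >= pause_after and page < end_page:
            sleep(random.randint(6, 10))
            pages_since_pause = 0
            pause_after = random.randint(1, 5)


# Dry run with stand-ins: print instead of fetching and sleeping.
crawl_pages(
    35, 42,
    fetch_page=lambda p: print(f"fetch page {p}"),
    sleep=lambda s: print(f"  (would sleep {s} s)"),
)
```

Passing `sleep` as a parameter makes it possible to try the schedule without actually waiting; a real run would leave it at the default `time.sleep`.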
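Crawling from since_date is a good suggestion, but I don't plan to implement it for now because it would be too inefficient. To implement it, the program would have to work out which page contains the start date, and that could mean visiting many pages. So this feature won't be added for the time being.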
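To make the page lookup concrete, here is one possible sketch of it as a binary search. It is not part of weibo-crawler. It assumes page 1 holds the newest posts and that dates never increase with the page number (pinned posts would break this), and `oldest_date_on_page` is a hypothetical helper that fetches one page and returns the date of its oldest post. Every probe still costs one page request, which is the overhead described above.

```python
from datetime import date, timedelta


def find_last_page(since_date, page_num, oldest_date_on_page):
    """Return the last page that can contain posts dated on or after since_date."""
    lo, hi = 1, page_num
    while lo < hi:
        mid = (lo + hi) // 2
        if oldest_date_on_page(mid) < since_date:
            hi = mid        # the boundary is on this page or an earlier one
        else:
            lo = mid + 1    # the whole page is recent enough, look further back
    return lo


# Made-up data: one post per day, 10 posts per page, newest on 2024-06-06.
NEWEST = date(2024, 6, 6)


def fake_oldest_date_on_page(page):
    """Stand-in for a real page request."""
    return NEWEST - timedelta(days=page * 10 - 1)


since = NEWEST - timedelta(days=25)
print(find_last_page(since, page_num=100, oldest_date_on_page=fake_oldest_date_on_page))
```

Under these assumptions the search needs about log2(page_num) page requests, and the crawl could then run from the returned page back to page 1.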
Thanks again for the suggestion. If you run into any problems, feel free to keep reporting them.
