pspider's People

Contributors

foristkirito, xianhu

pspider's Issues

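About proxy IPs for crawlers

Lately my crawlers keep getting their IP banned, so I scraped a lot of proxy IPs. But proxy availability is a problem: whenever a proxy times out, expires, or gets banned, I have to switch to another one, and this part is tedious to write. I see PSpider has no module for this. Did you work around it some other way, or is it simply not needed for now? Why do I run into this problem every time?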
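
One way to handle the switching described above is a small pool that hands out proxies in turn and retires any proxy that fails too often. The sketch below is only an illustration: `ProxyPool`, `fetch_with_rotation`, and the `fetch` callable are hypothetical names, not part of PSpider.

```python
from typing import Callable, Dict, List, Optional


class ProxyPool:
    """Round-robin proxy pool that retires proxies after repeated failures."""

    def __init__(self, proxies: List[str], max_failures: int = 3):
        self._order: List[str] = list(proxies)
        self._failures: Dict[str, int] = {p: 0 for p in proxies}
        self._max_failures = max_failures
        self._next = 0

    def get(self) -> Optional[str]:
        """Return the next usable proxy, or None when all have been retired."""
        for _ in range(len(self._order)):
            proxy = self._order[self._next % len(self._order)]
            self._next += 1
            if self._failures[proxy] < self._max_failures:
                return proxy
        return None

    def report(self, proxy: str, ok: bool) -> None:
        """Reset the failure count on success, increment it on failure."""
        self._failures[proxy] = 0 if ok else self._failures[proxy] + 1


def fetch_with_rotation(url: str, pool: ProxyPool,
                        fetch: Callable[[str, str], object],
                        max_attempts: int = 5):
    """Try the request through successive proxies until one succeeds."""
    for _ in range(max_attempts):
        proxy = pool.get()
        if proxy is None:
            break  # every proxy has been retired
        try:
            result = fetch(url, proxy)
        except Exception:
            # Timeout, refused connection, ban page raised as an error, etc.
            pool.report(proxy, ok=False)
            continue
        pool.report(proxy, ok=True)
        return result
    return None
```

With requests, the `fetch` argument could be something like `lambda url, proxy: requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)`. Keeping the HTTP call outside the pool makes the rotation logic easy to test without a network.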

Consider adding an image fetcher and saver?

Something like this:

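from urllib.parse import urlparse

import requests

# make_random_useragent() is assumed to be the User-Agent helper from the spider package

def image_fetch(self, url: str):
    # stream=True defers the body download; timeout is (connect, read) in seconds
    response = requests.get(url, headers={"User-Agent": make_random_useragent()}, stream=True, timeout=(3.05, 10))
    payload = urlparse(url).path
    _left_bound_pos = payload.rfind('/')
    _right_bound_pos = payload.find('.', _left_bound_pos)
    if _right_bound_pos < 0:
        # no extension in the path: use the whole last segment as the name
        _right_bound_pos = len(payload)

    _suffix = payload[_right_bound_pos + 1:].lower()
    _content_type = response.headers.get('Content-Type', '')

    # GIF if either the path or the server says so; everything else is saved as JPEG
    if _suffix == 'gif' or _content_type == 'image/gif':
        _ext = '.gif'
    else:
        _ext = '.jpeg'

    # file name to save under: last path segment without its suffix, plus the chosen extension
    return payload[_left_bound_pos + 1:_right_bound_pos] + _ext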

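Beginner question

How does this framework control crawl speed? Some sites drop the connection when pages are fetched too quickly. Also, some sites require a CAPTCHA before the next page can be loaded, for example **文书网. How did the author solve these problems?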
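
For the crawl-speed part of the question, a common approach is to share one throttle object between all fetch threads so that requests are spaced at least a fixed interval apart. This is a minimal sketch under that assumption; `Throttle` and `min_interval` are hypothetical names, not PSpider's own mechanism, and it does not address the CAPTCHA part.

```python
import threading
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests, across threads."""

    def __init__(self, min_interval: float):
        self._min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0  # monotonic time of the next free slot

    def wait(self) -> float:
        """Block until this caller's request slot; return the seconds slept."""
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            # Reserve the slot before releasing the lock so threads never share one.
            self._next_allowed = slot + self._min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)
        return delay
```

Each fetcher would call `throttle.wait()` immediately before issuing its request. The sleep happens outside the lock, so waiting threads queue up for distinct slots instead of blocking each other.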

Ask

In what order should I read your code in order to understand your crawler framework?

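Debugging question

I'm a beginner. I want to modify the parser, i.e. inst_parse.py, and use the threads approach to scrape download links from 电影天堂 (using the test_spider() function). Here is my modified parser: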

    def htm_parse_2(self, priority: int, url: str, keys: object, deep: int, content: object) -> (int, list, list):
        """
        parse the content of a url, you can rewrite this function, parameters and return refer to self.working()
        """
        *_, html_text = content

        url_list, save_list = [], []
        if (self._max_deep < 0) or (deep < self._max_deep):
            
            if not re.compile(r"/\d{8}/").search(url):  # if the input URL is a list page, collect each movie's download-page URL
                a_list = re.findall(r"<a href=\"(?P<url>[\w\W]{5,}?)\" class=\"ulink\">[\w\W]+?</a>", html_text, flags=re.IGNORECASE)
                url_list = [(_url, keys, priority+1) for _url in [get_url_legal(href, url) for href in a_list]]

            else:  # if the input URL is a download page, extract the download link
                download_url = re.search(r"<td style=\"WORD-WRAP: break-word\"[\w\W]*?><a href=\"(?P<url>[\w\W]{5,}?)\">", html_text, flags=re.IGNORECASE)
                save_list = [(download_url.group("url").strip(), datetime.datetime.now()), ] if download_url else []

        return 1, url_list, save_list

I also changed the initial URL to http://www.ygdy8.net/html/gndy/oumei/list_7_12.html and left everything else unchanged, but the program ends as soon as it starts. The log output is:

WARNING:root:MonitorThread[monitor] start...
WARNING:root:ThreadPool set_start_url: keys=None, priority=0, deep=0, url=http://www.ygdy8.net/html/gndy/oumei/list_7_12.html
WARNING:root:ThreadPool start: fetcher_num=10, is_over=True
WARNING:root:FetchThread[fetcher-1] start...
WARNING:root:FetchThread[fetcher-2] start...
WARNING:root:FetchThread[fetcher-3] start...
WARNING:root:FetchThread[fetcher-4] start...
WARNING:root:FetchThread[fetcher-5] start...
WARNING:root:FetchThread[fetcher-6] start...
WARNING:root:FetchThread[fetcher-7] start...
WARNING:root:FetchThread[fetcher-8] start...
WARNING:root:FetchThread[fetcher-9] start...
WARNING:root:FetchThread[fetcher-10] start...
WARNING:root:ParseThread[parser] start...
WARNING:root:SaveThread[saver] start...
WARNING:root:ThreadPool status: running_tasks=0; fetch=(0, 0, 0/(5s)); parse=(0, 0, 0/(5s)); save=(0, 0, 0/(5s)); total_seconds=5
WARNING:root:FetchThread[fetcher-1] end...
WARNING:root:FetchThread[fetcher-2] end...
WARNING:root:FetchThread[fetcher-3] end...
WARNING:root:FetchThread[fetcher-4] end...
WARNING:root:FetchThread[fetcher-5] end...
WARNING:root:FetchThread[fetcher-6] end...
WARNING:root:FetchThread[fetcher-7] end...
WARNING:root:FetchThread[fetcher-8] end...
WARNING:root:FetchThread[fetcher-9] end...
WARNING:root:ParseThread[parser] end...
WARNING:root:SaveThread[saver] end...
WARNING:root:FetchThread[fetcher-10] end...
WARNING:root:ThreadPool status: running_tasks=0; fetch=(0, 0, 0/(5s)); parse=(0, 0, 0/(5s)); save=(0, 0, 0/(5s)); total_seconds=10
WARNING:root:MonitorThread[monitor] end...
WARNING:root:ThreadPool end: fetcher_num=10, is_over=True

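I have never debugged a threaded program and don't know how to go about it; debug mode can't step through it one step at a time either. Where is the problem? Also, could you recommend how to debug thread-related programs? Do I need other modules, such as winpdb?
Thanks!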

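AttributeError: module 'spider' has no attribute 'YunDaMa'

I don't quite understand this part of WeiBoLogin in weibo_user.py:

    self.cookie_jar, self.opener = None, None
    self.yundama = spider.YunDaMa("", "")

Calling it also fails with "AttributeError: module 'spider' has no attribute 'YunDaMa'". I searched the spider package and found nothing related to YunDaMa. I'm not sure whether this is some usage I'm missing, or whether it is missing and therefore a bug.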

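concur_insts stability issue

Take the code below as an example: if anything above self.pool.finish_a_task(TPEnum.URL_FETCH) fails, the thread may exit abnormally, and the program can then never finish normally. A fairly simple fix is to wrap it in a try ... except block.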

def work_fetch(self):
    # ----1
    priority, url, keys, deep, critical, fetch_repeat, parse_repeat = self.pool.get_a_task(TPEnum.URL_FETCH)

    # ----2
    code, content = self.worker.working(url, keys, critical, fetch_repeat)

    # ----3
    if code > 0:
        self.pool.update_number_dict(TPEnum.URL_FETCH, +1)
        self.pool.add_a_task(TPEnum.HTM_PARSE, (priority, url, keys, deep, critical, fetch_repeat, parse_repeat, content))
    elif code == 0:
        priority += (1 if critical else 0)
        self.pool.add_a_task(TPEnum.URL_FETCH, (priority, url, keys, deep, critical, fetch_repeat+1, parse_repeat))
    else:
        pass

    # ----4
    self.pool.finish_a_task(TPEnum.URL_FETCH)
    return True
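
The idea can be sketched with `try ... except ... finally`, so that the task is marked finished whether or not the work raises. The pool here is a hypothetical stand-in that only counts running tasks; it is not PSpider's real thread pool, and the simplified `work_fetch` signature is an assumption made for the example.

```python
import logging


class FakePool:
    """Hypothetical stand-in for the thread pool: it only counts running tasks."""

    def __init__(self):
        self.running = 0

    def get_a_task(self) -> str:
        self.running += 1
        return "http://example.com/"

    def finish_a_task(self) -> None:
        self.running -= 1


def work_fetch(pool, working) -> bool:
    """Run one fetch task; the task is always marked finished, even on error."""
    url = pool.get_a_task()
    try:
        code, content = working(url)
        return code > 0
    except Exception:
        # Log and swallow the error so the worker thread keeps running.
        logging.exception("fetch failed for %s", url)
        return False
    finally:
        # Runs on success and on failure, so the running-task count stays balanced.
        pool.finish_a_task()
```

Putting `finish_a_task()` in `finally` rather than after the `try` block matters: it also runs when the `try` body returns early, so the counter cannot drift.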

I can't run the tool

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'
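
This error means the interpreter running test.py cannot find the `bs4` module. The module is provided by the PyPI package `beautifulsoup4`, so `pip install beautifulsoup4` in the same environment normally resolves it. As a rough sketch, a script can check for missing dependencies up front; `REQUIRED` and `missing_packages` are hypothetical names used only for this example.

```python
import importlib.util
from typing import Dict, List

# import name -> PyPI package name (the two differ for Beautiful Soup)
REQUIRED: Dict[str, str] = {"bs4": "beautifulsoup4"}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the PyPI names of required top-level modules that cannot be found."""
    return [package for module, package in required.items()
            if importlib.util.find_spec(module) is None]
```

Calling `missing_packages(REQUIRED)` before the real imports gives a list that can be passed straight to `pip install`.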

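Random bid strategy for crawling Douban movies

Does the random bid strategy still work? It worked when I used it before, but today it no longer does. How is it on your side?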

Hello

Hello, the discussion group is already full and I can't join.

Why does my spider run twice?

2017-11-05 20:12:01,076	WARNING	ThreadPool start: urls_count=1, fetcher_num=10, is_over=True
2017-11-05 20:12:06,091	WARNING	ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=8, FAIL=0, 8/(5s)]; parse:[NOT=0, SUCC=8, FAIL=0, 8/(5s)]; save:[NOT=211, SUCC=6, FAIL=0, 6/(5s)]; total_seconds=5
2017-11-05 20:12:11,107	WARNING	ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=15, FAIL=0, 7/(5s)]; parse:[NOT=0, SUCC=15, FAIL=0, 7/(5s)]; save:[NOT=407, SUCC=16, FAIL=0, 10/(5s)]; total_seconds=10
2017-11-05 20:12:16,122	WARNING	ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=19, FAIL=0, 4/(5s)]; parse:[NOT=0, SUCC=19, FAIL=0, 4/(5s)]; save:[NOT=531, SUCC=28, FAIL=0, 12/(5s)]; total_seconds=15
2017-11-05 20:12:21,138	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; save:[NOT=591, SUCC=31, FAIL=0, 3/(5s)]; total_seconds=20
2017-11-05 20:12:26,151	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=579, SUCC=43, FAIL=0, 12/(5s)]; total_seconds=25
2017-11-05 20:12:31,161	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=568, SUCC=54, FAIL=0, 11/(5s)]; total_seconds=30
2017-11-05 20:12:36,176	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=566, SUCC=56, FAIL=0, 2/(5s)]; total_seconds=35
2017-11-05 20:12:41,190	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=561, SUCC=61, FAIL=0, 5/(5s)]; total_seconds=40
2017-11-05 20:12:46,202	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=560, SUCC=62, FAIL=0, 1/(5s)]; total_seconds=45
2017-11-05 20:12:51,218	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=558, SUCC=64, FAIL=0, 2/(5s)]; total_seconds=50
2017-11-05 20:12:56,227	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=554, SUCC=68, FAIL=0, 4/(5s)]; total_seconds=55
2017-11-05 20:13:01,230	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=548, SUCC=74, FAIL=0, 6/(5s)]; total_seconds=60
2017-11-05 20:13:06,245	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=522, SUCC=100, FAIL=0, 26/(5s)]; total_seconds=65
2017-11-05 20:13:11,261	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=514, SUCC=108, FAIL=0, 8/(5s)]; total_seconds=70
2017-11-05 20:13:16,270	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=495, SUCC=127, FAIL=0, 19/(5s)]; total_seconds=75
2017-11-05 20:13:21,277	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=489, SUCC=133, FAIL=0, 6/(5s)]; total_seconds=80
2017-11-05 20:13:26,282	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=477, SUCC=145, FAIL=0, 12/(5s)]; total_seconds=85
2017-11-05 20:13:31,298	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=460, SUCC=162, FAIL=0, 17/(5s)]; total_seconds=90
2017-11-05 20:13:36,300	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=453, SUCC=169, FAIL=0, 7/(5s)]; total_seconds=95
2017-11-05 20:13:41,315	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=447, SUCC=175, FAIL=0, 6/(5s)]; total_seconds=100
2017-11-05 20:13:46,326	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=441, SUCC=181, FAIL=0, 6/(5s)]; total_seconds=105
2017-11-05 20:13:51,342	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=436, SUCC=186, FAIL=0, 5/(5s)]; total_seconds=110
2017-11-05 20:13:56,357	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=423, SUCC=199, FAIL=0, 13/(5s)]; total_seconds=115
2017-11-05 20:14:01,362	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=412, SUCC=210, FAIL=0, 11/(5s)]; total_seconds=120
2017-11-05 20:14:06,378	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=401, SUCC=221, FAIL=0, 11/(5s)]; total_seconds=125
2017-11-05 20:14:11,394	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=388, SUCC=234, FAIL=0, 13/(5s)]; total_seconds=130
2017-11-05 20:14:16,402	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=383, SUCC=239, FAIL=0, 5/(5s)]; total_seconds=135
2017-11-05 20:14:21,405	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=379, SUCC=243, FAIL=0, 4/(5s)]; total_seconds=140
2017-11-05 20:14:26,421	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=361, SUCC=261, FAIL=0, 18/(5s)]; total_seconds=145
2017-11-05 20:14:31,430	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=352, SUCC=270, FAIL=0, 9/(5s)]; total_seconds=150
2017-11-05 20:14:36,443	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=351, SUCC=271, FAIL=0, 1/(5s)]; total_seconds=155
2017-11-05 20:14:41,459	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=350, SUCC=272, FAIL=0, 1/(5s)]; total_seconds=160
2017-11-05 20:14:46,461	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=343, SUCC=279, FAIL=0, 7/(5s)]; total_seconds=165
2017-11-05 20:14:51,467	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=335, SUCC=287, FAIL=0, 8/(5s)]; total_seconds=170
2017-11-05 20:14:56,483	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=334, SUCC=288, FAIL=0, 1/(5s)]; total_seconds=175
2017-11-05 20:15:01,484	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=331, SUCC=291, FAIL=0, 3/(5s)]; total_seconds=180
2017-11-05 20:15:06,490	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=331, SUCC=291, FAIL=0, 0/(5s)]; total_seconds=185
2017-11-05 20:15:11,506	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=330, SUCC=292, FAIL=0, 1/(5s)]; total_seconds=190
2017-11-05 20:15:16,519	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=313, SUCC=309, FAIL=0, 17/(5s)]; total_seconds=195
2017-11-05 20:15:21,530	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=309, SUCC=313, FAIL=0, 4/(5s)]; total_seconds=200
2017-11-05 20:15:26,545	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=298, SUCC=324, FAIL=0, 11/(5s)]; total_seconds=205
2017-11-05 20:15:31,550	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=286, SUCC=336, FAIL=0, 12/(5s)]; total_seconds=210
2017-11-05 20:15:36,566	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=277, SUCC=345, FAIL=0, 9/(5s)]; total_seconds=215
2017-11-05 20:15:41,582	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=271, SUCC=351, FAIL=0, 6/(5s)]; total_seconds=220
2017-11-05 20:15:46,592	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=258, SUCC=364, FAIL=0, 13/(5s)]; total_seconds=225
2017-11-05 20:15:51,593	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=257, SUCC=365, FAIL=0, 1/(5s)]; total_seconds=230
2017-11-05 20:15:56,609	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=249, SUCC=373, FAIL=0, 8/(5s)]; total_seconds=235
2017-11-05 20:16:01,624	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=241, SUCC=381, FAIL=0, 8/(5s)]; total_seconds=240
2017-11-05 20:16:06,625	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=237, SUCC=385, FAIL=0, 4/(5s)]; total_seconds=245
2017-11-05 20:16:11,641	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=235, SUCC=387, FAIL=0, 2/(5s)]; total_seconds=250
2017-11-05 20:16:16,656	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=232, SUCC=390, FAIL=0, 3/(5s)]; total_seconds=255
2017-11-05 20:16:21,672	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=232, SUCC=390, FAIL=0, 0/(5s)]; total_seconds=260
2017-11-05 20:16:26,688	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=226, SUCC=396, FAIL=0, 6/(5s)]; total_seconds=265
2017-11-05 20:16:31,703	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=222, SUCC=400, FAIL=0, 4/(5s)]; total_seconds=270
2017-11-05 20:16:36,716	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=219, SUCC=403, FAIL=0, 3/(5s)]; total_seconds=275
2017-11-05 20:16:41,728	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=214, SUCC=408, FAIL=0, 5/(5s)]; total_seconds=280
2017-11-05 20:16:46,744	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=211, SUCC=411, FAIL=0, 3/(5s)]; total_seconds=285
2017-11-05 20:16:51,750	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=207, SUCC=415, FAIL=0, 4/(5s)]; total_seconds=290
2017-11-05 20:16:56,765	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=193, SUCC=429, FAIL=0, 14/(5s)]; total_seconds=295
2017-11-05 20:17:01,781	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=191, SUCC=431, FAIL=0, 2/(5s)]; total_seconds=300
2017-11-05 20:17:06,796	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=183, SUCC=439, FAIL=0, 8/(5s)]; total_seconds=305
2017-11-05 20:17:11,797	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=172, SUCC=450, FAIL=0, 11/(5s)]; total_seconds=310
2017-11-05 20:17:16,813	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=158, SUCC=464, FAIL=0, 14/(5s)]; total_seconds=315
2017-11-05 20:17:21,828	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=155, SUCC=467, FAIL=0, 3/(5s)]; total_seconds=320
2017-11-05 20:17:26,843	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=145, SUCC=477, FAIL=0, 10/(5s)]; total_seconds=325
2017-11-05 20:17:31,859	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=140, SUCC=482, FAIL=0, 5/(5s)]; total_seconds=330
2017-11-05 20:17:36,864	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=139, SUCC=483, FAIL=0, 1/(5s)]; total_seconds=335
2017-11-05 20:17:41,880	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=135, SUCC=487, FAIL=0, 4/(5s)]; total_seconds=340
2017-11-05 20:17:46,895	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=118, SUCC=504, FAIL=0, 17/(5s)]; total_seconds=345
2017-11-05 20:17:51,902	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=106, SUCC=516, FAIL=0, 12/(5s)]; total_seconds=350
2017-11-05 20:17:56,917	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=90, SUCC=532, FAIL=0, 16/(5s)]; total_seconds=355
2017-11-05 20:18:01,933	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=86, SUCC=536, FAIL=0, 4/(5s)]; total_seconds=360
2017-11-05 20:18:06,949	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=84, SUCC=538, FAIL=0, 2/(5s)]; total_seconds=365
2017-11-05 20:18:11,960	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=83, SUCC=539, FAIL=0, 1/(5s)]; total_seconds=370
2017-11-05 20:18:16,975	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=74, SUCC=548, FAIL=0, 9/(5s)]; total_seconds=375
2017-11-05 20:18:21,983	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=59, SUCC=563, FAIL=0, 15/(5s)]; total_seconds=380
2017-11-05 20:18:26,999	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=52, SUCC=570, FAIL=0, 7/(5s)]; total_seconds=385
2017-11-05 20:18:32,014	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=43, SUCC=579, FAIL=0, 9/(5s)]; total_seconds=390
2017-11-05 20:18:37,030	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=34, SUCC=588, FAIL=0, 9/(5s)]; total_seconds=395
2017-11-05 20:18:42,041	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=28, SUCC=594, FAIL=0, 6/(5s)]; total_seconds=400
2017-11-05 20:18:47,053	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=27, SUCC=595, FAIL=0, 1/(5s)]; total_seconds=405
2017-11-05 20:18:52,063	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=27, SUCC=595, FAIL=0, 0/(5s)]; total_seconds=410
2017-11-05 20:18:57,079	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=22, SUCC=600, FAIL=0, 5/(5s)]; total_seconds=416
2017-11-05 20:19:02,095	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=16, SUCC=606, FAIL=0, 6/(5s)]; total_seconds=421
2017-11-05 20:19:07,110	WARNING	ThreadPool status: running_tasks=0; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=0, SUCC=623, FAIL=0, 17/(5s)]; total_seconds=426
2017-11-05 20:19:12,115	WARNING	ThreadPool status: running_tasks=0; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=0, SUCC=623, FAIL=0, 0/(5s)]; total_seconds=431
2017-11-05 20:19:12,115	WARNING	ThreadPool end: fetcher_num=10, is_over=True, fetch:[SUCC=21, FAIL=0]; parse[SUCC=21, FAIL=0]; save:[SUCC=623, FAIL=0]
2017-11-05 20:19:12,115	WARNING	ThreadPool start: urls_count=1, fetcher_num=10, is_over=True
2017-11-05 20:19:17,129	WARNING	ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=6, FAIL=0, 6/(5s)]; parse:[NOT=0, SUCC=6, FAIL=0, 6/(5s)]; save:[NOT=147, SUCC=7, FAIL=0, 7/(5s)]; total_seconds=5
2017-11-05 20:19:22,131	WARNING	ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=11, FAIL=0, 5/(5s)]; parse:[NOT=0, SUCC=11, FAIL=0, 5/(5s)]; save:[NOT=291, SUCC=19, FAIL=0, 12/(5s)]; total_seconds=10
2017-11-05 20:19:27,143	WARNING	ThreadPool status: running_tasks=2; fetch:[NOT=0, SUCC=19, FAIL=0, 8/(5s)]; parse:[NOT=0, SUCC=19, FAIL=0, 8/(5s)]; save:[NOT=531, SUCC=28, FAIL=0, 9/(5s)]; total_seconds=15
2017-11-05 20:19:32,158	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 2/(5s)]; save:[NOT=591, SUCC=31, FAIL=0, 3/(5s)]; total_seconds=20
2017-11-05 20:19:37,174	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=574, SUCC=48, FAIL=0, 17/(5s)]; total_seconds=25
2017-11-05 20:19:42,189	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=568, SUCC=54, FAIL=0, 6/(5s)]; total_seconds=30
2017-11-05 20:19:47,203	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=564, SUCC=58, FAIL=0, 4/(5s)]; total_seconds=35
2017-11-05 20:19:52,205	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=561, SUCC=61, FAIL=0, 3/(5s)]; total_seconds=40
2017-11-05 20:19:57,221	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=560, SUCC=62, FAIL=0, 1/(5s)]; total_seconds=45
2017-11-05 20:20:02,236	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=558, SUCC=64, FAIL=0, 2/(5s)]; total_seconds=50
2017-11-05 20:20:07,252	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=554, SUCC=68, FAIL=0, 4/(5s)]; total_seconds=55
2017-11-05 20:20:12,267	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=550, SUCC=72, FAIL=0, 4/(5s)]; total_seconds=60
2017-11-05 20:20:17,281	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=545, SUCC=77, FAIL=0, 5/(5s)]; total_seconds=65
2017-11-05 20:20:22,292	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=517, SUCC=105, FAIL=0, 28/(5s)]; total_seconds=70
2017-11-05 20:20:27,298	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=502, SUCC=120, FAIL=0, 15/(5s)]; total_seconds=75
2017-11-05 20:20:32,313	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=491, SUCC=131, FAIL=0, 11/(5s)]; total_seconds=80
2017-11-05 20:20:37,329	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=478, SUCC=144, FAIL=0, 13/(5s)]; total_seconds=85
2017-11-05 20:20:42,344	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=469, SUCC=153, FAIL=0, 9/(5s)]; total_seconds=90
2017-11-05 20:20:47,360	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=455, SUCC=167, FAIL=0, 14/(5s)]; total_seconds=95
2017-11-05 20:20:52,362	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=450, SUCC=172, FAIL=0, 5/(5s)]; total_seconds=100
2017-11-05 20:20:57,372	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=437, SUCC=184, FAIL=1, 13/(5s)]; total_seconds=105
2017-11-05 20:21:02,388	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=436, SUCC=185, FAIL=1, 1/(5s)]; total_seconds=110
2017-11-05 20:21:07,403	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=423, SUCC=198, FAIL=1, 13/(5s)]; total_seconds=115
2017-11-05 20:21:12,419	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=412, SUCC=209, FAIL=1, 11/(5s)]; total_seconds=120
2017-11-05 20:21:17,433	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=398, SUCC=223, FAIL=1, 14/(5s)]; total_seconds=125
2017-11-05 20:21:22,446	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=388, SUCC=233, FAIL=1, 10/(5s)]; total_seconds=130
2017-11-05 20:21:27,462	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=383, SUCC=238, FAIL=1, 5/(5s)]; total_seconds=135
2017-11-05 20:21:32,477	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=375, SUCC=246, FAIL=1, 8/(5s)]; total_seconds=140
2017-11-05 20:21:37,493	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=359, SUCC=261, FAIL=2, 16/(5s)]; total_seconds=145
2017-11-05 20:21:42,508	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=352, SUCC=268, FAIL=2, 7/(5s)]; total_seconds=150
2017-11-05 20:21:47,524	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=351, SUCC=269, FAIL=2, 1/(5s)]; total_seconds=155
2017-11-05 20:21:52,530	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=347, SUCC=273, FAIL=2, 4/(5s)]; total_seconds=160
2017-11-05 20:21:57,536	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=336, SUCC=284, FAIL=2, 11/(5s)]; total_seconds=165
2017-11-05 20:22:02,550	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=334, SUCC=286, FAIL=2, 2/(5s)]; total_seconds=170
2017-11-05 20:22:07,566	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=334, SUCC=286, FAIL=2, 0/(5s)]; total_seconds=175
2017-11-05 20:22:12,568	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=331, SUCC=289, FAIL=2, 3/(5s)]; total_seconds=180
2017-11-05 20:22:17,584	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=330, SUCC=290, FAIL=2, 1/(5s)]; total_seconds=185
2017-11-05 20:22:22,585	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=323, SUCC=297, FAIL=2, 7/(5s)]; total_seconds=190
2017-11-05 20:22:27,600	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=309, SUCC=311, FAIL=2, 14/(5s)]; total_seconds=195
2017-11-05 20:22:32,606	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=307, SUCC=313, FAIL=2, 2/(5s)]; total_seconds=200
2017-11-05 20:22:37,621	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=288, SUCC=332, FAIL=2, 19/(5s)]; total_seconds=205
2017-11-05 20:22:42,636	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=279, SUCC=341, FAIL=2, 9/(5s)]; total_seconds=210
2017-11-05 20:22:47,638	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=273, SUCC=347, FAIL=2, 6/(5s)]; total_seconds=215
2017-11-05 20:22:52,651	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=262, SUCC=358, FAIL=2, 11/(5s)]; total_seconds=220
2017-11-05 20:22:57,661	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=258, SUCC=362, FAIL=2, 4/(5s)]; total_seconds=225
2017-11-05 20:23:02,676	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=251, SUCC=369, FAIL=2, 7/(5s)]; total_seconds=230
2017-11-05 20:23:07,692	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=245, SUCC=375, FAIL=2, 6/(5s)]; total_seconds=235
2017-11-05 20:23:12,707	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=238, SUCC=382, FAIL=2, 7/(5s)]; total_seconds=240
2017-11-05 20:23:17,723	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=235, SUCC=385, FAIL=2, 3/(5s)]; total_seconds=245
2017-11-05 20:23:22,739	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=234, SUCC=386, FAIL=2, 1/(5s)]; total_seconds=250
2017-11-05 20:23:27,740	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=232, SUCC=388, FAIL=2, 2/(5s)]; total_seconds=255
2017-11-05 20:23:32,745	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=229, SUCC=391, FAIL=2, 3/(5s)]; total_seconds=260
2017-11-05 20:23:37,760	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=223, SUCC=397, FAIL=2, 6/(5s)]; total_seconds=265
2017-11-05 20:23:42,776	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=219, SUCC=401, FAIL=2, 4/(5s)]; total_seconds=270
2017-11-05 20:23:47,791	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=217, SUCC=403, FAIL=2, 2/(5s)]; total_seconds=275
2017-11-05 20:23:52,807	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=211, SUCC=409, FAIL=2, 6/(5s)]; total_seconds=280
2017-11-05 20:23:57,820	WARNING	ThreadPool status: running_tasks=1; fetch:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; parse:[NOT=0, SUCC=21, FAIL=0, 0/(5s)]; save:[NOT=209, SUCC=411, FAIL=2, 2/(5s)]; total_seconds=285

Process finished with exit code 1

I stopped it when I found that it had rerun twice.

Ask

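Hello 笑虎:
I am new to web crawling. I read through your source code and think it is very well designed. I have one small question and would like to check whether my understanding is correct.
My question is how the fetcher, parser, and saver threads coordinate with one another.
My understanding is that your ThreadPool holds a variable, _number_dict, that is shared by all the other threads. It works much like a semaphore: every update has to happen under a lock, and, for example, once the fetcher has obtained a new url, the parser can move on to its next step based on the change in this counter.
Is this understanding correct?
Thanks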
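As a way to picture the question above, here is a minimal sketch of two stages coordinating through thread-safe queues, with a lock-protected counter dict used only for status reporting. It is not PSpider's actual code: `Counters`, `run_pipeline`, and the counter keys are hypothetical names, and the "fetch" is faked so the example runs offline.

```python
import queue
import threading


class Counters:
    """Hypothetical stand-in for a shared counter dict guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._numbers = {"fetch_succ": 0, "parse_succ": 0}

    def add(self, key, delta=1):
        # Every update happens while holding the lock.
        with self._lock:
            self._numbers[key] += delta

    def get(self, key):
        with self._lock:
            return self._numbers[key]


def run_pipeline(urls):
    counters = Counters()
    fetch_q = queue.Queue()   # urls waiting to be fetched
    parse_q = queue.Queue()   # (url, content) pairs waiting to be parsed
    results = []

    def fetcher():
        while True:
            url = fetch_q.get()
            if url is None:          # sentinel: no more work
                parse_q.put(None)    # pass the sentinel downstream
                break
            parse_q.put((url, "<html>%s</html>" % url))  # pretend fetch
            counters.add("fetch_succ")

    def parser():
        while True:
            item = parse_q.get()     # blocks until the fetcher hands over work
            if item is None:
                break
            url, content = item
            results.append((url, len(content)))
            counters.add("parse_succ")

    threads = [threading.Thread(target=fetcher), threading.Thread(target=parser)]
    for t in threads:
        t.start()
    for url in urls:
        fetch_q.put(url)
    fetch_q.put(None)
    for t in threads:
        t.join()
    return counters, results
```

In this sketch the hand-off between stages is done by the blocking `queue.Queue.get()` calls, while the counters only record progress. Whether PSpider signals through its counters or through its queues is something only the project's source can confirm.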

Documentation

Would you consider writing documentation in Chinese? It would be easier to read.

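301 and 403 when crawling Douban

Running test_demos.py produces a 301 error. After modifying the fetch method in demos_doubanmovies so that redirects are allowed, a 403 still occurs. Which other settings can be changed to crawl Douban's movie data successfully?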

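I don't quite understand the roles of keys and deep in set_start_url

def set_start_url(self, url, keys=None, priority=0, deep=0)
url is the target to fetch
priority is used by the PriorityQueue and determines the order in which items are taken from the queue

So what do keys and deep do, respectively?
Is deep the depth of the fetched url? How does it work together with self._max_deep = max_deep # default: 0, if -1, spider will not stop until all urls are fetched? How is the deep logic implemented?

keys
In the Douban demo (抓取豆瓣), keys[0] is used to tell whether the page being fetched is an index page or a movie detail page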
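One plausible reading of the deep and max_deep question can be sketched as follows. This is an assumption about the semantics, not PSpider's implementation: `crawl` and the `links` dict are hypothetical, the link graph replaces real fetching and parsing, and the only detail taken from the issue is the comment that -1 means the spider does not stop until all urls are fetched.

```python
import queue


def crawl(start_url, links, max_deep=0):
    """Depth-limited crawl over a fake link graph.

    links: dict mapping a url to the urls found on that page
           (a stand-in for a real fetch + parse step).
    max_deep: 0 fetches only the start url; -1 means no depth limit.
    """
    tasks = queue.PriorityQueue()
    # Each task is (priority, url, keys, deep); keys travels with the url
    # so later stages can tell what kind of page it is.
    tasks.put((0, start_url, ("index",), 0))
    seen = {start_url}
    visited = []

    while not tasks.empty():
        priority, url, keys, deep = tasks.get()
        visited.append((url, deep))

        # Stop expanding once the depth limit is reached (unless unlimited).
        if max_deep >= 0 and deep >= max_deep:
            continue

        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                # Children are one level deeper than the page they came from.
                tasks.put((priority, child, ("detail",), deep + 1))

    return visited
```

For example, with `links = {"a": ["b", "c"], "b": ["d"]}`, `crawl("a", links, max_deep=1)` visits a, b, and c but never reaches d, while `max_deep=-1` visits all four.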

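Closing PhantomJS

In the dangdang demo you mention that repeatedly opening and closing the driver for every fetch is too expensive. I think so too, although I have not actually measured it.
If the driver is to be reused, how can we make sure that every driver that was opened gets closed?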
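One common pattern for this, sketched below under stated assumptions, is to give each worker thread its own driver, record every driver in a shared registry, and close them all in a `finally` block. `DriverPool`, `FakeDriver`, and `run` are hypothetical names; `FakeDriver` stands in for a real browser driver so the example runs without a browser, and a real driver would be supplied through the `factory` argument.

```python
import threading


class FakeDriver:
    """Stand-in for a browser driver (assumption: a real one exposes quit())."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        return "page:" + url

    def quit(self):
        self.closed = True


class DriverPool:
    """Creates one driver per thread and remembers all of them for cleanup."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()   # per-thread storage
        self._lock = threading.Lock()
        self._all = []                    # every driver ever created

    def get(self):
        driver = getattr(self._local, "driver", None)
        if driver is None:
            driver = self._factory()
            self._local.driver = driver
            with self._lock:
                self._all.append(driver)
        return driver

    def close_all(self):
        with self._lock:
            drivers, self._all = self._all, []
        for driver in drivers:
            try:
                driver.quit()
            except Exception:
                pass  # keep closing the rest even if one driver fails
        return drivers


def run(urls, thread_num=3):
    pool = DriverPool(FakeDriver)
    pages, pages_lock = [], threading.Lock()

    def worker(chunk):
        for url in chunk:
            page = pool.get().get(url)    # reuses this thread's driver
            with pages_lock:
                pages.append(page)

    threads = [threading.Thread(target=worker, args=(urls[i::thread_num],))
               for i in range(thread_num)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        closed = pool.close_all()         # runs even if a worker setup fails
    return pages, closed
```

Because the registry is filled at creation time, cleanup does not depend on each worker remembering to close its own driver.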

Number of movies fetched

With demos_doubanmovies, a single tag yields at most 300 movies per run. How can I fetch more? Thanks!

Dockerfile

ln -sf /usr/local/bin/python3 /usr/bin/python
ln -sf /usr/bin/python2.6 /usr/bin/python2

/usr/bin/yum
sed -i 's#/usr/bin/python#/usr/bin/python2#g' /usr/bin/yum

Accompanying documentation

Hello! An earlier issue mentioned that detailed documentation had been written. Where can I find it? Thanks!
