
spiderkeeper's People

Contributors

biatov · clarksun · dormymo · huyxyz · kalombos · lishuo9527 · zhanghuijun0


spiderkeeper's Issues

Dependency installation error

zeddeMacBook-Air:~ zed$ npm install -g bower
/usr/local/bin/bower -> /usr/local/lib/node_modules/bower/bin/bower
/usr/local/lib
└── bower@<version>

zeddeMacBook-Air:~ zed$ npm install -g grunt
/usr/local/bin/grunt -> /usr/local/lib/node_modules/grunt/bin/grunt
/usr/local/lib
└── grunt@<version>

zeddeMacBook-Air:~ zed$ bower install
bower ENOENT        No bower.json present
zeddeMacBook-Air:~ zed$ npm install
npm WARN <package>@<version> No repository field.
zeddeMacBook-Air:~ zed$ grunt build
grunt-cli: The grunt command line interface (v1.2.0)

Fatal error: Unable to find local grunt.

If you're seeing this message, grunt hasn't been installed locally to
your project. For more information about installing and configuring grunt,
please see the Getting Started guide:

http://gruntjs.com/getting-started
zeddeMacBook-Air:~ zed$ grunt serve
grunt-cli: The grunt command line interface (v1.2.0)

Fatal error: Unable to find local grunt.

If you're seeing this message, grunt hasn't been installed locally to
your project. For more information about installing and configuring grunt,
please see the Getting Started guide:

http://gruntjs.com/getting-started
zeddeMacBook-Air:~ zed$ 

404 error on /project//job/periodic url with apache2 server

I deployed SpiderKeeper behind Apache and got a 404 error when opening the /project//job/periodic URL. This is the URL you get when you have no projects yet and try to navigate to other pages (periodic jobs, in my case). It happens because of this code:

@app.before_request
def intercept_no_project():
    if request.path.find('/project//') > -1:
        flash("create project first")
        return redirect("/project/manage", code=302)

But Apache normalizes the URL /project//job/periodic to /project/job/periodic, so the redirect never fires. I think the redirect should be determined some other way than checking for a double slash.
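
A possible alternative that sidesteps the double-slash check entirely (an untested sketch; it assumes SpiderKeeper's Project model is importable, and that redirecting whenever no project exists yet is the intended behavior):

from flask import flash, redirect, request

@app.before_request
def intercept_no_project():
    # Redirect to the manage page while no project exists, instead of
    # pattern-matching a URL that proxies like Apache may normalize.
    if request.path.startswith('/project') \
            and not request.path.startswith('/project/manage') \
            and Project.query.count() == 0:
        flash("create project first")
        return redirect("/project/manage", code=302)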

Help - how to debug spiderkeeper code?

I downloaded the code from the main branch and tried to debug it in PyCharm; however, it always seems to run the pip-installed SpiderKeeper from the dist package instead.
How can I debug the downloaded code? Thanks.

Nginx uwsgi supervisor

Hi @DormyMo

First of all, thanks for the great tool.
The thing is that I tried to run SpiderKeeper using uwsgi and nginx but there is a small problem.
It loads correctly, but any click results in

404 Not Found

The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

And there are no errors in the logs - therefore not posted here.

Here is the wsgi.py. I suppose the problem is here.

from SpiderKeeper import config
from SpiderKeeper.app import app, initialize


def create_app(config_object):
    initialize()
    return app


application = create_app(config_object=config)
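
One detail worth flagging in the snippet above: create_app() receives config_object but never applies it, so the app may be running entirely on defaults. If applying it is intended (an assumption; initialize() may already handle configuration), the usual Flask pattern would be:

def create_app(config_object):
    app.config.from_object(config_object)  # actually apply the passed config
    initialize()
    return app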

Here is the uwsgi.ini

[uwsgi]
socket = :7000
master = true
processes = 4
threads = 8
enable-threads = true

plugin = python3
chdir = /srv/www/SpiderKeeper_webapp/SpiderKeeper
virtualenv = /srv/www/virtualenvs/SpiderKeeper
module = wsgi
callable = application

And the nginx config file is the following.

# HTTP server to redirect all 80 traffic to SSL/HTTPS
server {
  listen 80;
  server_name spiderkeeper.somedomain.com;
  access_log  /srv/www/logs/nginx/SpiderKeeper-access.log;
  error_log   /srv/www/logs/nginx/SpiderKeeper-error.log info;

  # Tell all requests to port 80 to be 302 redirected to HTTPS
  return 302 https://$host$request_uri;
}

server {
  listen 443;
  ssl on;

  server_name spiderkeeper.somedomain.com;
  access_log  /srv/www/logs/nginx/SpiderKeeper-access.log;
  error_log   /srv/www/logs/nginx/SpiderKeeper-error.log info;

  location /static/  {
        alias /srv/www/SpiderKeeper_webapp/SpiderKeeper/SpiderKeeper/app/static/;   
  }

  ssl_certificate /etc/nginx/ssl/cert.crt;
  ssl_certificate_key /etc/nginx/ssl/cert.key;
  ssl_session_timeout 5m;
  ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
  ssl_ciphers "HIGH:!aNULL:!MD5 or HIGH:!aNULL:!MD5:!3DES";
  ssl_prefer_server_ciphers on;

  location / {
    try_files   $uri @yourapplication;
  }

  location @yourapplication {
    include     uwsgi_params;
    uwsgi_pass  127.0.0.1:7000;
  }
}

It would be great if you could help me with this issue. Thanks in advance.

Periodic task enable logic

I studied the source in detail and found that the JobInstance model's enabled field should default to -1, meaning disabled by default; everything else looks fine.

Oh, and in the job_periodic template,

                {% if job_instance.enabled %}

would be better changed to

                {% if job_instance.enabled == 0 %}

The frontend display is your call; instead of changing it, adding a hint would also work, as long as it makes clear the instance is currently enabled.
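
In code, the suggested change is tiny (a sketch assuming SpiderKeeper's Flask-SQLAlchemy model definitions, with 0 meaning enabled and -1 meaning disabled, to match the template check above):

class JobInstance(db.Model):
    # ... other columns ...
    enabled = db.Column(db.Integer, default=-1)  # -1 = disabled by default; 0 = enabled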

scheduler does not work, without any prompts

Please see the log below. Only one job was scheduled on 2017-07-12, while I deployed about 40 job instances that should execute every hour. Would you please help me figure it out?

INFO in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:73]:
[load_spider_job][project:1][spider_name:keywordSpider][job_instance_id:38][job_id:spider_job_38:1499306532]

2017-07-11 21:30:17,110 - SpiderKeeper.app - INFO - [load_spider_job][project:1][spider_name:keywordSpider][job_instance_id:38][job_id:spider_job_38:1499306532]

INFO in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:73]:
[load_spider_job][project:4][spider_name:cjw][job_instance_id:39][job_id:spider_job_39:1499751120]

2017-07-11 21:32:17,023 - SpiderKeeper.app - INFO - [load_spider_job][project:4][spider_name:cjw][job_instance_id:39][job_id:spider_job_39:1499751120]
No handlers could be found for logger "apscheduler.executors.default"

INFO in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:40]:
[run_spider_job][project:4][spider_name:cjw][job_instance_id:39]

2017-07-12 00:30:00,052 - SpiderKeeper.app - INFO - [run_spider_job][project:4][spider_name:cjw][job_instance_id:39]

INFO in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:40]:
[run_spider_job][project:4][spider_name:cjw][job_instance_id:39]

2017-07-13 02:30:00,074 - SpiderKeeper.app - INFO - [run_spider_job][project:4][spider_name:cjw][job_instance_id:39]

INFO in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:40]:
[run_spider_job][project:1][spider_name:keywordSpider][job_instance_id:10]

2017-07-13 04:15:00,161 - SpiderKeeper.app - INFO - [run_spider_job][project:1][spider_name:keywordSpider][job_instance_id:10]

INFO in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:40]:
[run_spider_job][project:1][spider_name:keywordSpider][job_instance_id:15]

2017-07-13 04:15:00,594 - SpiderKeeper.app - INFO - [run_spider_job][project:1][spider_name:keywordSpider][job_instance_id:15]

INFO in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:40]:
[run_spider_job][project:1][spider_name:keywordSpider][job_instance_id:17]

2017-07-13 04:15:00,643 - SpiderKeeper.app - INFO - [run_spider_job][project:1][spider_name:keywordSpider][job_instance_id:17]

Scrapy Realtime Execution

I have a brief question: I have been looking for a reliable method that would allow specific Scrapy spiders to be assigned to a "realtime" style of execution. More specifically, I often find myself wanting to create a Scrapy spider and call it on demand (passing variables if needed), with the simple purpose of retrieving data in real time for insertion into a website.

It seems that this "realtime" capability is not addressed by your admin approach (nor by any other Scrapy admin approach I have reviewed). Instead, all such admin-side management capabilities seem to target Scrapy extractions as "bulk" jobs/runs.

I would greatly appreciate it if someone could address this topic, as well as the feasibility of extending this script to handle such functionality in a reliable and effective way.

Any comments or known issues/limitations others might have experienced in delivering such a solution would be very much appreciated.

Chris H.
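
For context on the on-demand piece: scrapyd itself can start a spider per request through its standard schedule.json endpoint, passing spider arguments along, so a thin endpoint in front of it gets most of the way to "realtime" execution (a generic sketch; project, spider, and argument names are placeholders):

import requests

# Fire one crawl on demand; any extra form field is handed to the
# spider as an argument.
resp = requests.post('http://localhost:6800/schedule.json', data={
    'project': 'myproject',
    'spider': 'a_spider',
    'query': 'realtime search term',
})
print(resp.json())  # e.g. {'status': 'ok', 'jobid': '...'}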

Cannot run SpiderKeeper from the console

I want to run the main-branch code from the console: I git cloned it, but it won't run. Here's the screen log:

[root@zcyq-collect-02 SpiderKeeper]# pwd
/root/SpiderKeeper/SpiderKeeper
[root@zcyq-collect-02 SpiderKeeper]# ll
total 40
drwx------. 8 root root 4096 May 16 21:42 app
-rw-------. 1 root root 969 May 16 21:42 config.py
-rw-------. 1 root root 802 May 16 21:42 config.pyc
-rw-------. 1 root root 46 May 16 21:42 __init__.py
-rw-------. 1 root root 205 May 16 21:42 __init__.pyc
-rw-------. 1 root root 2961 May 16 21:42 run.py
-rw-------. 1 root root 14336 May 16 21:42 SpiderKeeper.db
[root@zcyq-collect-02 SpiderKeeper]# nano run.py
[root@zcyq-collect-02 SpiderKeeper]# python ./run.py
Traceback (most recent call last):
File "./run.py", line 6, in
from SpiderKeeper.app import app, initialize
ImportError: No module named SpiderKeeper.app
[root@zcyq-collect-02 SpiderKeeper]#
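
A likely cause, judging from the paths above: run.py is executed from inside the SpiderKeeper package directory, so the repository root is not on sys.path and the absolute import fails. Running it from the repository root should work; alternatively, a small path shim at the top of run.py (a generic sketch, not part of SpiderKeeper):

import os
import sys

# Make the repository root (the parent of the SpiderKeeper package)
# importable, so "from SpiderKeeper.app import ..." resolves even when
# run.py is executed directly from inside the package directory.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))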

Cannot run the job

As subj.

{
  "code": 500, 
  "data": null, 
  "msg": "(sqlite3.IntegrityError) NOT NULL constraint failed: sk_job_execution.service_job_execution_id [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, 3, '2017-04-21 21:42:24.010000', None, None, 0, 'http://localhost:6800')]", 
  "success": false
}
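
An interpretation, not a confirmed diagnosis: the NOT NULL failure on service_job_execution_id suggests scrapyd's schedule call returned no job id, so SpiderKeeper tried to insert None. The first thing to verify is that the scrapyd listed under running_on (http://localhost:6800 above) is reachable and knows the project. In code, a defensive guard around the response might look like this (scrapyd_response is a hypothetical variable name):

jobid = (scrapyd_response or {}).get('jobid')
if not jobid:
    # Fail loudly instead of letting None reach the NOT NULL column
    # sk_job_execution.service_job_execution_id.
    raise RuntimeError('scrapyd did not accept the job: %r' % scrapyd_response)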

No module named loader.processors

2017-09-06 02:43:24+0000 [HTTPChannel,223,127.0.0.1] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1730, in allContentReceived
            req.requestReceived(command, path, version)
          File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 826, in requestReceived
            self.process()
          File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 189, in process
            self.render(resrc)
          File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 238, in render
            body = resrc.render(self)
        --- <exception caught here> ---
          File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render
            return JsonResource.render(self, txrequest)
          File "/usr/lib/pymodules/python2.7/scrapy/utils/txweb.py", line 10, in render
            r = resource.Resource.render(self, txrequest)
          File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
            return m(request)
          File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 88, in render_GET
            spiders = get_spider_list(project, runner=self.root.runner)
          File "/usr/lib/pymodules/python2.7/scrapyd/utils.py", line 65, in get_spider_list
            raise RuntimeError(msg.splitlines()[-1])
        exceptions.RuntimeError: ImportError: No module named loader.processors

What causes this?

The current stable release of scrapyd doesn't support Python 3 yet — if the project is written in Python 3, will it error out?

scrapyd's error output:

whoami@blackman:~/spider$ scrapyd
2017-04-19T09:59:46+0800 [-] Loading /home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/txapp.py...
2017-04-19T09:59:46+0800 [-] Scrapyd web console available at http://0.0.0.0:6800/
2017-04-19T09:59:46+0800 [-] Loaded.
2017-04-19T09:59:46+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.1.0 (/home/whoami/.pyenv/versions/3.5.1/bin/python3.5 3.5.1) starting up.
2017-04-19T09:59:46+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-04-19T09:59:46+0800 [-] Site starting on 6800
2017-04-19T09:59:46+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x7f29ee254f98>
2017-04-19T09:59:46+0800 [Launcher] Scrapyd 1.2.0a1 started: max_proc=8, runner='scrapyd.runner'
2017-04-19T09:59:48+0800 [_GenericHTTPChannelProtocol,0,127.0.0.1] Unhandled Error
        Traceback (most recent call last):
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/http.py", line 1906, in allContentReceived
            req.requestReceived(command, path, version)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/http.py", line 771, in requestReceived
            self.process()
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/server.py", line 190, in process
            self.render(resrc)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/server.py", line 241, in render
            body = resrc.render(self)
        --- <exception caught here> ---
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/webservice.py", line 21, in render
            return JsonResource.render(self, txrequest).encode('utf-8')
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/utils.py", line 20, in render
            r = resource.Resource.render(self, txrequest)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/resource.py", line 250, in render
            return m(request)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/webservice.py", line 112, in render_GET
            spiders = get_spider_list(project, runner=self.root.runner, version=version)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/utils.py", line 137, in get_spider_list
            raise RuntimeError(msg.encode('unicode_escape') if six.PY2 else msg)
        builtins.RuntimeError: Scrapy 1.3.3 - no active project

        Unknown command: list

        Use "scrapy" to see available commands


2017-04-19T09:59:48+0800 [twisted.python.log#info] "127.0.0.1" - - [19/Apr/2017:01:59:47 +0000] "GET /listspiders.json?project=%E9%BB%98%E8%AE%A4 HTTP/1.1" 200 163 "-" "python-requests/2.13.0"
2017-04-19T09:59:49+0800 [_GenericHTTPChannelProtocol,1,127.0.0.1] Unhandled Error
        Traceback (most recent call last):
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/http.py", line 1906, in allContentReceived
            req.requestReceived(command, path, version)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/http.py", line 771, in requestReceived
            self.process()
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/server.py", line 190, in process
            self.render(resrc)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/server.py", line 241, in render
            body = resrc.render(self)
        --- <exception caught here> ---
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/webservice.py", line 21, in render
            return JsonResource.render(self, txrequest).encode('utf-8')
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/utils.py", line 20, in render
            r = resource.Resource.render(self, txrequest)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/Twisted-17.1.0-py3.5-linux-x86_64.egg/twisted/web/resource.py", line 250, in render
            return m(request)
          File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/scrapyd/webservice.py", line 123, in render_GET
            queue = self.root.poller.queues[project]
        builtins.KeyError: 'defaut'

2017-04-19T09:59:49+0800 [twisted.python.log#info] "127.0.0.1" - - [19/Apr/2017:01:59:48 +0000] "GET /listjobs.json?project=defaut HTTP/1.1" 200 68 "-" "python-requests/2.13.0"
^C2017-04-19T09:59:49+0800 [-] Received SIGINT, shutting down.
2017-04-19T09:59:49+0800 [-] (TCP Port 6800 Closed)
2017-04-19T09:59:49+0800 [twisted.web.server.Site#info] Stopping factory <twisted.web.server.Site object at 0x7f29ee254f98>
2017-04-19T09:59:49+0800 [-] Main loop terminated.
2017-04-19T09:59:49+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] Server Shut Down.

OS: Ubuntu 16.04 64-bit
I can't make sense of this error; scrapyd was freshly installed.

Dashboard page suggestion

I ran into a problem while using it:
Sometimes I need to modify the code of an already-deployed spider. When I upload a new egg file to overwrite the original project, SpiderKeeper fails to show the project's latest spiders for several seconds. I guess it's updating :)

As shown below:

(screenshot: snipaste_20170428_111532)

Maybe a hint could be added showing the project's latest deployment time?

Automatically deploy egg

Hi, first, thanks for building SpiderKeeper - it's really easy to use.

We have some scrapers that utilize SK and we want to automate our deployments with a continuous deployment script. The only manual step left is uploading the egg file through the UI. Is it possible to deploy the egg file some other way?
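
One route worth trying (a sketch with placeholder names throughout): upload the egg directly to scrapyd's standard addversion.json endpoint, the same API the UI upload ultimately drives; whether SpiderKeeper picks up the new version automatically is worth verifying.

import requests

# Push a packaged egg straight to scrapyd from a CD script.
with open('myproject-1.0-py2.7.egg', 'rb') as egg:
    resp = requests.post(
        'http://localhost:6800/addversion.json',
        data={'project': 'myproject', 'version': '1.0'},
        files={'egg': egg},
    )
print(resp.json())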

Cannot submit deployments or delete projects

The console keeps reporting this error:
Execution of job "sync_job_execution_status_job (trigger: interval[0:00:05], next run at: 2017-10-26 16:51:06 CST)" skipped: maximum number of running instances reached (1)
Restarting SpiderKeeper and the scrapyd service doesn't change anything.
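
That message comes from APScheduler rather than SpiderKeeper itself: the previous run of sync_job_execution_status_job has not finished by the time the next 5-second tick fires, so new runs are skipped. If the job is merely slow rather than deadlocked, one generic APScheduler option is to raise the per-job instance limit (a sketch; SpiderKeeper's actual scheduler setup may differ):

scheduler.add_job(sync_job_execution_status_job, 'interval',
                  seconds=5, max_instances=2)  # allow one overlapping run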

sqlalchemy.exc.OperationalError

whoami@blackman:~/spider$ spiderkeeper --server=http://localhost:6800
/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/app/__init__.py:9: ExtDeprecationWarning: Importing flask.ext.restful is deprecated, use flask_restful instead.
  from flask.ext.restful import Api
/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/app/__init__.py:10: ExtDeprecationWarning: Importing flask.ext.restful_swagger is deprecated, use flask_restful_swagger instead.
  from flask.ext.restful_swagger import swagger
/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/flask_sqlalchemy/__init__.py:839: FSADeprecationWarning: SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and will be disabled by default in the future.  Set it to True or False to suppress this warning.
  'SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and '
/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/flask_sqlalchemy/__init__.py:839: FSADeprecationWarning: SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and will be disabled by default in the future.  Set it to True or False to suppress this warning.
  'SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and '
--------------------------------------------------------------------------------
INFO in run [/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/run.py:19]:
SpiderKeeper startd on 0.0.0.0:5000 with scrapyd servers:http://localhost:6800
--------------------------------------------------------------------------------
2017-04-17 18:25:01,071 - SpiderKeeper.app - INFO - SpiderKeeper startd on 0.0.0.0:5000 with scrapyd servers:http://localhost:6800
Job "sync_job_execution_status_job (trigger: interval[0:00:03], next run at: 2017-04-17 18:25:07 CST)" raised an exception
Traceback (most recent call last):
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: sk_project

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/app/schedulers/common.py", line 14, in sync_job_execution_status_job
    for project in Project.query.all():
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2703, in all
    return list(self)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2878, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams, params)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: sk_project [SQL: 'SELECT sk_project.id AS sk_project_id, sk_project.date_created AS sk_project_date_created, sk_project.date_modified AS sk_project_date_modified, sk_project.project_name AS sk_project_project_name \nFROM sk_project']
Job "sync_job_execution_status_job (trigger: interval[0:00:03], next run at: 2017-04-17 18:25:10 CST)" raised an exception
Traceback (most recent call last):
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: no such table: sk_project

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/app/schedulers/common.py", line 14, in sync_job_execution_status_job
    for project in Project.query.all():
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2703, in all
    return list(self)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2878, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams, params)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
    exc_info
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/home/whoami/.pyenv/versions/3.5.1/lib/python3.5/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: sk_project [SQL: 'SELECT sk_project.id AS sk_project_id, sk_project.date_created AS sk_project_date_created, sk_project.date_modified AS sk_project_date_modified, sk_project.project_name AS sk_project_project_name \nFROM sk_project']
^Cwhoami@blackman:~/spider$

Looks like the tables were never created?
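
"no such table: sk_project" means the SQLite schema was never created before the sync job started querying it. The generic Flask-SQLAlchemy fix is to create the tables once (a sketch that assumes the app package exposes db alongside app, as Flask-SQLAlchemy apps usually do):

from SpiderKeeper.app import app, db

with app.app_context():
    db.create_all()  # creates sk_project and the other missing tables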

Schedule dispatch failure

After the schedule triggers, the task does not execute normally. Excerpts of the logs between the SK side and scrapyd are below. What could be the cause?

SK-side log:

2017-07-06 19:00:00,037 - SpiderKeeper.app - ERROR - [run_spider_job] (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u8206\u60c5\u5173\u952e\u8bcd1', 3, '2017-07-06 19:00:00.033838', None, None, 0, 'http://127.0.0.1:6800')]

ERROR in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:41]:
[run_spider_job] (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u8206\u60c5\u5173\u952e\u8bcd2', 2, '2017-07-06 19:00:00.035478', None, None, 0, 'http://127.0.0.1:6800')]

2017-07-06 19:00:00,037 - SpiderKeeper.app - ERROR - [run_spider_job] (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u8206\u60c5\u5173\u952e\u8bcd2', 2, '2017-07-06 19:00:00.035478', None, None, 0, 'http://127.0.0.1:6800')]

ERROR in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:41]:
[run_spider_job] (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u653f\u7b56\u4eba\u7269\u5173\u952e\u8bcd', 4, '2017-07-06 19:00:00.034746', None, None, 0, 'http://127.0.0.1:6800')]

2017-07-06 19:00:00,039 - SpiderKeeper.app - ERROR - [run_spider_job] (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u653f\u7b56\u4eba\u7269\u5173\u952e\u8bcd', 4, '2017-07-06 19:00:00.034746', None, None, 0, 'http://127.0.0.1:6800')]

ERROR in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:41]:
[run_spider_job] (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u7ade\u4e89\u5bf9\u624b\u5173\u952e\u8bcd', 5, '2017-07-06 19:00:00.039962', None, None, 0, 'http://127.0.0.1:6800')]

2017-07-06 19:00:00,040 - SpiderKeeper.app - ERROR - [run_spider_job] (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u7ade\u4e89\u5bf9\u624b\u5173\u952e\u8bcd', 5, '2017-07-06 19:00:00.039962', None, None, 0, 'http://127.0.0.1:6800')]

ERROR in common [/opt/SpiderKeeper/SpiderKeeper/app/schedulers/common.py:41]:
[run_spider_job] This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.IntegrityError) sk_job_execution.service_job_execution_id may not be NULL [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_name, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, None, u'\u78a7\u6842\u56ed\u8206\u60c5\u5173\u952e\u8bcd2', 2, '2017-07-06 18:30:00.026636', None, None, 0, 'http://127.0.0.1:6800')]

Scrapyd-side log:
2017-07-06T19:00:00+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:10:59:59 +0000] "POST /schedule.json HTTP/1.1" 200 67 "-" "python-requests/2.13.0"
2017-07-06T19:00:00+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:10:59:59 +0000] "POST /schedule.json HTTP/1.1" 200 67 "-" "python-requests/2.13.0"
2017-07-06T19:00:00+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:10:59:59 +0000] "POST /schedule.json HTTP/1.1" 200 67 "-" "python-requests/2.13.0"
2017-07-06T19:00:00+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:10:59:59 +0000] "POST /schedule.json HTTP/1.1" 200 67 "-" "python-requests/2.13.0"
2017-07-06T19:00:00+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:11:00:00 +0000] "GET /listjobs.json?project=secrawler HTTP/1.1" 200 351 "-" "python-requests/2.13.0"
2017-07-06T19:00:00+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:11:00:00 +0000] "GET /listjobs.json?project=news4 HTTP/1.1" 200 95 "-" "python-requests/2.13.0"
2017-07-06T19:00:00+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:11:00:00 +0000] "GET /listjobs.json?project=news_all HTTP/1.1" 200 331 "-" "python-requests/2.13.0"
2017-07-06T19:00:01+0800 [Launcher,12876/stdout] Unable to format event {'log_namespace': u'log_legacy', 'log_time': 1499338801.834994, 'log_system': 'Launcher,12876/stdout', 'log_level': <LogLevel=info>, 'system': 'Launcher,12876/stdout', 'time': 1499338801.834994, 'log_text': 'ng.my&pn=0&cl=2&ct=0&tn=news&rn=20&ie=utf-8&bt=0&et=0\ntotal page:1\nhttp://www.baidu.com/s?rtt=2&wd=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&pn=0&tn=baiduwb&ie=utf-8\ntotal page:1\nhttp://tieba.baidu.com/f/search/res?isnew=1&kw=&qw=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&rn=10&un=&only_thread=1&sm=1&sd=&ed=&pn=0&ie=utf-8\ntotal page:1\nhttps://www.sogou.com/web?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&page=0&ie=utf8\nExtracting date from http://www.so.com/link?m=ajUS%2F4k3o89yP3cZN7Qfdzcvo4ZyKqq9GhEdc2wQ05hqgs4VcvOzSjW80ezfabEVSBca%2BTx064aSyGK%2B9JGJ5CtU0d1gxe99TOXJDub1muxcf4vKj0mBdQgVVfJ2mdTFr%2B3bQ%2Bd9OItJ%2BR1bV%2F2NAdC4ldkXeOqbS%2FUoeX%2Fp%2BX1zW83JiV8VsqPADBA1EqkZP4OpxpMXNFDQMR4Rz0UNgkFEaOIiK0YgzEtk5u%2FkOncII%2Fr6p7s%2FuYEZvcHJwayp5IY9Kh8Qyoq95GWSp3nksMXuXNc41048as6xTFBsZb7B0qjkrJ%2Bj0CXIWlr7a5uTdq95vTMsGl%2Ft6bX0xd%2F3NSZcJisUpj3eDAH9KRw41MbA%3D\ntotal page:1\nhttp://weixin.sogou.com/weixin?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&type=2&page=0\ntotal page:1\nhttps://www.so.com/s?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&pn=0\ntotal page:1\nhttp://news.so.com/ns?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&pn=0&tn=news&rank=pdate&j=0&src=page\ntotal page:1\nhttp://www.bing.com/search?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&first=0\ntotal page:1\nhttp://www.baidu.com/s?wd=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&ie=utf-8\ntotal page:1\nhttp://news.baidu.com/ns?word=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&cl=2&ct=0&tn=news&rn=20&ie=utf-8&bt=0&et=0\ntotal page:1\nhttp://www.baidu.com/s?rtt=2&wd=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&tn=baiduwb&ie=utf-8\nExtracting date from http://stock.hexun.com/2015-11-10/180470145.html\ntotal page:1\nhttp://tieba.baidu.com/f/search/res?isnew=1&kw=&qw=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&rn=10&un=&only_thread=1&sm=1&sd=&ed=&pn=0&ie=utf-8\ntotal page:1\nhttps://www.sogou.com/web?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&page=0&ie=utf8\ntotal page:1\nhttp://weixin.sogou.com/weixin?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&type=2&page=0\ntotal page:1\nhttps://www.so.com/s?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0\ntotal page:1\nhttp://news.so.com/ns?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&tn=news&rank=pdate&j=0&src=page\ntotal page:1\nhttp://www.bing.com/search?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&first=0\nExtracting date from http://www.so.com/link?m=aC8wY9hcupuXdD2dJrJWQS5QUKxCBBWl3pYbjI9tr6QzLkj4P7Uua%2B4t8oZttb70g0H0vel5suJc1W8Q04Xz5JFYYpX3mKPJlWrc2TqR32wAbVbWP2oIq0v%2FkXhC7MHDe1R%2FFOPZh%2BexjAILT7T4ZjikI1PrydGyCKjSdO5i%2Fv4l0uHpi%2BoFuBprWjgE%3D\nExtracting date 
from http://www.so.com/link?m=avF3cmCdtrLPkJjd3BA0ujFh2G9A1wnp3wXfLEJjy%2FvKaGKU%2Bt1zM5K0FdL4d3eMyKQRa1ld4VcPX843Sc7jkuA4gD4pJou%2BnzobE9ACcw6c3Ct0g3%2B0exg8vl7t71GwX9e15cOMlNW5FgyfSoSGYJmIgNAD4SotTGNUIcT4MIn7uutq8w8q7uPhi%2FUWF4hK0XDNKx8SRxNE%3D\nExtracting date from http://www.so.com/link?m=aGwhsOV6kskLT3OHaAE7GUzMQqhkXNtXFTRKPCVwHt7qItcT%2FBFhxz29hDclC%2FbVzpjpLYHF6QdDdK8NPhyFFvGFSIZAqorvmPGdbLAw6gpfJ37wZosdzuwE0FpTdCehR%2BRoYvof7kD%2F%2Bgueitf%2F%2BFgO2a0ooMl%2F%2BxmMKVyYtEcgQdWeC347xErnRPF8N3xmEA4%2BpTPH0k9s%3D\nExtracting date from http://www.so.com/link?m=ap4wN3YXVA8glj2bLWcxyOHhGlLhpD7%2FR1vLPUx%2FfoeReZusXoR4JcOHmRWtEuf0nhIGEyFe14QvAgpKKqcfQYgkjNziN7BjW71kbKdn6tnjkY8QDgRK5M9IK%2BeGTOhiKObLWtlTMg7KtSRZ%2FrF0%2FOfj4jGU%2FcRFvTeClmVjXvyYE3DeeM0s7ej3oNrm%2BORTc59kC4tJ%2B86Q%3D\nExtracting date from http://www.so.com/link?m=a3ck%2B2lrBxRwHPbPJ6w1gQERV62ONUWAPF741rDWcz%2Fxo1CZNHtV%2BYrPkw3bGAHSx5PlKWVLrr9i69bQP1SVnWrzX52IjmJtoRPLrBGHCfJWew%2FEmVb12MZuCyjc9gboONohXEYP%2BSAKOC5SDg%2BOnJVNYz1drLDe%2BtIgrBS0Rea8J%2FCk%2FoJJK%2FvK5VGLos4sP7Ovk0448YzE%3D\nExtracting date from http://www.so.com/link?m=aGEEZtMmklGW%2F5fJM95Bl8VkIxQrCV2tr0tVwPXVYEv2SK7u%2BnyiGNIrtA%2FKjid9ULAU%2FgKrqyVy1Qtf4b8pJDMe4MrQnfQG2jjOdGSDhGLlUV%2FeQQHGEzUaTUQ%2FpvTGejb1ON0JdUpfinoIKNrcnV8VejMLUcS20p4tX2Gd3LC1vGL6q3HcQ1alW6QR1DAiTFFNWBvS0jj0%3D\nExtracting date from http://www.so.com/link?m=apC%2BdaVT2B2Hhi5j4AVL5CiIpr8WzPrdGVd3uCubJ0LdHKMrGwpCaie%2B4TFuY1gqePImOjRXUwbWQcBOt4puYIs1J5RnNLge0SxADFHTTODEKeclCxNyJHbWfEA89FHaI4WGW%2F3ksNWvzamEk5qmEEb2qzDzk0EJUUyGoDsALJhR%2FxP3LhwXSK44iUVzhrLjQ', 'log_format': u'{log_text}', 'message': ('ng.my&pn=0&cl=2&ct=0&tn=news&rn=20&ie=utf-8&bt=0&et=0\ntotal page:1\nhttp://www.baidu.com/s?rtt=2&wd=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&pn=0&tn=baiduwb&ie=utf-8\ntotal page:1\nhttp://tieba.baidu.com/f/search/res?isnew=1&kw=&qw=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&rn=10&un=&only_thread=1&sm=1&sd=&ed=&pn=0&ie=utf-8\ntotal page:1\nhttps://www.sogou.com/web?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&page=0&ie=utf8\nExtracting date from http://www.so.com/link?m=ajUS%2F4k3o89yP3cZN7Qfdzcvo4ZyKqq9GhEdc2wQ05hqgs4VcvOzSjW80ezfabEVSBca%2BTx064aSyGK%2B9JGJ5CtU0d1gxe99TOXJDub1muxcf4vKj0mBdQgVVfJ2mdTFr%2B3bQ%2Bd9OItJ%2BR1bV%2F2NAdC4ldkXeOqbS%2FUoeX%2Fp%2BX1zW83JiV8VsqPADBA1EqkZP4OpxpMXNFDQMR4Rz0UNgkFEaOIiK0YgzEtk5u%2FkOncII%2Fr6p7s%2FuYEZvcHJwayp5IY9Kh8Qyoq95GWSp3nksMXuXNc41048as6xTFBsZb7B0qjkrJ%2Bj0CXIWlr7a5uTdq95vTMsGl%2Ft6bX0xd%2F3NSZcJisUpj3eDAH9KRw41MbA%3D\ntotal page:1\nhttp://weixin.sogou.com/weixin?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&type=2&page=0\ntotal page:1\nhttps://www.so.com/s?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&pn=0\ntotal page:1\nhttp://news.so.com/ns?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&pn=0&tn=news&rank=pdate&j=0&src=page\ntotal page:1\nhttp://www.bing.com/search?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:enanyang.my&first=0\ntotal page:1\nhttp://www.baidu.com/s?wd=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&ie=utf-8\ntotal page:1\nhttp://news.baidu.com/ns?word=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&cl=2&ct=0&tn=news&rn=20&ie=utf-8&bt=0&et=0\ntotal 
page:1\nhttp://www.baidu.com/s?rtt=2&wd=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&tn=baiduwb&ie=utf-8\nExtracting date from http://stock.hexun.com/2015-11-10/180470145.html\ntotal page:1\nhttp://tieba.baidu.com/f/search/res?isnew=1&kw=&qw=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&rn=10&un=&only_thread=1&sm=1&sd=&ed=&pn=0&ie=utf-8\ntotal page:1\nhttps://www.sogou.com/web?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&page=0&ie=utf8\ntotal page:1\nhttp://weixin.sogou.com/weixin?query=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&type=2&page=0\ntotal page:1\nhttps://www.so.com/s?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0\ntotal page:1\nhttp://news.so.com/ns?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&pn=0&tn=news&rank=pdate&j=0&src=page\ntotal page:1\nhttp://www.bing.com/search?q=\xe7\xa2\xa7\xe6\xa1\x82\xe5\x9b\xad \xe9\x80\x80\xe6\x88\xbf site:www.qhcin.gov.cn&first=0\nExtracting date from http://www.so.com/link?m=aC8wY9hcupuXdD2dJrJWQS5QUKxCBBWl3pYbjI9tr6QzLkj4P7Uua%2B4t8oZttb70g0H0vel5suJc1W8Q04Xz5JFYYpX3mKPJlWrc2TqR32wAbVbWP2oIq0v%2FkXhC7MHDe1R%2FFOPZh%2BexjAILT7T4ZjikI1PrydGyCKjSdO5i%2Fv4l0uHpi%2BoFuBprWjgE%3D\nExtracting date from http://www.so.com/link?m=avF3cmCdtrLPkJjd3BA0ujFh2G9A1wnp3wXfLEJjy%2FvKaGKU%2Bt1zM5K0FdL4d3eMyKQRa1ld4VcPX843Sc7jkuA4gD4pJou%2BnzobE9ACcw6c3Ct0g3%2B0exg8vl7t71GwX9e15cOMlNW5FgyfSoSGYJmIgNAD4SotTGNUIcT4MIn7uutq8w8q7uPhi%2FUWF4hK0XDNKx8SRxNE%3D\nExtracting date from http://www.so.com/link?m=aGwhsOV6kskLT3OHaAE7GUzMQqhkXNtXFTRKPCVwHt7qItcT%2FBFhxz29hDclC%2FbVzpjpLYHF6QdDdK8NPhyFFvGFSIZAqorvmPGdbLAw6gpfJ37wZosdzuwE0FpTdCehR%2BRoYvof7kD%2F%2Bgueitf%2F%2BFgO2a0ooMl%2F%2BxmMKVyYtEcgQdWeC347xErnRPF8N3xmEA4%2BpTPH0k9s%3D\nExtracting date from http://www.so.com/link?m=ap4wN3YXVA8glj2bLWcxyOHhGlLhpD7%2FR1vLPUx%2FfoeReZusXoR4JcOHmRWtEuf0nhIGEyFe14QvAgpKKqcfQYgkjNziN7BjW71kbKdn6tnjkY8QDgRK5M9IK%2BeGTOhiKObLWtlTMg7KtSRZ%2FrF0%2FOfj4jGU%2FcRFvTeClmVjXvyYE3DeeM0s7ej3oNrm%2BORTc59kC4tJ%2B86Q%3D\nExtracting date from http://www.so.com/link?m=a3ck%2B2lrBxRwHPbPJ6w1gQERV62ONUWAPF741rDWcz%2Fxo1CZNHtV%2BYrPkw3bGAHSx5PlKWVLrr9i69bQP1SVnWrzX52IjmJtoRPLrBGHCfJWew%2FEmVb12MZuCyjc9gboONohXEYP%2BSAKOC5SDg%2BOnJVNYz1drLDe%2BtIgrBS0Rea8J%2FCk%2FoJJK%2FvK5VGLos4sP7Ovk0448YzE%3D\nExtracting date from http://www.so.com/link?m=aGEEZtMmklGW%2F5fJM95Bl8VkIxQrCV2tr0tVwPXVYEv2SK7u%2BnyiGNIrtA%2FKjid9ULAU%2FgKrqyVy1Qtf4b8pJDMe4MrQnfQG2jjOdGSDhGLlUV%2FeQQHGEzUaTUQ%2FpvTGejb1ON0JdUpfinoIKNrcnV8VejMLUcS20p4tX2Gd3LC1vGL6q3HcQ1alW6QR1DAiTFFNWBvS0jj0%3D\nExtracting date from http://www.so.com/link?m=apC%2BdaVT2B2Hhi5j4AVL5CiIpr8WzPrdGVd3uCubJ0LdHKMrGwpCaie%2B4TFuY1gqePImOjRXUwbWQcBOt4puYIs1J5RnNLge0SxADFHTTODEKeclCxNyJHbWfEA89FHaI4WGW%2F3ksNWvzamEk5qmEEb2qzDzk0EJUUyGoDsALJhR%2FxP3LhwXSK44iUVzhrLjQ',), 'isError': 0}: 'ascii' codec can't decode byte 0xe7 in position 99: ordinal not in range(128)
2017-07-06T19:00:05+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:11:00:05 +0000] "GET /listjobs.json?project=secrawler HTTP/1.1" 200 351 "-" "python-requests/2.13.0"
2017-07-06T19:00:05+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:11:00:05 +0000] "GET /listjobs.json?project=news4 HTTP/1.1" 200 95 "-" "python-requests/2.13.0"
2017-07-06T19:00:05+0800 [twisted.python.log#info] "127.0.0.1" - - [06/Jul/2017:11:00:05 +0000] "GET /listjobs.json?project=news_all HTTP/1.1" 200 331 "-" "python-requests/2.13.0"

cannot upload egg file

error info:
{
"code": 500,
"data": null,
"msg": "HTTPConnectionPool(host='localhost', port=6800): Max retries exceeded with url: /addversion.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x02C33C30>: Failed to establish a new connection: [WinError 10061] \u7531\u4e8e\u76ee\u6807\u8ba1\u7b97\u673a\u79ef\u6781\u62d2\u7edd\uff0c\u65e0\u6cd5\u8fde\u63a5\u3002',))",
"success": false
}

unicode info:
No connection could be made because the target machine actively refused it.
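
The decoded WinError 10061 means nothing is listening on localhost:6800, i.e. scrapyd is not running or is bound to a different address. A quick probe (daemonstatus.json is a standard scrapyd endpoint in 1.2+):

import requests

# Prints something like {'status': 'ok', ...} if scrapyd is up;
# a ConnectionError here confirms nothing is listening on port 6800.
print(requests.get('http://localhost:6800/daemonstatus.json').json())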

Question: args

First, thanks for the good software!

Secondly, I have a question about the "Args" field in the "Run job" view. What does it do? Is there a way to pass arguments to a job, like you can when running a scraper from the command line, e.g. scrapy crawl --set=SOMETHING=ATHING a_scraper?

Thanks!
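
For the Scrapy side of the question: -s/--set overrides settings, while -a passes spider arguments, and a job-level "Args" field would most plausibly map to the latter (an assumption worth confirming against SpiderKeeper's docs). A generic sketch of how such arguments arrive in a spider:

import scrapy

class AScraper(scrapy.Spider):
    name = 'a_scraper'

    # Anything sent as "-a key=value" on the CLI, or as an extra form
    # field to scrapyd's schedule.json, arrives as a keyword argument.
    def __init__(self, category=None, *args, **kwargs):
        super(AScraper, self).__init__(*args, **kwargs)
        self.category = category  # e.g. scrapy crawl a_scraper -a category=books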

Number of spiders that can run concurrently

Hi, can SpiderKeeper only run 4 spiders at the same time? If I want to run more spider projects concurrently, can that be changed somewhere in the settings?
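
The four-at-a-time ceiling is usually scrapyd's rather than SpiderKeeper's: scrapyd defaults to max_proc_per_cpu = 4, which matches the observed limit on a single-CPU machine (an assumption about the setup). The relevant knobs in scrapyd.conf look like this:

[scrapyd]
# 0 means no absolute cap; raise the per-CPU limit as needed.
max_proc = 0
max_proc_per_cpu = 8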

Can scrapyd-deploy package non-.py files?

Some of my configuration lives in txt files that I edit separately: account info, site URLs, search keyword lists, and so on.

I found that packaging with scrapyd-deploy --build-egg output.egg only picks up .py files. Can other resource files be bundled into the egg as well?
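
One approach that generally works (a sketch assuming the standard setup.py that scrapyd-deploy builds the egg from; 'myproject' and the glob patterns are placeholders): declare the resources via package_data so setuptools bundles them into the egg.

from setuptools import setup, find_packages

setup(
    name='myproject',
    version='1.0',
    packages=find_packages(),
    # Bundle non-.py resources (account info, keyword lists, ...) into the egg.
    package_data={'myproject': ['*.txt', 'resources/*.txt']},
    entry_points={'scrapy': ['settings = myproject.settings']},
)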

Hard-pinned dependency versions

Some dependencies are pinned to exact versions, which is a real pain point:

spiderkeeper 1.2.0 has requirement PyMySQL==0.7.11, but you'll have pymysql 0.8.1 which is incompatible.

spiderkeeper 1.2.0 has requirement SQLAlchemy==1.1.9, but you'll have sqlalchemy 1.2.8 which is incompatible.

Custom Widgets for Spider Args

It would be a much better user experience to use custom widgets for spider args. For example, if we could select a category from a list, or enter a URL in a separate field, it would be much easier for end users to work with.

About the periodic task feature

The UI reports a sqlite3 error; the error message is as follows:

"code": 500,
"data": null,
"msg": "(sqlite3.IntegrityError) NOT NULL constraint failed: sk_job_execution.service_job_execution_id [SQL: u'INSERT INTO sk_job_execution (date_created, date_modified, project_id, service_job_execution_id, job_instance_id, create_time, start_time, end_time, running_status, running_on) VALUES (CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1,

How can this problem be solved?

'SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and '

^CzeddeMacBook-Air:spider zed$ spiderkeeper --server=http://localhost:6800
/Users/zed/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/app/__init__.py:9: ExtDeprecationWarning: Importing flask.ext.restful is deprecated, use flask_restful instead.
  from flask.ext.restful import Api
/Users/zed/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/app/__init__.py:10: ExtDeprecationWarning: Importing flask.ext.restful_swagger is deprecated, use flask_restful_swagger instead.
  from flask.ext.restful_swagger import swagger
/Users/zed/.pyenv/versions/3.5.1/lib/python3.5/site-packages/flask_sqlalchemy/__init__.py:839: FSADeprecationWarning: SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and will be disabled by default in the future.  Set it to True or False to suppress this warning.
  'SQLALCHEMY_TRACK_MODIFICATIONS adds significant overhead and '
--------------------------------------------------------------------------------
INFO in run [/Users/zed/.pyenv/versions/3.5.1/lib/python3.5/site-packages/SpiderKeeper/run.py:19]:
SpiderKeeper startd on 0.0.0.0:5000 with scrapyd servers:http://localhost:6800
--------------------------------------------------------------------------------
2017-04-16 23:32:36,752 - SpiderKeeper.app - INFO - SpiderKeeper startd on 0.0.0.0:5000 with scrapyd servers:http://localhost:6800

python v: 3.5.1
os: osx 10.12.4

Can this warning be ignored? Will it affect usage?
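
The warning's own text says how to silence it, and setting the flag to False should not affect functionality; it just disables Flask-SQLAlchemy's change-tracking overhead. If running SpiderKeeper from source, its config.py is the natural place for the standard setting:

SQLALCHEMY_TRACK_MODIFICATIONS = False  # suppresses the FSADeprecationWarning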

Blueprint

How can I mount SpiderKeeper through a Flask Blueprint?
