crawlab-team / crawlab-sdk

SDK for Crawlab, including SDK for different programming languages such as Python, Node.js and Java, and a CLI Tool written in Python.

Home Page: https://crawlab.cn

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 52.87%, JavaScript 37.27%, Go 7.42%, TypeScript 2.44%

crawlab-sdk's Issues

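The save_items method stops saving to the database after running for a while

Version: 0.6.1
It worked at first. After setting up a scheduled task and checking the results a few days later, I found that the tasks show a result count, but opening a task shows no results, and no new records were added to the database. URL deduplication filtering is enabled.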

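Python SDK dependency versions are too old

The best practice for declaring Python dependencies is to use >= rather than ==. The current configuration forces every SDK user to install redundant, outdated dependency packages.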

ERROR: crawlab-sdk 0.3.3 requires Click==7.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires elasticsearch==7.8.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires kafka-python==2.0.1, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires pathspec==0.8.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires prettytable==0.7.2, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires psycopg2-binary==2.8.5, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires pymysql==0.9.3, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires requests==2.22.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 has requirement pymongo==3.10.1, but you'll have pymongo 3.11.3 which is incompatible.
ERROR: crawlab-sdk 0.3.3 has requirement scrapy==2.2.0, but you'll have scrapy 2.5.0 which is incompatible.
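
The change the issue asks for can be illustrated with a small sketch. The helper below, `relax_pins`, is a hypothetical name and not part of crawlab-sdk; it rewrites exact `==` pins into `>=` lower bounds, assuming the dependencies are available as a plain list of requirement strings like the ones in the errors above. How the SDK actually declares its dependencies is not shown in this issue.

```python
import re

# Matches "name==version" with optional surrounding whitespace.
_PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*==\s*(\S+)\s*$")


def relax_pins(requirements):
    """Turn exact pins (pkg==1.2) into lower bounds (pkg>=1.2); leave other lines alone."""
    relaxed = []
    for line in requirements:
        match = _PIN.match(line)
        relaxed.append(f"{match.group(1)}>={match.group(2)}" if match else line)
    return relaxed


# Requirement strings taken from the error messages above.
pinned = ["Click==7.0", "pymongo==3.10.1", "scrapy==2.2.0"]
print(relax_pins(pinned))
```

With lower bounds like these, an already installed pymongo 3.11.3 or scrapy 2.5.0 would satisfy the requirement instead of conflicting with it.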

can only concatenate str (not "ConnectionError") to str

A bug

File "/usr/local/lib/python3.8/dist-packages/crawlab/core/client.py", line 99, in update_token
    print('error: ' + data.get('error'))
TypeError: can only concatenate str (not "ConnectionError") to str
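
The traceback indicates that `data.get('error')` returned a `ConnectionError` object rather than a string, so the `+` concatenation fails. A minimal sketch of one way to avoid this is to convert the value with `str()` before concatenating. The function name `format_error` is hypothetical and only for illustration; this is not necessarily how the SDK fixed it.

```python
def format_error(data):
    """Build an error message even when data['error'] is an exception object, not a str."""
    return "error: " + str(data.get("error"))


# Simulate the failing case: the 'error' value is an exception instance.
print(format_error({"error": ConnectionError("connection refused")}))
```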

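Upload has bugs

Version: 0.6.0-3

1. On Windows, when uploading with the upload command, the path that actually gets uploaded takes the form /xxx/xxx\xxx\xxx. The upload succeeds, but all files end up at the same directory level.
2. On Mac, when using the command upload . to upload the files in the current path, the upload succeeds, but the "." is removed from every file name (for example, scrapy.cfg becomes scrapycfg after upload).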
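
Both symptoms concern how a local file path is turned into a remote path. The sketch below shows one way to build a relative, forward-slash path with `pathlib`, so that Windows backslashes are not sent to the server and file names are left untouched. The function `to_remote_path` is a hypothetical helper, not the SDK's actual upload code, and the example paths are made up.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def to_remote_path(root: PurePath, file_path: PurePath) -> str:
    """Return file_path relative to root, always with '/' separators."""
    return file_path.relative_to(root).as_posix()


# Pure paths let the Windows case be reproduced on any OS.
win_root = PureWindowsPath(r"C:\project")
win_file = PureWindowsPath(r"C:\project\spiders\quotes.py")
print(to_remote_path(win_root, win_file))

# The file name, including its dot, is preserved.
posix_root = PurePosixPath("/home/user/project")
print(to_remote_path(posix_root, posix_root / "scrapy.cfg"))
```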

Hello Crawlab team,
I'm using Crawlab to deploy my Scrapy spiders. When I try to download the results as CSV, it takes a long time, sometimes more than 15 minutes. Is there a command in the CLI SDK to download the data directly?
Thank you

error: invalid character '-' in numeric literal

When trying to upload a Scrapy project, I get this error:

/.git/logs/refs/remotes/origin/HEAD
/.git/hooks/commit-msg.sample
/.git/hooks/pre-rebase.sample
/.git/hooks/pre-commit.sample
/.git/hooks/applypatch-msg.sample
/.git/hooks/fsmonitor-watchman.sample
/.git/hooks/pre-receive.sample
/.git/hooks/prepare-commit-msg.sample
/.git/hooks/post-update.sample
/.git/hooks/pre-merge-commit.sample
/.git/hooks/pre-applypatch.sample
/.git/hooks/pre-push.sample
/.git/hooks/update.sample
/.git/hooks/push-to-checkout.sample
/.git/refs/heads/master
/.git/refs/remotes/origin/HEAD
error: invalid character '-' in numeric literal
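
The listing shows that files under `.git` are part of the upload. Whether that is related to the error is not established here, but as an illustration, the sketch below collects files for upload while pruning `.git`. The function `list_upload_files` and its `skip_dirs` parameter are hypothetical and not part of the crawlab CLI.

```python
import os
import tempfile


def list_upload_files(root, skip_dirs=(".git",)):
    """Return '/'-separated paths under root, pruning any directory named in skip_dirs."""
    files = []
    for current, dirs, names in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in names:
            rel = os.path.relpath(os.path.join(current, name), root)
            files.append(rel.replace(os.sep, "/"))
    return sorted(files)


# Build a throwaway project with a .git directory to show the effect.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, ".git", "hooks"))
    open(os.path.join(tmp, ".git", "hooks", "pre-push.sample"), "w").close()
    open(os.path.join(tmp, "scrapy.cfg"), "w").close()
    print(list_upload_files(tmp))
```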

.
├── packages
│   ├── js_spiders
│   │   ├── js_spider_1
│   │   │   └── index.js
│   │   ├── js_spider_2
│   │   │   └── index.js
│   │   ├── package.json
│   │   └── .....
│   └── py_spiders
│       ├── py_spider_1
│       │   └── main.py
│       ├── py_spider_2
│       │   └── main.py
│       ├── setup.py
│       └── .....
├── crawlab.json
└── makefile

crawlab.json

{
  "spiders": [
    {
      "path": "packages/js_spiders",
      "exclude_path": "node_modules",
      "name": "js spiders",
      "description": "js spiders",
      "cmd": "node",
      "schedules": [
        {
          "name": "js spider 1 cron",
          "cron": "* 1 * * *",
          "command": "node js_spider_1/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 1 cron",
          "enabled": true
        },
        {
          "name": "js spider 2 cron",
          "cron": "* 2 * * *",
          "command": "node js_spider_2/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 2 cron",
          "enabled": true
        }
      ]
    },
    {
      "path": "packages/py_spiders",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        }
      ]
    }
  ]
}

I can help implement this if you think it is possible
@tikazyq
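
To make the proposal more concrete, here is a minimal sketch of how a tool might read a crawlab.json of the shape shown above and list the enabled schedules. The file format is the one proposed in this issue, not an existing Crawlab feature, and `list_schedules` is a hypothetical helper name.

```python
import json


def list_schedules(config_text):
    """Return (spider name, schedule name, cron, command) for every enabled schedule."""
    config = json.loads(config_text)
    rows = []
    for spider in config.get("spiders", []):
        for schedule in spider.get("schedules", []):
            if schedule.get("enabled", False):
                rows.append(
                    (spider["name"], schedule["name"], schedule["cron"], schedule["command"])
                )
    return rows


# A trimmed-down version of the proposed crawlab.json.
example = """
{
  "spiders": [
    {
      "path": "packages/py_spiders",
      "name": "py spiders",
      "cmd": "python",
      "schedules": [
        {"name": "py spider 1 cron", "cron": "* 1 * * *",
         "command": "python py_spider_1/main.py", "enabled": true},
        {"name": "py spider 2 cron", "cron": "* 2 * * *",
         "command": "python py_spider_2/main.py", "enabled": false}
      ]
    }
  ]
}
"""

for row in list_schedules(example):
    print(row)
```

Only the first schedule is printed, because the second one is disabled in this trimmed example.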
