
crawlab-sdk's Issues

go.mod issue in the Go SDK

There is a problem with the current dependencies: go.mod contains replace directives pointing at local paths that only exist on the maintainer's machine:
replace (
github.com/crawlab-team/crawlab-grpc => /Users/marvzhang/projects/crawlab-team/crawlab-grpc
github.com/crawlab-team/go-trace => /Users/marvzhang/projects/crawlab-team/go-trace
)
Both go-trace and crawlab-grpc now have their own GitHub repositories; should these replace directives be removed?
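
A minimal sketch of the fix, assuming both modules now have tagged releases on GitHub (the version numbers below are illustrative, not verified):

// go.mod: drop the local replace block and depend on the published modules
require (
    github.com/crawlab-team/crawlab-grpc v0.6.0 // illustrative version
    github.com/crawlab-team/go-trace v0.1.0 // illustrative version
)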

Python SDK dependencies are pinned to outdated versions

Best practice for declaring Python dependencies is to use >= rather than ==. The current configuration forces every SDK user to install redundant, outdated dependency packages:

ERROR: crawlab-sdk 0.3.3 requires Click==7.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires elasticsearch==7.8.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires kafka-python==2.0.1, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires pathspec==0.8.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires prettytable==0.7.2, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires psycopg2-binary==2.8.5, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires pymysql==0.9.3, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires requests==2.22.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 has requirement pymongo==3.10.1, but you'll have pymongo 3.11.3 which is incompatible.
ERROR: crawlab-sdk 0.3.3 has requirement scrapy==2.2.0, but you'll have scrapy 2.5.0 which is incompatible.
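
A hedged sketch of how the pins could be relaxed, assuming the dependencies are declared in setup.py (the lower bounds simply mirror the currently pinned versions; the true minimums would need testing):

# setup.py (sketch): exact pins relaxed to lower bounds
from setuptools import setup, find_packages

setup(
    name="crawlab-sdk",
    version="0.3.3",
    packages=find_packages(),
    install_requires=[
        "Click>=7.0",
        "elasticsearch>=7.8.0",
        "kafka-python>=2.0.1",
        "pathspec>=0.8.0",
        "prettytable>=0.7.2",
        "psycopg2-binary>=2.8.5",
        "pymysql>=0.9.3",
        "requests>=2.22.0",
        "pymongo>=3.10.1",
        "scrapy>=2.2.0",
    ],
)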

error: invalid character '-' in numeric literal

When trying to upload a Scrapy project, I get this error:

/.git/logs/refs/remotes/origin/HEAD
/.git/hooks/commit-msg.sample
/.git/hooks/pre-rebase.sample
/.git/hooks/pre-commit.sample
/.git/hooks/applypatch-msg.sample
/.git/hooks/fsmonitor-watchman.sample
/.git/hooks/pre-receive.sample
/.git/hooks/prepare-commit-msg.sample
/.git/hooks/post-update.sample
/.git/hooks/pre-merge-commit.sample
/.git/hooks/pre-applypatch.sample
/.git/hooks/pre-push.sample
/.git/hooks/update.sample
/.git/hooks/push-to-checkout.sample
/.git/refs/heads/master
/.git/refs/remotes/origin/HEAD
error: invalid character '-' in numeric literal
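
The listing shows the .git directory being swept into the upload, and "invalid character '-' in numeric literal" looks like the server failing to decode one of those files. A hedged workaround, assuming the CLI entry point is `crawlab` with an `upload` command (the helper below is hypothetical, not part of the SDK):

# workaround: upload from a temporary copy of the project without .git
import shutil
import subprocess
import tempfile

src = "/path/to/scrapy_project"  # adjust to your project
with tempfile.TemporaryDirectory() as tmp:
    # copy the project tree, skipping the .git directory entirely
    dst = shutil.copytree(src, f"{tmp}/project",
                          ignore=shutil.ignore_patterns(".git"))
    # run the upload from the clean copy
    subprocess.run(["crawlab", "upload"], cwd=dst, check=True)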

About crawlab.json

While working with the SDK, I found it inconvenient for scheduling and deploying multiple spiders, so I wonder whether it could be designed to look like the following:

.
├── packages
│   ├── js_spiders
│   │   ├── js_spider_1
│   │   │   └── index.js
│   │   ├── js_spider_2
│   │   │   └── index.js
│   │   ├── package.json
│   │   └── ...
│   └── py_spiders
│       ├── py_spider_1
│       │   └── main.py
│       ├── py_spider_2
│       │   └── main.py
│       ├── setup.py
│       └── ...
├── crawlab.json
└── makefile

crawlab.json

{
  "spiders": [
    {
      "path": "packages/js_spider",
      "exclude_path": "node_modules",
      "name": "js spiders",
      "description": "js spiders",
      "cmd": "node",
      "schedules": [
        {
          "name": "js spider 1 cron",
          "cron": "* 1 * * *",
          "command": "node js_spider_1/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 1 cron",
          "enabled": true
        },
        {
          "name": "js spider 2 cron",
          "cron": "* 2 * * *",
          "command": "node js_spider_2/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 2 cron",
          "enabled": true
        }
      ]
    },
    {
      "path": "packages/py_spider",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        }
      ]
    }
  ]
}

I can help implement this if you think it is feasible.
@tikazyq
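
For concreteness, a minimal sketch of how a deploy step might consume this proposed schema (the field names come from the example above; the code is illustrative, not an existing SDK feature):

# sketch: iterate the proposed crawlab.json and describe the deploy plan
import json

with open("crawlab.json") as f:
    config = json.load(f)

for spider in config["spiders"]:
    # one spider package per entry, deployed from its path
    print(f"deploy {spider['name']} from {spider['path']} "
          f"(excluding {spider['exclude_path']})")
    # each spider can carry any number of cron schedules
    for schedule in spider.get("schedules", []):
        state = "enabled" if schedule["enabled"] else "disabled"
        print(f"  {schedule['name']}: {schedule['cron']} -> "
              f"{schedule['command']} [{state}]")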

save_item: database connections are not released after saving data

The storage backend is MySQL.

When saving data with save_item, testing shows that connections stay occupied even after a crawler finishes. After starting several crawlers at once, the database becomes unreachable; on inspection, the crawlers' connections are never released and sit in the Sleep state. A single crawler can hold over a thousand connections, and the number of held connections seems to correlate with the volume of data collected. This is a serious bug 😢😢😢. I will switch to batch writes and test again; please reply as soon as you can.
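
For reference, a hedged sketch of a write pattern that does not leak connections when talking to MySQL directly (pymysql is already a dependency of the SDK; whether save_item's internals follow this pattern is an assumption, and the table and credentials below are placeholders):

# one shared connection, batched insert, closed explicitly when done
import pymysql

conn = pymysql.connect(host="localhost", user="crawlab",
                       password="***", database="results")  # placeholders
try:
    with conn.cursor() as cur:
        # batch the rows instead of opening a connection per item
        cur.executemany(
            "INSERT INTO items (title, url) VALUES (%s, %s)",  # placeholder table
            [("t1", "http://example.com/1"),
             ("t2", "http://example.com/2")])
    conn.commit()
finally:
    conn.close()  # release the connection instead of leaving it in Sleep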

Download results with CLI

Hello Crawlab team,
I'm using Crawlab to deploy my Scrapy spiders, and when I try to download the results as CSV it takes a long time, sometimes more than 15 minutes. Is there a command in the CLI SDK to download the data directly?
Thank you.
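
One workaround, sketched below, is to read the results straight from the result store and write the CSV yourself. This assumes the default MongoDB result store and guesses at the database and collection names (adjust all of them to your deployment):

# hypothetical workaround: export results from MongoDB to CSV directly
import csv
import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # adjust URI
docs = client["crawlab_test"]["results_my_spider"].find()  # assumed names

with open("results.csv", "w", newline="") as f:
    writer = None
    for doc in docs:
        doc.pop("_id", None)  # drop Mongo's ObjectId field
        if writer is None:
            # take the header from the first document's keys
            writer = csv.DictWriter(f, fieldnames=list(doc),
                                    extrasaction="ignore")
            writer.writeheader()
        writer.writerow(doc)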

Upload bugs

Version: 0.6.0-3

1. On Windows, uploading with the upload command produces paths of the form /xxx/xxx\xxx\xxx. The upload succeeds, but everything ends up flattened into the same directory level (see the sketch after this list).
2. On Mac, uploading the current directory with `upload .` succeeds, but every "." is stripped from the file names (for example, scrapy.cfg becomes scrapycfg after upload).
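
Both symptoms point at path-string handling in the CLI. A hedged sketch of the normalization that would address the Windows case (illustrative only, not the SDK's actual code):

# normalize Windows path separators to forward slashes before upload,
# so the server rebuilds the directory tree instead of flattening it
def to_upload_path(local_path: str) -> str:
    return local_path.replace("\\", "/")

print(to_upload_path(r"spiders\quotes\spider.py"))  # -> spiders/quotes/spider.py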
