
crawlab-sdk's Issues

go.mod issue in the Go SDK

There is a problem with the current dependencies: go.mod contains replace directives pointing at local paths that only exist on the maintainer's machine:
replace (
github.com/crawlab-team/crawlab-grpc => /Users/marvzhang/projects/crawlab-team/crawlab-grpc
github.com/crawlab-team/go-trace => /Users/marvzhang/projects/crawlab-team/go-trace
)
Both go-trace and crawlab-grpc now have their own GitHub repositories; should these replace directives be removed?
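
A minimal sketch of the fix, assuming both modules now have tagged releases on GitHub (the version numbers below are illustrative, not verified):

// go.mod: drop the local replace block and depend on the published modules
require (
    github.com/crawlab-team/crawlab-grpc v0.6.0 // illustrative version
    github.com/crawlab-team/go-trace v0.1.0 // illustrative version
)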

Python SDK dependencies are pinned to outdated versions

Best practice for declaring Python dependencies is to use >= rather than ==. The current configuration forces every SDK user to install redundant, outdated dependency packages:

ERROR: crawlab-sdk 0.3.3 requires Click==7.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires elasticsearch==7.8.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires kafka-python==2.0.1, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires pathspec==0.8.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires prettytable==0.7.2, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires psycopg2-binary==2.8.5, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires pymysql==0.9.3, which is not installed.
ERROR: crawlab-sdk 0.3.3 requires requests==2.22.0, which is not installed.
ERROR: crawlab-sdk 0.3.3 has requirement pymongo==3.10.1, but you'll have pymongo 3.11.3 which is incompatible.
ERROR: crawlab-sdk 0.3.3 has requirement scrapy==2.2.0, but you'll have scrapy 2.5.0 which is incompatible.
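
A hedged sketch of how the pins could be relaxed, assuming the dependencies are declared in setup.py (the lower bounds simply mirror the currently pinned versions; the true minimums would need testing):

# setup.py (sketch): exact pins relaxed to lower bounds
from setuptools import setup, find_packages

setup(
    name="crawlab-sdk",
    version="0.3.3",
    packages=find_packages(),
    install_requires=[
        "Click>=7.0",
        "elasticsearch>=7.8.0",
        "kafka-python>=2.0.1",
        "pathspec>=0.8.0",
        "prettytable>=0.7.2",
        "psycopg2-binary>=2.8.5",
        "pymysql>=0.9.3",
        "requests>=2.22.0",
        "pymongo>=3.10.1",
        "scrapy>=2.2.0",
    ],
)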

error: invalid character '-' in numeric literal

When trying to upload a Scrapy project, I get this error:

/.git/logs/refs/remotes/origin/HEAD
/.git/hooks/commit-msg.sample
/.git/hooks/pre-rebase.sample
/.git/hooks/pre-commit.sample
/.git/hooks/applypatch-msg.sample
/.git/hooks/fsmonitor-watchman.sample
/.git/hooks/pre-receive.sample
/.git/hooks/prepare-commit-msg.sample
/.git/hooks/post-update.sample
/.git/hooks/pre-merge-commit.sample
/.git/hooks/pre-applypatch.sample
/.git/hooks/pre-push.sample
/.git/hooks/update.sample
/.git/hooks/push-to-checkout.sample
/.git/refs/heads/master
/.git/refs/remotes/origin/HEAD
error: invalid character '-' in numeric literal
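
The listing shows the .git directory being swept into the upload, and "invalid character '-' in numeric literal" looks like the server failing to decode one of those files. A hedged workaround, assuming the CLI entry point is `crawlab` with an `upload` command (the helper below is hypothetical, not part of the SDK):

# workaround: upload from a temporary copy of the project without .git
import shutil
import subprocess
import tempfile

src = "/path/to/scrapy_project"  # adjust to your project
with tempfile.TemporaryDirectory() as tmp:
    # copy the project tree, skipping the .git directory entirely
    dst = shutil.copytree(src, f"{tmp}/project",
                          ignore=shutil.ignore_patterns(".git"))
    # run the upload from the clean copy
    subprocess.run(["crawlab", "upload"], cwd=dst, check=True)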

About crawlab.json

While working with the SDK, I found it inconvenient for scheduling and deploying multiple spiders, so I wonder whether it could be designed to look like the following:

.
├── packages
│   ├── js_spiders
│   │   ├── js_spider_1
│   │   │   └── index.js
│   │   ├── js_spider_2
│   │   │   └── index.js
│   │   ├── package.json
│   │   └── ...
│   └── py_spiders
│       ├── py_spider_1
│       │   └── main.py
│       ├── py_spider_2
│       │   └── main.py
│       ├── setup.py
│       └── ...
├── crawlab.json
└── makefile

crawlab.json

{
  "spiders": [
    {
      "path": "packages/js_spider",
      "exclude_path": "node_modules",
      "name": "js spiders",
      "description": "js spiders",
      "cmd": "node",
      "schedules": [
        {
          "name": "js spider 1 cron",
          "cron": "* 1 * * *",
          "command": "node js_spider_1/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 1 cron",
          "enabled": true
        },
        {
          "name": "js spider 2 cron",
          "cron": "* 2 * * *",
          "command": "node js_spider_2/index.js",
          "param": "",
          "mode": "random",
          "description": "js spider 2 cron",
          "enabled": true
        }
      ]
    },
    {
      "path": "packages/py_spider",
      "exclude_path": ".venv",
      "name": "py spiders",
      "description": "py spiders",
      "cmd": "python",
      "schedules": [
        {
          "name": "py spider 1 cron",
          "cron": "* 1 * * *",
          "command": "python py_spider_1/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 1 cron",
          "enabled": true
        },
        {
          "name": "py spider 2 cron",
          "cron": "* 2 * * *",
          "command": "python py_spider_2/main.py",
          "param": "",
          "mode": "random",
          "description": "py spider 2 cron",
          "enabled": true
        }
      ]
    }
  ]
}

I can help implement this if you think it is feasible.
@tikazyq
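
For concreteness, a minimal sketch of how a deploy step might consume this proposed schema (the field names come from the example above; the code is illustrative, not an existing SDK feature):

# sketch: iterate the proposed crawlab.json and describe the deploy plan
import json

with open("crawlab.json") as f:
    config = json.load(f)

for spider in config["spiders"]:
    # one spider package per entry, deployed from its path
    print(f"deploy {spider['name']} from {spider['path']} "
          f"(excluding {spider['exclude_path']})")
    # each spider can carry any number of cron schedules
    for schedule in spider.get("schedules", []):
        state = "enabled" if schedule["enabled"] else "disabled"
        print(f"  {schedule['name']}: {schedule['cron']} -> "
              f"{schedule['command']} [{state}]")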

save_item: database connections are not released after saving data

The storage backend is MySQL.

When saving data with save_item, testing shows that connections stay occupied even after a crawler finishes. After starting several crawlers at once, the database becomes unreachable; on inspection, the crawlers' connections are never released and sit in the Sleep state. A single crawler can hold over a thousand connections, and the number of held connections seems to correlate with the volume of data collected. This is a serious bug 😢😢😢. I will switch to batch writes and test again; please reply as soon as you can.
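
For reference, a hedged sketch of a write pattern that does not leak connections when talking to MySQL directly (pymysql is already a dependency of the SDK; whether save_item's internals follow this pattern is an assumption, and the table and credentials below are placeholders):

# one shared connection, batched insert, closed explicitly when done
import pymysql

conn = pymysql.connect(host="localhost", user="crawlab",
                       password="***", database="results")  # placeholders
try:
    with conn.cursor() as cur:
        # batch the rows instead of opening a connection per item
        cur.executemany(
            "INSERT INTO items (title, url) VALUES (%s, %s)",  # placeholder table
            [("t1", "http://example.com/1"),
             ("t2", "http://example.com/2")])
    conn.commit()
finally:
    conn.close()  # release the connection instead of leaving it in Sleep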

Download results with CLI

Hello Crawlab team,
I'm using Crawlab to deploy my Scrapy spiders, and when I try to download the results as CSV it takes a long time, sometimes more than 15 minutes. Is there a command in the CLI SDK to download the data directly?
Thank you.
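
One workaround, sketched below, is to read the results straight from the result store and write the CSV yourself. This assumes the default MongoDB result store and guesses at the database and collection names (adjust all of them to your deployment):

# hypothetical workaround: export results from MongoDB to CSV directly
import csv
import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")  # adjust URI
docs = client["crawlab_test"]["results_my_spider"].find()  # assumed names

with open("results.csv", "w", newline="") as f:
    writer = None
    for doc in docs:
        doc.pop("_id", None)  # drop Mongo's ObjectId field
        if writer is None:
            # take the header from the first document's keys
            writer = csv.DictWriter(f, fieldnames=list(doc),
                                    extrasaction="ignore")
            writer.writeheader()
        writer.writerow(doc)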

Upload bugs

Version: 0.6.0-3

1. On Windows, uploading with the upload command produces paths of the form /xxx/xxx\xxx\xxx. The upload succeeds, but everything ends up flattened into the same directory level (see the sketch after this list).
2. On Mac, uploading the current directory with `upload .` succeeds, but every "." is stripped from the file names (for example, scrapy.cfg becomes scrapycfg after upload).
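
Both symptoms point at path-string handling in the CLI. A hedged sketch of the normalization that would address the Windows case (illustrative only, not the SDK's actual code):

# normalize Windows path separators to forward slashes before upload,
# so the server rebuilds the directory tree instead of flattening it
def to_upload_path(local_path: str) -> str:
    return local_path.replace("\\", "/")

print(to_upload_path(r"spiders\quotes\spider.py"))  # -> spiders/quotes/spider.py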
