
scrapy-jsonrpc's Issues

Python 3 compatibility

scrapy-jsonrpc is not compatible with Python 3.

Apart from the example client code, which uses urllib.urlopen():

  • the crawler resource is not found: the child resource name "crawler" needs to be passed to Twisted as bytes
  • the responses are not bytes, which Twisted also complains about
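Both fixes come down to encoding str to bytes at the Twisted boundary. A minimal sketch of the idea, using hypothetical stand-in helpers rather than the real Twisted resource classes:

```python
import json

def register_child(children, name, resource):
    """On Python 3, Twisted's Resource.putChild() expects a bytes path
    segment, so encode the child name before registering it."""
    if isinstance(name, str):
        name = name.encode("utf-8")  # b"crawler", not "crawler"
    children[name] = resource
    return name

def render_json(obj):
    """Twisted writes bytes to the transport, so the rendered JSON body
    must be encoded before it is returned from render()."""
    body = json.dumps(obj) + "\n"
    return body.encode("utf-8")

children = {}
key = register_child(children, "crawler", object())
assert key == b"crawler"
assert isinstance(render_json({"status": "ok"}), bytes)
```

In the real extension the equivalent changes would go where the root resource registers its children and where `render_object()` builds the response body.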

Multiple Crawls (no scrapyd) signal handler Error (WebService, Address)

Note: Originally reported by @ThiagoF at scrapy/scrapy#1122

I'm running long concurrent crawls from a shell script, with many Scrapy processes running in parallel.

From time to time one of them throws these errors:

2015-03-31 01:11:12-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.stop_listening of <scrapy.webservice.WebService instance at 0x7f48362a4710>>
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
        result = g.send(result)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 300, in _finish_stopping_engine
        yield self.signals.send_catch_log_deferred(signal=signals.engine_stopped)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
        return signal.send_catch_log_deferred(*a, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
        *arguments, **named)
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
        result = f(*args, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
        return receiver(*arguments, **named)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 96, in stop_listening
        self.port.stopListening()
    exceptions.AttributeError: WebService instance has no attribute 'port'

2015-03-31 01:12:16-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.start_listening of <scrapy.webservice.WebService instance at 0x7fa8a733e710>>
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
        result = g.send(result)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 77, in start
        yield self.signals.send_catch_log_deferred(signal=signals.engine_started)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
        return signal.send_catch_log_deferred(*a, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
        *arguments, **named)
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
        result = f(*args, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
        return receiver(*arguments, **named)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 90, in start_listening
        self.port = listen_tcp(self.portrange, self.host, self)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/reactor.py", line 14, in listen_tcp
        return reactor.listenTCP(x, factory, interface=host)
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 495, in listenTCP
        p.startListening()
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 991, in startListening
        skt.listen(self.backlog)
      File "/usr/lib/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    socket.error: [Errno 98] Address already in use

We had a similar problem with the telnet console, but we disabled it.
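The second traceback (`Errno 98: Address already in use`) suggests all the parallel processes are competing for the same port; the first (`no attribute 'port'`) is the follow-on failure at shutdown, since `start_listening` never got to set `self.port`. Scrapy's `listen_tcp` helper avoids the collision by accepting a port *range* and binding the first free port in it. A stdlib-only sketch of that strategy (the range values are illustrative):

```python
import socket

def bind_first_free(host, portrange):
    """Try each port in the inclusive range and keep the first one that
    binds, so concurrent processes don't collide on a single address."""
    for port in range(portrange[0], portrange[1] + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return s, port
        except OSError:  # "Address already in use" -> try the next port
            s.close()
    raise OSError("no free port in %r" % (portrange,))

# Two "processes" binding the same range get two different ports:
s1, p1 = bind_first_free("127.0.0.1", (56080, 56099))
s2, p2 = bind_first_free("127.0.0.1", (56080, 56099))
assert p1 != p2
s1.close()
s2.close()
```

So the practical workaround is to configure the web service with a wide enough port range for the number of parallel crawls, rather than a single fixed port.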

Error when the scrapy spider starts crawling and I access the path /crawler

Once I have started a spider and I try to access the URL http://localhost:6080/crawler, the following error is thrown.

web.Server Traceback (most recent call last):
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:189 in process
188                    self._encoder = encoder
189            self.render(resrc)
190        except:
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:238 in render
237        try:
238            body = resrc.render(self)
239        except UnsupportedMethod as e:
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:11 in render
10        r = resource.Resource.render(self, txrequest)
11        return self.render_object(r, txrequest)
12
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:14 in render_object
13    def render_object(self, obj, txrequest):
14        r = self.json_encoder.encode(obj) + "\n"
15        txrequest.setHeader('Content-Type', 'application/json')
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:89 in encode
88            o = self.spref.encode_references(o)
89        return super(ScrapyJSONEncoder, self).encode(o)
90
/usr/lib/python2.7/json/encoder.py:207 in encode
206        # equivalent to the PySequence_Fast that ''.join() would do.
207        chunks = self.iterencode(o, _one_shot=True)
208        if not isinstance(chunks, (list, tuple)):
/usr/lib/python2.7/json/encoder.py:270 in iterencode
269                self.skipkeys, _one_shot)
270        return _iterencode(o, 0)
271
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:109 in default
108        else:
109            return super(ScrapyJSONEncoder, self).default(o)
110
/usr/lib/python2.7/json/encoder.py:184 in default
183        """
184        raise TypeError(repr(o) + " is not JSON serializable")
185
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable

Do you think extending the encoder to serialize the Crawler object would be the right thing to do here? I can create a pull request with the fix if that's the case.
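One way to extend the encoder, sketched with a plain `json.JSONEncoder` and a stand-in `Crawler` class (this is a hypothetical fallback, not the fix the maintainers chose): instead of letting `default()` raise `TypeError` for objects it doesn't know, fall back to their `repr()`.

```python
import json

class SafeEncoder(json.JSONEncoder):
    """Fallback encoder: represent objects that are not JSON
    serializable by their repr() instead of raising TypeError."""
    def default(self, o):
        try:
            return super(SafeEncoder, self).default(o)
        except TypeError:
            return repr(o)

class Crawler(object):
    """Stand-in for scrapy.crawler.Crawler."""

out = json.dumps({"crawler": Crawler()}, cls=SafeEncoder)
assert out.startswith('{"crawler": "<')
```

A richer alternative would be to serialize a dict of the crawler's interesting attributes (spider name, stats, engine state) rather than an opaque repr string.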

I can't access http://localhost:6080/crawler

Can someone tell me how to use it? I modified the configuration according to the documentation, but I could not access http://localhost:6080/crawler.

A part of settings.py:
JSONRPC_ENABLED = True
EXTENSIONS = {
'scrapy_jsonrpc.webservice.WebService': 500,
}

I use Python 3.5 and Scrapy 1.3.2.
If you know what the problem is, could you please answer me? Thank you very much.
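For reference, a fuller settings.py fragment that also pins the host and port range (`JSONRPC_HOST` and `JSONRPC_PORT` values shown here are assumptions; check the extension's documented defaults for your version):

```python
# settings.py -- enable the JSON-RPC web service extension
JSONRPC_ENABLED = True
JSONRPC_HOST = '127.0.0.1'     # interface the service binds to
JSONRPC_PORT = [6080, 7030]    # first free port in this range is used
EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}
```

Note that with a port range, the service may not be on 6080 if that port was taken; the port actually bound is logged at startup. Also note the extension was not Python 3 compatible at the time (see the first issue above), which would explain the failure on Python 3.5.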

Module import error in both Python 2 and Python 3

    __import__(name)
      File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/webservice.py", line 7, in <module>
        from scrapy_jsonrpc.jsonrpc import jsonrpc_server_call
      File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/jsonrpc.py", line 11, in <module>
        from scrapy_jsonrpc.serialize import ScrapyJSONDecoder
      File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/serialize.py", line 8, in <module>
        from scrapy.spider import Spider
    ImportError: No module named spider

import error

Since I installed this in my project I've been getting the following error:
[screenshot attached, 2021-05-12 23:08; not reproduced here]
