
scrapy-jsonrpc's Issues

Python 3 compatibility

scrapy-jsonrpc is not compatible with Python 3.

Apart from the example client code, which uses urllib.urlopen():

  • the crawler resource is not found: the child resource name "crawler" needs to be passed to Twisted as bytes
  • the responses are not bytes, which Twisted also complains about
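Both fixes come down to encoding str to bytes at the Twisted boundary. A minimal sketch of the idea, using hypothetical stand-in helpers rather than the real Twisted resource classes:

```python
import json

def register_child(children, name, resource):
    """On Python 3, Twisted's Resource.putChild() expects a bytes path
    segment, so encode the child name before registering it."""
    if isinstance(name, str):
        name = name.encode("utf-8")  # b"crawler", not "crawler"
    children[name] = resource
    return name

def render_json(obj):
    """Twisted writes bytes to the transport, so the rendered JSON body
    must be encoded before it is returned from render()."""
    body = json.dumps(obj) + "\n"
    return body.encode("utf-8")

children = {}
key = register_child(children, "crawler", object())
assert key == b"crawler"
assert isinstance(render_json({"status": "ok"}), bytes)
```

In the real extension the equivalent changes would go where the root resource registers its children and where `render_object()` builds the response body.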

Multiple Crawls (no scrapyd) signal handler Error (WebService, Address)

Note: Originally reported by @ThiagoF at scrapy/scrapy#1122

I'm running long concurrent crawls from a shell script, with many Scrapy processes running in parallel.

From time to time one of them throws these errors:

2015-03-31 01:11:12-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.stop_listening of <scrapy.webservice.WebService instance at 0x7f48362a4710>>
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
        result = g.send(result)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 300, in _finish_stopping_engine
        yield self.signals.send_catch_log_deferred(signal=signals.engine_stopped)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
        return signal.send_catch_log_deferred(*a, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
        *arguments, **named)
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
        result = f(*args, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
        return receiver(*arguments, **named)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 96, in stop_listening
        self.port.stopListening()
    exceptions.AttributeError: WebService instance has no attribute 'port'

2015-03-31 01:12:16-0300 [scrapy] ERROR: Error caught on signal handler: <bound method ?.start_listening of <scrapy.webservice.WebService instance at 0x7fa8a733e710>>
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1107, in _inlineCallbacks
        result = g.send(result)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 77, in start
        yield self.signals.send_catch_log_deferred(signal=signals.engine_started)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
        return signal.send_catch_log_deferred(*a, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
        *arguments, **named)
    --- <exception caught here> ---
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 140, in maybeDeferred
        result = f(*args, **kw)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
        return receiver(*arguments, **named)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/webservice.py", line 90, in start_listening
        self.port = listen_tcp(self.portrange, self.host, self)
      File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/reactor.py", line 14, in listen_tcp
        return reactor.listenTCP(x, factory, interface=host)
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 495, in listenTCP
        p.startListening()
      File "/usr/local/lib/python2.7/dist-packages/twisted/internet/tcp.py", line 991, in startListening
        skt.listen(self.backlog)
      File "/usr/lib/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    socket.error: [Errno 98] Address already in use

We had a similar problem with the telnet console, but we disabled it.
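The second traceback (`Errno 98: Address already in use`) suggests all the parallel processes are competing for the same port; the first (`no attribute 'port'`) is the follow-on failure at shutdown, since `start_listening` never got to set `self.port`. Scrapy's `listen_tcp` helper avoids the collision by accepting a port *range* and binding the first free port in it. A stdlib-only sketch of that strategy (the range values are illustrative):

```python
import socket

def bind_first_free(host, portrange):
    """Try each port in the inclusive range and keep the first one that
    binds, so concurrent processes don't collide on a single address."""
    for port in range(portrange[0], portrange[1] + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return s, port
        except OSError:  # "Address already in use" -> try the next port
            s.close()
    raise OSError("no free port in %r" % (portrange,))

# Two "processes" binding the same range get two different ports:
s1, p1 = bind_first_free("127.0.0.1", (56080, 56099))
s2, p2 = bind_first_free("127.0.0.1", (56080, 56099))
assert p1 != p2
s1.close()
s2.close()
```

So the practical workaround is to configure the web service with a wide enough port range for the number of parallel crawls, rather than a single fixed port.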

Error when the scrapy spider starts crawling and I access the path /crawler

Once I have started a spider and I try to access the URL http://localhost:6080/crawler, the following error is thrown.

web.Server Traceback (most recent call last):
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:189 in process
188                    self._encoder = encoder
189            self.render(resrc)
190        except:
/usr/local/lib/python2.7/dist-packages/twisted/web/server.py:238 in render
237        try:
238            body = resrc.render(self)
239        except UnsupportedMethod as e:
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:11 in render
10        r = resource.Resource.render(self, txrequest)
11        return self.render_object(r, txrequest)
12
/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py:14 in render_object
13    def render_object(self, obj, txrequest):
14        r = self.json_encoder.encode(obj) + "\n"
15        txrequest.setHeader('Content-Type', 'application/json')
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:89 in encode
88            o = self.spref.encode_references(o)
89        return super(ScrapyJSONEncoder, self).encode(o)
90
/usr/lib/python2.7/json/encoder.py:207 in encode
206        # equivalent to the PySequence_Fast that ''.join() would do.
207        chunks = self.iterencode(o, _one_shot=True)
208        if not isinstance(chunks, (list, tuple)):
/usr/lib/python2.7/json/encoder.py:270 in iterencode
269                self.skipkeys, _one_shot)
270        return _iterencode(o, 0)
271
/usr/local/lib/python2.7/dist-packages/scrapy/utils/serialize.py:109 in default
108        else:
109            return super(ScrapyJSONEncoder, self).default(o)
110
/usr/lib/python2.7/json/encoder.py:184 in default
183        """
184        raise TypeError(repr(o) + " is not JSON serializable")
185
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7fc808b829d0> is not JSON serializable

Do you think extending the encoder to serialize the Crawler object would be the right thing to do here? I can create a pull request with the fix if that's the case.
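One way to extend the encoder, sketched with a plain `json.JSONEncoder` and a stand-in `Crawler` class (this is a hypothetical fallback, not the fix the maintainers chose): instead of letting `default()` raise `TypeError` for objects it doesn't know, fall back to their `repr()`.

```python
import json

class SafeEncoder(json.JSONEncoder):
    """Fallback encoder: represent objects that are not JSON
    serializable by their repr() instead of raising TypeError."""
    def default(self, o):
        try:
            return super(SafeEncoder, self).default(o)
        except TypeError:
            return repr(o)

class Crawler(object):
    """Stand-in for scrapy.crawler.Crawler."""

out = json.dumps({"crawler": Crawler()}, cls=SafeEncoder)
assert out.startswith('{"crawler": "<')
```

A richer alternative would be to serialize a dict of the crawler's interesting attributes (spider name, stats, engine state) rather than an opaque repr string.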

I can't access http://localhost:6080/crawler

Can someone tell me how to use it? I modified the configuration according to the documentation, but I could not access http://localhost:6080/crawler.

A part of settings.py:
JSONRPC_ENABLED = True
EXTENSIONS = {
'scrapy_jsonrpc.webservice.WebService': 500,
}

I use Python 3.5 and Scrapy 1.3.2.
If you know what the problem is, could you please answer me? Thank you very much.
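For reference, a fuller settings.py fragment that also pins the host and port range (`JSONRPC_HOST` and `JSONRPC_PORT` values shown here are assumptions; check the extension's documented defaults for your version):

```python
# settings.py -- enable the JSON-RPC web service extension
JSONRPC_ENABLED = True
JSONRPC_HOST = '127.0.0.1'     # interface the service binds to
JSONRPC_PORT = [6080, 7030]    # first free port in this range is used
EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}
```

Note that with a port range, the service may not be on 6080 if that port was taken; the port actually bound is logged at startup. Also note the extension was not Python 3 compatible at the time (see the first issue above), which would explain the failure on Python 3.5.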

Module import error in both Python 2 and Python 3

    __import__(name)
      File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/webservice.py", line 7, in <module>
        from scrapy_jsonrpc.jsonrpc import jsonrpc_server_call
      File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/jsonrpc.py", line 11, in <module>
        from scrapy_jsonrpc.serialize import ScrapyJSONDecoder
      File "/Users/jerry/venv2/lib/python2.7/site-packages/scrapy_jsonrpc/serialize.py", line 8, in <module>
        from scrapy.spider import Spider
    ImportError: No module named spider

import error

Since I installed this in my project I've been getting the following error:
[screenshot attached, 2021-05-12 23:08; not reproduced here]
