
scrapy-random-useragent's People

Contributors

cnu, djm, tianhuil

scrapy-random-useragent's Issues

Problem after upgrading to Scrapy 1.1.0

Hello, I have upgraded Scrapy to 1.1.0 and I get the following error:

#ERROR - Error downloading <GET http://www. some url > ------ [(scraper.py:_log_download_errors:208) - 2016-07-08 14:16:53 - PID:13503])
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1099, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
    spider=spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/downloadermiddlewares/redirect.py", line 96, in process_response
    interval, url = get_meta_refresh(response)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/response.py", line 39, in get_meta_refresh
    response.encoding, ignore_tags=('script', 'noscript'))
TypeError: get_meta_refresh() got an unexpected keyword argument 'ignore_tags'

I also upgraded scrapy-random-useragent to 0.2, but I got the same error.
If I remove the line 'random_useragent.RandomUserAgentMiddleware': 400,
the error goes away, so I think the problem is in scrapy-random-useragent (I may be wrong; I'm not an expert programmer).

Could you help me?

Thanks a lot for your time.

Regards.
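
The TypeError in the traceback above says that the `get_meta_refresh` being called does not accept an `ignore_tags` keyword. One possible reading, which is an assumption and not confirmed in this thread, is that the library providing that function is older than the installed Scrapy expects. A minimal Python 3 sketch for checking whether a function accepts a given keyword follows; `accepts_keyword` and the two stand-in functions are hypothetical names used only for illustration.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer version of the helper.
def old_get_meta_refresh(text, baseurl="", encoding="utf-8"):
    return None, None


def new_get_meta_refresh(text, baseurl="", encoding="utf-8",
                         ignore_tags=("script", "noscript")):
    return None, None


print(accepts_keyword(old_get_meta_refresh, "ignore_tags"))
print(accepts_keyword(new_get_meta_refresh, "ignore_tags"))
```

In a real environment you would pass the function that Scrapy actually imports (likely `w3lib.html.get_meta_refresh`, which is an assumption here) instead of the stand-ins.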

setup.py should not import the project, as this breaks installs via pip.

Given a requirements file, requirements.txt:

scrapy
scrapy-random-useragent

running the command:

pip install -r requirements.txt

results in the error:

ImportError: No module named scrapy

The installation fails because scrapy-random-useragent relies on scrapy at a point where scrapy has not been installed yet. pip reads setup.py, which imports the project (import random_useragent), which in turn tries to import modules from scrapy. Since scrapy is not yet installed, the result is the ImportError.

This needs to be solved by removing the import in setup.py and any reliance on the data it imports.

A lot of projects just copy and paste the data and bump both version numbers when doing a new release. However, this thread describes what seems like another way to handle the situation and keep things DRY.
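
One common way to keep a single version string without importing the package is to read it out of the source file as text. The sketch below is only an illustration under assumptions: it assumes the module defines a `__version__` string, and `read_version` is a hypothetical helper, not something taken from this project.

```python
import os
import re
import tempfile

# Matches a line such as: __version__ = '0.2'
VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version(path):
    """Extract __version__ from a Python source file without importing it."""
    with open(path) as f:
        match = VERSION_RE.search(f.read())
    if match is None:
        raise RuntimeError("No __version__ string found in %s" % path)
    return match.group(1)


# Demo with a throwaway file that imitates a module importing scrapy.
with tempfile.TemporaryDirectory() as tmp:
    module_path = os.path.join(tmp, "random_useragent.py")
    with open(module_path, "w") as f:
        f.write("from scrapy import signals  # never executed by read_version\n")
        f.write("__version__ = '0.2'\n")
    demo_version = read_version(module_path)

print(demo_version)
```

In setup.py the result would be passed as `version=read_version(...)`, so the file is parsed as text and scrapy never needs to be importable at install time.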

If you wish, I can submit a PR that follows whichever path you prefer to fix this. I'd just like your OK before I invest the time to do so.

Thanks,
Darian

Tips for README improvement

Hi there!

Your module is awesome! I've used it a couple of times already.

Still, I see a couple of possible improvements:

  1. The part of the description that says "you set in a text file" should be in bold. If you want to try the module quickly (just pip install and copy-paste the configuration), you still end up reading the whole README and wondering why it is not working, because you first need to create a file with a list of user agents.

  2. I was able to find only a small number of ready-made user agent lists, such as: https://github.com/cvandeplas/pystemon/blob/master/user-agents.txt
    So maybe this could be added to the README as a helpful tip? I understand the "single responsibility principle" (your app does one thing and does it right), but without the actual file nobody can use your application.
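
To make the missing step concrete, here is a rough sketch of creating the user agent file and pointing the settings at it. The middleware entry and its priority come from the first issue on this page; the `USER_AGENT_LIST` setting name, the one-agent-per-line file format, and the sample strings are assumptions to check against the README.

```python
import os
import random
import tempfile

# Illustrative user-agent strings, one per line in the file.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0",
]


def write_user_agent_file(path, user_agents):
    """Write one user agent per line."""
    with open(path, "w") as f:
        f.write("\n".join(user_agents) + "\n")


def load_user_agents(path):
    """Read the file back, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


ua_path = os.path.join(tempfile.mkdtemp(), "useragents.txt")
write_user_agent_file(ua_path, SAMPLE_USER_AGENTS)
loaded = load_user_agents(ua_path)

# Fragment that would live in a Scrapy settings.py.
DOWNLOADER_MIDDLEWARES = {
    "random_useragent.RandomUserAgentMiddleware": 400,
}
USER_AGENT_LIST = ua_path  # assumed setting name, see the lead-in

print(random.choice(loaded))
```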

No module named random_useragent : Error

I am using Scrapy 1.1.2.
The error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 163, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 167, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 68, in __init__
    self.downloader = downloader_cls(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
exceptions.ImportError: No module named random_useragent
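
The traceback ends in `import_module`, so the interpreter running Scrapy could not find a module named random_useragent. One possible cause, which is a guess and not confirmed here, is that the package was installed for a different Python interpreter than the one running the crawl. A small sketch for checking this from the same interpreter follows; `is_importable` is a hypothetical helper.

```python
import importlib
import sys


def is_importable(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Shows which interpreter is doing the check.
print(sys.executable)
print(is_importable("json"))
print(is_importable("random_useragent"))
```

Running this with the same Python that launches `scrapy crawl` shows whether the module is visible to that interpreter.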
