cnu / scrapy-random-useragent
Scrapy Middleware to set a random User-Agent for every Request.
License: MIT License
Hello, I have upgraded Scrapy to 1.1.10 and I get the following ...
#ERROR - Error downloading <GET http://www. some url > ------ [(scraper.py:_log_download_errors:208) - 2016-07-08 14:16:53 - PID:13503])
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1099, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
spider=spider)
File "/usr/local/lib/python2.7/dist-packages/scrapy/downloadermiddlewares/redirect.py", line 96, in process_response
interval, url = get_meta_refresh(response)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/response.py", line 39, in get_meta_refresh
response.encoding, ignore_tags=('script', 'noscript'))
TypeError: get_meta_refresh() got an unexpected keyword argument 'ignore_tags'
I also upgraded scrapy-random-useragent to 0.2, but I got the same error.
If I remove the line 'random_useragent.RandomUserAgentMiddleware': 400,
I don't get the error, so I think the problem is in scrapy-random-useragent (perhaps I'm wrong, I'm not an expert programmer).
Could you help me?
Thanks a lot for your time.
Regards.
If USER_AGENT_LIST isn't set, it throws an error. Use some default setting.
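One way the requested default could look is sketched below. This is not the project's actual code: the class name RandomUserAgentSketch and the FALLBACK_USER_AGENT value are made up for illustration, and the sketch assumes the file named by USER_AGENT_LIST holds one user agent per line. It only relies on crawler.settings.get() and request.headers.setdefault(), so it can be tried without Scrapy installed.

```python
import random

# Hypothetical fallback value, used only when nothing else is configured.
FALLBACK_USER_AGENT = "Mozilla/5.0 (compatible; example-bot)"


class RandomUserAgentSketch(object):
    """Illustrative middleware: random User-Agent with a safe default."""

    def __init__(self, user_agents, default=FALLBACK_USER_AGENT):
        self.user_agents = list(user_agents)
        self.default = default

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        path = settings.get("USER_AGENT_LIST")
        default = settings.get("USER_AGENT", FALLBACK_USER_AGENT)
        if not path:
            # Setting is missing: do not raise, fall back to one agent.
            return cls([], default)
        with open(path) as f:
            # Assumption: one user agent per line, blank lines ignored.
            agents = [line.strip() for line in f if line.strip()]
        return cls(agents, default)

    def process_request(self, request, spider):
        if self.user_agents:
            agent = random.choice(self.user_agents)
        else:
            agent = self.default
        # setdefault keeps a User-Agent that was set explicitly on the request.
        request.headers.setdefault("User-Agent", agent)
```

With this shape, a project that forgets USER_AGENT_LIST still crawls, just with a single fixed User-Agent.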
Given a requirements file, requirements.txt:
scrapy
scrapy-random-useragent
and the command:
pip install -r requirements.txt
results in the error:
ImportError: No module named scrapy
The installation of the requirements fails, as scrapy-random-useragent is relying on scrapy at a point where it has not been installed yet. This is because the setup.py gets read by pip, which in turn imports the project (import random_useragent), which in turn tries to import modules from scrapy, which is not yet installed, hence the ImportError.
This needs to be solved by removing the import in setup.py and any reliance on the data it imports.
A lot of projects just copy and paste the data, and bump both version numbers on doing a new release. However, [reading this thread](ImportError: No module named scrapy) seems like another way to handle the situation and keep things DRY.
If you wish, I can submit a PR which follows the path you wish to take to fix this - I'd just like your OK before I invest the time to do so.
Thanks,
Darian
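For reference, a common way to keep the version in one place without importing the package from setup.py is to read the module as text and pull the version out with a regular expression. The sketch below assumes the module defines a __version__ string; the helper name find_version and the file name random_useragent.py in the comment are illustrative, not taken from the project.

```python
import re


def find_version(source_text):
    """Return the value of __version__ found in a module's source text."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
        source_text,
        re.MULTILINE,
    )
    if match is None:
        raise RuntimeError("Unable to find a __version__ string")
    return match.group(1)


# In setup.py this could be used roughly as follows (file name assumed):
#     with open("random_useragent.py") as f:
#         version = find_version(f.read())
# The module is only read as text, so scrapy does not need to be installed.
```

Because nothing is imported, pip can process setup.py before scrapy is present, and the version number still lives in only one file.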
Hi there!
Your module is awesome! I have used it a couple of times already.
Still, I see a couple of possible improvements:
The part of the description that says "you set in a text file" should be in bold. When you want to try the module quickly (just pip install
and copy-paste the configuration), you still end up reading the whole README and wondering why it is still not working, because you need to create a file with a list of user agents.
I was able to find only a small number of ready-made user agent lists, such as: https://github.com/cvandeplas/pystemon/blob/master/user-agents.txt
So maybe this could be mentioned as a helpful tip in the README? I understand the "single responsibility principle" (your app does one thing and does it right), but without an actual file nobody can use your application.
Hi,
Can you show us how to add the user agents to the file? Should they be comma-delimited or separated by '\n'?
Thanks
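A newline-separated file is the likely answer, although that is an assumption here and worth checking against the middleware source. One practical argument for it: user agent strings often contain commas themselves (for example "KHTML, like Gecko"), so a comma-delimited file would split them in the wrong places. The sketch below writes a small sample file and reads it back one agent per line; the helper name read_user_agents and the sample strings are illustrative only.

```python
import os
import tempfile

# Two sample user agents, one per line. Note the comma inside the first one.
SAMPLE = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36\n"
    "Mozilla/5.0 (X11; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0\n"
)


def read_user_agents(path):
    """Read one user agent per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Write the sample to a temporary file and parse it back.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write(SAMPLE)
agents = read_user_agents(path)
os.remove(path)

print(len(agents))  # number of user agents parsed from the file
```

The path of such a file is what USER_AGENT_LIST would then point to in the project settings.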
I am using scrapy 1.1.2
The error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 163, in crawl
return self._crawl(crawler, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 167, in _crawl
d = crawler.crawl(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 90, in crawl
six.reraise(*exc_info)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 72, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 97, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 68, in __init__
self.downloader = downloader_cls(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/__init__.py", line 88, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 34, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 44, in load_object
mod = import_module(module)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
exceptions.ImportError: No module named random_useragent
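An ImportError like this usually means the package is not installed for the interpreter that runs Scrapy (for instance, it was installed into a different Python or virtualenv). A quick check, sketched below with a hypothetical helper named can_import, is to run the following with the same interpreter that runs the crawl:

```python
import importlib
import sys


def can_import(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Show which interpreter is in use and whether it can see the middleware.
print(sys.executable)
print(can_import("random_useragent"))
```

If the second line prints False, installing scrapy-random-useragent with the pip that belongs to the printed interpreter is the first thing to try.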