Comments (4)
Removing the to_unicode() call is obviously incorrect.
from scrapy.
Then how about exception-handling code like this?
def get_header(self, name, default=None):
    try:
        return to_unicode(self.request.headers.get(name, default), errors="replace")
    except TypeError:
        return default
Hi @marinelay and @wRAR, while looking through this good first issue, I found the line in scrapy that seems to be the root cause of this error when a Request is wrapped in WrappedRequest, as the marked line below shows:
=====================================================================
from urllib.request import Request as _Request
from scrapy.http.request import Request
from scrapy.http.cookies import WrappedRequest
a = _Request(url="https://a.example")
print(a.get_header('xxxx'))
b = WrappedRequest(Request(url="https://a.example"))
print("WrappedRequest get-header result:", b.get_header('xxxx'))  # this call raises the TypeError
=====================================================================
None
TypeError: to_unicode must receive a bytes or str object, got NoneType -- Result
=====================================================================
Whereas, per the expected behavior, it should return None for both requests.
This is line 173 in "http/cookies.py" (the scrapy.http.cookies module imported above):
return to_unicode(self.request.headers.get(name, default), errors="replace")
The cause is the use of "to_unicode", whose docstring says:
"""Return the unicode representation of a bytes object text. If text is already an unicode object, return it as-is."""
It checks the value of 'self.request.headers.get(name, default)', which must be bytes or str to be valid. Here the header is missing and the default is None, hence the error "TypeError: to_unicode must receive a bytes or str object, got NoneType".
So one possible solution is "return self.request.headers.get(name, default)", which returns None in this case.
Do let me know whether this is the correct solution, or whether it might interfere with some other functionality.
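To make the trade-off between the options in this thread concrete, here is a minimal standalone sketch. It does not import scrapy: `to_unicode` below is a simplified stand-in written to match the docstring and error message quoted above, and `HEADERS` is a hypothetical plain dict standing in for `self.request.headers`, assuming header values are stored as bytes. The function names are made up for illustration.

```python
def to_unicode(text, encoding="utf-8", errors="strict"):
    """Simplified stand-in for the helper quoted in this thread (not scrapy's code)."""
    if isinstance(text, str):
        return text
    if not isinstance(text, bytes):
        raise TypeError(
            "to_unicode must receive a bytes or str object, "
            f"got {type(text).__name__}"
        )
    return text.decode(encoding, errors)


# Hypothetical stand-in for self.request.headers; values assumed to be bytes.
HEADERS = {"User-Agent": b"demo-agent"}


def get_header_original(name, default=None):
    # Current behavior: fails when the header is missing and default is None.
    return to_unicode(HEADERS.get(name, default), errors="replace")


def get_header_no_conversion(name, default=None):
    # Dropping to_unicode(): no TypeError, but present headers come back as bytes.
    return HEADERS.get(name, default)


def get_header_guarded(name, default=None):
    # Only convert when there is a value to convert.
    value = HEADERS.get(name, default)
    return None if value is None else to_unicode(value, errors="replace")


if __name__ == "__main__":
    print(repr(get_header_no_conversion("User-Agent")))
    print(repr(get_header_guarded("User-Agent")))
    print(repr(get_header_guarded("xxxx")))
```

Under these assumptions, dropping the conversion changes the return type for headers that do exist, which may be why it was called incorrect above, while a None guard (or the try/except proposed earlier) keeps the str result and still returns None for a missing header.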
Hi @marinelay, I guess that gives the required output. Since it usually throws a 'TypeError', handling this part using an exception block is neat 👍.