jordanmilne / advocate Goto Github PK
View Code? Open in Web Editor NEWAn SSRF-preventing wrapper around Python's requests library. Advocate is no longer maintained, please fork and rename if you would like to continue work on it.
License: Other
An SSRF-preventing wrapper around Python's requests library. Advocate is no longer maintained, please fork and rename if you would like to continue work on it.
License: Other
Seems to have something to do with the new usage of @pytest.fixture
and our wrapper for requests' tests that auto-passes tests that call Session.mount()
. I'm guessing fixture uses introspection to figure out what fixtures to add in based on arg names, but our wrapper uses *args, **kwargs
.
A network admin may configure their DNS64 instance to use a prefix other than the well-known prefix, which is in the private IP space. We want to allow people to specify ranges that should be treated the same as the DNS64 WK prefix.
Currently we have a pretty heinous way of checking that our wrappers don't break existing functionality across multiple versions of requests. We pull the source for requests, read in its test_requests.py
file and make modifications so that any calls to requests.<foo>()
will instead use the advocate wrapper. We also use regexes to replace code in older requests versions' test code that is broken under Python 3.
We then take all that regex-mangled code, exec()
it, and shove the resulting test functions' globals into the test_advocate
module so they get run with the rest of our tests. Gross.
This sort of worked up until recently, when requests did the sensible thing and migrated from a single test script to a directory of test scripts. Rather than have separate code-paths for searching / replacing source in the different requests versions we should make these tests less brittle.
My current plan is:
2.4
which broke due to httpbin changes.)requests.<method>()
with advocate_wrapper.<method>()
in the source, figure out a better way of intercepting uses of the requests
module directly by requests' test suite (something like the _WrappedSocket
we already use.)It's been at least a year since anybody's asked me to care about Python 2.x, so it seems reasonable to drop 2.x support for new versions of Advocate. Requests no longer supports Python 2.x on new versions either!
This'd let us drop all of the six
stuff along with the otherwise-useless deps we're pulling in to fix HTTPS cert validation under Python 2.7.
i got exception
TypeError: new() got an unexpected keyword argument 'key_validator'
Hey Jordan,
I ran into some weird behavior in advocate today. I haven't tried to dig into the cause, but it seems to be mostly just related to the head()
function for some reason. We have a RequestsAPIWrapper
instance that's defined like this:
advocate = RequestsAPIWrapper(AddrValidator(
ip_whitelist=g.scraper_ip_whitelist,
))
Using this for advocate.head(url)
behaves strangely when used on our S3 thumbnail buckets, even though it seems to work if I just use advocate.request('HEAD', url)
instead. Here's some testing that should show what I mean:
>>> from r2.lib.media import advocate
>>> advocate.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [400]>
>>> advocate.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg').content
'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>BadRequest</Code><Message>Insufficient information. Origin request header needed.</Message><RequestId>1CB3BBB8D0A65360</RequestId><HostId>eno2W4lV4rPy1ZpQbpybvxtXSFrcTmxcjyrdVSf8/ongUZBGJKnZMRtv5eQhVJDto6LxxOHCSD8=</HostId></Error>'
>>> advocate.get('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>
>>> advocate.request('HEAD', 'https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>
>>> import advocate
>>> advocate.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>
>>> import requests
>>> requests.head('https://b.thumbs.redditmedia.com/6hn7cHXUEXSYh8K5ga-LCeHWBpPADv6cS6-34ty2RuE.jpg')
<Response [200]>
Let me know if you need any more information or anything.
Requests itself has types in the typeshed: https://github.com/python/typeshed/tree/6e985ef3de3678676d753a1efbf45fd67095a3b2/stubs/requests/requests
so maybe advocate can piggyback off those somehow, but type hints would be nice for static typing
The last release was 3 years ago: https://pypi.org/project/advocate/#history
I think I just stumbled over https://stackoverflow.com/q/76002461/562769
Is there a chance of getting an update?
For platforms where using Python modules implemented in C is particularly difficult, it would be useful to be able to use Advocate without those features included.
In particular, it looks like netifaces
exists solely to support detection of local addresses; in the vast majority of cases, I would expect all addresses reported by netifaces
to be private addresses, and thus not pass the filter anyway. I'm not sure that it's worth a breaking change, but I'm curious why this functionality is enabled by default, given that (I assume) most hosts don't have direct public IP addresses.
Would you be open to a PR that disables this functionality if netifaces
is not available?
Hi, when I try to run the pytest
, I noticed that hostname-related tests are skipped even when the nameserver supports the AI_CANONNAME
flag. This was possibly due to the incorrect type of the expected result when testing for the AI_CANONNAME
flag: https://github.com/JordanMilne/Advocate/blob/master/test/test_advocate.py#L55
The cannonname returned by socket.getaddrinfo
is a string now. ref: https://docs.python.org/3/library/socket.html#socket.getaddrinfo
When running hostname-related tests, HostnameTests.test_idn
failed:
======================================= FAILURES =======================================
________________________________ HostnameTests.test_idn ________________________________
self = <test.test_advocate.HostnameTests testMethod=test_idn>
def test_idn(self):
# test some basic globs
> self.assertFalse(self._is_hostname_allowed(
u"**.example.org",
fake_lookup=True,
hostname_blacklist={"*.org"}
))
test/test_advocate.py:359:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/test_advocate.py:350: in _is_hostname_allowed
if validator.is_addrinfo_allowed(res):
advocate/addrvalidator.py:58: in wrapper
return func(self, *args, **kwargs)
advocate/addrvalidator.py:319: in is_addrinfo_allowed
if not self.is_hostname_allowed(canonname):
advocate/addrvalidator.py:259: in is_hostname_allowed
if self._hostname_matches_pattern(hostname, pattern):
advocate/addrvalidator.py:221: in _hostname_matches_pattern
hostname = canonicalize_hostname(hostname)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
hostname = b'xn--fiqs8s.example.org'
def canonicalize_hostname(hostname):
"""Lowercase and punycodify a hostname"""
# We do the lowercasing after IDNA encoding because we only want to
# lowercase the *ASCII* chars.
# TODO: The differences between IDNA2003 and IDNA2008 might be relevant
# to us, but both specs are damn confusing.
> return str(hostname.encode("idna").lower(), 'utf-8')
E AttributeError: 'bytes' object has no attribute 'encode'
advocate/addrvalidator.py:23: AttributeError
=============================== short test summary info ================================
FAILED test/test_advocate.py::HostnameTests::test_idn - AttributeError: 'bytes' object has no attribute 'encode'
Similar to the cannoname
bug, this is also introduced by feeding bytes as the cannoname
in the _is_hostname_allowed
function: https://github.com/JordanMilne/Advocate/blob/master/test/test_advocate.py#L346
I can make a PR later for a quick fix of the issues mentioned above
I don't know what I was thinking. There's no reason for this to be vendored, and the code to vendor it is gnarly.
http://0177.0.0.1/ resolves to http://127.0.0.1, which is localhost. This module does not block this URL when it should.
Will there be a version for python 3.11?
Should it be on by default? I'm kind of iffy on this, but I don't know enough about IPv6 as-deployed-by most people to say if it's dangerous one way or the other.
I get the feeling people may currently be relying on NAT / firewall rules to prevent external requests from getting in. With IPv4, if you use private ranges like you should, it's super easy to detect and block if an IP refers to something on the LAN (192.x.x.x
, 10.x.x.x
., etc.) Even if there's a service that's bound to 0.0.0.0
.
But what about with IPv6? Say I've got an internal service that binds to ::
on server X, and server Y running a site that uses Advocate on the same network. If server Y makes a request to server X's globally-routable IPv6 IP, do we have any guarantees as to how the packet will be routed? Might it be routed directly to server server X without hitting the external firewall?
I'm concerned that if that is the case, users will be vulnerable to SSRF unless the servers are assigned ULAs, or if the organization is large enough to have a globally-routable block assigned to them for internal use, which be added to Advocate's blacklist.
TL;DR: We somewhat abuse the side-effects of IPv4 + NAT so we can tell by an address if the destination is likely on the LAN or not. Non-globally-routable (RFC1918 or otherwise) addresses are blocked, globally routable ones are not.
This doesn't appear to be possible with most IPv6 setups as LAN IPs usually == globally routable WAN IPs, frustrating the ability to prevent SSRF via checks in the client.
Is any of that actually a concern? Am I fundamentally misunderstanding IPv6? Any relevant IPv6 documentation is helpful!
Since requests 2.14.0 advocate is broken.
requests updated its internal urllib3 version and this can be the cause of the problem.
I get an error which relates to urllib3:
File "D:\******\******\env36\lib\site-packages\requests\sessions.py", line 531, in get
return self.request('GET', url, **kwargs)
File "D:\******\******\env36\lib\site-packages\requests\sessions.py", line 518, in request
resp = self.send(prep, **send_kwargs)
File "D:\******\******\env36\lib\site-packages\requests\sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "D:\******\******\env36\lib\site-packages\requests\adapters.py", line 403, in send
conn = self.get_connection(request.url, proxies)
File "D:\******\******\env36\lib\site-packages\requests\adapters.py", line 307, in get_connection
conn = self.poolmanager.connection_from_url(url)
File "D:\******\******\env36\lib\site-packages\requests\packages\urllib3\poolmanager.py", line 279, in connection_from_url
pool_kwargs=pool_kwargs)
File "D:\******\******\env36\lib\site-packages\requests\packages\urllib3\poolmanager.py", line 227, in connection_from_host
return self.connection_from_context(request_context)
File "D:\******\******\env36\lib\site-packages\requests\packages\urllib3\poolmanager.py", line 238, in connection_from_context
pool_key = pool_key_constructor(request_context)
File "D:\******\******\env36\lib\site-packages\requests\packages\urllib3\poolmanager.py", line 103, in _default_key_normalizer
return key_class(**context)
TypeError: __new__() got an unexpected keyword argument 'key_validator'
They work fine with nosetests
, need to look into this.
======================================================================
ERROR: test_response_iter_lines_reentrant (test_advocate.RequestsTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/functools.py", line 33, in update_wrapper
setattr(wrapper, attr, getattr(wrapped, attr))
AttributeError: 'RequestsTestCase' object has no attribute '__name__'
----------------------------------------------------------------------
I've avoided this because of security concerns, and was locally uploading packages with twine
, but it looks like twine
itself builds and publishes its package to PyPI in a github release hook. May as well publish in CI, then!
https://github.com/psf/black/blob/c0cc19b5b3371842d696875897bebefebd5e1596/.github/workflows/pypi_upload.yml#L11-L34 looks like it'd be a good fit.
Any chance that the latest commit can be released so that there's fewer ipaddress module copies floating around? Can totally pin to a git commit but would be nice to pull this from PyPI.
Looks like https://github.com/reddit/baseplate.py/blob/develop/requirements-transitive.txt#L1 pins Advocate to 1.0.0 but might install anything less than 2.x so I imagine this would be a fairly safe release.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.