gruns / furl
🌐 URL parsing and manipulation made easy.
License: Other
It'd be great if you could tag release commits and push the tags to GitHub; it would make looking at change deltas substantially easier!
furl('http://domain.com:9999').join('http://:9021/').url
'http://:9021/'
expected 'http://domain.com:9021/'
In [17]: furl('http://domain.com:9999').join('https//:9021/').url
Out[17]: 'http://:9021/'
expected 'https://domain.com:9021/'
If I use a URL like the following:
url = 'http://localhost:8080/foo?Subject=aa&Subject=bb'
f = furl(url)
...I have no way to manipulate the query string values.
Calling f.args.Subject only returns the first item.
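For comparison, the standard library keeps every value of a repeated key when the query is parsed into a list of pairs. A minimal sketch using only `urllib.parse` (not furl's own API); the helper name `get_all` is hypothetical:

```python
from urllib.parse import urlsplit, parse_qsl

def get_all(url, key):
    """Return every value of `key` in the query string, in order."""
    query = urlsplit(url).query
    # parse_qsl keeps duplicate keys as separate (key, value) pairs.
    return [v for k, v in parse_qsl(query) if k == key]

url = 'http://localhost:8080/foo?Subject=aa&Subject=bb'
print(get_all(url, 'Subject'))  # ['aa', 'bb']
```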
Hey @gruns, love your library—it's definitely a wheel reinvented too often in the Python community.
There's one use case that this library does not seem to support: scheme-relative URLs. Consider this example:
>>> u = furl('//example.org/foo')
>>> u.url
'example.org/foo'
This loses the scheme-relative prefix of the input URL, which is valuable information. Is this somehow supported? Internally, you could represent this case with a sentinel value for the scheme attribute, so that it can be reflected accordingly.
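The standard library's parser already preserves this distinction, which suggests one possible representation: an empty scheme alongside a non-empty host. A short sketch with `urllib.parse` (not furl):

```python
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit('//example.org/foo')
# An empty scheme with a non-empty netloc marks a scheme-relative URL.
print(repr(parts.scheme), parts.netloc, parts.path)  # '' example.org /foo
# Reassembling the parts restores the leading '//'.
print(urlunsplit(parts))  # //example.org/foo
```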
RFC3986 says pchar includes ":" and "@".
So a url like http://a.example.com/a/b/c/ftp://b.example.com/d/e/f.txt is valid
while furl escapes it and generates http://a.example.com/a/b/c/ftp%3A//b.example.com/d/e/f.txt .
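Python's `urllib.parse.quote` takes a `safe` argument that controls which characters are left unescaped. A sketch that percent-encodes the path while keeping `:` and `@`; note that RFC 3986 pchar also allows the sub-delims, which this sketch does not whitelist:

```python
from urllib.parse import quote

path = '/a/b/c/ftp://b.example.com/d/e/f.txt'
# ':' and '@' are legal inside path segments (RFC 3986 pchar), so mark them safe.
print(quote(path, safe='/:@'))  # /a/b/c/ftp://b.example.com/d/e/f.txt
# With quote's default safe='/', ':' is escaped as %3A, as in the report.
print(quote(path))              # /a/b/c/ftp%3A//b.example.com/d/e/f.txt
```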
Would you please support adding args given as a dict or list, like:
furl('http://www.google.com/?one=1').add({'two':[1,2,3,4]})
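Until something like that exists, the desired result can be sketched with the standard library, where encoding a list of pairs repeats a key once per value. The helper `add_query_params` below is hypothetical, not furl's `.add()`:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_query_params(url, params):
    """Append params to url's query string; list/tuple values repeat the key."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        pairs.extend((key, str(v)) for v in values)
    return urlunsplit(parts._replace(query=urlencode(pairs)))

print(add_query_params('http://www.google.com/?one=1', {'two': [1, 2, 3, 4]}))
# http://www.google.com/?one=1&two=1&two=2&two=3&two=4
```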
It will help me in packaging this module for Fedora.
Currently the following "doesn't work":
>>> f = furl('http://example.com')
>>> f.args[u'testö'] = u'testä'
>>> f
<repr(<furl.furl.furl at 0x17438d0>) failed: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 4: ordinal not in range(128)>
The documentation explicitly states "Encoding is handled for you" and, of course, means URL encoding; however, urllib.urlencode does not automatically handle unicode.
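For comparison, Python 3's `urllib.parse.urlencode` accepts `str` keys and values and percent-encodes them as UTF-8 by default, so a manual `.encode('utf-8')` step is specific to Python 2. A short stdlib-only sketch:

```python
from urllib.parse import urlencode, parse_qsl

# str keys/values are encoded as UTF-8 and then percent-escaped.
query = urlencode({'testö': 'testä'})
print(query)             # test%C3%B6=test%C3%A4
print(parse_qsl(query))  # [('testö', 'testä')]
```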
>>> f.args[u'testö'.encode('utf-8')] = u'testä'.encode('utf-8')
"works".
@ and : are not escaped in the resulting URL when they are present in the username/password:
> from furl import furl
> f = furl('http://some-domain.com')
> f.username = 'username@withcharacter'
> f.password = 'complex@password:characters'
> print(f.url)
http://username@withcharacter:complex@password:characters@some-domain.com
According to RFC 3986, section 3.2.1, it needs to be percent encoded:
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
So it should look like:
http://username%40withcharacter:complex%40password%3Acharacters@some-domain.com
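A sketch of building the netloc with each userinfo component quoted via `urllib.parse.quote(..., safe='')`, which escapes `@`, `:` and `/` alike. The function `build_netloc` is hypothetical and not part of furl:

```python
from urllib.parse import quote

def build_netloc(host, username=None, password=None):
    """Join userinfo and host, percent-encoding every reserved char in userinfo."""
    if username is None:
        return host
    # safe='' means even '/', '@' and ':' are escaped inside each component.
    userinfo = quote(username, safe='')
    if password is not None:
        userinfo += ':' + quote(password, safe='')
    return userinfo + '@' + host

print('http://' + build_netloc('some-domain.com', 'username@withcharacter',
                               'complex@password:characters'))
# http://username%40withcharacter:complex%40password%3Acharacters@some-domain.com
```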
urlparse.parse_qs (returning a dict of lists of values)
urlparse.parse_qsl (returning a list of (key, value) pairs)
Werkzeug provides a data structure for this use case
http://python.net/crew/mwh/nevowapi/nevow.url.URL.html is a similar API, and it has a few things that I think you don't have yet: up, click, clear (?), replace.
It would also be friendly to start a list of related projects in your readme.
If I create a scheme-less furl object with no explicit port set and later set the scheme to one that has a known default port, the port attribute isn't updated. For example:
from furl import furl
print(furl('http://example.com').port)
print(furl('unknown://example.com').set(scheme='http').port)
print(furl().set(netloc='example.com').set(scheme='http').port)
I get 80, None, None, whereas I'd expect 80 each time.
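One way to get the expected behavior is to store only an explicit port and derive the default from the current scheme at read time, instead of filling it in when the URL is parsed. A sketch on a hypothetical class (`SchemeAwareURL` and its small `DEFAULT_PORTS` table are illustrative assumptions, not furl's internals):

```python
DEFAULT_PORTS = {'http': 80, 'https': 443, 'ftp': 21}  # illustrative subset

class SchemeAwareURL:
    """Hypothetical URL object whose default port follows the current scheme."""

    def __init__(self, scheme=None, host='', port=None):
        self.scheme = scheme
        self.host = host
        self._port = port  # explicit port, or None if never set

    @property
    def port(self):
        if self._port is not None:
            return self._port
        # No explicit port: look up the current scheme's default on every read.
        return DEFAULT_PORTS.get(self.scheme)

    @port.setter
    def port(self, value):
        self._port = value

u = SchemeAwareURL(host='example.com')
print(u.port)  # None
u.scheme = 'http'
print(u.port)  # 80
```

With this design an explicit port always wins, and a later scheme change is reflected automatically.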
furl 3.6:
"furl/setup.py", line 37, in
tests_require=[] if version_info[0:2] >= [2,7] else ['unittest2'],
TypeError: unorderable types: tuple() >= list()
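The error comes from ordering a tuple (`version_info[0:2]`) against a list (`[2,7]`), which Python 3 refuses to do. Comparing against a tuple instead is a likely fix; sketched in isolation from the rest of setup.py:

```python
from sys import version_info

# tuple >= tuple is well defined; tuple >= list raises TypeError on Python 3.
tests_require = [] if version_info[:2] >= (2, 7) else ['unittest2']
print(tests_require)
```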
Is it possible to use the .remove() method with multiple value keys?
# Actual
>>> furl('/foo/?gender=M&gender=F&location=CA').remove({'gender': 'M'}).url
'/foo/?location=CA'
# Expected
>>> furl('/foo/?gender=M&gender=F&location=CA').remove({'gender': 'M'}).url
'/foo/?gender=F&location=CA'
# Workaround
>>> f = furl('/foo/?gender=M&gender=F&location=CA')
>>> f.args.popvalue('gender', 'M')
>>> print f.url
'/foo/?gender=F&location=CA'
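The requested behavior (drop one key/value pair, keep the key's other values) sketched with the standard library; `remove_pair` is a hypothetical helper, not furl's `.remove()`:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def remove_pair(url, key, value):
    """Drop only the pairs matching both key and value; keep everything else."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not (k == key and v == value)]
    return urlunsplit(parts._replace(query=urlencode(pairs)))

print(remove_pair('/foo/?gender=M&gender=F&location=CA', 'gender', 'M'))
# /foo/?gender=F&location=CA
```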
I have a simple URL: news.rambler.ru
obj = furl("news.rambler.ru")
print obj.host, obj.scheme, obj.path
None None news.rambler.ru
I expected host to be news.rambler.ru and scheme to be None, but news.rambler.ru ends up in the path.
Is this a bug or expected behavior?
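The standard library parser behaves the same way, which suggests this follows RFC 3986 parsing rather than being a furl-specific bug: without a scheme or a leading `//` there is no authority component, so the text is a path. A quick check with `urllib.parse.urlsplit`:

```python
from urllib.parse import urlsplit

# No scheme and no '//': the whole string is parsed as a path.
bare = urlsplit('news.rambler.ru')
print(repr(bare.netloc), bare.path)  # '' news.rambler.ru
# A leading '//' marks the authority, so the same text becomes the host.
parts = urlsplit('//news.rambler.ru')
print(parts.hostname, repr(parts.path))  # news.rambler.ru ''
```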
I am now incorporating furl in my project. I really like it, but there are two things that fail my unit tests:
1. username, password, host, etc. return an empty string when they do not make sense, but port for a relative URL (there's no scheme) returns None.
2. There are furl.pathstr, furl.querystr and furl.fragmentstr, but I don't see a similar property for params. Also, str(furl.query.params) returns a stringified dictionary instead of the corresponding part of the original URL. Am I missing something (e.g. a paramsstr hidden somewhere)?
This fails:
url = "http://host/?param"
assert str(furl(url)) == url
I have to talk to a very picky web service that needs a specific parameter to have no value (not an empty value). However, furl generates ?param= with an empty value. Maybe using None for no value would work?
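A sketch of the suggested convention, where a value of `None` serializes as a bare key and an empty string keeps the `=`. `encode_query` is a hypothetical helper, not a furl function:

```python
from urllib.parse import quote_plus

def encode_query(pairs):
    """Hypothetical encoder: None -> bare key ('param'), '' -> 'param='."""
    out = []
    for key, value in pairs:
        if value is None:
            out.append(quote_plus(key))
        else:
            out.append(quote_plus(key) + '=' + quote_plus(value))
    return '&'.join(out)

print(encode_query([('param', None)]))             # param
print(encode_query([('param', '')]))               # param=
print(encode_query([('a', 'b'), ('flag', None)]))  # a=b&flag
```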
First, thanks for your work on furl. I've found the API very useful for slicing and dicing URLs. However I think I have found an issue with the way the class handles relative paths. Using 0.3.4 installed via pip. Consider the following:
from furl import furl
f1 = furl('http://www.domain.com/somewhere/over')
f2 = furl('the/rainbow')
print f2.path
/the/rainbow
print f1.join(f2.url)
http://www.domain.com/the/rainbow
I think the addition of the forward slash to the path in f2 is a bug, since it turns a page-relative path into a root-relative path.
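For reference, `urllib.parse.urljoin` implements RFC 3986 reference resolution and keeps a page-relative path relative to the base's directory. Comparing the two forms:

```python
from urllib.parse import urljoin

base = 'http://www.domain.com/somewhere/over'
# A relative path is merged with the base's directory ('/somewhere/').
print(urljoin(base, 'the/rainbow'))   # http://www.domain.com/somewhere/the/rainbow
# A leading '/' makes it root-relative, which matches what furl produced.
print(urljoin(base, '/the/rainbow'))  # http://www.domain.com/the/rainbow
```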
There's a minor discrepancy in how the hostname is lowercased, based on which API you use.
This seems OK:
>>> url = furl('https://MyHostWithCaps.example.org')
>>> url.host
'myhostwithcaps.example.org'
>>> str(url)
'https://myhostwithcaps.example.org'
This seems not OK (or at least not consistent):
>>> url = furl('https://MyHostWithCaps.example.org')
>>> url.host = 'MyHostWithCaps.example.org' # <- difference here!
>>> url.host
'MyHostWithCaps.example.org'
>>> str(url)
'https://MyHostWithCaps.example.org'
Expected: the u.host property should always return a lowercased value.
Actual: the u.host property can sometimes return a string with uppercase characters.
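A sketch of one fix, normalizing in the setter so that parsing and assignment yield the same result. The class `LowercaseHostURL` is hypothetical; RFC 3986 section 3.2.2 states that the host is case-insensitive, which is what makes lowercasing safe:

```python
class LowercaseHostURL:
    """Hypothetical URL holder whose host is normalized on every assignment."""

    def __init__(self, host=''):
        self.host = host  # routed through the setter below

    @property
    def host(self):
        return self._host

    @host.setter
    def host(self, value):
        # Hosts are case-insensitive, so store one canonical (lowercase) form.
        self._host = value.lower()

u = LowercaseHostURL('MyHostWithCaps.example.org')
u.host = 'MyHostWithCaps.example.org'
print(u.host)  # myhostwithcaps.example.org
```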
There is no reason not to follow PEP 8 in 2011 :)
I noticed that furl doesn't like to handle unicode URLs. Would it make sense to make furl convert passed in unicode hostnames to punycode? It would make it a lot easier to handle user input. Using the idna lib makes this trivial, and would be nice to have built in.
Here's how to do it (in Python 3.5):
>>> url = 'http://åäö.se/'
>>> import idna, furl
>>> from urllib.parse import urlparse
>>> parsed = urlparse(url)
>>> parsed = parsed._replace(netloc=idna.encode(parsed.netloc).decode())
>>> punyfied = furl.furl(parsed.geturl())
>>> print(punyfied)
http://xn--4cab6c.se/
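A variation on the snippet above, sketched with the standard library's built-in `idna` codec (which implements the older IDNA 2003 rules, unlike the third-party `idna` package) and encoding only the hostname, so that a port or userinfo in the netloc is preserved. The helper name `punycode_url` is hypothetical, and IPv6 literals are not handled:

```python
from urllib.parse import urlsplit, urlunsplit

def punycode_url(url):
    """Punycode-encode only the hostname, keeping userinfo and port intact."""
    parts = urlsplit(url)
    host = parts.hostname  # already lowercased by urlsplit
    if host is None:
        return url
    # Built-in IDNA 2003 codec; ASCII labels pass through unchanged.
    netloc = host.encode('idna').decode('ascii')
    if parts.port is not None:
        netloc = f'{netloc}:{parts.port}'
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ':' + parts.password
        netloc = userinfo + '@' + netloc
    return urlunsplit(parts._replace(netloc=netloc))

print(punycode_url('http://åäö.se/'))
```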
I have no idea how complex it would be, but another nice feature would be the ability to specify a base URL against which to resolve relative ones, e.g.:
furl("../pants", "http://example.com/mysection/page")
# Constructs a URL "http://example.com/pants"
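The standard library's `urljoin` already performs this resolution, including removal of `..` segments. A hypothetical wrapper with the argument order from the request (relative URL first, base second):

```python
from urllib.parse import urljoin

def resolve(url, base=None):
    """Hypothetical helper: resolve url against base when one is given."""
    return urljoin(base, url) if base else url

print(resolve('../pants', 'http://example.com/mysection/page'))
# http://example.com/pants
```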
The latest version of six (1.9.0) was released today (Jan. 2, 2015) and is causing problems for apps that depend on furl since it requires a specific version of six (1.8.0). This problem was fixed in Issue #42, but the fix wasn't included in the latest release of furl (0.4.1).
I do not get this error with furl-0.3.7. I get this error with furl-0.4.2.
In [3]: furl.furl()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-a2bd23daefba> in <module>()
----> 1 furl.furl()
/Library/Python/2.7/site-packages/furl/furl.pyc in __init__(self, url, strict)
847 self.strict = strict
848
--> 849 self.load(url) # Raises ValueError on invalid url.
850
851 def load(self, url):
/Library/Python/2.7/site-packages/furl/furl.pyc in load(self, url)
865 # Python 2.7+. In Python <= 2.6, urlsplit() doesn't raise a
866 # ValueError on malformed IPv6 addresses.
--> 867 tokens = urlsplit(url)
868
869 self.netloc = tokens.netloc # Raises ValueError in Python 2.7+.
/Library/Python/2.7/site-packages/furl/furl.pyc in urlsplit(url)
1308 url = _set_scheme(url, 'http')
1309 toks = urllib.parse.urlsplit(url)
-> 1310 return urllib.parse.SplitResult(*_change_urltoks_scheme(toks, original_scheme))
1311
1312
AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'SplitResult'
Using the new version 1.0.0, to reproduce:
from furl import furl
test = furl('www.example.com').set(scheme='http').add(path='test/path')
print(test.url)
http:///www.example.com/test/path
Notice the three forward slashes "///" in front of the domain.
The example:
from furl import furl
f = furl('http://www.google.com/?one=1&two=2')
f.args['three'] = '3'
del f.args['one']
f.url
I suppose you meant
f = furl('http://www.google.com/?one=1&two=2&three=3')
Forward slashes are currently not quoted in username/password when generating the netloc. This causes problems when parsing such a URL. As per https://tools.ietf.org/html/rfc3986, the userinfo part shall not contain /.
The safe='' argument should be passed to the quote function here: https://github.com/gruns/furl/blob/master/furl/furl.py#L949-L951
The host property can be assigned an evil value that breaks URL parsing and leads to furl outputting invalid URLs.
Values assigned to the host property don't get escaped correctly:
>>> u = furl('https://user:[email protected]/path/goes/here')
>>> u.host = 'evil:[email protected]'
>>> str(u)
'https://user:pass@evil:[email protected]/path/goes/here'
Trying to parse this value with furl again leads to an error now:
>>> furl(str(u))
...
ValueError: Invalid port: '[email protected]'
Either escape the value for the host property, or throw an error if this is impossible (i.e. even escaped characters are not allowed).
Currently, the result is a broken URL that ends up not being a valid URL.
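The second option (rejecting the value) could look like this hypothetical validator. It whitelists only the characters of a plain DNS-style hostname, which is stricter than RFC 3986's reg-name (that also permits sub-delims and percent-encoding); bracketed IPv6 literals and IDNs are deliberately out of scope:

```python
import re

# Letters, digits, '-' and '.' only; fullmatch avoids '$' matching before '\n'.
_HOST_RE = re.compile(r'[A-Za-z0-9.-]*')

def validate_host(host):
    """Return host unchanged, or raise ValueError if it could break the URL."""
    if not _HOST_RE.fullmatch(host):
        raise ValueError('Invalid host: %r' % host)
    return host

validate_host('example.com')  # accepted
try:
    validate_host('user@evil.example')  # '@' would be read as userinfo
except ValueError as exc:
    print(exc)
```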
Hi,
I saw you fixed Python 2.6 support. It would be great if you could release these fixes to PyPI. ;)
Thanks!
Jeffrey Gelens
It would be nice if the .add() method had an optional replace argument (False by default), because currently the only way to update a URL's params without duplicating them is to call furl.args.update().
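A sketch of the requested semantics over a plain list of `(key, value)` pairs; `add_args` and its `replace` flag are hypothetical, modeled on the request rather than on furl's actual `.add()`:

```python
def add_args(pairs, new_args, replace=False):
    """Hypothetical .add() variant over a list of (key, value) pairs.

    replace=True drops existing pairs for each key being added instead of
    appending duplicates.
    """
    if replace:
        pairs = [(k, v) for k, v in pairs if k not in new_args]
    return list(pairs) + list(new_args.items())

query = [('a', '1'), ('b', '2')]
print(add_args(query, {'a': '9'}))                # [('a', '1'), ('b', '2'), ('a', '9')]
print(add_args(query, {'a': '9'}, replace=True))  # [('b', '2'), ('a', '9')]
```

One design choice to settle: whether a replaced key keeps its original position (as `dict.update()` would) or moves to the end, as in this sketch.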
Can I add a setup.py so that you can add the project to PyPI?
Furl should support Python 2.6+ and Python 3.x in one codebase.
Pull requests welcome.
After upgrading from 0.3.7 to 0.3.8, my tests fail with a UnicodeDecodeError on a URL like:
>>> furl.furl(u"http://www.example.org/?kødpålæg=42")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 826, in __init__
self.load(url) # Raises ValueError on invalid url.
File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 851, in load
self.query.load(tokens.query)
File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 433, in load
self.params.load(self._items(query))
File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 566, in _items
items = self._extract_items_from_querystr(items)
File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 596, in _extract_items_from_querystr
if key.encode('utf8') == urllib.quote_plus(pairstr.encode('utf8')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
I'm running Python 2.7.3 and my sys.stdin.encoding is UTF-8.
In [16]: furl(u'/Assassin™-Harrows-80%-Tungsten-Soft-Tip-Darts-568-HDT1007.html')
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-16-72ee7b05fd36> in <module>()
----> 1 furl(u'/Assassin™-Harrows-80%-Tungsten-Soft-Tip-Darts-568-HDT1007.html')
/usr/local/lib/python2.7/site-packages/furl/furl.py in __init__(self, url, strict)
860 self.strict = strict
861
--> 862 self.load(url) # Raises ValueError on invalid url.
863
864 def load(self, url):
/usr/local/lib/python2.7/site-packages/furl/furl.py in load(self, url)
873
874 if not isinstance(url, str):
--> 875 url = str(url)
876
877 # urlsplit() raises a ValueError on malformed IPv6 addresses in
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 9: ordinal not in range(128)
In [1]: furl('http://www.wayfair.com/Assassin%25E2%2584%25A2-Harrows-80%2525-Tungsten-Soft-Tip-Darts-568-HDT1007.html')
Out[1]: ---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
/Library/Python/2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
693 type_pprinters=self.type_printers,
694 deferred_pprinters=self.deferred_printers)
--> 695 printer.pretty(obj)
696 printer.flush()
697 return stream.getvalue()
/Library/Python/2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
399 if callable(meth):
400 return meth(obj, self, cycle)
--> 401 return _default_pprint(obj, self, cycle)
402 finally:
403 self.end_group()
/Library/Python/2.7/site-packages/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
519 if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
520 # A user-provided repr. Find newlines and replace them with p.break_()
--> 521 _repr_pprint(obj, p, cycle)
522 return
523 p.begin_group(1, '<')
/Library/Python/2.7/site-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
701 """A pprint that just redirects to the normal repr function."""
702 # Find newlines and replace them with p.break_()
--> 703 output = repr(obj)
704 for idx,output_line in enumerate(output.splitlines()):
705 if idx:
/usr/local/lib/python2.7/site-packages/furl/furl.py in __repr__(self)
1266
1267 def __repr__(self):
-> 1268 return "%s('%s')" % (self.__class__.__name__, str(self))
1269
1270
/usr/local/lib/python2.7/site-packages/furl/compat.py in __str__(self)
21 else: # Python 2
22 def __str__(self):
---> 23 return self.__unicode__().encode('utf8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 31: ordinal not in range(128)
However, the following works correctly:
In [20]: f = furl('/Assassin%25E2%2584%25A2-Harrows-80%2525-Tungsten-Soft-Tip-Darts-568-HDT1007.html')
In [21]: f.path
Out[21]: Path('/Assassin™-Harrows-80%-Tungsten-Soft-Tip-Darts-568-HDT1007.html')
Using cached furl-0.5.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 20, in <module>
File "/private/var/folders/12/y2p8nk8s4hx_2x79q6rws8780000gn/T/pip-build-7kf0kjut/furl/setup.py", line 26
print 'python setup.py sdist'
^
SyntaxError: Missing parentheses in call to 'print'
----------------------------------------
Cannot install furl>=0.5 using pip
In furl/compat.py, there is an import statement that imports the unittest module without using it; this leads to an ImportError on Python 2.6 if we haven't installed it.
The repo has a .travis.yml but it looks like you never set up Travis CI for the repo?
Is there any particular reason why fetching the naked domain from a URL isn't supported yet? I would like to contribute.
Hi,
I'm using your module directly and indirectly via the SQLAlchemy-Utils module and its URLType column type, which coerces a URL into a furl object.
I've found an issue when I try to use the URLType type on Python 3.x: I get the following error when I query an object using such a column:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2588, in all
return list(self)
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 86, in instances
util.raise_from_cause(err)
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 189, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 183, in reraise
raise value
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 71, in instances
rows = [proc(row) for row in fetch]
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 71, in <listcomp>
rows = [proc(row) for row in fetch]
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 379, in _instance
instance = session_identity_map.get(identitykey)
File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/identity.py", line 146, in get
if key not in self._dict:
TypeError: unhashable type: 'furl'
I've managed to fix this by monkeypatching the furl class with a __hash__ method like so:
from sqlalchemy_utils import URLType
from furl import furl
def furl_hash(self):
return hash(self.url)
furl.__hash__ = furl_hash
class Foo(Model):
url = Column(URLType)
...
Would it be possible to add such a method? I'm not 100% sure that using the value of self.url here is correct, but it looked close enough.
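Python requires that objects comparing equal also hash equal, so a `__hash__` built on `self.url` is only sound if equality is also based on the URL string. Also, in Python 3 a class that defines `__eq__` without `__hash__` gets `__hash__ = None`; if furl defines `__eq__`, that would explain why the error shows up on Python 3. A sketch of the pairing on a hypothetical stand-in class (not furl itself):

```python
class HashableURL:
    """Hypothetical stand-in for a mutable URL object keyed by its string."""

    def __init__(self, url):
        self.url = url

    def __eq__(self, other):
        if not isinstance(other, HashableURL):
            return NotImplemented
        return self.url == other.url

    def __hash__(self):
        # Must agree with __eq__: equal URLs produce equal hashes.
        return hash(self.url)

a = HashableURL('http://example.com/')
b = HashableURL('http://example.com/')
print(a == b, hash(a) == hash(b), len({a, b}))  # True True 1
```

Because such objects are mutable, changing `url` after using one as a dictionary or identity-map key would break lookups.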
It seems there's no way to set individual query string parameters inline. Using add() or set() replaces the whole query string. I would love to have an inline version of this:
In [5]: url = furl('http://www.a.de/?a=b&x=y')
In [6]: url.args['x'] = 'z'
In [7]: url
Out[7]: furl('http://www.a.de/?a=b&x=z')
Am I missing something? If not, I'll try and make a pull request
Loving furl, it's a great interface for URL-handling! Found this inconsistency in the API while using Python 2.7:
e.g.
# works fine
>>> furl(u'http://example.org/?kødpålæg=42')
furl('http://example.org/?k%C3%B8dp%C3%A5l%C3%A6g=42')
# fails
>>> furl('http://example.org').join(u'/?kødpålæg=42')
Traceback (most recent call last):
File "<input>", line 1, in <module>
furl('http://example.org').join(u'/kødpålæg=42')
File "/Users/jeffbr/.virtualenvs/clipper/lib/python2.7/site-packages/furl/furl.py", line 1260, in join
self.load(urljoin(self.url, str(url)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 2: ordinal not in range(128)
Related to #33, where you fixed furl.__init__, but no sanitization seems to be done for Python 2's unicode strings in urljoin?
When constructing a path, a reserved character like space is encoded correctly by furl:
>>> furl('http://localhost').add(path=['foo bar']).url
'http://localhost/foo%20bar'
However, a percentage character is not encoded (and the following two characters are capitalized):
>>> furl('http://localhost').add(path=['foo%bar']).url
'http://localhost/foo%BAr'
In this case I'd expect the output to be:
'http://localhost/foo%25bar'
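For comparison, `urllib.parse.quote` always escapes a literal `%`, rather than treating it as the start of an existing escape:

```python
from urllib.parse import quote

print(quote('foo bar'))  # foo%20bar
# '%' is always escaped, so '%ba' is never mistaken for an escape sequence.
print(quote('foo%bar'))  # foo%25bar
```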
Is it possible to add a changelog to the project so that we can see what has changed between releases? Or am I missing something?
The 'delimeter' typo I noticed and fixed in #49 turns out to be all over the place.
import os
uri = os.environ['REQUEST_URI']
from furl import furl
f = furl(uri)
print f.args['product']
print f.args['category']
If the current URI is /product.py?product=12&category=2, it prints 12 and 2.
But if the current URI is /product.py?product=12, it throws a KeyError.
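A missing key raising `KeyError` is ordinary dictionary behavior; a lookup that falls back to a default avoids it. A stdlib-only sketch (`get_arg` is a hypothetical helper):

```python
from urllib.parse import urlsplit, parse_qs

def get_arg(uri, name, default=None):
    """Return the first value of `name` in the query, or `default` if absent."""
    values = parse_qs(urlsplit(uri).query).get(name)
    return values[0] if values else default

print(get_arg('/product.py?product=12&category=2', 'category'))  # 2
print(get_arg('/product.py?product=12', 'category'))             # None
```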
from furl import furl
furl('/some%2520path').url
-> '/some%20path'
furl('/some%252Bpath').url
-> '/some+path'
I was expecting the original encoding to be preserved.
I think it would be a good idea to support more than one argument to .join(). Example:
>>> url = furl('http://my-api-endpoint.com')
>>> url.join('order')
furl('http://my-api-endpoint.com/order')
>>> order_id = 98127
>>> url.join('order', order_id, 'status')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: join() takes exactly 2 arguments (4 given)
>>>
This should support furl's encoding and its other powerful features. I imagine something similar to the os.path.join function.
Thanks!
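A sketch of the proposed multi-argument join using `urllib.parse`; `join_segments` is hypothetical. Unlike `os.path.join`, it percent-encodes every segment (including any `/` inside one) and always appends rather than resetting on an absolute segment:

```python
from urllib.parse import urlsplit, urlunsplit, quote

def join_segments(base, *segments):
    """Hypothetical multi-argument join: append each segment, percent-encoded."""
    parts = urlsplit(base)
    path = parts.path.rstrip('/')
    for segment in segments:
        # str() lets callers pass ints such as order IDs; safe='' escapes '/'.
        path += '/' + quote(str(segment), safe='')
    return urlunsplit(parts._replace(path=path))

print(join_segments('http://my-api-endpoint.com', 'order', 98127, 'status'))
# http://my-api-endpoint.com/order/98127/status
```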
Steps to reproduce:
import furl
f = furl.furl('mailto:[email protected]')
f.scheme
f = furl.furl('mailto://[email protected]')
f.scheme
"mailto"
Forward slashes are not required for URIs with the "mailto" scheme.
See RFC 6068 @ http://tools.ietf.org/html/rfc6068
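Python's `urllib.parse.urlsplit` recognizes the scheme without any slashes; the address below is a placeholder, not the one from the report:

```python
from urllib.parse import urlsplit

# 'mailto' URIs have no authority component, so the address lands in the path.
parts = urlsplit('mailto:user@example.com')
print(parts.scheme, repr(parts.netloc), parts.path)  # mailto '' user@example.com
```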
It would be nice if there were an inline method clone that returned a copy of the object. It would mean you could do things like:
base_url.clone().set(path='/method/1')
without affecting the original base_url object.
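A sketch of such a method on a hypothetical class (not furl): `clone()` returns a deep copy, and `set()` returns `self` so the calls chain:

```python
import copy

class CloneableURL:
    """Hypothetical URL object with an inline clone() and a chainable set()."""

    def __init__(self, host, path='/'):
        self.host = host
        self.path = path
        self.args = {}  # nested mutable state that must not be shared

    def set(self, path=None):
        if path is not None:
            self.path = path
        return self  # returning self allows chaining

    def clone(self):
        # deepcopy so the copy's args dict is independent of the original's.
        return copy.deepcopy(self)

base_url = CloneableURL('api.example.com')
derived = base_url.clone().set(path='/method/1')
print(base_url.path, derived.path)  # / /method/1
```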
The orderedmultidict module required in omdict1D.py doesn't seem to be anywhere.