Handling of page-relative vs. root-relative paths (furl issue, closed)

Comments (5)

gruns commented on July 26, 2024

First, thanks for your work on furl. I've found the API very useful for
slicing and dicing URLs.

No - thank you for using furl.

This behavior is a result of the ambiguity of incomplete URLs. For example

>>> f = furl('the/rainbow')

is clearly a path. But what about

>>> f = furl('google.com')

Is the intended URL the path '/google.com' or the domain 'google.com/'? It's
ambiguous.

By default, furl treats ambiguous inputs as paths. Then, when a path-only furl
is serialized to a URL, it's prepended with a '/' if it doesn't start with one
already.

>>> f = furl('google.com')
>>> f.url
'/google.com'
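
Python's standard library resolves this ambiguity the same way. As a point of comparison only (this uses Python 3's urllib.parse, not furl), a schemeless string with no leading '//' is parsed as a path, and only a '//' prefix marks a network location:

```python
from urllib.parse import urlsplit

# Without a leading '//', the whole string is treated as a path.
bare = urlsplit('google.com')
print(repr(bare.netloc), repr(bare.path))

# A leading '//' marks what follows as the network location (host).
hosted = urlsplit('//google.com')
print(repr(hosted.netloc), repr(hosted.path))
```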

This is natural because in a full URL the path must begin with a '/' to
separate it from the host. For example

>>> f = furl('a/path')
>>> f.host = 'google.com'

f.url should now be

>>> f.url
'google.com/a/path'

not

>>> f.url
'google.coma/path'

Note the automatically prepended '/' to 'a/path' in the final URL.
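
The standard library's urlunsplit applies the same separator rule, which can serve as a quick sanity check. This sketch uses Python 3's urllib.parse rather than furl, and the 'http' scheme is an assumption added only for illustration:

```python
from urllib.parse import urlunsplit

# With a network location, a '/' is inserted between host and path.
with_host = urlunsplit(('http', 'google.com', 'a/path', '', ''))
print(with_host)  # http://google.com/a/path

# With no scheme or network location, the relative path is left untouched.
path_only = urlunsplit(('', '', 'a/path', '', ''))
print(path_only)  # a/path
```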

It's this automatic prepending of a '/' to path-only furls that results in the
unexpected behavior observed with furl.join().

I'll think about how this ambiguity and resultant unexpected behavior can be
mitigated.

gruns commented on July 26, 2024

It makes sense for path-only URLs to be prepended with a '/' when serialized to
a URL. Paths in a URL must be preceded by a '/'.

>>> f = furl('a/path')
>>> f.url
'/a/path'

I think the best course of action is to remove the invariant that URL Paths are
always absolute. URL Paths should be optionally absolute, like Fragment Paths.

>>> f = furl('a/path')
>>> f.url
'/a/path'
>>> str(f.path)
'a/path'
>>> f.path.isabsolute
False
>>> f.path.isabsolute = True
>>> str(f.path)
'/a/path'

So, if your intention is to join() a non-absolute path to a URL, like
originally proposed, you would join() with the Path object, not the
serialized URL.

>>> f1 = furl('http://www.domain.com/somewhere/over')
>>> f2 = furl('the/rainbow')
>>> f2.url
'/the/rainbow'
>>> str(f2.path)
'the/rainbow'
>>> f2.path.isabsolute
False
>>> f1.join(str(f2.path)).url
'http://www.domain.com/somewhere/over/the/rainbow'

What do you think?

Markbnj commented on July 26, 2024

You get to the correct results, but I'm not a fan of f2.url producing the path with the slash prepended. First, let me challenge the statement: "Paths in a URL must be preceded by a '/'." The URL RFC explicitly allows partial URLs. Here are the W3C rules on expanding them: http://www.w3.org/Addressing/URL/4_3_Partial.html. The key point is that these partial URLs commonly appear in web pages, and users of your package will definitely be trying to parse them with it. A URL of the form given in your paraphrasing of my example, i.e. "the/rainbow", has a specific meaning within the context of a parent object, and you can't arbitrarily change that meaning by prepending a '/' to it.

In your earlier example, "google.com," this is a case of trying to help the implementer more than he or she deserves. According to all the rules of URLs that is a partial path. You and I might recognize it as a domain name and treat it specially, but there is no reason for your library to do so. In short, given a base URL and a list of URLs to be joined with it, this is what I would expect to happen:

Base URL: http://www.domain.com/somewhere/over

the/rainbow - http://www.domain.com/somewhere/over/the/rainbow
/the/rainbow - http://www.domain.com/the/rainbow
//the/rainbow - //the/rainbow
google.com - http://www.domain.com/somewhere/over/google.com

The last case may seem strange, but it is the fault of the implementer, not your library. In this case adhering to the rule and allowing things to come apart at the seams is probably the kinder way to proceed.

gruns commented on July 26, 2024

You're right: prepending '/' to non-absolute paths when they're serialized to a
URL is confusing. furl is, in effect, modifying the input data without being
instructed to do so. It's confusing if one feeds 'a/path' into furl, makes no
changes to the furl object, but doesn't get 'a/path' back out.

A strong, natural solution is the one I mentioned before: remove the invariant
that URL Paths are always absolute. Thus, the new behavior will be:

>>> f = furl('a/path')
>>> f.url
'a/path'
>>> f.path.isabsolute
False

Instead of the current behavior:

>>> f = furl('a/path')
>>> f.url
'/a/path'
>>> f.path.isabsolute
True

For the second issue, treating 'google.com' in furl('google.com') as a path, not
a domain, is already in place and will remain so. furl will not give paths that
resemble domains special treatment.

I'm leaving this ticket open until I fix the path issue. Pull requests welcome.

gruns commented on July 26, 2024

This issue has been fixed in furl v0.3.5. Non-empty URL paths are no longer forced to be absolute; a path is now always absolute only in the presence of a netloc (a username, password, host, and/or port).

>>> from furl import furl
>>> f = furl('/a/path')
>>> f.path.isabsolute
True
>>> f.path
Path('/a/path')
>>> f.path.isabsolute = False
>>> f.path
Path('a/path')
>>> f.host = 'arc.io'
>>> f
furl('arc.io/a/path')
>>> f.path.isabsolute
True
>>> f.path.isabsolute = True
Traceback (most recent call last):
  ...
AttributeError: Path.isabsolute is True and read-only for URLs with a netloc (a username, password, host, and/or port). A URL path must start with a '/' to separate itself from a netloc.
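
The rule can be pictured with a small model. The sketch below is not furl's implementation: ToyURL and its attributes are hypothetical names invented here to illustrate the invariant that a path may be relative only when no host is present.

```python
class ToyURL:
    """Hypothetical model of the rule; not furl's actual code."""

    def __init__(self, path, host=''):
        self.host = host
        self.segments = [s for s in path.split('/') if s]
        self._isabsolute = path.startswith('/')

    @property
    def isabsolute(self):
        # A host forces the path to be absolute; otherwise honor the flag.
        return bool(self.host) or self._isabsolute

    @isabsolute.setter
    def isabsolute(self, value):
        if self.host:
            raise AttributeError('isabsolute is read-only when a host is set')
        self._isabsolute = bool(value)

    @property
    def url(self):
        path = '/'.join(self.segments)
        if self.isabsolute:
            path = '/' + path
        return ('//' + self.host if self.host else '') + path


u = ToyURL('a/path')
print(u.url)  # a/path
u.host = 'arc.io'
print(u.url)  # //arc.io/a/path
```

Deriving isabsolute from the host on every read, rather than storing a second flag, means the path cannot become relative while a host is set.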

Your original example now works (though somewhere/over should be somewhere/over/ for
the joined path to become /somewhere/over/the/rainbow, as probably intended).

>>> f1 = furl('http://www.domain.com/somewhere/over/')
>>> f2 = furl('the/rainbow')
>>> print(f2.path)
the/rainbow
>>> print(f1.join(f2.url))
http://www.domain.com/somewhere/over/the/rainbow
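
The trailing slash matters because a relative reference is resolved against the base path with its last segment removed. As a comparison point, here is how Python 3's urllib.parse.urljoin (standard library, not furl) resolves the references discussed in this thread:

```python
from urllib.parse import urljoin

base = 'http://www.domain.com/somewhere/over'

# Without a trailing slash, 'over' is the last segment and gets replaced.
print(urljoin(base, 'the/rainbow'))        # http://www.domain.com/somewhere/the/rainbow

# With a trailing slash, the reference is appended beneath 'over/'.
print(urljoin(base + '/', 'the/rainbow'))  # http://www.domain.com/somewhere/over/the/rainbow

# A leading '/' makes the reference root-relative.
print(urljoin(base, '/the/rainbow'))       # http://www.domain.com/the/rainbow

# A leading '//' is a network-path reference: 'the' becomes the host.
print(urljoin(base, '//the/rainbow'))      # http://the/rainbow
```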

Upgrade to furl v0.3.5 with

pip install furl --upgrade

Thank you for bringing this issue to my attention and for your input and suggestions, Markbnj.
