Comments (5)

gruns commented on July 26, 2024

"First, thanks for your work on furl. I've found the API very useful for
slicing and dicing URLs."

No - thank you for using furl.

This behavior is a result of the ambiguity of incomplete URLs. For example

>>> f = furl('the/rainbow')

is clearly a path. But what about

>>> f = furl('google.com')

Is the intended URL the path '/google.com' or the domain 'google.com/'? It's
ambiguous.

By default, furl treats ambiguous inputs as paths. Then, when a path-only furl
is serialized to a URL, it's prepended with a '/' if it doesn't start with one
already.

>>> f = furl('google.com')
>>> f.url
'/google.com'

This is natural because in a full URL a path cannot start without a '/'. For
example

>>> f = furl('a/path')
>>> f.host = 'google.com'

f.url should now be

>>> f.url
'google.com/a/path'

not

>>> f.url
'google.coma/path'

Note the '/' automatically prepended to 'a/path' in the final URL.

It's this automatic prepending of a '/' to path-only furls that results in the
unexpected behavior observed with furl.join().

I'll think about how this ambiguity and resultant unexpected behavior can be
mitigated.
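
For comparison, the same ambiguity exists outside furl. The sketch below uses Python's standard-library urllib.parse (an illustration only, not furl's code): a bare 'google.com' is parsed as a path, and it is only read as a host once a leading '//' marks it as a netloc.

```python
from urllib.parse import urlsplit

# Without a scheme or a leading '//', the whole string is parsed as a path.
bare = urlsplit('google.com')

# A leading '//' marks what follows as a netloc (host), leaving the path empty.
marked = urlsplit('//google.com')

print(repr(bare.netloc), repr(bare.path))      # '' 'google.com'
print(repr(marked.netloc), repr(marked.path))  # 'google.com' ''
```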

from furl.

gruns commented on July 26, 2024

It makes sense for path-only URLs to be prepended with a '/' when serialized to
a URL. Paths in a URL must be preceded by a '/'.

>>> f = furl('a/path')
>>> f.url
'/a/path'

I think the best course of action is to remove the invariant that URL Paths are
always absolute. URL Paths should be optionally absolute, like Fragment Paths.

>>> f = furl('a/path')
>>> f.url
'/a/path'
>>> str(f.path)
'a/path'
>>> f.path.isabsolute
False
>>> f.path.isabsolute = True
>>> str(f.path)
'/a/path'

So, if your intention is to join() a non-absolute path to a URL, like
originally proposed, you would join() with the Path object, not the
serialized URL.

>>> f1 = furl('http://www.domain.com/somewhere/over')
>>> f2 = furl('the/rainbow')
>>> f2.url
'/the/rainbow'
>>> str(f2.path)
'the/rainbow'
>>> f2.path.isabsolute
False
>>> f1.join(str(f2.path)).url
'http://www.domain.com/somewhere/over/the/rainbow'

What do you think?

Markbnj commented on July 26, 2024

You get to the correct results, but I'm not a fan of f2.url producing the path with the slash prepended. First, let me challenge the statement: "Paths in a URL must be preceded by a '/'." The URL RFC explicitly allows partial URLs. Here are the W3C rules for expanding them: http://www.w3.org/Addressing/URL/4_3_Partial.html. The key point is that these partial URLs commonly appear in web pages, and users of your package will definitely be trying to parse them with it. A URL of the form given in your paraphrasing of my example, i.e. "the/rainbow", has a specific meaning within the context of a parent object, and you can't arbitrarily change that meaning by prepending a '/' to it.

In your earlier example, "google.com," this is a case of trying to help the implementer more than he or she deserves. According to all the rules of URLs that is a partial path. You and I might recognize it as a domain name and treat it specially, but there is no reason for your library to do so. In short, given a base URL and a list of URLs to be joined with it, this is what I would expect to happen:

Base URL: http://www.domain.com/somewhere/over

the/rainbow - http://www.domain.com/somewhere/over/the/rainbow
/the/rainbow - http://www.domain.com/the/rainbow
//the/rainbow - //the/rainbow
google.com - http://www.domain.com/somewhere/over/google.com

The last case may seem strange, but it is the fault of the implementer, not your library. In this case adhering to the rule and allowing things to come apart at the seams is probably the kinder way to proceed.
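
For comparison, here is a sketch of how Python's standard-library urllib.parse.urljoin, which follows the RFC 3986 resolution rules, handles the same four references. It is not furl code, and it assumes the base URL is written with a trailing '/'.

```python
from urllib.parse import urljoin

# Assumption: the base ends with '/', so relative references extend it.
base = 'http://www.domain.com/somewhere/over/'

references = ['the/rainbow', '/the/rainbow', '//the/rainbow', 'google.com']
resolved = {ref: urljoin(base, ref) for ref in references}

for ref, url in resolved.items():
    print(f'{ref} - {url}')
```

One difference from the list above: urljoin reads '//the/rainbow' as a network-path reference, so 'the' becomes the host and the result, 'http://the/rainbow', inherits the scheme of the base.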

gruns commented on July 26, 2024

You're right: prepending '/' to non-absolute paths when they're serialized to a
URL is confusing. furl is, in effect, modifying the input data without being
instructed to do so. It's confusing if one feeds 'a/path' into furl, makes no
changes to the furl object, but doesn't get 'a/path' back out.

A strong, natural solution is the one I mentioned before: remove the invariant
that URL Paths are always absolute. Thus, the new behavior will be:

>>> f = furl('a/path')
>>> f.url
'a/path'
>>> f.path.isabsolute
False

Instead of the current behavior:

>>> f = furl('a/path')
>>> f.url
'/a/path'
>>> f.path.isabsolute
True
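
The property at stake is a round trip: parsing a string and serializing it again, with no changes in between, should give the original string back. A minimal sketch of that check, using the standard library's urlsplit and urlunsplit as a stand-in for furl (round_trips is a hypothetical helper, not part of furl):

```python
from urllib.parse import urlsplit, urlunsplit

def round_trips(url):
    """Return True if parsing then re-serializing leaves the URL unchanged."""
    return urlunsplit(urlsplit(url)) == url

# Relative paths, absolute paths, and domain-like paths should all survive.
for candidate in ['a/path', '/a/path', 'google.com']:
    print(candidate, round_trips(candidate))
```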

For the second issue, treating 'google.com' in furl('google.com') as a path, not
a domain, is already in place and will remain so. furl will not give paths that
resemble domains special treatment.

I'm leaving this ticket open until I fix the path issue. Pull requests welcome.

gruns commented on July 26, 2024

This issue has been fixed in furl v0.3.5. Non-empty URL paths are no longer forced to be absolute; they are now required to be absolute only in the presence of a netloc (a username, password, host, and/or port).

>>> from furl import furl
>>> f = furl('/a/path')
>>> f.path.isabsolute
True
>>> f.path
Path('/a/path')
>>> f.path.isabsolute = False
>>> f.path
Path('a/path')
>>> f.host = 'arc.io'
>>> f
furl('arc.io/a/path')
>>> f.path.isabsolute
True
>>> f.path.isabsolute = True
Traceback (most recent call last):
  ...
AttributeError: Path.isabsolute is True and read-only for URLs with a netloc (a username, password, host, and/or port). A URL path must start with a '/' to separate itself from a netloc.
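
The standard library applies a similar rule when it serializes a URL: a '/' is inserted in front of a relative path only when a netloc is present. A sketch using urllib.parse.urlunsplit, offered as an analogy rather than as furl's implementation:

```python
from urllib.parse import urlunsplit

# Components are (scheme, netloc, path, query, fragment).
# No netloc: the relative path is serialized unchanged.
without_netloc = urlunsplit(('', '', 'a/path', '', ''))

# With a netloc: a '/' is inserted so the path cannot run into the host.
with_netloc = urlunsplit(('http', 'arc.io', 'a/path', '', ''))

print(without_netloc)  # a/path
print(with_netloc)     # http://arc.io/a/path
```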

Your original example now works (though somewhere/over should be somewhere/over/ for
the joined path to become /somewhere/over/the/rainbow, as probably intended).

>>> f1 = furl('http://www.domain.com/somewhere/over/')
>>> f2 = furl('the/rainbow')
>>> print(f2.path)
the/rainbow
>>> print(f1.join(f2.url))
http://www.domain.com/somewhere/over/the/rainbow
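
The trailing-slash detail follows from standard relative-reference resolution, where the last segment of the base path is dropped before the reference is merged in. A sketch with the standard library's urljoin (not furl's join()) showing both bases:

```python
from urllib.parse import urljoin

reference = 'the/rainbow'

# Without a trailing slash, 'over' is the last segment and gets replaced.
no_slash = urljoin('http://www.domain.com/somewhere/over', reference)

# With a trailing slash, 'over/' is kept and the reference is appended to it.
with_slash = urljoin('http://www.domain.com/somewhere/over/', reference)

print(no_slash)    # http://www.domain.com/somewhere/the/rainbow
print(with_slash)  # http://www.domain.com/somewhere/over/the/rainbow
```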

Upgrade to furl v0.3.5 with

pip install furl --upgrade

Thank you for bringing this issue to my attention and for your input and suggestions, Markbnj.
