
Comments (6)

mvdan commented on July 21, 2024

Thanks for filing this issue. This is a bit of an edge case, because characters like | and , are valid and even common in URLs, but they're also very common in plaintext.

The strategy I ended up with is to accept them as "middle" characters in a URL, but not as the last character. For example:

$ echo "http://foo| bar" | xurls
http://foo
$ echo "http://foo|bar" | xurls
http://foo|bar
$ echo "http://foo, bar" | xurls
http://foo
$ echo "http://foo,bar" | xurls
http://foo,bar
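
To make the "middle but not last" rule concrete, here is a minimal sketch in Go. The pattern middleNotLast is a simplified, hypothetical stand-in and not the expression xurls actually uses. It only illustrates the idea of letting | and , appear inside a match while refusing them as the final character.

```go
package main

import (
	"fmt"
	"regexp"
)

// middleNotLast is a hypothetical, simplified pattern (not the real xurls
// expression): after the scheme it accepts any run of non-space characters,
// as long as the final character is not '|' or ','.
var middleNotLast = regexp.MustCompile(`https?://\S*[^\s|,]`)

func main() {
	inputs := []string{
		"http://foo| bar",
		"http://foo|bar",
		"http://foo, bar",
		"http://foo,bar",
	}
	for _, in := range inputs {
		// FindString returns the leftmost match, or "" if there is none.
		fmt.Printf("%q -> %q\n", in, middleNotLast.FindString(in))
	}
}
```

With this simplified pattern, the four inputs produce the same matches as the xurls output shown above.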

I could make the URL matching more conservative and always cut off commas and vertical bars. However, that would break perfectly valid URLs like https://en.wikipedia.org/wiki/Colma,_California, for example. As I'm writing this, I wonder if GitHub will auto-link that properly :)

You can also find real examples using vertical bars, like https://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic%7CRoboto+Slab:400,700%7CInconsolata:400,700. In both cases, note how modern browsers don't escape the character.

I think the current mechanism is an OK middle ground. If you can provide a real example, perhaps there's some tweak we could make. With the http://google.com|google.com example you gave above, I don't think there's anything we can do without breaking perfectly valid URLs.

from xurls.

mvdan commented on July 21, 2024

I forgot to say - if you have plaintext that you know makes heavy use of certain characters like |, what you could do is use strings.Split first, then pass each part through xurls.


mvdan commented on July 21, 2024

Ping @jlory - any thoughts?


jlory commented on July 21, 2024

So the story behind my findings with the "|" is that I'm using the Slack API to parse some messages, and I found out that they sometimes rewrite URL links in messages using |: https://api.slack.com/docs/message-formatting#linking_to_urls

In your example the pipe character is escaped and replaced by the proper value. I'm reading this: https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid and it's still unclear whether we should exclude it. I've never seen a URL / URI with | in it.

For now I'm actually doing a string split on | and discarding the rest.


mvdan commented on July 21, 2024

Hmm, you're right - that's a good example of | being used to separate URLs. I haven't myself found a use case for vertical bars to be part of a URL, so I'll make this change.

If anyone runs into regressions because of it, they can file an issue, and we can consider reverting the commit at that point.


mvdan commented on July 21, 2024

Ah, this was never intended to work like this. I simply added a | in the wrong place - between [ and ], effectively adding it to a character set by mistake.
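
For anyone unfamiliar with this kind of slip, here is a small generic illustration in Go. The two patterns are made up for the example and are not taken from the xurls source. Inside square brackets, | is just another member of the character set; outside them, it means alternation.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Inside brackets, '|' is a literal member of the set: this matches
	// exactly one of 'a', '|' or 'b'.
	inClass = regexp.MustCompile(`^[a|b]$`)
	// Outside brackets, '|' separates alternatives: this matches "a" or "b".
	alternation = regexp.MustCompile(`^(a|b)$`)
)

func main() {
	fmt.Println(inClass.MatchString("|"))     // true
	fmt.Println(alternation.MatchString("|")) // false
	fmt.Println(alternation.MatchString("a")) // true
}
```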

