Comments (5)
This is definitely not an issue regarding non-ASCII chars, as they are supported: http://**.**/** matches correctly.
I'm limiting the set of symbols allowed in URLs though, to keep symbols often used in shells and programming languages from being eaten. I'll see if I can add a Unicode category to get this to work.
from xurls.
This might be related, but I couldn't get it to match https://shmibbles.me/tmp/自殺でも？.png. xurls gives me https://shmibbles.me/tmp/自殺でも when run directly. In conjunction with st's external pipe, it doesn't go past the first Unicode character. But your test case still works.
@j605 do you happen to know what Unicode category ？ is in?
http://graphemica.com/%EF%BC%9F (“Other Punctuation” apparently)
@HalosGhost you're right, thank you.
Here is the full list: http://www.fileformat.info/info/unicode/category/Po/list.htm
We already allow many of the ASCII ones in there, like `.`, `,`, `!` and `?` itself, as long as they are not at the end of the URL. It makes things more consistent to do this across the entire Unicode character class.
I'm applying this change. The only breaking change that my tests catch is that `"` is now allowed if it is not at the end of the URL. So before, http://foo.com/@"style="color:red"onmouseover=func() would match just http://foo.com/ but now it matches all of it (I think I found this one in some Twitter tests).
This example is very obscure though, and I'd rather be consistent and support unicode better than support these very rare edge cases.
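To illustrate the idea, here is a minimal sketch of "allow Other Punctuation anywhere except at the end of the URL". The pattern `urlSketch` and the helper `findURLs` are hypothetical names made up for this example, and the regex is heavily simplified; it is not the pattern xurls actually generates. It only assumes that Go's `regexp` package understands the Unicode class `\p{Po}`.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlSketch is a simplified, hypothetical pattern (NOT the real xurls regex).
// After the scheme, it repeats: any run of "Other Punctuation" (\p{Po})
// followed by one character that is allowed to end a URL. Because every
// repetition ends on a non-\p{Po} character (or on '/'), a match can
// contain punctuation in the middle but can never end on it.
var urlSketch = regexp.MustCompile(`https?://(?:\p{Po}*[\p{L}\p{N}/_=()-])+`)

// findURLs returns every substring of text matched by urlSketch.
func findURLs(text string) []string {
	return urlSketch.FindAllString(text, -1)
}

func main() {
	inputs := []string{
		// Fullwidth question mark (U+FF1F) in the middle, trailing dot at the end.
		"see https://shmibbles.me/tmp/自殺でも？.png.",
		// The double quote is also in \p{Po}, so it is now accepted mid-URL.
		`http://foo.com/@"style="color:red"onmouseover=func()`,
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, findURLs(in))
	}
}
```

Under these assumptions, the fullwidth question mark no longer cuts the first URL short, the sentence-ending dot is still left out, and the second input matches in full, which is the breaking change described above.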