Comments (3)
I was wondering if trurl allows removing the scheme from URLs (to dedup them later)
Oh, duh. Sorry, your example also had URLs that were identical except for the scheme, so I don't know how I missed that. :p
Still, I don't understand why you are trying to only remove the scheme.
In that case, you can simply set the scheme to the desired value, e.g. http://, and then pipe to sort -u or awk '!seen[$0]++', no?
$ trurl -f - -s 'scheme=http' < ./test | sort -u
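For what it's worth, the two dedup options differ only in ordering. A minimal sketch, using a hypothetical hard-coded list in place of trurl output (the `urls` variable and its contents are made up for illustration):

```shell
# Hypothetical sample input: stands in for URLs whose scheme was already set to http
urls='http://b.example.com/
http://a.example.com/
http://b.example.com/'

# sort -u removes duplicates, but the output comes back sorted
printf '%s\n' "$urls" | LC_ALL=C sort -u

# awk '!seen[$0]++' removes duplicates and keeps the first occurrence in input order
printf '%s\n' "$urls" | awk '!seen[$0]++'
```

Pick the awk form if the original order of the list matters to you.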
If you want to do something more complex, like discarding non-http/https URLs and keeping https:// if both http:// and https:// are specified, you can use jq:
$ trurl --json -f - < ./test | jq -r 'group_by(del(.url, .scheme, .raw_port))[] | first(("https", "http") as $s | .[] | select(.scheme == $s).url)'
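If jq is not available, roughly the same "prefer https" idea can be sketched with awk on plain URL lines. This is only an approximation under my own assumptions: it expects input that trurl has already normalized, it compares everything after the scheme as a plain string (so it does not ignore the port the way the jq filter drops .raw_port), and `prefer_https` is a name I made up.

```shell
# Hypothetical helper: keep one URL per scheme-less remainder, preferring https.
prefer_https() {
  awk '
    # Only http:// and https:// URLs are considered; other schemes are dropped.
    match($0, /^https?:\/\//) {
      scheme = substr($0, 1, RLENGTH)    # "http://" or "https://"
      rest   = substr($0, RLENGTH + 1)   # everything after the scheme
      if (!(rest in best)) order[++n] = rest               # remember first-seen order
      if (!(rest in best) || scheme == "https://") best[rest] = scheme
    }
    END { for (i = 1; i <= n; i++) print best[order[i]] order[i] }
  '
}

# Example with made-up input; with trurl it would be: trurl -f - < ./test | prefer_https
printf '%s\n' \
  'http://a.example.com/x' \
  'ftp://c.example.org/f' \
  'https://a.example.com/x' \
  'http://b.example.com/' | prefer_https
```

The output keeps the position of the first occurrence of each URL, and switches it to https:// if an https:// variant shows up anywhere in the input.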
from trurl.
Unless you use -g or --json, trurl only outputs valid URLs, one per line of output.
--set, --redirect, --trim, --append, --iterate, and --sort-query only modify the URL in a way that keeps it valid and re-parsable by libcurl (with the current flags: --accept-space, --no-guess-scheme, etc.).
You cannot use a --trim command that outputs something without a scheme, because that is not a valid URL.
If your goal is actually to print only the {host} and {path} parts of the URL, you can use -g '{:host}{:path}':
$ cat test
http://a.example.com/test/foo/./bar/..
xyz.example.org
https://b.example.com:20/test?hi#hello
ftp://[email protected]/hey.txt
$ trurl -f - < ./test
http://a.example.com/test/foo
http://xyz.example.org/
https://b.example.com:20/test?hi#hello
ftp://[email protected]/hey.txt
$ trurl -f - -g '{:host}{:path}' < ./test
a.example.com/test/foo
xyz.example.org/
b.example.com:20/test
c.example.org/hey.txt
You may also use {:host}{:path}{:query}{:fragment}, since {query} and {fragment} expand with ?/# at the start. If you also want to include other parts like {user} and {pass}, it gets tricky: with -g '{:user}:{:pass}@{:host}{:path}', trurl would output :@a.example.org/foo for http://a.example.org/foo, which is probably not what you want.
Maybe the -g command could be improved to allow printing a full URL with some parts omitted somehow to satisfy your use case, but I don't know how that would be useful. Can you explain why you are doing this?
Anyway, as a workaround, if you really want to remove the scheme and nothing else from a full URL for some reason, I guess you can use something like this:
$ trurl -f - < ./test | sed -n 's@^[^:]*://@@p'
a.example.com/test/foo
xyz.example.org
b.example.com:20/test?hi#hello
[email protected]/hey.txt
$ # or to only print http/https URLs, without the scheme
$ trurl -f - < ./test | sed -n 's@^https\{0,1\}://@@p'
a.example.com/test/foo
xyz.example.org
b.example.com:20/test?hi#hello
$ # notice that trurl guessed the scheme for xyz.example.org as http://
$ # so it is printed.
This should be fine since trurl only outputs lines that contain one full, valid URL and discards invalid URLs in the input. You can therefore assume that the scheme does not contain colons, so removing everything before the first ":" together with the "://" that follows removes only the scheme.
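Putting it together for the dedup use case, a minimal sketch could look like the following. `strip_scheme` is just a name I picked for the same sed substitution, and the input lines are made up; with trurl you would pipe `trurl -f - < ./test` into it instead of printf.

```shell
# Hypothetical wrapper around the sed command above.
strip_scheme() {
  # Delete everything up to and including the first "://" and print only
  # lines where that substitution happened.
  sed -n 's@^[^:]*://@@p'
}

# Strip the scheme, then drop duplicates while keeping input order.
printf '%s\n' \
  'http://a.example.com/test/foo' \
  'https://a.example.com/test/foo' \
  'ftp://c.example.org/hey.txt' | strip_scheme | awk '!seen[$0]++'
```

Note that this treats URLs that differ only in scheme as duplicates regardless of which schemes they are, so an ftp:// and an http:// URL with the same host and path would also collapse into one line.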
I'm with @emanuele6. You can do this already with a few very simple workarounds: either decide to use -g and output all parts except the scheme, or just set a fixed scheme before you compare. I think "trurl only outputs valid URLs" is a good idea to stick to.