Comments (6)
I don't understand what `--verify scheme` is supposed to do.
If `sample` was meant to be the host name, you should have used `-s 'host=sample'` or `-s "host=$sample"`, and you would have gotten:
$ trurl -s 'host=sample'
trurl error: not enough input for a URL
trurl error: Try trurl -h for help
With `trurl sample` you are specifying `sample` as a URL, and libcurl will parse it as a URL.
Can you specify an actual use case for this to clarify what you want it to do? Why did you think this would be useful?
In any case, in my opinion, we should definitely not implement wacky optional option arguments like those. If you want optional arguments, use something like `--verify=foo` to specify the optional argument, or just use another option.
A `--verify` that may or may not interpret the next argument as an option argument, or that must be specified at the end if you don't want to pass the optional argument (making it impossible to use with `--`), is just a bad, non-script-friendly option style.
from trurl.
Sorry for the confusion, `scheme` here would be the preceding `http://`, and `sample` is a generic URL. The use case I'm thinking of is a script for extracting all the "valid" URLs in a large text file. trurl has a lot of the mechanics in place for something like this, but it would be kind of roundabout, so it would be nice to have something that can verify the string is a URL that meets some requirements (like having a preceding HTTP or something). It might make more sense to do it with something similar to `--get {raw:port}`. Let me know if you have any more questions; I feel like I am missing some keywords in my explanation.
from trurl.
> Sorry for the confusion, `scheme` here would be the preceding `http://`, and sample is a generic URL.
Can you provide an example command line of what you mean with `http://` as scheme?
You mean like `trurl -u "$url" --verify=http://`?
> The use case I'm thinking of is a script for extracting all the "valid" URLs in a large text file.
This doesn't really help me understand what you mean.
`trurl` already discards unparsable URLs in input files:
$ cat foo.txt
https://example.org
file:///home/emanuele6/./foo/..
\\
foo
:/lol/
ftp://curl.se
$ trurl -f foo.txt
https://example.org/
file:///home/emanuele6/
trurl note: Bad hostname [\\]
http://foo/
trurl note: Port number was not a decimal number between 0 and 65535 [:/lol/]
ftp://curl.se/
$ trurl -f foo.txt 2>/dev/null
https://example.org/
file:///home/emanuele6/
http://foo/
ftp://curl.se/
If you are asking for an option that would treat URLs without a scheme as invalid, I think it is fine to add an option like `--strict`/`--no-guess-scheme` that makes trurl call `curl_url_set()` without `CURLU_GUESS_SCHEME` to achieve that. It would be a great addition.
We could also add a `--no-credentials` one that makes it also use `CURLU_DISALLOW_USER`.
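To illustrate what rejecting scheme-less input would mean in practice, here is a minimal sketch that approximates it outside trurl by dropping every line that does not start with an explicit scheme followed by `://`. The helper name `has_scheme` is hypothetical and not part of trurl, and the pattern only follows the RFC 3986 scheme grammar; it says nothing about whether libcurl would accept the rest of the URL.

```shell
# Hypothetical helper (not a trurl feature): read candidate URLs on stdin
# and print only the lines that begin with an explicit scheme and "://".
# Scheme grammar per RFC 3986: a letter, then letters, digits, "+", "." or "-".
has_scheme() {
  grep -E '^[A-Za-z][A-Za-z0-9+.-]*://'
}
```

With the `foo.txt` from the example above, `has_scheme < foo.txt` would keep the `https://`, `file://` and `ftp://` lines and drop `\\`, `foo` and `:/lol/`; the result can then be saved to a file and passed to `trurl -f`.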
I don't really see the connection with `--verify` though; `--verify` does something completely different: it makes `trurl` abort and return non-zero at the first invalid URL; e.g. from the example above, `trurl --verify -f foo.txt 2>/dev/null` will output:
https://example.org/
file:///home/emanuele6/
`trurl --verify`, instead of ignoring `\\` and `:/lol/` and printing a warning for them, aborts as soon as the first invalid URL (`\\` in this case) is encountered, so `http://foo/` and `ftp://curl.se/` never get printed.
> trurl has a lot of the mechanics in place for something like this, but it would be kind of roundabout, so it would be nice to have something that can verify the string is a URL that meets some requirements (like having a preceding HTTP or something).
Isn't that the same as the `--filter` proposed in #159?
`trurl` will output only valid URLs, so in the specific case of checking the scheme you can even just use `grep`; e.g. to filter out non-`https://` URLs: `trurl -f urls.txt 2>/dev/null | grep '^https://'`.
And if you need something more complex, like only URLs with a username specified, you can use JSON output and a tool like `jq`:
# only URLs with an embedded username
trurl --json -f urls.txt 2>/dev/null | jq -r '.[] | select(.user).url'
# only URLs with an embedded username that is in the users array
users=( emanuele6 jacobmealey )
trurl --json -f urls.txt 2>/dev/null | jq -r --args '.[] | select(.user | IN($ARGS.positional[])).url' -- "${users[@]}"
# or, streaming solution:
trurl --json -f urls.txt 2>/dev/null |
jq -nr --stream 'fromstream(1 | truncate_stream(inputs)) | select(.user | IN("tom", "bob")).url'
> It might make more sense to do it with something similar to `--get {raw:port}`
How? `-g '{if:scheme=https:{url}:{nonewline}}'`? xD
Maybe I am not understanding what you are proposing to add again.
from trurl.
> If you are asking for an option that would count URLs without a scheme invalid, I think it is fine to add an option like `--strict`/`--no-guess-scheme` that makes trurl call `curl_url_set()` without `CURLU_GUESS_SCHEME` to achieve that. Adding more options to configure the `CURLU_*` flags passed to `curl_url_set()` would definitely be a nice addition!
This is exactly what I mean, yes! Perhaps I should have omitted `--verify` in my original proposal. THANK YOU
from trurl.
The only meaningful options to configure the flags passed to `curl_url_set()` that are not already implemented are:
- one to not pass `CURLU_GUESS_SCHEME` (6909cee)
- one to pass `CURLU_DISALLOW_USER`, disallows embedded `user:password@` in URLs
- one to pass `CURLU_PATH_AS_IS`, skips path normalisation: `https://foo/a/b/../.././c/d/e/foo/../..` remains unchanged instead of becoming `https://foo/c/d/`
- one to pass `CURLU_DEFAULT_SCHEME`, URLs are assumed to be `https://`. Note that this is different from `CURLU_GUESS_SCHEME`, which assumes `http://` in general, `ftp://` if the host name starts with `ftp.`, `imap://` if the host name starts with `imap.`, etc.
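The guessing rule in the last item can be sketched in shell. This is only a rough model of the behaviour described above, not libcurl's code: the function name `guess_scheme` is hypothetical, it covers only the `ftp.` and `imap.` prefixes named here (the "etc." is left out), and it assumes a lower-case host name.

```shell
# Hypothetical model of scheme guessing from a host name prefix.
# Only the prefixes mentioned in this thread are handled; anything
# else falls back to http, as the comment above describes.
guess_scheme() {
  case $1 in
    ftp.*)  printf 'ftp\n' ;;
    imap.*) printf 'imap\n' ;;
    *)      printf 'http\n' ;;
  esac
}

guess_scheme ftp.example.org   # ftp
guess_scheme imap.example.org  # imap
guess_scheme example.org       # http
```

By contrast, a default-scheme mode would give the same answer (`https`) for all three host names.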
The one to not pass `CURLU_GUESS_SCHEME` can definitely be very helpful. I am not too sure about the usefulness of the other ones.
from trurl.
I think this was fixed in #195 so I'm closing it.
from trurl.