
trurl's Introduction


Curl is a command-line tool for transferring data specified with URL syntax. Find out how to use curl by reading the curl.1 man page or the MANUAL document. Find out how to install curl by reading the INSTALL document.

libcurl is the library curl is using to do its job. It is readily available to be used by your software. Read the libcurl.3 man page to learn how.

You can find answers to the most frequent questions we get in the FAQ document.

Study the COPYING file for distribution terms.

Contact

If you have problems, questions, ideas or suggestions, please contact us by posting to a suitable mailing list.

All contributors to the project are listed in the THANKS document.

Commercial support

For commercial support, or maybe private and dedicated help with your problems or applications using (lib)curl, visit the support page.

Website

Visit the curl website for the latest news and downloads.

Git

To download the latest source from the Git server, do this:

git clone https://github.com/curl/curl.git

(you will get a directory named curl created, filled with the source code)

Security problems

Report suspected security problems via our HackerOne page and not in public.

Notice

Curl contains pieces of source code that are Copyright (c) 1998, 1999 Kungliga Tekniska Högskolan. This notice is included here to comply with the distribution terms.

Backers

Thank you to all our backers! 🙏 Become a backer.

Sponsors

Support this project by becoming a sponsor.


trurl's Issues

{query:key} for multiple keys

Currently, getting {query:key} only shows the first occurrence of key.

$ trurl "https://curl.se?a=d&b=c&a=b" --get {query:a}
d
$ trurl "https://curl.se?a=d&b=c&a=b" --get {query:a} --sort-query
b

We might want a way to ask for all occurrences to be shown.
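Until such an option exists, one workaround is to extract the whole query with --get '{query}' and split it in the shell. A rough sketch, assuming a POSIX shell with tr and grep available:

$ trurl "https://curl.se?a=d&b=c&a=b" --get '{query}' | tr '&' '\n' | grep '^a='
a=d
a=b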

Error message that could be improved

When the argument is invalid, the error message is a bit cryptic:

 % ./trurl "inva lid"
not enough input for a URL:

Usage: [options] [URL]
  -h,--help                   - this help
  -v,--version                - show version
...

Query string sort

Feature:

  • order query string variables according to supplied list
  • sort query string variables ascending/descending

Order query string variables according to list:
$ trurl "https://example.com?echo=foo&lima=foo&november=foo" -qsl '[november, lima, echo]' would produce
https://example.com?november=foo&lima=foo&echo=foo

Sorted ascending or descending:
$ trurl "https://example.com?echo=foo&lima=foo&november=foo" -qsa would produce
https://example.com?echo=foo&lima=foo&november=foo

$ trurl "https://example.com?echo=foo&lima=foo&november=foo" -qsd would produce
https://example.com?november=foo&lima=foo&echo=foo

Background: I do a lot of URL manipulation to de-duplicate podcast feed links (for PodcastIndex.org); by sorting query string variables, or ordering them in a desired fashion, duplicates can be found and removed.
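As a rough sketch of that de-duplication flow (feeds.txt is a hypothetical file with one feed URL per line), normalizing with the existing --sort-query before comparing makes otherwise identical URLs collapse:

$ trurl -f feeds.txt --sort-query | sort | uniq -d

This prints only the normalized URLs that occur more than once.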

adapt tests to less capable libcurl versions

  • trurl can get built with a restricted libcurl
  • trurl might execute with a restricted libcurl

(where "restricted" means it might not have all the features or might suffer from a known bug.)

This needs to be taken into account when running tests, since not all features can be evaluated and exercised in all situations.

Cannot compile on Ubuntu 20.04.5 LTS

(Hello Daniel, thank you for the project)

I cloned trurl and fetched libcurl4-openssl-dev on Ubuntu Focal (20.04) and found that libcurl4-openssl-dev (7.68.0-1ubuntu2.18) alone is insufficient.

I see the note in the README on CURLUPART_ZONEID; is the minimum libcurl version required for compiling thus 7.81.0?

~/src/github.com/trurl (master ✔) make 
cc  -W -Wall -pedantic -g   -c -o trurl.o trurl.c
trurl.c: In function ‘get’:
trurl.c:365:18: error: ‘CURLUE_NO_ZONEID’ undeclared (first use in this function); did you mean ‘CURLUPART_ZONEID’?
  365 |             case CURLUE_NO_ZONEID:
      |                  ^~~~~~~~~~~~~~~~
      |                  CURLUPART_ZONEID
trurl.c:365:18: note: each undeclared identifier is reported only once for each function it appears in
trurl.c:369:55: warning: implicit declaration of function ‘curl_url_strerror’; did you mean ‘curl_multi_strerror’? [-Wimplicit-function-declaration]
  369 |               fprintf(stderr, PROGNAME ": %s (%s)\n", curl_url_strerror(rc),
      |                                                       ^~~~~~~~~~~~~~~~~
      |                                                       curl_multi_strerror
trurl.c:47:25: warning: format ‘%s’ expects argument of type ‘char *’, but argument 3 has type ‘int’ [-Wformat=]
   47 | #define PROGNAME        "trurl"
      |                         ^~~~~~~
trurl.c:369:31: note: in expansion of macro ‘PROGNAME’
  369 |               fprintf(stderr, PROGNAME ": %s (%s)\n", curl_url_strerror(rc),
      |                               ^~~~~~~~
trurl.c:369:44: note: format string is defined here
  369 |               fprintf(stderr, PROGNAME ": %s (%s)\n", curl_url_strerror(rc),
      |                                           ~^
      |                                            |
      |                                            char *
      |                                           %d
make: *** [<builtin>: trurl.o] Error 1
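If the suspicion about the libcurl version is right, checking the installed development package should confirm it. A sketch, assuming curl-config from the libcurl dev package is on PATH:

$ curl-config --version
libcurl 7.68.0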

When given a file of URLs, continue even if one is bad

This case should output four URLs - there is one bad one in the middle.

$ cat urls.txt 
localhost
haxx.se
ftp://ex ample.com
https://curl.se
curl.haxx.se
$ trurl -f urls.txt
http://localhost/
http://haxx.se/
trurl note: Bad hostname [ftp://ex ample.com]
trurl error: not enough input for a URL
trurl error: Try trurl -h for help

Python test script

Have a look at https://github.com/sevehub/trurl: the JSON data (test.json) and the script (test.py) are separated. It also shows the return codes.

Let me know if you prefer this solution or if you are happier with the existing Perl scripts.

-- is not supported

trurl doesn't support the standard -- argument to indicate the end of options:

bash-5.1$ trurl --
trurl error: unknown option: --
trurl error: Try trurl -h for help
bash-5.1$ trurl -- hi
trurl error: unknown option: --
trurl error: Try trurl -h for help
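For reference, with conventional -- handling everything after the terminator would be treated as a URL rather than an option, so something along these lines would be the expected behavior (hypothetical output, since the terminator is not supported yet):

$ trurl --set port=8080 -- https://example.com/
https://example.com:8080/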

Unicode characters shouldn't be mapped to their codes

I'm guessing this isn't the desired output for Unicode inputs:

$ trurl --json 'https://😃@example.com/x?emoji=%f0%9f%98%83'
[
  {
    "url": "https://[email protected]/x?emoji=%f0%9f%98%83",
    "scheme": "https",
    "user": "ufffffff0uffffff9fuffffff98uffffff83",
    "host": "example.com",
    "port": "443",
    "path": "/x",
    "query": "emoji=ufffffff0uffffff9fuffffff98uffffff83"
  }
]

Literal versus percent-encoding doesn't seem to matter.

Loop over multiple --sets

I'm not sure what it would look like, but it would be helpful to generate a range of URLs for something like the port. I was thinking it could have a syntax similar to the one below, but there is most certainly a better way!
./trurl --url google.com --set port=[1000-1005]

or, if it's something that would be a string instead of numbers, it could take a list like:
./trurl --url google.com --set host=[google.co.uk, google.co.nz]
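Until something like that exists, a plain shell loop can approximate the port-range case. A sketch, assuming seq is available:

$ for p in $(seq 1000 1005); do trurl --url google.com --set port=$p; done
http://google.com:1000/
http://google.com:1001/
http://google.com:1002/
http://google.com:1003/
http://google.com:1004/
http://google.com:1005/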

Improve help output

Hi,

I've created three different pull requests (#2, #3, #4) to improve the program's help output.
If you prefer, I can merge them in a single one.

url-encoding

Offer a way to URL-encode input, similar to curl's --data-urlencode and --url-query.
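For comparison, this is roughly what the referenced curl behavior looks like: with -G, --data-urlencode encodes the value and appends it to the URL's query. A sketch (it performs a real request; -w prints the URL that was actually used):

$ curl -sG --data-urlencode 'q=hello world' -o /dev/null -w '%{url_effective}\n' https://example.com/
https://example.com/?q=hello%20world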

offer handling spaces in URLs

From experiences with curl I know there are lots of users "out there" who want to use URLs with spaces and want tools to just automatically encode them "correctly".

I think we should consider allowing trurl to do this with an extra option.
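One possible shape for this, sketched with an illustrative option name (not necessarily what trurl would end up using):

$ trurl --accept-space "https://example.com/hello world.html"
https://example.com/hello%20world.html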

"hide" the port if set to the scheme's default port

trurl could be told not to show the port number when the URL is displayed and the port is set to the scheme's default.

This is what it does now:

$ trurl https://example.com:4433/ --set port=443
https://example.com:443/

This is how it could behave:

$ trurl https://example.com:4433/ --set port=443
https://example.com/

Multiple URLs and/or STDIN

Only one URL can be manipulated per invocation using the --url flag. It may be useful to manipulate more than one.

I have yet to find a specific use case, so this seems more like a UX thing. It seems natural that a focused utility like this would accept its "main" input as non-flag arguments or via standard input. For example:

$ tool --set-user myself 'https://github.com' 'ftp://google.com'
https://myself@github.com
ftp://myself@google.com

It's not uncommon for "no arguments" to cause a utility to read from STDIN, but in my experience, that can allow mistakes to cause hung scripts. A good balance, I think, is some "file" flag that uses - to mean STDIN:

$ echo 'https://github.com' > data.txt
$ echo 'ftp://google.com' >> data.txt
$ tool --set-user myself --file data.txt
https://myself@github.com
ftp://myself@google.com
$ echo '[{"url": "https://github.com"}, {"url": "ftp://google.com"}]' > data.json
$ jq -r .[].url data.json | tool --set-user myself --file -
https://myself@github.com
ftp://myself@google.com

Construct and build URLs from JSON

Very cool and useful program.
I was wondering:
The very same JSON output that this utility gives when using --json could be processed by some other program (like jq), and then this program could reconstruct a new URL. I know that we already have the --set flag, but in some cases it could be useful to use jq for more complex manipulation.

Examples:
echo "https://example.org/helloworld" | trurl --json | trurl --construct should give the same url

Could this be interesting as a feature of this program?
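A sketch of what that round trip might look like, combining the existing --json output with the proposed (hypothetical) --construct flag and a jq edit in between:

$ trurl --url "https://example.org/helloworld" --json | jq '.[0].host = "example.com"' | trurl --construct
https://example.com/helloworld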

Handling mailto: URLs

This doesn't seem to handle mailto: URLs. It prepends "http://" to the input and treats it as a web link.

$ trurl "mailto:[email protected]?subject=foo?body=bar" --json
[
  {
    "url": "http://mailto:[email protected]/?subject=foo?body=bar",
    "scheme": "http",
    "user": "mailto",
    "password": "jdoe",
    "host": "example.com",
    "port": "80",
    "path": "/",
    "query": "subject=foo?body=bar"
  }
]

Getting the punycode version

Due to how crazy IDN is, the only way to get a canonical version of a URL with IDN is to convert it to its punycode version, which I figure is a valid reason for trurl to offer this feature: extract the host as punycode, or show the entire URL with the host name punycoded.

But how should this power be exposed in the CLI?

My initial thinking is to use another symbol for the --get variables. We already do {:var} for skipping URL decoding, and we could use something other than the colon to ask for the punycoded version. Maybe '*'? Then you can ask for both the plain name and the punycoded one in the same command line: --get '{host} {*host}'

Thoughts? Better ideas?
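With that syntax, the proposal would allow something like this (the {*host} form is the suggestion above, and the output is illustrative only):

$ trurl "https://räksmörgås.se/" --get '{host} {*host}'
räksmörgås.se xn--rksmrgs-5wao1o.se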

Should `{query:}` be case sensitive?

Currently, the {component} is case insensitive:

$ trurl "https://duck.com/?q=trurl&t=h_&ia=web" --get "{query}\n{QUERY}"
q=trurl&t=h_&ia=web
q=trurl&t=h_&ia=web

But {query:} isn't, which seems inconsistent:

$ trurl "https://duck.com/?q=trurl&t=h_&ia=web" --get "{query:q}\n{QUERY:q}"
trurl
trurl error: Bad --get syntax: QUERY:q}
trurl error: Try trurl -h for help

Build path component from multiple segments

I find myself working with "base" URLs from time to time that already have a path component. It would be nice if there were a way to "extend" the path of a URL more intelligently than I can with shell string concatenation.

Maybe something like the following:

$ tool --segment 'a' --segment '/' 'https://github.com/b'
https://github.com/b/a/%2F

Get and set the origin

The origin is a foundational concept in web security, defined as the combination of scheme, host, and port.

I'd love to have a way to either get the origin of a URL, or replace it. This is not only a convenience for calculating the main browser security boundary of a URL; it could also avoid bugs, because it's easy to forget about the port and write incorrect code otherwise.

For example:

> trurl --get "{origin}" --url https://twizzle:secret@alpha.twizzle.net/deploy-versions/
https://alpha.twizzle.net
> trurl --set "origin=https://example.com" --url localhost:8000/app/?foo=bar
https://example.com/app/?foo=bar

That said, the origin is a combination of other URL parts, so I'm not sure it should have the same syntax.

ability to get told the port number was not provided?

Right now trurl provides the default port number if none was provided and trurl knows the URL scheme. Like for https://example.com it shows 443.

I can imagine that there might be use cases where you actually rather want to know that no port number was present in the URL. If so, how should we ask for that information?
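For reference, today the two cases are indistinguishable through --get, since the port is filled in from the scheme:

$ trurl https://example.com/ --get '{port}'
443
$ trurl https://example.com:443/ --get '{port}'
443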

more fine-grained output option

For example, something like curl's -w option, which allows using percent signs and variable names in braces for a slightly verbose but printf-like DSL. Very common formats like {scheme}://{host} may even be given a dedicated long flag.

(from Tunapunk)
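Note that --get already acts as a small printf-like formatter, so a common shape like scheme://host can be expressed today; the open question is whether it deserves a dedicated long flag. The output below is shown for illustration:

$ trurl 'https://curl.se/user?id=foo' --get '{scheme}://{host}'
https://curl.se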

optionally run test.py with valgrind

We should provide a method to run the tests (in the CI) on Linux with valgrind.

A crude patch that does it:

diff --git a/test.py b/test.py
index 05473b2..65d21ab 100644
--- a/test.py
+++ b/test.py
@@ -9,11 +9,11 @@ from dataclasses import dataclass, asdict
 
 TESTFILE = "./tests.json"
 if sys.platform == "win32" or sys.platform == "cygwin":
     BASECMD = "./trurl.exe"  # windows
 else:
-    BASECMD = "./trurl"  # linux
+    BASECMD = "valgrind -q ./trurl"  # linux
 
 RED = "\033[91m"  # used to mark unsuccessful tests
 NOCOLOR = "\033[0m"
 TAB = "\x09"
 

Trurl act as a filter

It may be beneficial to have the ability to filter URLs based on a set of criteria.
I was thinking something like cat url-list.txt | trurl --url-file - --filter "host=example.com scheme=ftp" and trurl would only print the URLs which match the criteria.
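Until a --filter option exists, a jq-based workaround over the --json output can do the same selection. A sketch, assuming one JSON object per input URL as in the "multiple URL JSON output?" issue below (if --json emits a single array instead, prepend .[] | to the jq program):

$ cat url-list.txt | trurl --url-file - --json | jq -r 'select(.host == "example.com" and .scheme == "ftp").url'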

A better name?

Proposals that have been mentioned:

  • trurl
  • purlmutate
  • manipurler
  • manipurlate
  • curlify
  • urlwtf
  • crurl
  • purl
  • urlchop
  • burl
  • unfURL
  • urlchemy
  • manipURLator

binary Windows releases (on github)

It would be nice to have binary releases on GitHub (including Windows). For Go tools it is quite common to provide multi-platform builds on GitHub.

tests

Once we agree on the fundamental names for flags etc I will write up a bunch of command lines and their expected outputs to use as test cases.

Output JSON

To hook into other tooling, or just to give all the things in one command, it would be handy to have trurl --url <url> --json so you can pipe to jq or logging, or whatever else.

For example


$ trurl --url "https://curl.se/user?id=foo" --json
{
  "host": "curl.se",
  "scheme": "https",
  "port": "443",
  "path": "user",
  "query": [ { "id": "foo" } ],
}

Maybe add an option to disable output buffering

One possible fun use case for trurl could be:

# server to which clients can send one line
get_line_from_port 12345 |
# trurl filters out invalid URLs, and outputs normalised URLs
trurl -f - |
# valid URLs are logged
tee /var/downloadlog |
# do something with the URLs
download_urls_to /var/public/downloads

or:

# server to which clients can send one line
get_line_from_port 12345 |
# trurl filters out invalid URLs, and adds something to valid URLs
trurl -f - -s 'user=me' -s 'password=hunter2' -a 'query=source=myservice' |
# do something with the modified and validated URLs
send_requests

or:

# server to which clients can send one line
get_line_from_port 12345 |
# trurl parse the URLs as JSON
trurl -f - --json |
# jq parses the JSON, and only prints URLs that are for port 2023
jq -nr --unbuffered --stream 'fromstream(1 | truncatestream(inputs)) | select(.port == 2023).url' |
# do something with the URLs
do_something

But one reason why this doesn't work very well is that trurl's output is fully buffered when printing to pipes by default (since it uses stdio functions):

bash-5.1$ { printf %s\\n hi hello; sleep 2; printf %s\\n hey ;} | trurl -f - | cat
<2 seconds wait>
http://hi/
http://hello/
http://hey/
bash-5.1$

Luckily, if you want to use trurl for those use cases, at least on GNU systems, you can use stdbuf to change trurl's stdout buffering mode to line buffering (or to no buffering, with -o0 instead of -oL):

bash-5.1$ { printf %s\\n hi:20 hello:500; sleep 2; printf %s\\n hey:40 ;} | stdbuf -oL trurl -f -
http://hi/
http://hello/
<2 seconds wait>
http://hey/
bash-5.1$

It would probably be nicer if there were an option users could use to make this work portably and reliably, one that either enables line buffering or no buffering on stdout (with setvbuf()), or that just flushes stdout after every URL is printed, every -g expression is evaluated, or every object of the --json output is fully printed.

Another reason why this doesn't work very well is that the presence of an invalid URL (e.g. \) in the input causes trurl to stop and return failure immediately, but if I recall correctly it was mentioned in the presentation video that this will be changed in the future. (solved by 2125339)

--append query adds "(nil)"

$ ./trurl localhost --append query=hello=foo
http://localhost/?(nil)&hello=foo

... does not look correct!

suggestion: strip parameters

Something I do a lot: removing tracking URL parameters before sharing a URL. I see --append, but it would be cool to have something to strip out the garbage.
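A partial workaround today is to pull out the query and drop the tracking keys in the shell, although rebuilding the full URL still calls for a proper strip option. A sketch using only --get:

$ trurl 'https://example.com/page?id=5&utm_source=news&utm_medium=mail' --get '{query}' | tr '&' '\n' | grep -v '^utm_'
id=5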

erasing port does not work

As I learned when doing the presentation video just now:

$ trurl https://hello:22/ --set port=
https://hello:22/

Should rather show https://hello/

split the query in JSON

The #44 proposal for JSON output shows the query component being split up into sub-components. That was not implemented in the first JSON take but is probably still a good idea! Probably in addition to the full query. Something like:

$ trurl --url "https://curl.se/user?id=foo&size=44" --json
{
  "host": "curl.se",
  "scheme": "https",
  "port": "443",
  "path": "user",
  "query": "id=foo&size=44",
  "q": {
    "id": "foo",
    "size": "44"
  } 
}

This leaves a few questions: what if there are many id or size keys? What if there is no = at all?
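One possible shape for repeated keys, sketched here purely as an illustration (other fields omitted), is to make every value in the split-out object an array:

$ trurl --url "https://curl.se/user?id=foo&id=bar&size=44" --json
{
  "query": "id=foo&id=bar&size=44",
  "q": {
    "id": ["foo", "bar"],
    "size": ["44"]
  }
}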

Install on ubuntu

When running the make command for installing trurl:

cc -W -Wall -pedantic -g -c -o trurl.o trurl.c
trurl.c:29:10: fatal error: curl/curl.h: No such file or directory
   29 | #include <curl/curl.h>
      |          ^~~~~~~~~~~~~
compilation terminated.
make: *** [<builtin>: trurl.o] Error 1
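That error means the libcurl development headers are missing. On Ubuntu they come from the libcurl4-openssl-dev package mentioned earlier (or one of the other libcurl*-dev variants):

$ sudo apt install libcurl4-openssl-dev
$ make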

multiple URL JSON output?

What is the JSON format supposed to look like when trurl outputs multiple objects?

Right now it looks like this when given 4 URLs:

$ cat urls.txt | ./trurl --json -f -
{
  "url": "http://localhost/",
  "scheme": "http",
  "host": "localhost",
  "port": "80",
  "path": "/",
}
{
  "url": "http://haxx.se/",
  "scheme": "http",
  "host": "haxx.se",
  "port": "80",
  "path": "/",
}
{
  "url": "https://curl.se/",
  "scheme": "https",
  "host": "curl.se",
  "port": "443",
  "path": "/",
}
{
  "url": "http://curl.haxx.se/",
  "scheme": "http",
  "host": "curl.haxx.se",
  "port": "80",
  "path": "/",
}

Does not build on Windows

I know this. It needs some tweaks and adjustments to build fine there too and I would be happy to accept PRs for this.

extracting %00 sequences

%00 is a URL-encoded null byte, and right now trurl misbehaves when trying to show it: it just zero-terminates the output, and whatever is on the right side of the %00 is not shown at all.

Should trurl output it as a zero byte or should it encode it somehow?

Example:

$ trurl https://curl.se?name=mr%00smith --get {query:name}
mr
