NAME

YAHC - Yet another HTTP client

SYNOPSIS

use YAHC qw/yahc_reinit_conn/;

my @hosts = ('www.booking.com', 'www.google.com:80');
my ($yahc, $yahc_storage) = YAHC->new({ host => \@hosts });

$yahc->request({ path => '/', host => 'www.reddit.com' });
$yahc->request({ path => '/', host => sub { 'www.reddit.com' } });
$yahc->request({ path => '/', host => \@hosts });
$yahc->request({ path => '/', callback => sub { ... } });
$yahc->request({ path => '/' });
$yahc->request({
    path => '/',
    callback => sub {
        yahc_reinit_conn($_[0], { host => 'www.newtarget.com' })
            if $_[0]->{response}{status} == 301;
    }
});

$yahc->run;

DESCRIPTION

YAHC is a fast and minimal low-level asynchronous HTTP client intended to be used where you control both the client and the server. It especially suits cases where a set of requests needs to be executed against a group of machines.

It is NOT a general HTTP user agent: it doesn't support redirects, proxies or any number of other advanced HTTP features, unlike (in roughly descending order of feature completeness) LWP::UserAgent, WWW::Curl, HTTP::Tiny, HTTP::Lite or Furl. This library is basically one step above manually talking HTTP over sockets.

YAHC supports SSL and socket reuse (the latter is experimental).

STATE MACHINE

Each YAHC connection goes through the following states in its lifetime:

                  +-----------------+
              +<<-|   INITIALIZED   <-<<+
              v   +-----------------+   ^
              v           |             ^
              v   +-------v---------+   ^
              +<<-+   RESOLVE DNS   +->>+
              v   +-----------------+   ^
              v           |             ^
              v   +-------v---------+   ^
              +<<-+    CONNECTING   +->>+
              v   +-----------------+   ^
              v           |             ^
     Path in  v   +-------v---------+   ^  Retry
     case of  +<<-+    CONNECTED    +->>+  logic
     failure  v   +-----------------+   ^  path
              v           |             ^
              v   +-------v---------+   ^
              +<<-+     WRITING     +->>+
              v   +-----------------+   ^
              v           |             ^
              v   +-------v---------+   ^
              +<<-+     READING     +->>+
              v   +-----------------+   ^
              v           |             ^
              v   +-------v---------+   ^
              +>>->   USER ACTION   +->>+
                  +-----------------+
                          |
                  +-------v---------+
                  |    COMPLETED    |
                  +-----------------+

There are three paths of workflow:

1) Normal execution (central line).

In the normal case, a connection, after being initialized, goes through the following states:

- RESOLVE DNS

- CONNECTING - waiting for the handshake to finish

- CONNECTED

- WRITING - sending request body

- READING - awaiting and reading response

- USER ACTION - see below

- COMPLETED - all done; this is the terminal state

An SSL connection has an extra state, SSL_HANDSHAKE, after the CONNECTED state. The 'RESOLVE DNS' state is not implemented yet.

2) Retry path (right line).

In case of an IO error during normal execution, YAHC retries the connection up to retries times. In practice this means that the connection goes back to the INITIALIZED state.

It's possible for a connection to go directly to the COMPLETED state in case of an internal error.

3) Failure path (left line).

If none of the retry attempts succeeds, the connection goes to the 'USER ACTION' state (see below).

State 'USER ACTION'

The 'USER ACTION' state is entered right before a connection enters the 'COMPLETED' state (with either a failed or a successful result) and is meant to give the user a chance to interrupt the workflow.

The 'USER ACTION' state is entered in these circumstances:

  • An HTTP response is received. Note that non-200 responses are NOT treated as errors.

  • An unsupported HTTP response is received (such as a response without a Content-Length header).

  • The retry limit is reached.

When a connection enters this state, the callback CodeRef is called:

$yahc->request({
    ...
    callback => sub {
        my (
            $conn,          # connection 'object'
            $error,         # one of YAHC::Error::* constants
            $strerror       # string representation of error
        ) = @_;

        # Note that fields in $conn->{response} are not set
        # if $error != YAHC::Error::NO_ERROR()

        # HTTP response is stored in $conn->{response}.
        # It can be also accessed via yahc_conn_response().
        my $response = $conn->{response};
        my $status = $response->{status};
        my $body = $response->{body};
    }
});

If there was no IO error, yahc_conn_response returns a HashRef representing the response. It contains the following key-value pairs:

proto         => :Str
status        => :StatusCode
body          => :Str
head          => :HashRef

In case of an error or a non-200 HTTP response, yahc_retry_conn or yahc_reinit_conn may be called to give the request more chances to complete successfully (for example by following redirects or providing new target hosts).

Note that callback should NOT throw an exception. If it does, the connection is closed immediately.
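Since an exception escaping callback closes the connection, one defensive option is to wrap the callback body in eval. The following is a minimal sketch in plain Perl. safe_callback is a hypothetical helper name, not part of YAHC, and reporting the failure with warn is only an assumption about how you might want to log it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of YAHC): returns a CodeRef that runs
# $cb with the same arguments but never lets an exception escape.
sub safe_callback {
    my ($cb) = @_;
    return sub {
        my @args = @_;
        my $ok = eval { $cb->(@args); 1 };
        warn "callback died: $@" unless $ok;
        return $ok ? 1 : 0;
    };
}

# A callback that fails, e.g. because of a bug in response handling.
my $wrapped = safe_callback(sub { die "boom\n" });

# The arguments mimic ($conn, $error, $strerror) from the text above.
my $result = $wrapped->({}, 0, '');
print "exception contained\n" if $result == 0;
```

Passing callback => safe_callback(sub { ... }) to request would then keep a bug in the handler from tearing down the connection.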

METHODS

new

This method creates a YAHC object and an accompanying storage object:

my ($yahc, $yahc_storage) = YAHC->new();

This is a radical way of avoiding memory leaks caused by cyclic references in callbacks. Since all references to callbacks are kept in the $yahc_storage object, it's fine to use the YAHC object inside a request callback:

$yahc->request({
    callback => sub {
        $yahc->stop; # this is fine!!!
    },
});

However, the user has to guarantee that both the $yahc and $yahc_storage objects are kept in the same scope, so that they are destroyed at the same time.

new accepts all parameters supported by request. They are inherited by all requests.

Additionally, new supports two parameters: socket_cache and account_for_signals.

socket_cache

The socket_cache option controls socket reuse logic. By default the socket cache is disabled. To make YAHC reuse sockets, set socket_cache to a HashRef.

my ($yahc, $yahc_storage) = YAHC->new({ socket_cache => {} });

In this case YAHC maintains unused sockets keyed on join($;, $$, $host, $port, $scheme). We use $; so we can use the $socket_cache->{$$, $host, $port, $scheme} idiom to access the cache.

It's up to the user to control the cache. It's also up to the user to set the request headers necessary for keep-alive. YAHC does not cache a socket in case of an error, for HTTP/1.0, or when the server explicitly instructs it to close the connection (i.e. the header 'Connection' is 'close').
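To illustrate the key format described above, here is a small sketch in plain Perl that does not involve YAHC itself. The hash stands in for the HashRef passed as socket_cache, and the stored values are placeholder strings rather than real sockets. What exactly YAHC stores as values, and whether default ports appear in the key, is not specified here, so treat those details as assumptions.

```perl
use strict;
use warnings;

# Stand-in for the HashRef passed as socket_cache.
my $socket_cache = {};

my ($host, $port, $scheme) = ('example.com', 80, 'http');

# Multi-dimensional hash emulation: Perl joins the key list with $;
$socket_cache->{$$, $host, $port, $scheme} = 'socket placeholder';

# The same entry, addressed with an explicit join.
my $key = join($;, $$, $host, $port, $scheme);
print "found\n" if exists $socket_cache->{$key};

# List what is cached by splitting each key on the subscript separator.
my $sep = $;;
for my $k (keys %$socket_cache) {
    my ($pid, $h, $p, $s) = split /\Q$sep\E/, $k;
    print "pid=$pid target=$s://$h:$p\n";
}
```

Because the process id is part of the key, entries created before a fork are not picked up by the child under the same key.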

account_for_signals

Another parameter, account_for_signals, requires special attention! Here is why:

    excerpt from EV documentation http://search.cpan.org/~mlehmann/EV-4.22/EV.pm#PERL_SIGNALS

    While Perl signal handling (%SIG) is not affected by EV, the behaviour with EV is as the same as any other C library: Perl-signals will only be handled when Perl runs, which means your signal handler might be invoked only the next time an event callback is invoked.

In practice this means that none of the installed %SIG handlers will be called until EV calls one of the Perl callbacks, which in some cases may take a long time. Setting account_for_signals makes YAHC add an EV::check watcher with an empty callback, effectively making EV call the callback on every iteration. The trick comes at some performance cost. This is what the EV documentation says about it:

    ... you can also force a watcher to be called on every event loop iteration by installing a EV::check watcher. This ensures that perl gets into control for a short time to handle any pending signals, and also ensures (slightly) slower overall operation.

So, if your code or the code surrounding it uses %SIG handlers, it's wise to set account_for_signals.

request

protocol               => "HTTP/1.1", # (or "HTTP/1.0")
scheme                 => "http" or "https"
host                   => see below,
port                   => ...,
method                 => "GET",
path                   => "/",
query_string           => "",
head                   => [],
body                   => "",

# timeouts
connect_timeout        => undef,
request_timeout        => undef,
drain_timeout          => undef,

# callbacks
init_callback          => undef,
connecting_callback    => undef,
connected_callback     => undef,
writing_callback       => undef,
reading_callback       => undef,
callback               => undef,

Notice that YAHC does not take a full URI string as input; you have to specify the individual parts of the URL. Users who need to parse an existing URI string to produce a request should use the URI module to do so.

For example, to send a request to http://example.com/flower?color=red, pass the following parameters:

$yahc->request({
    host         => "example.com",
    port         => "80",
    path         => "/flower",
    query_string => "color=red"
});
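When the URL already exists as a string, the URI module mentioned above can split it into these parts. The following is a minimal sketch. request_params_from_url is a hypothetical helper name, and falling back to "/" for an empty path is an assumption about what you would want to send.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: split a URL into the pieces that request expects.
sub request_params_from_url {
    my ($url) = @_;
    my $uri = URI->new($url);
    return {
        scheme       => $uri->scheme,
        host         => $uri->host,
        port         => $uri->port,         # scheme default if none is given
        path         => $uri->path || '/',
        query_string => $uri->query // '',  # no leading "?"
    };
}

my $params = request_params_from_url('http://example.com/flower?color=red');
printf "%s => %s\n", $_, $params->{$_} for sort keys %$params;
```

The resulting HashRef can be passed to request as-is or merged with other parameters such as callback.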

request building

YAHC doesn't escape any values for you; it just passes them through as-is. You can easily produce invalid requests if, e.g., any of these strings contains a newline or isn't otherwise properly escaped.

Notice that you do not need to put the leading "?" character in the query_string. You do, however, need to properly uri_escape the content of query_string.
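One way to build a properly escaped query_string is uri_escape from URI::Escape. The sketch below assumes the parameters arrive as an ordered list of key/value pairs. build_query_string is a hypothetical helper name, not something YAHC provides.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical helper: build a query_string from ordered key/value
# pairs, escaping both keys and values.
sub build_query_string {
    my @pairs = @_;
    my @parts;
    while (my ($k, $v) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape($k) . '=' . uri_escape($v);
    }
    return join '&', @parts;   # note: no leading "?"
}

my $qs = build_query_string(color => 'dark red', q => 'a&b');
print "$qs\n";   # prints color=dark%20red&q=a%26b
```

For values that are Perl character strings with non-ASCII characters, uri_escape_utf8 from the same module is probably the better fit.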

The value of head is an ArrayRef of key-value pairs instead of a HashRef. This way you can decide in which order the headers are sent, and you can send the same header name multiple times. For example:

head => [
    "Content-Type" => "application/json",
    "X-Requested-With" => "YAHC",
]

Will produce these request headers:

Content-Type: application/json
X-Requested-With: YAHC
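The pair semantics can be illustrated in plain Perl. This sketch is not YAHC's own serialization code. It only shows how an ArrayRef of pairs keeps order and repeated names, and adds a CR/LF check as one possible guard against the invalid requests mentioned above. header_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a head-style ArrayRef of pairs into header
# lines, refusing names or values that contain CR or LF.
sub header_lines {
    my ($head) = @_;
    die "head must contain an even number of items\n" if @$head % 2;
    my @lines;
    for (my $i = 0; $i < @$head; $i += 2) {
        my ($name, $value) = @{$head}[$i, $i + 1];
        die "CR/LF in header '$name'\n" if "$name$value" =~ /[\r\n]/;
        push @lines, "$name: $value";
    }
    return @lines;
}

my @lines = header_lines([
    "Accept"     => "application/json",
    "X-Trace-Id" => "abc",
    "X-Trace-Id" => "def",   # repeated name, impossible with a HashRef
]);
print "$_\n" for @lines;
```

Running such a check over head before calling request is cheap and catches header injection through unvalidated input.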

host

The host parameter accepts one of the following values:

1) A string representing the target host. The string may have one of the
following formats: hostname:port, ip:port.

2) An ArrayRef of strings. YAHC cycles through the items, selecting a new
host for each attempt.

3) A CodeRef. The subroutine is invoked for each attempt and should at least
return a string (hostname or IP address). It can also return a list
containing ($host, $ip, $port, $scheme). This option effectively gives the
user control over host selection for retries. The CodeRef is passed the
connection "object", which can be fed to the yahc_conn_* family of functions.
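As a sketch of the CodeRef form, the closure below hands out hosts in turn. It is plain Perl and runs without YAHC. make_host_selector is a hypothetical name, and the host names are placeholders. The selection rule is the part you would replace with your own logic, for example skipping hosts you know to be unhealthy.

```perl
use strict;
use warnings;

# Hypothetical factory: returns a CodeRef suitable for the host
# parameter. Each call (one per attempt) yields the next host,
# wrapping around at the end of the list.
sub make_host_selector {
    my @hosts = @_;
    my $next  = 0;
    return sub {
        my ($conn) = @_;   # connection "object"; unused in this sketch
        my $host = $hosts[ $next++ % @hosts ];
        return $host;
    };
}

my $pick   = make_host_selector('a.example.com', 'b.example.com');
my @chosen = map { $pick->() } 1 .. 3;
print join(', ', @chosen), "\n";
```

The selector would be passed as host => $pick in the parameters of request or new.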

timeouts

The values of connect_timeout, request_timeout and drain_timeout are in floating-point seconds and are used as the time limits for connecting to the host (reaching CONNECTED state), the full request time (reaching COMPLETED state) and sending the request to the remote side (reaching READING state) respectively. The default value for all of them is undef, meaning no timeout limit. If you don't supply these timeouts and the host really is unreachable or slow, we'll reach the TCP timeout limit before returning some other error to you.

callbacks

The values of init_callback, connecting_callback, connected_callback, writing_callback and reading_callback are CodeRefs to subroutines which are called upon reaching the corresponding state. Any exception thrown in such a subroutine moves the connection to the COMPLETED state, effectively terminating any ongoing IO.

The value of callback defines the main request callback, which is called when a connection enters the 'USER ACTION' state (see 'USER ACTION' state above).

Also see LIMITATIONS

drop

Given a connection HashRef or conn_id, move the connection to the COMPLETED state (bypassing the 'USER ACTION' state) and drop it from the internal pool.

run

Start YAHC's loop. The loop stops when all connections complete.

Note that run accepts two extra parameters: until_state and a list of connections. These two parameters tell YAHC to break the loop once the specified connections reach the desired state.

For example:

$yahc->run(YAHC::State::READING(), $conn_id);

Will loop until connection '$conn_id' moves to state READING, meaning that the data has been sent to the remote side. To gather the response, one should later call:

$yahc->run(YAHC::State::COMPLETED(), $conn_id);

Leaving the list of connections empty makes YAHC wait for all connections to reach the needed until_state.

Note that waiting for one particular connection to finish doesn't mean that the others are not executed. All active connections are processed at the same time, but YAHC breaks the loop once the awaited connection reaches the needed state.

run_once

Same as run but with EV::RUN_ONCE set. For more details check https://metacpan.org/pod/EV

run_tick

Same as run but with EV::RUN_NOWAIT set. For more details check https://metacpan.org/pod/EV

is_running

Return true if YAHC is running, false otherwise.

loop

Return underlying EV loop object.

break

Break running EV loop if any.

EXPORTED FUNCTIONS

yahc_reinit_conn

yahc_reinit_conn reinitializes the given connection. The attempt counter is reset to 0. The function accepts a HashRef as its second argument. By passing it, one can change host, port, scheme, body, head and other parameters. The format and meaning of these parameters are the same as in the request method.

One use case for yahc_reinit_conn is handling redirects:

use YAHC qw/yahc_reinit_conn/;

my ($yahc, $yahc_storage) = YAHC->new();
$yahc->request({
    host => 'domain_which_returns_301.com',
    callback => sub {
        my $conn = $_[0];
        yahc_reinit_conn($conn, { host => 'www.newtarget.com' })
            if $_[0]->{response}{status} == 301;
    }
});

$yahc->run;

yahc_reinit_conn is meant to be called inside callback, i.e. when the connection is in the 'USER ACTION' state.

yahc_retry_conn

Retries the given connection. yahc_retry_conn should be called only if yahc_conn_attempts_left returns a positive value. Otherwise, it exits silently.

Like yahc_reinit_conn, yahc_retry_conn is meant to be called inside callback.

yahc_conn_id

Return the id of the given connection.

yahc_conn_state

Return the state of the given connection.

yahc_conn_target

Return the selected host and port for the current attempt of the given connection, in the format "host:port". Default port values are omitted.

yahc_conn_url

Same as yahc_conn_target, but returns the full URL.

yahc_conn_errors

Return the errors that occurred on the given connection. Note that the function returns all errors, not only the ones that happened during the current attempt. The returned value is an ArrayRef of ArrayRefs. Each inner ArrayRef represents one error and contains the following items:

    error number (see YAHC::Error constants)
    error string
    ArrayRef of host, ip, port, scheme
    time when the error happened
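To make the shape of this structure concrete, the sketch below formats one such entry for logging. It runs on made-up sample data. The error number 1, the error text and the addresses are placeholders, and treating the time item as epoch seconds is an assumption that this document does not confirm. format_error is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format one entry of the list described above.
# Assumes the time item is given in epoch seconds.
sub format_error {
    my ($entry) = @_;
    my ($errno, $errstr, $target, $time) = @$entry;
    my ($host, $ip, $port, $scheme) = @$target;
    my $when = strftime('%Y-%m-%d %H:%M:%S', gmtime($time));
    return sprintf('%s UTC error %d (%s) at %s://%s:%s',
        $when, $errno, $errstr, $scheme, $host, $port);
}

# Made-up sample data in the documented shape.
my $errors = [
    [ 1, 'connection timed out',
      [ 'example.com', '192.0.2.10', 80, 'http' ], 1_500_000_000 ],
];
print format_error($_), "\n" for @$errors;
```

In a real callback the list would come from yahc_conn_errors($conn) instead of the literal above.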

yahc_conn_last_error

Return the last error that occurred on the connection. See yahc_conn_errors.

yahc_conn_timeline

Return the timeline of the given connection. See more about the timeline in the description of the new method.

yahc_conn_request

Return the request of the given connection. See request.

yahc_conn_response

Return the response of the given connection. See request.

REPOSITORY

https://github.com/ikruglov/YAHC

NOTES

UTF8 flag

Note that YAHC suffers a drastic reduction in performance if any of the parameters used in building the HTTP message has the UTF8 flag set. Those fields are protocol, host, port, method, path, query_string, head, body and maybe others.

Just one example (see scripts/utf8_test.pl for the code): a simple HTTP request with 10MB of payload:

elapsed without utf8 flag: 0.039s
elapsed with utf8 flag: 0.540s

Because of this, YAHC warns if it detects a UTF8-flagged payload. The user needs to make sure that *all* data passed to YAHC is unflagged binary strings.
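One way to hand over unflagged data is to encode character strings to bytes before building the request. The sketch below uses the core Encode module. as_bytes is a hypothetical helper name, and the sketch assumes that a flagged string holds text that should go on the wire as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return a byte string without the UTF8 flag,
# encoding only when the flag is set.
sub as_bytes {
    my ($str) = @_;
    return utf8::is_utf8($str) ? encode_utf8($str) : $str;
}

my $text = "caf\x{e9} \x{263A}";   # character string, UTF8 flag set
my $body = as_bytes($text);

printf "flagged before: %d, after: %d, bytes: %d\n",
    utf8::is_utf8($text) ? 1 : 0,
    utf8::is_utf8($body) ? 1 : 0,
    length $body;
```

If a flagged string actually holds binary data that was upgraded by accident, utf8::downgrade is likely the more appropriate tool than encoding.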

LIMITATIONS

  • State 'RESOLVE DNS' is not implemented yet.

  • YAHC currently doesn't support servers returning an HTTP body without an accompanying Content-Length header; bodies MUST have a Content-Length or we won't pick them up.

AUTHORS

Ivan Kruglov

COPYRIGHT

Copyright (c) 2013-2016 Ivan Kruglov.

ACKNOWLEDGMENT

This module derived lots of ideas, code and docs from Hijk https://github.com/gugod/Hijk. This module was originally developed for Booking.com.

LICENCE

The MIT License

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
