rolling-curl's Issues

get_options custom headers

Should get_options also check the request object for request specific 
headers?

Original issue reported on code.google.com by dianoga7 on 23 Feb 2010 at 9:50

Does not work on php 5.1.6

What steps will reproduce the problem?
1. Attempt to use rolling-curl on RHEL with php 5.1.6


What is the expected output? What do you see instead?
No errors are thrown but no files download.

What version of the product are you using? On what operating system?
Redhat Enterprise Linux 5.4
PHP 5.1.6

Please provide any additional information below.
I've traced the problem to the function call curl_multi_info_read, which 
requires PHP 5.2.0. (The documentation on php.net is in error, as it lists only 
PHP 5.x as a requirement.) curl_multi_info_read returns null in all cases, 
causing the rolling curl to fail.
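
One way to fail loudly instead of silently on such a setup is a version guard at startup. This is only a sketch: the 5.2.0 threshold is taken from the report above, and the helper name rolling_curl_supported is hypothetical, not part of RollingCurl.

```php
<?php
// Hypothetical helper (not part of RollingCurl): returns true when the given
// PHP version is at least 5.2.0, the minimum the report above identifies for
// a working curl_multi_info_read().
function rolling_curl_supported($version = PHP_VERSION) {
    return version_compare($version, '5.2.0', '>=');
}

if (!rolling_curl_supported()) {
    trigger_error('rolling-curl needs PHP 5.2.0 or later', E_USER_WARNING);
}
```

Taking the version as a parameter keeps the check easy to exercise with a fixed string such as '5.1.6'.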

Original issue reported on code.google.com by [email protected] on 6 Oct 2010 at 7:40

Huge Memory Leak

I'm showing a 130KB memory loss every time rollingCurl finishes a single 
"window" (5 in my case) that is never recouped - about the size of the 5 
scraped URLS - this data should be getting dumped after the results are passed. 

Original issue reported on code.google.com by [email protected] on 17 Nov 2011 at 9:35

I added a header, but it does not seem to work

See the code below: I added "Host: www.oozk.com", but it does not seem to work.


<?php

require("RollingCurl.php");
function request_callback($response, $info, $request) { 

    print_r($info);
    echo "<hr>";
    print_r($response);
    echo "<hr>";
}

// top 20 sites according to alexa (11/5/09)
$urls = array("http://127.0.0.1",
             "http://127.0.0.1");

$headers = "Host: www.oozk.com";
$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
foreach ($urls as $url) {
    $request = new RollingCurlRequest($url,'GET',NULL,$headers);
    $rc->add($request);
}
$rc->execute();
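
A possible cause, offered as an assumption rather than a confirmed diagnosis: cURL's CURLOPT_HTTPHEADER option expects an array of header lines, while the code above passes a single string. A minimal sketch of normalizing the value before building the request (the helper name normalize_headers is hypothetical):

```php
<?php
// Hypothetical helper: CURLOPT_HTTPHEADER takes an array of "Name: value"
// strings, so wrap a lone string in an array and leave arrays untouched.
function normalize_headers($headers) {
    if ($headers === null || $headers === '') {
        return array();
    }
    return is_array($headers) ? array_values($headers) : array($headers);
}

$headers = normalize_headers("Host: www.oozk.com");
// $headers is now array("Host: www.oozk.com") and can be passed as the
// fourth argument of RollingCurlRequest in the loop above.
```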

Original issue reported on code.google.com by [email protected] on 4 Sep 2012 at 10:56

The callback function isn't always called

I have 10 URLs, but every time I execute them, the callback function is only
called 9 times. I read something on your blog about this, so is it possible
that this bug still exists?

Original issue reported on code.google.com by [email protected] on 3 Dec 2009 at 4:11

[PATCH] Several patches

Hi,

This includes several patches I needed, which are attached.

[PATCH 1/3] Add $request as parameter to callback function:

Currently it is not possible to know in the callback which request finished, as 
the URL might not be the same due to a 301 redirect and is not unique anyway.

Providing the request to the callback changes this and allows a very 
flexible implementation.

I also renamed the Request class to be more RollingCurl-specific, as it would 
otherwise clutter the namespace.

[PATCH 2/3] Add rolling curl group:

This is an object-oriented implementation that extends rolling curl to allow 
finishing groups of requests, which allows prioritizing and also gives 
feedback when each group is finished.

The usage is really easy:

Inheriting from the base class allows processing of groups and requests 
directly from the class:

Test Class
======

class TestCurlRequest extends RollingCurlGroupRequest
{
        public $test_verbose = false;

        function process($output, $info)
        {
                echo "Processing " . $this->url . "\n";
                if ($this->test_verbose)
                        print_r($info);
                parent::process($output, $info);
        }
}

class TestCurlGroup extends RollingCurlGroup {

        function process($output, $info, $request)
        {
                echo "Group CB: Progress " . $this->name . " (" . ($this->finished_requests+1) . "/" . $this->num_requests .  ")\n";
                parent::process($output, $info, $request);
        }

        function finished()
        {
                echo "Group CB: Finished " . $this->name . "\n";
                parent::finished();
        }
}

Main function:
=========

        $group = new TestCurlGroup("High");
        $group->add(new TestCurlRequest("www.google.de"));
        $group->add(new TestCurlRequest("www.yahoo.de"));
        $group->add(new TestCurlRequest("www.newyorktimes.com"));
        $reqs[] = $group;

        $group = new TestCurlGroup("Normal");
        $group->add(new TestCurlRequest("twitter.com"));
        $group->add(new TestCurlRequest("www.bing.com"));
        $group->add(new TestCurlRequest("m.facebook.com"));
        $reqs[] = $group;

        $reqs[] = new TestCurlRequest("www.kernel.org");

        $rc = new GroupRollingCurl(); /* Note: No callback here, as it is done in the Request class */

        foreach ($reqs as $req)
                $rc->add($req);

        $rc->window_size = $window_size;
        return $rc->execute();

---------

Due to the power of polymorphism, the same function (add) can be used for 
adding requests and groups of requests.

The "callback" in requests and groups is, respectively:

process($output, $info)

and

process($output, $info, $request)

A finished() method is also available for groups.

[PATCH 3/3] Allow custom options to overwrite default ones

Fixes issue #12. Depends on the previous patches.

Hope you enjoy!

Please apply.

The patches are also available in the master branch of my fork of rolling-curl 
at github:

http://github.com/LionsAd/rolling-curl

Best Wishes,

Fabian (LionsAd)

Original issue reported on code.google.com by [email protected] on 5 Aug 2010 at 7:29


It would be nice to be able to use a class method as the callback

It would be nice to be able to use a class method as the callback. It is quite 
easy to implement.

class RollingCurl {
...
function __construct($callback = null, $instance = null) {
  $this->instance = $instance;
  $this->callback = $callback;
}
}

and then call it like this:

if ($this->instance) {
  call_user_func(array($this->instance, $this->callback), ...);
} else if (is_callable($this->callback)) {
  call_user_func($this->callback, ...);
}

thank you.

Original issue reported on code.google.com by [email protected] on 1 Mar 2011 at 12:40

CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT do not work (class? curl? other?)

$rc = new RollingCurl("request_callback");
foreach ($urls as $url) {
    $request = new Request($url);
    $request->options = array(CURLOPT_PROXY => $proxy,CURLOPT_TIMEOUT => 2,CURLOPT_CONNECTTIMEOUT => 2);
    $rc->add($request);
}

We get responses like this:

    [http_code] => 200
    [total_time] => 17.635
    [namelookup_time] => 0.18
    [connect_time] => 0.571
    [pretransfer_time] => 0.571
    [starttransfer_time] => 17.635

or like this:

    [http_code] => 200
    [total_time] => 30.904
    [namelookup_time] => 0.821
    [connect_time] => 11.877
    [pretransfer_time] => 11.897
    [starttransfer_time] => 30.133


Even if the two timeouts are added together, the request handling clearly does 
not fit within four seconds.

What could the problem be? In theory, since the request does go through the 
proxy, the options are being passed to the library, so why do the timeouts not 
take effect?

Original issue reported on code.google.com by [email protected] on 4 Sep 2010 at 8:16

Micro-pause between curl_multi_exec calls

A long time ago, and I don't remember exactly where, I read that a short pause 
is recommended; otherwise some versions of curl consume extra CPU.
That is, instead of

while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);

use

do { $execrun = curl_multi_exec($master, $running); usleep(100); }
while ($execrun == CURLM_CALL_MULTI_PERFORM);


Original issue reported on code.google.com by [email protected] on 21 May 2010 at 7:14

Add PSR-0 support and Composer

What steps will reproduce the problem?
1. Add Composer package description
2. One class in one file
3. PSR-0 autoloading


Original issue reported on code.google.com by [email protected] on 13 Mar 2014 at 8:44

Undefined variable in example.php if missing page title

What steps will reproduce the problem?
1. Fetch a URL that is wrong, or whose HTML $response is missing <title> tags, 
or hit a server configuration issue that is breaking cURL
2. Run example.php

What is the expected output? What do you see instead?

No output, as expected, but PHP raises:
PHP Notice:  Undefined variable: title in /example.php on line 19

What version of the product are you using? On what operating system?

r20 Sep 12, 2010 on CentOS 6.2

Please provide any additional information below.

The PHP notice can be fixed by moving the echo $title inside the if block. e.g.,

  if (preg_match("~<title>(.*?)</title>~i", $response, $out)) {
    $title = $out[1];
    echo "<b>$title</b><br />";
  } else {
    echo "<i>page title not found</i><br />";
  }

Original issue reported on code.google.com by [email protected] on 12 Apr 2014 at 11:21

CURLOPT_FOLLOWLOCATION cannot be activated when an open_basedir is set

What steps will reproduce the problem?
1. Install script on server that has "open_basedir" set
2. Run example.php
3. See error message

----------------------------

What is the expected output? What do you see instead?

EXPECTED
Standard example.php output

INSTEAD
Warning: curl_setopt_array() [function.curl-setopt-array]: 
CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an 
open_basedir is set

----------------------------

What version of the product are you using? On what operating system?

r20

PHP Version 5.2.14 on Linux, MediaTemple (gs)

----------------------------

Please provide any additional information below.

I was able to patch this by changing one line.

ORIGINAL

if (ini_get('safe_mode') == 'Off' || !ini_get('safe_mode')) {

NEW
if ((ini_get('open_basedir') == '') && (ini_get('safe_mode') == 'Off' || 
!ini_get('safe_mode'))) {

Original issue reported on code.google.com by [email protected] on 28 Oct 2010 at 3:14

No way to retrieve curl error

With the current library setup there is no way to retrieve an error from the 
curl handle as you would with curl_error(). In order to do so you need the curl 
handle. I would suggest either passing the curl handle to the response 
processor, or calling curl_error($ch) on each handle and modifying the Request 
object to have an error property that stores the result. When the response 
processor gets the response, it can check the request for errors and choose to 
handle them however it likes.

The information in curl_getinfo is not sufficient to describe what went wrong 
when a curl request goes awry.


Original issue reported on code.google.com by [email protected] on 19 Dec 2010 at 2:47

single_curl should remove the request from the queue

If another request is added after executing a single request, it won't run 
as a single request since the first one still exists.

Sample Fix:
[code]
private function single_curl() {
    $ch = curl_init();
    $options = $this->get_options($this->requests[0]);
    curl_setopt_array($ch,$options);
    $output = curl_exec($ch);
    $info = curl_getinfo($ch);

    // Remove the request from the queue and reindex
    unset($this->requests[0]);
    $this->requests = array_values($this->requests);

    // it's not necessary to set a callback for one-off requests
    if ($this->callback) {
        $callback = $this->callback;
        if (is_callable($this->callback)){
            call_user_func($callback, $output, $info);
        }
    } else {
        return $output;
    }
}
[/code]

Original issue reported on code.google.com by dianoga7 on 24 Feb 2010 at 5:22

sizeof($requests) -> sizeof($this->requests), line 169

Small typo:

if ($i < sizeof($requests)) {
    $ch = curl_init();
    $options = $this->get_options($this->requests[$i++]); // note the increment on i

    curl_setopt_array($ch,$options);
    ....................

Should be:

if ($i < sizeof($this->requests)) {
    $ch = curl_init();
    .........


Original issue reported on code.google.com by [email protected] on 31 Mar 2010 at 8:25

Better syntax for streaming file processing : huge XML files

This module is really wonderful... it saved the day in my application, which 
I would otherwise have had to move to a threading solution!

--------------------
However, the call syntax is not quite optimal for what I'm doing.  I am 
processing huge XML files that are too big to load in memory, and too big to 
completely process before starting the curl operation.  This syntax would be 
ideal:

<?php
require("RollingCurl.php");

function request_callback($response, $info, $request, $callback_parameter) {
    ...
}

$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
$rq = new RollingCurlRequest();
while( $xml = get_next_xml_element() ) {
    $rq->url($xml['url']);
    $rq->callback_parameter = $xml;
    $rc->execute_until_blocked($rq);  // Blocks if queue full
}
$rc->finish(); // Returns after last pending request is done
?>

Then I can maintain my streaming process, yet still feed requests into curl 
as fast as they will go. Also note the extra parameter that gets passed to the 
callback.

Original issue reported on code.google.com by [email protected] on 9 Jul 2011 at 5:13

Possibility to add an ID number to the url

Hello buddy, first thank you very much for your RollingCurl class.

Is it possible to add an ID to every URL, so that I know which URL has
finished?

I have 5 calls to the same URL, each with different options. I build
something like this:

$urls = array(array("url", options1), array("url", options2));

Is there a way to know in the callback function which one has finished?

thanks


Original issue reported on code.google.com by [email protected] on 27 Mar 2010 at 7:16

Ability to use class methods as callbacks

It's better to use call_user_func since it will give an ability to call 
class methods.

$callback = $this->callback;
if (is_callable($callback)){
   call_user_func($callback, $output, $info);
}
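
As a short illustration of why call_user_func helps here: PHP accepts an array of (object, method name) or (class name, method name) as a callable. The ResponseLogger class below is hypothetical and exists only for this sketch.

```php
<?php
// Hypothetical handler class, used only to illustrate the callable forms.
class ResponseLogger {
    public $seen = array();

    public function handle($output, $info) {
        $this->seen[] = $info['url'];
    }

    public static function length($output, $info) {
        return strlen($output);
    }
}

$logger = new ResponseLogger();

// array(object, method name) is a valid callable for an instance method.
$callback = array($logger, 'handle');
if (is_callable($callback)) {
    call_user_func($callback, '<html></html>', array('url' => 'http://example.com/'));
}

// array(class name, method name) works for static methods.
$length = call_user_func(array('ResponseLogger', 'length'), 'abc', array());
```

A plain function name such as "request_callback" keeps working with the same call, so existing callers would not need to change.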

Original issue reported on code.google.com by alexander.makarow on 7 Feb 2010 at 3:42

per-request callback

Hi,

This isn't an issue, more of a feature request. I think it'd be very handy
to be able to set the callback function on a per-request basis, rather than
one callback for all requests. Could you guys possibly have a think about
making such a change?

thank you

Original issue reported on code.google.com by [email protected] on 23 Feb 2010 at 12:57

$options defaults can't be overwritten

I was having trouble trying to set my own CURLOPT_TIMEOUT and 
CURLOPT_CONNECTTIMEOUT values in a Request instance. I'm not sure if this is 
intentional, but the default values can't be overwritten because of line 124. 
Simply reversing the arithmetic will fix this issue.
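
Assuming "reversing the arithmetic" refers to the operand order of PHP's array union operator (+), the sketch below shows why the order matters. The integer keys stand in for CURLOPT_* constants and the values are illustrative; this is not the library's actual line 124.

```php
<?php
// Integer keys stand in for CURLOPT_* constants so the sketch runs without
// the curl extension; the values are illustrative.
$defaults = array(13 => 30, 78 => 30);  // library defaults
$custom   = array(13 => 2);             // per-request override

// Union keeps the LEFT operand's value when a key exists on both sides.
$defaults_win = $defaults + $custom;    // key 13 stays 30
$custom_wins  = $custom + $defaults;    // key 13 becomes 2, key 78 filled in
```

The union operator is used rather than array_merge because array_merge renumbers integer keys, which would lose the option constants.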

Original issue reported on code.google.com by [email protected] on 3 Aug 2010 at 7:13

Additional data in $info

It would be good if the info passed to the callback also included:
1. the URI from which the content was downloaded
2. the number (array index) of that URL in the $urls array that we passed in
for downloading.

Often the received data has to be arranged in the same order as the URLs
passed to the script for downloading. Currently, whatever downloads fastest
comes first if the callback function parses the page and stores the result in
a global array variable.
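
Until such an index is available, the ordering described above can be restored in user code. A minimal sketch, assuming the URLs are unique and that the callback can see the original URL (after a redirect the URL in $info may differ); the collect function and the example.* URLs are hypothetical.

```php
<?php
$urls = array('http://a.example/', 'http://b.example/', 'http://c.example/');

// Map each URL back to its position in $urls.
$index_of = array_flip($urls);
$results = array();

// Hypothetical callback: stores the response under the index of its URL.
function collect($response, $url, $index_of, &$results) {
    $results[$index_of[$url]] = $response;
}

// Simulate completions arriving out of order.
collect('C', 'http://c.example/', $index_of, $results);
collect('A', 'http://a.example/', $index_of, $results);
collect('B', 'http://b.example/', $index_of, $results);

ksort($results); // restore the original request order
```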


Original issue reported on code.google.com by [email protected] on 19 May 2010 at 5:09

The same conditional is being checked twice in the same statement in function rolling_curl(...) - sizeof()/count()

Hi,

There seems to be an erroneous (duplicated) check on the same conditional value 
in "function rolling_curl(...)" in the RollingCurl class.

// start a new request (it's important to do this before removing the old one)
if ($i < sizeof($this->requests) && isset($this->requests[$i]) && $i < count($this->requests)) {...}


$i < sizeof($this->requests)

and...

$i < count($this->requests)

are equivalent and will always evaluate to the same result (sizeof() and 
count() are the same function under two names)

q.v. http://php.net/manual/en/function.sizeof.php
In PHP, sizeof() is just an alias for the true function, which is count()

REQUEST:
If this is indeed considered erroneous;

please remove "$i < sizeof($this->requests)" from the above mentioned 
conditional statement in function rolling_curl

Additionally renaming each sizeof(..) to count(...) would make the code a 
little more semantic when reading it.

Many Thanks,

Mark S.


Original issue reported on code.google.com by [email protected] on 26 Oct 2010 at 11:43

Stop all further requests.

Hello,

is it somehow possible to stop all pending requests? 

Let's say I parse 1000 pages for the word curl. The script accesses those pages 
with 10 threads. Let's say the script found the word curl on the 589th page, it 
should stop all further requests/threads now.


Is that possible?

Original issue reported on code.google.com by [email protected] on 15 May 2012 at 11:40
