strgg / rolling-curl

Automatically exported from code.google.com/p/rolling-curl

PHP 100.00%

rolling-curl's Issues

get_options custom headers

Should get_options also check the request object for request-specific 
headers?

Original issue reported on code.google.com by dianoga7 on 23 Feb 2010 at 9:50

Does not work on php 5.1.6

What steps will reproduce the problem?
1. Attempt to use rolling-curl on RHEL with php 5.1.6


What is the expected output? What do you see instead?
No errors are thrown but no files download.

What version of the product are you using? On what operating system?
Redhat Enterprise Linux 5.4
PHP 5.1.6

Please provide any additional information below.
I've traced the problem to the call to curl_multi_info_read, which 
requires PHP 5.2.0. (The documentation on php.net is in error, as it lists only 
PHP 5.x as a requirement.) curl_multi_info_read returns null in all cases, 
causing rolling-curl to fail.
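
A minimal sketch of a guard for the version requirement described above, so that an unsupported PHP fails loudly instead of silently downloading nothing. The helper name meets_min_php() is hypothetical and not part of RollingCurl; the 5.2.0 minimum is taken from this report.

```php
<?php
// Hypothetical helper (not part of RollingCurl): true when $version is at
// least $minimum. The 5.2.0 default comes from the report above.
function meets_min_php($version = PHP_VERSION, $minimum = '5.2.0')
{
    return version_compare($version, $minimum, '>=');
}

if (!meets_min_php()) {
    // Fail early rather than letting curl_multi_info_read() return null.
    trigger_error('rolling-curl needs PHP 5.2.0 or newer', E_USER_ERROR);
}
```

Calling meets_min_php('5.1.6') returns false, which is the case reported here.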

Original issue reported on code.google.com by [email protected] on 6 Oct 2010 at 7:40

Huge Memory Leak

I'm seeing about 130 KB of memory lost every time RollingCurl finishes a single 
"window" (5 requests in my case), and it is never reclaimed. That is about the 
size of the 5 scraped URLs; this data should be released after the results are 
passed.

Original issue reported on code.google.com by [email protected] on 17 Nov 2011 at 9:35

This patch allows keeping track of requests by adding an id to each of them

Sometimes it's not easy to determine which request the returned data belongs 
to. To solve this, this patch allows adding an id in the add() call. The 
id is then returned with the resulting data.

Hope you find it as useful as I did

Original issue reported on code.google.com by [email protected] on 30 Dec 2014 at 5:15

Attachments:

0001-Patch-rolling-curl-to-allow-set-and-return-of-index-.patch

CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode

The option CURLOPT_FOLLOWLOCATION is set by default in "protected $options"
(line 97), but an error occurs when in safe_mode. See the attached patch.

Original issue reported on code.google.com by [email protected] on 8 Apr 2010 at 4:45

Attachments:

RollingCurl.php.patch

I added a header, but it does not seem to work

See the code below: I added Host: www.oozk.com, but it does not seem to work.


<?php

require("RollingCurl.php");
function request_callback($response, $info, $request) { 

    print_r($info);
    echo "<hr>";
    print_r($response);
    echo "<hr>";
}

// top 20 sites according to alexa (11/5/09)
$urls = array("http://127.0.0.1",
             "http://127.0.0.1");

$headers = "Host: www.oozk.com";
$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
foreach ($urls as $url) {
    $request = new RollingCurlRequest($url,'GET',NULL,$headers);
    $rc->add($request);
}
$rc->execute();

Original issue reported on code.google.com by [email protected] on 4 Sep 2012 at 10:56

The callback function isn't always called

I have 10 URLs, but every time I execute them, the callback function is only
called 9 times. I read something on your blog about this, so is it possible
that this bug still exists?

Original issue reported on code.google.com by [email protected] on 3 Dec 2009 at 4:11

[PATCH] Several patches

Hi,

This includes several patches I needed, which are attached.

[PATCH 1/3] Add $request as parameter to callback function:

Currently it is not possible to know in the callback which request finished, 
as the URL might have changed due to a 301 redirect and is not unique anyway.

Providing the request to the callback fixes this and allows a very 
flexible implementation.

I also changed the Request class name to be more RollingCurl-specific, as it 
otherwise clutters the namespace.

[PATCH 2/3] Add rolling curl group:

This implementation uses the power of OO programming: it extends 
rolling curl to allow finishing groups of requests, which allows prioritizing 
and also giving feedback when each group is finished.

The usage is really easy:

Inheriting from the base class allows processing of groups and requests 
directly from the class:

Test Class
======

class TestCurlRequest extends RollingCurlGroupRequest
{
        public $test_verbose = false;

        function process($output, $info)
        {
                echo "Processing " . $this->url . "\n";
                if ($this->test_verbose)
                        print_r($info);
                parent::process($output, $info);
        }
}

class TestCurlGroup extends RollingCurlGroup {

        function process($output, $info, $request)
        {
                echo "Group CB: Progress " . $this->name . " (" . ($this->finished_requests+1) . "/" . $this->num_requests .  ")\n";
                parent::process($output, $info, $request);
        }

        function finished()
        {
                echo "Group CB: Finished " . $this->name . "\n";
                parent::finished();
        }
}

Main function:
=========

        $group = new TestCurlGroup("High");
        $group->add(new TestCurlRequest("www.google.de"));
        $group->add(new TestCurlRequest("www.yahoo.de"));
        $group->add(new TestCurlRequest("www.newyorktimes.com"));
        $reqs[] = $group;

        $group = new TestCurlGroup("Normal");
        $group->add(new TestCurlRequest("twitter.com"));
        $group->add(new TestCurlRequest("www.bing.com"));
        $group->add(new TestCurlRequest("m.facebook.com"));
        $reqs[] = $group;

        $reqs[] = new TestCurlRequest("www.kernel.org");

        $rc = new GroupRollingCurl(); /* Note: No callback here, as it's done in the Request class */

        foreach ($reqs as $req)
                $rc->add($req);

        $rc->window_size = $window_size;
        return $rc->execute();

---------

Due to the power of polymorphism, the same function (add) can be used for 
adding requests and groups of requests.

The "callback" for requests is:

process($output, $info)

and for groups:

process($output, $info, $request)

finished() is also available for groups.

[PATCH 3/3] Allow custom options to overwrite default ones

Issue #12 fixed in a patch. Dependent on patches before.

Hope you enjoy!

Please apply.

The patches are also available in the master branch of my fork of rolling-curl 
at github:

http://github.com/LionsAd/rolling-curl

Best Wishes,

Fabian (LionsAd)

Original issue reported on code.google.com by [email protected] on 5 Aug 2010 at 7:29

Attachments:

It would be nice to be able to use a class method as the callback

It would be nice to be able to use a class method as the callback. It is quite 
easy to implement.

class RollingCurl {
...
function __construct($callback = null, $instance = null) {
  $this->instance = $instance;
  $this->callback = $callback;
}
}

and then call it like this:

if ($this->instance) {
  $this->instance->$callback(...);
} else if (is_callable($callback)) {
  call_user_func($callback, ...);
}
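
As an alternative sketch, PHP's callable arrays can cover this without a separate $instance argument: array($object, 'method') passes is_callable() and works with call_user_func(). The ResponseHandler class, the sample URL and the response body below are made up for illustration and are not part of RollingCurl.

```php
<?php
// Hypothetical handler class, used only to show the callable form.
class ResponseHandler
{
    public $seen = array();

    // Same ($output, $info) shape as a plain function callback.
    public function handle($output, $info)
    {
        $this->seen[] = $info['url'];
    }
}

$handler  = new ResponseHandler();
$callback = array($handler, 'handle');   // instance-method callable

if (is_callable($callback)) {
    call_user_func($callback, '<html></html>', array('url' => 'http://example.com/'));
}
```

With this form the constructor could keep its single $callback parameter, since the object travels inside the callable itself.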

thank you.

Original issue reported on code.google.com by [email protected] on 1 Mar 2011 at 12:40

CURLOPT_TIMEOUT and CURLOPT_CONNECTTIMEOUT do not work (class? curl? other?)

$rc = new RollingCurl("request_callback");
foreach ($urls as $url) {
    $request = new Request($url);
    $request->options = array(CURLOPT_PROXY => $proxy,CURLOPT_TIMEOUT => 2,CURLOPT_CONNECTTIMEOUT => 2);
    $rc->add($request);
}

We get responses like this:

    [http_code] => 200
    [total_time] => 17.635
    [namelookup_time] => 0.18
    [connect_time] => 0.571
    [pretransfer_time] => 0.571
    [starttransfer_time] => 17.635

or like this:

    [http_code] => 200
    [total_time] => 30.904
    [namelookup_time] => 0.821
    [connect_time] => 11.877
    [pretransfer_time] => 11.897
    [starttransfer_time] => 30.133


Even if the two timeouts are added together, the request handling clearly 
does not fit within four seconds.

What could the problem be? In theory, since the request does go through the 
proxy, the options are being passed to the library, so why do they not take 
effect?

Original issue reported on code.google.com by [email protected] on 4 Sep 2010 at 8:16

Micro-pause between curl_multi_exec calls

A long time ago (I don't remember exactly where) I read that a short pause
is recommended, because otherwise some versions of curl use extra CPU.
That is, instead of

while(($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);

you need

do{ $execrun = curl_multi_exec($master, $running); usleep(100); }
while($execrun==CURLM_CALL_MULTI_PERFORM);

Original issue reported on code.google.com by [email protected] on 21 May 2010 at 7:14

get_options doesn't set the headers

RollingCurl.php, line 346:

        $headers = $request->headers;

Original issue reported on code.google.com by [email protected] on 7 Apr 2011 at 8:36

rolling curl outer loop can terminate prematurely leaving many requests unprocessed

The outer do-while loop of the rolling_curl function terminates prematurely if 
the $running flag is false, even if new requests have since been added to 
$master and not started yet.

The conditional in RollingCurl.php:325 needs to check whether there are still 
requests outstanding.

Original issue reported on code.google.com by [email protected] on 3 Apr 2013 at 6:00

Add PSR-0 support and Composer

What steps will reproduce the problem?
1. Add Composer package description
2. One class in one file
3. PSR-0 autoloading

Original issue reported on code.google.com by [email protected] on 13 Mar 2014 at 8:44

Patch for /trunk/RollingCurl.php

dghgfhgfhgf

Original issue reported on code.google.com by [email protected] on 17 Sep 2011 at 7:07

Attachments:

RollingCurl.php.patch

Undefined variable in example.php if missing page title

What steps will reproduce the problem?
1. Fetch a URL that is wrong, whose HTML $response is missing <title> tags, or 
that fails because a server configuration issue is breaking cURL
2. Run example.php

What is the expected output? What do you see instead?

No output, as expected, but PHP raises:
PHP Notice:  Undefined variable: title in /example.php on line 19

What version of the product are you using? On what operating system?

r20 Sep 12, 2010 on CentOS 6.2

Please provide any additional information below.

The PHP notice can be fixed by moving the echo $title inside the if block. e.g.,

  if (preg_match("~<title>(.*?)</title>~i", $response, $out)) {
    $title = $out[1];
    echo "<b>$title</b><br />";
  } else {
    echo "<i>page title not found</i><br />";
  }
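
The same fix can be sketched as a small helper that never leaves $title undefined. The function name extract_title() is hypothetical, and the added "s" modifier (so a title spanning several lines still matches) is an assumption beyond the pattern in example.php.

```php
<?php
// Hypothetical helper: returns the page title, or null when the response is
// empty or has no <title> element. The "s" flag is an addition to the
// original pattern so that "." also matches newlines.
function extract_title($response)
{
    if (preg_match('~<title>(.*?)</title>~is', (string) $response, $out)) {
        return trim($out[1]);
    }
    return null;
}

$title = extract_title('<html><head><title>Example</title></head></html>');
echo $title !== null ? "<b>$title</b><br />" : "<i>page title not found</i><br />";
```

A failed request, where $response is false or an empty string, takes the null branch instead of raising a notice.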

Original issue reported on code.google.com by [email protected] on 12 Apr 2014 at 11:21

CURLOPT_FOLLOWLOCATION cannot be activated when an open_basedir is set

What steps will reproduce the problem?
1. Install script on server that has "open_basedir" set
2. Run example.php
3. See error message

----------------------------

What is the expected output? What do you see instead?

EXPECTED
Standard example.php output

INSTEAD
Warning: curl_setopt_array() [function.curl-setopt-array]: 
CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an 
open_basedir is set

----------------------------

What version of the product are you using? On what operating system?

r20

PHP Version 5.2.14 on Linux, MediaTemple (gs)

----------------------------

Please provide any additional information below.

I was able to patch this by changing one line.

ORIGINAL

if (ini_get('safe_mode') == 'Off' || !ini_get('safe_mode')) {

NEW
if ((ini_get('open_basedir') == '') && (ini_get('safe_mode') == 'Off' || !ini_get('safe_mode'))) {
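
The combined condition can also be pulled into a helper that is easier to read and to test. This is only a sketch: can_follow_location() is a hypothetical name, and it takes the raw ini values as arguments so the logic can be exercised without changing php.ini.

```php
<?php
// Hypothetical helper: decides from raw ini values whether
// CURLOPT_FOLLOWLOCATION may be set without triggering the warning above.
function can_follow_location($safe_mode, $open_basedir)
{
    // ini_get() may return false, '', '0', '1', 'Off' or 'On' depending on setup.
    $safe_mode_on = $safe_mode && strtolower((string) $safe_mode) !== 'off';
    return !$safe_mode_on && (string) $open_basedir === '';
}

$follow = can_follow_location(ini_get('safe_mode'), ini_get('open_basedir'));
```

get_options could then add CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS only when $follow is true.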

Original issue reported on code.google.com by [email protected] on 28 Oct 2010 at 3:14

No way to retrieve curl error

With the current library setup there is no way to retrieve an error from the 
curl handle as you would with curl_error(). In order to do so you need the curl 
handle. I would suggest either passing the curl handle to the response 
processor, or calling curl_error($ch) on each handle and storing the result in 
a new error property on the Request object. When the response processor gets 
the response, it can check the request for errors and handle them however it 
likes.

The information in curl_getinfo is not sufficient to describe what went wrong 
when a curl request goes awry.

Original issue reported on code.google.com by [email protected] on 19 Dec 2010 at 2:47

single_curl should remove the request from the queue

If another request is added after executing a single request, it won't run 
as a single request since the first one still exists.

Sample Fix:
[code]
private function single_curl() {
    $ch = curl_init();
    $options = $this->get_options($this->requests[0]);
    curl_setopt_array($ch,$options);
    $output = curl_exec($ch);
    $info = curl_getinfo($ch);

    // Remove the request from the queue and reindex
    unset($this->requests[0]);
    $this->requests = array_values($this->requests);

    // it's not necessary to set a callback for one-off requests
    if ($this->callback) {
        $callback = $this->callback;
        if (is_callable($this->callback)){
            call_user_func($callback, $output, $info);
        }
    } else {
        return $output;
    }
}
[/code]

Original issue reported on code.google.com by dianoga7 on 24 Feb 2010 at 5:22

Add a delay to the threads

Is it possible to add a delay before each thread executes?

Original issue reported on code.google.com by [email protected] on 2 Jun 2010 at 11:45

sizeof($requests) -> sizeof($this->requests), line 169

Small typo:

if ($i < sizeof($requests)) {
    $ch = curl_init();
    $options = $this->get_options($this->requests[$i++]); // note the increment on i

    curl_setopt_array($ch,$options);
    ....................

Should be:

if ($i < sizeof($this->requests)) {
    $ch = curl_init();
    .........

Original issue reported on code.google.com by [email protected] on 31 Mar 2010 at 8:25

Better syntax for streaming file processing : huge XML files

This module is really wonderful... it saved the day in my application, which 
I would otherwise have had to move to a threading solution!

--------------------
However, the call syntax is not quite optimal for what I'm doing.  I am 
processing huge XML files that are too big to load in memory, and too big to 
completely process before starting the curl operation.  This syntax would be 
ideal:

<?php
require("RollingCurl.php");

function request_callback($response, $info, $request, $callback_parameter) {
    ...
}

$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
$rq = new RollingCurlRequest();
while( $xml = get_next_xml_element() ) {
    $rq->url($xml['url']);
    $rq->callback_parameter = $xml;
    $rc->execute_until_blocked($rq);  // Blocks if queue full
}
$rc->finish(); // Returns after last pending request is done
?>

Then I can maintain my streaming process, yet still stuff requests into curl 
as fast as they will go.  Also note the extra parameter that gets passed to the 
callback.

Original issue reported on code.google.com by [email protected] on 9 Jul 2011 at 5:13

Possibility to add an ID number to the url

Hello buddy, first, thank you very much for your RollingCurl class.

Is it possible to add an ID to every URL, so I know which URL has finished?

I make 5 calls to the same URL, each with different options, something
like this:

$urls = array(array("url", options1), array("url", options2));

Is there a way to know in the callback function which one finished?

Thanks

Original issue reported on code.google.com by [email protected] on 27 Mar 2010 at 7:16

Patch for /trunk/RollingCurl.php

$request->headers

Original issue reported on code.google.com by [email protected] on 16 Apr 2012 at 7:22

Attachments:

RollingCurl.php.patch

Ability to use class methods as callbacks

It's better to use call_user_func since it will give an ability to call 
class methods.

$callback = $this->callback;
if (is_callable($callback)){
   call_user_func($callback, $output, $info);
}

Original issue reported on code.google.com by alexander.makarow on 7 Feb 2010 at 3:42

per-request callback

Hi,

This isn't an issue, more of a feature request. I think it'd be very handy
to be able to set the callback function on a per-request basis, rather than
one callback for all requests. Could you guys possibly have a think about
making such a change?

thank you

Original issue reported on code.google.com by [email protected] on 23 Feb 2010 at 12:57

$options defaults can't be overwritten

I was having trouble trying to set my own CURLOPT_TIMEOUT and 
CURLOPT_CONNECTTIMEOUT values in a Request instance. I'm not sure if this is 
intentional, but the default values can't be overwritten because of line 124. 
Simply reversing the arithmetic will fix this issue.
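
For context, a sketch of how operand order matters when two option arrays are combined with PHP's array union operator (+): on a key collision the left-hand value wins. Whether line 124 looks exactly like this is an assumption, and the string keys below are placeholders for the integer CURLOPT_* constants.

```php
<?php
// Placeholder keys stand in for CURLOPT_* constants (an assumption, so the
// snippet runs without the curl extension).
$defaults = array('timeout' => 30, 'connect_timeout' => 30, 'return_transfer' => true);
$custom   = array('timeout' => 2, 'connect_timeout' => 2);

// Left operand wins on duplicate keys, so the defaults mask the custom values.
$defaults_win = $defaults + $custom;

// Reversed: the custom values win, and missing keys are filled from defaults.
$custom_wins = $custom + $defaults;
```

The + operator is a better fit here than array_merge(), which renumbers integer keys and would therefore scramble real CURLOPT_* option ids.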

Original issue reported on code.google.com by [email protected] on 3 Aug 2010 at 7:13

Additional data in $info

It would be good if the info passed to the callback also contained:
1. the URI this content was downloaded from
2. the number (array index) of this URL in the $urls array that we passed in
for download.

We often need to arrange the received data in the same order as the URLs
passed to the script for download. Right now, whatever downloads fastest
comes first if the callback function parses the page and stores the result
in a global array variable.
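
A minimal sketch of the reordering step, assuming the callback can somehow learn the index of the URL it is handling (which is what this request asks for). The completion order is simulated with a fixed array; no real requests are made, and the URLs are made up.

```php
<?php
$urls = array('http://a.example/', 'http://b.example/', 'http://c.example/');

// Results keyed by the index of the URL in $urls.
$results = array();

// Simulated completion order: the third URL finishes first.
foreach (array(2, 0, 1) as $index) {
    $results[$index] = 'body of ' . $urls[$index];
}

// Restore the order in which the URLs were submitted.
ksort($results);
```

After ksort() the results iterate in the same order as $urls, regardless of which download finished first.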

Original issue reported on code.google.com by [email protected] on 19 May 2010 at 5:09

The same conditional is being checked twice in the same statement in function rolling_curl(...) - sizeof()/count()

Hi,

There seems to be an erroneous (duplicated) check on the same conditional value 
in "function rolling_curl(...)" in the RollingCurl class.

// start a new request (it's important to do this before removing the old one)
if ($i < sizeof($this->requests) && isset($this->requests[$i]) && $i < count($this->requests)) {...}


$i < sizeof($this->requests)

and...

$i < count($this->requests)

are equivalent, because sizeof() and count() are aliases of each other, so the 
two checks always evaluate to identical values (it is the same function under 
two names)

q.v. http://php.net/manual/en/function.sizeof.php
In PHP, sizeof() is just an alias for the true function, which is count()

REQUEST:
If this is indeed considered erroneous;

please remove "$i < sizeof($this->requests)" from the above mentioned 
conditional statement in function rolling_curl

Additionally, renaming each sizeof(...) to count(...) would make the code a 
little more readable.
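
A quick sketch that checks the claim on a plain array: the condition with the duplicated sizeof() test and the shortened one give the same result at every index. The $requests array here is a stand-in for $this->requests.

```php
<?php
$requests = array('a', 'b', 'c');   // stand-in for $this->requests

$same = true;
for ($i = 0; $i <= count($requests); $i++) {
    // Condition as currently written, with the duplicated size check.
    $old = $i < sizeof($requests) && isset($requests[$i]) && $i < count($requests);
    // Condition with the redundant sizeof() check removed.
    $new = isset($requests[$i]) && $i < count($requests);
    $same = $same && ($old === $new);
}
```

$same stays true, including for the out-of-range index at the end of the loop.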

Many Thanks,

Mark S.

Original issue reported on code.google.com by [email protected] on 26 Oct 2010 at 11:43

Stop all further requests.

Hello,

is it somehow possible to stop all pending requests? 

Let's say I parse 1000 pages for the word curl. The script accesses those pages 
with 10 threads. Let's say the script found the word curl on the 589th page, it 
should stop all further requests/threads now.


Is that possible?

Original issue reported on code.google.com by [email protected] on 15 May 2012 at 11:40
