strgg / rolling-curl
Automatically exported from code.google.com/p/rolling-curl
Should get_options also check the request object for request-specific
headers?
Original issue reported on code.google.com by dianoga7
on 23 Feb 2010 at 9:50
What steps will reproduce the problem?
1. Attempt to use rolling-curl on RHEL with php 5.1.6
What is the expected output? What do you see instead?
No errors are thrown but no files download.
What version of the product are you using? On what operating system?
Redhat Enterprise Linux 5.4
PHP 5.1.6
Please provide any additional information below.
I've traced the problem to the function call curl_multi_info_read, which
requires PHP 5.2.0. (The documentation on php.net is in error, as it lists only
PHP 5.x as a requirement.) curl_multi_info_read returns null in all cases,
causing the rolling curl to fail.
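A version guard along the following lines could turn that silent failure into a visible one. This is only a sketch: `rolling_curl_supported` is a hypothetical helper name, not part of RollingCurl, and the 5.2.0 threshold is the one reported above.

```php
<?php
// Hypothetical helper (not part of RollingCurl): reports whether a PHP
// version meets the 5.2.0 minimum identified in this report.
function rolling_curl_supported($version = PHP_VERSION)
{
    return version_compare($version, '5.2.0', '>=');
}

// Warn loudly instead of silently downloading nothing.
if (!rolling_curl_supported()) {
    trigger_error(
        'RollingCurl needs PHP 5.2.0 or later for curl_multi_info_read()',
        E_USER_WARNING
    );
}
```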
Original issue reported on code.google.com by [email protected]
on 6 Oct 2010 at 7:40
I'm seeing a 130 KB memory loss every time RollingCurl finishes a single
"window" (5 requests in my case) that is never recovered. That is about the size
of the 5 scraped URLs; this data should be released after the results are passed on.
Original issue reported on code.google.com by [email protected]
on 17 Nov 2011 at 9:35
Sometimes it's not easy to determine which request the returned data belongs to.
To solve this, this patch allows adding an id when calling add(). The
id will then be returned with the resulting data.
Hope you find it as useful as I did.
Original issue reported on code.google.com by [email protected]
on 30 Dec 2014 at 5:15
Attachments:
The option CURLOPT_FOLLOWLOCATION is set by default in "protected $options"
(line 97), but an error occurs when in safe_mode. See the attached patch.
Original issue reported on code.google.com by [email protected]
on 8 Apr 2010 at 4:45
Attachments:
See the code below: I added the header Host: www.oozk.com, but it does not seem to work.
<?php
require("RollingCurl.php");
function request_callback($response, $info, $request) {
print_r($info);
echo "<hr>";
print_r($response);
echo "<hr>";
}
// top 20 sites according to alexa (11/5/09)
$urls = array("http://127.0.0.1",
"http://127.0.0.1");
$headers = "Host: www.oozk.com";
$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
foreach ($urls as $url) {
$request = new RollingCurlRequest($url,'GET',NULL,$headers);
$rc->add($request);
}
$rc->execute();
Original issue reported on code.google.com by [email protected]
on 4 Sep 2012 at 10:56
I have 10 URLs, but every time I execute them, the callback function is only
called 9 times. I read something on your blog about this, so is it possible
that this bug still exists?
Original issue reported on code.google.com by [email protected]
on 3 Dec 2009 at 4:11
Hi,
This includes several patches I needed, which are attached.
[PATCH 1/3] Add $request as parameter to callback function:
Currently the callback cannot tell which request finished, because the URL
might have changed due to a 301 redirect and is not unique anyway.
Passing the request to the callback fixes this and allows a very
flexible implementation.
I also renamed the Request class to be more RollingCurl-specific, as it
otherwise clutters the namespace.
[PATCH 2/3] Add rolling curl group:
This implementation uses OO programming to extend rolling curl so that groups
of requests can be finished together, which allows prioritizing and also
reporting back when each group is finished.
The usage is really easy:
Inheriting from the base class allows processing of groups and requests
directly from the class:
Test Class
======
class TestCurlRequest extends RollingCurlGroupRequest
{
public $test_verbose = false;
function process($output, $info)
{
echo "Processing " . $this->url . "\n";
if ($this->test_verbose)
print_r($info);
parent::process($output, $info);
}
}
class TestCurlGroup extends RollingCurlGroup {
function process($output, $info, $request)
{
echo "Group CB: Progress " . $this->name . " (" . ($this->finished_requests+1) . "/" . $this->num_requests . ")\n";
parent::process($output, $info, $request);
}
function finished()
{
echo "Group CB: Finished " . $this->name . "\n";
parent::finished();
}
}
Main function:
=========
$group = new TestCurlGroup("High");
$group->add(new TestCurlRequest("www.google.de"));
$group->add(new TestCurlRequest("www.yahoo.de"));
$group->add(new TestCurlRequest("www.newyorktimes.com"));
$reqs[] = $group;
$group = new TestCurlGroup("Normal");
$group->add(new TestCurlRequest("twitter.com"));
$group->add(new TestCurlRequest("www.bing.com"));
$group->add(new TestCurlRequest("m.facebook.com"));
$reqs[] = $group;
$reqs[] = new TestCurlRequest("www.kernel.org");
$rc = new GroupRollingCurl(); /* Note: No callback here, as its done in Request class*/
foreach ($reqs as $req)
$rc->add($req);
$rc->window_size = $window_size;
return $rc->execute();
---------
Due to the power of polymorphism, the same function (add) can be used for
adding requests and groups of requests.
The "callback" in requests and groups is:
process($output, $info)
and
process($output, $info, $request)
respectively. finished() is also available for groups.
[PATCH 3/3] Allow custom options to overwrite default ones
Issue #12 fixed in a patch. Dependent on patches before.
Hope you enjoy!
Please apply.
The patches are also available in the master branch of my fork of rolling-curl
at github:
http://github.com/LionsAd/rolling-curl
Best Wishes,
Fabian (LionsAd)
Original issue reported on code.google.com by [email protected]
on 5 Aug 2010 at 7:29
Attachments:
It would be nice to be able to use a class method as the callback. It is quite
easy to implement.
class RollingCurl {
...
function __construct($callback = null, $instance = null) {
$this->instance = $instance;
$this->callback = $callback;
}
}
and then call it like this:
if($instance){
$this->instance->$callback(...);
} else if(is_callable($callback)){
call_user_func($callback,...);
}
thank you.
Original issue reported on code.google.com by [email protected]
on 1 Mar 2011 at 12:40
$rc = new RollingCurl("request_callback");
foreach ($urls as $url) {
$request = new Request($url);
$request->options = array(CURLOPT_PROXY => $proxy,CURLOPT_TIMEOUT => 2,CURLOPT_CONNECTTIMEOUT => 2);
$rc->add($request);
}
We get responses like this:
[http_code] => 200
[total_time] => 17.635
[namelookup_time] => 0.18
[connect_time] => 0.571
[pretransfer_time] => 0.571
[starttransfer_time] => 17.635
or like this:
[http_code] => 200
[total_time] => 30.904
[namelookup_time] => 0.821
[connect_time] => 11.877
[pretransfer_time] => 11.897
[starttransfer_time] => 30.133
Even if the two timeouts are added together, the request clearly does not
finish within four seconds.
What could be the problem? In theory, since the request does go through the
proxy, the options are being passed to the library, so why do the timeouts
not take effect?
Original issue reported on code.google.com by [email protected]
on 4 Sep 2010 at 8:16
A long time ago (I don't remember exactly where) I read that a short pause
is recommended, because otherwise some versions of curl consume excess CPU.
That is, instead of
while(($execrun = curl_multi_exec($master, $running)) ==
CURLM_CALL_MULTI_PERFORM);
use
do{ $execrun = curl_multi_exec($master, $running); usleep(100); }
while($execrun==CURLM_CALL_MULTI_PERFORM);
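As an alternative to a fixed usleep(), the wait can be delegated to curl_multi_select(), which blocks until there is activity on one of the connections or a timeout elapses. The following is a minimal sketch of that pattern, not the library's code; `run_multi_handle` and its `$select_timeout` parameter are hypothetical names.

```php
<?php
// Hypothetical helper: drives a curl multi handle to completion without
// spinning the CPU. Returns the last curl_multi_exec() status code.
function run_multi_handle($master, $select_timeout = 1.0)
{
    $running = 0;
    do {
        // Process whatever is ready right now.
        do {
            $status = curl_multi_exec($master, $running);
        } while ($status === CURLM_CALL_MULTI_PERFORM);

        // Wait for socket activity. curl_multi_select() can return -1
        // immediately, so fall back to a short pause in that case.
        if ($running > 0 && curl_multi_select($master, $select_timeout) === -1) {
            usleep(100);
        }
    } while ($running > 0 && $status === CURLM_OK);

    return $status;
}
```

The usleep(100) fallback keeps the pause suggested above for the case where curl_multi_select() cannot wait.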
Original issue reported on code.google.com by [email protected]
on 21 May 2010 at 7:14
RollingCurl.php
Line346: $headers = $request->headers;
Original issue reported on code.google.com by [email protected]
on 7 Apr 2011 at 8:36
The outer do-while loop of the rolling_curl function terminates prematurely if
the $running flag is false, even if new requests have since been added to
$master and not started yet.
The conditional in RollingCurl.php:325 needs to check whether there are still
requests outstanding.
Original issue reported on code.google.com by [email protected]
on 3 Apr 2013 at 6:00
What steps will reproduce the problem?
1. Add Composer package description
2. One class in one file
3. PSR-0 autoloading
Original issue reported on code.google.com by [email protected]
on 13 Mar 2014 at 8:44
dghgfhgfhgf
Original issue reported on code.google.com by [email protected]
on 17 Sep 2011 at 7:07
Attachments:
What steps will reproduce the problem?
1. Fetch a URL that is wrong, or whose HTML $response is missing <title> tags,
or hit a server configuration issue that breaks cURL
2. Run example.php
What is the expected output? What do you see instead?
No output, as expected, but PHP raises:
PHP Notice: Undefined variable: title in /example.php on line 19
What version of the product are you using? On what operating system?
r20 Sep 12, 2010 on CentOS 6.2
Please provide any additional information below.
The PHP notice can be fixed by moving the echo $title inside the if block. e.g.,
if (preg_match("~<title>(.*?)</title>~i", $response, $out)) {
$title = $out[1];
echo "<b>$title</b><br />";
} else {
echo "<i>page title not found</i><br />";
}
Original issue reported on code.google.com by [email protected]
on 12 Apr 2014 at 11:21
What steps will reproduce the problem?
1. Install script on server that has "open_basedir" set
2. Run example.php
3. See error message
----------------------------
What is the expected output? What do you see instead?
EXPECTED
Standard example.php output
INSTEAD
Warning: curl_setopt_array() [function.curl-setopt-array]:
CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an
open_basedir is set
----------------------------
What version of the product are you using? On what operating system?
r20
PHP Version 5.2.14 on Linux, MediaTemple (gs)
----------------------------
Please provide any additional information below.
I was able to patch this by changing one line.
ORIGINAL
if (ini_get('safe_mode') == 'Off' || !ini_get('safe_mode')) {
NEW
if ((ini_get('open_basedir') == '') && (ini_get('safe_mode') == 'Off' ||
!ini_get('safe_mode'))) {
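The combined check can also be pulled into a small function so it is easier to read and test. This is a sketch only: `can_follow_location` is a hypothetical name, and its two parameters exist purely so the ini values can be injected; by default it reads the live settings.

```php
<?php
// Hypothetical helper: true when CURLOPT_FOLLOWLOCATION may be set, i.e.
// safe_mode is off (or absent) and no open_basedir restriction is set.
function can_follow_location($safe_mode = null, $open_basedir = null)
{
    if ($safe_mode === null) {
        $safe_mode = ini_get('safe_mode');      // string, or false if the setting does not exist
    }
    if ($open_basedir === null) {
        $open_basedir = ini_get('open_basedir');
    }

    $safe_mode_on = $safe_mode && strtolower($safe_mode) !== 'off';
    $basedir_set  = $open_basedir !== false && $open_basedir !== '';

    return !$safe_mode_on && !$basedir_set;
}
```

A caller would add CURLOPT_FOLLOWLOCATION to its options only when this returns true.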
Original issue reported on code.google.com by [email protected]
on 28 Oct 2010 at 3:14
With the current library setup there is no way to retrieve an error from the
curl handle as you would with curl_error(). To do so you need the curl
handle. I would suggest either passing the curl handle to the response
processor,
or
calling curl_error($ch) on each handle and modifying the Request object to have
an error property that stores the error. When the response processor gets the
response, it can check the request for errors and handle them however it
likes.
The information in curl_getinfo is not sufficient to describe what went wrong
when a curl request goes awry.
Original issue reported on code.google.com by [email protected]
on 19 Dec 2010 at 2:47
If another request is added after executing a single request, it won't run
as a single request since the first one still exists.
Sample Fix:
[code]
private function single_curl() {
$ch = curl_init();
$options = $this->get_options($this->requests[0]);
curl_setopt_array($ch,$options);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
// Remove the request from the queue and reindex
unset($this->requests[0]);
$this->requests = array_values($this->requests);
// it's not necessary to set a callback for one-off requests
if ($this->callback) {
$callback = $this->callback;
if (is_callable($this->callback)){
call_user_func($callback, $output, $info);
}
} else {
return $output;
}
}
[/code]
Original issue reported on code.google.com by dianoga7
on 24 Feb 2010 at 5:22
Is it possible to add a delay before each thread executes?
Original issue reported on code.google.com by [email protected]
on 2 Jun 2010 at 11:45
Small typo:
if ($i < sizeof($requests)) {
$ch = curl_init();
$options = $this->get_options($this->requests[$i++]); // note the increment on i
curl_setopt_array($ch,$options);
....................
Should be:
if ($i < sizeof($this->requests)) {
$ch = curl_init();
.........
Original issue reported on code.google.com by [email protected]
on 31 Mar 2010 at 8:25
This module is really wonderful... it saved the day in my application, which
otherwise I would have had to move to a threading solution!
--------------------
However, the call syntax is not quite optimal for what I'm doing. I am
processing huge XML files that are too big to load in memory, and too big to
completely process before starting the curl operation. This syntax would be
ideal:
<?php
require("RollingCurl.php");
function request_callback
($response, $info, $request, $callback_parameter) {
...
}
$rc = new RollingCurl("request_callback");
$rc->window_size = 20;
$rq = new RollingCurlRequest();
while( $xml = get_next_xml_element() ) {
$rq->url($xml['url']);
$rq->callback_parameter = $xml;
$rc->execute_until_blocked($rq); // Blocks if queue full
}
$rc->finish(); // Returns after last pending request is done
?>
Then I can maintain my streaming process, yet still stuff requests into curl
as fast as they will go. Also note the extra parameter that gets passed to the
callback.
Original issue reported on code.google.com by [email protected]
on 9 Jul 2011 at 5:13
Hello, first of all thank you very much for your RollingCurl class.
Is it possible to add an ID to every URL, so I know which URL has finished?
I have 5 calls to the same URL, each with different options. I do
something like this:
$urls = array(array("url", options1), array("url", options2));
Is there a way to know in the callback function which one finished?
thanks
Original issue reported on code.google.com by [email protected]
on 27 Mar 2010 at 7:16
$request->headers
Original issue reported on code.google.com by [email protected]
on 16 Apr 2012 at 7:22
Attachments:
It's better to use call_user_func, since it makes it possible to call
class methods.
$callback = $this->callback;
if (is_callable($callback)){
call_user_func($callback, $output, $info);
}
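To illustrate the point, a class method can be passed as an array of object and method name, and call_user_func() will invoke it. The class and URL below are hypothetical and only stand in for a real handler:

```php
<?php
// Hypothetical handler class, used only for illustration.
class TitleCollector
{
    public $seen = array();

    public function handle($output, $info)
    {
        // Record which URL this response belongs to.
        $this->seen[] = $info['url'];
    }
}

$collector = new TitleCollector();
$callback  = array($collector, 'handle'); // object + method name

if (is_callable($callback)) {
    call_user_func($callback, '<html></html>', array('url' => 'http://example.com/'));
}
```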
Original issue reported on code.google.com by alexander.makarow
on 7 Feb 2010 at 3:42
Hi,
This isn't an issue, more of a feature request. I think it'd be very handy
to be able to set the callback function on a per-request basis, rather than
one callback for all requests. Could you guys possibly have a think about
making such a change?
thank you
Original issue reported on code.google.com by [email protected]
on 23 Feb 2010 at 12:57
I was having trouble trying to set my own CURLOPT_TIMEOUT and
CURLOPT_CONNECTTIMEOUT values in a Request instance. I'm not sure if this is
intentional, but the default values can't be overwritten because of line 124.
Simply reversing the arithmetic will fix this issue.
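Assuming line 124 combines the two option arrays with the + operator, the operand order decides which side wins: on duplicate keys, + keeps the value from the left operand. A minimal sketch, with string keys standing in for the CURLOPT_* constants so that it runs without the curl extension:

```php
<?php
// String keys are placeholders for CURLOPT_TIMEOUT etc. (assumption for illustration).
$defaults = array('timeout' => 30, 'connect_timeout' => 30, 'return_transfer' => 1);
$custom   = array('timeout' => 2, 'connect_timeout' => 2);

// With "+", the left operand wins on duplicate keys.
$defaults_win = $defaults + $custom; // custom timeouts are ignored
$custom_wins  = $custom + $defaults; // custom timeouts override the defaults
```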
Original issue reported on code.google.com by [email protected]
on 3 Aug 2010 at 7:13
It would be good if the info passed to the callback also contained:
1. the URI from which the content was downloaded
2. the number (array index) of that URL in the $urls array that was passed
in for downloading.
Often the received data has to be arranged in the same order as the URLs
passed to the script. At the moment, whatever downloads fastest comes first
if the callback function parses the page and stores the result in a global
array variable.
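Until the library provides such an index, one workaround is to keep a URL-to-position map and sort the collected results by key afterwards. This is a sketch only: it assumes the URLs are unique and that $info['url'] matches the requested URL (which does not hold after a redirect), and the callback and example URLs are hypothetical.

```php
<?php
$urls = array('http://example.com/a', 'http://example.com/b', 'http://example.com/c');
$index_by_url = array_flip($urls); // url => position in $urls
$results = array();

// Hypothetical callback: files each response under its original index.
$store_result = function ($response, $info) use (&$results, $index_by_url) {
    $results[$index_by_url[$info['url']]] = $response;
};

// Simulate completions arriving out of order.
$store_result('C', array('url' => 'http://example.com/c'));
$store_result('A', array('url' => 'http://example.com/a'));
$store_result('B', array('url' => 'http://example.com/b'));

ksort($results); // restore the order of $urls
```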
Original issue reported on code.google.com by [email protected]
on 19 May 2010 at 5:09
Hi,
There seems to be an erroneous (duplicated) check on the same conditional value
in "function rolling_curl(...)" in the RollingCurl class.
// start a new request (it's important to do this before removing the old one)
if ($i < sizeof($this->requests) && isset($this->requests[$i]) && $i < count($this->requests)) {...}
$i < sizeof($this->requests)
and...
$i < count($this->requests)
are aliases of each other and will therefore always return identical values
(they are the same function under two names).
q.v. http://php.net/manual/en/function.sizeof.php
In PHP, sizeof() is just an alias of count().
REQUEST:
If this is indeed considered erroneous;
please remove "$i < sizeof($this->requests)" from the above mentioned
conditional statement in function rolling_curl
Additionally renaming each sizeof(..) to count(...) would make the code a
little more semantic when reading it.
Many Thanks,
Mark S.
Original issue reported on code.google.com by [email protected]
on 26 Oct 2010 at 11:43
Hello,
is it somehow possible to stop all pending requests?
Let's say I parse 1000 pages for the word curl. The script accesses those pages
with 10 threads. Let's say the script found the word curl on the 589th page, it
should stop all further requests/threads now.
Is that possible?
Original issue reported on code.google.com by [email protected]
on 15 May 2012 at 11:40