Use Case
Our system is paginating a Doctrine Query object with the KnpPaginator (via the Knp PaginatorBundle) in order to execute a batch-processing job.
KnpPaginator is the preferred paginator system due to its ability to abstract and deal with multiple data sources for pagination.
The batch processing system calls the ->paginate()
method repeatedly (~140 times) to fetch batches of results for processing. Yes, there's ways we can do this more efficiently, but we're slowly working on the Paginator to make these scenarios faster while still maintaining the desired level of abstraction.
High-Level Problem
Repeated calls the pagination service lead to excessive memory usage.
The paginator slowly becomes unresponsive with each iteration of the ->paginate()
call.
Factors
- Symfony 2.3
- PHP 5.6.27 with OPcache
- Using Monolog
NullHandler
attached to TraceableEventDispatcher
, to eliminate memory leak during logging of the EventDispatcher system.
- Paginate item reduced to a basic array, as opposed to a query.
Example Code
$paginator = $container->get('knp_paginator');
for ($i=0; $i<1000; $i++) {
$pagination = $paginator->paginate([1,2,3,4,5,6, 7, 8, 9, 10], $i, 10);
}
Low-Level Problem
\Knp\Component\Pager\Paginator
maintains its own eventDispatcher
for its entire life, and is never reset or replaced. In Symfony this is the Kernel EventDispatcher
The eventDispatcher
is populated with these basic subscribers:
\Knp\Component\Pager\Event\Subscriber\Paginate\PaginationSubscriber
\Knp\Component\Pager\Event\Subscriber\Sortable\SortableSubscriber
Either during Construct or as part of the service definition in the KnpPaginatorBundle.
The Paginator can also be added to using the subscribe()
and connect()
methods, which indicates that the Paginator and EventDispatch is supposed to act as a single-instance service for the lifetime of the application.
\Knp\Component\Pager\Paginator#paginate()
has a BeforeEvent
(knp_pager.before
) which is used to prime the paginator to act upon paginatable items.
This is created as follows:
https://github.com/KnpLabs/knp-components/blob/master/src/Knp/Component/Pager/Paginator.php#L107
$beforeEvent = new Event\BeforeEvent($this->eventDispatcher);
\Knp\Component\Pager\Event\Subscriber\Paginate\PaginationSubscriber#before
is subsequently called, which is used to attach (lazy-load) the standard paginatable subscribers:
public function before(BeforeEvent $event)
{
$disp = $event->getEventDispatcher();
// hook all standard subscribers
$disp->addSubscriber(new ArraySubscriber);
$disp->addSubscriber(new Doctrine\ORM\QueryBuilderSubscriber);
// ... etc ...
}
These calls to addSubscriber
are actually being affected on the eventDispatcher
attach to the persistent Paginator
class (in Symfony, this is the Kernel Event Dispatcher).
Effectively, each time we're calling ->paginate()
the knp_pager.before
event is being called, which attached more subscribers to the Kernel Event Dispatcher. This eventually results in a significant memory leak.
Subscribers that exhibit this behaviour:
\Knp\Component\Pager\Event\Subscriber\Paginate\PaginationSubscriber
\Knp\Component\Pager\Event\Subscriber\Filtration\FiltrationSubscriber
\Knp\Component\Pager\Event\Subscriber\Sortable\SortableSubscriber
Intended use?
\Knp\Component\Pager\Event\BeforeEvent
looks like its was designed purely to perform this lazy-loading Subscriber registration action, as it only provides scope over the EventDispatcher and nothing else.
The fact that BeforeEvent can be called multiple times seems like an oversight in the design of the ->paginate()
method.
Proposed fix
The 3 subscriber-loading-subscribers which cause this problem in the first place are merely acting as lazy-loaders.
These 3 classes can be augmented to hold state, which will prevent them from executing more than once on any single EventDispatcher.
This keeps the lazy-loading aspect firmly sioled to the domain of the Subscriber, and lets the Paginator not care about how Subscribers are loaded.
Implementors with similar subscribers will have to add this behaviour to their own subscribers in order to prevent the duplication and subsequent memory leak bug.
What now
I'll be submitting a PR shortly (~1 day) to provide the fix proposed above.
If there is issue with this solution (and I looked at ~5) please let me know so I can amend the fix appropriately.