

Krinkle commented on August 16, 2024

@JohntheFish Are you able to share a test case to reproduce this?

For example:

$lessFile = __DIR__ . '/myinput.less';

// no cache
$parser = new Less_Parser();
$parser->parseFile( $lessFile );
$parser->getCss();

// cache clear
// $ rm -rf /tmp/less_php_cache/
// cache miss
$options = [ 'cache_dir' => '/tmp/less_php_cache/' ];
$files = [ $lessFile => '' ];
Less_Cache::Get( $files, $options );
// cache hit
Less_Cache::Get( $files, $options );

If you run this from an ad-hoc PHP script, does it run out of memory only in the cache-hit case, or also in one or both of the no-cache and cache-miss cases?

It looks like in your case, you're running out of memory possibly by only a small margin (slightly more than the 128M limit). In that case, it would be useful to determine the following:

  1. How much more does it need to succeed?
  2. How small can you make the limit before the no-cache script fails as well?

You can run the script like php -d memory_limit=256M myscript.php and increase/decrease accordingly.

This would help determine whether the majority of the memory consumption (and a possible leak) lies in how the tree is represented in general, or whether there is a notable cost specific to caching. I suspect caching only makes a small difference, and that your input perhaps exposes a general problem in how the tree is represented.
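One way to get these numbers in a single run, rather than re-running with different memory_limit values, is to record the peak memory of each case separately. The sketch below assumes PHP 8.2 or later for memory_reset_peak_usage(); on older versions it falls back to an upper bound, because an earlier, higher peak cannot be cleared. The helper name measure_peak() is hypothetical, and the commented-out usage assumes less.php is loaded and the variables from the example above are defined.

```php
<?php
/**
 * Run $fn and return the peak memory (in bytes) reached above the usage
 * at the time of the call. Hypothetical helper, not part of less.php.
 */
function measure_peak(callable $fn): int
{
    $baseline = memory_get_usage();
    // PHP >= 8.2: reset the peak so each phase is measured on its own.
    // Older versions: the result is only an upper bound.
    if (function_exists('memory_reset_peak_usage')) {
        memory_reset_peak_usage();
    }
    $fn();
    return memory_get_peak_usage() - $baseline;
}

// Hypothetical usage (assumes less.php is loaded and $lessFile, $files
// and $options are defined as in the example above):
//
// printf("no cache:   %d MiB\n", measure_peak(function () use ($lessFile) {
//     $parser = new Less_Parser();
//     $parser->parseFile($lessFile);
//     $parser->getCss();
// }) >> 20);
// printf("cache miss: %d MiB\n", measure_peak(fn () => Less_Cache::Get($files, $options)) >> 20);
// printf("cache hit:  %d MiB\n", measure_peak(fn () => Less_Cache::Get($files, $options)) >> 20);
```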

If so, I would need a copy of your input file or a simplified version of it. I suspect there is a specific kind of syntax in one of your files that is represented in a recursive or otherwise very inefficient manner. However, that is difficult to find without an example.

For example, if your input file exceeds, let's say, 64M of memory even without caching, then caching is likely not the real issue, but rather something in how Less.php represents your input file. In that case, I would suggest iteratively removing half of your input file and narrowing down, bit by bit, which kind of Less syntax triggers the leak. If you're comfortable sharing the entire input file/directory, I could help with that. GitHub allows attaching ZIP files as well.
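The halving approach can be automated. The following is a minimal sketch, not part of less.php: bisect_failing() is a hypothetical helper that takes the input split into chunks (for example lines, or whole rule blocks so each half stays valid Less) and a caller-supplied predicate reporting whether a given subset still reproduces the problem, such as compiling it and checking memory against a budget. It assumes the full input fails to begin with.

```php
<?php
/**
 * Keep whichever half of the input still fails on its own, until one
 * chunk is left or neither half fails alone. Hypothetical helper.
 *
 * @param string[] $chunks Input split into pieces (lines, rule blocks, ...).
 * @param callable $fails  Predicate: receives a string[] subset, returns
 *                         true if that subset still reproduces the problem.
 * @return string[] The smallest contiguous slice found that still fails.
 */
function bisect_failing(array $chunks, callable $fails): array
{
    while (count($chunks) > 1) {
        $mid = intdiv(count($chunks), 2);
        $first = array_slice($chunks, 0, $mid);
        $second = array_slice($chunks, $mid);
        if ($fails($first)) {
            $chunks = $first;
        } elseif ($fails($second)) {
            $chunks = $second;
        } else {
            // The trigger needs content from both halves.
            break;
        }
    }
    return $chunks;
}
```

Stopping early signals that the trigger depends on content from both halves, for example a mixin defined in one half and used in the other.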

from less.php.

JohntheFish commented on August 16, 2024

I can confirm it only hits the memory limit in the cache hit case.

  • No cache always works
  • A cache miss always works
  • A cache hit fails on some files. The file content is successfully read, but the memory limit is hit by unserialize.

Upping the memory limit resolves the problem. (I doubled it to 256M).

For some more diagnostics, I set it down again to 128M.

Working through the source files individually (or minimal subsets of them where there is interdependence), no individual file (or minimal subset) results in the issue. They all cache and cache reading is always successful.

Going back to the overall compilation, I added some logging to identify the last file read from cache before an unserialize broke. This identified the corresponding source file, but commenting that source out of the overall compilation just moved the failure: the memory limit was then hit while unserializing a cache file a few files later. The unserialize tended to break on cache files originating from larger source files rather than on any particularly complex rules. However, if the cause of the problem is cumulative memory consumption by successive unserialize calls, that would be expected.
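The per-file logging described above could look roughly like this sketch. unserialize_logged() is a hypothetical wrapper, not a less.php function; the cache-reading code would need to be patched to call it. If the retained memory reported for each file adds up steadily until the limit is hit, that supports the cumulative explanation over a single expensive file.

```php
<?php
/**
 * Read and unserialize a cache file, logging its size on disk, the memory
 * retained after unserializing it, and the process peak so far.
 * Hypothetical helper for diagnostics only.
 */
function unserialize_logged(string $cacheFile, callable $log)
{
    $before = memory_get_usage();
    $data = unserialize(file_get_contents($cacheFile));
    $log(sprintf(
        '%s: %d bytes on disk, %+d bytes retained, peak %d bytes',
        basename($cacheFile),
        filesize($cacheFile),
        memory_get_usage() - $before,
        memory_get_peak_usage()
    ));
    return $data;
}
```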

Searching for "unserialize" issues in the PHP GitHub repository, this issue looks promising: php/php-src#10126

Anyway, thanks for prompting me about memory limits. Increasing to 256M gets round the problem.


JohntheFish commented on August 16, 2024

A quick experiment to dodge memory limits. The magic fudge factor of 0.5 worked for my current project (0.6 failed). The magic 0.5 is obviously application specific - increase the size of a source file and the magic 0.5 could be invalidated.

With that in mind, this code is not a workable solution, not least because work has already been done saving a cache file that then goes unused.

    case 'serialize':
        // Experiment: only unserialize when current usage is below half of
        // memory_limit (assumes the limit is written in megabytes, e.g. "128M").
        if (preg_match('/^(\d+)M$/', ini_get('memory_limit'), $matches)) {
            if (memory_get_usage() < (0.5 * 1024 * 1024 * $matches[1])) {
                $cache = unserialize(file_get_contents($cache_file));
            }
        }

        if ($cache ?? null) {
            touch($cache_file);
            $this->UnsetInput();
            return $cache;
        }
        break;

I won't be leaving the above in the code; simply increasing memory to 256M is less messy (though again, I expect that figure is application specific).
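If a guard like this were kept, note that the '^(\d+)M$' pattern only matches limits written in megabytes: with a limit such as "1G", or "-1" for unlimited, the match fails and the cache is never read. A minimal sketch of a more general parser follows; memory_limit_bytes() and below_memory_fraction() are hypothetical helper names, and the fraction remains the application-specific fudge factor discussed above.

```php
<?php
/**
 * Convert a php.ini memory_limit value to bytes. Handles the K, M and G
 * shorthand suffixes (case-insensitive); returns null for "-1" (unlimited).
 */
function memory_limit_bytes(string $value): ?int
{
    $value = trim($value);
    if ($value === '-1') {
        return null;
    }
    if (!preg_match('/^(\d+)\s*([KMG]?)$/i', $value, $m)) {
        throw new InvalidArgumentException("Unrecognised memory_limit: $value");
    }
    $bytes = (int) $m[1];
    // Deliberate fall-through: G multiplies three times, M twice, K once.
    switch (strtoupper($m[2])) {
        case 'G':
            $bytes *= 1024;
            // fall through
        case 'M':
            $bytes *= 1024;
            // fall through
        case 'K':
            $bytes *= 1024;
    }
    return $bytes;
}

/** True if current usage is below $fraction of the limit (always true if unlimited). */
function below_memory_fraction(float $fraction): bool
{
    $limit = memory_limit_bytes((string) ini_get('memory_limit'));
    return $limit === null || memory_get_usage() < $fraction * $limit;
}
```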


Krinkle commented on August 16, 2024

@JohntheFish Thanks, that makes sense. The 4x increase in memory described in php/php-src#10126 would indeed get you over the limit much more quickly.

I suspect there might still be something we can do to reduce memory usage. For example, if you take any leaf input file and concatenate copies of the same input file until you reach roughly the same SLoC as your actual input file, I suspect it would not reach the limit. That is to say, some input is more "expensive" than others, and probably a certain combination of operators or mixins in your source code, perhaps some kind of seemingly-recursive Less logic, might be leading to a disproportionate amount of memory being consumed, compared to other input code with the same number of lines/tokens.

Having said that, let me share what we do for Wikipedia in MediaWiki production. We don't use the cache feature of less.php, in part due to a policy against storing serialized PHP, but also in part for performance. We find we get way better cache-hit performance by storing the resulting CSS code rather than the intermediary Less Tree. This requires a few more lines of code on the caller side, but is something we've been doing since before we adopted the current less.php library.

You can find our code at https://github.com/wikimedia/mediawiki/blob/1.41.0/includes/ResourceLoader/FileModule.php#L1092, but it basically boils down to:

  • After a cache miss, store the result of $parser->getCSS(), along with the result of $parser->AllParsedFiles() and the result of md5( implode( '', array_map( 'md5_file', $files ) ) ).
  • Cache this in memory with apcu_store, under a key built from 'lessphp-css', md5_file($lessFilePath), $lessFilePath, $vars and $importDirs.
  • On a cache hit, read $data['files'] and compute the same md5_file mapping for the current point in time; this tells you whether any indirectly imported files have changed. If the combined hash is still the same, return $data['css'] and consider it a real cache hit. Otherwise, treat it as a cache miss.
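The three steps above can be sketched as follows. This is not the MediaWiki implementation: for a self-contained example, a plain PHP array passed by reference stands in for APCu (apcu_fetch/apcu_store), and $compile is a hypothetical caller-supplied function that runs Less_Parser and returns ['css' => ..., 'files' => ...] from getCss() and AllParsedFiles(). The exact key format is also an assumption.

```php
<?php
/** Combined hash of a set of files; changes if any of them changes or disappears. */
function files_hash(array $files): string
{
    return md5(implode('', array_map(
        fn (string $f) => is_file($f) ? md5_file($f) : 'missing',
        $files
    )));
}

/**
 * Return compiled CSS for $lessFile, reusing a cached result when none of
 * the files that went into it have changed. $cache stands in for APCu.
 */
function get_css_cached(string $lessFile, array $vars, array $importDirs, callable $compile, array &$cache): string
{
    $key = 'lessphp-css:' . md5(json_encode([md5_file($lessFile), $lessFile, $vars, $importDirs]));

    $data = $cache[$key] ?? null;
    if ($data !== null && files_hash($data['files']) === $data['hash']) {
        return $data['css']; // real hit: no imported file has changed
    }

    // Miss or stale entry: compile and record what the result depends on.
    $result = $compile($lessFile, $vars, $importDirs);
    $cache[$key] = [
        'css' => $result['css'],
        'files' => $result['files'],
        'hash' => files_hash($result['files']),
    ];
    return $result['css'];
}
```

Storing the final CSS means a hit costs one lookup plus hashing the dependency list, with no unserialize of a parse tree at all.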

If you read the actual MediaWiki code, you'll find it does a few more things. We actually use hash('md4') instead of md5(). In addition, we avoid calling md5_file and instead try to maximise use of the operating system's fstat cache by caching the result of hash('md4', file_get_contents()) in APCu under a key based on filepath:mtime from filemtime. That way, in the common case all we do is read a key with apcu_fetch, and to validate the hashes the "map" operation only calls filemtime a bunch of times (cheap), using the results to fetch each cached file hash from APCu. We don't use the mtime itself as the final validation, as we find mtimes aren't reliable over time (e.g. Git doesn't track them, so they may not be deterministic).
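The mtime-keyed hash lookup could be sketched like this, again with an array standing in for APCu; the function names are hypothetical and the sketch assumes the files exist. The mtime only decides which cache entry to read; what gets compared is still a hash of the file contents, in line with not trusting the mtime as the final validation.

```php
<?php
/**
 * Content hash of $path, memoized under "path:mtime" so that the common
 * case costs one filemtime() call plus a lookup instead of reading the file.
 * $cache is an array standing in for APCu in this sketch.
 */
function file_hash_cached(string $path, array &$cache): string
{
    $key = $path . ':' . filemtime($path);
    if (!isset($cache[$key])) {
        $cache[$key] = hash('md4', file_get_contents($path));
    }
    return $cache[$key];
}

/** Combined hash over many files, built from the memoized per-file hashes. */
function files_hash_cached(array $files, array &$cache): string
{
    $hashes = [];
    foreach ($files as $file) {
        $hashes[] = file_hash_cached($file, $cache);
    }
    return hash('md4', implode('', $hashes));
}
```

Because filemtime() has one-second resolution, a file rewritten within the same second as the previous read keeps its old cached hash until its mtime changes.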

