Git Product home page Git Product logo

spaceleak's Introduction

Spaceleak detection

Every large Haskell program almost inevitably contains space leaks. Space leaks are often difficult to detect, but relatively easy to fix once detected (typically insert a !). This page gives a simple technique to detect some space leaks, along with a set of blog posts using that technique and detailing other space leaks.

Spaceleak stack-limiting technique

The stack-limiting technique is useful for detecting a common kind of spaceleak, a situation where an excessive number of interdependent thunks are created and eventually evaluated. The technique works because thunks are evaluated on the stack, so by limiting the stack size we effectively limit the number of interdependent thunks that we can evaluate.

To find space leaks, given a program and a representative run (e.g. the test suite, a suitable input file):

  • Compile the program for profiling, e.g. ghc --make Main.hs -rtsopts -prof -fprof-auto.
  • Run the program with a specific stack size, e.g. ./Main +RTS -K100K to run with a 100Kb stack.
  • Increase/decrease the stack size until you have determined the minimum stack for which the program succeeds, e.g. -K33K.
  • Reduce the stack by a small amount and rerun with -xc, e.g. ./Main +RTS -K32K -xc.
  • The -xc run will print out the stack trace on every exception, look for the one which says stack overflow (likely the last one) and look at the stack trace to determine roughly where the leak is.
  • Attempt to fix the space leak, confirm by rerunning with -K32K.
  • Repeat until the test works with a small stack, typically -K1K.
  • Add something to your test suite to ensure that if a space leak is ever introduced then it fails, e.g. ghc-options: -with-rtsopts=-K1K in Cabal.

This technique does not detect when an excessive number of thunks are created but never evaluated, or when a small number of thunks hold on to a large amount of live data.

Note that typically the main thread requires greater than 1K stack, and that once GHC crosses 1K it makes the stack 32K bigger, as a result anything less than 33K (e.g. 1K) is often rounded up to 33K. To obtain a more precise stack measurement use -kc2k which increases the stack in 2K chunks (but does cause only half the stack to be used, so bear that in mind when scaling -K numbers). That said, 32K corresponds to approximately 1000 stack evaluation slots, which suggests you don't have any significant space leaks.

Below are links to further information, including lots of practical examples of real space leaks caught using the method above.

Talks

Blog tales

Fixes

Code changes

Publications

Other approaches

Notes

On the main thread the stack limit is less effective, usually more like 8K if you request 1K. On spawned threads it seems much better. Solution is to always join . onceFork on the main thread.

spaceleak's People

Contributors

ndmitchell avatar prednaz avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

prednaz doytsujin

spaceleak's Issues

Jumps in stack limits?

The README uses -K33K as an example for a stack limit that is required for a program to run. And while I experimented with this technique I observed the exact same limit a couple of times.

I don't think this is a coincidence!

More specifically, I repeatedly had situations where a program completed successfully with -K1K. When I then incrementally increased the input size it would continue to work until at some point I would observe a jump where it would require -K33K.

One explanation for this would be that stack limits are enforced in 32K blocks.

An other thought, I haven't used join . onceFork. @ndmitchell in the README you mention a minimum stack limit of 8K for the main thread. Do you know if this was for x86? Could it be that for x86_64 this is actually 32K?

Add rationale to README?

Hey @ndmitchell! I'm still experimenting with this approach, but you may have a new convert. What I think would be useful is some short rational on how this technique works to the README.

As I understand it the key idea is: Thunks are evaluated on the stack => by limiting the stack size we limited the amount of thunking that can happen

How about something along the lines of:


This technique is useful for detecting a common kind of spaceleak, a situations where an excessive number of interdependent thunks are created and eventually evaluated.

This technique works because thunks are evaluated on the stack. By limiting the stack size we effectively limit the number of interdependent thunks that we can evaluate.

This technique falls short

  • when an excessive number of thunks is created but never evaluated
  • when a thunk chain is tail recursive

This technique will also not detect situations where a small number of thunks holds on to a large amount of live data.


Would that be accurate?

Question regarding your talk at eXchange 2016

Hi, great talk!

Regarding slide 4 of your talk, it is not clear to me, what you mean by "ideal solution".
Isn't [1..10] or [1,2,3,4,5,6,7,8,9,10] as an argument actually producing an graph-of-expressions ?

So actually isn't Haskell also laying-out this list first completely... just not as a list of Integers but as a list of thunks ?

My first reaction to this slide was, wow here is a language 'producing' Integers on the fly, or by demand.. if you will.
But this actually isn't really the case, right ?

Best,
Joerg

Experience Report

Trying out this technique on a large program with many dependencies I found that compiling with profiling seems to distort stack usage too much for the results to be useful. That is following the process here resulted in chasing "leaks" in dependencies where none exist (or at least no high stack usage) in a non-profile build (in my case even one with -O0).

I was using stack ... -profile and frankly don't have a great idea about what flags ghc was passed for my program and its dependencies.

Do you think it would be a good suggestion to first try running an optimized build with K1K and only to undertake this exercise if it fails?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.