
The Haskell performance checklist

You have a Haskell program that's not performing how you'd like. Use this list to check that you've done the usual steps to performance nirvana:

Are you compiling with -Wall?

Are you compiling with -O or above?

Have you run your code with the profiler?

Have you checked for stack space leaks?

Have you set up an isolated benchmark?

Have you looked at strictness of your function arguments?

Are you using the right data structure?

Are your data types strict and/or unpacked?

Did you check your code isn't too polymorphic?

Do you have an explicit export list?

Have you looked at the Core?

Have you considered unboxed arrays/STRefs/etc?

Are you using Text or ByteString instead of String?

Have you considered compiling with LLVM?

Are you compiling with -Wall?

With -Wall, GHC warns about type defaulting and missing type signatures:

  • If you let GHC default the type of an integer literal, it picks Integer. Integer is an arbitrary-precision type and can be around 10x slower than the machine-sized Int. So choose your types explicitly.
  • Explicit type signatures make it harder to miss an obviously slow type sneaking in.
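Here is a small sketch of the defaulting problem. You can load it in GHCi. The names `slowTotal` and `fastTotal` are made up for illustration. With -Wall on, GHC flags the first binding twice: once for the missing signature and once for defaulting to Integer.

```haskell
{-# OPTIONS_GHC -Wall #-}

-- No signature: the Num/Enum constraints are defaulted to Integer, and
-- -Wall reports both -Wmissing-signatures and -Wtype-defaults here.
slowTotal = sum [1 .. 1000000]

-- An explicit signature selects the machine-sized Int instead.
fastTotal :: Int
fastTotal = sum [1 .. 1000000]
```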

Are you compiling with -O or above?

By default, GHC does not optimize your programs (it compiles at -O0). Cabal and Stack build with -O by default, but if you're calling ghc directly, don't forget to add -O.

Use -O2 to enable every non-dangerous optimization, at the cost of longer compile times.
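The optimization level is usually set for the whole build. GHC also accepts it per module in an OPTIONS_GHC pragma. In this sketch, `dot` is just a placeholder function.

```haskell
{-# OPTIONS_GHC -O2 #-}
-- Per-module optimization level. More commonly this is set for the whole
-- build instead, e.g. `ghc -O2 Main.hs` or `ghc-options: -O2` in a
-- .cabal file's build stanza.

-- Hypothetical example function; any module contents benefit alike.
dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)
```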

Have you run your code with the profiler?

Profiling is the standard way to find out, for the expressions in your program:

  1. How many times they run.
  2. How much they allocate.
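As a rough sketch, you can add manual cost centres with SCC pragmas and read the report the profiled binary writes. The function and labels here are hypothetical. SCC pragmas are ignored in non-profiled builds. The build commands in the comments assume a `Main.hs` whose `main` calls `mean`.

```haskell
-- Build and run with profiling, e.g.:
--   ghc -prof -fprof-auto -rtsopts Main.hs
--   ./Main +RTS -p -RTS      (writes the report to Main.prof)
-- -fprof-auto adds cost centres to bindings automatically; the SCC
-- pragmas below add hand-named ones around specific expressions.

mean :: [Double] -> Double
mean xs =
  ({-# SCC "mean_sum" #-} sum xs)
    / ({-# SCC "mean_length" #-} fromIntegral (length xs))
```

Traversing `xs` twice keeps the whole list alive between the two passes. That is exactly the kind of cost a heap profile tends to reveal.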

Resources on profiling:

Did you try weighing your operations?

Check that your operations don't allocate more than you'd expect:

https://github.com/fpco/weigh#readme

Allocation on a garbage-collected heap is claimed to be "fast", but not allocating at all is always faster.

Have you checked for stack space leaks?

Most space leaks result in an excess use of stack. If you look for the part of the program that results in the largest stack usage, that is the most likely space leak, and the one that should be investigated first.
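The classic stack leak is a lazy accumulator. The two functions below are illustrative. One way to surface such leaks is to run with a deliberately small stack limit, e.g. `+RTS -K1K -RTS` (the binary must be built with -rtsopts), and see which code overflows first.

```haskell
import Data.List (foldl')

-- Lazy left fold: without optimization, this builds a chain of
-- (((0 + 1) + 2) + ...) thunks that needs stack proportional to the
-- list length when it is finally forced. At -O, GHC's strictness
-- analysis often rescues this particular case for Int.
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step, so it
-- runs in constant stack.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
```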

Resource on stack space leak:

Have you set up an isolated benchmark?

Benchmarking is tricky to get right, especially when timing small pieces of code. Haskell is lucky to have a very good benchmarking package. If you ask someone for help, providing a benchmark helps them, and they are likely to ask for one anyway.

Do it right and use Criterion.

Resources on Criterion:

Have you looked at strictness of your function arguments?

https://wiki.haskell.org/Performance/Strictness
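A common fix is to make accumulator arguments strict with bang patterns. In this sketch, `average` and its helper `go` are illustrative names.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Arithmetic mean in a single pass. The bang patterns force the running
-- sum and count at every step, so they stay evaluated numbers instead of
-- growing chains of (+) thunks. An empty list yields NaN (0 / 0).
average :: [Double] -> Double
average = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n [] = s / fromIntegral n
    go !s !n (x : xs) = go (s + x) (n + 1) xs
```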

Are you using the right data structure?

This GitHub organization provides comparative benchmarks for several kinds of data structures. You can often use them to decide which data structure suits your problem best:

  • sets - for set-like things
  • dictionaries - dictionaries, hashmaps, maps, etc.
  • sequences - lists, vectors/arrays, sequences, etc.

Tip: Lists are almost always the wrong data structure. But sometimes they are the right one.

See also HaskellWiki on data structures.
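For example, counting word frequencies with an association list means a linear scan for every word. A strict Map from containers gives logarithmic lookups and updates instead. The function name `wordFreq` is illustrative.

```haskell
import qualified Data.Map.Strict as Map

-- Count how often each word occurs. fromListWith combines duplicate keys
-- with (+); the strict Map evaluates each count as it is updated.
wordFreq :: String -> Map.Map String Int
wordFreq = Map.fromListWith (+) . map (\w -> (w, 1)) . words
```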

Are your data types strict and/or unpacked?

By default, fields of Haskell data types are lazy and boxed. Making them strict often (though not always) gives more predictable performance, and unpacked fields (such as integers) avoid a pointer indirection.
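The types below contrast the two layouts; `LazyPoint`, `Point` and `distance` are illustrative names. UNPACK only applies to strict fields, and GHC only acts on it when optimizing. At -O, GHC also unpacks small strict fields like these by default (-funbox-small-strict-fields).

```haskell
-- Lazy, boxed fields: each Double is a pointer to a heap object that may
-- still be an unevaluated thunk.
data LazyPoint = LazyPoint Double Double

-- Strict (!) fields are evaluated when the constructor is built, and
-- UNPACK stores the raw double inline in the constructor itself.
data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double
  deriving (Show, Eq)

distance :: Point -> Point -> Double
distance (Point x1 y1) (Point x2 y2) = sqrt (dx * dx + dy * dy)
  where
    dx = x2 - x1
    dy = y2 - y1
```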

Resources on data type strictness:

Did you check your code isn't too polymorphic?

Code which is type-class-polymorphic, such as,

genericLength :: Num n => [a] -> n

has to take an extra dictionary argument that supplies the Num instance to use. Calling methods through that dictionary can make things slower, unless GHC specializes or inlines the function for a concrete type.
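There are two usual remedies. You can ask GHC for a specialized copy with a SPECIALIZE pragma, or you can give the function a monomorphic type. In the sketch below, `sumSquares` and `sumSquaresInt` are hypothetical.

```haskell
import Data.List (foldl')

-- Overloaded: a generic call receives a Num dictionary and calls (+)
-- and (*) through it.
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0

-- Ask GHC to also compile a copy specialized to Int, which calls at
-- type [Int] -> Int can use without any dictionary.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

-- Alternatively, a monomorphic signature removes the dictionary entirely.
sumSquaresInt :: [Int] -> Int
sumSquaresInt = foldl' (\acc x -> acc + x * x) 0
```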

Resources on overloading:

Do you have an explicit export list?

This is a suggestion from the HaskellWiki, but I believe it's based on out of date information about how GHC does inlining. It's left here for interested parties, however.

https://wiki.haskell.org/Performance/Modules

Have you looked at the Core?

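GHC compiles Haskell down to a small intermediate language called Core, which is then lowered further (through STG and Cmm) to machine code. Most of GHC's optimization passes operate on Core, so reading it shows what your program looks like after optimization.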
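To see the Core for a module, you can turn on GHC's dump flags in an OPTIONS_GHC pragma or pass them on the command line. `sumTo` below is just a sample function to inspect. In the optimized output, you would typically look for whether the loop works on unboxed `Int#` values or keeps allocating boxed `I#` ones.

```haskell
{-# OPTIONS_GHC -O -ddump-simpl -dsuppress-all -dsuppress-uniques #-}
-- Compiling this module prints the optimized Core to stderr.
-- Add -ddump-to-file to write it to a .dump-simpl file instead.

sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)
```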

Resources on core:

Have you considered unboxed arrays/STRefs/etc?

In an array with boxed elements, such as Data.Vector.Vector a, each element is a pointer to a value rather than the value itself stored inline.

Use an unboxed vector where you can (for integers, doubles and similar primitive types) to avoid the pointer indirection. The elements are then stored contiguously, which makes much better use of the CPU cache and cuts down on trips to main memory.

Likewise, mutable containers such as IORef and STRef hold a pointer to the value rather than the value itself. Use URef for an unboxed version.
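The text mentions Data.Vector. To avoid an extra dependency, this sketch swaps in the `array` package that ships with GHC. The sieve mutates an unboxed `STUArray` in ST and freezes it into a `UArray`, so the flags are stored inline rather than as pointers to boxed Bools. The function names are illustrative. With the vector package, Data.Vector.Unboxed would be the analogous choice.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- Sieve of Eratosthenes over an unboxed mutable array.
-- Index i holds True while i is still considered prime.
sieve :: Int -> UArray Int Bool
sieve n = runSTUArray $ do
  arr <- newArray (2, n) True
  forM_ [2 .. n] $ \i -> do
    isPrime <- readArray arr i
    -- Cross out multiples of each prime, starting from its square.
    when (isPrime && i * i <= n) $
      forM_ [i * i, i * i + i .. n] $ \j -> writeArray arr j False
  return arr

primesUpTo :: Int -> [Int]
primesUpTo n = [i | (i, True) <- assocs (sieve n)]
```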

Are you using Text or ByteString instead of String?

The String type is slow for these reasons:

  • It's a linked list of characters, so indexing and length take linear time.
  • It's not a packed representation: each character lives in a separate heap-allocated cell with a pointer to the next, which scatters the text across memory and makes poor use of the CPU cache.
  • It allocates a lot more memory than packed representations.
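As an example, here is a sketch that sums one integer per line using a packed ByteString instead of String; `sumLines` is a hypothetical name. Data.ByteString.Char8 treats each byte as one character, so it suits ASCII input like this. For Unicode text, Data.Text is the usual choice.

```haskell
import qualified Data.ByteString.Char8 as BS
import Data.Maybe (mapMaybe)

-- Sum the leading integer of every line. BS.readInt parses an optional
-- sign and digits; lines that don't start with a number are skipped.
sumLines :: BS.ByteString -> Int
sumLines = sum . map fst . mapMaybe BS.readInt . BS.lines
```

In a program this might be used as `BS.getContents >>= print . sumLines`.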

Resources on string types:

Case studies:

Have you considered compiling with LLVM?

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/codegens.html#llvm-code-generator-fllvm


Contributors

chrisdone, kcsongor, colonelpanic8, psibi
