
Comments (35)

colinjcotter commented on July 29, 2024

I think that it is specific to `minimize()`, because if I just call `Jhat.derivative()` then the memory usage asymptotes to a constant value.
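A minimal sketch of the kind of loop under discussion, assuming the `firedrake.adjoint` taping API; the mesh, functional, and control here are invented for illustration:

```python
from firedrake import *
from firedrake.adjoint import *

continue_annotation()

# Hypothetical forward model: any taped computation would do.
mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 1)
u = Function(V, name="control")
J = assemble(u * u * dx)

Jhat = ReducedFunctional(J, Control(u))
pause_annotation()

# Repeatedly re-evaluate the functional and its derivative while
# watching memory usage (e.g. with mprof).
for i in range(1000):
    Jhat(u)
    Jhat.derivative()
```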


dham commented on July 29, 2024

> > > > @dham I'm using mprof, which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for-loop iterations.
> > > >
> > > > @connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?
> > >
> > > Just double checked. The memory usage stays constant if you have the lines
> > >
> > >     gc.collect()
> > >     PETSc.garbage_cleanup(mesh._comm)
> > >
> > > inside the loop. Otherwise it does increase over time. This is not going to be the solution here, since we're still leaking with these calls when we use `minimize`.
> >
> > Are both necessary, or is the second sufficient? This is a bit weird, because NLVS.solve should be called when re-evaluating the functional, and that should trigger the garbage cleanup.
>
> Actually only the first (gc.collect()) is required. This must be because we're otherwise only rarely cleaning up cyclically-referenced objects, so the memory appears to grow at first. Ultimately it would get cleared when Python decides to do its own cleanup.

OK, so it sounds like we need to add gc.collect() in front of the PETSc collection operation when we call it.
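A sketch of that ordering, continuing the hypothetical setup from the sketch above (the two cleanup calls are the ones discussed in this thread; the loop itself is illustrative):

```python
import gc
from firedrake.petsc import PETSc

for i in range(1000):
    Jhat(u)
    Jhat.derivative()
    # Break Python reference cycles first, so cyclically-referenced
    # wrappers release their PETSc objects ...
    gc.collect()
    # ... then let PETSc free the collectively-allocated objects whose
    # destruction was deferred onto the mesh communicator.
    PETSc.garbage_cleanup(mesh._comm)
```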


connorjward commented on July 29, 2024

I believe that this is a known issue. We only clean up the memory when solve is called.

I think that your issue will be resolved if you add the line `PETSc.garbage_cleanup(mesh._comm)` inside your loop.


connorjward commented on July 29, 2024

When we evaluate the adjoint we use `LinearSolver`s instead of `LinearVariationalSolver`s (since we can't have RHSs that are Cofunctions). I think it makes sense that we're not hitting the code path where we call `garbage_cleanup`.


maneeshkrsingh commented on July 29, 2024

When executing this script with the added line `PETSc.garbage_cleanup(mesh._comm)` in the loop, memory usage starts at approximately 200 MB and gradually climbs to 2 GB within the first 200 steps. By the 1000th step, total memory consumption peaks at around 7 GB. The results are the same without this line.

colinjcotter commented on July 29, 2024

Yes, I also observe that `garbage_cleanup` does not help.

colinjcotter commented on July 29, 2024

(attached: mprofile_20230821135302.dat.gz)

colinjcotter commented on July 29, 2024

... actually it is still creeping upwards; just waiting for a longer run.

connorjward commented on July 29, 2024

I think I've observed that Jhat(...) is also increasing memory usage. I'll continue to investigate.

I was wrong.

dham commented on July 29, 2024

Can we get an approximate quantification of the leakage in units of Functions per timestep? I think that might help us understand what's going on (e.g. are we leaking Functions or solvers).

dham commented on July 29, 2024

You can approximate a Function as number of DoFs × 8 bytes.
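As a worked example of that conversion (the mesh and DoF count here are hypothetical):

```python
# A scalar P1 Function on a 100 x 100 UnitSquareMesh has
# 101 * 101 = 10201 DoFs, i.e. one float64 per vertex.
ndofs = 101 * 101
bytes_per_function = ndofs * 8
print(f"{bytes_per_function / 1e6:.3f} MB per Function")  # ~0.082 MB
```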


dham commented on July 29, 2024

Also, can you try a different optimiser?

colinjcotter commented on July 29, 2024

@dham I'm using mprof, which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for-loop iterations.

@connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?

colinjcotter commented on July 29, 2024

@dham: I'm just calling `Jhat` and `Jhat.derivative`, and that's enough to see a memory increase in mprof.

colinjcotter commented on July 29, 2024

@dham do you mean some function other than minimize()? Or some different options in the call to minimize?

dham commented on July 29, 2024

I'm a bit confused, because you seem to have said something a bit different an hour ago. In any event, if it's in our code, that's easier to deal with. So what we need to know is what we're leaking, in units of Functions per step.

connorjward commented on July 29, 2024

> @dham I'm using mprof, which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for-loop iterations.
>
> @connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?

Just double checked. The memory usage stays constant if you have the lines

    gc.collect()
    PETSc.garbage_cleanup(mesh._comm)

inside the loop. Otherwise it does increase over time. This is not going to be the solution here, since we're still leaking with these calls when we use `minimize`.

dham commented on July 29, 2024

> > @dham I'm using mprof, which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for-loop iterations.
> >
> > @connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?
>
> Just double checked. The memory usage stays constant if you have the lines
>
>     gc.collect()
>     PETSc.garbage_cleanup(mesh._comm)
>
> inside the loop. Otherwise it does increase over time. This is not going to be the solution here, since we're still leaking with these calls when we use `minimize`.

Are both necessary, or is the second sufficient?

This is a bit weird, because NLVS.solve should be called when re-evaluating the functional, and that should trigger the garbage cleanup.

colinjcotter commented on July 29, 2024

Thanks for nailing it down a bit more, @connorjward!

colinjcotter commented on July 29, 2024

@dham the second is not sufficient.

Summary: if the loop just calls `Jhat` and/or `Jhat.derivative()`, then `gc.collect()` and `garbage_cleanup(...)` are necessary and sufficient. If the loop calls `minimize()`, we are leaking.

connorjward commented on July 29, 2024

> > > @dham I'm using mprof, which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for-loop iterations.
> > >
> > > @connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?
> >
> > Just double checked. The memory usage stays constant if you have the lines
> >
> >     gc.collect()
> >     PETSc.garbage_cleanup(mesh._comm)
> >
> > inside the loop. Otherwise it does increase over time. This is not going to be the solution here, since we're still leaking with these calls when we use `minimize`.
>
> Are both necessary, or is the second sufficient?
>
> This is a bit weird, because NLVS.solve should be called when re-evaluating the functional, and that should trigger the garbage cleanup.

Actually only the first (gc.collect()) is required. This must be because we're otherwise only rarely cleaning up cyclically-referenced objects, so the memory appears to grow at first. Ultimately it would get cleared when Python decides to do its own cleanup.

connorjward commented on July 29, 2024

To be clear though, this does not fix the memory leak being experienced. Adding gc.collect() and garbage_cleanup calls does not stop the memory from leaking in the example given above.

colinjcotter commented on July 29, 2024

Here's the output of the above from mprof (so the x-axis is sample times). It is 200 calls to minimize.

[Figure_1: mprof memory-usage trace over the 200 minimize calls]

colinjcotter commented on July 29, 2024

This is with gc.collect() and garbage_cleanup added after each minimize.

It looks like most of the memory is eventually collected, but this always happens at the end of the loop, no matter how long the loop is.

colinjcotter commented on July 29, 2024

That's about 5250 MB leaked in total, which is about 26.25 MB per minimize() call.
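Converted into dham's requested units, that works out roughly as follows (a back-of-envelope estimate; the DoF count is hypothetical):

```python
leak_per_call = 26.25e6                # bytes leaked per minimize() call
doubles_per_call = leak_per_call / 8   # ~3.3 million float64 values
ndofs = 10_000                         # hypothetical DoFs per Function
print(f"~{doubles_per_call / ndofs:.0f} Functions per minimize() call")
```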


colinjcotter commented on July 29, 2024

I am interested in trying other optimisers, but there is not even any API documentation for minimize(), so I don't know how.
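For what it's worth, a sketch of how a different optimiser might be requested, assuming pyadjoint's `minimize` (re-exported by `firedrake.adjoint`) is a thin wrapper around `scipy.optimize.minimize`; the method name and options below are standard SciPy arguments, not something confirmed in this thread:

```python
from firedrake.adjoint import minimize

# Default method is L-BFGS-B; other scipy.optimize methods can be
# selected by name, with solver options passed through.
u_opt = minimize(Jhat, method="CG", options={"maxiter": 50, "disp": True})
```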


connorjward commented on July 29, 2024

I've figured it out. We are leaking PETSc objects that are stored on COMM_SELF. Adding the line `PETSc.garbage_cleanup(PETSc.COMM_SELF)` inside the loop makes the leak go away. You can see the number of objects increasing if you call `PETSc.garbage_view(PETSc.COMM_SELF)`.

The proper fix for this is probably to add some code to PETSc/petsc4py so COMM_SELF is always cleared when garbage_cleanup is called. Or one could actually eagerly delete these objects, since there are no deadlock concerns.
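A sketch of the diagnostic and the workaround described here (the loop is schematic, with `Jhat` and `minimize` as in the earlier sketches; `garbage_view` and `garbage_cleanup` are the petsc4py calls named above):

```python
from firedrake.adjoint import minimize
from firedrake.petsc import PETSc

for i in range(200):
    minimize(Jhat)
    # Print the deferred PETSc objects accumulating on COMM_SELF ...
    PETSc.garbage_view(PETSc.COMM_SELF)
    # ... then destroy them, which makes the leak go away.
    PETSc.garbage_cleanup(PETSc.COMM_SELF)
```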


maneeshkrsingh commented on July 29, 2024

Yes, it works. It significantly reduces memory use.

wence- commented on July 29, 2024

> The proper fix for this is probably to add some code to PETSc/petsc4py so COMM_SELF is always cleared when garbage_cleanup is called. Or one could actually eagerly delete these objects, since there are no deadlock concerns.

Yeah, if comm.size == 1 in PETSc then you never need to defer collection.

connorjward commented on July 29, 2024

> > The proper fix for this is probably to add some code to PETSc/petsc4py so COMM_SELF is always cleared when garbage_cleanup is called. Or one could actually eagerly delete these objects, since there are no deadlock concerns.
>
> Yeah, if comm.size == 1 in PETSc then you never need to defer collection.

That's exactly what I've done here. I will close this issue when that gets merged and our PETSc fork is updated.

colinjcotter commented on July 29, 2024

Thanks for clearing this up!

Once the fix is in, we would still need to call `garbage_cleanup` in this case - is that right?

What kind of objects are on COMM_SELF?

connorjward commented on July 29, 2024

> Thanks for clearing this up!
>
> Once the fix is in, we would still need to call `garbage_cleanup` in this case - is that right?

No, you shouldn't need to call `garbage_cleanup`. It gets called automatically by the `LinearSolver` during the adjoint calculation. Also, if you're only running this in serial, then things should always be eagerly collected anyway.

> What kind of objects are on COMM_SELF?

I think there were a lot of `PetscSF`s and the like.

connorjward commented on July 29, 2024

> > > The proper fix for this is probably to add some code to PETSc/petsc4py so COMM_SELF is always cleared when garbage_cleanup is called. Or one could actually eagerly delete these objects, since there are no deadlock concerns.
> >
> > Yeah, if comm.size == 1 in PETSc then you never need to defer collection.
>
> That's exactly what I've done here. I will close this issue when that gets merged and our PETSc fork is updated.

Closing as this has happened.
