Comments (35)
I think that it is specific to `minimize()`, because if I just call `Jhat.derivative()` then the memory usage asymptotes to a constant value.
from firedrake.
> @dham I'm using mprof which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for loop iterations.
>
> @connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?
>
> Just double checked. The memory usage stays constant if you have `gc.collect()` and `PETSc.garbage_cleanup(mesh._comm)` inside the loop. Otherwise it does increase over time. This is not going to be the solution here, since we're still leaking with these calls when we use `minimize`.
>
> Are both necessary or is the second sufficient? This is a bit weird, because NLVS.solve should be being called when re-evaluating the functional, and that should call the garbage cleanup.
>
> Actually only the first (`gc.collect()`) is required. This must be because we're otherwise only rarely cleaning up cyclically-referenced objects, so the memory appears to grow at first. Ultimately it would get cleared when Python decides to do its own cleanup.

OK, so it sounds like we need to add `gc.collect()` in front of the PETSc collection operation when we call it.
I believe that this is a known issue. We only clean up the memory when `solve` is called. I think that your issue will be resolved if you add the line `PETSc.garbage_cleanup(mesh._comm)` inside your loop.
When we evaluate the adjoint we use `LinearSolver`s instead of `LinearVariationalSolver`s (since we can't have RHSs that are `Cofunction`s). I think it makes sense that we're not hitting the code path where we call `garbage_cleanup`.
When executing this script with the line `PETSc.garbage_cleanup(mesh._comm)` added inside the loop, memory usage starts at approximately 200 MB and grows to 2 GB within the first 200 steps. By the 1000th step, total memory consumption peaks at around 7 GB. The results are the same without this line.
Yes, I also observe that `garbage_cleanup` does not help.
mprofile_20230821135302.dat.gz
... actually it is still creeping upwards, just waiting for a longer run
I think I've observed that `Jhat(...)` is also increasing memory usage. I'll continue to investigate.

I was wrong.
Can we get an approximate quantification of the leakage in units of Functions per timestep? I think that might help us understand what's going on (e.g. are we leaking functions or solvers).
You can approximate a Function as number of DoFs x 8 bytes.
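As a rough sketch of that arithmetic (the DoF count and leak rate below are made-up placeholders, not measurements from this issue):

```python
def function_nbytes(ndofs, bytes_per_dof=8):
    # Rule of thumb from above: a Function is roughly DoFs x 8 bytes (float64)
    return ndofs * bytes_per_dof

def functions_per_step(leaked_bytes_per_step, ndofs):
    # Convert a measured leak rate into "Functions leaked per step"
    return leaked_bytes_per_step / function_nbytes(ndofs)

# e.g. a hypothetical space with 100,000 DoFs leaking 16 MB per step
print(functions_per_step(16 * 1024**2, 100_000))  # ~21 Functions per step
```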
Also, can you try a different optimiser?
@dham I'm using mprof which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for loop iterations.
@connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?
@dham: I'm just calling Jhat and Jhat.derivative, and that's enough to see a memory increase in mprof.
@dham do you mean some other function than minimize()? Or some different options in the call to minimize?
I'm a bit confused because you seem to have said something a bit different an hour ago. In any event, if it's in our code that's easier to deal with. So what we need to know is what we're leaking in units of Functions per step.
> @dham I'm using mprof which just gives a time series from snapshots. I suppose I can just divide the final number by the number of for loop iterations.
>
> @connorjward I am getting what seems to be an increase when using Jhat(). What made you think it wasn't happening?

Just double checked. The memory usage stays constant if you have `gc.collect()` and `PETSc.garbage_cleanup(mesh._comm)` inside the loop. Otherwise it does increase over time. This is not going to be the solution here, since we're still leaking with these calls when we use `minimize`.
> Just double checked. The memory usage stays constant if you have `gc.collect()` and `PETSc.garbage_cleanup(mesh._comm)` inside the loop. Otherwise it does increase over time. This is not going to be the solution here, since we're still leaking with these calls when we use `minimize`.

Are both necessary or is the second sufficient? This is a bit weird, because NLVS.solve should be being called when re-evaluating the functional, and that should call the garbage cleanup.
Thanks for nailing it down a bit more, @connorjward!
@dham the second is not sufficient.

Summary: if the loop just calls `Jhat` and/or `Jhat.derivative()`, then `gc.collect()` and `garbage_cleanup(...)` are necessary and sufficient. If the loop calls `minimize()`, we are leaking.
> Are both necessary or is the second sufficient? This is a bit weird, because NLVS.solve should be being called when re-evaluating the functional, and that should call the garbage cleanup.

Actually only the first (`gc.collect()`) is required. This must be because we're otherwise only rarely cleaning up cyclically-referenced objects, so the memory appears to grow at first. Ultimately it would get cleared when Python decides to do its own cleanup.
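The cyclic-reference point can be demonstrated without Firedrake at all: objects in a reference cycle are never freed by reference counting alone, only when the cycle collector runs. A minimal stand-alone illustration:

```python
import gc
import weakref

class Node:
    pass

gc.disable()  # make the effect deterministic: no automatic cycle collection

a, b = Node(), Node()
a.partner, b.partner = b, a  # reference cycle
alive = weakref.ref(a)       # probe to see when 'a' is really freed

del a, b                     # refcounts never hit zero: the cycle keeps both alive
print(alive() is None)       # False: still resident, so memory appears to grow

gc.collect()                 # the cycle collector breaks the cycle
print(alive() is None)       # True: now actually freed
gc.enable()
```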
To be clear though, this does not fix the memory leak being experienced. Adding `gc.collect()` and `garbage_cleanup` calls does not stop the memory leak in the example given above.
Here's the output of the above from mprof (so the x-axis is sample times). It is 200 calls to `minimize`.
This is with `gc.collect()` and `garbage_cleanup` added after each `minimize` call. It looks like most of the memory is eventually collected, but this always happens at the end of the loop, no matter how long the loop is.
That's about 5250 MB leaked in total, which is about 26.25 MB per `minimize()` call.
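In Functions, using the DoFs x 8 bytes rule suggested earlier (the DoF count here is a hypothetical placeholder; substitute the function space size from the actual script):

```python
leak_per_call = 26.25 * 1024**2   # bytes leaked per minimize() call, measured above
ndofs = 50_000                    # hypothetical DoF count, not from the real script
bytes_per_function = ndofs * 8    # rough footprint of one float64 Function

print(leak_per_call / bytes_per_function)  # ~69 Function-equivalents per call
```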
I am interested in trying other optimisers, but there is not even any API documentation for `minimize()`, so I don't know how to.
I've figured it out. We are leaking PETSc objects that are stored on `COMM_SELF`. Adding the line `PETSc.garbage_cleanup(PETSc.COMM_SELF)` inside the loop makes the leak go away. You can see the number of objects increasing if you call `PETSc.garbage_view(PETSc.COMM_SELF)`.

The proper fix for this is probably to add some code to PETSc/petsc4py so that `COMM_SELF` is always cleared when `garbage_cleanup` is called. Or one could actually eagerly delete these objects, since there are no deadlock concerns.
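To make the failure mode concrete, here is a toy stand-in for the deferred-destruction registry described above (this is not petsc4py's actual implementation; the function names and the string communicator keys are simplified placeholders):

```python
garbage = {}  # per-communicator queue of objects awaiting collective destruction

def queue_for_destruction(comm, obj):
    # Destroy calls are collective, so objects are parked here until all
    # ranks can participate in the cleanup.
    garbage.setdefault(comm, []).append(obj)

def garbage_cleanup(comm):
    # Flushes only the queue for the given communicator.
    garbage.pop(comm, None)

# Leak pattern from this issue: the adjoint solvers park objects on
# COMM_SELF, but the loop only ever cleans up mesh._comm / COMM_WORLD.
for step in range(200):
    queue_for_destruction("COMM_WORLD", object())
    queue_for_destruction("COMM_SELF", object())
    garbage_cleanup("COMM_WORLD")

leaked = len(garbage.get("COMM_SELF", []))
print(leaked)                             # 200: grows without bound

garbage_cleanup("COMM_SELF")              # the workaround: clean COMM_SELF explicitly
print(len(garbage.get("COMM_SELF", []))) # 0
```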
Yes, it works. Significantly reduces memory use.
> The proper fix for this is probably to add some code to PETSc/petsc4py so `COMM_SELF` is always cleared when `garbage_cleanup` is called. Or one could actually eagerly delete these objects since there are no deadlock concerns.

Yeah, if `comm.size == 1` in PETSc then you never need to defer collection.
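A sketch of that eager path in the same toy terms (`comm_size` is a plain integer here, and `obj.clear()` stands in for an immediate destroy):

```python
deferred = []  # objects parked for a later collective destroy

def destroy(obj, comm_size):
    # Destruction is collective, so with more than one rank it must be
    # deferred until all ranks reach a synchronisation point; with a
    # single rank there is nothing to synchronise, so free eagerly.
    if comm_size == 1:
        obj.clear()          # stand-in for an immediate PetscObjectDestroy
    else:
        deferred.append(obj)

resource = {"data": [0] * 1000}
destroy(resource, comm_size=1)
print(resource, deferred)  # {} [] : freed immediately, nothing deferred
```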
> Yeah, if `comm.size == 1` in PETSc then you never need to defer collection.

That's exactly what I've done here. I will close this issue when that gets merged and our PETSc fork is updated.
Thanks for clearing this up!

Once the fix is in, we would still need to call `garbage_cleanup` in this case - is that right?

What kind of objects are on `COMM_SELF`?
> Once the fix is in, we would still need to call garbage_cleanup in this case - is that right?

No, you shouldn't need to call `garbage_cleanup`. It gets called automatically by the `LinearSolver` during the adjoint calculation. Also, if you're only running this in serial, then things should always be eagerly collected anyway.

> What kind of objects are on COMM_SELF?

I think there were a lot of `PetscSF`s and the like.
> That's exactly what I've done here. I will close this issue when that gets merged and our PETSc fork is updated.

Closing as this has happened.