Comments (8)

daniel-koehn commented on August 30, 2024

Hi Pavel,

Assuming that you used 16 CPU cores for the parallelization with domain decomposition, the remaining cores are used for shot parallelization. How many shots are you modelling in total? Is that number divisible by 24 without remainder? Does the problem also occur when using fewer cores for the shot parallelization or, in the extreme case, when using only domain decomposition?

Best regards,

Daniel

from denise-black-edition.

pplotn commented on August 30, 2024

Hello Daniel,
I am modelling 51 shots.
As I understand it, I use 4*4=16 cores per shot.
Overall, I have 12*32=384 cores.
This means that I run 384/16=24 shots in parallel,
so I need 3 rounds to go through all 51 shots.

This exception is very rare; I don't get it for other model sizes and numbers of shots.

20320209ws_fwi_3_strategy_51_Overthrust_true.err.txt
20320209ws_fwi_3_strategy_51_Overthrust_true.out.txt

daniel-koehn commented on August 30, 2024

Hi Pavel,

I suspect that one problem with shot parallelization might be that the non-merged model files are removed in
PSV/model_it_out_PSV:

https://github.com/daniel-koehn/DENISE-Black-Edition/blob/master/src/PSV/model_it_out_PSV.c

Try commenting out or deleting all remove() calls in model_it_out_PSV.c and recompiling the source code before running it again. If this is indeed the issue, similar problems will occur in gauss_filt.c and gauss_filt_var.c.

Best regards,

Daniel

pplotn commented on August 30, 2024

Ok, thanks Daniel. I recompiled the code, and the problem still occurs with the same velocity model, though it does not happen with other models.

PE 0 is writing model to
./fwi/ws_fwi_3_strategy_55/Overthrust_true/fld/model/modelTest_rho_stage_1_it_10.bin.0.0
**Message from mergemod (printed by PE 0):
PE 0 starts merge of 16 model files

writing merged model file to ./fwi/ws_fwi_3_strategy_55/Overthrust_true/fld/model/modelTest_rho_stage_1_it_10.bin
Opening model files: ./fwi/ws_fwi_3_strategy_55/Overthrust_true/fld/model/modelTest_rho_stage_1_it_10.bin.???
Message from PE 0
R U N - T I M E E R R O R:
merge: can't read model file !
...now exiting to system.

pplotn commented on August 30, 2024

Hello, in my experience, setting Nprocx and Nprocy helps to get rid of this error.
This also works with shot parallelization enabled.

pplotn commented on August 30, 2024

Increasing the stringsize variable in the fd.h file helped.

daniel-koehn commented on August 30, 2024

That makes sense. If the combined length of the model name and directory exceeds the predefined maximum string size in fd.h, the domain-decomposition numbering may be missing from the file-name extension of the model files. As a result, the mergemod function fails to merge the model files from the different subdomains correctly. Thank you for finding this bug, Pavel.

pplotn commented on August 30, 2024

Yes, Daniel.
My folder paths are rather complicated, so I increased STRINGSIZE to 150.
