Comments (7)

jmansour commented on July 30, 2024

from underworld2.

Yidali26 commented on July 30, 2024

This is a followup reply from the admins:

Did you install your own PETSc? If so, did you use the flags:

--download-mpich --download-fblaslapack

If that is the case I believe this is the cause of the issue.
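To answer the admins' question quickly, one option is to scan the PETSc configure option string for the two flags they mention. The following is only a minimal sketch: the helper name `find_bundled_flags` and the sample option string are made up for illustration, and where you obtain the real option string from (for example your own configure command line or configure log) is left to you.

```python
# Flags named by the admins: each tells PETSc's configure to download and
# build its own copy of a dependency instead of using the cluster's.
BUNDLED_FLAGS = ("--download-mpich", "--download-fblaslapack")


def _enabled(token, flag):
    """True if `token` switches `flag` on (bare flag or flag=<truthy value>)."""
    if token == flag:
        return True
    if token.startswith(flag + "="):
        value = token.split("=", 1)[1].lower()
        return value not in ("0", "no", "false")
    return False


def find_bundled_flags(configure_options):
    """Return the bundled-dependency flags that are enabled in the option string."""
    tokens = configure_options.split()
    return [flag for flag in BUNDLED_FLAGS
            if any(_enabled(tok, flag) for tok in tokens)]


if __name__ == "__main__":
    # Hypothetical option string, for illustration only.
    options = "--with-debugging=0 --download-mpich --download-fblaslapack"
    print(find_bundled_flags(options))
```

An empty result would suggest PETSc was pointed at an existing MPI and BLAS/LAPACK rather than building private copies.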

Yidali26 commented on July 30, 2024

Hi John,
Thanks for your reply!
On Frontera I get

$ uname -a
Linux login4.frontera.tacc.utexas.edu 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Should I just use the image for Stampede2 (underworld2:2.7.1b_stampede2_psm2)? I'm currently using underworld2_latest.sif.
Thanks
Yida
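For anyone who wants to compare the host kernel against whatever minimum a container image documents, the release field of the `uname -a` output above can be split into numbers. This is a rough sketch only. The function name `parse_kernel_release` is hypothetical, and no particular minimum version is implied by this thread.

```python
import re


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a Linux kernel release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    # Release string taken from the uname output in this comment.
    release = "3.10.0-1160.45.1.el7.x86_64"
    print(parse_kernel_release(release))
```

The resulting tuples compare naturally in Python, so `parse_kernel_release(a) < parse_kernel_release(b)` orders two releases. On a live Linux system, `platform.release()` from the standard library returns the same kind of string.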

jmansour commented on July 30, 2024

It’d be worth giving that one a go, at least to see if it fires up the InfiniBand correctly. Yep, 3.10 is around 10 years old.

On Sat, 25 Feb 2023 at 7:54 am, Yidali26 wrote:
> Hi John,
> Thanks for your reply!
> On Frontera I get
> $ uname -a
> Linux login4.frontera.tacc.utexas.edu 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> Should I just use the image for Stampede2 (underworld2:2.7.1b_stampede2_psm2)? I'm currently using underworld2_latest.sif
> Thanks
> Yida

Yidali26 commented on July 30, 2024

Hi John,
I tried the Stampede image on Frontera, but it turns out a bug appears when I use more than 2 MPI tasks:

$ ibrun -n 4 singularity exec /work/06262/yidali/singularity_cache/underworld2-2.7.1b_stampede2.simg python Puysegur3Dpy2.py 0 1
TACC:  Starting up job 5259461
TACC:  Starting parallel tasks...
ERROR: ld.so: object '/opt/apps/xalt/xalt/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/opt/apps/xalt/xalt/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/opt/apps/xalt/xalt/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(490).....:
MPID_Init(395)............: channel initialization failed
MPIDI_CH3_Init(104).......:
MPID_nem_init(272)........:
MPIDI_CH3I_Seg_commit(369): PMI_KVS_Get returned -1
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(490).....:
MPID_Init(395)............: channel initialization failed
MPIDI_CH3_Init(104).......:
MPID_nem_init(272)........:
MPIDI_CH3I_Seg_commit(369): PMI_KVS_Get returned -1

Thanks
Yida
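When reading a log like the one above, it can help to separate the messages the loader itself marks as "ignored" from the ones that actually abort MPI start-up. The sketch below tallies the three message types seen in this log. The category names and the `summarise_log` helper are invented for illustration, and the patterns are only the literal strings from this thread, not a general MPICH error catalogue.

```python
import re
from collections import Counter

# Patterns copied from the messages in the log above.
PATTERNS = {
    # The loader reports these as "ignored", so they are treated as noise here.
    "ld_preload_ignored": re.compile(r"from LD_PRELOAD cannot be preloaded"),
    # One of these appears for each rank that reported a failed MPI start-up.
    "mpi_init_fatal": re.compile(r"Fatal error in PMPI_Init_thread"),
    # The innermost entry of the error stack.
    "pmi_kvs_get_failed": re.compile(r"PMI_KVS_Get returned -1"),
}


def summarise_log(text):
    """Count how many lines of `text` match each known message pattern."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Shortened excerpt of the log in this comment.
    sample = "\n".join([
        "ERROR: ld.so: object '/opt/apps/xalt/xalt/lib64/libxalt_init.so' "
        "from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.",
        "Fatal error in PMPI_Init_thread: Other MPI error, error stack:",
        "MPIDI_CH3I_Seg_commit(369): PMI_KVS_Get returned -1",
    ])
    for name, count in sorted(summarise_log(sample).items()):
        print(name, count)
```

A non-zero `pmi_kvs_get_failed` count points at the exchange between the MPI library inside the container and the host launcher, which is a reasonable place to start looking, though it does not by itself identify the root cause.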

Yidali26 commented on July 30, 2024

Hi John,
I succeeded earlier with a native installation on Frontera, done by TACC support. Hopefully the native installation won't have such a problem with the network. Really appreciate your help!
Yida

jmansour commented on July 30, 2024

That's great Yida!

I'm not too sure what went wrong with the Stampede image. Possibly the version of MPICH the image was built against is too old and not ABI compatible with the local versions on Frontera.

In any case, native operation is the best option here, as container-related MPI issues can be somewhat opaque and difficult to debug, and the native build shouldn't have any issues lighting up the InfiniBand interconnects. Indeed, you should see a marked performance improvement in any jobs that traverse nodes.
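If someone does want to test the "too old to be ABI compatible" guess, a rough first check is to compare the container's MPICH version against a baseline. The sketch below assumes MPICH 3.1 as that baseline, on the understanding that the MPICH ABI Compatibility Initiative starts there; that cutoff should be verified against the MPICH documentation. The thread does not say which MPICH version the image uses, so the version strings in the example are placeholders, and the helper names are hypothetical.

```python
import re

# Assumption: 3.1 is taken as the first MPICH release covered by the
# MPICH ABI Compatibility Initiative. Verify before relying on it.
ABI_BASELINE = (3, 1)


def mpich_version_tuple(version):
    """Turn a version string such as '3.1.4' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError("unrecognised MPICH version: %r" % version)
    return (int(match.group(1)), int(match.group(2)))


def predates_abi_baseline(version):
    """True if `version` is older than the assumed ABI baseline."""
    return mpich_version_tuple(version) < ABI_BASELINE


if __name__ == "__main__":
    # Placeholder versions, not taken from the actual image.
    for candidate in ("3.0.4", "3.1.4"):
        print(candidate, predates_abi_baseline(candidate))
```

A version at or above the baseline does not guarantee the container will work with the host MPI; it only rules out one of the more obvious mismatches.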
