gbalduzz commented on September 27, 2024

I just compiled and ran the tests without CUDA and had no issues. Did you set the CUDA flag to OFF when you first invoked cmake, or afterwards? I am not sure whether the latter works. cmake -C <...> -DDCA_WITH_CUDA=OFF should work.

from dca.

weilewei commented on September 27, 2024

I double-checked and reran cmake -C <...> -DDCA_WITH_CUDA=OFF (in a clean build), but I still get the same error...

weilewei commented on September 27, 2024

btw, the cluster I am running on is heterogeneous: some nodes have NVIDIA GPUs, but the nodes I am running on have no GPUs.

weilewei commented on September 27, 2024

In the error log, does (dca::linalg::DeviceType)0 mean that it runs on the CPU?

weilewei commented on September 27, 2024

If I run 1 walker and 1 accumulator thread (shared), DCA++ completes the run but prints the following errors:

OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffffa6dd2000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffff2ff41000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffff58423000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffef29ce000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffff6c2ca000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffe8d323000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffff80478000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffff1aca4000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffff43741000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffede5a4000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffea0d13000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffec96cd000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffe272ee000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffdfe8b5000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffca4f6e000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xffff069b8000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffe504e7000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffc2bb3b000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd1f245000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffde9f03000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffe131c9000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffcce327000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffe790c3000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffe63cf8000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd5becb000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffe3b4eb000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffdd5c2c000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd84805000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffc54606000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffbeebe5000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd4765a000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffdc1639000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffdad3d3000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd70377000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd33cff000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffcf6933000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffc90d6e000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffcb968a000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffc7cb56000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd0b057000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffce256d000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffc3fa4b000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffbecb15000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffd98ad9000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffc0254c000
OpenBLAS : munmap failed:: Invalid argument
error code=22,  release->address=0xfffc16edc000

weilewei commented on September 27, 2024

Is OpenBLAS a required component for building DCA++?

efdazedo commented on September 27, 2024

I think OpenBLAS provides an efficient implementation of the BLAS library and a subset of LAPACK.
It should be possible to use, say, the Fortran versions of LAPACK and BLAS instead, at a significant cost in performance. For debugging purposes, using the Fortran LAPACK and BLAS can be acceptable, and the debugger should then be able to access the Fortran source code.

efdazedo commented on September 27, 2024

Are you using a prebuilt version of OpenBLAS, or did you build your own? Can you turn off its automatic multithreading? Perhaps set the environment variable OPENBLAS_NUM_THREADS=1.

weilewei commented on September 27, 2024

@efdazedo setting OPENBLAS_NUM_THREADS=1 only reduces the number of occurrences of "error code=22, release->address=0xfffc2bb3b000" when running 1 walker and 1 accumulator (shared). With multiple walkers, I still get the CUDA code error from this ticket.

gbalduzz commented on September 27, 2024

(dca::linalg::DeviceType)0 means CPU. It looks like a problem with the BLAS library to me. Have you tried any other BLAS implementation? It does not need to be OpenBLAS.
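Why (dca::linalg::DeviceType)0 reads as CPU follows from how C++ numbers enumerators: the first one gets the value 0 unless a value is given explicitly. A minimal sketch, assuming DeviceType declares CPU before GPU; the `sketch` namespace and the `device_name` helper are hypothetical and not part of DCA++:

```cpp
#include <cassert>
#include <string>

namespace sketch {
// Assumption: mirrors the ordering of dca::linalg::DeviceType, with CPU
// declared first, so CPU has the underlying value 0 and GPU has 1.
enum DeviceType : int { CPU, GPU };

// Hypothetical helper: maps a device template argument to a readable name.
// Compiler diagnostics may print such a template argument numerically,
// e.g. as "(dca::linalg::DeviceType)0".
template <DeviceType device>
std::string device_name() {
  return device == CPU ? "CPU" : "GPU";
}
}  // namespace sketch
```

Under these assumptions, `sketch::device_name<static_cast<sketch::DeviceType>(0)>()` returns "CPU".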

efdazedo commented on September 27, 2024

There are other optimized BLAS libraries, such as BLIS or ATLAS. Are there vendor-provided vectorized or optimized libraries?

weilewei commented on September 27, 2024

After reinstalling OpenBLAS using make instead of CMake, the "OpenBLAS : munmap failed:: Invalid argument" error is gone. DCA++ runs fine when I use 1 walker/accumulator, but fails for the same reason as in this ticket when I use 7 walkers/accumulators:

main_dca: /ccsopen/home/weile/dev/src/dca_a64fx/DCA/include/dca/linalg/util/stream_container.hpp:50: dca::linalg::util::CudaStream& dca::linalg::util::StreamContainer::operator()(int, int): Assertion `thread_id >= 0 && thread_id < get_max_threads()' failed.
Received signal (6) received.
main_dca: /ccsopen/home/weile/dev/src/dca_a64fx/DCA/src/function/domains/domain.cpp:40: void dca::func::domain::linind_2_subind(std::size_t, int*) const: Assertion `linind < size' failed.
Received signal (6) received.
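The first assertion encodes an invariant of StreamContainer: every thread ID used to look up a stream must lie in [0, get_max_threads()). One plausible way to break such an invariant, sketched here under stated assumptions, is to size a per-thread container for fewer threads than actually run. With the input file attached later in this thread ("walkers": 7, "accumulators": 1, "shared-walk-and-accumulation-thread": false), separate walker and accumulator threads would need 8 slots. The class, the `required_threads` rule, and all names are hypothetical illustrations, not the DCA++ implementation or the fix in fix_287:

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical per-thread resource, standing in for a CUDA stream slot.
struct ThreadSlot {
  int owner = -1;  // ID of the thread that last used this slot
};

// Hypothetical container holding one slot per thread, sized once at startup.
class PerThreadContainer {
public:
  explicit PerThreadContainer(int max_threads) : slots_(max_threads) {}

  int get_max_threads() const { return static_cast<int>(slots_.size()); }

  ThreadSlot& operator()(int thread_id) {
    // Same invariant as the failing assertion, reported as an exception here.
    if (thread_id < 0 || thread_id >= get_max_threads())
      throw std::out_of_range("thread_id outside [0, max_threads)");
    return slots_[thread_id];
  }

private:
  std::vector<ThreadSlot> slots_;
};

// Assumption: unshared walker and accumulator threads each need their own
// slot; in shared mode one thread both walks and accumulates.
int required_threads(int walkers, int accumulators, bool shared) {
  return shared ? std::max(walkers, accumulators) : walkers + accumulators;
}
```

With these assumptions, a `PerThreadContainer(7)` throws on thread ID 7, while a container sized with `required_threads(7, 1, false)` (8 slots) accepts thread IDs 0 through 7.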

gbalduzz commented on September 27, 2024

That looks like an issue with the code (I did not realize the fix had not yet been merged from CT-INT). Can you check whether https://github.com/gbalduzz/DCA/tree/fix_287 fixes it? It would also be helpful if you attached the input file to the issue.

weilewei commented on September 27, 2024

@gbalduzz your fix works! Thanks very much. I have tested DCA++, and it now runs successfully on the A64FX machine (including distributed runs).

weilewei commented on September 27, 2024

Here is the input file, just FYI

{
    "output": {
        "directory" : ".",
        "output-format": "HDF5",
        "filename-dca": "dca.hdf5",
        "filename-profiling": "profiling.json",
        "directory-config-read" : "",
        "dump-lattice-self-energy": false,
        "dump-cluster-Greens-functions": true,
        "dump-Gamma-lattice": false,
        "dump-chi-0-lattice": false
    },

    "physics": {
        "beta": 50,
        "density": 1,
        "chemical-potential": 0.,
        "adjust-chemical-potential": false
    },

    "single-band-Hubbard-model": {
        "t": 1.,
        "U": 4
    },

    "DCA": {
        "initial-self-energy" : "zero",
        "iterations": 1,
        "accuracy": 0.,
        "self-energy-mixing-factor": 1.,
        "interacting-orbitals": [0],
        "do-finite-size-QMC": false,

        "coarse-graining": {
            "k-mesh-recursion": 3,
            "periods": 2,
            "quadrature-rule": 1,
            "threads": 7,
            "tail-frequencies": 0
        },

        "DCA+": {
            "do-DCA+": false,
            "deconvolution-iterations": 16,
            "deconvolution-tolerance": 1.e-2
        }
    },

    "domains": {
        "real-space-grids": {
            "cluster": [[2, 0],
                        [0, 2]]
        },

        "imaginary-time": {
            "sp-time-intervals": 1024
        },

        "imaginary-frequency": {
            "sp-fermionic-frequencies": 1024,
            "four-point-fermionic-frequencies": 16
        }
    },

    "four-point": {
        "type": "PARTICLE_PARTICLE_UP_DOWN",
        "compute-all-transfers" : true,
        "frequency-transfer" : 0
    },

    "Monte-Carlo-integration": {
        "seed": 42,
        "warm-up-sweeps": 100,
        "sweeps-per-measurement": 1,
        "measurements": 100,
        "error-computation-type" : "NONE",
        "threaded-solver": {
            "walkers": 7,
            "accumulators": 1,
            "shared-walk-and-accumulation-thread": false
        }
    },

    "CT-AUX": {
        "expansion-parameter-K": 1.,
        "max-submatrix-size": 128,
        "neglect-Bennett-updates": false,
        "additional-time-measurements": false
    }
}
