Comments (15)
I just compiled and ran the tests without CUDA and had no issues. Did you set the CUDA flag to OFF when you invoked cmake, or afterwards? I am not sure the latter works. cmake -C <...> -DDCA_WITH_CUDA=OFF
from dca.
I double-checked and reran cmake -C <...> -DDCA_WITH_CUDA=OFF (in a clean build), but I still get the same error.
By the way, the cluster I am running on is heterogeneous: some nodes have NVIDIA GPUs, but the nodes I am running on have none.
In the error log, does (dca::linalg::DeviceType)0 mean that it runs on the CPU?
If I run 1 walker and 1 accumulator thread (shared), DCA++ completes the run but generates the following error:
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffffa6dd2000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffff2ff41000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffff58423000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffef29ce000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffff6c2ca000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffe8d323000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffff80478000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffff1aca4000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffff43741000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffede5a4000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffea0d13000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffec96cd000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffe272ee000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffdfe8b5000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffca4f6e000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xffff069b8000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffe504e7000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffc2bb3b000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd1f245000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffde9f03000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffe131c9000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffcce327000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffe790c3000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffe63cf8000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd5becb000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffe3b4eb000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffdd5c2c000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd84805000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffc54606000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffbeebe5000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd4765a000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffdc1639000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffdad3d3000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd70377000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd33cff000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffcf6933000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffc90d6e000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffcb968a000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffc7cb56000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd0b057000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffce256d000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffc3fa4b000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffbecb15000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffd98ad9000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffc0254c000
OpenBLAS : munmap failed:: Invalid argument
error code=22, release->address=0xfffc16edc000
Is OpenBLAS required to build DCA?
I think OpenBLAS provides an efficient implementation of the BLAS library and a subset of LAPACK.
It should be possible to use, say, the Fortran versions of LAPACK and BLAS instead, but with a significant impact on performance. For debugging purposes that can be acceptable, and the debugger should be able to access the Fortran source code.
Are you using a prebuilt version of OpenBLAS, or did you build your own? Can you turn off the auto-parallelized version? Perhaps set the environment variable OPENBLAS_NUM_THREADS=1.
@efdazedo setting OPENBLAS_NUM_THREADS=1 only reduces the number of occurrences of error code=22, release->address=0xfffc2bb3b000 when running 1 walker and accumulator (shared). With multiple walkers, I still get the CUDA code error reported in this ticket.
(dca::linalg::DeviceType)0 means CPU. It looks like a problem with the BLAS library to me. Have you tried any other BLAS implementation? It does not need to be OpenBLAS.
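As a rough illustration of why the log shows (dca::linalg::DeviceType)0 for the CPU: an enumerator that appears as a template argument is often printed as a cast of its underlying integer value. The sketch below assumes an enum whose first enumerator is CPU; the enum and the deviceName helper are hypothetical stand-ins, and the actual declaration in DCA++ may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for dca::linalg::DeviceType.
// Assumed layout: CPU is the first enumerator (value 0), GPU the second (value 1).
enum DeviceType : int { CPU, GPU };

// Hypothetical helper: maps a device type value to a readable name.
std::string deviceName(DeviceType device) {
  switch (device) {
    case CPU:
      return "CPU";
    case GPU:
      return "GPU";
  }
  return "unknown";
}
```

Under that assumption, deviceName(static_cast<DeviceType>(0)) yields "CPU", which matches the reading given above.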
There are other optimized BLAS libraries, such as BLIS or ATLAS. Are there vendor-provided vectorized or optimized libraries?
After re-installing OpenBLAS using make instead of cmake, the error OpenBLAS : munmap failed:: Invalid argument is gone. DCA runs fine when I use 1 walker/accumulator, but fails for the same reason as in this ticket if I use 7 walkers/accumulators.
main_dca: /ccsopen/home/weile/dev/src/dca_a64fx/DCA/include/dca/linalg/util/stream_container.hpp:50: dca::linalg::util::CudaStream& dca::linalg::util::StreamContainer::operator()(int, int): Assertion `thread_id >= 0 && thread_id < get_max_threads()' failed.
Received signal (6) received.
main_dca: /ccsopen/home/weile/dev/src/dca_a64fx/DCA/src/function/domains/domain.cpp:40: void dca::func::domain::linind_2_subind(std::size_t, int*) const: Assertion `linind < size' failed.
Received signal (6) received.
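For readers unfamiliar with the second assertion, here is a minimal sketch of a linear-index to sub-index conversion guarded by the same kind of precondition (linind < size). It is a hypothetical helper, not the DCA++ implementation: the function name, the use of std::vector, and the convention that the first dimension varies fastest are all assumptions. The thread does not state why the index was out of range, so the sketch only shows what the check protects against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the DCA++ code: converts a linear index into
// per-dimension sub-indices, with the first dimension varying fastest.
std::vector<int> linearToSubindices(std::size_t linind,
                                    const std::vector<std::size_t>& dims) {
  // Total number of elements in the product domain.
  std::size_t size = 1;
  for (std::size_t d : dims)
    size *= d;

  // Same kind of precondition as the assertion that failed in the log:
  // an index at or beyond the domain size has no valid decomposition.
  assert(linind < size);

  std::vector<int> subind(dims.size());
  for (std::size_t i = 0; i < dims.size(); ++i) {
    subind[i] = static_cast<int>(linind % dims[i]);  // position in dimension i
    linind /= dims[i];                               // move on to the next dimension
  }
  return subind;
}
```

With dimensions {2, 3}, valid linear indices are 0 through 5; passing 6 would trip the assertion in a debug build, which is the failure mode reported above.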
That looks like an issue with the code (I did not realize the fix from CT-INT was not merged yet). Can you check whether https://github.com/gbalduzz/DCA/tree/fix_287 fixes it? Also, it would be helpful if you attached the input file to the issue.
@gbalduzz your fix works! Thanks very much. I have tested DCA++, and it now runs successfully on the A64FX machine (distributed runs as well).
Here is the input file, just FYI
{
  "output": {
    "directory": ".",
    "output-format": "HDF5",
    "filename-dca": "dca.hdf5",
    "filename-profiling": "profiling.json",
    "directory-config-read": "",
    "dump-lattice-self-energy": false,
    "dump-cluster-Greens-functions": true,
    "dump-Gamma-lattice": false,
    "dump-chi-0-lattice": false
  },
  "physics": {
    "beta": 50,
    "density": 1,
    "chemical-potential": 0.,
    "adjust-chemical-potential": false
  },
  "single-band-Hubbard-model": {
    "t": 1.,
    "U": 4
  },
  "DCA": {
    "initial-self-energy": "zero",
    "iterations": 1,
    "accuracy": 0.,
    "self-energy-mixing-factor": 1.,
    "interacting-orbitals": [0],
    "do-finite-size-QMC": false,
    "coarse-graining": {
      "k-mesh-recursion": 3,
      "periods": 2,
      "quadrature-rule": 1,
      "threads": 7,
      "tail-frequencies": 0
    },
    "DCA+": {
      "do-DCA+": false,
      "deconvolution-iterations": 16,
      "deconvolution-tolerance": 1.e-2
    }
  },
  "domains": {
    "real-space-grids": {
      "cluster": [[2, 0],
                  [0, 2]]
    },
    "imaginary-time": {
      "sp-time-intervals": 1024
    },
    "imaginary-frequency": {
      "sp-fermionic-frequencies": 1024,
      "four-point-fermionic-frequencies": 16
    }
  },
  "four-point": {
    "type": "PARTICLE_PARTICLE_UP_DOWN",
    "compute-all-transfers": true,
    "frequency-transfer": 0
  },
  "Monte-Carlo-integration": {
    "seed": 42,
    "warm-up-sweeps": 100,
    "sweeps-per-measurement": 1,
    "measurements": 100,
    "error-computation-type": "NONE",
    "threaded-solver": {
      "walkers": 7,
      "accumulators": 1,
      "shared-walk-and-accumulation-thread": false
    }
  },
  "CT-AUX": {
    "expansion-parameter-K": 1.,
    "max-submatrix-size": 128,
    "neglect-Bennett-updates": false,
    "additional-time-measurements": false
  }
}