qsimulate-open / bagel
Brilliantly Advanced General Electronic-structure Library
License: GNU General Public License v3.0
The transformation back to shell boundaries is broken. My guess is that this has to do with copy_backward.
Apparently MVAPICH2 2.0a has no problem running BAGEL with MPI_THREAD_MULTIPLE on our Linux cluster (unlike OpenMPI), but the program gets stuck in FCI.
This is most likely a bug in our code that needs to be fixed.
FYI, you could run BAGEL with
mpirun -n 2 -hostfile nodes -env MV2_ENABLE_AFFINITY 0 ./BAGEL ../../test/*dist.json
Have not designed this at all. Needs to be implemented.
One could reduce the number of diagonalizations in ROHF. In addition, MO projection now seems to work properly for ROHF. More work is needed.
It seems that Jop::compute_mo2e was called incorrectly. This is seen with and without MKL.
Entering test case "RAS"
Intel MKL ERROR: Parameter 13 was incorrect on entry to DGEMM .
Intel MKL ERROR: Parameter 8 was incorrect on entry to DGEMM .
Assertion failed: (nocc > 0), function compute_mo2e, file ../../../src/fci/mofile.cc, line 159.
unknown location:0: fatal error in "RAS": signal: SIGABRT (application abort requested)
../../src/meh/test_meh.cc:123: last checkpoint
Leaving test case "RAS"; testing time: 12013826mks
Leaving test suite "TEST_MEH"
Leaving test suite "Suites"
*** 1 failure detected in test suite "Suites"
(1) Figure out why MPI_Init_thread does not work properly with OpenMPI
(2) Optimize sleeptime__ (util/constants.h) with realistic calculations
It probably wouldn't change the performance at all even if it worked.
Need some work to make it compatible with btas::Tensor's serialization. I am working on it.
Lots of errors. It compiled and all tests passed before merge (with the following changes that I forgot to check in). Ryan - can you look at it?
diff --git a/src/ciutil/determinants_base.h b/src/ciutil/determinants_base.h
index 92b71d3..4aafc54 100644
--- a/src/ciutil/determinants_base.h
+++ b/src/ciutil/determinants_base.h
@@ -88,7 +88,7 @@ class Determinants_base {
- return (*iter)->sign<spin>(bit, pos);
+ return (*iter)->template sign<spin>(bit, pos);
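For anyone wondering why the `template` keyword is needed in the change above: when a member template is called on an object whose type depends on a template parameter, the compiler otherwise parses `<` as a less-than operator. The following is a minimal sketch of that rule only; `StringSpace` and `alpha_sign` are hypothetical names made up for illustration and are not the classes in determinants_base.h.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a class with a templated member function.
struct StringSpace {
  // Returns +1 or -1 from the parity of the set bits of `bit` below `pos`.
  // The `spin` parameter only tags the call in this sketch.
  template <int spin>
  int sign(unsigned bit, int pos) const {
    assert(pos >= 0 && pos < 32);
    const unsigned mask = (1u << pos) - 1u;
    int nset = 0;
    for (unsigned b = bit & mask; b != 0u; b &= b - 1u) ++nset;
    return (nset % 2 == 0) ? 1 : -1;
  }
};

// `Space` is a template parameter, so the type of `*s` is dependent here.
template <typename Space>
int alpha_sign(const std::shared_ptr<const Space>& s, unsigned bit, int pos) {
  // Without `template`, `s->sign<0>(bit, pos)` is parsed as a comparison
  // and strict compilers such as clang reject it.
  return s->template sign<0>(bit, pos);
}
```

Calling `alpha_sign(std::make_shared<const StringSpace>(), 5u, 3)` goes through the disambiguated call; some compilers accept the shorter form, which may be why the problem only showed up after the merge.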
If you do not have a build, please use the following flags on our cluster.
'CXX=/usr/local/clang/bin/clang++' 'CXXFLAGS=-Wall -Wno-logical-op-parentheses -Wno-sign-compare -Wno-unused-function -Werror -ftemplate-depth=1024 -std=c++11 -O0 -g' '--enable-mkl' '--with-include=-I/usr/local/include -I/opt/intel/composerxe/mkl/include' '--prefix=/usr/local/bagel_clang'
There is an autoconf test for the pyconfig.h header, but I do not see this header being included anywhere in the source. There is python-boost code in src/wfn/wfn_py.cc, but it is not clear to me why this needs to know internals about the system python.
In any case, the AC_CHECK_LIB check for python supports python${PYTHON_VERSION}, so I think the pyconfig.h header check could also be made smart enough to look in /usr/include/python${PYTHON_VERSION} for pyconfig.h as an alternative (if this is really required at all).
Or maybe just checking for boost_python is already enough?
Needs a lot of work! algorithm=bfgs is the closest to working, but eventually everything should be fixed.
(1) Some intermediates have two complex vectors, which should be re-written to be two real vectors (efficiency and code reuse)
(2) All the redundant code should be removed. (i) DF; (ii) Hartree-Fock; (iii) FCI.
More to be added.
I am not sure if it is a good idea to have a couple of list<shared_ptr> for each basis set. Should Atom know all the basis sets instead? Then naming would be the problem. Hmm...
I know how to autodetect this but I wonder why it is necessary or if there is something more fundamental for which this question is a proxy.
jeffhammond#1 is related.
Thanks,
Jeff
When contracting to gradient integrals, we need to call get_block, which assumes that integrals are divided at the shell boundaries. We either need to back transform to this format, or implement get_block that involves inter-node communication.
See the code that disables averaging: 231f3a4
DFT is not tuned at all, though it works. We need
(1) Basis set screening
(2) More efficient grid design
(3) ...
This has to be fixed ASAP.
Entering test case "DIST_FCI"
../../src/fci/test_fci.cc(121): error in "DIST_FCI": check compare(fci_energy("hf_sto3g_fci_dist"), reference_fci_energy()) failed
Leaving test case "DIST_FCI"; testing time: 12560ms
Leaving test suite "TEST_FCI"
Entering test suite "TEST_RELFCI"
compare(fci_energy("hf_sto3g_fci_dist"), reference_fci_energy()) failed
Leaving test case "DIST_FCI"; testing time: 71970ms
On my laptop. Please fix ASAP. It could well be a compiler bug, but it has to be circumvented in some way.
In file included from ../../../src/wfn/geometry.cc:28:0:
../../../src/df/complexdf.h: In constructor 'bagel::ComplexDFDist_ints<TBatch>::ComplexDFDist_ints(int, int, const std::vector<std::shared_ptr<const bagel::Atom> >&, const std::vector<std::shared_ptr<const bagel::Atom> >&, double, bool, double, bool) [with TBatch = bagel::ComplexSmallERIBatch]':
../../../src/df/complexdf.h:115:127: internal compiler error: in create_tmp_var, at gimplify.c:479
const double thr, const bool inverse, const double dum, const bool average = false) : ComplexDFDist(nbas, naux) {
^
../../../src/df/complexdf.h:115:127: internal compiler error: Abort trap: 6
g++: internal compiler error: Abort trap: 6 (program cc1plus)
../../libtool: line 1122: 18961 Abort trap: 6 g++ -DHAVE_CONFIG_H -I. -I../../../src/wfn -I../.. -I/usr/local/boost/include -I/opt/intel/composerxe/mkl/include -I/usr/local/slater/include -I/usr/local/include -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/slater/include -I../../.. -I/usr/local/boost/include -I/opt/intel/composerxe/mkl/include -I/usr/local/slater/include -I/usr/local/include -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/slater/include -DDISABLE_SERIALIZATION -Wall -Wno-unused-local-typedefs -Wno-sign-compare -Wno-unused-function -Werror -O0 -g -std=c++11 -MT geometry.lo -MD -MP -MF .deps/geometry.Tpo -c ../../../src/wfn/geometry.cc -fno-common -DPIC -o .libs/geometry.o
Makefile:430: recipe for target 'geometry.lo' failed
it was configured by
'CXXFLAGS=-DDISABLE_SERIALIZATION -Wall -Wno-unused-local-typedefs -Wno-sign-compare -Wno-unused-function -Werror -O0 -g' '--enable-mkl' '--with-include=-I/usr/local/boost/include -I/opt/intel/composerxe/mkl/include -I/usr/local/slater/include -I/usr/local/include -I/opt/local/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/slater/include' '--with-libxc' '--with-slater' 'LDFLAGS=-L/usr/local/boost/lib -L/usr/local/slater/lib'
In configure.ac, https://github.com/shiozaki/BAGEL is configured as the homepage. This is either obsolete or a private repo. I guess http://nubakery.org/ (once online) would be more appropriate anyway.
(1) inter-molecular coordinates are not properly handled
(2) cartesian to internal transformation is not updated during the optimization
There are lots of things to do. Note that
Currently, complex density-fitted integral 3-tensors are derived from standard DF objects and have analogous member functions with different names. This works but is not very elegant; in principle the interfaces for real and complex DF objects should be identical. Introducing polymorphic behavior is not straightforward due to differences in arguments and return type for some functions.
Not sure at this point what is the best solution…
Stack size for integral evaluation is hardwired to 80M (?) now. This is especially problematic for relativistic (gradient) calculations where one sometimes needs to increase the size manually.
Has to be sorted otherwise so that higher-order RDM computation is easier in the future. Hopefully I will do it tonight.
I thought that it is better to be consistent with the name we used in the papers.
A relevant paper: Berghold, G.; Mundy, C.; Romero, A.; Hutter, J.; Parrinello, M. Phys. Rev. B 2000, 61, 10040–10048.
In the above paper, they claim that their procedure combined with DIIS can be 3 to 10 times faster than Jacobi rotations for PM localization, depending on the molecule.
Reported originally by Shane Parker in the private repo.
Related to 6af07d5
I agree that Dvec is poorly designed, but replacing it with Matrix is not good either, because it is actually a 3-index quantity, and its annotation information (such as that used for print-out) is lost after the replacement.
For the time being, I revert this commit as it is incomplete (together with 03595b2). One can restore by picking it up.
I think the best solution is to properly redesign it so that Dvec contains a vector of Civec.
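To make the proposal concrete, here is a rough sketch of a container that holds a vector of CI vectors, so that the third index is simply the position in the vector and per-state annotations stay attached. Everything in it (`CivecSketch`, `DvecSketch`, the member names, the label format) is hypothetical and is not the existing Civec or Dvec interface.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for one CI vector: a (lena x lenb) block of coefficients.
class CivecSketch {
  std::size_t lena_, lenb_;
  std::vector<double> data_;
 public:
  CivecSketch(std::size_t lena, std::size_t lenb)
    : lena_(lena), lenb_(lenb), data_(lena * lenb, 0.0) {}
  double& element(std::size_t ia, std::size_t ib) { return data_.at(ib + lenb_ * ia); }
  double element(std::size_t ia, std::size_t ib) const { return data_.at(ib + lenb_ * ia); }
  std::size_t size() const { return data_.size(); }
};

// Dvec-like container: owns one CI vector per state and keeps a label for each.
class DvecSketch {
  std::vector<std::shared_ptr<CivecSketch>> states_;
  std::vector<std::string> labels_;  // annotation used, for example, in print-out
 public:
  DvecSketch(std::size_t lena, std::size_t lenb, std::size_t nstate) {
    for (std::size_t i = 0; i != nstate; ++i) {
      states_.push_back(std::make_shared<CivecSketch>(lena, lenb));
      labels_.push_back("state " + std::to_string(i));
    }
  }
  std::shared_ptr<CivecSketch> data(std::size_t i) { return states_.at(i); }
  const std::string& label(std::size_t i) const { return labels_.at(i); }
  std::size_t nstate() const { return states_.size(); }
};
```

With this layout each state can be handed to routines that expect a single CI vector, while the container as a whole remains a 3-index object.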
After introducing TensorView in BAGEL, the test broke. It sometimes gives the right number and sometimes wrong ones. I suspect that this is due to an instability of the test associated with the molecular symmetry.
Shane - could you look into it?
So far the LAPACK interface uses 4-byte integers. It would be nice if we could control that.
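One possible way to control this is a single integer alias selected at compile time, plus a checked conversion so that large dimensions fail loudly instead of overflowing. This is only a sketch: the macro `EXAMPLE_LAPACK_ILP64`, the alias `blas_int`, and the helper `to_blas_int` are hypothetical names, not existing BAGEL or LAPACK symbols.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical compile-time switch: define EXAMPLE_LAPACK_ILP64 when linking
// against a LAPACK built with 8-byte integers.
#ifdef EXAMPLE_LAPACK_ILP64
using blas_int = std::int64_t;
#else
using blas_int = std::int32_t;
#endif

// Convert a dimension to the integer type of the linked library,
// throwing instead of silently truncating.
inline blas_int to_blas_int(std::size_t n) {
  if (n > static_cast<std::size_t>(std::numeric_limits<blas_int>::max()))
    throw std::overflow_error("dimension does not fit the LAPACK integer type");
  return static_cast<blas_int>(n);
}
```

Every wrapper around a LAPACK call would then take `blas_int` arguments, so switching the integer width touches one definition only.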
TEST_RELFCI does not pass. For the time being its test cases are commented out in test_main.cc
Entering test case "ZHARRISON"
Assertion failed: ((*this - *(this->transpose_conjg())).norm()/size() < 1e-10), function diagonalize, file ../../../src/math/zmatrix.cc, line 95.
slice_copy should be replaced by slice as much as possible. Probably I should implement a wrapper class so that I could use the same scalapack/lapack things.
The MEH test is broken on zinc. It fails every time I run TestSuite in parallel. In addition, it fails from time to time in serial (one in 10 times?).
TestSuite in serial:
Entering test case "CAS"
../../src/meh/test_meh.cc(108): error in "CAS": check compare(meh_energy("benzene_sto3g_meh_stack"), -459.40037129, 1.0e-6) failed
../../src/meh/test_meh.cc(109): info: check compare(meh_energy("benzene_sto3g_meh_T"), -459.35640265, 1.0e-6) passed
Leaving test case "CAS"; testing time: 47890ms
TestSuite in parallel:
Entering test suite "TEST_MEH"
Entering test case "CAS"
[cli_1]: aborting job:
Fatal error in PMPI_Bcast:
Message truncated, error stack:
MPIDI_CH3U_Receive_data_found(281): Message from rank 0 and tag 2 truncated; 2888 bytes received but buffer size is 2592
or
Entering test case "CAS"
[cli_1]: aborting job:
Fatal error in PMPI_Bcast:
Message truncated, error stack:
MPIDI_CH3U_Receive_data_found(281): Message from rank 0 and tag 2 truncated; 2888 bytes received but buffer size is 128
Typically one sees the following. -99.95674630 is the right energy at the optimized geometry.
1 -99.95627054 0.01818382 2.29
2 -99.95644741 0.01426011 2.48
3 -99.95674630 0.00005240 2.33
4 -99.95674628 0.00000394 2.35
5 -99.95679945 0.00006972 2.35
Could be related to the fix in PairFile, but I do not know why this breaks.
due to the recent improvement in DavidsonDiag. These tests are tentatively commented out (see 2aed69e). I will fix it soon.
They are basically the same (except for a few lines). Probably we need to isolate the RDM part as it is substantially different (?)
I am for it, but we need to decide. Most features have been implemented in gcc 4.9, so it comes down to whether or not (or when) we want to require gcc 4.9.
http://gcc.gnu.org/projects/cxx1y.html
The new features include the following, though there are not many:
std::make_unique
generic lambdas
runtime-sized arrays
shared mutex
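For a feel of what the first two items would buy us, here is a small sketch that assumes a compiler invoked with -std=c++14 or later. The function names `sum_scaled` and `make_buffer` are made up for the example.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Generic lambda: the `auto` parameters make the call operator a template,
// so the same lambda works for any arithmetic element type.
inline double sum_scaled(const std::vector<double>& v, double factor) {
  auto scale_add = [factor](auto acc, auto x) { return acc + factor * x; };
  return std::accumulate(v.begin(), v.end(), 0.0, scale_add);
}

// std::make_unique: the counterpart of std::make_shared for unique_ptr,
// which avoids a naked `new`.
inline std::unique_ptr<std::vector<double>> make_buffer(std::size_t n) {
  return std::make_unique<std::vector<double>>(n, 0.0);
}
```

Neither feature changes what the code can do; both mainly shorten it, which is consistent with the remark that there are not many new features.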
Boost's serialization library apparently requires users to compile it with the same compiler as the one used for the rest of the program. Currently this is not checked at configure time (nor at compile time), but it would be nice to be able to detect it.
We want to delete these DFDist before constructing new ones. The code now discards them using stupid stupid mutable things. Needs to be updated at a certain point.
RASCI and RASSCF to be implemented!
We need to sync matrices inside Davidson especially when the data of T is replicated, and eliminate extra sync calls in the drivers.
In order to realize this, we may need to pass a function<void(shared_ptr)> that takes care of denominator scaling. I am not completely sure how to make it consistent when, e.g., a BFGS update to the denominator is on.
With MVAPICH2 2.0b, test/hf_sto3g_fci_dist.json hangs at spin_decontaminate. I am pretty sure that this is not a bug in MVAPICH2. DistQueue and SpinTask are suspicious (those in the master might not be the latest version, either).
configure flag:
'CXXFLAGS=-DNDEBUG -Wall -Wno-sign-compare -Wno-unused -Werror -O3 -mavx' '--with-mpi=mvapich' '--enable-mkl' '--with-include=-I/opt/intel/composerxe/mkl/include' '--enable-scalapack' '--enable-static' '-disable-shared'
Please fix asap.
I could not make CPCASSCF::form_sigma work for the symmetric part (to be contracted with overlap derivatives). Due to time constraints I gave up and instead implemented a separate function, CPCASSCF::form_sigma_sym, that mimics Molpro's algorithm and works just fine.
We need to figure out why form_sigma did not work (I guess some terms rely on anti-symmetry and destroy the symmetric part of sigma), and merge these two functions into one. Algorithm-wise I prefer the one used in form_sigma, since it is transparent.
When basis functions are linearly dependent, BAGEL breaks (especially in UHF and relativistic calculations). The TildeX class needs to be re-implemented so that it can use canonical orthogonalization when linear dependency is detected.
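For reference, canonical orthogonalization can be written as below. The symbols and the threshold \tau are notation chosen for this note, not values or names taken from the TildeX class: S is the overlap matrix, and eigenvectors whose eigenvalues fall below \tau are discarded.

```latex
\begin{align}
  S &= U \,\mathrm{diag}(s_1,\dots,s_n)\, U^\dagger, \\
  X &= U_{\mathrm{kept}} \,\mathrm{diag}\bigl(s_i^{-1/2}\bigr)_{i \in \mathrm{kept}}, \qquad
  \mathrm{kept} = \{\, i : s_i > \tau \,\}, \\
  X^\dagger S X &= \mathbf{1}_{m}, \qquad m = |\mathrm{kept}| \le n .
\end{align}
```

The resulting X is rectangular (n by m), so any code that consumes it has to accept fewer orthogonal functions than basis functions.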
Currently MP2 parallelization is stupidly inefficient. Needs more work.
since some MPI processes have no data in DFDistT.
The same problem actually exists in DFDist, and I think it should be fixed. Note that the code works with no or many closed orbitals.
On a separate note, I am not sure if the (-2) sector is correct or not as it is too sensitive to thresholds in the underlying CASSCF.
F12 DF factory now calls ERIBatch. It should be replaced by SlaterBatch.
https://github.com/nubakery/libslater
Anyone want to do F12?
Compiling that file with g++-4.8 takes up to 3GB of RES memory according to top, grinding smaller sized compile hosts to a halt. All the other files in src/integral take at most 1 GB I believe.
I realize integral routines can be involved, but maybe there are some ways to reduce the memory required to compile the file, or even split it up into several files.
Alternatively, maybe this file is only required for some special purposes and could be made optional?
Once the paper has been submitted, we should clean up the ZCASSCF. The following comments are on zcasbfgs.cc, but it applies to other files.
(1) There are too many "optimize_electrons == true"-type branches. We should make two functions that could be called from the main branch.
(2) The control logic is not transparent (lines 241-246). There should be a readable way of doing the same.
(3) Related to this, there are too many variables defined outside the main loop. Minimize the number of these.
(4) Lines 208-227 may not be the best way of writing these operations (is there any way to define functions?)
... Some other comments may follow.
Does MoldenIO properly run in parallel? I think that only the rank0 process should write to disk.