libatoms / pymatnest

Nested Sampling code

License: GNU General Public License v3.0

TeX 0.56% Makefile 0.08% C 1.43% C++ 1.18% Shell 0.34% Python 35.50% Fortran 8.14% Jupyter Notebook 52.79%

pymatnest's People

pastewka alvarovm marshallmcdonnell scut-ccmp haydensoliver lc453 coding-ronny casv2 gamarchant janklinux omaradesida wexlergroup mperezjigato sebastianhavens

pymatnest's Issues

Nested sampling failed for converge_down_to_T

ERROR.tar.gz

I am running NS for a GAP model with 'converge_down_to_T', but the runs fail with an MPI abort.

As a sanity check, I took the LJ input file available at the link below and reran NS.

https://github.com/libAtoms/pymatnest/blob/master/example_inputs/inputs.test.periodic.MD.lammps.converged

The NS input for LJ with converge_down_to_T failed with the same error. All the inputs and outputs are attached for your reference.

I was previously using n_iter_times_fraction_killed which was working fine and ran for many iterations.

'cell' variable isn't being set since recent updates

In Fortran models the 'cell' variable isn't being passed correctly to ll_eval_energy(...). In my own tests it is either all zeros, or all zeros except for the last value. My model works with previous versions of the pymatnest code (e.g. from around May-June), and I normally set the cell using the max_volume_per_atom keyword.

Has there been a change in functionality, or is this a bug?

lammpslib.py - failing in test cases

The test case at https://svn.fysik.dtu.dk/projects/ase-extra/trunk/ase/test/testlammpslib.py
fails with the LAMMPS and ASE trunk versions. The test.log contains:
----------------------------------- test.log start -----------------------------------------------------------
LAMMPS (26 Jan 2017)
units metal
atom_style atomic
atom_modify map array sort 0 0
boundary p p s
region cell prism 0 5.72756492761 0 4.96021672914 0 0.0 0.0 0.0 0.0 units box
ERROR: Illegal region prism command (../region_prism.cpp:88)
Last command: region cell prism 0 5.72756492761 0 4.96021672914 0 0.0 0.0 0.0 0.0 units box
Total wall time: 0:00:00
----------------------------------- test.log end -----------------------------------------------------------
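In the logged command, the prism's zlo and zhi are both 0 (the `0 0.0` pair), presumably because the test system's third cell vector has zero length along the non-periodic z direction (`boundary p p s`); LAMMPS appears to reject a prism whose upper bound does not exceed its lower bound. A minimal sketch of an early guard, using a hypothetical helper `prism_region_command` (not lammpslib's actual code) that formats the `region ID prism xlo xhi ylo yhi zlo zhi xy xz yz` command for a cell anchored at the origin:

```python
def prism_region_command(region_id, xhi, yhi, zhi, xy=0.0, xz=0.0, yz=0.0):
    """Format a LAMMPS 'region ... prism' command with all lower bounds at 0.

    Hypothetical helper, not part of lammpslib: it raises a Python error for a
    degenerate cell instead of letting LAMMPS abort with
    'Illegal region prism command'.
    """
    for name, hi in (("xhi", xhi), ("yhi", yhi), ("zhi", zhi)):
        # Each upper bound must be strictly larger than its lower bound (0 here).
        if hi <= 0.0:
            raise ValueError(f"{name}={hi!r} must be > 0 for a prism region")
    return (f"region {region_id} prism 0 {xhi} 0 {yhi} 0 {zhi} "
            f"{xy} {xz} {yz} units box")


# A cell with a finite z extent formats normally; zhi=0.0 (as in the log) raises.
print(prism_region_command("cell", 5.72756492761, 4.96021672914, 10.0))
```

For a slab like this one, the caller would still have to choose a finite z extent (for example, enough vacuum around the atoms) before building the region.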

Similarly this case fails:

---------------------------------------- lammpslib-example.py Start---------------------------------------------------
from ase import Atom, Atoms
from lammpslib import LAMMPSlib
cmds = ["pair_style eam/alloy",
"pair_coeff * * NiAlH_jea.eam.alloy Al H"]
a = 4.05
al = Atoms([Atom('Al')], cell=(a, a, a), pbc=True)
h = Atom([Atom('H')])
alh = al + h
lammps = LAMMPSlib(lmpcmds = cmds, logfile='test.log')
alh.set_calculator(lammps)
print "Energy ", alh.get_potential_energy()
---------------------------------lammpslib-example.py End -----------------------------------------------------------

I get this on stdout:
Traceback (most recent call last):
  File "pymatnest-example.py", line 10, in <module>
    alh = al + h
  File "/home/vama/install/local/anaconda2/lib/python2.7/site-packages/ase/atoms.py", line 866, in __add__
    atoms += other
  File "/home/vama/install/local/anaconda2/lib/python2.7/site-packages/ase/atoms.py", line 872, in extend
    other = self.__class__([other])
  File "/home/vama/install/local/anaconda2/lib/python2.7/site-packages/ase/atoms.py", line 150, in __init__
    atoms = self.__class__(None, *data)
  File "/home/vama/install/local/anaconda2/lib/python2.7/site-packages/ase/atoms.py", line 195, in __init__
    self.new_array('numbers', numbers, int)
  File "/home/vama/install/local/anaconda2/lib/python2.7/site-packages/ase/atoms.py", line 391, in new_array
    a = np.array(a, dtype)
TypeError: long() argument must be a string or a number, not 'Atom'

Thanks,

ns_process_traj: quippy dependencies and python 2 code

ns_process_traj currently has some old quippy dependencies, Python 2 print statements and, it seems, some indentation issues. I think we need it in order to look at XRD or perhaps short-range order parameters.

Possible bug in example_LJ_model.F90

I think there's a bug in the pymatnest fortran example model: example_LJ_model.f90.

In ll_eval_energy there is a term:
if (i==j) E_term = E_term * 0.5
It looks like this is meant to prevent double-counting; however, I think the double-counting actually happens when the indices i and j are different (not the same, as the if(i==j) line above implies).

For example if I've got two atoms then the i and j loops give me:
i=0, j=0
i=0, j=1
i=1, j=0
i=1, j=1

As far as I can see the only potential for a double-count of the energy in the i == j cases is when the 0th images are considered (i.e. dj1, dj2, dj3 all equal 0) and that is correctly prevented by the line:
if (i == j .and. dj1 == 0 .and. dj2 == 0 .and. dj3 == 0) cycle
The double-counting actually happens for the two cases (i=0, j=1) and (i=1, j=0) when the 0th image of each is being computed.

I.e. in the two-atom case given above there is a double-count between:
i=0, j=1 and dj1 = dj2 = dj3 = 0
and
i=1, j=0 and dj1 = dj2 = dj3 = 0

Apologies if there's a mistake in my logic here.
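To check the counting argument independently of the Fortran source, here is a minimal Python sketch (cubic box, Lennard-Jones in reduced units; all function names are hypothetical and not taken from example_LJ_model.F90). It enumerates every (i, j, image) term of a full double loop and compares three weightings: halving only the i == j terms, halving every term, and summing each bond exactly once.

```python
import itertools
import math


def lj(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy phi(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def pair_terms(pos, box, cutoff, n_max=1):
    """List (i, j, shift, phi) for every ordered pair (i, j) and image shift
    (d1, d2, d3) within the cutoff, skipping i == j with the zero shift.
    n_max must be large enough for the images to cover the cutoff."""
    shifts = list(itertools.product(range(-n_max, n_max + 1), repeat=3))
    terms = []
    for i, j in itertools.product(range(len(pos)), repeat=2):
        for d in shifts:
            if i == j and d == (0, 0, 0):
                continue
            image = [pos[j][k] + d[k] * box for k in range(3)]
            r = math.dist(pos[i], image)
            if r < cutoff:
                terms.append((i, j, d, lj(r)))
    return terms


def energy_half_self_only(terms):
    # Halve only the i == j terms; i != j terms are counted in full.
    return sum(0.5 * e if i == j else e for i, j, _, e in terms)


def energy_half_all(terms):
    # (i, j, d) and (j, i, -d) describe the same bond, so halve every term.
    return sum(0.5 * e for _, _, _, e in terms)


def energy_unique_bonds(terms):
    # Reference: keep one representative of each {(i, j, d), (j, i, -d)} pair.
    # For i == j, exactly one of d and -d is lexicographically > (0, 0, 0).
    return sum(e for i, j, d, e in terms
               if i < j or (i == j and d > (0, 0, 0)))


# Two atoms in a box of side 2.5 with cutoff 3.0, so self-images are included.
terms = pair_terms([(0.0, 0.0, 0.0), (1.1, 0.3, 0.2)], box=2.5, cutoff=3.0)
print(energy_half_self_only(terms), energy_half_all(terms),
      energy_unique_bonds(terms))
```

In a full double loop every bond appears twice, as (i, j, d) and (j, i, -d); for i == j the pairing is between the shifts d and -d. In this sketch, only the uniform factor of 0.5 reproduces the unique-bond sum.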

no way to disable sending communicator to lammps initializer

LAMMPS seems to have recently changed its serial build so that it no longer has symbols that conflict with the actual MPI library used by mpi4py. This means that serial LAMMPS can be used, but then you have to pass None for the communicator instead of COMM_SELF, which is what happens now. That's an easy patch, but should the default be LAMMPS_serial=T, or would that break too many existing setups that use MPI LAMMPS + COMM_SELF?
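A minimal sketch of the proposed switch, assuming the communicator is handed to the LAMMPS Python class through its `comm` keyword. `LAMMPS_serial` is the flag proposed above; the helpers `parse_bool_flag` and `lammps_comm_kwargs` are hypothetical and not part of pymatnest. The sketch deliberately leaves the default open.

```python
def parse_bool_flag(value):
    """Interpret a T/F-style input value ('T', 'F', 'true', 'false', ...)."""
    v = value.strip().lower()
    if v in ("t", "true", "1", "yes"):
        return True
    if v in ("f", "false", "0", "no"):
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def lammps_comm_kwargs(lammps_serial, comm_self):
    """Keyword arguments selecting the communicator for the LAMMPS constructor.

    lammps_serial: value of the proposed (hypothetical) LAMMPS_serial flag.
    comm_self:     the communicator passed today, e.g. mpi4py's MPI.COMM_SELF.
    """
    # Serial LAMMPS build: pass no communicator at all (None).
    # MPI LAMMPS build: keep giving each process COMM_SELF, as is done now.
    return {"comm": None if lammps_serial else comm_self}
```

With mpi4py and the LAMMPS Python module installed, this would be used along the lines of `lammps(cmdargs=..., **lammps_comm_kwargs(parse_bool_flag(flag), MPI.COMM_SELF))`. Existing MPI setups keep working as long as the flag defaults to false.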

Poor parallel scaling efficiency due to MPI_gather_all

Per e-mail conversation with Gabor I'm posting about this issue here. Basically the parallel scaling of pymatnest is relatively poor, and performance drops off at a relatively low number of CPU cores.

Taking Archer as an example (Archer is a Cray machine very similar to Titan in the US) with 1152 walkers I see a drop-off in parallel scaling after just 12 cores (1/2 a node), while with 11520 walkers I can only scale up to 48 cores. From looking at the code it seems very likely the problem lies with over-use of the MPI_gather_all routine, as this causes a lot of congestion between nodes (on Archer each node is 24 cores). Gabor informed me that he had trouble going beyond 96 cores (4 nodes).

I've posted my (brief) results from my tests on Archer here, with some discussion of the cause (see the pure MPI_gather_all test towards the end):
https://gist.github.com/erlendd/c236f393ed597187c612599cb472cd4b

error in LAMMPS example

Hi, I cannot run the LAMMPS test examples; it seems a variable is missing.
I get the following error:

$>mpirun -n 2 ../ns_run < inputs.test.cluster.MC.lammps
WARNING: no quippy module loaded
WARNING: no quippy module loaded
comm <mpi4py.MPI.Intracomm object at 0x7f66f1d06900> size 2 rank 0
comm <mpi4py.MPI.Intracomm object at 0x7f00f2fcf900> size 2 rank 1
Traceback (most recent call last):
  File "../ns_run", line 6, in <module>
Traceback (most recent call last):
  File "../ns_run", line 6, in <module>
    ns_run.main()
  File "/home/vama/soft/pymatnest/ns_run.py", line 2606, in main
    ns_run.main()
  File "/home/vama/soft/pymatnest/ns_run.py", line 2606, in main
    exit_error("need either n_iter_times_fraction_killed or converge_down_to_T")
TypeError: exit_error() takes exactly 2 arguments (1 given)
    exit_error("need either n_iter_times_fraction_killed or converge_down_to_T")
TypeError: exit_error() takes exactly 2 arguments (1 given)
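The traceback shows two separate problems. The input file sets neither n_iter_times_fraction_killed nor converge_down_to_T, and the error path itself crashes because exit_error() is called with one argument but requires two. A minimal sketch of the second fix gives the extra argument a default. The parameter name `stat` and the body are hypothetical; pymatnest's actual exit_error may do more, such as aborting the other MPI ranks, whereas this sketch only exits the calling process.

```python
import sys


def exit_error(message, stat=1):
    """Write `message` to stderr and exit with status `stat`.

    Hypothetical sketch: with `stat` defaulting to 1, one-argument calls
    like the one in the traceback no longer raise a TypeError.
    """
    sys.stderr.write(message.rstrip("\n") + "\n")
    sys.exit(stat)
```

With this change, the run above would stop with the intended message about the missing keyword rather than a TypeError.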

License?

Is this project licensed under an open-source license?

MPI restart bug?

Restarting Nested Sampling runs by first concatenating the walkers

ACE_NS.snapshot.8607.*.extxyz > ACE_NS.snapshot.8607.all.extxyz

and specifying the restart input file:

restart_file=ACE_NS.snapshot.8607.all.extxyz

has always worked for me, and it still runs fine in serial. However, with the most recent pymatnest running under MPI, I get an odd error like this:

6 truncating traj file to start_first_iter 8608
 Uncaught Exception Type: <class 'TypeError'>
 Value: '<' not supported between instances of 'NoneType' and 'int'
 Traceback: <traceback object at 0x14775b2d5680>
 Aborting
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 6

I think it happens during the distribution of walkers over the MPI processes, right after they are read in from the restart file.
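The message is Python 3's standard TypeError for an ordering comparison with None. Some value that was compared against an integer on process 6 was therefore still None, which fits the suspicion that it was not set during walker distribution. The output above does not identify the variable. A minimal reproduction follows, along with a hypothetical `require_set` guard that names the offending variable and could help locate it. Neither function is pymatnest code.

```python
def reproduce_none_comparison():
    """Return the message Python 3 raises for None < int."""
    value = None  # stands in for a quantity that was never set on this rank
    try:
        value < 8608
    except TypeError as exc:
        return str(exc)
    return None


def require_set(name, value):
    """Hypothetical guard: raise a descriptive error if `value` is None."""
    if value is None:
        raise ValueError(f"{name} is None on this rank; expected an int")
    return value


print(reproduce_none_comparison())
```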

Oddly enough, this only happens on restarts, and restarting in serial works fine.

I'm not sure how this could be related to the latest ASE calculator functionality; judging by the code changes it looks unlikely to have anything to do with it, but I suspect it is related somehow.

Starting an NS run from scratch with MPI works fine; only MPI restarts raise this error.

Incorrect Contents of `misc_calc_lib`

@MartinSchlegel While following the XRD instructions, I found that misc_calc_lib appears to have been inadvertently overwritten with the contents of make_thermal_average_xrd_rdfd_lenhisto.

Traceback (most recent call last):
  File "../pymatnest/make_thermal_average_xrd_rdfd_lenhisto.py", line 1, in <module>
    import misc_calc_lib
  File "/home/ubuntu/pymatnest/misc_calc_lib.py", line 163, in <module>
    rdfd_results = misc_calc_lib.rdfd_QUIP(QUIP_path,at,n_a,r_range)
AttributeError: 'module' object has no attribute 'rdfd_QUIP'

This should be a trivial fix. Thanks!
