Git Product home page Git Product logo

Comments (11)

elcorto avatar elcorto commented on June 7, 2024 1

Awesome! I just updated the README and the cp2k wiki in order to point people there :) Thanks for your work!

from pwtools.

elcorto avatar elcorto commented on June 7, 2024

Hi Patrick

Sure, I'd be glad to see this functionality be implemented in ASE and/or PLAMS :)

Since both projects use the LGPL and pwtools uses the 3-Clause BSD license, mentioning pwtools in accordance to the BSD license should suffice.

Regarding the DCD file format: Much of the credit actually goes to the VMD folks since I used their DCD plugin as a reference [1,2]. I also talked to the CP2K devs when developing the parser, which was a big help as well [3]. Details of the format, VMD's implementation and how CP2K writes DCD is in src/dcd.f90. These are the Fortran sources for the first parser I wrote (the Python wrapper is pwtools.dcd.read_dcd_data_f()), but make sure to use the faster Python-only version in pwtools.dcd.read_dcd_data().

Also note that you need to use DCD_ALIGNED_CELL [4] when dealing with non-orthorhombic cells (e.g. MD in ensembles that allow cell deformations such as NPT). DCD outputs only Cartesian atom coords and cell parameters (a,b,c,alpha,beta,gamma) and not cell vectors, which can lead to a mismatch between atom coords and the cell if they are not aligned correctly. This thread [5] has a lot more details.

best,
Steve

[1] http://www.ks.uiuc.edu/Research/vmd/plugins/molfile/dcdplugin.html
[2] http://www.ks.uiuc.edu/Research/namd/wiki/?ReadingDCDinFortran
[3] https://groups.google.com/forum/#!searchin/cp2k/reading$20dcd$20files%7Csort:date/cp2k/L6W4Dl4xxP4/G2AEcp1ukfcJ
[4] https://manual.cp2k.org/cp2k-6_1-branch/CP2K_INPUT/MOTION/PRINT/TRAJECTORY.html#FORMAT
[5] https://groups.google.com/forum/#!searchin/cp2k/Outputting$20cell$20information$20and$20fractional$20coordinates%7Csort:relevance/cp2k/72MhYykrSrQ/5c9Jaw7S9yQJ

from pwtools.

patrickmelix avatar patrickmelix commented on June 7, 2024

Hey Steve,

awesome! I'll then put this on my ToDo list and mention here when its done. Don't expect it right away, but it was annoying me a lot lately so there's hope.

My plan is to only support the DCD_ALIGNED_CELL format of CP2K, exactly because of the discussion you mentioned in [5]. Might put in a check for the CP2K string in the header to prevent users from thinking it can read other dcd formats too. Not sure if there's a nice way of checking if it really is aligned or not, will check the CP2K source for that then.

Thanks for all the support and your great tool!
Cheers,
Patrick

from pwtools.

elcorto avatar elcorto commented on June 7, 2024

This is somewhat off-topic, but IIRC it is not possible to check, just by using the data, whether the file was generated after first rotating ("aligning") the coords. That is precisely the biggest flaw of this and all other formats that don't store cell vectors.

In summary, problems with DCD are:

  • It stores only Cartesian coords and (a,b,c,alpha,beta,gamma) instead of cell vectors, which makes it impossible to reconstruct fractional coords without assuming alignment. Given that cell parameters are 6 floats and cell vectors are 9 floats, there is no plausible reason not to store them. This would also make alignment unnecessary altogether.
  • It stores cell parameters in this way: a, cos(gamma), b, cos(beta), cos(alpha), c. I'd really love hear an explanation for this choice :)
  • There is no specification of the format (and no, a parser implementation is no spec). If there was, one could extend the spec to let the header have an "aligned" field, which parsers could use. But this is unnecessary, when cell vectors are stored, as stated above.

Personally, I'd like like to see all MD codes abandon crude, undocumented formats such as DCD and instead agree on modern efficient binary containers such as H5MD [1]. I haven't used it myself, but it looks like a step in the right direction. It uses HDF5 as backend (a very good idea) and seems to be flexible enough to support all data needed to represent an MD trajectory.

[1] http://nongnu.org/h5md/

from pwtools.

elcorto avatar elcorto commented on June 7, 2024

I took the liberty to close this one as I think the question is answered. In any case, I'd like to hear about your progress with this, so feel free to drop news here or ping me at: [email protected] - also in the LICENSE file now :)

from pwtools.

patrickmelix avatar patrickmelix commented on June 7, 2024

https://gitlab.com/ase/ase/merge_requests/1109

Just opened a MR.

from pwtools.

elcorto avatar elcorto commented on June 7, 2024

Nice one!

You mentioned that you don't have tests, which is not ideal. You should be able to generate a short MD trajectory with CP2K and use that for testing. I did exactly that: run 2x 15 steps of NPT MD, write once text (xyz format) and once DCD files and use them to compare. See this part of my CP2K parser tests, and the test data used. This test shows all the subtle gotchas which you encounter when dealing with DCD.

See also the other DCD test where I check the header against different parser implementations.

from pwtools.

patrickmelix avatar patrickmelix commented on June 7, 2024

Mmh, good idea... But ASE doesn't allow non-code files to be added to the repo. So I would have to run CP2K through ASE first. I'll have a look into that.

from pwtools.

elcorto avatar elcorto commented on June 7, 2024

Well, since the design of ASE's MD and optimization is to call only the calculator's SCF engine in each step and store the trajectory in ASE's trajectory format, you won't be able to write a native DCD containing a trajectory from within CP2K using ASE. But even if, since you are not allowed to add the generated data, this won't help.

You could, in theory, start a short native CP2K MD at test time and then parse, but I would not go down that route. It would prolong test time and it necessitates a working CP2K install for anybody running the test. Such a feature would be, in fact, a special-purpose calculator that allows you to run a full native MD using the calculator, which works against ASE's design. As such, it is much more work to maintain than a simple data file.

ASE has quite a number of parsers which read MD-like output (e.g. QE, LAMMPS, CASTEP, ...), even though it is not able to generate it -- so your contribution fits in there quite nicely.

I briefly checked ASE's test suite and they seem to test mainly the calculators (doing SCF, MD, optimization and so forth), which makes total sense since this is what ASE does best. I haven't found tests using canned trajectory data generated by external code (but I may have overlooked something, of course), so it looks like less test coverage in those parts of the code. Therefore, you'll probably get away with it :)

from pwtools.

patrickmelix avatar patrickmelix commented on June 7, 2024

Thats what I figured out so far too. Too bad, but lets see what the maintainers think about it.
Thx for your help!

from pwtools.

patrickmelix avatar patrickmelix commented on June 7, 2024

Hey,
just to let you know: The MR is approved and the reader is now part of ASE.

Thanks for letting me reuse your stuff!
Cheers,
Patrick

from pwtools.

Related Issues (7)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.