
glosim's People

Contributors

albapa, ceriottm, felixmusil, giulioi, sandipde

glosim's Issues

SOAP with multiple species

Hi,

I have three questions regarding SOAP with multiple species:

  1. Does --kernel fastavg take different species into account? Judging from the definition, it does not seem to:

     [screenshot from 2018-09-06 14-57-52]

  2. I would like to take the different species into account with a Kronecker delta function, i.e.:

     [screenshot from 2018-09-06 15-05-27]

     Is --kernel fastspecies the way to go? It seems to require environment-level information (a lot of RAM), so it is not comparable with fastavg in that sense. Do --kernel average and --kernel rematch also take different species into account with a delta function, and if so, how do they differ from --kernel fastspecies?

  3. It seems that the --separate_species flag concerns the species of the centers, not the species in the environment. When is it recommended to use this flag?

Performance

Hello,

I am trying to run farthest point sampling on my data set.
It contains ~10000 frames and I want to select 1000 points.
Currently, I am running on a cluster with 10 nodes (400 CPUs), and in 11 hours it has computed only the first ~130 points.
I am using the following flags: --kernel rematch -n 12 -l 8 --gamma 0.1 -c 4.5 --peratom --distance
and my system contains 38 atoms.
Is this the expected performance? Is there any trick to accelerate the calculation, other than reducing the number of atoms considered and/or the data set size?

Regards,
Yair
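
For reference, greedy farthest point sampling on a precomputed distance matrix can be sketched as below. This is a minimal illustration with NumPy, not glosim's implementation; the function name `farthest_point_sampling` and the toy data are assumptions made for the example. It shows that each selection step only needs the distances from the newly selected point to all others.

```python
import numpy as np

def farthest_point_sampling(dist, n_select, start=0):
    """Greedy FPS on a precomputed (N, N) distance matrix (illustrative helper)."""
    dist = np.asarray(dist, dtype=float)
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = dist[start].copy()
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))   # point farthest from the current selection
        selected.append(nxt)
        # Only row `nxt` of the distance matrix is needed for the update.
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Toy data: five points on a line.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
d = np.abs(x[:, None] - x[None, :])
picked = farthest_point_sampling(d, 3)
print(picked)  # [0, 4, 2]
```

With this scheme, selecting n points out of N requires about n × N distance evaluations rather than all N × N pairs, provided rows are computed on demand.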

Cannot execute the sample code in the ESI of Chem. Sci., 2018, 9, 1289.

Hi,
Thanks for providing the algorithms.
I just installed quippy and tried glosim.
I tried the command given in the electronic supplementary information of the paper Chem. Sci., 2018, 9, 1289.

glosim.py traj-pentacene.xyz -n 9 -l 9 -g 0.3 -c 3 --kernel rematch --gamma 2 --periodic --nonorm

The glosim program does not seem to recognize the options '--periodic --nonorm'.

                   :
glosim.py: error: unrecognized arguments: --periodic --nonorm

Also, without these options, the program aborts with the message below.

WARNING! fast hungarian library is not available 

Cannot find mcpermanent.so module in pythonpath. Permanent evaluations will be very slow and approximate.
Get it from https://github.com/sandipde/MCpermanent 
          TIME:   Wed Oct 17 14:46:59 2018
        ___  __    _____  ___  ____  __  __ 
       / __)(  )  (  _  )/ __)(_  _)(  \/  )
      ( (_-. )(__  )(_)( \__ \ _)(_  )    ( 
       \___/(____)(_____)(___/(____)(_/\/\_)
                                            
                                             
using output prefix = traj-pentacene-n9-l9-c3.0-g0.3_rematch-2.0
Reading input file traj-pentacene.xyz
564  Configurations Read
Computing SOAPs
Traceback (most recent call last):   
  File "glosim/glosim.py", line 862, in <module>
    main(args.filename, nd=args.n, ld=args.l, coff=args.c, cotw=args.cotw, gs=args.g, mu=args.mu, centerweight=args.cw, onpy=args.onpy, peratom=args.peratom, unsoap = args.unsoap, usekit=args.usekit, kit=args.kit,alchemyrules=args.alchemy_rules, kmode=args.kernel, normalize_global=args.normalize_global, permanenteps=args.permanenteps, reggamma=args.gamma, noatom=noatom, nocenter=nocenter, envsim=args.envsim, nprocs=args.np, verbose=args.verbose, envij=envij, prefix=args.prefix, nlandmark=args.nlandmarks, printsim=args.distance,ref_xyz=args.refxyz,partialsim=args.livek,lowmem=args.lowmem,restartflag=args.restart, zeta=args.zeta, xspecies=args.separate_species, alrange=(args.first,args.last), refrange=(args.reffirst, args.reflast))
  File "glosim/glosim.py", line 243, in main
    sii,senvii = structk(si, si, alchem, peratom, mode=kmode, fout=fii, peps=permanenteps, gamma=reggamma, zeta=zeta, xspecies=xspecies)        
  File "/home/iino/Work/glosim/libmatch/structures.py", line 292, in structk
    cost=rematch(kk, gamma, 1e-6)  # hard-coded residual error for regularized gamma
  File "/home/iino/Work/glosim/libmatch/lap/perm.py", line 46, in rematch
    raise ValueError("No Python equivalent to rematch function...")
ValueError: No Python equivalent to rematch function...

It seems the code for the REMatch kernel is not implemented yet.
Is there a problem with my environment, or is this intentional?

Parameters to reproduce the prediction on QM7b with an average kernel

Hi Dr Sandip and Professor Ceriotti,
I want to reproduce your Science Advances work on the QM7b dataset. I seem to have done some step wrong, because I always get an MAE higher than 8 kcal/mol. Could you please give me some tips about which step I got wrong?
Here is what I did.
1. I use

python glosim.py qm7.xyz -n 9 -l 9 -g 0.3 -c 3 --zeta 2 --peratom --kernel average

to generate the kernel matrix. According to the SI of your paper, the parameters should be -n 9 -l 9 -g 0.3 -c 3 --zeta 2 --periodic --nonorm, but I replaced --periodic with --peratom and deleted --nonorm because glosim.py has been updated.
2. I use ShuffleSplit from scikit-learn to get the training kernel matrix and the test kernel matrix. (I used a random split instead of choosing the training set by FPS.)

from sklearn.model_selection import ShuffleSplit

# X is the precomputed (N, N) kernel matrix, y holds the target property
rs = ShuffleSplit(n_splits=10, test_size=.20, random_state=0)  # 10 independent random splits
for train_index, test_index in rs.split(X):
    # rows select the samples; columns are always the training samples
    X_train, y_train = X[train_index][:, train_index], y[train_index]
    X_test, y_test = X[test_index][:, train_index], y[test_index]

3. I fit a Gaussian Process Regression on X_train with different regularization parameters, and use it to predict on X_test.

The lowest MAE I got was 8 kcal/mol with the average kernel (Cutoff_r = 4 A). The high MAE may partly come from the random split, but I don't think that alone explains such a high value. Could you please give me some suggestions?
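
For reference, step 3 can be sketched with plain NumPy as kernel ridge regression on a precomputed kernel, whose prediction coincides with the mean of a Gaussian process whose noise variance equals the regularization. This is a minimal sketch, not the procedure used in the paper; the helper name `krr_mae`, the synthetic kernel, the synthetic targets and the regularization values are all assumptions made for illustration.

```python
import numpy as np

def krr_mae(K, y, train, test, reg):
    """Fit kernel ridge regression on a precomputed kernel and return the test MAE."""
    K_train = K[np.ix_(train, train)]   # train x train block
    K_test = K[np.ix_(test, train)]     # test rows, train columns
    weights = np.linalg.solve(K_train + reg * np.eye(len(train)), y[train])
    pred = K_test @ weights
    return float(np.mean(np.abs(pred - y[test])))

# Synthetic stand-in for a glosim kernel: normalized vectors, dot product raised to zeta = 2.
rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 5))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
K = (feats @ feats.T) ** 2
y = K @ rng.normal(size=60)             # synthetic targets

perm = rng.permutation(60)
train, test = perm[:48], perm[48:]      # 80/20 random split

# Scan the regularization, as one would when tuning the model.
maes = {reg: krr_mae(K, y, train, test, reg) for reg in (1e-8, 1e-4, 1e-1)}
for reg, mae in maes.items():
    print(reg, mae)
```

Note that the test block uses test rows and training columns, matching the indexing in the ShuffleSplit snippet above.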

Difference between average and fastavg kernel

Not exactly an issue, but could you tell me what the difference is between the average and fastavg kernels? In which cases can I use the much faster fastavg kernel, and when should I resort to the average kernel?
Thank you! :)
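
One possible reading, stated here as an assumption and not confirmed anywhere in this thread, is that an "average" kernel averages the environment-by-environment kernel matrix, while a "fast average" first averages the environment vectors and then takes a single dot product. The NumPy sketch below only illustrates the mathematics of those two orderings with random vectors; the function names are hypothetical and nothing here is taken from glosim's code.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))   # structure A: 4 environments x 6 features (toy data)
B = rng.normal(size=(3, 6))   # structure B: 3 environments x 6 features (toy data)

def avg_of_env_kernels(A, B, zeta):
    """Mean over all environment pairs of (a . b) ** zeta."""
    return float(np.mean((A @ B.T) ** zeta))

def kernel_of_avg_vectors(A, B, zeta):
    """(mean(a) . mean(b)) ** zeta, using one averaged vector per structure."""
    return float((A.mean(axis=0) @ B.mean(axis=0)) ** zeta)

same_for_zeta1 = np.isclose(avg_of_env_kernels(A, B, 1), kernel_of_avg_vectors(A, B, 1))
diff_for_zeta2 = avg_of_env_kernels(A, B, 2) - kernel_of_avg_vectors(A, B, 2)
print(same_for_zeta1, diff_for_zeta2)
```

Under this assumption the two orderings agree exactly for zeta = 1, because the dot product is linear, and differ for zeta > 1, where the first ordering also needs every pairwise environment kernel.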

How to handle similarity/distance matrix with sketchmap ?

As far as I understand, glosim generates a similarity or distance matrix between molecules or structures, whereas sketchmap takes high-dimensional data points as input.
How can I feed the similarity or distance matrix into sketchmap?
Is there a mode, like in sklearn.manifold.MDS, that handles precomputed dissimilarities?
(sklearn.manifold.MDS can handle them with the dissimilarity='precomputed' option.)
Regards,
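
As a generic illustration of embedding from a precomputed matrix, the sketch below converts a kernel into the kernel-induced distance and embeds it with classical MDS in NumPy. It does not cover sketchmap's input format, and classical MDS is used here only as a simple stand-in; the helper names `kernel_to_distance` and `classical_mds` and the toy kernel are assumptions made for the example.

```python
import numpy as np

def kernel_to_distance(K):
    """Kernel-induced distance: d_ij^2 = K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.clip(d2, 0.0, None))   # clip tiny negative round-off

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS from a precomputed distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))

# Toy normalized kernel standing in for a similarity matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
K = X @ X.T

D = kernel_to_distance(K)
Y = classical_mds(D, n_components=2)
print(Y.shape)  # (8, 2)
```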

AttributeError: 'module' object has no attribute 'descriptors'

Hi,
Thanks for making these algorithms available.
I just downloaded quippy and installed it using this guide https://libatoms.github.io/QUIP/install.html
I checked out your code and tried to run the example, but I get the following trace:

python2 glosim.py example/mol-50.xyz
WARNING! fast hungarian library is not available 

Cannot find mcpermanent.so module in pythonpath. Permanent evaluations will be very slow and approximate.
Get it from https://github.com/sandipde/MCpermanent 
          TIME:   Wed Dec 21 17:34:03 2016
        ___  __    _____  ___  ____  __  __ 
       / __)(  )  (  _  )/ __)(_  _)(  \/  )
      ( (_-. )(__  )(_)( \__ \ _)(_  )    ( 
       \___/(____)(_____)(___/(____)(_/\/\_)
                                            
                                             
Reading input file example/mol-50.xyz
50  Configurations Read
Computing SOAPs
Traceback (most recent call last):   
  File "glosim.py", line 871, in <module>
    main(args.filename, nd=args.n, ld=args.l, coff=args.c, cotw=args.cotw, gs=args.g, mu=args.mu, centerweight=args.cw, periodic=args.periodic, usekit=args.usekit, kit=args.kit,alchemyrules=args.alchemy_rules, kmode=args.kernel, nonorm=args.nonorm, permanenteps=args.permanenteps, reggamma=args.gamma, noatom=noatom, nocenter=nocenter, envsim=args.envsim, nprocs=args.np, verbose=args.verbose, envij=envij, prefix=args.prefix, nlandmark=args.nlandmarks, printsim=args.distance,ref_xyz=args.refxyz,partialsim=args.livek,lowmem=args.lowmem,restartflag=args.restart, zeta=args.zeta, xspecies=args.separate_species, alrange=(args.first,args.last), refrange=(args.reffirst, args.reflast))
  File "glosim.py", line 189, in main
    si.parse(at, coff, cotw, nd, ld, gs, centerweight, nocenter, noatom, kit = kit)       
  File "/home/peter/kodesjov/glosim/libmatch/structures.py", line 82, in parse
    desc = quippy.descriptors.Descriptor("soap central_weight="+str(cw)+"  covariance_sigma0=0.0 atom_sigma="+str(gs)+" cutoff="+str(coff)+" cutoff_transition_width="+str(cotw)+" n_max="+str(nmax)+" l_max="+str(lmax)+' '+lspecies+' Z='+str(sp) )   
AttributeError: 'module' object has no attribute 'descriptors'

When to use unnormalized SOAPs

Hi all,

I noticed that the new flag --unsoap was added to use unnormalized SOAPs. When do you recommend using this flag?

Best regards,
Bart
