
canonizedrmsd's Introduction

A Bifunctional Canonization for Efficient Minimal Root-mean-square Deviation (RMSD) Calculation and CIP Stereochemistry Identification

This repository provides a tool that canonizes molecules and assigns CIP stereochemistry tags to their atoms. The branching tiebreaking step of the algorithm is used to calculate the minimal RMSD between two molecules.
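
To illustrate why tie handling matters for a minimal RMSD, here is a minimal sketch in plain Python. It is not the repository's algorithm: it brute-forces every permutation within groups of equivalent atoms and applies no alignment, and the names `rmsd`, `minimal_rmsd`, and `equivalent_groups` are hypothetical.

```python
import itertools
import math


def rmsd(coords_a, coords_b, mapping):
    """RMSD without alignment; mapping[i] is the atom of B matched to atom i of A."""
    total = 0.0
    for i, j in enumerate(mapping):
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(mapping))


def minimal_rmsd(coords_a, coords_b, equivalent_groups):
    """Try every permutation inside each group of equivalent atoms (brute force)."""
    n = len(coords_a)
    best, best_mapping = math.inf, None
    per_group = [list(itertools.permutations(g)) for g in equivalent_groups]
    for choice in itertools.product(*per_group):
        mapping = list(range(n))  # atoms outside any group keep their own index
        for group, perm in zip(equivalent_groups, choice):
            for src, dst in zip(group, perm):
                mapping[src] = dst
        value = rmsd(coords_a, coords_b, mapping)
        if value < best:
            best, best_mapping = value, mapping
    return best, best_mapping


# Toy example: atoms 1 and 2 are equivalent and listed in swapped order in B.
coords_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coords_b = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]

naive = rmsd(coords_a, coords_b, [0, 1, 2])            # index-by-index comparison
best, mapping = minimal_rmsd(coords_a, coords_b, [[1, 2]])
print(naive, best, mapping)
```

With the index-by-index mapping the two identical structures appear to differ, while the mapping that swaps the two equivalent atoms gives an RMSD of zero. The brute-force search grows factorially with group size, which is why a dedicated tiebreaking strategy is needed for real molecules.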

Usage notes

Usage

Canonize.py [-h] [-o OUTPUT] file

to obtain canonized indices and stereochemical tags for atoms in a molecule.

CanonizedRMSD.py [-h] [-s] [-m] [-i] [-a] [-q] [-r] [-at] file1 file2

to calculate the RMSD of two molecules after canonizing them.

Supported file types

.mol | .sdf | .rxn | .mol2 | .ml2 | .pdb

Optional arguments:

Canonize.py file
    -o OUTPUT, --output OUTPUT    output file name. When not specified, the output will be printed to the screen.

CanonizedRMSD.py file1 file2

    -s, --save save intermediate results

    -m, --mapping output the atom mapping between the two molecules

    -i, --ignore_isomerism ignore geometric and stereometric isomerism when canonizing

    -a, --no_alignment do not align the molecules with the Kabsch or QCP algorithm when calculating RMSD

    -q, --use QCP algorithm instead of Kabsch algorithm

    -r, --remove H atoms

    -at, --arbitrary_tiebreaking  apply arbitrary tiebreaking, i.e. skip the branching tiebreaking
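
As a rough illustration of how the flags listed above combine, the sketch below mirrors them with `argparse`. It is an assumption about the interface, not the script's actual parser: the `dest` names `use_qcp` and `remove_h`, the function `build_parser`, and the input file names are all hypothetical, and only the short forms `-q` and `-r` are declared because their long forms are not legible above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented CanonizedRMSD.py flags."""
    parser = argparse.ArgumentParser(prog="CanonizedRMSD.py")
    parser.add_argument("file1")
    parser.add_argument("file2")
    parser.add_argument("-s", "--save", action="store_true",
                        help="save intermediate results")
    parser.add_argument("-m", "--mapping", action="store_true",
                        help="output the atom mapping")
    parser.add_argument("-i", "--ignore_isomerism", action="store_true",
                        help="ignore isomerism when canonizing")
    parser.add_argument("-a", "--no_alignment", action="store_true",
                        help="skip Kabsch/QCP alignment")
    # Long option names for -q and -r are not given here, so only short forms are used.
    parser.add_argument("-q", dest="use_qcp", action="store_true",
                        help="use QCP instead of Kabsch")
    parser.add_argument("-r", dest="remove_h", action="store_true",
                        help="remove H atoms")
    parser.add_argument("-at", "--arbitrary_tiebreaking", action="store_true",
                        help="skip branching tiebreaking")
    return parser


# Hypothetical invocation: print the atom mapping and use QCP for alignment.
args = build_parser().parse_args(["-m", "-q", "ligand_ref.sdf", "ligand_pose.sdf"])
print(args.mapping, args.use_qcp, args.no_alignment, args.file1, args.file2)
```

On the command line the same call would read `CanonizedRMSD.py -m -q ligand_ref.sdf ligand_pose.sdf`, with the file names again being placeholders.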

canonizedrmsd's People

Contributors

jerryjohnsonlee, palbhhq


canonizedrmsd's Issues

Mis-represented spyrmsd performance?

Hello. I've just seen your pre-print about this work, and the comparison with spyrmsd performance caught my eye (I'm the author of spyrmsd). Could you please clarify which backend you used? I was unable to find it in your manuscript.

In our original published work, spyrmsd could use either the networkx or graph-tool backends. The former is the default (since it's widely available), but it's known to be slow.

Since version 0.7.0 of spyrmsd, released on 5 April 2024, rustworkx is also supported, which is both fast and widely available.

rustworkx or graph-tool are the backends to be used for a fair comparison and benchmarks.


These are the timings I get running with the rustworkx and graph-tool backends for the molecules in testsets (Table 1 in your manuscript) on an Apple M1 Pro:

System  spyrmsd [rustworkx] (s)  spyrmsd [graph-tool] (s)  CanonizedRMSD.py (s)
a       0.309                    0.266                     0.230
b       0.305                    0.257                     0.260
c       0.307                    0.250                     0.402
d       *                        *                         *
e       6.617                    1.698                     307.3

These are much faster than the ones you reported in your manuscript. For a fair comparison, I also ran the test cases on the Apple M1 Pro with CanonizedRMSD.py (in the same Python environment used for the spyrmsd timings).

Therefore, it seems that with the high-performance backends (rustworkx and graph-tool), spyrmsd is much faster than claimed in your pre-print, and for system e it is significantly faster than CanonizedRMSD.py (2 or 7 seconds instead of 5 minutes).
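
As a quick check of the ratios implied by the timings above, here is a minimal sketch in Python. The values are copied from the table; the function name `relative_timings` is hypothetical, and system d is omitted because it could not be run.

```python
# Timings in seconds: (spyrmsd [rustworkx], spyrmsd [graph-tool], CanonizedRMSD.py)
timings = {
    "a": (0.309, 0.266, 0.230),
    "b": (0.305, 0.257, 0.260),
    "c": (0.307, 0.250, 0.402),
    "e": (6.617, 1.698, 307.3),
}


def relative_timings(table):
    """Return CanonizedRMSD.py time divided by each spyrmsd backend time."""
    ratios = {}
    for system, (rustworkx_s, graph_tool_s, canonized_s) in table.items():
        ratios[system] = (canonized_s / rustworkx_s, canonized_s / graph_tool_s)
    return ratios


ratios = relative_timings(timings)
for system, (vs_rustworkx, vs_graph_tool) in ratios.items():
    print(f"{system}: {vs_rustworkx:.1f}x rustworkx time, {vs_graph_tool:.1f}x graph-tool time")
```

For system e this gives roughly 46 times the rustworkx time and 181 times the graph-tool time, while for the small systems a to c the three timings are within a factor of about two of each other.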

I'm not sure why the timing for e is not reported for spyrmsd in your Table 1.


I was unable to run test d with spyrmsd because I hit the following:

[21:43:36] Explicit valence for atom # 0 N, 4, is greater than permitted
[21:43:36] ERROR: Could not sanitize molecule ending on line 446
[21:43:36] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted
[21:43:36] Explicit valence for atom # 0 N, 4, is greater than permitted
[21:43:36] ERROR: Could not sanitize molecule ending on line 446
[21:43:36] ERROR: Explicit valence for atom # 0 N, 4, is greater than permitted

Given the size of the molecule (217 atoms) I haven't checked where this is coming from, but the error would indicate an issue with your input file.

Interestingly, I also get the same exception using CanonizedRMSD.py:

rdkit.Chem.rdchem.AtomValenceException: Explicit valence for atom # 0 N, 4, is greater than permitted

Therefore, I'm not entirely sure whether the molecule d available in the repository is the same one used for the benchmarks.


If you only used the networkx backend, I would kindly ask you to update your pre-print with the correct timings for spyrmsd using the graph-tool and/or rustworkx backends, especially Table 1, Figure 7, and Figure 8.

I'd be happy to provide any assistance, and I'll make rustworkx the default backend in future releases, given the potential for misrepresentation of performance when using the current default, networkx.
