This repository provides PyConforMap, an easy-to-use Python module that generates scatter plots of the instantaneous shape ratio (Rs) against the relative radius of gyration (Rg/Rgmean).
PLEASE READ ALL DOCUMENTATION
There are two main metrics: the relative radius of gyration (Rg/Rgmean) and the instantaneous shape ratio (Rs). Rs is computed as Rs = Ree^2/Rg^2, where Ree and Rg are the (instantaneous) end-to-end distance and the (instantaneous) radius of gyration, respectively.
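
The two metrics can be computed per snapshot along the following lines. This is a minimal sketch rather than the module's own code: the function name `conformation_coordinates` is hypothetical, and it assumes that Rgmean is the arithmetic mean of Rg over all snapshots.

```python
import numpy as np


def conformation_coordinates(rg2, ree2):
    """Return (Rg/Rgmean, Rs) for each snapshot from squared Rg and Ree values."""
    rg2 = np.asarray(rg2, dtype=float)
    ree2 = np.asarray(ree2, dtype=float)
    rg = np.sqrt(rg2)
    # Assumption: Rgmean is the arithmetic mean of Rg over all snapshots.
    rel_rg = rg / rg.mean()
    # Instantaneous shape ratio: Rs = Ree^2 / Rg^2.
    rs = ree2 / rg2
    return rel_rg, rs


# Three made-up snapshots, used only to show the call.
rel_rg, rs = conformation_coordinates([4.0, 9.0, 16.0], [8.0, 54.0, 96.0])
```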
Rg/Rgmean is a measure of the (relative) size of a protein or polymer chain, and Rs is a measure of its shape. Rs is expected to be low (~2 or lower) for compact structures and high (~12 or higher) for highly extended structures. A single Rg/Rgmean value together with its corresponding Rs value is how we define an instantaneous conformation of a polymer. When all the Rg/Rgmean and Rs values of a polymer are plotted together, they constitute what we call a 2D map of the conformational landscape of that polymer.
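
A 2D map of this kind can be drawn with matplotlib roughly as follows. The helper `plot_conformational_map` is a hypothetical name for illustration, not a function of the module, and the points are random placeholder values rather than simulation data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_conformational_map(rel_rg, rs, label, ax=None):
    """Scatter Rs against Rg/Rgmean; each point is one conformation snapshot."""
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(rel_rg, rs, s=4, alpha=0.5, label=label)
    ax.set_xlabel("Rg/Rgmean")
    ax.set_ylabel("Rs")
    ax.legend()
    return ax


# Random placeholder points, not real simulation output.
rng = np.random.default_rng(0)
ax = plot_conformational_map(
    rng.uniform(0.5, 1.5, 200), rng.uniform(1.0, 12.0, 200), "placeholder data"
)
```

Passing the same `ax` to a second call overlays another data set, which is one way to draw a protein/polymer on top of a reference map.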
This module generates 2D scatter plots of Rs against Rg/Rgmean for two data sets: a protein/polymer simulation (data and protein label/identity provided by the user) and a Gaussian Walk (GW) polymer model simulation (data for 720000 snapshots of a GW model of length 100 are included with the repository as 'GW_chainlen100.csv'). Each point on the scatter plot, whether it belongs to the GW or to the given protein/polymer, represents one conformation snapshot and has coordinates (Rg/Rgmean, Rs). The GW model is intended as a reference model: its conformational landscape map, i.e. the set of all its (Rg/Rgmean, Rs) points, serves as a 'universal' or reference map for those of other proteins/polymers. The Python module can additionally be used to run a new GW simulation with a different chain length and number of snapshots, should the user wish to do so.
From the 2D scatter plot, the module automatically calculates fC, the fraction of the GW points that are 'close' (i.e. within a pre-defined radius) to at least one protein/polymer point. fC represents the conformational diversity of the protein/polymer provided, and can be used to rank the conformational diversities of different proteins/polymers.
On the scatter plot, it is important that the protein/polymer points do not significantly exceed the boundaries defined by the reference (GW) points. Most of the protein/polymer points should be 'close' (i.e. within a pre-defined radius) to at least one GW point.
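
The GW reference and the fC calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the function names are hypothetical, the chain length is taken to be the number of beads, bond vectors are drawn with standard-normal components, Rgmean is taken as the arithmetic mean of Rg, distances are plain Euclidean distances in the (Rg/Rgmean, Rs) plane, and the radius of 0.1 is an arbitrary placeholder rather than the module's pre-defined value.

```python
import numpy as np
from scipy.spatial import cKDTree


def simulate_gw(chain_length, n_snapshots, seed=0):
    """Sample independent Gaussian-walk chains; return (Rg^2, Ree^2) per snapshot."""
    rng = np.random.default_rng(seed)
    # Assumption: each bond vector has independent standard-normal x, y, z components.
    bonds = rng.normal(size=(n_snapshots, chain_length - 1, 3))
    origin = np.zeros((n_snapshots, 1, 3))
    beads = np.concatenate([origin, np.cumsum(bonds, axis=1)], axis=1)
    # Rg^2: mean squared distance of the beads from their centroid.
    centered = beads - beads.mean(axis=1, keepdims=True)
    rg2 = (centered ** 2).sum(axis=2).mean(axis=1)
    # Ree^2: squared distance between the first and last bead.
    ree2 = ((beads[:, -1] - beads[:, 0]) ** 2).sum(axis=1)
    return rg2, ree2


def to_map_points(rg2, ree2):
    """Convert squared values into (Rg/Rgmean, Rs) map coordinates."""
    rg = np.sqrt(rg2)
    return np.column_stack([rg / rg.mean(), ree2 / rg2])


def fraction_covered(gw_points, chain_points, radius):
    """Fraction of GW points within `radius` of at least one chain point."""
    tree = cKDTree(chain_points)
    nearest_dist, _ = tree.query(gw_points, k=1)
    return float(np.mean(nearest_dist <= radius))


gw_points = to_map_points(*simulate_gw(chain_length=100, n_snapshots=2000))
# Stand-in for a protein/polymer: every tenth GW point (illustration only).
subset = gw_points[::10]
f_c = fraction_covered(gw_points, subset, radius=0.1)  # radius is a placeholder
```

A k-d tree keeps the nearest-neighbour search manageable when the reference set is large, such as the 720000 GW snapshots shipped with the repository.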
The required input is a CSV file (for a given protein/polymer simulation) with 2 columns. The first column contains Rg2 values (the squared radius of gyration) and the second column contains Ree2 values (the squared end-to-end distance). In this user-provided file, each row represents one protein/polymer conformation snapshot from the simulation. An example input is the 'example_protein.csv' file (included with the repository).
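
A file in this two-column layout could be read with pandas as sketched below. The helper `load_snapshots` and the header names in the inline demo are hypothetical, and whether the real files carry a header row is an assumption, so the `has_header` flag is left to the caller.

```python
import io

import pandas as pd


def load_snapshots(csv_source, has_header=True):
    """Read a two-column CSV: column 1 = Rg^2, column 2 = Ree^2, one row per snapshot."""
    # Assumption: a header row may or may not be present; the caller decides.
    header = 0 if has_header else None
    df = pd.read_csv(csv_source, header=header)
    # Select by position so the result does not depend on column names.
    rg2 = df.iloc[:, 0].to_numpy(dtype=float)
    ree2 = df.iloc[:, 1].to_numpy(dtype=float)
    return rg2, ree2


# In-memory stand-in for a user file; the values are made up.
demo = io.StringIO("Rg2,Ree2\n4.0,8.0\n9.0,54.0\n")
rg2, ree2 = load_snapshots(demo)
```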
The 'code_input_output.md' file provides technical details (input arguments, expected outputs) of the module. The 'pyconformap.py' file contains the source code for the module. The 'illustrated_example.ipynb' Jupyter notebook shows examples that illustrate how to use the code. The 'GW_chainlen100.csv' file contains the reference GW simulation and 'example_protein.csv' contains the simulation of an example protein.
The module requires the pandas, numpy, matplotlib, scipy and more_itertools Python packages, as well as the itertools and random modules, which are part of the Python standard library. They are imported automatically when the 'pyconformap.py' file is executed, as shown in the illustrated examples.
PyConforMap is a companion to a paper that is under review for publication, as of 15 Feb, 2024.
If you use this module, please cite us using the provided DOI.
If you have comments, suggestions or a bug report, please feel free to email me at [email protected], or contact me through the social media links provided on the home page.