Git Product home page Git Product logo

dohlee / protstruc Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 1.0 51.46 MB

Protein structure handling in Python made easy.

Home Page: https://dohlee.github.io/protstruc/

License: MIT License

Python 1.19% HTML 98.53% CSS 0.02% JavaScript 0.26%
antibody antibody-design bioinformatics bioinformatics-tool computational-biology data-handling protein protein-design protein-structure protein-structure-analysis

protstruc's Introduction

protstruc

PyPI version Documentation Black

Protein structure handling in Python made easy.

Dealing with a batch of protein structures is a common task in this new era of artificial intelligence-based protein structure prediction, optimization, and design. This package, protstruc, provides an easy-to-use interface to handle a batch of protein structures, while specifically designed for deep learning applications based on PyTorch. protstruc provides various features including:

  • Computing geometric features of a protein structure

    • pairwise distance matrix
    • backbone dihedrals
    • backbone orientations
    • backbone translations
    • pairwise dihedrals
    • pairwise planar angles
  • Manipulation of protein structures
    • selecting specific chains
    • selecting specific residues
    • rotating the structure
    • translating the structure
    • zeroing center-of-mass
    • diffusing (adding Gaussian noise) atom 3D coordinates

There are many options to initialize the core data structure for a batch of protein structures, StructureBatch:

  • from local PDB files StructureBatch.from_pdb
  • from PDB IDs StructureBatch.from_pdb_id
  • Backbone (or even full-) atom 3D coordinates StructureBatch.from_xyz
  • Backbone orientation and translations StructureBatch.from_backbone_orientations_translations
  • Dihedral angles StructureBatch.from_dihedrals (WIP)

Installation

Simple enough, just use pip:

pip install protstruc

Examples

Loading a structure from a PDB identifier, getting atom 3D coordinates and visualizing it

This code snippet loads a structure of a renowned monoclonal antibody targeting PD-1, named Pembrolizumab, from its PDB identifier 5DK3, and visualizes its structure.

import matplotlib.pyplot as plt
from protstruc import StructureBatch

# Load a structure from a PDB identifier
struc = StructureBatch.from_pdb_id('5dk3')
# Full-atom 3D coordinates of shape (batch_size, num_residues, max_num_atoms_per_residue, 3)
# Note that max_num_atoms_per_residue is set to 15 by default.
# Also note that in this case, batch_size=1
xyz = struc.get_xyz()

# Visualize the structure
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(
    xyz[0, :, :, 0], xyz[0, :, :, 1], xyz[0, :, :, 2],
    alpha=0.1, c='C0'
)
ax.view_init(20, 20)

Selecting specific chains

You may need a selector for specific peptide chains for some reason. A StructureBatch object provides handy methods get_chain_ids and get_chain_idx to get chain IDs and the integer indices to represent the corresponding chains, respectively. The following code snippet generates boolean masks for each chain, uses them to select specific chain and colors heavy chain with bluish colors and light chain with reddish colors.

chain_idx = struc.get_chain_idx()[0]  # Note we only have one structure in the batch.
chain_ids = struc.get_chain_ids()[0]
# Let heavy chains colored in bluish colors,
# and light chains colored in reddish ones.
colors = ['C3', 'C0', 'C1', 'C9']

for idx, (chain_id, c) in enumerate(zip(chain_ids, colors)):
    mask = chain_idx == idx
    ax.scatter(
        xyz[0, mask, :, 0], xyz[0, mask, :, 1], xyz[0, mask, :, 2],
        alpha=0.1, c=c
    )
ax.view_init(-10, 20, 90)

Computing geometric features of a protein structure

Pairwise distance matrix

from protstruc.general import ATOM
struc = StructureBatch.from_pdb_id('5dk3')

# Shape (batch_size, num_residue, num_residue, num_atom, num_atom)
distmat, distmat_mask = struc.pairwise_distance_matrix()
plt.matshow(distmat[0, :, :, ATOM.CA, ATOM.CA])  # Ca-Ca distance matrix
plt.axis('off')

Diffusing 3D atom coordinates by adding Gaussian noise

We often need to add Gaussian noises to 3D atom coordinates to train Euclidean diffusion models for protein structure generation. A StructureBatch object provides a handy method diffuse_xyz to add Gaussian noises of zero mean and given variances to 3D atom coordinates.

# Diffusion parameters
T = 300
sched = cosine_variance_schedule(T)
prt_idx = 0

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
struc = StructureBatch.from_pdb_id('5dk3')
# Center atom coordinates at the origin and scale to unit variance
# NOTE: All atoms are considered here
struc.standardize()

ims = []
for t in range(T):
    xyz = struc.get_xyz()
    im1 = ax.scatter(
        xyz[prt_idx, :, ATOM.CA, 0],
        xyz[prt_idx, :, ATOM.CA, 1],
        xyz[prt_idx, :, ATOM.CA, 2],
        c='C1'
    )
    ax.view_init(-10, 20, 90)
    ims.append([im1])

    # Add Gaussian noise to the structure
    struc.diffuse_xyz(beta=torch.tensor([sched['beta'][t]]))

ani = animation.ArtistAnimation(fig, ims, interval=100, blit=True, repeat_delay=1000)
ani.save(f'animations/pembrolizumab_diffusion.gif')

Testing

To test the integrity of the package, run the following command:

pytest

protstruc's People

Contributors

dohlee avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Forkers

chemphy

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.