jordivillafreixa / hras-activation-classifier

This project forked from jorgeandomics/hras-activation-classifier

HRAS Activation Classifier is part of the MFP of the Omics Data Analysis master's degree at the University of VIC. It contains the pipeline needed to generate a Random Forest classifier that predicts activation status in enzymes, along with the results and output files of the Human HRAS analysis and model generation.

License: MIT License

Python 0.18% Jupyter Notebook 99.82% Nextflow 0.01%

HRAS Activation Classifier

HRAS Activation Classifier is a complete pipeline for generating a Random Forest model that classifies a given protein structure as active or inactive, based on homology with similar structures available in databases. It was first used to generate a classifier for Human HRAS, hence the name. It is part of the MFP of Omics Data Analysis at the University of VIC.

Table of Contents

  • Installation
  • Usage
  • Usage Example
  • Contributing
  • License

Installation

To use this pipeline, you need Python 3.10 and the latest version of Nextflow (https://github.com/nextflow-io/nextflow).
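
A quick way to confirm both prerequisites before continuing is a small check like the one below. It is only a sketch: the function name `check_prerequisites` is hypothetical and not part of the repository, and it assumes the Nextflow launcher is reachable on your PATH as `nextflow`.

```python
import shutil
import sys


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper, not part of the repository.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    # The README asks for Python 3.10 specifically.
    if tuple(version_info[:2]) != (3, 10):
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.10 is required, found {found}")
    # Assumption: the Nextflow launcher is installed as `nextflow` on PATH.
    if which("nextflow") is None:
        problems.append("nextflow executable not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the current PATH; the two parameters exist only so the check can be exercised without changing the system.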

Once you have Python installed, please download the repository with:

git clone https://github.com/JorgeAndOmics/HRAS-Activation-Classifier

You also need to install the WCN Standalone Library from Martin Floor's repository [Martin-Floor/WCN (github.com)]. Please follow the installation instructions in that repository, but do not create a custom environment just yet.

After this, acquire a license for the PyRosetta library [PyRosetta - Licensing PyRosetta], then install PyRosetta as advised there.

Then modify the .yaml file in the Pipeline folder with your PyRosetta credentials and use the following command, run from the folder that contains the file, to create a custom environment. Conda is advised for environment management:

conda env create -f hras_classifier.yaml

Activate the environment with:

conda activate hras_classifier

After this, change into the pipeline directory:

cd HRAS-Activation-Classifier
cd Pipeline

Usage

The HRAS Activation Classifier pipeline takes the following command line arguments:

Parameter                    Description
--uniprot_code               The Uniprot identifier of the desired protein
--identity_threshold         The identity selection threshold for the retrieved structures
--query_coverage_threshold   The query coverage selection threshold for the retrieved structures
--active_ligands             A comma-separated list of ligands that unequivocally identify a protein as active
--inactive_ligands           A comma-separated list of ligands that unequivocally identify a protein as inactive
--gap_open_penalty           Gap open penalty for the pairwise alignment
--gap_extend_penalty         Gap extend penalty for the pairwise alignment
--seed                       The seed for the pseudo-random processes
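
To make the selection parameters concrete, the sketch below shows one way the thresholds and ligand lists could be applied to a set of retrieved structures. It is an illustration, not the pipeline's actual code: the `Hit` record, all function names, and the reading of both thresholds as percentages are assumptions, and the ligand codes and structure identifiers in the usage lines are made-up example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one retrieved structure."""
    structure_id: str
    identity: float        # assumed to be a percentage
    query_coverage: float  # assumed to be a percentage
    ligands: frozenset     # names of the ligands bound in the structure


def parse_ligands(raw: str) -> frozenset:
    """Split a comma-separated command line value into a set of names."""
    return frozenset(name.strip() for name in raw.split(",") if name.strip())


def passes_thresholds(hit, identity_threshold, query_coverage_threshold):
    """Keep a hit only if it meets both selection thresholds."""
    return (hit.identity >= identity_threshold
            and hit.query_coverage >= query_coverage_threshold)


def label_by_ligands(hit, active_ligands, inactive_ligands) -> Optional[str]:
    """Return 'active', 'inactive', or None when the ligands do not decide."""
    has_active = bool(hit.ligands & active_ligands)
    has_inactive = bool(hit.ligands & inactive_ligands)
    if has_active == has_inactive:
        # Either no listed ligand is bound or both kinds are: ambiguous.
        return None
    return "active" if has_active else "inactive"


# Example values only; none of these come from the pipeline's output.
active = parse_ligands("GNP,GTP")
inactive = parse_ligands("GDP")
hits = [
    Hit("hit_a", 100.0, 95.0, frozenset({"GNP", "MG"})),
    Hit("hit_b", 98.0, 90.0, frozenset({"GDP", "MG"})),
    Hit("hit_c", 40.0, 90.0, frozenset({"GDP"})),
]
selected = [h for h in hits if passes_thresholds(h, 90, 80)]
labels = {h.structure_id: label_by_ligands(h, active, inactive) for h in selected}
```

Returning `None` for structures that bind both kinds of ligand, or neither, keeps ambiguous cases out of the labelled set instead of guessing a class for them.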

To run the pipeline, open a terminal or command prompt, navigate to the Pipeline folder inside the downloaded HRAS-Activation-Classifier directory, and enter the following command:

nextflow protein_activity_classifier.nf \
  --uniprot_code <UNIPROT_CODE> \
  --identity_threshold <IDENTITY_THRESHOLD> \
  --query_coverage_threshold <QUERY_COVERAGE_THRESHOLD> \
  --active_ligands <ACTIVE_LIGANDS> \
  --inactive_ligands <INACTIVE_LIGANDS> \
  --gap_open_penalty <GAP_OPEN_PENALTY> \
  --gap_extend_penalty <GAP_EXTEND_PENALTY> \
  --seed <SEED>

Replace each placeholder with the appropriate value for your analysis.
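
When the same analysis has to be launched for several proteins, it can help to assemble the command programmatically. The sketch below builds the argument list shown above from a dictionary. `build_command` and `run_pipeline` are hypothetical helpers, not part of the repository; they mirror the invocation exactly as written in this README, and treating every parameter as required is an assumption.

```python
import subprocess

SCRIPT = "protein_activity_classifier.nf"

# Parameter names taken from the table in this README.
PARAMETERS = (
    "uniprot_code",
    "identity_threshold",
    "query_coverage_threshold",
    "active_ligands",
    "inactive_ligands",
    "gap_open_penalty",
    "gap_extend_penalty",
    "seed",
)


def build_command(values):
    """Return the argument list for one pipeline run.

    Raises ValueError if a parameter is missing or unknown
    (assumption: all eight parameters must be supplied).
    """
    missing = [name for name in PARAMETERS if name not in values]
    unknown = [name for name in values if name not in PARAMETERS]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    command = ["nextflow", SCRIPT]
    for name in PARAMETERS:
        command += [f"--{name}", str(values[name])]
    return command


def run_pipeline(values):
    """Launch the pipeline from the current directory and wait for it."""
    # Passing a list (no shell) avoids quoting problems with ligand names.
    return subprocess.run(build_command(values), check=True)
```

Because the command is passed as a list and not as a shell string, values containing characters that a shell would interpret are handed to Nextflow unchanged.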

Usage Example

The pipeline was created for the study of Human HRAS (Uniprot code P01112), but it can be used with any enzyme. Suppose you want to generate a model for citrate synthase (Uniprot code P09948):

nextflow protein_activity_classifier.nf \
  --uniprot_code P09948 \
  --identity_threshold 300 \
  --query_coverage_threshold 50 \
  --active_ligands acetyl-CoA,oxaloacetate \
  --inactive_ligands 'CoA,H>' \
  --gap_open_penalty -1 \
  --gap_extend_penalty -1 \
  --seed 123

The pipeline then downloads the required files and extracts the data from them. For a protein about 200 amino acids long, on a single 6-core processor with hyperthreading, retrieving and preprocessing the data took about 1 hour on average and training the model took about 2 days.

Contributing

If you would like to contribute to the HRAS Activation Classifier script, please fork the repository, make your changes, and submit a pull request.

License

This project is licensed under the terms of the MIT license. See the LICENSE file for more information.
