
codon-conservation-rate

Algorithm for the analysis of codon/bicodon conservation rates across linked species

This project facilitates calculating codon and bicodon conservation rates for a given genus.

1. About the project

This project started in 2021 as part of my engineering degree thesis, but for personal reasons I changed my career path and the project was abandoned. I think others could benefit from this work, so I'm now releasing it as an open-source bioinformatics project licensed under GPL v3.0; everyone is welcome to participate. The inspiration for this project came from this paper, which I have tried to (partially) replicate using Drosophila alignments from FlyDIVaS. Feel free to make as many additions as you'd like.

2. How does it work?

This repository contains a Python script called main.py. It takes as input one or more files of previously aligned homologous genes in FASTA format (multiple sequence alignments, or MSAs). After parsing each file, it builds a NumPy matrix of the aligned sequences and iterates over slices of that matrix. The results (the number of times each codon occurs in the reference sequence, and the number of times that codon was conserved across species) are stored in CSV files created with pandas. For each MSA file, two types of dataframes are generated: one for codons and one for codon pairs (bicodons). Because conservation rates are calculated in all three reading frames, each MSA file yields a total of six CSV files (two per reading frame).
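The per-file analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in main.py: the function names (`parse_fasta`, `to_matrix`, `conservation_table`, `all_tables`) are hypothetical; the first sequence in the MSA is assumed to be the reference; a reference codon is assumed to count as conserved only when every other sequence carries the identical codon at the same alignment columns; reference windows containing a gap (`-`) are skipped; and a bicodon is taken to be two adjacent codons in the same reading frame.

```python
from collections import Counter

import numpy as np
import pandas as pd


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_matrix(records):
    """Stack aligned sequences into a 2-D character array (one row per species)."""
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("all sequences in an MSA must have the same length")
    return np.array([list(seq) for _, seq in records])


def conservation_table(matrix, frame, width):
    """Count reference windows (width=3: codons, width=6: bicodons) in one reading
    frame, and how often each window is identical in every other row.

    Assumptions (hypothetical, not taken from main.py): row 0 is the reference,
    and reference windows containing a gap are skipped.
    """
    counts, conserved = Counter(), Counter()
    # Windows start every 3 columns from the frame offset, so bicodons overlap by one codon.
    for start in range(frame, matrix.shape[1] - width + 1, 3):
        window = matrix[:, start:start + width]
        ref = "".join(window[0])
        if "-" in ref:
            continue
        counts[ref] += 1
        # Broadcasting compares every row against the reference row.
        if (window == window[0]).all():
            conserved[ref] += 1
    keys = sorted(counts)
    table = pd.DataFrame(
        {"count": [counts[k] for k in keys], "conserved": [conserved[k] for k in keys]},
        index=pd.Index(keys, name="codon" if width == 3 else "bicodon"),
    )
    table["rate"] = table["conserved"] / table["count"]
    return table


def all_tables(matrix):
    """Build the six tables for one MSA: codons and bicodons in reading frames 0, 1 and 2."""
    return {
        (frame, kind): conservation_table(matrix, frame, width)
        for frame in range(3)
        for kind, width in (("codon", 3), ("bicodon", 6))
    }


if __name__ == "__main__":
    msa = ">ref\nATGGCTTAA\n>sp2\nATGGCCTAA\n>sp3\nATGGCTTAA\n"
    tables = all_tables(to_matrix(parse_fasta(msa)))
    print(tables[(0, "codon")])
```

In this toy alignment, frame 0 yields the codons ATG, GCT and TAA with rates 1.0, 0.0 and 1.0, since `sp2` carries GCC instead of GCT. Each table could then be written with `DataFrame.to_csv`, which produces the six CSV files per MSA.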

3. Running the algorithm

Copy the sample MSA files into the directory where main.py is located, then run the script from the terminal:

$ python3 main.py

4. Dependencies

To run this script, you need Python and its package manager, pip. On Ubuntu, you can install both from the terminal:

$ sudo apt-get update
$ sudo apt-get install python3 python3-pip

Once Python and pip are installed, install NumPy and pandas:

$ pip3 install numpy pandas

5. Additional details

main.py must be edited to point to the directory that contains the MSA files. After parsing the files and calculating conservation rates, the script also generates a file called unreadable.txt, which lists the names of the MSA files that could not be parsed. It then assembles all the individual dataframes into six combined dataframes containing the information for all linked species. After the analysis is complete, the intermediate files that are no longer needed are deleted using delete_files.bat or delete_files.sh, depending on your operating system.
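The aggregation step can also be sketched with pandas. Again, this is a sketch under assumptions rather than the project's code: `combine_tables` and `split_readable` are hypothetical helpers, and each per-file table is assumed to have a codon (or bicodon) index with `count` and `conserved` columns. Counts are summed across files before the rate is recomputed, so each MSA is weighted by the number of codons it contributes instead of averaging per-file rates.

```python
from pathlib import Path

import pandas as pd


def combine_tables(tables):
    """Merge per-MSA tables (index: codon or bicodon; columns: count, conserved)
    into one table for the whole genus. Expects at least one table."""
    combined = pd.concat(tables)[["count", "conserved"]].groupby(level=0).sum()
    # Recompute the rate from the summed counts rather than averaging per-file rates.
    combined["rate"] = combined["conserved"] / combined["count"]
    return combined


def split_readable(paths, parse):
    """Apply `parse` to each path and return (parsed results, names of files that failed)."""
    results, failed = [], []
    for path in paths:
        try:
            results.append(parse(path))
        except (OSError, UnicodeDecodeError, ValueError):
            failed.append(Path(path).name)
    return results, failed
```

The names collected in `failed` could then be stored with `Path("unreadable.txt").write_text("\n".join(failed))`, one file name per line.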

6. How can I contribute?

You can check the backlog for pending tasks. You can also propose changes to the code by opening pull requests, which will be reviewed before merging.

Contributors

fx-biocoder
