Intensiphy

Intensiphy is a wrapper that expands the functionality of the Extensiphy pipeline. With Intensiphy, you specify the organism you wish to add to your starting phylogeny, and raw genomic data for that organism are found in the NCBI Sequence Read Archive (SRA). Those sequences are assembled with Extensiphy and added to your phylogeny automatically. If your query organism is the subject of broad research that you don't want included wholesale in your phylogeny, you can instead input a csv file of the SRA IDs you specifically want to include, and Intensiphy will do the rest!

Setup and Use

Intensiphy Controls and Flags

Output Files

Phylogenetic Estimation

Dependencies

Reporting Problems

Setup and Use

Mamba

The simplest and most hassle-free way to install Intensiphy's dependencies is with Mamba. The Mamba Installation section of this repository walks through this process in more detail.

Advanced

If you're comfortable installing programs by hand, the Advanced Installation Methods section is for you. These methods have been tested mainly on Linux (Ubuntu) operating systems.

Intensiphy Tutorial

We recommend you run through the Intensiphy tutorial for a more in-depth walkthrough of Intensiphy's features. The tutorial will walk through how to run Intensiphy using different data types and options. You can copy code snippets into your terminal window.

Additional Tutorials

To help explain some of the jargon (technical words and terms) that goes along with bioinformatics programs, we've written some tutorials, packaged with our Extensiphy program.

  • The command line tutorial will help you get a grasp on how to find files in your computer using the shell/terminal/command line (you'll be a hacker in no time!).
  • The suffix tutorial will help clarify the read suffix arguments.

Quick test run

If you have followed one of the install approaches above, you are now ready to try a test run!
We'll use the ip_combo.fas alignment file as our starting alignment. ip_combo.fas can be found in:

/intensiphy/testdata/ip_combo.fas

Now from the directory where you installed Intensiphy, run:

./intensiphy.py --align_file ./testdata/ip_combo.fas --accession_method USER_INPUT --accs_file ./testdata/sra.csv

This is a simple run, which downloads and assembles the sequences listed in the sra.csv file.

  • The --align_file flag provides the path to the existing alignment to update.
  • The --accession_method flag specifies how accession numbers are provided to, or collected by, the program. Here we use USER_INPUT because we have already collected and curated a file of NCBI SRA numbers.
  • The --accs_file flag provides the path to the accession numbers file. The file must be in .csv format.
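Before starting a long download, it can help to sanity-check a curated accession file. The following is a minimal sketch, not part of Intensiphy: the function name `read_accessions` is hypothetical, and it assumes that every non-empty cell of the .csv holds one SRA run accession (a prefix of SRR, ERR, or DRR followed by digits). The exact layout Intensiphy expects is not described here, so treat the parsing as an assumption.

```python
import csv
import re
from pathlib import Path

# SRA run accessions start with SRR, ERR, or DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def read_accessions(csv_path):
    """Return (valid, rejected) lists of values found in a .csv file.

    Assumption: each non-empty cell holds exactly one accession.
    """
    valid, rejected = [], []
    with Path(csv_path).open(newline="") as handle:
        for row in csv.reader(handle):
            for cell in row:
                value = cell.strip()
                if not value:
                    continue  # skip empty cells and blank lines
                if RUN_ACCESSION.match(value):
                    valid.append(value)
                else:
                    rejected.append(value)
    return valid, rejected
```

Calling `read_accessions("./testdata/sra.csv")` and inspecting the second list should show any entries that do not look like run accessions before they are passed to --accs_file.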

Once Intensiphy has finished running on the test data, you should see lines saying:

Assembled sequences are found in /project/Intensiphy/ip_output/sequence_storage/

Number of sequences assembled during this run: 5

Number of sequences in the current sequence library: 10

  • If you did not get this message, check the output log ip_dev_log.txt to learn more about the issue before proceeding.

We just constructed a sequence library using our original alignment and added 5 new sequences to the library.
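If you run Intensiphy from a script, the closing message can be checked automatically. This is a small sketch under the assumption that the two count lines appear verbatim, as quoted above, in whatever text you captured from the run; the function and key names are hypothetical and not part of Intensiphy.

```python
import re

# Patterns copied from the closing message quoted in this README.
SUMMARY_PATTERNS = {
    "assembled_this_run": re.compile(
        r"Number of sequences assembled during this run:\s*(\d+)"
    ),
    "library_size": re.compile(
        r"Number of sequences in the current sequence library:\s*(\d+)"
    ),
}


def parse_run_summary(text):
    """Extract the sequence counts from captured Intensiphy output.

    A key is absent from the result when its line was not found.
    """
    counts = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            counts[key] = int(match.group(1))
    return counts


example = (
    "Number of sequences assembled during this run: 5\n"
    "Number of sequences in the current sequence library: 10\n"
)
print(parse_run_summary(example))
```

An empty result means neither line was found, which is the cue to open ip_dev_log.txt.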

Intensiphy Controls and Flags:

Required flags

- (--align_file) alignment in FASTA format.

Either

- (--organism) the name or taxon ID of the organism you wish to collect sequences for from the NCBI SRA (Example: Neisseria gonorrhoeae[Organism] or txid482).

Or

- (--accession_method) Dictates how accession numbers are collected or input. (OPTIONS: USER_INPUT, AUTO_DL) (DEFAULT: AUTO_DL)
- (--accs_file) accession file. Used with the [--accession_method USER_INPUT] flag to pass in a curated file of SRA numbers.
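To preview what an --organism value might match, the same search term can be sent to NCBI's public E-utilities esearch endpoint. The sketch below only builds the request URL and makes no network call. It illustrates the search term and does not describe how Intensiphy performs its own lookup; the helper name `sra_search_url` and the `retmax` default are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(term, retmax=20):
    """Build an esearch URL that queries the SRA database for `term`."""
    params = {
        "db": "sra",          # search the Sequence Read Archive
        "term": term,         # e.g. "Neisseria gonorrhoeae[Organism]"
        "retmax": retmax,     # maximum number of record IDs to return
        "retmode": "json",    # ask for a JSON response
    }
    return f"{ESEARCH}?{urlencode(params)}"


print(sra_search_url("Neisseria gonorrhoeae[Organism]"))
```

Opening the printed URL in a browser shows how many SRA records the term matches, which is a quick way to judge whether a curated --accs_file would be the better route.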

Optional flags

- (--cores) number of cores allocated to Intensiphy for the alignment and assembly steps. (DEFAULT: 2)
- (--ref) reference sequence label, without suffix or file ending information. (Example: SRR1500345) (DEFAULT: random selection)
- (--placement) Toggles phylogenetic placement, which runs once all accession files have been downloaded and assembled. Also toggles automatic estimation of a starting tree if one was not input using the [--starting_tree] flag. (OPTIONS: ON, OFF) (DEFAULT: OFF)
- (--starting_tree) a phylogeny produced from the input alignment file. Used for phylogenetic placement.
- (--ip_out_dir) path and folder name you would like to use to store the outputs of this program. (DEFAULT: ip_output)
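When launching Intensiphy from another script, it can be convenient to assemble the argument list in one place. The sketch below uses only the flags listed above. The function `build_command` is hypothetical, the file names in the example are placeholders, and the rule that exactly one of --organism or --accs_file is given is an assumption drawn from the Either/Or wording, not something Intensiphy is confirmed to enforce.

```python
import shlex


def build_command(align_file, organism=None, accs_file=None, cores=2,
                  ref=None, placement="OFF", starting_tree=None,
                  ip_out_dir="ip_output", script="./intensiphy.py"):
    """Return an argument list for intensiphy.py built from the README flags."""
    if placement not in ("ON", "OFF"):
        raise ValueError("placement must be ON or OFF")
    # Assumption: supply an organism OR a curated accession file, not both.
    if (organism is None) == (accs_file is None):
        raise ValueError("give exactly one of organism or accs_file")

    cmd = [script, "--align_file", align_file]
    if accs_file is not None:
        cmd += ["--accession_method", "USER_INPUT", "--accs_file", accs_file]
    else:
        cmd += ["--organism", organism]
    cmd += ["--cores", str(cores), "--placement", placement,
            "--ip_out_dir", ip_out_dir]
    if ref is not None:
        cmd += ["--ref", ref]
    if starting_tree is not None:
        cmd += ["--starting_tree", starting_tree]
    return cmd


# Placeholder file names; shlex.join quotes arguments for safe pasting.
print(shlex.join(build_command("alignment.fas", accs_file="accessions.csv")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with organism names that contain spaces or brackets.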

Output Files

  • Single sequence files: found in your sequence storage folder
[OUTDIR]/sequence_storage/[SEQUENCE_FOLDERS]
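To see which sequences are currently stored, the folder layout above can be listed with a few lines of Python. This is a sketch that assumes each sequence has its own sub-folder under [OUTDIR]/sequence_storage/, as the path pattern suggests; the function name is hypothetical.

```python
from pathlib import Path


def list_sequence_folders(out_dir="ip_output"):
    """Return the sorted names of sub-folders in <out_dir>/sequence_storage."""
    storage = Path(out_dir) / "sequence_storage"
    if not storage.is_dir():
        return []  # no run has written to this output folder yet
    return sorted(p.name for p in storage.iterdir() if p.is_dir())


print(list_sequence_folders("ip_output"))
```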

Phylogenetic Estimation

Explanation of Intensiphy's interaction and utility with phylogenetic estimation coming soon!

Dependencies

Dependencies (Separate programs you'll need to install):

  1. Python 3
  2. bwa-mem2
  3. RAxMLHPC
  4. Seqtk
  5. Samtools
  6. Bcftools
  7. Fastx toolkit
  8. Dendropy
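A quick way to check whether the dependencies are visible from your current environment is sketched below. The executable names are assumptions: RAxML is distributed under several binary names (for example raxmlHPC or raxmlHPC-PTHREADS), and fastq_to_fasta is only one of the tools shipped with the Fastx toolkit. Adjust the lists to match your installation; the function name is hypothetical.

```python
import importlib.util
import shutil

# Assumed executable names; edit these to match how the tools are installed.
EXECUTABLES = ["python3", "bwa-mem2", "raxmlHPC", "seqtk",
               "samtools", "bcftools", "fastq_to_fasta"]
# Dendropy is a Python package, so it is checked as an importable module.
PYTHON_MODULES = ["dendropy"]


def missing_dependencies(executables=EXECUTABLES, modules=PYTHON_MODULES):
    """Return the names that are not on PATH or not importable."""
    missing = [name for name in executables if shutil.which(name) is None]
    missing += [name for name in modules
                if importlib.util.find_spec(name) is None]
    return missing


print("Missing:", missing_dependencies() or "none")
```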

Reporting Problems

Software will have bugs. We try to address issues with Extensiphy as they arise. If you run into an issue, please report it using Extensiphy's Issue Tracker. You can also search the Issue Tracker for fixes to previously reported issues. Finally, you can contact us at [email protected] to discuss any problems with installing or running Extensiphy.
