
snipit's Introduction

snipit

Summarise snps relative to a reference sequence

Install

pip install snipit

Example Usage

Link to test data: test.fasta

  • Basic usage for nucleotide alignments:
snipit test.fasta \
--output-file test

The default output format is png. Specify only the output path/name stem, not the file extension.

  • To change output format, use --format:
snipit test.fasta \
--output-file test \
--format pdf

Options: png, jpg, pdf, svg, tiff.

  • To change color scheme, use --colour-palette:
snipit test.fasta \
--output-file test \
--colour-palette classic_extended

Other colour schemes:

classic, classic_extended, primary, purine-pyrimidine, greyscale, wes, verity, ugene

Use ugene for protein (aa) alignments. Use classic_extended for colouring ambiguous bases.

  • There are multiple options to control which SNPs or indels are included/excluded:
snipit test.fasta \
--show-indels \
--include-positions '100-150' \
--exclude-positions '223 224 225'
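
As a rough illustration of how position specifications such as '100-150' and '223 224 225' combine, here is a minimal Python sketch. It is not snipit's own code: the function names `parse_positions` and `filter_positions` are hypothetical, and the only behaviour taken from the usage text is that ranges are closed, inclusive and one-indexed, and that inclusion is considered before exclusion.

```python
def parse_positions(specs):
    """Expand specs such as ['100-150'] or ['223 224 225'] into a set of 1-based positions."""
    positions = set()
    for spec in specs:
        for token in spec.split():
            if "-" in token:
                start, end = (int(x) for x in token.split("-"))
                positions.update(range(start, end + 1))  # closed, inclusive range
            else:
                positions.add(int(token))
    return positions


def filter_positions(snp_positions, include=None, exclude=None):
    """Apply the include filter first, then the exclude filter (hypothetical helper)."""
    kept = set(snp_positions)
    if include:
        kept &= parse_positions(include)
    if exclude:
        kept -= parse_positions(exclude)
    return sorted(kept)


print(filter_positions([99, 100, 120, 150, 223, 300],
                       include=["100-150 223-300"],
                       exclude=["223 224 225"]))
# [100, 120, 150, 300]
```

Position 99 falls outside both included ranges, and 223 is included at first but then removed by the exclude list.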

  • For control over ambiguous bases, use --ambig-mode to specify how they are handled:
[all] include all ambiguous bases, such as N, Y, B, at all positions
[snps] only include ambiguous bases if a SNP is present at the same position (default)
[exclude] remove all ambiguous bases; same as the deprecated --exclude-ambig-pos

Use the colour palette classic_extended when plotting with all or snps.
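
The three modes can be pictured as a rule for choosing which alignment columns are drawn. The Python sketch below is an interpretation of the descriptions above, not snipit's implementation. The function name `columns_to_plot` is hypothetical, sequences are assumed to be upper case, and `exclude` is read as "drop any position with an ambiguous base in any sequence", following the help text of the older --exclude-ambig-pos flag.

```python
UNAMBIGUOUS = set("ACGT")
AMBIGUOUS = set("RYSWKMBDHVN")  # IUPAC nucleotide ambiguity codes


def columns_to_plot(reference, queries, mode="snps"):
    """Return the 1-based alignment positions drawn under each mode (illustrative only)."""
    keep = []
    for i, ref_base in enumerate(reference):
        column = [query[i] for query in queries]
        # A SNP here means an unambiguous base that differs from the reference.
        has_snp = any(b != ref_base and b in UNAMBIGUOUS for b in column)
        has_ambig = any(b in AMBIGUOUS for b in column)
        if mode == "all":
            show = has_snp or has_ambig
        elif mode == "snps":
            show = has_snp  # ambiguous bases appear only alongside a SNP
        elif mode == "exclude":
            show = has_snp and not has_ambig
        else:
            raise ValueError(f"unknown mode: {mode}")
        if show:
            keep.append(i + 1)
    return keep


reference = "ACGTAC"
queries = ["ACGTNC", "ATGYAC", "ACNTGC"]
for mode in ("all", "snps", "exclude"):
    print(mode, columns_to_plot(reference, queries, mode))
# all [2, 3, 4, 5]
# snps [2, 5]
# exclude [2]
```

Position 5 carries both a SNP (G) and an N, so it is kept under `snps` but dropped under `exclude`.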

  • Recombination mode is designed to assist with recombination analysis for SARS-CoV-2 (SC2). This mode allows colouring of query sequences by the mutations present in two references. Three flags are required: --reference, --recombi-mode and --recombi-references.

The specified --reference must be different from the --recombi-references.

snipit test.fasta \
--reference USA_3 \
--recombi-mode \
--recombi-references "USA_1,USA_2"
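
To make the idea behind recombination mode concrete, the following Python sketch labels each mutation in a query by which of the two recombi-references carries the same base. This is a simplified illustration and not snipit's code: the function name `classify_mutations` and the labels "A", "B", "both" and "private" are assumptions made for the example.

```python
def classify_mutations(reference, parent_a, parent_b, query):
    """Label each query mutation by the recombi-reference that shares it (illustrative only)."""
    labels = {}
    columns = zip(reference, parent_a, parent_b, query)
    for pos, (ref, a, b, q) in enumerate(columns, start=1):
        if q == ref:
            continue  # not a mutation relative to the reference
        if q == a and q == b:
            labels[pos] = "both"
        elif q == a:
            labels[pos] = "A"
        elif q == b:
            labels[pos] = "B"
        else:
            labels[pos] = "private"
    return labels


reference = "AAAAAAAA"
parent_a  = "TTAAAACA"
parent_b  = "AAAGGACA"
query     = "TTAAGACT"
print(classify_mutations(reference, parent_a, parent_b, query))
# {1: 'A', 2: 'A', 5: 'B', 7: 'both', 8: 'private'}
```

In this toy query the labels switch from "A" to "B" between positions 2 and 5, which is the kind of pattern a breakpoint would produce.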

For amino acid alignments, specify the sequence type as aa and use the colour palette ugene:

snipit test.prot.fasta \
--sequence-type aa \
--colour-palette ugene \
--output-file test.prot

There are several more options; see below for full usage.

Full Usage

snipit

optional arguments:
  -h, --help            show this help message and exit

Input options:
  alignment             Input alignment fasta file
  -t {nt,aa}, --sequence-type {nt,aa}
                        Input sequence type: aa or nt
  -r REFERENCE, --reference REFERENCE
                        Indicates which sequence in the alignment is the
                        reference (by sequence ID). Default: first sequence in
                        alignment
  -l LABELS, --labels LABELS
                        Optional csv file of labels to show in output snipit
                        plot. Default: sequence names
  --l-header LABEL_HEADERS
                        Comma separated string of column headers in label csv.
                        First field indicates sequence name column, second the
                        label column. Default: 'name,label'

Mode options:
  --recombi-mode        Allow colouring of query seqeunces by mutations
                        present in two 'recombi-references' from the input
                        alignment fasta file
  --recombi-references RECOMBI_REFERENCES
                        Specify two comma separated sequence IDs in the input
                        alignment to use as 'recombi-references'. Ex.
                        Sequence_ID_A,Sequence_ID_B
  --cds-mode            Assumes sequence supplied is a coding sequence

Output options:
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Output directory. Default: current working directory
  -o OUTFILE, --output-file OUTFILE
                        Output file name stem. Default: snp_plot
  -s, --write-snps      Write out the SNPs in a csv file.
  -f FORMAT, --format FORMAT
                        Format options (png, jpg, pdf, svg, tiff) Default: png

Figure options:
  --height HEIGHT       Overwrite the default figure height
  --width WIDTH         Overwrite the default figure width
  --size-option SIZE_OPTION
                        Specify options for sizing. Options: expand, scale
  --solid-background    Force the plot to have a solid background, rather than
                        a transparent one.
  -c , --colour-palette 
                        Specify colour palette. Options: [classic,
                        classic_extended, primary, purine-pyrimidine,
                        greyscale, wes, verity, ugene]. Use ugene for protein
                        alignments.
  --flip-vertical       Flip the orientation of the plot so sequences are
                        below the reference rather than above it.
  --sort-by-mutation-number
                        Render the graph with sequences sorted by the number
                        of SNPs relative to the reference (fewest to most).
                        Default: False
  --sort-by-id          Sort sequences alphabetically by sequence id. Default:
                        False
  --sort-by-mutations SORT_BY_MUTATIONS
                        Sort sequences by bases at specified positions.
                        Positions are comma separated integers. Ex. '1,2,3'
  --high-to-low         If sorted by mutation number is selected, show the
                        sequences with the fewest SNPs closest to the
                        reference. Default: False

SNP options:
  --show-indels         Include insertion and deletion mutations in snipit
                        plot.
  --include-positions INCLUDED_POSITIONS [INCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or
                        specific position only included in the output. Ex.
                        '100-150' or Ex. '100 101' Considered before '--
                        exclude-positions'.
  --exclude-positions EXCLUDED_POSITIONS [EXCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or
                        specific position to exclude in the output. Ex.
                        '100-150' or Ex. '100 101' Considered after '--
                        include-positions'.
  --ambig-mode {all,snps,exclude}
                        Controls how ambiguous bases are handled - [all]
                        include all ambig such as N,Y,B in all positions;
                        [snps] only include ambig if a snp is present at the
                        same position; [exclude] remove all ambig, same as
                        depreciated --exclude-ambig-pos

Misc options:
  -v, --version         show program's version number and exit

Cite

Please cite this tool as follows:

Aine O'Toole, snipit (2024) GitHub repository, https://github.com/aineniamh/snipit

snipit's People

Contributors

aineniamh, tomkinsc, ammaraziz, matt-sd-watson, corneliusroemer, desperate-dan, wm75

snipit's Issues

Can the graph be directly drawn through the reference sequence and VCF file?

Dear @aineniamh,
Thank you for developing such a great project.
When I'm conducting SNP analysis, the typical workflow is to align the second-generation sequencing data to the reference genome, obtain a VCF file, and then visualize the SNPs.
So I'm wondering whether snipit could accept the reference genome and the VCF file as input and generate a SNP distribution plot from them. If that's possible, snipit would have even more application scenarios.
Best wishes.

Error: name not a column name in label.csv

Having trouble with renaming the samples in my plot (which has worked many times before!?) with the error

Error: name not a column name in label.csv

Input

snipit aligned.fasta -r MN908947.3 -l label.csv

csv file looks fine?

(snipit) rebee@server:~/projects/feb2022/filtered/consensus/with-ref$ head label.csv
name,label
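
When a header looks correct in the terminal but is still rejected, it can help to inspect the column names exactly as a CSV reader sees them. The Python sketch below does that. The cause of the error in this report is not known, so an invisible byte-order mark (BOM) or stray whitespace is only one possibility, and the helper name `header_problems` is hypothetical.

```python
import csv
import io


def header_problems(csv_text, expected=("name", "label")):
    """Return (expected column, near matches) for each expected column missing from the header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    for column in expected:
        if column not in header:
            # Look for headers that match once a BOM, whitespace or case is ignored.
            near = [h for h in header
                    if h.lstrip("\ufeff").strip().lower() == column]
            problems.append((column, near))
    return problems


# A header that starts with a BOM prints as "name,label" but does not compare equal.
print(header_problems("\ufeffname,label\nseq1,Sample 1\n"))
# [('name', ['\ufeffname'])]
```

For a file on disk, reading it with `open(path, newline="", encoding="utf-8")` keeps a BOM as `\ufeff` in the first column name, while `encoding="utf-8-sig"` strips it.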

Coloring reference bases in --recombi-mode

I'm currently working on integrating your visualization script into my recombination detection tools. I love the --recombi-mode option, but since the reference bases aren't colored, it can be difficult to see the breakpoints.

For example, this command produces the following visualization of XBF:

snipit XBF.fasta \
  --recombi-mode \
  --recombi-references BA.5.2,CJ.1 \
  --reference Reference \
  --flip-vertical \
  --format png \
  --solid-background

Alignment: XBF.fasta.zip (I've masked non-barcode positions for BA.5.2 and CJ.1)

[Image: recombination_BA 5 2_CJ 1]

But I'd find it more useful if:

  1. The reference bases were also colored for each recombi-reference (ex. BA.5.2=blue, CJ.1=red)
  2. The recombi-references maybe had different saturation for reference versus mutations (ex. reference=light color, mutation = dark color)?

This is a mock-up of a color scheme I'm imagining, which makes the genomic composition of XBF clearer and the breakpoints easier to see.

[Image: mock-up of the proposed color scheme]

I'm wondering if you'd also find value (or problems?) with this style of visualization? If you think it's worthwhile, I could experiment in a pull request to see what kind of code changes would be needed?

Thanks for developing this amazing visualizer!

Plans for Bioconda?

hi @aineniamh!

This looks really cool and useful! Would it be ok with you if I submitted a recipe to Bioconda?

Cheers,
Robert

Option to remove the SNP base and coordinate text from image?

Hi,
This tool is very easy and works well on small alignments. Nevertheless, when visualising larger alignments the image becomes messy with the text of the bases in each tile as well as the coordinates on the top. Would it be possible to have an option to remove these elements?
Thanks!

Suggestion: show coordinates in reference sequence

Hi,

I love this program!

One suggestion: The coordinates on top show the position in the alignment, and it would be lovely if they could show the position in the reference sequence instead, maybe using an additional command line switch.
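
The difference between the two coordinate systems comes from gap characters in the aligned reference. As a sketch of the conversion being asked for, the Python below maps each alignment column to a position in the ungapped reference. It is not part of snipit, and the function name `reference_coordinates` is hypothetical.

```python
def reference_coordinates(aligned_reference, gap="-"):
    """Map 1-based alignment columns to 1-based ungapped reference positions (None at gaps)."""
    mapping = {}
    ref_pos = 0
    for column, base in enumerate(aligned_reference, start=1):
        if base == gap:
            mapping[column] = None  # insertion relative to the reference
        else:
            ref_pos += 1
            mapping[column] = ref_pos
    return mapping


print(reference_coordinates("AC--GT"))
# {1: 1, 2: 2, 3: None, 4: None, 5: 3, 6: 4}
```

Alignment column 5 corresponds to reference position 3 because the two gap columns before it do not advance the reference counter.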

Suggestion: reformat the code with black

I would be happy to submit some pull requests for snipit, but I find the code style extremely dense, hard to read and to imitate, and generally discouraging.... Without meaning to sound rude or disrespectful etc., I think it would be a healthy move for the project if you installed black and ran it. The result would produce Python that is much more typical and I think encourage participation. Anyway, just a suggestion.

For example, I thought of a way to preserve the original order of sequences in the input, but I don't want to work on Python that looks like this (my editor really screams at me when I load the files). So I just didn't do it, and leave this comment instead. I hope that's ok. Thanks for snipit!

Suggestion: ability to order sequences according to a file

Hi there,

Really love this tool, thanks for sharing it! :)

I think it would also be really handy to be able to give a specific order for the sequences to appear in (perhaps the same order as the labels csv file?) so it can be easier to compare across predefined conditions rather than closeness to reference.

Thanks :)

pip install snipit missing latest parameters displayed on github repo

Love this tool - thank you so much for sharing it!

I installed this last week and noticed the latest parameters (e.g. below) aren't available or recognised when used!

  --flip-vertical       Flip the orientation of the plot so sequences are below the reference rather than above it.
  --include-positions INCLUDED_POSITIONS [INCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or specific position only included in the output. Ex. '100-150' or Ex. '100 101' Considered before '--exclude-positions'.
  --exclude-positions EXCLUDED_POSITIONS [EXCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or specific position to exclude in the output. Ex. '100-150' or Ex. '100 101' Considered after '--include-positions'.
  --exclude-ambig-pos   Exclude positions with ambig base in any sequences. Considered after '--include-positions'
  --sort-by-mutation-number
                        Render the graph with sequences sorted by the number of SNPs relative to the reference (fewest to most). Default: False
  --high-to-low         If sorted by mutation number is selected, show the sequences with the fewest SNPs closest to the reference. Default: False

Licenses put/patch bug

I am currently working on an API to automatically check and connect licenses. The problem is that the Put and Patch commands both default the License ID to 1. This is so far not changeable. I have been able to recreate the bug on the readme and on a local project. No matter which ID you put in. It will always use 1.

ModuleNotFoundError: No module named 'snp_functions'

I have been trying to run snipit from a conda environment in Windows 10 and in Linux, but I get the subject error. I have checked the snipit files and "snp_functions" is there. Any light on why this is happening?
File "c:\users\xxxx\appdata\local\conda\conda\envs\snipit\lib\site-packages\snipit\command.py", line 9, in
import snp_functions as sfunks

snipit 1.1.2

Hello
I installed snipit with pip but it does not seem to have all the options (notably the most recent ones like --sequence-type).

can you advise?
thank you!

snipit --version
snipit 1.1.2
 snipit --help
usage: snipit <alignment> [options]

snipit

positional arguments:
  alignment             Input alignment fasta file

optional arguments:
  -h, --help            show this help message and exit
  -r REFERENCE, --reference REFERENCE
                        Indicates which sequence in the alignment is the reference (by sequence ID). Default: first sequence in alignment
  -l LABELS, --labels LABELS
                        Optional csv file of labels to show in output snipit plot. Default: sequence names
  --l-header LABEL_HEADERS
                        Comma separated string of column headers in label csv. First field indicates sequence name column, second the label column. Default: 'name,label'
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Output directory. Default: current working directory
  -o OUTFILE, --output-file OUTFILE
                        Output file name stem. Default: snp_plot
  -s, --write-snps      Write out the SNPs in a csv file.
  -f FORMAT, --format FORMAT
                        Format options (png, jpg, pdf, svg, tiff) Default: png
  --height HEIGHT       Overwrite the default figure height
  --width WIDTH         Overwrite the default figure width
  --size-option SIZE_OPTION
                        Specify options for sizing. Options: expand, scale
  --solid-background    Force the plot to have a solid background, rather than a transparent one.
  --flip-vertical       Flip the orientation of the plot so sequences are below the reference rather than above it.
  --show-indels         Include insertion and deletion mutations in snipit plot.
  --include-positions INCLUDED_POSITIONS [INCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or specific position only included in the output. Ex. '100-150' or Ex. '100 101' Considered before '--exclude-positions'.
  --exclude-positions EXCLUDED_POSITIONS [EXCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or specific position to exclude in the output. Ex. '100-150' or Ex. '100 101' Considered after '--include-positions'.
  --exclude-ambig-pos   Exclude positions with ambig base in any sequences. Considered after '--include-positions'
  --sort-by-mutation-number
                        Render the graph with sequences sorted by the number of SNPs relative to the reference (fewest to most). Default: False
  --sort-by-id          Sort sequences alphabetically by sequence id. Default: False
  --sort-by-mutations SORT_BY_MUTATIONS
                        Sort sequences by bases at specified positions. Positions are comma separated integers. Ex. '1,2,3'
  --high-to-low         If sorted by mutation number is selected, show the sequences with the fewest SNPs closest to the reference. Default: False
  -v, --version         show program's version number and exit
  -c COLOUR_PALETTE, --colour-palette COLOUR_PALETTE
                        Specify colour palette. Options: primary, classic, purine-pyrimidine, greyscale, wes, verity
  --recombi-mode        Allow colouring of query seqeunces by mutations present in two 'recombi-references' from the input alignment fasta file
  --recombi-references RECOMBI_REFERENCES
                        Specify two comma separated sequence IDs in the input alignment to use as 'recombi-references'. Ex. Sequence_ID_A,Sequence_ID_B


v1.1.2 issues

a couple of observations for the current version:

  • --sort-by-mutations is broken (fix in #24)
  • code in master branch has __version__ set to 1.1, but pypi version has 1.1.2 - so the repo code does not seem to be identical to the pypi one (was the pypi upload generated from the tidy-up branch?)
  • README still lists --snps-only instead of --show-indels

snipit for amino acid

is there a way to use snipit for aminoacid alignments?
it works great with my nt data but I am interested in plotting AA

thank you for this nice program!

Suggestion: Writing out SNPs of a specific region in CSV file

Hello @aineniamh,

I have been using this program and so far I love it. I also like the idea of exporting a CSV file with all the SNPs detected.

I'm using genomic sequences as input and the option --include-positions to plot mutations in a specific region. However, the CSV file that is created shows mutations observed across the whole genomic sequences. Would it be possible to create a CSV file showing only the SNPs located in a specific range or at specific positions?

Thanks for developing this useful tool!

Error about: "UnboundLocalError: local variable 'top_polygon' referenced before assignment"

A couple of times snipit has failed with the following error:

Traceback (most recent call last):
  File "/mnt/userapps/miniconda3/bin/snipit", line 8, in <module>
    sys.exit(main())
  File "/mnt/userapps/miniconda3/lib/python3.8/site-packages/snipit/command.py", line 72, in main
    sfunks.make_graph(num_seqs,num_snps,record_ambs,record_snps,output,label_map,colours,length,args.width,args.height,args.size_option)
  File "/mnt/userapps/miniconda3/bin/snp_functions.py", line 331, in make_graph
    rect = patches.Rectangle((0,(top_polygon)), length, y_inc ,alpha=0.2, fill=True, edgecolor='none',facecolor="dimgrey")
UnboundLocalError: local variable 'top_polygon' referenced before assignment

I think this occurs when the samples in the multi-fasta input file all have the same SNPs. I can send an example fasta file containing two fasta sequences that produced this error. If this is the reason, then perhaps snipit could just output a "No SNPs found" message.

I'm using MAFFT v7.487 with the "linsi" option (eg: "mafft-linsi input_fasta > aligned_fasta"), and using miniconda with conda 4.10.3, and Python 3.8.10, and Snipit version 1.0.3:

$ pip show snipit
Name: snipit
Version: 1.0.3
Summary: snipit
Home-page: https://github.com/aineniamh/snipit
etc...
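
The kind of guard suggested in this report could look like the Python sketch below, which checks for differences from the reference before any plotting is attempted. It is not snipit's code, and whether this situation is really what triggers the error is the reporter's guess. The names `snp_positions` and `check_before_plotting` are hypothetical.

```python
def snp_positions(reference, queries, alphabet="ACGT"):
    """Return sorted 1-based positions where any query has a different unambiguous base."""
    positions = set()
    for query in queries:
        pairs = zip(reference.upper(), query.upper())
        for pos, (ref_base, base) in enumerate(pairs, start=1):
            if base != ref_base and base in alphabet and ref_base in alphabet:
                positions.add(pos)
    return sorted(positions)


def check_before_plotting(reference, queries):
    """Report instead of plotting when there is nothing to draw (hypothetical guard)."""
    positions = snp_positions(reference, queries)
    if not positions:
        return "No SNPs found relative to the reference."
    return f"{len(positions)} SNP position(s) to plot: {positions}"


print(check_before_plotting("ACGT", ["ACGT", "ACGT"]))
# No SNPs found relative to the reference.
print(check_before_plotting("ACGT", ["ACGA", "TCGT"]))
# 2 SNP position(s) to plot: [1, 4]
```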

Citation

Thank you very much for your awesome scripts. How can I cite your work?

Suggestion: enhancements to show possible effects on protein

This is a great tool for visualising nucleotide changes, thanks for making it available to the community!

A related use case would be to visualise amino acid changes vs a reference (eg if giving snipit an alignment of a specific CDS of interest), which I don't think is currently supported? Either by giving it the amino acid sequence directly as the input, or by defining the start codon in the ref and converting on the fly.

As an alternative (or complement), it would be nice to be able to highlight nucleotide changes expected to be non-synonymous in the output in some way.

Maybe these suggestions don't scale so well to whole genomes which is what I know snipit was initially designed for.
