desmid / mview

MView extracts and reformats the results of a sequence database search or multiple alignment.

License: GNU General Public License v2.0

Makefile 0.93% Perl 99.07%

bioinformatics bioinformatics-tool blast blast-search clustal fasta sequence-alignment-visualization

mview's People

lry198010 tw7649116 tolot27 mtw wangdi2014 chenminacm xrobin fsherif wook2014 hongmeiyin photocyte

mview's Issues

Refactor Universal; merge duplications in Bio and NPB libraries

Simple functions like 'min', 'max' are duplicated in various places.

Refactor into one definition.
Split Universal into topic-specific packages, shared by parser and mview libraries with explicit imports as needed.

Refactor row identifier handling into separate package

The Build/* classes and Align/* classes use various sequence row identifiers (raw, processed string, row number, unique identifier, ...). Currently, the code to manage these, cross-reference them and perform searches is in the Build/Base package.

Following SR, split this out as a distinct class and aggregate it where needed.

Sort by Identity or coverage

It would be nice if we could sort the alignment by the pid/cov columns, or even by a given position in the alignment.
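
A minimal sketch of what such sorting could look like, written in Python for illustration rather than in MView's Perl. The `Row` record, its field names, and the `keep_first` behaviour (leaving the reference row on top) are assumptions made for this example, not part of MView.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One alignment row; the field names are illustrative, not MView's."""
    ident: str
    cov: float  # percent coverage
    pid: float  # percent identity
    seq: str    # aligned sequence string


def sort_rows(rows, key="pid", descending=True, keep_first=True):
    """Sort rows by 'cov', 'pid', or a 0-based alignment column (int).

    With keep_first=True the first row (assumed to be the reference)
    stays in place and only the remaining rows are sorted.
    """
    if isinstance(key, int):
        keyfunc = lambda row: row.seq[key]
    elif key in ("cov", "pid"):
        keyfunc = lambda row: getattr(row, key)
    else:
        raise ValueError(f"unknown sort key: {key!r}")
    head, tail = (rows[:1], rows[1:]) if keep_first else ([], rows)
    # sorted() is stable, so ties keep their original relative order.
    return head + sorted(tail, key=keyfunc, reverse=descending)
```

For example, `sort_rows(rows, key="cov")` would order hits by descending coverage, and `sort_rows(rows, key=1, descending=False)` would group rows by the residue in the second alignment column.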

include groupmap for all ambiguous nucleotides

Currently, only two different group maps are available for nucleotides: D1 and CYS. D1 contains conserved ring types only. In contrast to the groupmap D1, the colormap D1 contains all ambiguous nucleotides. An additional colormap DC1 for the groupmap CYS exists as well. The naming is slightly inconsistent and confused me at first.

I'd like to see all the different ambiguity codes in the consensus sequence as well, not only differently colored in the HTML output. Hence, I used -groupfile. But I recommend making the naming and content of color and group maps consistent.

Replace parsers with DSL and machine-generated code

Historically, all parsers have been hand-crafted format by format and version by version, although they share a common parser layout and components and produce a common parse tree.

Replace with a DSL (Domain Specific Language) describing input format syntax and generate parsers automatically.

Can envisage a GUI DSL editor for fitting a DSL to sample input.

support hsp tiling without sequences

For a quick overview of the BLAST hits, it would be helpful to get the tiled HSP statistics without the alignment.

Currently, qseq and sseq must be written to the blast output and mview needs quite a bit of time computing the alignment, especially for larger sequences.
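
As a rough illustration of sequence-free tiling, query coverage can be estimated from HSP coordinates alone (for example the `qstart` and `qend` columns of BLAST tabular output). The sketch below is in Python and takes a plain union of intervals; it is not MView's '-hsp ranked' or '-hsp discrete' logic, and the function names are made up for this example.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) pairs."""
    merged = []
    # Normalise each pair so start <= end, then sweep left to right.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def tiled_coverage(intervals, query_length):
    """Percent of the query covered by the union of the HSP intervals."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return 100.0 * covered / query_length
```

Because only coordinates are needed, such a summary would not require `qseq` and `sseq` in the BLAST output at all.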

mview for PHI-BLAST not working

Dear Mview Team,

Using Mview, I was trying to extract an MSA from PHI-BLAST search results. Mview reports no alignments found although there are hits for my input pattern, input sequence, and target database. Note that since BLAST+, PHI-BLAST is available as an option of the psiblast executable. The header of my BLAST results looks like:
PHIBLASTP 2.7.1+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

Looking at the mview help, PSI-BLAST is supported for both BLAST+ and BLAST, but PHI-BLAST seems to be supported for older versions of BLAST only.

Is there a way for mview to read PHI-BLAST results?

Many Thanks,

Intikhab

Remove '-hsp all' mode

The other two modes '-hsp ranked' and '-hsp discrete' make sense. The third mode 'all' is a historical artifact and has long been deprecated in the manual.

Remove.

Refactor parse library into main Bio tree

Currently, the parsers are under a self-contained subtree with top-level NPB. This is historical and there's no reason to keep them separate. The NPB top-level could then be removed to be consistent with the rest of the design.

Can't output in MSF format

When I try to convert a fasta file to an MSF one, it gets stuck. I've tried outputting into all other formats as well, and they all work as expected, but when I try to use msf format, it just keeps on running (in one case, for more than 15 minutes, and then I decided to terminate the program manually). It can read the msf file easily, but it can't output that same msf file in the msf format.

In the above picture, the program has been executing the last statement for more than 5 minutes, still no visible progress. Can you tell me what the issue might be and what's the possible solution? I want to convert my FASTA formatted alignments to MSF format so I can use the bali_score program for scoring.

Add percent coverage filters

Filtering is available by percent identity with '-minident' and '-maxident'.
Add corresponding command line options for percent coverage.
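
A possible shape for such a filter, sketched in Python for illustration. The parameter names `mincov` and `maxcov` are hypothetical (chosen by analogy with '-minident' and '-maxident'), and keeping the first row regardless of its coverage is an assumption of this example.

```python
def filter_by_coverage(rows, mincov=0.0, maxcov=100.0, keep_first=True):
    """Keep (ident, cov) rows whose percent coverage lies in [mincov, maxcov].

    With keep_first=True the first row (assumed to be the query or
    reference) is always retained.
    """
    kept = []
    for index, (ident, cov) in enumerate(rows):
        if (keep_first and index == 0) or mincov <= cov <= maxcov:
            kept.append((ident, cov))
    return kept
```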

Add NCBI BLAST web service-like alignment coverage schematic generation

Port to another cross-platform language

MView may be ported at a future date to another language:

cross-platform
widely available
easier to read
better OO support

Nice to have:

static types

Candidates:

Python
Java (pro: static types, con: commercial)

show only nonconserved residues

Is it possible to show only the nonconserved residues of the sequence alignment, with the exception of the reference sequence (usually the first one)? All conserved residues could then be shown with a special character, typically a dot. Example:

ref	GGGGAACTTCTCCTGCTAGAAT
2	GGG...................
3	.................A....
4	...................A..
5	..........T........A..
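
The requested display could be produced along these lines. This is a Python sketch for illustration only; the function names are invented here, and positions where both rows carry the same gap character are also shown as dots, which may or may not be the desired behaviour.

```python
def mask_conserved(reference, sequence, match_char="."):
    """Replace residues identical to the reference with match_char."""
    if len(reference) != len(sequence):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        match_char if seq_char == ref_char else seq_char
        for ref_char, seq_char in zip(reference, sequence)
    )


def dot_view(rows):
    """Return (ident, text) rows: the first row unchanged, the rest masked."""
    if not rows:
        return []
    ref_ident, ref_seq = rows[0]
    masked = [(ident, mask_conserved(ref_seq, seq)) for ident, seq in rows[1:]]
    return [(ref_ident, ref_seq)] + masked
```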

Add graphical output (jpg, SVG, ...)

Groundwork for this had been laid in the rewrite of the Display packages (commit: #b32c748)

Handling of lower-case protein sequences

I am trying to visualize the MSA of genes that are coded in two separate positions.
I would be happy if I could display the amino acid sequence of one part in lowercase letters.
When I tried, I got the following error
$ Sequence lengths differ for output format 'mview' - aborting

I use MSA files like this one.

f__Nostocaceae_separate_coding_coded_000316665.1_6271
MTDENIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRIMIVGCDPKADSTRLMLHSKAQTT
VLSLAAERGAVEDLELEEVMLEGFRGVRCVESGGPEPGVGCAGRGIITSINFLEENGAYQ
DLDFVSYDVLGDVVCGGFAMPIREGKAQEIYIVTSGEMMAMYAANNIARGVLKYAHSGGV
RLGGLICNSRNVDREVELIETLADRLNTQMIHFVPRDNIVQHAELRRMTVNEYAPDSNQA
KEYATLATKIINNKNLAIPTPNELTTLTFVICSLKSmdeleallvefgildddtkhadi
igkkaeevpas
f__Nostocaceae_separate_coding_coded_000317435.1_405
MAVDKSIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRILIVGCDPKADSTRLMLHSKAQT
TVLSLAAERGAVEDLELEEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAY
QDVDFVSYDVLGDVVCGGFTNYHmpiregkaqeiyivtsgemmamyaanniargvlkya
htggvrlgglicnsrkvdrevelietlakrlntqmihfvprdnivqhaelrrmtvneyap
dsdqgneyrtlakkiinnknltiptpiemdelealliefgildddtkhaapeikaaask
f__Nostocaceae_separate_coding_coded_002368075.1_2211
MSDEKIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRILIVGCDPKADSTRLMLHSKAQTT
VLHLAAERGAVEDLELHEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAYT
DLDFVSYDVLGDVVCGGFAMPIREGKGGLFRVFVGAIVmmamyaanniargilkyahsg
gvrlgglicnsrktdrehelietlakrlntqmihfvprdnivqhaelrrmtvneyapdsn
qaneyralakkiienknltiptpiemdeleallieygildddskhaeiigkpasatk

Best wishes,
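
Since the error message refers to differing sequence lengths, one thing worth checking is whether every row in the file has the same length once the wrapped lines are joined. A small Python sketch for that check, assuming standard FASTA input with '>' header lines; letter case is left untouched, and the function names are made up for this example.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (ident, sequence) pairs, keeping case."""
    records = []
    ident, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                records.append((ident, "".join(chunks)))
            ident, chunks = line[1:].strip(), []
        elif ident is not None:
            chunks.append(line)
    if ident is not None:
        records.append((ident, "".join(chunks)))
    return records


def length_report(records):
    """Map each identifier to its total sequence length."""
    return {ident: len(seq) for ident, seq in records}


def is_aligned(records):
    """True if all sequences share one length (or there are none)."""
    return len({len(seq) for _, seq in records}) <= 1
```

If `is_aligned` returns False, the input is probably a set of unaligned sequences rather than an alignment, independent of whether residues are upper or lower case.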

Consolidate -label options into single option

Annotation columns "labels" are switched off using -label0, -label1, etc.
Replace with something like:
-labeloff:all
-labelon:row,id,desc
or some other syntax using named columns.
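
To make the proposal concrete, here is a Python sketch of how the '-labelon:' and '-labeloff:' syntax suggested above might be interpreted. The option syntax comes from this issue only and does not exist in MView; the column names beyond row, id and desc are assumptions for the example.

```python
# Hypothetical named columns; only row, id and desc appear in the proposal.
COLUMNS = ("row", "id", "desc", "cov", "pid")


def apply_label_options(args, columns=COLUMNS):
    """Apply '-labelon:NAMES' / '-labeloff:NAMES' in order; return visible columns.

    NAMES is a comma-separated list of column names, or 'all'.
    All columns start out visible.
    """
    visible = set(columns)
    for arg in args:
        option, _, value = arg.partition(":")
        if option not in ("-labelon", "-labeloff"):
            raise ValueError(f"unknown option: {arg!r}")
        names = tuple(columns) if value == "all" else tuple(value.split(","))
        unknown = [name for name in names if name not in columns]
        if unknown:
            raise ValueError(f"unknown column(s): {unknown}")
        if option == "-labelon":
            visible.update(names)
        else:
            visible.difference_update(names)
    # Report columns in their canonical display order.
    return [name for name in columns if name in visible]
```

Processing the options left to right means `-labeloff:all -labelon:row,id,desc` reads naturally as "hide everything, then show these three".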

Deal with asterisk "*" character in inputs

Hello!
I'd like to kindly ask whether it is possible to add support for inputs containing the asterisk "*" character.
This one I tested was made with sam2fasta.py:

raw

$ cat minitestsam2fastapy.fasta

>NC_045512.2_1bp_to_1680bp
ATTAAAGGTTTATACCTTCCCAGGTAACAAACC
>clone1
ATTAAAGGTTTATACC**CCCAGGTAACAAACC
>clone2
-----------------------------AACC

raw fails

$ mview -in fasta minitestsam2fastapy.fasta

Sequence lengths differ for output format 'mview' - aborting
mview: no alignments found
mview: no alignments found

edited

$ cat minitestsam2fastapy-EDIT.fasta

>NC_045512.2_1bp_to_1680bp
ATTAAAGGTTTATACCTTCCCAGGTAACAAACC
>clone1
ATTAAAGGTTTATACC--CCCAGGTAACAAACC
>clone2
-----------------------------AACC

edited works

$ mview -in fasta minitestsam2fastapy-EDIT.fasta

Identities normalised by aligned length.

                               cov    pid  1 [        .         .         .  ] 33
1 NC_045512.2_1bp_to_1680bp 100.0% 100.0%    ATTAAAGGTTTATACCTTCCCAGGTAACAAACC   
2 clone1                     93.9%  93.9%    ATTAAAGGTTTATACC--CCCAGGTAACAAACC   
3 clone2                     12.1% 100.0%    -----------------------------AACC   

MView 1.67, Copyright (C) 1997-2020 Nigel P. Brown
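
Until asterisks are handled natively, the manual edit shown above can be automated. The following Python sketch replaces "*" with "-" in sequence lines only; treating "*" as equivalent to a gap is an assumption that matches the hand-edited file, and the function name is invented for this example.

```python
def replace_asterisks(fasta_text, gap="-"):
    """Replace '*' with the gap character in sequence lines, not in headers."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)  # header lines are passed through unchanged
        else:
            cleaned.append(line.replace("*", gap))
    return "\n".join(cleaned) + "\n"
```

The cleaned text can then be written to a new file and passed to `mview -in fasta` as in the working example.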


replace with a common error handling class
standardized message formats
possibly switchable reporting levels
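
One way the three points above could fit together, sketched in Python for illustration. The class name, the level names and the 'program: level: message' format are all assumptions of this example, not an existing MView interface.

```python
import sys


class Reporter:
    """Single place for messages, with a fixed format and a switchable level."""

    LEVELS = {"quiet": 0, "error": 1, "warning": 2, "info": 3}

    def __init__(self, program="mview", level="warning", stream=None):
        if level not in self.LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        self.program = program
        self.threshold = self.LEVELS[level]
        self.stream = stream if stream is not None else sys.stderr

    def _emit(self, level, message):
        # Print only if the message level is within the chosen threshold.
        if self.LEVELS[level] <= self.threshold:
            print(f"{self.program}: {level}: {message}", file=self.stream)
            return True
        return False

    def error(self, message):
        return self._emit("error", message)

    def warning(self, message):
        return self._emit("warning", message)

    def info(self, message):
        return self._emit("info", message)
```

Every package would then call the same object, for example `reporter.error("no alignments found")`, instead of formatting and printing messages itself.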