mview's People

Contributors

desmid

mview's Issues

Refactor row identifier handling into separate package

The Build/* classes and Align/* classes use various sequence row identifiers (raw, processed string, row number, unique identifier, ...). Currently, the code to manage these, cross-reference them and perform searches is in the Build/Base package.

Following SR, split this out as a distinct class and aggregate it where needed.

Sort by Identity or coverage

It would be nice if we could sort the alignment by the pid/cov columns, or even by a given position in the alignment.
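As a rough illustration of the request, the sketch below sorts aligned rows by percent identity or coverage against a reference row. The definitions used here are assumptions, not MView's confirmed calculation: coverage is taken as the share of reference residues that the row aligns a residue against, and identity is normalised by the span between the row's first and last residue. The names `coverage`, `identity` and `sort_rows` are hypothetical.

```python
GAPS = "-."  # characters treated as gaps (assumption)


def coverage(ref, row):
    """Percent of reference residues that have a residue aligned in `row`."""
    ref_cols = [i for i, c in enumerate(ref) if c not in GAPS]
    if not ref_cols:
        return 0.0
    covered = sum(1 for i in ref_cols if row[i] not in GAPS)
    return 100.0 * covered / len(ref_cols)


def identity(ref, row):
    """Percent identity over the span from the row's first to last residue."""
    cols = [i for i, c in enumerate(row) if c not in GAPS]
    if not cols:
        return 0.0
    start, end = cols[0], cols[-1]
    same = sum(
        1
        for i in range(start, end + 1)
        if row[i] not in GAPS and row[i] == ref[i]
    )
    return 100.0 * same / (end - start + 1)


def sort_rows(ref, rows, by="pid"):
    """Sort (name, sequence) pairs by 'pid' or 'cov', highest first."""
    score = identity if by == "pid" else coverage
    return sorted(rows, key=lambda r: score(ref, r[1]), reverse=True)
```

Sorting by a given alignment position could follow the same pattern, with the residue at that column as the sort key.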

include groupmap for all ambiguous nucleotides

Currently, only two group maps are available for nucleotides: D1 and CYS. The group map D1 contains conserved ring types only. In contrast, the colormap D1 contains all ambiguous nucleotides. An additional colormap, DC1, exists for the group map CYS. The naming is inconsistent and confused me slightly.

I'd like to see all the different ambiguity codes in the consensus sequence as well, not only as different colors in the HTML output. For now I used -groupfile, but I recommend making the naming and content of the color and group maps consistent.

Replace parsers with DSL and machine-generated code

Historically, all parsers have been hand-crafted format by format and version by version, although they share a common parser layout and components and produce a common parse tree.

Replace with a DSL (Domain Specific Language) describing input format syntax and generate parsers automatically.

Can envisage a GUI DSL editor for fitting a DSL to sample input.

support hsp tiling without sequences

For a quick overview of the blast hits it would be helpful to get the tiled hsp statistics without the alignment.

Currently, qseq and sseq must be written to the blast output and mview needs quite a bit of time computing the alignment, especially for larger sequences.

mview for PHI-BLAST not working

Dear Mview Team,

Using MView, I was trying to extract an MSA from PHI-BLAST search results. MView reports "no alignments found" although there are hits for my input pattern and input sequence against the target database. Note that since BLAST+, PHI-BLAST is available as an option of the psiblast executable. The header of my BLAST results looks like:
PHIBLASTP 2.7.1+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

The mview help shows that PSI-BLAST is supported for both BLAST+ and BLAST, but PHI-BLAST seems to be supported for the older version of BLAST only.

Is there a way for mview to read PHI-BLAST results?

Many Thanks,

Intikhab

Remove '-hsp all' mode

The other two modes '-hsp ranked' and '-hsp discrete' make sense. The third mode 'all' is a historical artifact and has long been deprecated in the manual.

Remove.

Refactor parse library into main Bio tree

Currently, the parsers are under a self-contained subtree with top-level NPB. This is historical and there's no reason to keep them separate. The NPB top-level could then be removed to be consistent with the rest of the design.

See also issue #3

Can't output in MSF format

When I try to convert a fasta file to an MSF one, it gets stuck. I've tried outputting into all other formats as well, and they all work as expected, but when I try to use msf format, it just keeps on running (in one case, for more than 15 minutes, and then I decided to terminate the program manually). It can read the msf file easily, but it can't output that same msf file in the msf format.
image

In the above picture, the program has been executing the last statement for more than 5 minutes, still no visible progress. Can you tell me what the issue might be and what's the possible solution? I want to convert my FASTA formatted alignments to MSF format so I can use the bali_score program for scoring.

Add percent coverage filters

Filtering is available by percent identity with '-minident' and '-maxident'.
Add corresponding command line options for percent coverage.

Port to another cross-platform language

MView may be ported at a future date to another language:

  • cross-platform
  • widely available
  • easier to read
  • better OO support

Nice to have:

  • static types

Candidates:

  • Python
  • Java (pro: static types, con: commercial)

show only nonconserved residues

Is it possible to show only the nonconserved residues of the sequence alignment, with the exception of the reference sequence (usually the first one)? All conserved residues could then be shown with a special character, typically a dot. Example:

ref	GGGGAACTTCTCCTGCTAGAAT
2	GGG...................
3	.................A....
4	...................A..
5	..........T........A..
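The requested display can be sketched as follows: every residue that matches the reference at the same column is replaced by a dot, while gaps and differing residues are kept. This is only an illustration of the idea, not MView behaviour; the function name `mask_conserved` and the gap characters are assumptions.

```python
def mask_conserved(ref, rows, dot=".", gaps="-"):
    """Return copies of `rows` with residues identical to `ref` shown as `dot`.

    Gap characters are left untouched so they stay distinguishable
    from conserved positions.
    """
    masked = []
    for row in rows:
        if len(row) != len(ref):
            raise ValueError("rows must have the same length as the reference")
        masked.append(
            "".join(
                dot if (c == r and c not in gaps) else c
                for c, r in zip(row, ref)
            )
        )
    return masked


if __name__ == "__main__":
    reference = "GGGGAACTTC"
    others = ["GGGGAACTAC", "GGG-AACTTC"]
    print(reference)
    for line in mask_conserved(reference, others):
        print(line)
```

The comparison is case-sensitive; a real implementation would probably want to normalise case first.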

Handling of lower-case protein sequences

I am trying to visualize the MSA of genes that are coded in two separate positions.
I would be happy if I could display the amino acid sequence of one part in lowercase letters.
When I tried, I got the following error:
$ Sequence lengths differ for output format 'mview' - aborting

I use MSA files like this one.

f__Nostocaceae_separate_coding_coded_000316665.1_6271
MTDENIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRIMIVGCDPKADSTRLMLHSKAQTT
VLSLAAERGAVEDLELEEVMLEGFRGVRCVESGGPEPGVGCAGRGIITSINFLEENGAYQ
DLDFVSYDVLGDVVCGGFAMPIREGKAQEIYIVTSGEMMAMYAANNIARGVLKYAHSGGV
RLGGLICNSRNVDREVELIETLADRLNTQMIHFVPRDNIVQHAELRRMTVNEYAPDSNQA
KEYATLATKIINNKNLAIPTPNELTTLTFVICSLKSmdeleallvefgildddtkhadi
igkkaeevpas

f__Nostocaceae_separate_coding_coded_000317435.1_405
MAVDKSIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRILIVGCDPKADSTRLMLHSKAQT
TVLSLAAERGAVEDLELEEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAY
QDVDFVSYDVLGDVVCGGFTNYHmpiregkaqeiyivtsgemmamyaanniargvlkya
htggvrlgglicnsrkvdrevelietlakrlntqmihfvprdnivqhaelrrmtvneyap
dsdqgneyrtlakkiinnknltiptpiemdelealliefgildddtkhaapeikaaask

f__Nostocaceae_separate_coding_coded_002368075.1_2211
MSDEKIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRILIVGCDPKADSTRLMLHSKAQTT
VLHLAAERGAVEDLELHEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAYT
DLDFVSYDVLGDVVCGGFAMPIREGKGGLFRVFVGAIVmmamyaanniargilkyahsg
gvrlgglicnsrktdrehelietlakrlntqmihfvprdnivqhaelrrmtvneyapdsn
qaneyralakkiienknltiptpiemdeleallieygildddskhaeiigkpasatk

Best wishes,

Consolidate -label options into single option

Annotation columns "labels" are switched off using -label0, -label1, etc.
Replace with something like:
-labeloff:all
-labelon:row,id,desc
or some other syntax using named columns.

Deal with asterisk "*" character in inputs

Hello!
I'd like to kindly ask whether it is possible to add support for inputs containing the asterisk "*" character.
This one I tested was made with sam2fasta.py:

raw

$ cat minitestsam2fastapy.fasta

>NC_045512.2_1bp_to_1680bp
ATTAAAGGTTTATACCTTCCCAGGTAACAAACC
>clone1
ATTAAAGGTTTATACC**CCCAGGTAACAAACC
>clone2
-----------------------------AACC

raw fails

$ mview -in fasta minitestsam2fastapy.fasta

Sequence lengths differ for output format 'mview' - aborting
mview: no alignments found
mview: no alignments found

edited

$ cat minitestsam2fastapy-EDIT.fasta

>NC_045512.2_1bp_to_1680bp
ATTAAAGGTTTATACCTTCCCAGGTAACAAACC
>clone1
ATTAAAGGTTTATACC--CCCAGGTAACAAACC
>clone2
-----------------------------AACC

edited works

$ mview -in fasta minitestsam2fastapy-EDIT.fasta

Identities normalised by aligned length.

                               cov    pid  1 [        .         .         .  ] 33
1 NC_045512.2_1bp_to_1680bp 100.0% 100.0%    ATTAAAGGTTTATACCTTCCCAGGTAACAAACC   
2 clone1                     93.9%  93.9%    ATTAAAGGTTTATACC--CCCAGGTAACAAACC   
3 clone2                     12.1% 100.0%    -----------------------------AACC   

MView 1.67, Copyright (C) 1997-2020 Nigel P. Brown

(tools) david@NewLinux:/media/david/SSD2a/1-workdir/bam2msa/test$
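Until asterisks are handled natively, the manual edit shown above can be automated. The sketch below rewrites "*" to "-" on sequence lines only and leaves FASTA header lines alone. Treating "*" as a gap is an assumption taken from the manual edit in this report; in other inputs (for example protein sequences, where "*" can mark a stop codon) that substitution may not be appropriate. The function name `replace_asterisks` is hypothetical.

```python
def replace_asterisks(fasta_text, gap="-"):
    """Replace '*' with `gap` in sequence lines of FASTA-formatted text.

    Header lines (starting with '>') are passed through unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(line.replace("*", gap))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python replace_asterisks.py < in.fasta > out.fasta
    sys.stdout.write(replace_asterisks(sys.stdin.read()))
```

The cleaned file can then be passed to mview with `-in fasta` as in the working example.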
