desmid / mview
MView extracts and reformats the results of a sequence database search or multiple alignment.
License: GNU General Public License v2.0
Simple functions like 'min', 'max' are duplicated in various places.
See also issue #2
The Build/* classes and Align/* classes use various sequence row identifiers (raw, processed string, row number, unique identifier, ...). Currently, the code to manage these, cross-reference them and perform searches is in the Build/Base package.
Following SR, split this out as a distinct class and aggregate it where needed.
It would be nice if we could sort the alignment by the pid/cov columns, or even by a given position in the alignment.
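The requested behaviour could be sketched roughly as follows. This is only an illustration in Python, not MView code: the row dictionaries and the function names `sort_rows` and `sort_by_column` are hypothetical, and the sketch assumes the first row is the reference and should stay on top.

```python
def sort_rows(rows, key="pid", descending=True):
    """Sort alignment rows by a numeric column such as 'pid' or 'cov'.

    rows is a list of dicts with keys 'id', 'cov', 'pid' and 'seq'
    (a hypothetical layout). The first row is kept in place.
    """
    if not rows:
        return []
    head, rest = rows[0], rows[1:]
    return [head] + sorted(rest, key=lambda r: r[key], reverse=descending)


def sort_by_column(rows, position):
    """Sort rows by the residue found at a 1-based alignment position."""
    if not rows:
        return []
    head, rest = rows[0], rows[1:]
    return [head] + sorted(rest, key=lambda r: r["seq"][position - 1])
```

Because Python's sort is stable, rows with equal values keep their original relative order.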
Currently, only two different group maps are available for nucleotides: D1 and CYS. D1 contains conserved ring types only. In contrast to the groupmap D1, the colormap D1 contains all ambiguous nucleotides. An additional colormap DC1 for the groupmap CYS exists as well. The naming is slightly inconsistent and confused me slightly.
I'd like to see all the different ambiguity codes in the consensus sequence as well, not only differently colored in the html output. Hence, I used -groupfile. But I recommend making the naming and content of the color and group maps consistent.
Historically, all parsers are hand-crafted format by format and version by version, but use common parser layout and components to achieve this and produce a common parse tree.
Replace with a DSL (Domain Specific Language) describing input format syntax and generate parsers automatically.
Can envisage a GUI DSL editor for fitting a DSL to sample input.
For a quick overview of the blast hits it would be helpful to get the tiled HSP statistics without the alignment.
Currently, qseq and sseq must be written to the blast output and mview needs quite a bit of time computing the alignment, especially for larger sequences.
Dear Mview Team,
Using Mview, I was trying to fetch an MSA from PHI-BLAST search results. Mview reports that no alignments were found, although there are hits for my input pattern, input sequence and target database. Note that since BLAST+, PHI-BLAST is available as an option to the psiblast executable. The header of my blast results looks like:
PHIBLASTP 2.7.1+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Looking at the mview help, PSI-BLAST is supported for both BLAST+ and BLAST, but PHI-BLAST seems to be supported for the older version of BLAST only.
Is there a way for mview to read PHI-BLAST results?
Many Thanks,
Intikhab
The other two modes '-hsp ranked' and '-hsp discrete' make sense. The third mode 'all' is a historical artifact and has long been deprecated in the manual.
Remove.
Currently, the parsers are under a self-contained subtree with top-level NPB. This is historical and there's no reason to keep them separate. The NPB top-level could then be removed to be consistent with the rest of the design.
See also issue #3
When I try to convert a fasta file to an MSF one, it gets stuck. I've tried outputting into all other formats as well, and they all work as expected, but when I try to use msf format, it just keeps on running (in one case, for more than 15 minutes, and then I decided to terminate the program manually). It can read the msf file easily, but it can't output that same msf file in the msf format.
In the above picture, the program has been executing the last statement for more than 5 minutes, still no visible progress. Can you tell me what the issue might be and what's the possible solution? I want to convert my FASTA formatted alignments to MSF format so I can use the bali_score program for scoring.
Filtering is available by percent identity with '-minident' and '-maxident'.
Add corresponding command line options for percent coverage.
See also #8
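To make the request concrete, here is a minimal Python sketch of what a coverage filter might compute. It is not MView's implementation: it assumes coverage means the percentage of non-gap reference columns in which the row also has a non-gap character, and the names `percent_coverage`, `filter_by_coverage`, `min_cov` and `max_cov` are hypothetical, not existing MView options.

```python
def percent_coverage(reference, row, gaps="-."):
    """Percentage of non-gap reference columns that are non-gap in row."""
    if len(reference) != len(row):
        raise ValueError("aligned sequences must have equal length")
    ref_cols = [i for i, c in enumerate(reference) if c not in gaps]
    if not ref_cols:
        return 0.0
    covered = sum(1 for i in ref_cols if row[i] not in gaps)
    return 100.0 * covered / len(ref_cols)


def filter_by_coverage(reference, rows, min_cov=0.0, max_cov=100.0):
    """Keep (name, sequence) pairs whose coverage is within [min_cov, max_cov]."""
    return [
        (name, seq)
        for name, seq in rows
        if min_cov <= percent_coverage(reference, seq) <= max_cov
    ]
```

The filter mirrors the inclusive lower and upper bounds that a pair of minimum/maximum options would imply.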
MView may be ported at a future date to another language:
Nice to have:
Candidates:
Is it possible to show only the nonconserved residues of the sequence alignment, with the exception of the reference sequence (usually the first one)? All conserved residues could then be shown with a special character, typically a dot. Example:
ref GGGGAACTTCTCCTGCTAGAAT
2 GGG...................
3 .................A....
4 ...................A..
5 ..........T........A..
Groundwork for this had been laid in the rewrite of the Display packages (commit: #b32c748)
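The idea behind the request can be sketched in a few lines of Python. This is an illustration only and says nothing about how the Display packages implement it; the function name `mask_conserved` is hypothetical, and the sketch assumes that gap characters should be left as they are.

```python
def mask_conserved(reference, rows, dot=".", gap="-"):
    """Replace residues identical to the reference with a dot.

    reference and each entry of rows are aligned strings of equal length.
    Gap characters are kept unchanged.
    """
    masked = []
    for row in rows:
        if len(row) != len(reference):
            raise ValueError("aligned sequences must have equal length")
        masked.append(
            "".join(
                dot if (a == b and b != gap) else b
                for a, b in zip(reference, row)
            )
        )
    return masked
```

The reference row itself is not passed through the function, so it is always printed in full.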
I am trying to visualize the MSA of genes that are coded in two separate positions.
I would be happy if I could display the amino acid sequence of one part in lowercase letters.
When I tried, I got the following error:
$ Sequence lengths differ for output format 'mview' - aborting
I use MSA files like this one.
f__Nostocaceae_separate_coding_coded_000316665.1_6271
MTDENIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRIMIVGCDPKADSTRLMLHSKAQTT
VLSLAAERGAVEDLELEEVMLEGFRGVRCVESGGPEPGVGCAGRGIITSINFLEENGAYQ
DLDFVSYDVLGDVVCGGFAMPIREGKAQEIYIVTSGEMMAMYAANNIARGVLKYAHSGGV
RLGGLICNSRNVDREVELIETLADRLNTQMIHFVPRDNIVQHAELRRMTVNEYAPDSNQA
KEYATLATKIINNKNLAIPTPNELTTLTFVICSLKSmdeleallvefgildddtkhadi
igkkaeevpas
f__Nostocaceae_separate_coding_coded_000317435.1_405
MAVDKSIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRILIVGCDPKADSTRLMLHSKAQT
TVLSLAAERGAVEDLELEEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAY
QDVDFVSYDVLGDVVCGGFTNYHmpiregkaqeiyivtsgemmamyaanniargvlkya
htggvrlgglicnsrkvdrevelietlakrlntqmihfvprdnivqhaelrrmtvneyap
dsdqgneyrtlakkiinnknltiptpiemdelealliefgildddtkhaapeikaaask
f__Nostocaceae_separate_coding_coded_002368075.1_2211
MSDEKIRQIAFYGKGGIGKSTTSQNTLAAMAEMGQRILIVGCDPKADSTRLMLHSKAQTT
VLHLAAERGAVEDLELHEVMLTGFRGVKCVESGGPEPGVGCAGRGIITAINFLEENGAYT
DLDFVSYDVLGDVVCGGFAMPIREGKGGLFRVFVGAIVmmamyaanniargilkyahsg
gvrlgglicnsrktdrehelietlakrlntqmihfvprdnivqhaelrrmtvneyapdsn
qaneyralakkiienknltiptpiemdeleallieygildddskhaeiigkpasatk
Best wishes,
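Since the error message refers to sequence lengths, one thing worth checking before looking at the lowercase letters is whether every record in the file has the same total length. A minimal Python sketch, assuming standard FASTA input with `>` header lines; the function names `fasta_lengths` and `is_aligned` are hypothetical:

```python
def fasta_lengths(text):
    """Return {record name: total sequence length} for FASTA text."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def is_aligned(lengths):
    """True if all records have the same length (or there are none)."""
    return len(set(lengths.values())) <= 1
```

If `is_aligned` returns False, the input is not a rectangular alignment, whatever the letter case.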
Annotation columns "labels" are switched off using -label0, -label1, etc.
Replace with something like:
-labeloff:all
-labelon:row,id,desc
or some other syntax using named columns.
Hello!
I'd like to kindly ask whether it is possible to add support for inputs bearing the asterisk "*" character.
The one I tested was made with sam2fasta.py:
$ cat minitestsam2fastapy.fasta
>NC_045512.2_1bp_to_1680bp
ATTAAAGGTTTATACCTTCCCAGGTAACAAACC
>clone1
ATTAAAGGTTTATACC**CCCAGGTAACAAACC
>clone2
-----------------------------AACC
$ mview -in fasta minitestsam2fastapy.fasta
Sequence lengths differ for output format 'mview' - aborting
mview: no alignments found
mview: no alignments found
$ cat minitestsam2fastapy-EDIT.fasta
ATTAAAGGTTTATACCTTCCCAGGTAACAAACC
>clone1
ATTAAAGGTTTATACC--CCCAGGTAACAAACC
>clone2
-----------------------------AACC
$ mview -in fasta minitestsam2fastapy-EDIT.fasta
Identities normalised by aligned length.
                               cov    pid 1 [ . . . ] 33
1 NC_045512.2_1bp_to_1680bp 100.0% 100.0% ATTAAAGGTTTATACCTTCCCAGGTAACAAACC
2 clone1                     93.9%  93.9% ATTAAAGGTTTATACC--CCCAGGTAACAAACC
3 clone2                     12.1% 100.0% -----------------------------AACC
MView 1.67, Copyright (C) 1997-2020 Nigel P. Brown
(tools) david@NewLinux:/media/david/SSD2a/1-workdir/bam2msa/test$
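Until such input is accepted directly, the manual edit shown above (replacing `*` with `-`) can be scripted. A minimal Python sketch, assuming that `*` in this file can safely be treated as a gap, which may not hold for other tools' output; the function name `replace_asterisks` is hypothetical:

```python
def replace_asterisks(fasta_text, replacement="-"):
    """Replace '*' in sequence lines only; header lines are left untouched."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(line.replace("*", replacement))
    return "\n".join(out) + "\n"
```

The result can be written to a new file and passed to `mview -in fasta` as in the session above.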
requested by Mathias Walter / tolot27
The mechanism for removing duplicated tags in marked up sequences is inefficient.
Develop unit tests and refine.
Error messages are produced ad-hoc all over the place with different styling conventions:
See also #2