mmcguffi / plannotate
Webserver and command line tool for annotating engineered plasmids
License: GNU General Public License v3.0
Installed via conda as per the instructions in the README:
mamba create -n plannotate -c conda-forge -c bioconda plannotate
which installed version 1.2.0. After it failed on my own plasmids, I downloaded pUC19.fa from the data directory in this GitHub repo and tried annotating it; this is what I get:
(plannotate) fennell@x86_64 /tmp $ plannotate batch --input pUC19.fa --html --output /tmp
2023-04-07 09:15:03.772
Warning: to view this Streamlit app on a browser, run it with the following
command:
streamlit run /Users/fennell/conda.x86/envs/plannotate/bin/plannotate [ARGUMENTS]
Traceback (most recent call last):
File "/Users/fennell/conda.x86/envs/plannotate/bin/plannotate", line 10, in <module>
sys.exit(main())
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/plannotate/pLannotate.py", line 116, in main_batch
recordDf = annotate(inSeq, yaml_file, linear, detailed)
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/plannotate/annotate.py", line 355, in annotate
blastDf = clean(blastDf)
File "/Users/fennell/conda.x86/envs/plannotate/lib/python3.10/site-packages/plannotate/annotate.py", line 178, in clean
rowSlice = (seqSpace[columnSlice] == kind).any(1) #only the rows that are in the columns of hit
TypeError: NDFrame._add_numeric_operations.<locals>.any() takes 1 positional argument but 2 were given
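The traceback ends at `.any(1)` in `clean()`. Recent pandas releases (2.0 onward, as far as I know) accept only `self` positionally in `DataFrame.any`, so the axis has to be passed by keyword. A minimal sketch of the one-line change, using a small made-up frame as a stand-in for pLannotate's `seqSpace` (the data and column layout are assumptions, not pLannotate's real structure):

```python
import pandas as pd

# Hypothetical stand-in for seqSpace: one row per hit, one column per base,
# each cell holding the feature kind covering that base (or None).
seqSpace = pd.DataFrame(
    {0: ["CDS", None, "promoter"], 1: ["CDS", "CDS", None], 2: [None, None, None]}
)
columnSlice = [0, 1]
kind = "CDS"

# Fails on newer pandas because `axis` is keyword-only there:
#     rowSlice = (seqSpace[columnSlice] == kind).any(1)
# Naming the keyword works on both old and new pandas:
rowSlice = (seqSpace[columnSlice] == kind).any(axis=1)
print(rowSlice.tolist())
```

Until a fixed release is available, the same edit could be made by hand in `annotate.py`, or pandas could be pinned below 2.0 in the environment.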
Hi,
I am interested in using pLannotate with my own annotated files (e.g., in GenBank format). How can I add them to the database, and what formats are accepted in the database folder?
Thanks, appreciate it.
Hello,
I am not able to find the location of the BLAST databases. Can you please guide me on this?
Best wishes,
Found this error while trying a local install. This seems to have been caused by an update to streamlit and so I found and used the solution here: streamlit/streamlit#5146 (comment)
The required change is to replace all references to streamlit.cli with streamlit.web.cli.
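Instead of hard-coding one module path, the import could try the new location first and fall back to the old one, so the same code works with older and newer streamlit. A small sketch; `import_first` is a hypothetical helper, not something in pLannotate or streamlit:

```python
import importlib
from types import ModuleType


def import_first(*names: str) -> ModuleType:
    """Return the first module in `names` that can be imported (hypothetical helper)."""
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            errors.append(f"{name}: {exc}")
    raise ModuleNotFoundError("no candidate module could be imported: " + "; ".join(errors))


# In pLannotate.py this would stand in for `import streamlit.cli`:
#     stcli = import_first("streamlit.web.cli", "streamlit.cli")
# and later references to `streamlit.cli` would use `stcli` instead.
```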
Hi,
Could the plasmid size limit be removed on the web server?
Thank you
Amber
Hi, thanks for the great software.
I was testing pLannotate on my plasmids when I hit this error. I also get it with the pUC19 FASTA file from the repository, so I presume the problem is not with my own plasmids. pLannotate was installed through conda.
% plannotate --version
plannotate, version 1.2.0
% plannotate batch -i tmp/pUC19.fa --output tmp -d --file_name tmp_plannotate --html
2023-04-28 09:16:35.350
Warning: to view this Streamlit app on a browser, run it with the following
command:
streamlit run /home/abs/anaconda3/envs/bio/bin/plannotate [ARGUMENTS]
Traceback (most recent call last):
File "/home/abs/anaconda3/envs/bio/bin/plannotate", line 10, in <module>
sys.exit(main())
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/plannotate/pLannotate.py", line 116, in main_batch
recordDf = annotate(inSeq, yaml_file, linear, detailed)
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/plannotate/annotate.py", line 355, in annotate
blastDf = clean(blastDf)
File "/home/abs/anaconda3/envs/bio/lib/python3.8/site-packages/plannotate/annotate.py", line 178, in clean
rowSlice = (seqSpace[columnSlice] == kind).any(1) #only the rows that are in the columns of hit
TypeError: any() takes 1 positional argument but 2 were given
Hello,
I was wondering if it would be possible to add the "feature description" to the "/note" qualifier of each feature so that it is preserved in the GenBank file. It would be convenient to have these longer descriptions instead of just the feature name, especially when using SnapGene or other visualization programs.
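In a GenBank flat file a feature's qualifiers sit under its key and location, so the description would simply become one more `/note` line. A sketch of the resulting entry, written as a hypothetical standalone formatter (not pLannotate's code) with made-up coordinates; unlike a full GenBank writer, it does not wrap long qualifier lines at 79 columns:

```python
def format_feature(key: str, location: str, label: str, description: str) -> str:
    """Render one GenBank feature-table entry with /label and /note qualifiers.

    The feature key starts at column 6 and the location at column 22;
    qualifier lines start at column 22. Long lines are not wrapped here.
    """
    indent = " " * 21
    # Inside a qualifier value, a double quote is written as two double quotes.
    note = description.replace('"', '""')
    return "\n".join(
        [
            f"     {key:<16}{location}",
            f'{indent}/label="{label}"',
            f'{indent}/note="{note}"',
        ]
    )


print(format_feature("CDS", "complement(100..960)", "AmpR", "beta-lactamase"))
```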
Hey,
I am trying to set up a custom BLAST database and run pLannotate with a custom yaml_file, but I run into some issues.
I have a fasta file mtcsb_parts.fasta containing my custom nucleotide sequences:
>1
NNNNNNN
>2
NNNNNNN
I create the blast database using:
makeblastdb -in /Users/ruprec01/Documents/Faith_lab/Git/blastdb/mtcsb_parts/mtcsb_parts.fasta -title "mtcsb_parts" -dbtype nucl
I have a mtcsb_parts.csv file in the same path containing descriptions of the sequences:
sseqid,Feature,Type,Description
1,feature1,type1,descript1
2,feature2,type2,descript2
I create a custom YAML file that contains the entry:
mtcsb_parts:
  details:
    compressed: false
    default_type: None
    location: /path-to-folder/mtcsb_parts
  location: /path-to-folder/mtcsb_parts
  method: blastn
  parameters:
    - -perc_identity 95
  priority: 1
  version: Downloaded 2021-07-23
I run plannotate in conda using:
plannotate batch -i test.fasta \
--yaml_file plannotate_custom.yaml \
--output /output
I get the following error:
streamlit run /Users/ruprec01/opt/anaconda3/envs/plannotate/bin/plannotate [ARGUMENTS]
Traceback (most recent call last):
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/bin/plannotate", line 10, in <module>
sys.exit(main())
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/plannotate/pLannotate.py", line 180, in main_batch
gbk = rsc.get_gbk(recordDf, inSeq, kwargs["linear"])
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/plannotate/resources.py", line 120, in get_gbk
record = get_seq_record(inDf, inSeq, is_linear, record)
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/plannotate/resources.py", line 151, in get_seq_record
inDf["feat loc"] = inDf.apply(FeatureLocation_smart, axis=1)
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/pandas/core/frame.py", line 3940, in __setitem__
self._set_item_frame_value(key, value)
File "/Users/ruprec01/opt/anaconda3/envs/plannotate/lib/python3.10/site-packages/pandas/core/frame.py", line 4094, in _set_item_frame_value
raise ValueError(
ValueError: Cannot set a DataFrame with multiple columns to the single column feat loc
I am wondering if you can help me with how to create the BLAST database properly and add the correct entry to the custom YAML file. pLannotate works as soon as I add, for example, the snapgene entry back into the custom YAML file.
Thanks for any help, really love pLannotate!
Greetings,
Constantin
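Before pointing the YAML at a new database, it can help to confirm that every FASTA record has a matching row in the description CSV and vice versa. A small hypothetical checker using only the standard library; it assumes the `sseqid,Feature,Type,Description` header shown above and that hits are matched to CSV rows by the first word of each FASTA header (I have not confirmed how pLannotate does the matching):

```python
import csv
import io
from typing import Dict, Iterable, List, Set, TextIO

REQUIRED_COLUMNS = {"sseqid", "Feature", "Type", "Description"}


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """First whitespace-delimited word of each '>' header line."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def check_parts(fasta: TextIO, table: TextIO) -> Dict[str, Set[str]]:
    """Compare FASTA record IDs with the sseqid column of the description CSV."""
    reader = csv.DictReader(table)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    csv_ids = {row["sseqid"] for row in reader}
    seq_ids = set(fasta_ids(fasta))
    return {
        "no_description": seq_ids - csv_ids,  # in the FASTA but not the CSV
        "no_sequence": csv_ids - seq_ids,  # in the CSV but not the FASTA
    }


# Example with in-memory files shaped like the ones above.
example = check_parts(
    io.StringIO(">1\nNNNNNNN\n>2\nNNNNNNN\n"),
    io.StringIO("sseqid,Feature,Type,Description\n1,feature1,type1,descript1\n"),
)
print(example)
```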
It would be great if the local installation could process files in batch mode; some instructions on how to do it would be helpful.
Another thing that would be good is if users could create their own add-on database, since some of the parts we use in our lab are not picked up.
So far I have managed to run it within Docker, but a full Docker implementation of the tool would make it more portable. Overall I'm impressed with the tool and very thankful that you developed it.
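Since `plannotate batch` already annotates one input file per call (as in the commands earlier in this thread), a folder of plasmids can be processed with a small wrapper loop. A sketch, assuming one plasmid per FASTA file and using only options that appear in this thread (`-i`, `--output`, `--file_name`, `--html`); the suffix list and folder names are my own choices:

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}  # assumption: adjust to your file names


def batch_commands(fastas: Iterable[Path], out_dir: Path, html: bool = True) -> List[List[str]]:
    """Build one `plannotate batch` command per FASTA file; other files are skipped."""
    cmds = []
    for fasta in fastas:
        if fasta.suffix.lower() not in FASTA_SUFFIXES:
            continue
        cmd = [
            "plannotate", "batch",
            "-i", str(fasta),
            "--output", str(out_dir),
            "--file_name", fasta.stem,  # name the outputs after the input file
        ]
        if html:
            cmd.append("--html")
        cmds.append(cmd)
    return cmds


def run_all(in_dir: Path, out_dir: Path) -> None:
    """Annotate every FASTA file in in_dir, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in batch_commands(sorted(in_dir.iterdir()), out_dir):
        subprocess.run(cmd, check=True)
```

Usage would be something like `run_all(Path("plasmids"), Path("annotations"))`, with both folder names being placeholders.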
error in manual install:
==> python setup.py install
['tabulate >=0.8.9', 'streamlit >=1.8.1', 'biopython>1.77', 'bokeh=2.4.1']
error in plannotate setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Invalid requirement, parse error at "'=2.4.1'"
To fix it, I changed:
https://github.com/barricklab/pLannotate/blob/03417a3991558fd2aef8cc68f9cf3d45853b0a6c/requirements.txt#L5
to be:
bokeh==2.4.1
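A single `=` is conda's pinning syntax, not a valid PEP 440 version operator, which is why setuptools rejects `bokeh=2.4.1`. A small hypothetical helper that rewrites single-`=` pins to `==` while leaving `==`, `>=`, `<=`, `!=`, `~=` and `===` untouched:

```python
import re

# Matches an '=' that is neither preceded by =,!,<,>,~ nor followed by '='.
# Caveat: this would also rewrite '=' inside URLs or other non-version text.
LONE_EQUALS = re.compile(r"(?<![=!<>~])=(?!=)")


def fix_requirement(spec: str) -> str:
    """Turn a conda-style pin such as 'bokeh=2.4.1' into 'bokeh==2.4.1'."""
    return LONE_EQUALS.sub("==", spec)


requirements = ["tabulate >=0.8.9", "streamlit >=1.8.1", "biopython>1.77", "bokeh=2.4.1"]
print([fix_requirement(r) for r in requirements])
```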
Hi!
After running pLannotate on my plasmid sequence, the plasmid name in the LOCUS line defaults to "plasmid". Would it be possible to pass the name of the sequence in the input FASTA file through as the plasmid name?
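Taking the name from the input is mostly a matter of reading the first FASTA header. A sketch of one way to derive it; `locus_name` is hypothetical, and the 16-character cap reflects the classic fixed-width LOCUS name field (Biopython's GenBank writer also complains about overly long locus names):

```python
def locus_name(fasta_text: str, default: str = "plasmid", max_len: int = 16) -> str:
    """Derive a LOCUS name from the first FASTA header.

    Uses the first whitespace-delimited word of the header (LOCUS names
    cannot contain spaces) and falls back to `default` if there is none.
    """
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                return words[0][:max_len]
            break  # empty header: use the default
    return default


print(locus_name(">pUC19 cloning vector\nTCGCGCGTTTCGGTGATGACGG\n"))
```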
I just installed this great software but noticed a weird problem: all my reverse-strand annotations start one base pair later than they should.
[Screenshots: reverse annotation; forward annotation]
Here is an example sequence: https://gist.github.com/dr3y/b3eac9953cb4808fb875d2d273d2ebe3
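Off-by-one errors on only one strand often come from mixing coordinate conventions: BLAST reports 1-based, inclusive positions and gives minus-strand subject hits with start > end, while Python slicing and Biopython locations are 0-based and end-exclusive. I have not traced this in pLannotate's code, so the following is only a sketch of a strand-independent conversion, with made-up numbers:

```python
from typing import Tuple


def to_half_open(start: int, end: int) -> Tuple[int, int, int]:
    """Convert 1-based inclusive BLAST coordinates to (start0, end0, strand).

    start0/end0 follow Python slicing (0-based, end-exclusive). BLAST writes
    minus-strand hits with start > end; a 1-bp hit (start == end) is treated
    as plus strand.
    """
    strand = 1 if start <= end else -1
    lo, hi = sorted((start, end))
    # Only the lower bound shifts by one, whichever strand the hit is on.
    return lo - 1, hi, strand


print(to_half_open(10, 20))  # plus-strand hit covering bases 10..20
print(to_half_open(20, 10))  # the same bases reported on the minus strand
```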
Hey @jeffreybarrick,
I wanted to get your thoughts before I make this PR, which will only take me a few hours. The idea is to be able to annotate a large batch of plasmid sequences, for example tens of thousands. This would be useful for QC'ing individual long reads from sequencing a colony of plasmids, prior to assembling the various populations.
I would modify the annotate function and accompanying methods to accept a batch of queries instead of a single one, and explicitly set threading for the various blast/diamond/cmscan tools. I would then modify the main "batch" method to accept a FASTA or FASTQ file containing more than one sequence (I would use pysam for parsing, so that is a new dependency). I would also add some caching (opt-in, of course). Finally, I would create HTML and GBK files (when the corresponding CLI options are set) on a per-read basis for QC. I'd probably add some progress logging too.
Thoughts?
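The batching part of such a change could look roughly like this: stream records one at a time and hand them to the search tools in fixed-size groups. This sketch uses a plain-Python FASTA reader instead of pysam to stay dependency-free, does not handle FASTQ, and leaves the batch size as an arbitrary parameter:

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs one record at a time, without loading the whole file."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most `size`, e.g. one search-tool call per group."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Because both functions are generators, a file with tens of thousands of reads is never held in memory all at once; only the current batch is.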
Hi,
I find pLannotate very useful for my work, but I am struggling to get it to work for plasmids larger than 50,000 bp. Is there a workaround for this, or can you recommend another tool to explore?
Thanks!
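One possible workaround (untested with pLannotate and not an official recommendation) is to split a large construct into overlapping windows below the size limit, annotate each window separately, and shift the resulting coordinates back. The overlap needs to be at least as long as the longest feature you expect, or features spanning a window boundary get cut. The 45 kb window and 5 kb overlap below are arbitrary choices, and features spanning the origin of a circular plasmid are ignored:

```python
from typing import List, Tuple


def windows(seq_len: int, size: int = 45_000, overlap: int = 5_000) -> List[Tuple[int, int]]:
    """0-based, end-exclusive windows covering [0, seq_len) with the given overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    spans = []
    start = 0
    while True:
        end = min(start + size, seq_len)
        spans.append((start, end))
        if end == seq_len:
            break
        start += step
    return spans


def to_global(start: int, end: int, window_start: int) -> Tuple[int, int]:
    """Shift a feature found inside a window back to whole-sequence coordinates."""
    return start + window_start, end + window_start


print(windows(120_000))
```

Features found twice in an overlap region would still need to be de-duplicated after shifting.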
Hi @mmcguffi, I'm creating this issue to work on getting this added to Bioconda. I think it would be straightforward.
One thing I was wondering about: would you be up for creating a plannotate download function that downloads the databases and puts them in the expected location? I think this would make pLannotate easier to use for novice users and also make the download process more standardized.
Curious to hear your thoughts on this.
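A download helper mostly comes down to fetching an archive and unpacking it where pLannotate looks for its databases. A rough sketch, assuming a tarball like the `BLAST_dbs.tar.gz` mentioned in this thread; the URL and destination directory are placeholders, since the real ones are not given here:

```python
import tarfile
import urllib.request
from pathlib import Path
from typing import List


def download(url: str, archive: Path) -> Path:
    """Fetch the database archive. `url` is a placeholder, not a known endpoint."""
    urllib.request.urlretrieve(url, archive)
    return archive


def install_databases(archive: Path, dest: Path) -> List[str]:
    """Unpack a database tarball (e.g. BLAST_dbs.tar.gz) into dest; return member names."""
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive, "r:*") as tar:
        for member in tar.getmembers():
            # Refuse entries that would be written outside dest (path traversal).
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: also rejects unsafe links.
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
        return tar.getnames()
```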
Hi, I want to use pLannotate with a genome of interest (e.g., zebrafish, Drosophila).
I checked the BLAST_dbs.tar.gz file and noticed that pLannotate internally uses Infernal (for RNA) and DIAMOND (for proteins).
So, if I generate my own DIAMOND database from a genome of interest, can I use it with pLannotate?
I also checked the plannotate_default.yaml file, but I couldn't find any instructions...
thanks!
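DIAMOND aligns against protein sequences, so its database has to be built from a protein FASTA (for example, the predicted proteome of the organism), not from the genome's nucleotide sequence. A sketch that builds such a database with DIAMOND's `makedb` command; the file names are placeholders, and wiring the result into pLannotate would still need a custom YAML entry, presumably with `method: diamond` like the default protein databases, which I have not verified:

```python
import subprocess
from pathlib import Path
from typing import List


def diamond_makedb_cmd(protein_fasta: Path, db_prefix: Path) -> List[str]:
    """`diamond makedb` call; DIAMOND writes the database to <db_prefix>.dmnd."""
    return ["diamond", "makedb", "--in", str(protein_fasta), "--db", str(db_prefix)]


def build_db(protein_fasta: Path, db_prefix: Path) -> None:
    # Requires the `diamond` binary on PATH; raises if makedb fails.
    subprocess.run(diamond_makedb_cmd(protein_fasta, db_prefix), check=True)
```

For example, `build_db(Path("proteome.faa"), Path("my_proteome"))` would produce `my_proteome.dmnd`.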
When I try to run pLannotate via conda, I get the error:
ModuleNotFoundError: No module named 'streamlit.cli'
After a little digging it looks like the streamlit.cli module was moved to streamlit.web.cli in the newer versions of streamlit.
I suggest changing the import or pinning streamlit to an earlier version.
It worked for me with streamlit version 1.10.0.
Add functionality for users to specify custom databases, and make it easier to extend pLannotate modularly in the future.
Currently the idea is to create a separate file, perhaps YAML or similar format, that specifies:
Any other ideas or desired functionality would be much appreciated.
Great tool, thanks for putting it together.
I have a few comments regarding the readme:
-b
Hi,
I got an error when starting plannotate after installing it via mamba. I changed "import streamlit.cli" to "import streamlit.web.cli" and got it working again.
Could you perhaps tell me how I can start the application on a port other than 8501? I have some other streamlit applications running on other ports, and for those I just specify the port by adding e.g. --server.port 7501. But this does not seem to work with plannotate.
Regards
Nicolas
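Streamlit also reads config options from environment variables named `STREAMLIT_<SECTION>_<OPTION>`, so `server.port` maps to `STREAMLIT_SERVER_PORT`. I have not checked whether pLannotate's launcher forwards command-line flags to streamlit, but setting the variable before starting pLannotate may sidestep that. A small sketch with a hypothetical helper:

```python
import os
from typing import Dict, Mapping, Optional


def env_with_port(port: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of an environment with Streamlit's server.port option overridden.

    Streamlit maps the config option `server.port` to STREAMLIT_SERVER_PORT.
    """
    env = dict(os.environ if base is None else base)
    env["STREAMLIT_SERVER_PORT"] = str(port)
    return env
```

The result can be passed as `env=env_with_port(7501)` to `subprocess.run` when starting pLannotate; in a shell, `export STREAMLIT_SERVER_PORT=7501` beforehand does the same thing.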
Hi @mmcguffi ,
Is it possible to convert the html annotation map file to an image?
I tried wkhtmltoimage but unfortunately it just gives a blank file.
It seems the image is being made dynamically.
Let me know if you have any insights into this.
Hi!
I've noticed that when an ORF can't be annotated with any of the databases currently in the set, pLannotate does not produce a CDS feature. Adding a custom database via the -y option did produce what I wanted, but would it be possible to add ORF prediction, say for ORFs longer than 100 bp?
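A simple six-frame scan (ATG to the next in-frame stop codon) is one way such ORF prediction could work. This is a generic sketch, not something pLannotate does today; it uses the standard genetic code's stop codons, reports only the first ATG per stop, and treats the sequence as linear, so ORFs spanning the origin of a circular plasmid are missed:

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 100) -> List[Tuple[int, int, int]]:
    """ATG-to-stop ORFs of at least min_len nt (stop codon included).

    Returns (start, end, strand) in 0-based, end-exclusive coordinates on
    the forward sequence.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in ((1, seq), (-1, revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if end - start >= min_len:
                        if strand == 1:
                            orfs.append((start, end, 1))
                        else:
                            # Map [start, end) on the reverse complement back to forward coordinates.
                            orfs.append((n - end, n - start, -1))
                    start = None
    return sorted(orfs)
```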
pLannotate currently does not report valid features that are completely nested within a larger feature. For example, the SV40 origin of replication is contained within the SV40 promoter, but pLannotate currently reports only the larger SV40 promoter.
I will address this by:
Hi! Thanks for the great tool.
I am trying to run plannotate batch in a Singularity container and I'm facing the error below.
My input file is a FASTA file and the command is:
plannotate batch -h -i reference.fasta
>reference
TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGACGGTCA
CAGCTTGTCTGTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTG
TTGGCGGGTGTCGGGGCTGGCTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGC
ACCATATGCGGTGTGAAATACCGCACAGATGCGTAAGGAGAAAATACCGCATCAGGCGCC
ATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATCGGTGCGGGCCTCTTCGCTAT
TACGCCAGCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGT
TTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGGCGCGCCACATTGATTATTGACTA
GTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCG
TTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGA
This error does not occur when I run it in a Docker container.
Hope you can help me with this issue.
Thanks
Hi @mmcguffi,
I am trying to annotate plasmids using this tool. So far it has worked pretty well, but it throws an error for plasmids larger than 50 kb. I tried the method mentioned in one of the issues, building the tool from source, but it still throws the same error. I tried modifying the script a bit, but it didn't give the expected result.
So I would appreciate your help whenever it's possible.
Please let me know if you need any other information from my side.
Thank you in advance.
Hi,
I have installed pLannotate from conda and I am getting the following error message:
File "(...)/miniconda3/envs/plann/lib/python3.11/site-packages/plannotate/pLannotate.py", line 5, in
import streamlit.cli
ModuleNotFoundError: No module named 'streamlit.cli'
According to this issue on GitHub, it seems that:
"If anyone stumbles across this, the streamlit.cli module was moved to streamlit.web.cli."
Best,
Edit: if I install it with mamba, it gets a different version of streamlit that works. But sometimes I get an error with altair version 5 (with version 4, it works). Maybe pinning the versions of streamlit and altair in the recipe would fix the problem?
Currently, DIAMOND reports only the top 25 hits, which was an oversight in initial development and can lead to missing annotations.
This will be addressed by: