andrewjpage / tiptoft
Predict plasmids from uncorrected long read data
License: GNU General Public License v3.0
x.y.z needs to be filled in?
$ pip3 install tiptoft
Collecting tiptoft
Downloading https://files.pythonhosted.org/packages/c0/9a/3b39936b78e2d0c1aeeb61ec6a6b03737e4bae7e5874ea121dbb6b3d0588/tiptoft-0.1.3.tar.gz (88kB)
100% |████████████████████████████████| 92kB 7.1MB/s
Requirement already satisfied: biopython>=1.68 in /home/linuxbrew/.linuxbrew/lib/python3.7/site-packages (from tiptoft) (1.72)
Requirement already satisfied: pyfastaq>=3.12.0 in /home/linuxbrew/.linuxbrew/lib/python3.7/site-packages (from tiptoft) (3.17.0)
Requirement already satisfied: cython in /home/linuxbrew/.linuxbrew/lib/python3.7/site-packages (from tiptoft) (0.28.5)
Requirement already satisfied: numpy in /home/linuxbrew/.linuxbrew/lib/python3.7/site-packages (from biopython>=1.68->tiptoft) (1.15.2)
Skipping bdist_wheel for tiptoft, due to binaries being disabled for it.
Installing collected packages: tiptoft
Running setup.py install for tiptoft ... done
Successfully installed tiptoft-x.y.z
The link here to an example data file does not work, and the data file is no longer included in the repo.
I found the commit that deleted it like so,
git rev-list -n 1 HEAD -- ERS654932_plasmids.fastq.gz
and then checked it out like so,
git show 7c7d3e55da84cf814c783dbaf429def261c77328^:example_data/ERS654932_plasmids.fastq.gz > ERS654932_plasmids.fastq.gz
I think it'd be great if the file remained in the repo, but it should be provided somewhere (or removed from the README - -1 on that :). Note that deleting it without rewriting history means it's still in the git history so it doesn't speed up git clones, although it does reduce the bundled repo size for releases etc.
I note that data/ has the files downloaded.
It would be great if it could default to use those.
Are they part of the pip resources section?
eg.
parser.add_argument('--output_prefix', '-o', help='Output directory',
default = pkg_resources.get_distribution("plasmidpredictor").the_files_we_need)
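A rough sketch of how the suggestion above could look, assuming the database files ship inside the installed package. The `bundled_path` helper is hypothetical, and the stand-in package `json` with its `__init__.py` is used only so the snippet runs anywhere; the real package and data file names would have to come from the project's own packaging. The `--plasmid_data` option name is borrowed from another issue in this tracker.

```python
import argparse
from importlib import resources  # resources.files() needs Python 3.9+


def bundled_path(package, filename):
    """Return the path of a file shipped inside an installed package."""
    return str(resources.files(package).joinpath(filename))


def build_parser(package="json", filename="__init__.py"):
    # "json" / "__init__.py" are stand-ins so the sketch runs anywhere;
    # a real tool would name its own package and bundled database file.
    parser = argparse.ArgumentParser(description="packaged-default sketch")
    parser.add_argument(
        "--plasmid_data",
        default=bundled_path(package, filename),
        help="Database FASTA file (defaults to the copy bundled with the package)",
    )
    return parser


# With no arguments, the default points at the bundled file.
args = build_parser().parse_args([])
print(args.plasmid_data)
```

A user-supplied `--plasmid_data some.fa` still overrides the default, so existing command lines would keep working.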
I guess the reason is ethical/political
Could make everyone cite Plasmidfinder as well as your future JOSS paper.
Some kind of cython issue?
pip3 install tiptoft
Collecting tiptoft
Downloading https://files.pythonhosted.org/packages/8d/1e/0df98a3565f3f656ca625bc427804de61282593c0fed17fcb14da7605891/tiptoft-0.1.1.tar.gz (86kB)
100% |████████████████████████████████| 92kB 20.5MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-qkhicklb/tiptoft/setup.py", line 36, in <module>
ext_modules = cythonize(extensions),
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 897, in cythonize
aliases=aliases)
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 777, in create_extension_list
for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 102, in nonempty
raise ValueError(error_msg)
ValueError: 'homopolymer_compression.pyx' doesn't match any files
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-qkhicklb/tiptoft/
pip3 --version
pip 18.0 from /home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/pip (python 3.7)
cython --version
Cython version 0.28.5
As you mention in your README.md: "Please remember to cite the plasmidFinder paper"
The paper.md would benefit from the addition of a brief explanation of the reference database it requires, along with the appropriate bibliographical reference.
One way to help users cite the appropriate literature is to create a CITATION.cff file, https://citation-file-format.github.io/. (there are other approaches, too, of course ;). ref #16.
This is just a suggestion, not critical :)
language: python
python:
- "3.5"
- "3.6"
- "3.7-dev"
Recent Python branches require OpenSSL 1.0.2+. As this library is not available for Trusty, 3.7, 3.7-dev, 3.8-dev, and nightly do not work (or use outdated archive).
OR NOT ?
% plasmidpredictor -o out.txt db/plasmids.fa subreads.fq.gz
rep2.1_ORF(E.faeciumContig1183)_JDOE 100
rep2.2_repR(pEF1)_DQ198088* 99
rep11.1_repA(pB82)_AB178871 100
rep14.3_ORF(pRI1)_EU327398* 99
<snip>
% less out.txt
No such file or directory
Hello,
I wanted to try tiptoft but similarly to issue #26 I encounter the following error when trying to run tiptoft_database_downloader v1.0.2. Before breaking, the downloader saved nine .fsa files to tiptoft_db.tmp.download .
If you require additional information please let me know.
Thanks for having a look at it!
Best,
Laura
Combining downloaded fasta files...
RepA_N.fsa
enterobacteriaceae.fsa
Traceback (most recent call last):
File "/home/user/.local/bin/tiptoft_database_downloader", line 30, in <module>
tiptoft.run()
File "/home/user/.local/lib/python3.9/site-packages/tiptoft/TipToftDatabaseDownloader.py", line 23, in run
refgenes.run(self.output_prefix)
File "/home/user/.local/lib/python3.9/site-packages/tiptoft/RefGenesGetter.py", line 85, in run
exec('self._get_from_' + self.ref_db + '(outprefix)')
File "<string>", line 1, in <module>
File "/home/user/.local/lib/python3.9/site-packages/tiptoft/RefGenesGetter.py", line 58, in _get_from_plasmidfinder
for seq in file_reader:
File "/home/user/.local/lib/python3.9/site-packages/pyfastaq/sequences.py", line 141, in file_reader
raise Error('Error determining file type from file "' + fname + '". First line is:\n' + line.rstrip())
pyfastaq.sequences.Error: Error determining file type from file "/path/to/location/enterobacteriaceae.fsa". First line is:
<!DOCTYPE html>
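The first line reported by pyfastaq, `<!DOCTYPE html>`, suggests the download returned an HTML page rather than FASTA. As a minimal sketch (not part of tiptoft or pyfastaq), a downloader could check each file before combining them. The `looks_like_fasta` helper and the file names below are hypothetical.

```python
import os
import tempfile


def looks_like_fasta(path):
    """Return True if the first non-blank line of the file starts with '>'."""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False  # an empty file is not usable either


# Demonstrate on two throwaway files: one FASTA, one HTML error page.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "good.fsa": ">rep1\nACGT\n",
        "bad.fsa": "<!DOCTYPE html>\n<html></html>\n",
    }
    results = {}
    for name, content in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(content)
        results[name] = looks_like_fasta(path)

print(results)
```

Failing early with a message that names the offending URL would probably be easier to diagnose than the parser error above.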
% plasmidpredictor_database_downloader
Downloading data with:
curl -X POST --data "folder=plasmidfinder&filename=plasmidfinder.zip"
Noooo! What's going on? What's all this scrolling? What's happened??
I think --outdir should be required ... "principle of LEAST SURPRISE".
Most people expect to see --help printed to stderr when no parameters are supplied.
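A minimal argparse sketch of the behaviour requested above, assuming a required `--outdir` option. The program name `downloader_sketch` and the `main` wrapper are illustrative only and are not the tool's actual code.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="downloader_sketch")
    parser.add_argument(
        "--outdir",
        required=True,
        help="Directory to write the downloaded database to",
    )
    return parser


def main(argv):
    parser = build_parser()
    if not argv:
        # No parameters at all: show usage on stderr and do nothing else.
        parser.print_help(sys.stderr)
        return 2
    args = parser.parse_args(argv)
    print("would download into", args.outdir)
    return 0


exit_code = main(["--outdir", "db"])
```

In a real entry point this would end with `sys.exit(main(sys.argv[1:]))`, so nothing is downloaded unless the user names a destination.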
k=13 is common for Nanopore seeding
Would it go faster for PacBio with a bigger k?
When I run it i get a lot of output:
<SNIP>
**
rep2.1_ORF(E.faeciumContig1183)_JDOE 100
rep2.2_repR(pEF1)_DQ198088* 99
rep11.1_repA(pB82)_AB178871 100
rep14.3_ORF(pRI1)_EU327398* 99
rep17.1_CDS29(pRUM)_AF507977 100
repUS15._ORF(E.faecium287)_NZAAAK010000287* 89
****
I think this is 'progressive' output as you go through reads? print_interval?
Do I just focus on the "final" section (see above) ?
What are the numbers?
How many plasmids should I expect given the above output? Six?
What are the numbers? coverage?
% abricate --db plasmidfinder E.faecium/canu/canu.contigs.fasta | cut -f 2,5,6 | column -t
SEQUENCE GENE COVERAGE
tig00000002 rep2_1_ORF(E.faeciumContig1183)_JDOE 1-1494/1494
tig00000002 rep2_1_ORF(E.faeciumContig1183)_JDOE 1-1494/1494
tig00000010 repUS15__ORF(E.faecium287) 1-1041/1041
tig00000011 rep17_1_CDS29(pRUM) 537-1041/1041
tig00000012 rep6_1_repA(p703/5) 14-663/723
tig00000012 rep17_1_CDS29(pRUM) 1-1041/1041
tig00000012 rep6_1_repA(p703/5) 14-663/723
tig00000013 rep18_1_repA(p200B) 413-931/933
tig00000013 rep18_1_repA(p200B) 413-931/933
tig00000013 rep18_1_repA(p200B) 413-931/933
tig00000016 rep14_3_ORF(pRI1) 1-951/951
tig00000016 rep14_3_ORF(pRI1) 772-951/951
tig00000026 rep2_1_ORF(E.faeciumContig1183)_JDOE 1-1494/1494
tig00000027 rep2_1_ORF(E.faeciumContig1183)_JDOE 1-1494/1494
tig00000028 rep2_1_ORF(E.faeciumContig1183)_JDOE 1-1494/1494
in the README you have,
If you wish to fix a bug or add new features to the software we welcome Pull Requests. Please fork the repo, make the change, then submit a Pull Request with details about what the change is and what it fixes/adds.
I'd suggest adding an explicit CONTRIBUTING.md and maybe a CoC, issue template, and PR template per suggestions here:
https://github.com/andrewjpage/tiptoft/community
See sourmash's CONTRIBUTING.md here: https://github.com/dib-lab/sourmash/blob/master/CONTRIBUTING.md - the content's more or less the same as what you have, it's just in a standardized place :)
plasmidpredictor --verbose subreads.fq.gz
GENE COMPLETENESS %COVERAGE ACCESSION DATABASE PRODUCT
Traceback (most recent call last):
File "/home/linuxbrew/.linuxbrew/bin/plasmidpredictor", line 53, in <module>
plasmidpredictor.run()
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.6/site-packages/plasmidpredictor/PlasmidPredictor.py", line 49, in run
fastq.read_filter_and_map()
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.6/site-packages/plasmidpredictor/Fastq.py", line 52, in read_filter_and_map
if self.map_read(read):
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.6/site-packages/plasmidpredictor/Fastq.py", line 66, in map_read
candidate_gene_names = self.does_read_contain_quick_pass_kmers(read.seq)
File "/home/linuxbrew/.linuxbrew/opt/python/lib/python3.6/site-packages/plasmidpredictor/Fastq.py", line 81, in does_read_contain_quick_pass_kmers
read_onex_kmers = kmers_obj.get_one_x_coverage_of_kmers()
TypeError: get_one_x_coverage_of_kmers() missing 3 required positional arguments: 'sequence', 'k', and 'end'
Heng uses homo-compressed k-mers in minimap2
Might be useful here
Or not
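For readers unfamiliar with the idea, homopolymer compression collapses each run of identical bases to a single base before k-mers are taken. This makes seeds less sensitive to the run-length errors common in long reads. The sketch below is a plain illustration of the technique, not minimap2's or tiptoft's implementation; the function names and the toy read are made up.

```python
from itertools import groupby


def homopolymer_compress(sequence):
    """Collapse every run of identical characters to a single character."""
    return "".join(base for base, _run in groupby(sequence))


def kmers(sequence, k):
    """Return all overlapping k-mers of the sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


read = "AAACCGTTTTA"
compressed = homopolymer_compress(read)
print(compressed)            # ACGTA
print(kmers(compressed, 3))  # ['ACG', 'CGT', 'GTA']
```

Two reads that differ only in homopolymer length yield the same compressed k-mers, at the cost of losing the run-length information itself.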
I realize this tool is designed for detecting plasmids, but I'm wondering if it could be modified for more general purposes such as detecting which samples had particular distinct sequences. There is surprisingly a lack of tools that do this using a kmer approach.
I'm assuming it would not be as simple as providing these distinct sequences to the "--plasmid_data" parameter?
I tried both conda and pip installations and get the same error. Do you know what causes it?
tiptoft_database_downloader plasmidfinder
Downloading data with:
curl -o enterobacteriaceae.fsa https://bitbucket.org/genomicepidemiology/plasmidfinder_db/raw/master/enterobacteriaceae.fsa
Downloading data with:
curl -o gram_positive.fsa https://bitbucket.org/genomicepidemiology/plasmidfinder_db/raw/master/gram_positive.fsa
Combining downloaded fasta files...
gram_positive.fsa
Traceback (most recent call last):
File "/gpfs2/well/bag/users/lipworth/python3venv/bin/tiptoft_database_downloader", line 30, in <module>
tiptoft.run()
File "/gpfs2/well/bag/users/lipworth/python3venv/lib/python3.4/site-packages/tiptoft/TipToftDatabaseDownloader.py", line 23, in run
refgenes.run(self.output_prefix)
File "/gpfs2/well/bag/users/lipworth/python3venv/lib/python3.4/site-packages/tiptoft/RefGenesGetter.py", line 87, in run
exec('self._get_from_' + self.ref_db + '(outprefix)')
File "<string>", line 1, in <module>
File "/gpfs2/well/bag/users/lipworth/python3venv/lib/python3.4/site-packages/tiptoft/RefGenesGetter.py", line 60, in _get_from_plasmidfinder
for seq in file_reader:
File "/gpfs2/well/bag/users/lipworth/python3venv/lib/python3.4/site-packages/pyfastaq/sequences.py", line 141, in file_reader
raise Error('Error determining file type from file "' + fname + '". First line is:\n' + line.rstrip())
pyfastaq.sequences.Error: Error determining file type from file "/gpfs2/well/bag/users/lipworth/gram_neg/gnbc_nanopore/easymag/plasmidfinder.tmp.download/gram_positive.fsa". First line is:
<!DOCTYPE html>