Hi!
I wanted to try out clinker, which is something I've been longing for. I downloaded some GBFF files from NCBI and even renamed them to .gbk, but I get an error (see the title and the output at the bottom). I haven't modified these files in any way, so why doesn't it work? Is there an upper limit on genome size? These genomes are mostly around 5.5 Mb. Or is clinker not suitable for prokaryotes?
(base) [annll@login ncbi-genomes-2020-11-16]$ ls
GCF_000003955.1_ASM395v1_genomic.gbk GCF_000497525.1_ASM49752v2_genomic.gbk GCF_002000005.1_ASM200000v1_genomic.gbk GCF_004119835.1_ASM411983v1_genomic.gbk
GCF_000008005.1_ASM800v1_genomic.gbk GCF_000832865.1_ASM83286v1_genomic.gbk GCF_002582095.1_ASM258209v1_genomic.gbk GCF_007682195.1_ASM768219v1_genomic.gbk
GCF_000161175.1_ASM16117v1_genomic.gbk GCF_001044825.1_ASM104482v1_genomic.gbk GCF_003426125.1_ASM342612v1_genomic.gbk md5checksums.txt
GCF_000161395.1_ASM16139v1_genomic.gbk GCF_001704095.1_ASM170409v1_genomic.gbk GCF_003612955.1_ASM361295v1_genomic.gbk README.txt
GCF_000186745.1_ASM18674v1_genomic.gbk GCF_001721165.1_ASM172116v1_genomic.gbk GCF_004101345.1_ASM410134v1_genomic.gbk
(base) [annll@login ncbi-genomes-2020-11-16]$ clinker *.gbk
[12:06:03] INFO - Starting clinker
[12:06:03] INFO - Parsing GenBank files: ['GCF_000003955.1_ASM395v1_genomic.gbk', 'GCF_000008005.1_ASM800v1_genomic.gbk', 'GCF_000161175.1_ASM16117v1_genomic.gbk', 'GCF_000161395.1_ASM16139v1_genomic.gbk', 'GCF_000186745.1_ASM18674v1_genomic.gbk', 'GCF_000497525.1_ASM49752v2_genomic.gbk', 'GCF_000832865.1_ASM83286v1_genomic.gbk', 'GCF_001044825.1_ASM104482v1_genomic.gbk', 'GCF_001704095.1_ASM170409v1_genomic.gbk', 'GCF_001721165.1_ASM172116v1_genomic.gbk', 'GCF_002000005.1_ASM200000v1_genomic.gbk', 'GCF_002582095.1_ASM258209v1_genomic.gbk', 'GCF_003426125.1_ASM342612v1_genomic.gbk', 'GCF_003612955.1_ASM361295v1_genomic.gbk', 'GCF_004101345.1_ASM410134v1_genomic.gbk', 'GCF_004119835.1_ASM411983v1_genomic.gbk', 'GCF_007682195.1_ASM768219v1_genomic.gbk']
/mnt/users/annll/.local/lib/python3.7/site-packages/Bio/Seq.py:2338: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning,
[12:06:29] INFO - Starting cluster alignments
[12:06:29] INFO - GCF_000003955.1_ASM395v1_genomic vs GCF_000008005.1_ASM800v1_genomic
Traceback (most recent call last):
  File "/mnt/users/annll/.local/bin/clinker", line 11, in <module>
    sys.exit(main())
  File "/mnt/users/annll/.local/lib/python3.7/site-packages/clinker/main.py", line 153, in main
    hide_alignment_headers=args.hide_aln_headers,
  File "/mnt/users/annll/.local/lib/python3.7/site-packages/clinker/main.py", line 56, in clinker
    globaligner = align.align_clusters(*clusters, cutoff=identity)
  File "/mnt/users/annll/.local/lib/python3.7/site-packages/clinker/align.py", line 51, in align_clusters
    aligner.align_stored_clusters(cutoff)
  File "/mnt/users/annll/.local/lib/python3.7/site-packages/clinker/align.py", line 240, in align_stored_clusters
    alignment = self.align_clusters(one, two, cutoff)
  File "/mnt/users/annll/.local/lib/python3.7/site-packages/clinker/align.py", line 227, in align_clusters
    aln = self.aligner.align(geneA.translation, geneB.translation)
  File "/mnt/users/annll/.local/lib/python3.7/site-packages/Bio/Align/__init__.py", line 1592, in align
    score, paths = _aligners.PairwiseAligner.align(self, seqA, seqB)
ValueError: sequence contains letters not in the alphabet
(base) [annll@login ncbi-genomes-2020-11-16]$
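
Since the traceback ends in `ValueError: sequence contains letters not in the alphabet`, one way to narrow this down might be to look for CDS translations in the input files that contain unusual characters. Below is a minimal sketch, not a fix and not clinker's own code. The helper names `unexpected_letters` and `scan_genbank` are hypothetical, and the default alphabet (the 20 standard amino acids) is an assumption: the alphabet the aligner really accepts depends on how clinker configures it, so letters such as `X` or `*` may be reported here even if they are not what triggers the error. The scan only looks at `/translation` qualifiers already present in the files.

```python
from collections import Counter

# Assumption: treat only the 20 standard amino acids as "expected".
# The aligner's real alphabet may be wider (e.g. include B, Z, X or '*').
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def unexpected_letters(translation, alphabet=STANDARD_AMINO_ACIDS):
    """Count every character of `translation` that is not in `alphabet`."""
    return Counter(ch for ch in translation.upper() if ch not in alphabet)


def scan_genbank(path, alphabet=STANDARD_AMINO_ACIDS):
    """Yield (record id, locus tag, bad-letter counts) for suspicious CDS features.

    Biopython is imported lazily so the helper above works without it.
    """
    from Bio import SeqIO

    for record in SeqIO.parse(path, "genbank"):
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Qualifier values are lists; a CDS without a translation gives "".
            translation = feature.qualifiers.get("translation", [""])[0]
            bad = unexpected_letters(translation, alphabet)
            if bad:
                locus = feature.qualifiers.get("locus_tag", ["?"])[0]
                yield record.id, locus, dict(bad)


# Example use on one of the files listed above:
#   for hit in scan_genbank("GCF_000003955.1_ASM395v1_genomic.gbk"):
#       print(*hit, sep="\t")
```

If this reports nothing for every file, the offending sequence may instead come from a CDS without a `/translation` qualifier (pseudogenes often lack one), which would also fit the `Partial codon` warning shown in the log above.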