Comments (7)
Sure!
I'm attaching both original files here. "SB_biofilm_MAG_1_.txt" is the KEGG annotation file and "SB_biofilm_MAG_1_FASTA.txt" is the .fa file (I changed the name and extension since GitHub does not accept .fa files).
SB_biofilm_MAG_1_.txt
SB_biofilm_MAG_1_FASTA.txt
Best,
David
from kemet.
Hi David,
I think all the errors are connected, and I have a few ideas about what the problem could be.
First of all, you may be using a KoFamKOALA annotation in a format different from the one accepted by KEMET (have a look at the example files for reference), which I believe is the cause of point 1 on your list of errors.
EDIT: the files you mention are named in two different ways, i.e. "MAG_1_" and "MAG_2_". I assume this is a typo; otherwise it could point to another error.
Second, are you using a recent version of KEMET, i.e. did you clone the repository after commit cef9acb, performed on July 8th? KEGG developers changed the URL used to access their API. This could have interfered with downloading the Bacteroidetes gene sequences, resulting in the empty .msa files.
Furthermore, I'm assuming you've correctly added a line to the module_file.instruction specifying which module to check via HMMs; otherwise that could also be a source of error.
Related to the third point: when no sequences are found, the nhmmer command fails with an unspecified error, which is something I could look into and handle better (by adding error flags) for future users.
All that being said, I would suggest cloning the latest version of the repository, if you haven't already, and starting anew with empty report and HMM folders, as well as using a suitable annotation format (have a look at the toy folder).
If the problem persists, feel free to send the files via mail and I'll look into it thoroughly!
Best,
Matteo
from kemet.
Hi Matteo,
Thank you for your response.
I apologize, the KEGG annotation file I used was called "MAG_1_". The "2" was a typo I made while writing the previous comment.
For the following, I cloned the latest repository.
- I changed the KEGG annotation file extension from .tsv to .txt, as suggested. Additionally, I replaced the tab characters with spaces, just in case. Yet, I'm getting the same output:
M00001.kk M00001_Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
% 0.0 0__10 INCOMPLETE
1. K00844, K12407, K00845, K00886, K08074, K00918
2. K01810, K06859, K13810, K15916
3. K00850, K16370, K00918
4. K01623, K01624, K11645, K16305, K16306
5. K01803
6. K00134, K00150, K11389
7. K00927, K11389
8. K01834, K15633, K15634, K15635
9. K01689
10. K00873, K12406
M00002.kk M00002_Glycolysis, core module involving three-carbon compounds
% 0.0 0__6 INCOMPLETE
1. K01803
2. K00134, K00150, K11389
3. K00927, K11389
4. K01834, K15633, K15634, K15635
...
Alternatively, I used the "input to KEGG Mapper" file instead, and the output is the correct one:
M00001.kk M00001_Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
% 100.0 10__10 COMPLETE
M00002.kk M00002_Glycolysis, core module involving three-carbon compounds
% 100.0 6__6 COMPLETE
M00003.kk M00003_Gluconeogenesis, oxaloacetate => fructose-6P
% 100.0 8__8 COMPLETE
M00004.kk M00004_Pentose phosphate pathway (Pentose phosphate cycle)
% 62.5 5__8 INCOMPLETE
1. K13937, K00036, K19243
2. K13937, K01057, K07404
3. K00033
M00005.kk M00005_PRPP biosynthesis, ribose 5P => PRPP
% 100.0 1__1 COMPLETE
...
I believe this has something to do with the original detail file in .tsv format (which I got from command-line KofamKOALA). Because of this, I ended up using the mapper format instead.
- For the HMM search module, I kept having the same traceback error as before. I realized that the traceback error shows " KeyError: '>SB_biofilm_MAG_1' ", but my files were named as "SB_biofilm_MAG_1_.fa" and "SB_biofilm_MAG_1_.txt" respectively (i.e., with an additional underscore). Changing the file names to "SB_biofilm_MAG01" did the trick. Maybe that ending underscore has something to do with it.
Although everything is running smoothly now, several .msa files are still empty, resulting in non-existent .hmm files. I inspected several of the KEGG IDs (e.g., K22896) on the KEGG website and found that no entries were associated with the input phylum (in this case, Bacteroidetes). Hence, it makes sense that the .msa files were empty.
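To see quickly which alignments came out empty, a small helper like the one below could list every zero-byte .msa file in a folder. This is only a sketch and not part of KEMET: the function name `find_empty_msa` is hypothetical, and the folder you pass in is whichever directory holds your .msa files.

```python
from pathlib import Path


def find_empty_msa(folder):
    """Return the sorted names of zero-byte .msa files found in `folder`.

    Hypothetical helper, not part of KEMET: it only inspects file sizes
    and does not check why an alignment is empty.
    """
    return sorted(
        path.name
        for path in Path(folder).glob("*.msa")
        if path.stat().st_size == 0
    )
```

Calling `find_empty_msa(...)` on the folder holding the alignments returns the file names to check against the KEGG website, as done above for K22896.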
Thank you very much once again!
Best,
David
from kemet.
Great! Now I see it.
Since the command you launched did not include the "quiet" argument (-q), the soft errors from MAFFT and HMMER were still printed on screen, as in your point 2! I missed it because I'm used to running with the quiet setting myself.
Genes that do not occur even once among the KEGG genes of the chosen phylum result in a blank .msa. I realize this is somewhat redundant, but without any prior information I chose to cover the full spectrum of genes. The KEGG database grows month by month and gets reorganised, with new orthologs being split from general ones into ones specific to certain phylogenetic groups.
I'm not sure the file name ending with an underscore was the real cause, as the KeyError traceback refers to the names of the contig FASTA headers, but anyhow I'm glad it worked!
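Since the key in the traceback ('>SB_biofilm_MAG_1') looks like a FASTA header, one way to investigate would be to list the sequence IDs actually present in the .fa file and compare them with that key. The snippet below is a minimal sketch of that check, not KEMET's own code; `fasta_ids` is a hypothetical name.

```python
def fasta_ids(path):
    """Return the sequence IDs of a FASTA file, in file order.

    The ID is taken as the first whitespace-delimited word after '>'.
    Hypothetical helper for debugging, not KEMET's own parser.
    """
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # A bare '>' line has no ID; record it as an empty string.
                ids.append(fields[0] if fields else "")
    return ids
```

If none of the returned IDs matches the key once the leading '>' is removed, the mismatch is in the contig headers rather than in the file name.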
For the sake of good practice and to avoid similar issues, could I still ask you for the two files, so I can sort out the initial problem with the annotation file? This will likely help KoFamKOALA users, as that program really speeds up functional annotation.
Best,
Matteo
from kemet.
OK, after looking thoroughly into the KEGG annotation file format, I found this was a two-sided problem.
First, line #2 of the file you shared is inconsistent with how KEMET handles it: the dashes do not reflect how the fields of the following lines are structured, and kemet.py uses that line as information to parse the annotation and recover the KOs.
Second, the code handles the spacing incorrectly, so I'm patching that ASAP. And thanks a lot for this "test": I had only used the two Kofam files obtained from the web server, so I was missing this use case!
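For the unedited tab-separated file, splitting on tabs avoids relying on the dashes line altogether. The sketch below is not the fix applied in kemet.py. It assumes the column layout that KofamScan's tab-separated detail output is understood to use (significance flag, gene name, KO, then scores), and the name `significant_kos` is hypothetical.

```python
def significant_kos(path):
    """Return (gene, KO) pairs for rows flagged with '*'.

    Assumed layout (not confirmed by KEMET): tab-separated columns with
    the flag first, the gene name second and the KO third; header and
    dashes lines start with '#'.
    """
    pairs = []
    with open(path) as handle:
        for line in handle:
            # Strip only the newline so an empty leading flag field survives.
            fields = line.rstrip("\n").split("\t")
            if fields[0].startswith("#"):
                continue  # header line or dashes line
            if len(fields) >= 3 and fields[0].strip() == "*":
                pairs.append((fields[1].strip(), fields[2].strip()))
    return pairs
```

Writing each pair back out as gene, tab, KO would give a two-column table comparable to the mapper-style file that worked earlier in this thread.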
Just one last question before I resolve this: which --format option did you use for kofam_scan? Also, did you edit the file obtained from the command line in any way to produce the one you shared?
Best,
Matteo
from kemet.
Yes! That file was edited: I got the .tsv format from KofamKOALA (command-line), and I tried to adapt it to the .txt format by replacing the tabs with spaces.
I'm sharing with you the original .tsv file. This one was not edited.
from kemet.
I'll close this issue following the fix made in July.
If you need any further help, please reopen it.
Best,
Matteo
from kemet.