turtletools / caretta
A software-suite to perform multiple protein structure alignment and structure feature extraction.
License: BSD 3-Clause "New" or "Revised" License
Hi.
Interested in using this tool, but I'm receiving the following error:
File "/root/miniconda3/bin/caretta-cli", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/caretta/bin/caretta-cli", line 103, in <module>
app()
File "/root/miniconda3/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/caretta/bin/caretta-cli", line 87, in align
multiple_alignment.StructureMultiple.align_from_pdb_files(
File "/caretta/caretta/multiple_alignment.py", line 221, in align_from_pdb_files
msa_class.write_files(
File "/caretta/caretta/multiple_alignment.py", line 624, in write_files
self.write_superposed_pdbs(pdb_folder)
File "/caretta/caretta/multiple_alignment.py", line 688, in write_superposed_pdbs
reference_pdb[helper.get_alpha_indices(reference_pdb)]
IndexError: arrays used as indices must be of integer (or boolean) type
There's no error if I use the --no-pdb flag, but those alignments are what I'm after.
I'm using the CLI and running it as a Docker image. Not sure if that might have anything to do with it.
Any ideas?
and a line that suggests spamming the Enter key instead of mouse clicks for selection
Hi,
How can I solve the type error below?
Python version 3.8.13
(caretta) [user1@localhost hw]$ caretta-cli ldh/
Traceback (most recent call last):
File "/home/use1r/.conda/envs/caretta/bin/caretta-cli", line 2, in <module>
from caretta import multiple_alignment
File "/home/user1/.conda/envs/caretta/lib/python3.8/site-packages/caretta/multiple_alignment.py", line 177, in <module>
class StructureMultiple:
File "/home/user1/.conda/envs/caretta/lib/python3.8/site-packages/caretta/multiple_alignment.py", line 1028, in StructureMultiple
) -> tuple[list[str], dict[str, ndarray]]:
TypeError: 'type' object is not subscriptable
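For context: Python 3.8 raises this error whenever built-in types such as tuple, list, or dict are subscripted in an annotation, because that syntax only became valid in Python 3.9. The sketch below shows the spelling that works on both versions. The function is a hypothetical illustration and is not part of caretta.

```python
from typing import Dict, List, Tuple


def index_names(names: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Return the sorted names and a lookup from name to position.

    Hypothetical helper for illustration only; it is not caretta code.
    The typing.Tuple/List/Dict spellings work on Python 3.8, whereas
    tuple[list[str], dict[str, int]] raises
    "TypeError: 'type' object is not subscriptable" on that version.
    """
    ordered = sorted(names)
    return ordered, {name: i for i, name in enumerate(ordered)}


if __name__ == "__main__":
    print(index_names(["protein_b", "protein_a"]))
```

For someone who only wants to run the tool, creating the environment with Python 3.9 or newer is likely the simpler route, since the annotation lives in the installed package rather than in user code.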
I am getting the following error while using caretta from the command line (I can provide the list of pdbs to reproduce the error if needed):
caretta-cli list_IPR000072_pdbs.dat
Found 66 structure files
Found 66 protein structures 100%|
Computed invariants in 13.59 seconds
Found 66 structures with valid invariants 100%|
Aligning: 3%|███
Traceback (most recent call last):
File "/home/disat/amuntoni/miniconda3/bin/caretta-cli", line 127, in <module>
app()
File "/home/disat/amuntoni/miniconda3/bin/caretta-cli", line 108, in align
multiple_alignment.align_from_structure_files(
File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 519, in align_from_structure_files
alignment = msa_class.multiple_align(
File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 278, in multiple_align
self.alignment = self.progressive_align(self.tree,
File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 243, in progressive_align
make_intermediate_node(node_1, node_2, node_int)
File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 204, in make_intermediate_node
score_matrix = final_sequences[n1].score_function(
File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 332, in score_function
aln_1, aln_2, score = dtw.smith_waterman(np.arange(score_matrix.shape[0]),
TypeError: expected UniTuple(int64 x 2), got None
Is it related to a "bad" PDB file? How can I spot it?
Thank you for your help!
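The traceback does not say which input triggered the failure, and it is not established that a bad file is the cause. One rough way to look for an unusual structure is to count the C-alpha atoms in each file and flag files with none or very few. The sketch below assumes plain-text PDB files with standard fixed columns; the function names and the cutoff of 10 residues are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List


def count_alpha_carbons(pdb_path: Path) -> int:
    """Count CA atoms in the ATOM records of a plain-text PDB file."""
    count = 0
    with open(pdb_path, "r", errors="replace") as handle:
        for line in handle:
            # PDB fixed columns: record name in 1-6, atom name in 13-16.
            # HETATM records (e.g. calcium ions named CA) are ignored.
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                count += 1
    return count


def flag_suspicious(pdb_paths: List[Path], min_residues: int = 10) -> Dict[str, int]:
    """Return {file name: CA count} for files below min_residues (an assumed cutoff)."""
    counts = {p.name: count_alpha_carbons(p) for p in pdb_paths}
    return {name: n for name, n in counts.items() if n < min_residues}


if __name__ == "__main__":
    # "structures" is a placeholder folder name.
    print(flag_suspicious(sorted(Path("structures").glob("*.pdb"))))
```

Files with several models or alternate locations will be over-counted by this simple scan, so the numbers are only a first filter, not a validation of the structure.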
This would be useful if you are trying to align structures which contain cofactors, such as ligands.
This is a super useful tool that helped me a lot!
However, when I used more than three structures as input, the outputs in superposed_pdbs always missed the third input structure. So I guess it might be a bug.
Hi,
I'm trying caretta for the first time and I ran into the error AssertionError: Could not understand input caretta_results, using PDB files obtained from ColabFold as well as ones downloaded from the PDB. The command was caretta-cli . -t 8
The whole traceback is:
Traceback (most recent call last):
File "/home/gsn/mambaforge/envs/caretta/bin/caretta-cli", line 127, in <module>
app()
File "/home/gsn/mambaforge/envs/caretta/bin/caretta-cli", line 108, in align
multiple_alignment.align_from_structure_files(
File "/home/gsn/mambaforge/envs/caretta/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 476, in align_from_structure_files
pdb_files = helper.parse_protein_files_and_clean(input_files, output_files.cleaned_pdb_folder)
File "/home/gsn/mambaforge/envs/caretta/lib/python3.9/site-packages/caretta/helper.py", line 166, in parse_protein_files_and_clean
protein_files = get_structure_files(input_value)
File "/home/gsn/mambaforge/envs/caretta/lib/python3.9/site-packages/geometricus/protein_utility.py", line 134, in get_structure_files
assert type(protein_file) == str, f"Could not understand input {protein_file}"
AssertionError: Could not understand input caretta_results
Then I realized that caretta needs to be run from outside the input folder; otherwise it takes the folder caretta_results as input. Maybe this should be mentioned in the documentation?
Thanks.
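To check what a folder would contribute before running the tool, a short listing that keeps only regular structure files and skips sub-folders such as caretta_results can help. This is a sketch under assumptions: the extension set and the function name are hypothetical, and it does not call caretta itself.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of structure-file extensions; adjust to match your data.
STRUCTURE_SUFFIXES = (".pdb", ".ent", ".cif")


def list_structure_files(folder: str, suffixes: Iterable[str] = STRUCTURE_SUFFIXES) -> List[Path]:
    """Return regular files in `folder` whose extension looks like a structure file.

    Sub-folders (such as caretta_results) and hidden files are skipped.
    """
    wanted = tuple(s.lower() for s in suffixes)
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".") and p.suffix.lower() in wanted
    )


if __name__ == "__main__":
    for path in list_structure_files("."):
        print(path)
```

Running the command from the parent directory, as described in the report, avoids the problem without any code.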
This is a great software and it is very helpful :)
I'm currently aligning 66 proteins and managed to get the alignment file and the features in a pkl file.
I would like to ask how we can link the alignment and the features together, as they appear to be ordered randomly. Thank you so much!
Dear all,
Thank you for the software. After downloading it, I tested the sample PDB files that come with it. A problem appeared after I ran the command "caretta-cli align". The error was: assert type(protein_file) == str, f"Could not understand input {protein_file}". Do you know what is happening and what I need to do?
Thanks,
Lee
--superpose-parameters
Hi there,
I was wondering if it's possible to output the TM-scores between proteins. Also, what exactly is the distance matrix that is output?
All the best
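For reference, the TM-score of a superposed pair is commonly defined as (1/L) * sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8, where L is the length of the reference protein and d_i is the distance between the i-th pair of aligned C-alpha atoms. The sketch below evaluates that formula from precomputed distances. It does not use caretta's API and does not search for the best superposition; the function name and the 0.5 lower bound on d0 for short chains are assumptions.

```python
from typing import Sequence


def tm_score(distances: Sequence[float], reference_length: int) -> float:
    """Evaluate the TM-score formula for one fixed superposition.

    distances: C-alpha distances (in Angstrom) for each aligned residue pair.
    reference_length: number of residues in the reference protein.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    # Length-dependent scale d0; clamped so short chains do not produce
    # a non-positive value (the 0.5 floor is an assumption of this sketch).
    d0 = 1.24 * max(reference_length - 15, 0) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / reference_length


if __name__ == "__main__":
    # 80 of 100 reference residues aligned, each pair 2 Angstrom apart.
    print(tm_score([2.0] * 80, 100))
```

Because unaligned residues contribute nothing to the sum but still count in L, the score drops with both poor coverage and large distances.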
I just installed caretta manually on my Mac (macOS 11.1). Can you suggest where the problem might be? Is my machine calling python2.7 by default instead of python3?
thank you
60 % caretta-cli test
@> 2465 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2739 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2647 atoms and 1 coordinate set(s) were parsed in 0.02s.
Found 3 PDB files
@> 2465 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2739 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2647 atoms and 1 coordinate set(s) were parsed in 0.02s.
Calculating pairwise distances...
Constructing neighbor joining tree...
Aligning [####################################] 100%
Traceback (most recent call last):
File "/Applications/Darwin/miniconda3/bin/caretta-cli", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/Applications/Darwin/caretta/bin/caretta-cli", line 115, in <module>
app()
File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/Applications/Darwin/caretta/bin/caretta-cli", line 97, in align
multiple_alignment.StructureMultiple.align_from_pdb_files(
File "/Applications/Darwin/caretta/caretta/multiple_alignment.py", line 295, in align_from_pdb_files
msa_class.write_files(
TypeError: write_files() got multiple values for argument 'only_dssp'
Hi there, I'm using caretta-cli to align all .pdb files in a folder, but I get an error message after the files are parsed. I can see that cleaned PDB files are created for all my input PDB files, but the alignment fails to run. Here is the error message I get:
File "/Users/s1427471/anaconda3/envs/snakes/bin/caretta-cli", line 127, in <module>
app()
File "/Users/s1427471/anaconda3/envs/snakes/bin/caretta-cli", line 108, in align
multiple_alignment.align_from_structure_files(
File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 476, in align_from_structure_files
pdb_files = helper.parse_protein_files_and_clean(input_files, output_files.cleaned_pdb_folder)
File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/caretta/helper.py", line 169, in parse_protein_files_and_clean
protein = parse_structure_file(str(protein_file)).select("protein")
File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/geometricus/protein_utility.py", line 82, in parse_structure_file
protein = pd.parsePDBStream(f)
File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/prody/proteins/pdbfile.py", line 313, in parsePDBStream
lines = stream.readlines()
File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
I'm using macOS. Any ideas how to fix this would be much appreciated :)
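A UnicodeDecodeError at this point means that at least one file being read is not valid UTF-8 text, for example a compressed or binary file. Hidden files are one possibility worth ruling out, since Finder on macOS can create .DS_Store files in folders. The sketch below reports which files in a folder fail to decode; the function name is hypothetical and it does not depend on caretta or ProDy.

```python
from pathlib import Path
from typing import List, Tuple


def find_undecodable_files(folder: str) -> List[Tuple[str, int]]:
    """Return (file name, byte offset of first bad byte) for non-UTF-8 files."""
    problems = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte.
            problems.append((path.name, err.start))
    return problems


if __name__ == "__main__":
    for name, offset in find_undecodable_files("."):
        print(f"{name}: invalid UTF-8 at byte {offset}")
```

Moving any reported files out of the input folder, or decompressing them if they are gzipped structures, is a reasonable first thing to try.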
Hi, I am very interested in using this program in my project. I am trying to align 831 predicted structures. They have been well analysed with sequence alignments in previous studies, but I am having problems aligning them with Caretta. The run finishes without an error message, but I can't find any superposed PDB files, only cleaned_pdb and result_pdb, and result_pdb is virtually identical to cleaned_pdb. Also, I see very high RMSD values and very low TM-scores, which is not the case if I align some of them in PyMOL. Could you please help me find what I am missing? Thank you in advance!
Hello, may I have an example of how to use make_rmsd_coverage_tm_matrix? Thank you :)
Originally posted by @lingnus1 in #14 (comment)
Currently, superpose and write_superposed_pdbs have different behaviors - superpose uses the first protein as the reference to rotate others against, and write_superposed_pdbs uses the set of core indices to rotate. Both are problematic - the first protein may be distant from the rest, and the core indices may be empty for divergent datasets.
Solution:
Related to Issue #8