
caretta's People

Contributors

akdel, ninjani

caretta's Issues

Error when writing aligned pdb

Hi.

Interested in using this tool, but I'm receiving the following error:

  File "/root/miniconda3/bin/caretta-cli", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/caretta/bin/caretta-cli", line 103, in <module>
    app()
  File "/root/miniconda3/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/caretta/bin/caretta-cli", line 87, in align
    multiple_alignment.StructureMultiple.align_from_pdb_files(
  File "/caretta/caretta/multiple_alignment.py", line 221, in align_from_pdb_files
    msa_class.write_files(
  File "/caretta/caretta/multiple_alignment.py", line 624, in write_files
    self.write_superposed_pdbs(pdb_folder)
  File "/caretta/caretta/multiple_alignment.py", line 688, in write_superposed_pdbs
    reference_pdb[helper.get_alpha_indices(reference_pdb)]
IndexError: arrays used as indices must be of integer (or boolean) type

There's no error if I use the --no-pdb flag, but those alignments are what I'm after.
I'm using the CLI and running it as a Docker image. Not sure if that might have anything to do with it.

Any ideas?
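
For what it's worth, NumPy raises this exact message whenever an index array has a floating-point dtype, and that is the dtype np.array([]) gets for an empty list. The sketch below reproduces the message outside caretta. Whether helper.get_alpha_indices really returns an empty or float array in this run is an assumption; the traceback does not confirm it.

```python
import numpy as np

coords = np.arange(10)

# An empty list becomes a float64 array, which NumPy refuses to use as an index.
empty_float = np.array([])
try:
    coords[empty_float]
    message = None
except IndexError as err:
    message = str(err)

# Casting the indices to an integer dtype makes the same lookup valid.
as_int = np.asarray(empty_float, dtype=int)
selected = coords[as_int]
```

If that assumption holds, checking whether the reference structure has any alpha carbons selected would be a reasonable first step.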

A type error occurs

Hi,
How can I solve the type error below?

Python version 3.8.13

(caretta) [user1@localhost hw]$ caretta-cli ldh/
Traceback (most recent call last):
  File "/home/use1r/.conda/envs/caretta/bin/caretta-cli", line 2, in <module>
    from caretta import multiple_alignment
  File "/home/user1/.conda/envs/caretta/lib/python3.8/site-packages/caretta/multiple_alignment.py", line 177, in <module>
    class StructureMultiple:
  File "/home/user1/.conda/envs/caretta/lib/python3.8/site-packages/caretta/multiple_alignment.py", line 1028, in StructureMultiple
    ) -> tuple[list[str], dict[str, ndarray]]:
TypeError: 'type' object is not subscriptable
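
Subscripting built-in types such as tuple[...] and list[...] in annotations is only supported from Python 3.9 onwards (PEP 585), so importing the module under Python 3.8.13 fails while the class is being defined. Moving the environment to Python 3.9 or later is probably the simplest fix. As a sketch of the spelling that also works on 3.8, with a made-up function split_names standing in for the real method:

```python
from typing import Dict, List, Tuple


def split_names(records: Dict[str, str]) -> Tuple[List[str], Dict[str, int]]:
    """Return the sorted names and the length of each sequence."""
    names = sorted(records)
    lengths = {name: len(seq) for name, seq in records.items()}
    return names, lengths


names, lengths = split_names({"b": "ACD", "a": "AC"})
```

Adding "from __future__ import annotations" at the top of a module is another option on 3.8, since annotations are then no longer evaluated at definition time.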

TypeError while aligning

I am getting the following error while using caretta from the command line (I can provide the list of pdbs to reproduce the error if needed):

caretta-cli list_IPR000072_pdbs.dat 
Found 66 structure files
Found 66 protein structures 100%|
Computed invariants in 13.59 seconds
Found 66 structures with valid invariants 100%|
Aligning:   3%|███   
Traceback (most recent call last):

  File "/home/disat/amuntoni/miniconda3/bin/caretta-cli", line 127, in <module>
    app()

  File "/home/disat/amuntoni/miniconda3/bin/caretta-cli", line 108, in align
    multiple_alignment.align_from_structure_files(

  File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 519, in align_from_structure_files
    alignment = msa_class.multiple_align(

  File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 278, in multiple_align
    self.alignment = self.progressive_align(self.tree,

  File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 243, in progressive_align
    make_intermediate_node(node_1, node_2, node_int)

  File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 204, in make_intermediate_node
    score_matrix = final_sequences[n1].score_function(

  File "/home/disat/amuntoni/miniconda3/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 332, in score_function
    aln_1, aln_2, score = dtw.smith_waterman(np.arange(score_matrix.shape[0]),

TypeError: expected UniTuple(int64 x 2), got None

Is it related to a "bad" PDB file? How can I spot it?
Thank you for your help!
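
One cheap way to look for an odd input file, assuming a "bad" file here means one with very few or no alpha carbons, is to count CA atoms per file before running the alignment. This is only a guess at the cause. The helper names and the threshold below are hypothetical and not part of caretta.

```python
def count_alpha_carbons(pdb_path):
    """Count ATOM records whose atom name (columns 13-16) is CA."""
    count = 0
    with open(pdb_path, "r", errors="replace") as handle:
        for line in handle:
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                count += 1
    return count


def find_suspicious(pdb_paths, min_residues=10):
    """Return (path, CA count) for every file below the chosen threshold."""
    flagged = []
    for path in pdb_paths:
        n_ca = count_alpha_carbons(path)
        if n_ca < min_residues:
            flagged.append((str(path), n_ca))
    return flagged
```

Calling find_suspicious on the paths listed in the .dat file would print the candidates worth opening by hand.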

caretta needs to be run outside of the input folder

Hi,
I'm trying caretta for the first time and ran into the error "AssertionError: Could not understand input caretta_results", both with PDB files obtained from ColabFold and with ones downloaded from the PDB. The command was caretta-cli . -t 8
The whole traceback is:

Traceback (most recent call last):

  File "/home/gsn/mambaforge/envs/caretta/bin/caretta-cli", line 127, in <module>
    app()

  File "/home/gsn/mambaforge/envs/caretta/bin/caretta-cli", line 108, in align
    multiple_alignment.align_from_structure_files(

  File "/home/gsn/mambaforge/envs/caretta/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 476, in align_from_structure_files
    pdb_files = helper.parse_protein_files_and_clean(input_files, output_files.cleaned_pdb_folder)

  File "/home/gsn/mambaforge/envs/caretta/lib/python3.9/site-packages/caretta/helper.py", line 166, in parse_protein_files_and_clean
    protein_files = get_structure_files(input_value)

  File "/home/gsn/mambaforge/envs/caretta/lib/python3.9/site-packages/geometricus/protein_utility.py", line 134, in get_structure_files
    assert type(protein_file) == str, f"Could not understand input {protein_file}"

AssertionError: Could not understand input caretta_results

Then I realized that caretta needs to be run from outside the input folder; otherwise it picks up the caretta_results output folder as an input. Maybe this should be mentioned in the documentation?
Thanks.
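
A minimal sketch of the kind of filtering that would sidestep this: collect only regular files with a structure-file extension and ignore subfolders such as caretta_results. The function name and the set of extensions are assumptions made for illustration, not caretta's or geometricus's actual behavior.

```python
from pathlib import Path

# Assumed set of structure-file extensions.
STRUCTURE_SUFFIXES = {".pdb", ".ent", ".cif"}


def list_structure_files(folder):
    """Return structure files directly inside folder, ignoring subfolders."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in STRUCTURE_SUFFIXES
    )
```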

Regarding features and alignment

This is great software and it is very helpful :)
I'm currently aligning 66 proteins and managed to get the alignment file and the features in a pkl file.
How can we link the alignment and the features together, given that the proteins appear in a different, seemingly random order in each? Thank you so much!
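
A generic way to do this kind of linking, sketched under two assumptions that are not confirmed here: the alignment is a FASTA file with "-" for gaps, and the features can be looked up per protein name as one value per residue. Keying both by name removes the dependence on ordering, and walking along the aligned sequence places each residue's value in its alignment column.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: aligned_sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def place_in_columns(aligned_sequence, per_residue_values, gap="-"):
    """Spread per-residue values over alignment columns, with None at gaps."""
    values = iter(per_residue_values)
    return [None if char == gap else next(values) for char in aligned_sequence]
```

With alignment = parse_fasta(...), a per-protein call such as place_in_columns(alignment[name], features[name]) would give one value per alignment column, where features is the hypothetical name-keyed lookup.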

Error with input

Dear all,

Thank you for the software. After downloading it, I tested it on the sample PDB files that ship with it. Running the command "caretta-cli align" fails with the error: assert type(protein_file) == str, f"Could not understand input {protein_file}". Do you know what is going wrong and what I need to do?
Thanks,
Lee

Add option to change superposition parameters from CLI/GUI

  • This could be a JSON/TOML/any kind of key value file, with the default parameters written in it that you pass in to the CLI via --superpose-parameters.
  • Each superposition function needs documentation to make clear what parameters it exposes.
  • The GUI needs a dropdown to select the superposition function, which should trigger a set of textareas for the corresponding parameters.
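
The first bullet could be sketched roughly as follows, using JSON because it needs only the standard library. The parameter names and default values are placeholders made up for illustration; the real ones depend on what each superposition function exposes.

```python
import json

# Hypothetical defaults, for illustration only.
DEFAULT_PARAMETERS = {"gap_open_penalty": 0.0, "gap_extend_penalty": 0.0, "gamma": 0.03}


def load_superpose_parameters(path=None):
    """Return the defaults, overridden by the key/value pairs in a JSON file."""
    parameters = dict(DEFAULT_PARAMETERS)
    if path is None:
        return parameters
    with open(path) as handle:
        overrides = json.load(handle)
    unknown = set(overrides) - set(parameters)
    if unknown:
        raise ValueError(f"Unknown superposition parameters: {sorted(unknown)}")
    parameters.update(overrides)
    return parameters
```

Rejecting unknown keys up front keeps a typo in the file from being silently ignored.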

Is it possible to output TM-scores?

Hi there,

I was wondering if it's possible to output the TM-scores between proteins. Also, what exactly does the distance matrix in the output contain?

All the best
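
In the meantime, a TM-score can be computed from any pair of already superposed, aligned alpha-carbon coordinates using the standard definition: the sum of 1 / (1 + (d_i / d0)^2) over aligned pairs, divided by the target length L, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The sketch below is not necessarily how caretta computes its own scores, and the 0.5 Å floor on d0 for short chains is a choice made for this example.

```python
import math


def tm_score(coords_a, coords_b, target_length):
    """TM-score for aligned, already superposed CA coordinate pairs."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must contain the same number of pairs")
    if target_length > 15:
        d0 = max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    else:
        d0 = 0.5  # floor for very short chains (assumption of this sketch)
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        distance = math.dist(a, b)
        total += 1.0 / (1.0 + (distance / d0) ** 2)
    return total / target_length
```

Because the sum is normalized by the target length rather than the number of aligned pairs, the score is asymmetric: swapping which protein counts as the target can change the value.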

Python 2.7 problem? macOS 11.1

I just installed caretta manually on my Mac (macOS 11.1). Can you suggest where the problem might be? Is my machine calling python2.7 by default instead of python3?

Thank you

60 % caretta-cli test
@> 2465 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2739 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2647 atoms and 1 coordinate set(s) were parsed in 0.02s.
Found 3 PDB files
@> 2465 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2739 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 2647 atoms and 1 coordinate set(s) were parsed in 0.02s.
Calculating pairwise distances...
Constructing neighbor joining tree...
Aligning [####################################] 100%
Traceback (most recent call last):
  File "/Applications/Darwin/miniconda3/bin/caretta-cli", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/Applications/Darwin/caretta/bin/caretta-cli", line 115, in <module>
    app()
  File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Applications/Darwin/miniconda3/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Applications/Darwin/caretta/bin/caretta-cli", line 97, in align
    multiple_alignment.StructureMultiple.align_from_pdb_files(
  File "/Applications/Darwin/caretta/caretta/multiple_alignment.py", line 295, in align_from_pdb_files
    msa_class.write_files(
TypeError: write_files() got multiple values for argument 'only_dssp'
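
The paths in the traceback point at python3.8, so Python 2.7 is probably not involved. In general, this TypeError means a call supplies the same parameter twice, once positionally and once by keyword, which can happen when a function signature changes and a call site is not updated. Whether that is what happened in this version is a guess. The sketch below reproduces the message with a hypothetical two-parameter stand-in for write_files:

```python
def write_files(output_folder, only_dssp=True):
    """Hypothetical stand-in: the second positional slot is only_dssp."""
    return output_folder, only_dssp


try:
    # False fills only_dssp positionally, then the keyword supplies it again.
    write_files("results", False, only_dssp=False)
    message = None
except TypeError as err:
    message = str(err)

# Passing the optional argument by keyword only avoids the clash.
result = write_files("results", only_dssp=False)
```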

codec utf-8 error preventing align

Hi there, I'm using caretta-cli to align all .pdb files in a folder, but I get an error message after the files are parsed. I can see that cleaned PDB files are created for all my input files, but the alignment fails to run. Here is the error message I get:

  File "/Users/s1427471/anaconda3/envs/snakes/bin/caretta-cli", line 127, in <module>
    app()

  File "/Users/s1427471/anaconda3/envs/snakes/bin/caretta-cli", line 108, in align
    multiple_alignment.align_from_structure_files(

  File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/caretta/multiple_alignment.py", line 476, in align_from_structure_files
    pdb_files = helper.parse_protein_files_and_clean(input_files, output_files.cleaned_pdb_folder)

  File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/caretta/helper.py", line 169, in parse_protein_files_and_clean
    protein = parse_structure_file(str(protein_file)).select("protein")

  File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/geometricus/protein_utility.py", line 82, in parse_structure_file
    protein = pd.parsePDBStream(f)

  File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/site-packages/prody/proteins/pdbfile.py", line 313, in parsePDBStream
    lines = stream.readlines()

  File "/Users/s1427471/anaconda3/envs/snakes/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte

I'm using macOS. Any ideas how to fix this would be much appreciated :)
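
A byte such as 0x80 usually means that one of the files being read is not plain UTF-8 text, for example a binary or compressed file sitting in the input folder. Which file that is here is unknown. A small sketch for finding the offender, with a hypothetical helper name:

```python
from pathlib import Path


def find_non_utf8_files(folder):
    """Return (file name, byte offset) for files that do not decode as UTF-8."""
    offenders = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            offenders.append((path.name, err.start))
    return offenders
```

Any file this reports can then be moved out of the folder, decompressed, or re-saved as plain text before rerunning the alignment.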

No superimposed PDB

Hi, I am very interested in using this program in my project. I am trying to align 831 predicted structures. They have been well analysed with sequence alignments in previous studies, but I am having problems aligning them with Caretta. The run finishes without an error message, but I can't find any superimposed PDB files, only cleaned_pdb and result_pdb, and the result_pdb files are virtually identical to the cleaned_pdb ones. Also, I see very high RMSD values and very low TM-scores, which is not the case if I align some of the structures in PyMOL. Could you please help me find what I am missing? Thank you in advance!

Change superpose functionality to use central protein as reference

Currently, superpose and write_superposed_pdbs have different behaviors - superpose uses the first protein as the reference to rotate others against, and write_superposed_pdbs uses the set of core indices to rotate. Both are problematic - the first protein may be distant from the rest, and the core indices may be empty for divergent datasets.

Solution:

  • Use the "central" protein (selected using the Geometricus similarity matrix as the protein with a reasonable level of similarity to the rest) as the reference to superpose
  • Print a warning message/error message when no core indices are found as this could represent a dataset with too divergent proteins for meaningful alignment or with multiple groups of proteins which should ideally be aligned separately.

Related to Issue #8
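
A rough sketch of the two points in the proposed solution. It assumes "central" means the protein with the highest mean similarity to all others, which is one possible reading of "a reasonable level of similarity to the rest". The function names are hypothetical, and the similarity matrix is taken as a plain square list of lists rather than whatever Geometricus returns.

```python
import warnings


def pick_central_index(similarity):
    """Return the index of the row with the highest mean off-diagonal similarity."""
    n = len(similarity)
    best_index, best_mean = 0, float("-inf")
    for i in range(n):
        others = [similarity[i][j] for j in range(n) if j != i]
        mean = sum(others) / len(others) if others else 0.0
        if mean > best_mean:
            best_index, best_mean = i, mean
    return best_index


def check_core_indices(core_indices):
    """Warn and return False when no core columns were found."""
    if len(core_indices) == 0:
        warnings.warn(
            "No core indices found: the proteins may be too divergent, "
            "or may form several groups that should be aligned separately."
        )
        return False
    return True
```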
