kortemme-lab / flex_ddg_tutorial
License: MIT License
When FlexddG is run using the latest 2020 version of Rosetta, the script does not return zero DDG when the native residue is substituted for itself. In contrast, when FlexddG is run using the 2018 version of Rosetta (specifically 2018.48.60516), the script always returns zero DDG for a native-to-native substitution.
Dear FlexddG developers,
I read through the ddG-backrub.xml script and encountered this line, which looks suspicious to me:
<PrimarySequenceNeighborhood name="bubble_adjacent" selector="bubble" lower="1" upper="1"/>
<StoredResidueSubset name="restore_neighbor_shell" subset_name="neighbor_shell"/>
Where is this neighbor_shell subset defined in the script? Shouldn't it be bubble_adjacent?
Thanks
Cong
In analyze_flex_ddG.py, in the function calc_ddg, the line ddg_scores = ddg_scores.append( ... ) uses the append method on a pandas DataFrame object. This method is deprecated (and removed in pandas 2.x), which causes the run to crash on recent versions of pandas.
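As a hedged sketch (the variable names below are illustrative, not taken from the script), the deprecated call can be replaced with pd.concat, which is the upgrade path pandas itself recommends:

```python
import pandas as pd

# Two small frames standing in for the score tables; the real
# column names in analyze_flex_ddG.py may differ.
ddg_scores = pd.DataFrame({'state': ['bound_wt'], 'total_score': [1.0]})
more_scores = pd.DataFrame({'state': ['unbound_mut'], 'total_score': [2.0]})

# Old (deprecated, removed in pandas 2.x):
#   ddg_scores = ddg_scores.append(more_scores)
# Replacement:
ddg_scores = pd.concat([ddg_scores, more_scores], ignore_index=True)
```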
Hi, I successfully ran run_example_2_saturation.py
and an output_saturation
folder has been generated. After that, however, when I run analyze_flex_ddG.py
using the following command, the error "ValueError: No objects to concatenate" appeared, as follows:
python analyze_flex_ddG.py output_saturation
Traceback (most recent call last):
File "analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "analyze_flex_ddG.py", line 190, in analyze_output_folder
scores = pd.concat( inner_scores_list )
File "/home/cltam/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 281, in concat
sort=sort,
File "/home/cltam/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 329, in init
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
May I know how I can make it work?
Thanks a lot!
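This error usually means that no structure directory was detected as finished (for example, rosetta.out is missing the JobDistributor success lines, or the ddG.db3 file is absent), so the list handed to pd.concat is empty. A minimal guard, assuming the variable names used in the tutorial's analysis script, could skip such jobs instead of crashing:

```python
import pandas as pd

# Empty when no struct dir passed rosetta_output_succeeded()
inner_scores_list = []

# Skip the job instead of crashing when nothing finished.
if inner_scores_list:
    scores = pd.concat(inner_scores_list)
else:
    scores = None  # nothing to analyze for this job directory
```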
Hello,
How do I output the modeled mutant structures? Is there any flag like "-ddg::output_silent true" as in ddg_monomer?
Thanks.
Hello,
I ran into a problem when analyzing output_saturation. Please help!
command line : python3 analyze_flex_ddG.py output_saturation
Traceback (most recent call last):
File "/home/newdisk1/flex_ddG_tutorial/analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "/home/newdisk1/flex_ddG_tutorial/analyze_flex_ddG.py", line 190, in analyze_output_folder
scores = pd.concat( inner_scores_list )
File "/home/zhengjishen/miniconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/home/zhengjishen/miniconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 368, in concat
op = _Concatenator(
File "/home/zhengjishen/miniconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 425, in init
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
Hello,
Is this method suitable for estimating ligand-protein ddg?
Thank you!
Hi @kortemme:
I am using flex_ddG to enhance interface binding by mutating an amino acid on a small peptide. Running Rosetta gives me results, but when I run the analysis Python script to analyze them, I get the following error.
Problem Description:
File "home.../flex_ddG_tutor-master/analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "home.../flex_ddG_tutor-master/analyze_flex_ddG.py", line 194, in analyze_output_folder
ddg_scores_dfs.append( apply_zemu_gam(ddg_scores) )
File "home.../flex_ddG_tutor-master/analyze_flex_ddG.py", line 40, in apply_zemu_gam
assert( score_term in scores.columns )
AssertionError
I have tried many times and hope to get help. Thank you very much!
wangyutao
Hello,
When I run the first example script and then run the analyze_flex_ddG.py script, I only seem to get ddG values for a single mutation (a screenshot of the csv file is attached). Is it supposed to look like this?
The second example with saturation mutagenesis seems to work fine and gives expected outputs.
I am running Rosetta version 2020.08 release cb1caba, the most recent release as of June 10th, 2020, with the binaries already compiled. If it is relevant, 'python' is python2.7.18 and 'python3' is python3.8.2. My operating system is the most recent Ubuntu build. If you need any other information, just let me know.
Thanks,
Dru
Hi,
Thanks for this protocol. Is it possible to extract the ddG for the monomers from the results?
Thank you!
Hi @kylebarlow,
Please advise on how to analyze the output of single-site saturation mutagenesis after running analyze_flex_ddG.py. The stdout to the terminal (or redirected to a log) includes information for only 1 out of 20 possible mutations. However, the directory analysis_output contains a csv file (output_saturation-results.csv) with the ddG scores for all 20 possible mutations.
I notice that this file records the ddG score multiple times for the same mutation (e.g. F__Y is repeated 8 times), and similarly for the other 19 amino acids. Should I average the values at the last backrub step across the multiple repeats, or is the average already taken in the last step?
Thanks in advance for your help!
-Daipayan
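If further averaging over repeats is needed, a minimal sketch (toy numbers; column names modeled on the tutorial's results csv, so treat them as assumptions) of averaging the repeated ddG rows at the final backrub checkpoint could look like:

```python
import pandas as pd

# Toy frame mimicking output_saturation-results.csv rows for one
# mutation; the real file has more columns.
df = pd.DataFrame({
    'case_name': ['F_Y'] * 4,
    'backrub_steps': [17500, 35000, 17500, 35000],
    'scored_state': ['ddG'] * 4,
    'total_score': [-1.2, -1.0, -0.8, -1.4],
})

# Keep only the final backrub checkpoint, then average over repeats.
final_step = df['backrub_steps'].max()
avg = (df[df['backrub_steps'] == final_step]
       .groupby('case_name')['total_score'].mean())
```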
Hi,
we encountered a case where some scoring terms are 0 according to the score.sc file. Checking our structure, this seems plausible, as no hydrogen bonds of that particular type are present. It appears that scoring terms with a value of 0 are not written to the db3 file and are thus not extracted by the analyze script, causing an assertion error.
Thanks
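One possible workaround (an assumption on my part, not an official fix) is to re-add any missing score-term column as zeros before the assertion in apply_zemu_gam runs:

```python
import pandas as pd

# Score terms the GAM reweighting expects (from the tutorial script).
expected_terms = ['fa_sol', 'hbond_sc', 'hbond_bb_sc', 'fa_rep',
                  'fa_elec', 'hbond_lr_bb', 'fa_atr']

# Toy pivoted score table missing several zero-valued terms that
# were never written to the db3 file.
scores = pd.DataFrame({'fa_sol': [1.5], 'fa_rep': [0.3]})

# Hypothetical patch: add each missing term as an all-zero column
# so the assertion in apply_zemu_gam passes.
for term in expected_terms:
    if term not in scores.columns:
        scores[term] = 0.0

assert all(t in scores.columns for t in expected_terms)
```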
Can flex ddG be used to measure the ddG of a mutation for a protein on its own (not in complex with another protein)?
Hi,
I have used the protocol to do saturation mutagenesis at each residue position. However, I am interested in performing simultaneous (double or triple) mutations using the protocol and then obtaining the most preferred set of two/three residues.
I feel this is important because the adjacent residues will affect the stability of a sidechain. Currently, I am performing point mutations on a poly-GLY peptide, and am getting Trp or Phe as the best AAs for each position. It will be impractical to have 3 or 4 consecutive Phe or Trp residues.
Can you or anyone among the users help me out with this? Maybe it just needs a simple edit to the script, or a new .resfile?
Thanks,
Yash
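For reference, a resfile requesting simultaneous mutations at several positions might look like the sketch below (positions, chain, and target amino acids are hypothetical; the PIKAA lines follow standard Rosetta resfile syntax):

```
NATAA
start
101 A PIKAA F
102 A PIKAA W
103 A PIKAA Y
```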
Hi,
Can this pipeline be used for a protein monomer? Or does one have to use ddG_monomer for that? Your help is appreciated!
Thanks,
Daipayan
I would like to use flex_ddG to perform mutations at multiple positions together. Can flex_ddG do that, and what would the input resfile format be?
Thanks,
Hi,
I get the following message in ROSETTA_CRASH.log
:
InterfaceDdGMover needs to have chains (numbers or names) or jumps defined to move
Please advise on how to proceed. Many thanks in advance!
Hello!
Thanks a lot for the nice repo!
I wanted to ask whether the .xml file has to be modified if I have a glycan in the vicinity of the mutated residue. This scenario seems to consistently lead to Rosetta crashing. It happens while the backrub mover is running: just before the crash, the fa_rep term starts to blow up and fa_elec is -nan. With the glycan removed from the structure there is no issue. It seems as if clashes begin to appear during the backrub. How should I handle this?
Any advice would be highly appreciated!
Hi - when I try to run this tutorial, I get the following error in rosetta.out:
ERROR: Option matching -restore_talaris_behaviour not found in command line top-level context
I'm using Rosetta Scripts from rosetta 3.8, any help appreciated!
Thanks,
Gail
Hi,
I am encountering a strange issue while trying to mutate one phosphoserine (named SEP in my PDB file as per Rosetta conventions) to alanine. Indeed, judging from Rosetta output it recognises the residue but it does not mutate it, and the mutated PDB structure that I can retrieve after running the protocol actually shows the phosphoserine at the mutation site.
This is the part of the Rosetta output where normally the mutation is logged:
protocols.rosetta_scripts.ParsedProtocol: (0) =======================BEGIN MOVER PackRotamersMover - mutate======================= core.pack.task: (0) Packer task: initialize from command line() core.pack.pack_rotamers: (0) built 0 rotamers at 0 positions.
I am using a nataa_mutations.resfile
written as follows (phosphoserine at pos. 170 of the PDB file, chain B):
NATAA
start
170 B PIKAA A
I am running the release rosetta-2018.33.60351
on Linux, using the MPI version of the rosetta_scripts
binary.
Thank you in advance!
Valentina
Hi,
What are pdb2rosetta.resmap.json, rosetta2pdb.resmap.json, and mutations.mutfile? Are these files essential for running example 1?
Thanks!
Hello all,
I noticed in the python scripts there is an option to use multiple CPU cores. In order to take advantage of this feature does one have to compile the MPI version of rosetta?
The reason for asking is because I was attempting to perform saturated mutagenesis on a residue. I used the recommended 35 structures for sampling, 35,000 backrub trials etc. and left the program running overnight. I was running the job using 20 CPU cores. 12 hours later it had only gone through 3 of the structures. Therefore, sampling all 35 structures would take days. Having to sample 17 residues, this time frame is highly unfeasible.
Is there something I am doing wrong, such as needing to compile the MPI version of Rosetta, or is this the normal time frame? Are there any ways to speed up the calculation while maintaining accuracy?
Because the pandas .append() method is now deprecated, I modified the code as shown below:
```python
#!/usr/bin/python3
import sys
import os
import sqlite3
import shutil
import tempfile
from pprint import pprint
import pandas as pd
import numpy as np
import re
import datetime
import sys
import collections
import threading
rosetta_output_file_name = 'rosetta.out'
output_database_name = 'ddG.db3'
trajectory_stride = 5
script_output_folder = 'analysis_output'
zemu_gam_params = {
'fa_sol' : (6.940, -6.722),
'hbond_sc' : (1.902, -1.999),
'hbond_bb_sc' : (0.063, 0.452),
'fa_rep' : (1.659, -0.836),
'fa_elec' : (0.697, -0.122),
'hbond_lr_bb' : (2.738, -1.179),
'fa_atr' : (2.313, -1.649),
}
def gam_function(x, score_term = None ):
return -1.0 * np.exp( zemu_gam_params[score_term][0] ) + 2.0 * np.exp( zemu_gam_params[score_term][0] ) / ( 1.0 + np.exp( -1.0 * x * np.exp( zemu_gam_params[score_term][1] ) ) )
def apply_zemu_gam(scores):
new_columns = list(scores.columns)
new_columns.remove('total_score')
scores = scores.copy()[ new_columns ]
for score_term in zemu_gam_params:
assert( score_term in scores.columns )
scores[score_term] = scores[score_term].apply( gam_function, score_term = score_term )
scores[ 'total_score' ] = scores[ list(zemu_gam_params.keys()) ].sum( axis = 1 )
scores[ 'score_function_name' ] = scores[ 'score_function_name' ] + '-gam'
return scores
def rosetta_output_succeeded( potential_struct_dir ):
path_to_rosetta_output = os.path.join( potential_struct_dir, rosetta_output_file_name )
if not os.path.isfile(path_to_rosetta_output):
return False
db3_file = os.path.join( potential_struct_dir, output_database_name )
if not os.path.isfile( db3_file ):
return False
success_line_found = False
no_more_batches_line_found = False
with open( path_to_rosetta_output, 'r' ) as f:
for line in f:
if line.startswith( 'protocols.jd2.JobDistributor' ) and 'reported success in' in line:
success_line_found = True
if line.startswith( 'protocols.jd2.JobDistributor' ) and 'no more batches to process' in line:
no_more_batches_line_found = True
return no_more_batches_line_found and success_line_found
def find_finished_jobs( output_folder ):
return_dict = {}
job_dirs = [ os.path.abspath(os.path.join(output_folder, d)) for d in os.listdir(output_folder) if os.path.isdir( os.path.join(output_folder, d) )]
for job_dir in job_dirs:
completed_struct_dirs = []
for potential_struct_dir in sorted([ os.path.abspath(os.path.join(job_dir, d)) for d in os.listdir(job_dir) if os.path.isdir( os.path.join(job_dir, d) )]):
if rosetta_output_succeeded( potential_struct_dir ):
completed_struct_dirs.append( potential_struct_dir )
return_dict[job_dir] = completed_struct_dirs
return return_dict
def get_scores_from_db3_file(db3_file, struct_number, case_name):
conn = sqlite3.connect(db3_file)
conn.row_factory = sqlite3.Row
c = conn.cursor()
num_batches = c.execute('SELECT max(batch_id) from batches').fetchone()[0]
scores = pd.read_sql_query('''
SELECT batches.name, structure_scores.struct_id, score_types.score_type_name, structure_scores.score_value, score_function_method_options.score_function_name from structure_scores
INNER JOIN batches ON batches.batch_id=structure_scores.batch_id
INNER JOIN score_function_method_options ON score_function_method_options.batch_id=batches.batch_id
INNER JOIN score_types ON score_types.batch_id=structure_scores.batch_id AND score_types.score_type_id=structure_scores.score_type_id
''', conn)
def renumber_struct_id( struct_id ):
return trajectory_stride * ( 1 + (int(struct_id-1) // num_batches) )
scores['struct_id'] = scores['struct_id'].apply( renumber_struct_id )
scores['name'] = scores['name'].apply( lambda x: x[:-9] if x.endswith('_dbreport') else x )
scores = scores.pivot_table( index = ['name', 'struct_id', 'score_function_name'], columns = 'score_type_name', values = 'score_value' ).reset_index()
scores.rename( columns = {
'name' : 'state',
'struct_id' : 'backrub_steps',
}, inplace=True)
scores['struct_num'] = struct_number
scores['case_name'] = case_name
conn.close()
return scores
def process_finished_struct( output_path, case_name ):
db3_file = os.path.join( output_path, output_database_name )
assert( os.path.isfile( db3_file ) )
struct_number = int( os.path.basename(output_path) )
scores_df = get_scores_from_db3_file( db3_file, struct_number, case_name )
return scores_df
def calc_ddg( scores ):
total_structs = np.max( scores['struct_num'] )
nstructs_to_analyze = set([total_structs])
for x in range(10, total_structs):
if x % 10 == 0:
nstructs_to_analyze.add(x)
nstructs_to_analyze = sorted(nstructs_to_analyze)
all_ddg_scores = []
for nstructs in nstructs_to_analyze:
ddg_scores = scores.loc[ ((scores['state'] == 'unbound_mut') | (scores['state'] == 'bound_wt')) & (scores['struct_num'] <= nstructs) ].copy()
for column in ddg_scores.columns:
if column not in ['state', 'case_name', 'backrub_steps', 'struct_num', 'score_function_name']:
ddg_scores.loc[:,column] *= -1.0
ddg_scores = ddg_scores._append( scores.loc[ ((scores['state'] == 'unbound_wt') | (scores['state'] == 'bound_mut')) & (scores['struct_num'] <= nstructs) ].copy() )
ddg_scores = ddg_scores.groupby( ['case_name', 'backrub_steps', 'struct_num', 'score_function_name'] ).sum().reset_index()
if nstructs == total_structs:
struct_scores = ddg_scores.copy()
ddg_scores = ddg_scores.groupby( ['case_name', 'backrub_steps', 'score_function_name'] ).mean().round(decimals=5).reset_index()
new_columns = list(ddg_scores.columns.values)
new_columns.remove( 'struct_num' )
ddg_scores = ddg_scores[new_columns]
ddg_scores[ 'scored_state' ] = 'ddG'
ddg_scores[ 'nstruct' ] = nstructs
all_ddg_scores.append( ddg_scores )
return (pd.concat(all_ddg_scores), struct_scores)
def calc_dgs( scores ):
l = []
total_structs = np.max( scores['struct_num'] )
nstructs_to_analyze = set([total_structs])
for x in range(10, total_structs):
if x % 10 == 0:
nstructs_to_analyze.add(x)
nstructs_to_analyze = sorted(nstructs_to_analyze)
for state in ['mut', 'wt']:
for nstructs in nstructs_to_analyze:
dg_scores = scores.loc[ (scores['state'].str.endswith(state)) & (scores['state'].str.startswith('unbound')) & (scores['struct_num'] <= nstructs) ].copy()
for column in dg_scores.columns:
if column not in ['state', 'case_name', 'backrub_steps', 'struct_num', 'score_function_name']:
dg_scores.loc[:,column] *= -1.0
dg_scores = dg_scores._append( scores.loc[ (scores['state'].str.endswith(state)) & (scores['state'].str.startswith('bound')) & (scores['struct_num'] <= nstructs) ].copy() )
dg_scores = dg_scores.groupby( ['case_name', 'backrub_steps', 'struct_num', 'score_function_name'] ).sum().reset_index()
dg_scores = dg_scores.groupby( ['case_name', 'backrub_steps', 'score_function_name'] ).mean().round(decimals=5).reset_index()
new_columns = list(dg_scores.columns.values)
new_columns.remove( 'struct_num' )
dg_scores = dg_scores[new_columns]
dg_scores[ 'scored_state' ] = state + '_dG'
dg_scores[ 'nstruct' ] = nstructs
l.append( dg_scores )
return l
def analyze_output_folder( output_folder ):
# Pass in an outer output folder. Subdirectories are considered different mutation cases, with subdirectories of different structures.
finished_jobs = find_finished_jobs( output_folder )
if len(finished_jobs) == 0:
print( 'No finished jobs found' )
return
ddg_scores_dfs = []
struct_scores_dfs = []
for finished_job, finished_structs in finished_jobs.items():
inner_scores_list = []
for finished_struct in finished_structs:
inner_scores = process_finished_struct( finished_struct, os.path.basename(finished_job) )
inner_scores_list.append( inner_scores )
scores = pd.concat( inner_scores_list )
ddg_scores, struct_scores = calc_ddg( scores )
struct_scores_dfs.append( struct_scores )
ddg_scores_dfs.append( ddg_scores )
ddg_scores_dfs.append( apply_zemu_gam(ddg_scores) )
ddg_scores_dfs.extend( calc_dgs( scores ) )
if not os.path.isdir(script_output_folder):
os.makedirs(script_output_folder)
basename = os.path.basename(output_folder)
pd.concat( struct_scores_dfs ).to_csv( os.path.join(script_output_folder, basename + '-struct_scores_results.csv' ) )
df = pd.concat( ddg_scores_dfs )
df.to_csv( os.path.join(script_output_folder, basename + '-results.csv') )
display_columns = ['backrub_steps', 'case_name', 'nstruct', 'score_function_name', 'scored_state', 'total_score']
for score_type in ['mut_dG', 'wt_dG', 'ddG']:
print( score_type )
print( df.loc[ df['scored_state'] == score_type ][display_columns].head( n = 20 ) )
print( '' )
if __name__ == '__main__':
for folder_to_analyze in sys.argv[1:]:
if os.path.isdir( folder_to_analyze ):
analyze_output_folder( folder_to_analyze )
```
The error happened as follows:
Unique values in 'state' column: ['bound_wt' 'unbound_mut']
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1874, in _agg_py_fallback
res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 849, in agg_series
result = self._aggregate_series_pure_python(obj, func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 877, in _aggregate_series_pure_python
res = func(group)
^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2380, in
alt=lambda x: Series(x).mean(numeric_only=numeric_only),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/series.py", line 6225, in mean
return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 11992, in mean
return self._stat_function(
^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 11949, in _stat_function
return self._reduce(
^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/series.py", line 6133, in _reduce
return op(delegate, skipna=skipna, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 147, in f
result = alt(values, axis=axis, skipna=skipna, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 404, in new_func
result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 720, in nanmean
the_sum = _ensure_numeric(the_sum)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 1693, in _ensure_numeric
raise TypeError(f"Could not convert string '{x}' to numeric")
TypeError: Could not convert string 'bound_wtunbound_mutbound_mutunbound_wtbound_wtunbound_mutbound_mutunbound_wtbound_wtunbound_mutbound_mutunbound_wt' to numeric
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/ai/data/Software/flexddg/flex_ddG_tutorial/analyze_flex_ddG2.py", line 214, in
analyze_output_folder( folder_to_analyze )
File "/ai/data/Software/flexddg/flex_ddG_tutorial/analyze_flex_ddG2.py", line 190, in analyze_output_folder
ddg_scores, struct_scores = calc_ddg( scores )
^^^^^^^^^^^^^^^^^^
File "/ai/data/Software/flexddg/flex_ddG_tutorial/analyze_flex_ddG2.py", line 137, in calc_ddg
ddg_scores = ddg_scores.groupby( ['case_name', 'backrub_steps', 'score_function_name'] ).mean().round(decimals=5).reset_index()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2378, in mean
result = self._cython_agg_general(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1929, in _cython_agg_general
new_mgr = data.grouped_reduce(array_func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 1428, in grouped_reduce
applied = sb.apply(func)
^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 366, in apply
result = func(self.values, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1926, in array_func
result = self._agg_py_fallback(how, values, ndim=data.ndim, alt=alt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1878, in _agg_py_fallback
raise type(err)(msg) from err
TypeError: agg function failed [how->mean,dtype->object]
Then the code below was added to populate the state column with the appropriate values:
```python
state = os.path.basename(os.path.dirname(output_path))
if state.startswith("bound"):
    state = "bound_" + state.split("_")[1]
elif state.startswith("unbound"):
    state = "unbound_" + state.split("_")[1]
```
After that, I got an empty DataFrame in the output file.
Unique values in 'state' column: []
mut_dG
Empty DataFrame
Columns: [backrub_steps, case_name, nstruct, score_function_name, scored_state, total_score]
Index: []
wt_dG
Empty DataFrame
Columns: [backrub_steps, case_name, nstruct, score_function_name, scored_state, total_score]
Index: []
ddG
Empty DataFrame
Columns: [backrub_steps, case_name, nstruct, score_function_name, scored_state, total_score]
Index: []
Hi! @kylebarlow
I ran run_example_1.py and its output files include a pdb file, rosetta.out, and wt_traj; there was no ddG.db3 file.
Then I ran analyze_flex_ddG.py and something went wrong, so I modified the code in analyze_flex_ddG.py by deleting everything related to "ddG.db3". Finally it worked.
Is that okay?
For example, I deleted the following:
def get_scores_from_db3_file(db3_file, struct_number, case_name):
conn = sqlite3.connect(db3_file)
conn.row_factory = sqlite3.Row
c = conn.cursor()
num_batches = c.execute('SELECT max(batch_id) from batches').fetchone()[0]
scores = pd.read_sql_query('''
SELECT batches.name, structure_scores.struct_id, score_types.score_type_name, structure_scores.score_value, score_function_method_options.score_function_name from structure_scores
INNER JOIN batches ON batches.batch_id=structure_scores.batch_id
INNER JOIN score_function_method_options ON score_function_method_options.batch_id=batches.batch_id
INNER JOIN score_types ON score_types.batch_id=structure_scores.batch_id AND score_types.score_type_id=structure_scores.score_type_id
''', conn)
def renumber_struct_id( struct_id ):
return trajectory_stride * ( 1 + (int(struct_id-1) // num_batches) )
scores['struct_id'] = scores['struct_id'].apply( renumber_struct_id )
scores['name'] = scores['name'].apply( lambda x: x[:-9] if x.endswith('_dbreport') else x )
scores = scores.pivot_table( index = ['name', 'struct_id', 'score_function_name'], columns = 'score_type_name', values = 'score_value' ).reset_index()
scores.rename( columns = {
'name' : 'state',
'struct_id' : 'backrub_steps',
}, inplace=True)
scores['struct_num'] = struct_number
scores['case_name'] = case_name
conn.close()
return scores
Hello,
I have used the flex ddG protocol perfectly before. However, this time I am attempting to analyze the results from a saturation mutagenesis run and get the following error while executing:
python3 analyze_flex_ddG.py output_saturation/
Traceback (most recent call last):
File "analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "analyze_flex_ddG.py", line 194, in analyze_output_folder
ddg_scores_dfs.append( apply_zemu_gam(ddg_scores) )
File "analyze_flex_ddG.py", line 40, in apply_zemu_gam
assert( score_term in scores.columns )
AssertionError
I have used the same protocol for other projects with no issue, but this system is bringing up the error above.
Any ideas? @kylebarlow
After running run_example_1.py I tried to run the following:
rosetta_scripts.linuxgccrelease -s 5M2O.pdb -parser:protocol ddG-backrub.xml -parser:script_vars chainstomove=B mutate_resfile_relpath=nataa_mutations.resfile number_backrub_trials=10 max_minimization_iter=5 abs_score_convergence_thresh=200.0 backrub_trajectory_stride=5 -restore_talaris_behavior -in:file:fullatom -ignore_unrecognized_res -ignore_zero_occupancy false -ex1 -ex2
I got the following error:
ERROR: Option matching -restore_talaris_behavior not found in command line top-level context
I do not have any idea what this error means. Could you please help me with this?
FYI: I have these files in my current directory: 5M2O.pdb, nataa_mutations.resfile and ddG-backrub.xml.
Thanks
Hello,
I am wondering how this protocol behaves if I specify ALLAA for a residue position. Do we get ddG values for the 19 other amino acids, or only for the best mutant residue at that position? When I try this I do get results, but I am not sure whether they are for the best mutant residue at that position or something else. Thanks
Just a general question. I'm running single site saturating mutagenesis as per the protocol using run_example_2.py as a template using the suggested parameters:
nstruct = 35 # Normally 35
max_minimization_iter = 5000 # Normally 5000
abs_score_convergence_thresh = 1.0 # Normally 1.0
number_backrub_trials = 35000 # Normally 35000
backrub_trajectory_stride = 35000 # Can be whatever you want, if you would like to see results from earlier time points in the backrub trajectory. 7000 is a reasonable number, to give you three checkpoints for a 35000 step run, but you could also set it to 35000 for quickest run time (as the final minimization and packing steps will only need to be run one time).
The script runs fine, and when I analyze the output I noticed that the wild-type amino acid is included among the sampled mutations. Not surprising given that I didn't alter this line:
for mut_aa in 'ACDEFGHIKLMNPQRSTVWY':
However, occasionally I see that the wild-type amino acid scores fairly well (~ -0.5 to -1.0 kcal/mol) in both the fa_talaris2014 and fa_talaris2014-gam ddG metrics. Obviously, I would expect the ddG value of a wt-to-wt substitution to be 0, so I'm wondering if this large of a discrepancy is indicative of the amount of error inherent in the ddG calculations? I worry if other predicted stabilizing mutations are real given that the wild-type residue scores so well. Has this ever been seen before?
Hello there,
I wanted to use flex ddG to estimate effect of mutations. Given that a single mutation in my system takes ~4 hours with optimal parameters (as estimated from a successful run of flex ddG with rosetta prebuilt binaries on MacOS), I tried scaling up and running it on the cluster with linux distribution of rosetta. When I execute run.py script on linux I get errors when writing database files.
I tried using 1) prebuilt rosetta binaries for linux; 2) then I compiled the source code myself using gcc/6.1.0 and python/2.7.14 on linux to ensure it's compatible with environment. I had exactly the same error (see below) when run.py was using rosetta_scripts executable path from either prebuilt or compiled rosetta distribution. I would appreciate if you could help me figure out what could be a source of this problem.
Thank you,
Alina
Here's the error I get when I tried running tutorial example on linux:
protocols.features.ReportToDB: Reporting features for 427 of the 427 total residues in the pose 1JTG_AB_0001 for batch 'structreport'.
basic.database.schema_generator.Schema: [ ERROR ] ERROR writing schema after retry.
CREATE TABLE IF NOT EXISTS protocols(
protocol_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
specified_options TEXT,
command_line TEXT,
svn_url TEXT,
svn_version TEXT,
script TEXT);
basic.database.schema_generator.Schema: [ ERROR ] database is locked
ERROR:: Exit from: src/basic/database/schema_generator/Schema.cc line: 260
BACKTRACE:
[0x58c6188]
[0x5574ac4]
[0x1e669ee]
[0x1e710eb]
[0x1e73b50]
[0x3544771]
[0x354593b]
[0x35460be]
[0x25d8983]
[0x3544771]
[0x354593b]
[0x35460be]
[0x35aa058]
[0x35abb61]
[0x364e6d8]
[0x410acb]
[0x5e0bee4]
[0x6151fd]
protocols.rosetta_scripts.ParsedProtocol: [ ERROR ] Exception while processing procotol:
File: src/basic/database/schema_generator/Schema.cc:260
[ ERROR ] UtilityExitException
ERROR:
protocols.rosetta_scripts.ParsedProtocol: [ ERROR ] Exception while processing procotol:
File: src/basic/database/schema_generator/Schema.cc:260
[ ERROR ] UtilityExitException
ERROR:
protocols.jd2.JobDistributor: [ ERROR ]
[ERROR] Exception caught by JobDistributor for job 1JTG_AB_0001
File: src/basic/database/schema_generator/Schema.cc:260
[ ERROR ] UtilityExitException
ERROR:
protocols.jd2.JobDistributor: [ ERROR ]
protocols.jd2.JobDistributor: [ WARNING ] 1JTG_AB_0001 reported failure and will NOT retry
protocols.jd2.JobDistributor: no more batches to process...
protocols.jd2.JobDistributor: 1 jobs considered, 1 jobs attempted in 29 seconds
Error: [ ERROR ] Exception caught by rosetta_scripts application:
File: src/protocols/jd2/JobDistributor.cc:329
1 jobs failed; check output for error messages
Error: [ ERROR ]
Dear Kyle,
Does the script of your tutorial allow for simultaneous mutations of residues on both sides of the interface? I think I understand how to modify mutfile and resfile for such input, but it's not clear to me how I should modify chains_to_move.txt...
By the way, for newbies who have never used Rosetta before (like me), it would have helped if you clarified that mutfile contains renumbered protein residues whereas resfile has pdb numbering. It took me some time to figure it out, given that the lines with mutation specifications are not in the same order in resfile and mutfile in your tutorial example.
Thanks,
Alina