kortemme-lab / flex_ddg_tutorial
License: MIT License
When FlexddG is run using the latest 2020 version of Rosetta, the script does not return zero DDG when the native residue is substituted for itself. In contrast, when FlexddG is run using the 2018 version of Rosetta (specifically 2018.48.60516), the script always returns zero DDG for a native-to-native substitution.
Dear FlexddG developers,
I read through the ddG-backrub.xml script and encountered this line, which looks suspicious to me:
<PrimarySequenceNeighborhood name="bubble_adjacent" selector="bubble" lower="1" upper="1"/>
<StoredResidueSubset name="restore_neighbor_shell" subset_name="neighbor_shell"/>
Where is this neighbor_shell subset defined in the script? Shouldn't it be bubble_adjacent?
Thanks
Cong
In analyze_flex_ddG.py, in the function calc_ddg, the line ddg_scores = ddg_scores.append( ... ) uses the append method on a pandas DataFrame object. This method is deprecated (and removed in pandas 2.x), which causes the run to crash on recent versions of pandas.
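As a hedged sketch (the variable names below are illustrative, not taken from the script), the deprecated call can be replaced with pd.concat, which is the upgrade path pandas itself recommends:

```python
import pandas as pd

# Two small frames standing in for the score tables; the real
# column names in analyze_flex_ddG.py may differ.
ddg_scores = pd.DataFrame({'state': ['bound_wt'], 'total_score': [1.0]})
more_scores = pd.DataFrame({'state': ['unbound_mut'], 'total_score': [2.0]})

# Old (deprecated, removed in pandas 2.x):
#   ddg_scores = ddg_scores.append(more_scores)
# Replacement:
ddg_scores = pd.concat([ddg_scores, more_scores], ignore_index=True)
```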
Hi, I successfully ran run_example_2_saturation.py
and an output_saturation
folder has been generated. After that, however, when I run analyze_flex_ddG.py
using the following command, the error "ValueError: No objects to concatenate" appeared, as follows:
python analyze_flex_ddG.py output_saturation
Traceback (most recent call last):
File "analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "analyze_flex_ddG.py", line 190, in analyze_output_folder
scores = pd.concat( inner_scores_list )
File "/home/cltam/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 281, in concat
sort=sort,
File "/home/cltam/.pyenv/versions/anaconda3-5.3.1/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 329, in init
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
May I know how I can make it work?
Thanks a lot!
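This error usually means that no structure directory was detected as finished (for example, rosetta.out is missing the JobDistributor success lines, or the ddG.db3 file is absent), so the list handed to pd.concat is empty. A minimal guard, assuming the variable names used in the tutorial's analysis script, could skip such jobs instead of crashing:

```python
import pandas as pd

# Empty when no struct dir passed rosetta_output_succeeded()
inner_scores_list = []

# Skip the job instead of crashing when nothing finished.
if inner_scores_list:
    scores = pd.concat(inner_scores_list)
else:
    scores = None  # nothing to analyze for this job directory
```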
Hello,
How do I output the modeled mutant structures? Is there any flag like "-ddg::output_silent true" as in ddg_monomer?
Thanks.
Hello,
I ran into a problem when analyzing output_saturation. Please help!
command line : python3 analyze_flex_ddG.py output_saturation
Traceback (most recent call last):
File "/home/newdisk1/flex_ddG_tutorial/analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "/home/newdisk1/flex_ddG_tutorial/analyze_flex_ddG.py", line 190, in analyze_output_folder
scores = pd.concat( inner_scores_list )
File "/home/zhengjishen/miniconda3/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/home/zhengjishen/miniconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 368, in concat
op = _Concatenator(
File "/home/zhengjishen/miniconda3/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 425, in init
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
Hello,
Is this method suitable for estimating ligand-protein ddg?
Thank you!
Hi @kortemme:
I am using flex_ddG to enhance interface binding by mutating an amino acid on a small peptide. Running Rosetta gives me results, but when I run the analysis Python script to analyze them, I get the following error.
Problem Description:
File "home.../flex_ddG_tutor-master/analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "home.../flex_ddG_tutor-master/analyze_flex_ddG.py", line 194, in analyze_output_folder
ddg_scores_dfs.append( apply_zemu_gam(ddg_scores) )
File "home.../flex_ddG_tutor-master/analyze_flex_ddG.py", line 40, in apply_zemu_gam
assert( score_term in scores.columns )
AssertionError
I have tried many times and hope to get help. Thank you very much!
wangyutao
Hello,
When I run the first example script and then run the analyze_flex_ddG.py script, I only seem to get ddG values for a single mutation (a screenshot of the csv file is attached). Is it supposed to look like this?
The second example with saturation mutagenesis seems to work fine and gives expected outputs.
I am running Rosetta version 2020.08 release cb1caba, the most recent release as of June 10th, 2020, with the binaries already compiled. If it is relevant, 'python' is python2.7.18 and 'python3' is python3.8.2. My operating system is the most recent Ubuntu build. If you need any other information, just let me know.
Thanks,
Dru
Hi,
Thanks for this protocol. Is it possible to extract the ddG for the monomers from the results?
Thank you!
Hi @kylebarlow,
Please advise on how to analyze the output of single-site saturation mutagenesis after running analyze_flex_ddG.py. The stdout to the terminal (or redirected to a log) includes information for only 1 out of 20 possible mutations. However, the directory analysis_output contains a csv file (output_saturation-results.csv) with the ddG scores for all 20 possible mutations.
I notice that this file records the ddG score multiple times for the same mutation (e.g. F__Y is repeated 8 times), and similarly for the other 19 amino acids. Should I average the values at the last backrub step across the multiple repeats, or is the average already taken in the last step?
Thanks in advance for your help!
-Daipayan
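If further averaging over repeats is needed, a minimal sketch (toy numbers; column names modeled on the tutorial's results csv, so treat them as assumptions) of averaging the repeated ddG rows at the final backrub checkpoint could look like:

```python
import pandas as pd

# Toy frame mimicking output_saturation-results.csv rows for one
# mutation; the real file has more columns.
df = pd.DataFrame({
    'case_name': ['F_Y'] * 4,
    'backrub_steps': [17500, 35000, 17500, 35000],
    'scored_state': ['ddG'] * 4,
    'total_score': [-1.2, -1.0, -0.8, -1.4],
})

# Keep only the final backrub checkpoint, then average over repeats.
final_step = df['backrub_steps'].max()
avg = (df[df['backrub_steps'] == final_step]
       .groupby('case_name')['total_score'].mean())
```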
Hi,
we encountered a case where some scoring terms are 0 according to the score.sc file. Checking our structure, this seems plausible, as no hydrogen bonds of that particular type are present. It appears that scoring terms with a value of 0 are not written to the db3 file and are thus not extracted by the analyze script, causing an assertion error.
Thanks
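One possible workaround (an assumption on my part, not an official fix) is to re-add any missing score-term column as zeros before the assertion in apply_zemu_gam runs:

```python
import pandas as pd

# Score terms the GAM reweighting expects (from the tutorial script).
expected_terms = ['fa_sol', 'hbond_sc', 'hbond_bb_sc', 'fa_rep',
                  'fa_elec', 'hbond_lr_bb', 'fa_atr']

# Toy pivoted score table missing several zero-valued terms that
# were never written to the db3 file.
scores = pd.DataFrame({'fa_sol': [1.5], 'fa_rep': [0.3]})

# Hypothetical patch: add each missing term as an all-zero column
# so the assertion in apply_zemu_gam passes.
for term in expected_terms:
    if term not in scores.columns:
        scores[term] = 0.0

assert all(t in scores.columns for t in expected_terms)
```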
Can flex ddG be used to measure the ddG of a mutation for a protein on its own (not in complex with another protein)?
Hi,
I have used the protocol to do saturation mutagenesis at each residue position. However, I am interested in performing simultaneous (double or triple) mutations using the protocol and then obtaining the most preferred set of two/three residues.
I feel this is important because the adjacent residues will affect the stability of a sidechain. Currently, I am performing point mutations on a poly-GLY peptide, and am getting Trp or Phe as the best AAs for each position. It will be impractical to have 3 or 4 consecutive Phe or Trp residues.
Can you or anyone among the users help me out with this? Maybe it just needs a simple edit to the script, or a new .resfile?
Thanks,
Yash
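For reference, a resfile requesting simultaneous mutations at several positions might look like the sketch below (positions, chain, and target amino acids are hypothetical; the PIKAA lines follow standard Rosetta resfile syntax):

```
NATAA
start
101 A PIKAA F
102 A PIKAA W
103 A PIKAA Y
```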
Hi,
Can this pipeline be used for a protein monomer? Or does one have to use ddG_monomer for that? Your help is appreciated!
Thanks,
Daipayan
I would like to use flex_ddG to perform mutations at multiple positions together. Can flex_ddG do that, and what would the input resfile format be?
Thanks,
Hi,
I get the following message in ROSETTA_CRASH.log
:
InterfaceDdGMover needs to have chains (numbers or names) or jumps defined to move
Please advise on how to proceed. Many thanks in advance!
Hello!
Thanks a lot for the nice repo!
I wanted to ask whether the .xml file has to be modified if I have a glycan in the vicinity of the mutated residue. This scenario seems to consistently lead to Rosetta crashing. It happens while the backrub mover is running: just before the crash, the fa_rep term starts to blow up and fa_elec is -nan. With the glycan removed from the structure there is no issue. It seems as if clashes begin to appear during the backrub. How should I handle this?
Any advice would be highly appreciated!
Hi - when I try to run this tutorial, I get the following error in rosetta.out:
ERROR: Option matching -restore_talaris_behaviour not found in command line top-level context
I'm using Rosetta Scripts from rosetta 3.8, any help appreciated!
Thanks,
Gail
Hi,
I am encountering a strange issue while trying to mutate one phosphoserine (named SEP in my PDB file as per Rosetta conventions) to alanine. Indeed, judging from Rosetta output it recognises the residue but it does not mutate it, and the mutated PDB structure that I can retrieve after running the protocol actually shows the phosphoserine at the mutation site.
This is the part of the Rosetta output where normally the mutation is logged:
protocols.rosetta_scripts.ParsedProtocol: (0) =======================BEGIN MOVER PackRotamersMover - mutate======================= core.pack.task: (0) Packer task: initialize from command line() core.pack.pack_rotamers: (0) built 0 rotamers at 0 positions.
I am using a nataa_mutations.resfile
written as follows (phosphoserine at pos. 170 of the PDB file, chain B):
NATAA
start
170 B PIKAA A
I am running the release rosetta-2018.33.60351
on Linux, using the MPI version of the rosetta_scripts
binary.
Thank you in advance!
Valentina
Hi,
What are pdb2rosetta.resmap.json, rosetta2pdb.resmap.json, and mutations.mutfile? Are these files essential for running example 1?
Thanks!
Hello all,
I noticed in the python scripts there is an option to use multiple CPU cores. In order to take advantage of this feature does one have to compile the MPI version of rosetta?
The reason for asking is because I was attempting to perform saturated mutagenesis on a residue. I used the recommended 35 structures for sampling, 35,000 backrub trials etc. and left the program running overnight. I was running the job using 20 CPU cores. 12 hours later it had only gone through 3 of the structures. Therefore, sampling all 35 structures would take days. Having to sample 17 residues, this time frame is highly unfeasible.
Is there something I am doing wrong, such as needing to compile the MPI version of Rosetta, or is this the normal time frame? Are there any ways to speed up the calculation while maintaining accuracy?
Because the pandas .append() method is now deprecated, I modified the code as shown below:
```python
#!/usr/bin/python3
import sys
import os
import sqlite3
import shutil
import tempfile
from pprint import pprint
import pandas as pd
import numpy as np
import re
import datetime
import sys
import collections
import threading
rosetta_output_file_name = 'rosetta.out'
output_database_name = 'ddG.db3'
trajectory_stride = 5
script_output_folder = 'analysis_output'
zemu_gam_params = {
'fa_sol' : (6.940, -6.722),
'hbond_sc' : (1.902, -1.999),
'hbond_bb_sc' : (0.063, 0.452),
'fa_rep' : (1.659, -0.836),
'fa_elec' : (0.697, -0.122),
'hbond_lr_bb' : (2.738, -1.179),
'fa_atr' : (2.313, -1.649),
}
def gam_function(x, score_term = None ):
return -1.0 * np.exp( zemu_gam_params[score_term][0] ) + 2.0 * np.exp( zemu_gam_params[score_term][0] ) / ( 1.0 + np.exp( -1.0 * x * np.exp( zemu_gam_params[score_term][1] ) ) )
def apply_zemu_gam(scores):
new_columns = list(scores.columns)
new_columns.remove('total_score')
scores = scores.copy()[ new_columns ]
for score_term in zemu_gam_params:
assert( score_term in scores.columns )
scores[score_term] = scores[score_term].apply( gam_function, score_term = score_term )
scores[ 'total_score' ] = scores[ list(zemu_gam_params.keys()) ].sum( axis = 1 )
scores[ 'score_function_name' ] = scores[ 'score_function_name' ] + '-gam'
return scores
def rosetta_output_succeeded( potential_struct_dir ):
path_to_rosetta_output = os.path.join( potential_struct_dir, rosetta_output_file_name )
if not os.path.isfile(path_to_rosetta_output):
return False
db3_file = os.path.join( potential_struct_dir, output_database_name )
if not os.path.isfile( db3_file ):
return False
success_line_found = False
no_more_batches_line_found = False
with open( path_to_rosetta_output, 'r' ) as f:
for line in f:
if line.startswith( 'protocols.jd2.JobDistributor' ) and 'reported success in' in line:
success_line_found = True
if line.startswith( 'protocols.jd2.JobDistributor' ) and 'no more batches to process' in line:
no_more_batches_line_found = True
return no_more_batches_line_found and success_line_found
def find_finished_jobs( output_folder ):
return_dict = {}
job_dirs = [ os.path.abspath(os.path.join(output_folder, d)) for d in os.listdir(output_folder) if os.path.isdir( os.path.join(output_folder, d) )]
for job_dir in job_dirs:
completed_struct_dirs = []
for potential_struct_dir in sorted([ os.path.abspath(os.path.join(job_dir, d)) for d in os.listdir(job_dir) if os.path.isdir( os.path.join(job_dir, d) )]):
if rosetta_output_succeeded( potential_struct_dir ):
completed_struct_dirs.append( potential_struct_dir )
return_dict[job_dir] = completed_struct_dirs
return return_dict
def get_scores_from_db3_file(db3_file, struct_number, case_name):
conn = sqlite3.connect(db3_file)
conn.row_factory = sqlite3.Row
c = conn.cursor()
num_batches = c.execute('SELECT max(batch_id) from batches').fetchone()[0]
scores = pd.read_sql_query('''
SELECT batches.name, structure_scores.struct_id, score_types.score_type_name, structure_scores.score_value, score_function_method_options.score_function_name from structure_scores
INNER JOIN batches ON batches.batch_id=structure_scores.batch_id
INNER JOIN score_function_method_options ON score_function_method_options.batch_id=batches.batch_id
INNER JOIN score_types ON score_types.batch_id=structure_scores.batch_id AND score_types.score_type_id=structure_scores.score_type_id
''', conn)
def renumber_struct_id( struct_id ):
return trajectory_stride * ( 1 + (int(struct_id-1) // num_batches) )
scores['struct_id'] = scores['struct_id'].apply( renumber_struct_id )
scores['name'] = scores['name'].apply( lambda x: x[:-9] if x.endswith('_dbreport') else x )
scores = scores.pivot_table( index = ['name', 'struct_id', 'score_function_name'], columns = 'score_type_name', values = 'score_value' ).reset_index()
scores.rename( columns = {
'name' : 'state',
'struct_id' : 'backrub_steps',
}, inplace=True)
scores['struct_num'] = struct_number
scores['case_name'] = case_name
conn.close()
return scores
def process_finished_struct( output_path, case_name ):
db3_file = os.path.join( output_path, output_database_name )
assert( os.path.isfile( db3_file ) )
struct_number = int( os.path.basename(output_path) )
scores_df = get_scores_from_db3_file( db3_file, struct_number, case_name )
return scores_df
def calc_ddg( scores ):
total_structs = np.max( scores['struct_num'] )
nstructs_to_analyze = set([total_structs])
for x in range(10, total_structs):
if x % 10 == 0:
nstructs_to_analyze.add(x)
nstructs_to_analyze = sorted(nstructs_to_analyze)
all_ddg_scores = []
for nstructs in nstructs_to_analyze:
ddg_scores = scores.loc[ ((scores['state'] == 'unbound_mut') | (scores['state'] == 'bound_wt')) & (scores['struct_num'] <= nstructs) ].copy()
for column in ddg_scores.columns:
if column not in ['state', 'case_name', 'backrub_steps', 'struct_num', 'score_function_name']:
ddg_scores.loc[:,column] *= -1.0
ddg_scores = ddg_scores._append( scores.loc[ ((scores['state'] == 'unbound_wt') | (scores['state'] == 'bound_mut')) & (scores['struct_num'] <= nstructs) ].copy() )
ddg_scores = ddg_scores.groupby( ['case_name', 'backrub_steps', 'struct_num', 'score_function_name'] ).sum().reset_index()
if nstructs == total_structs:
struct_scores = ddg_scores.copy()
ddg_scores = ddg_scores.groupby( ['case_name', 'backrub_steps', 'score_function_name'] ).mean().round(decimals=5).reset_index()
new_columns = list(ddg_scores.columns.values)
new_columns.remove( 'struct_num' )
ddg_scores = ddg_scores[new_columns]
ddg_scores[ 'scored_state' ] = 'ddG'
ddg_scores[ 'nstruct' ] = nstructs
all_ddg_scores.append( ddg_scores )
return (pd.concat(all_ddg_scores), struct_scores)
def calc_dgs( scores ):
l = []
total_structs = np.max( scores['struct_num'] )
nstructs_to_analyze = set([total_structs])
for x in range(10, total_structs):
if x % 10 == 0:
nstructs_to_analyze.add(x)
nstructs_to_analyze = sorted(nstructs_to_analyze)
for state in ['mut', 'wt']:
for nstructs in nstructs_to_analyze:
dg_scores = scores.loc[ (scores['state'].str.endswith(state)) & (scores['state'].str.startswith('unbound')) & (scores['struct_num'] <= nstructs) ].copy()
for column in dg_scores.columns:
if column not in ['state', 'case_name', 'backrub_steps', 'struct_num', 'score_function_name']:
dg_scores.loc[:,column] *= -1.0
dg_scores = dg_scores._append( scores.loc[ (scores['state'].str.endswith(state)) & (scores['state'].str.startswith('bound')) & (scores['struct_num'] <= nstructs) ].copy() )
dg_scores = dg_scores.groupby( ['case_name', 'backrub_steps', 'struct_num', 'score_function_name'] ).sum().reset_index()
dg_scores = dg_scores.groupby( ['case_name', 'backrub_steps', 'score_function_name'] ).mean().round(decimals=5).reset_index()
new_columns = list(dg_scores.columns.values)
new_columns.remove( 'struct_num' )
dg_scores = dg_scores[new_columns]
dg_scores[ 'scored_state' ] = state + '_dG'
dg_scores[ 'nstruct' ] = nstructs
l.append( dg_scores )
return l
def analyze_output_folder( output_folder ):
# Pass in an outer output folder. Subdirectories are considered different mutation cases, with subdirectories of different structures.
finished_jobs = find_finished_jobs( output_folder )
if len(finished_jobs) == 0:
print( 'No finished jobs found' )
return
ddg_scores_dfs = []
struct_scores_dfs = []
for finished_job, finished_structs in finished_jobs.items():
inner_scores_list = []
for finished_struct in finished_structs:
inner_scores = process_finished_struct( finished_struct, os.path.basename(finished_job) )
inner_scores_list.append( inner_scores )
scores = pd.concat( inner_scores_list )
ddg_scores, struct_scores = calc_ddg( scores )
struct_scores_dfs.append( struct_scores )
ddg_scores_dfs.append( ddg_scores )
ddg_scores_dfs.append( apply_zemu_gam(ddg_scores) )
ddg_scores_dfs.extend( calc_dgs( scores ) )
if not os.path.isdir(script_output_folder):
os.makedirs(script_output_folder)
basename = os.path.basename(output_folder)
pd.concat( struct_scores_dfs ).to_csv( os.path.join(script_output_folder, basename + '-struct_scores_results.csv' ) )
df = pd.concat( ddg_scores_dfs )
df.to_csv( os.path.join(script_output_folder, basename + '-results.csv') )
display_columns = ['backrub_steps', 'case_name', 'nstruct', 'score_function_name', 'scored_state', 'total_score']
for score_type in ['mut_dG', 'wt_dG', 'ddG']:
print( score_type )
print( df.loc[ df['scored_state'] == score_type ][display_columns].head( n = 20 ) )
print( '' )
if __name__ == '__main__':
for folder_to_analyze in sys.argv[1:]:
if os.path.isdir( folder_to_analyze ):
analyze_output_folder( folder_to_analyze )
```
The error happened as follows:
Unique values in 'state' column: ['bound_wt' 'unbound_mut']
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1874, in _agg_py_fallback
res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 849, in agg_series
result = self._aggregate_series_pure_python(obj, func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 877, in _aggregate_series_pure_python
res = func(group)
^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2380, in
alt=lambda x: Series(x).mean(numeric_only=numeric_only),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/series.py", line 6225, in mean
return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 11992, in mean
return self._stat_function(
^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/generic.py", line 11949, in _stat_function
return self._reduce(
^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/series.py", line 6133, in _reduce
return op(delegate, skipna=skipna, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 147, in f
result = alt(values, axis=axis, skipna=skipna, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 404, in new_func
result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 720, in nanmean
the_sum = _ensure_numeric(the_sum)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/nanops.py", line 1693, in _ensure_numeric
raise TypeError(f"Could not convert string '{x}' to numeric")
TypeError: Could not convert string 'bound_wtunbound_mutbound_mutunbound_wtbound_wtunbound_mutbound_mutunbound_wtbound_wtunbound_mutbound_mutunbound_wt' to numeric
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/ai/data/Software/flexddg/flex_ddG_tutorial/analyze_flex_ddG2.py", line 214, in
analyze_output_folder( folder_to_analyze )
File "/ai/data/Software/flexddg/flex_ddG_tutorial/analyze_flex_ddG2.py", line 190, in analyze_output_folder
ddg_scores, struct_scores = calc_ddg( scores )
^^^^^^^^^^^^^^^^^^
File "/ai/data/Software/flexddg/flex_ddG_tutorial/analyze_flex_ddG2.py", line 137, in calc_ddg
ddg_scores = ddg_scores.groupby( ['case_name', 'backrub_steps', 'score_function_name'] ).mean().round(decimals=5).reset_index()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2378, in mean
result = self._cython_agg_general(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1929, in _cython_agg_general
new_mgr = data.grouped_reduce(array_func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 1428, in grouped_reduce
applied = sb.apply(func)
^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 366, in apply
result = func(self.values, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1926, in array_func
result = self._agg_py_fallback(how, values, ndim=data.ndim, alt=alt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1878, in _agg_py_fallback
raise type(err)(msg) from err
TypeError: agg function failed [how->mean,dtype->object]
Then the code below was added to populate the state column with the appropriate values:
```python
state = os.path.basename(os.path.dirname(output_path))
if state.startswith("bound"):
    state = "bound_" + state.split("_")[1]
elif state.startswith("unbound"):
    state = "unbound_" + state.split("_")[1]
```
After that, I got an empty DataFrame in the output file.
Unique values in 'state' column: []
mut_dG
Empty DataFrame
Columns: [backrub_steps, case_name, nstruct, score_function_name, scored_state, total_score]
Index: []
wt_dG
Empty DataFrame
Columns: [backrub_steps, case_name, nstruct, score_function_name, scored_state, total_score]
Index: []
ddG
Empty DataFrame
Columns: [backrub_steps, case_name, nstruct, score_function_name, scored_state, total_score]
Index: []
Hi! @kylebarlow
I ran run_example_1.py and its output files include a pdb file, rosetta.out, and wt_traj; there was no ddG.db3 file.
Then I ran analyze_flex_ddG.py and something went wrong, so I modified the code in analyze_flex_ddG.py by deleting everything related to "ddG.db3". Finally it worked.
Is that okay?
For example, I deleted the following:
def get_scores_from_db3_file(db3_file, struct_number, case_name):
conn = sqlite3.connect(db3_file)
conn.row_factory = sqlite3.Row
c = conn.cursor()
num_batches = c.execute('SELECT max(batch_id) from batches').fetchone()[0]
scores = pd.read_sql_query('''
SELECT batches.name, structure_scores.struct_id, score_types.score_type_name, structure_scores.score_value, score_function_method_options.score_function_name from structure_scores
INNER JOIN batches ON batches.batch_id=structure_scores.batch_id
INNER JOIN score_function_method_options ON score_function_method_options.batch_id=batches.batch_id
INNER JOIN score_types ON score_types.batch_id=structure_scores.batch_id AND score_types.score_type_id=structure_scores.score_type_id
''', conn)
def renumber_struct_id( struct_id ):
return trajectory_stride * ( 1 + (int(struct_id-1) // num_batches) )
scores['struct_id'] = scores['struct_id'].apply( renumber_struct_id )
scores['name'] = scores['name'].apply( lambda x: x[:-9] if x.endswith('_dbreport') else x )
scores = scores.pivot_table( index = ['name', 'struct_id', 'score_function_name'], columns = 'score_type_name', values = 'score_value' ).reset_index()
scores.rename( columns = {
'name' : 'state',
'struct_id' : 'backrub_steps',
}, inplace=True)
scores['struct_num'] = struct_number
scores['case_name'] = case_name
conn.close()
return scores
Hello,
I have used the flex ddG protocol perfectly before. However, this time I am attempting to analyze the results from a saturation mutagenesis run and get the following error while executing:
python3 analyze_flex_ddG.py output_saturation/
Traceback (most recent call last):
File "analyze_flex_ddG.py", line 215, in
analyze_output_folder( folder_to_analyze )
File "analyze_flex_ddG.py", line 194, in analyze_output_folder
ddg_scores_dfs.append( apply_zemu_gam(ddg_scores) )
File "analyze_flex_ddG.py", line 40, in apply_zemu_gam
assert( score_term in scores.columns )
AssertionError
I have used the same protocol for other projects with no issue, but this system is bringing up the error above.
Any ideas? @kylebarlow
After running run_example_1.py I tried to run the following:
rosetta_scripts.linuxgccrelease -s 5M2O.pdb -parser:protocol ddG-backrub.xml -parser:script_vars chainstomove=B mutate_resfile_relpath=nataa_mutations.resfile number_backrub_trials=10 max_minimization_iter=5 abs_score_convergence_thresh=200.0 backrub_trajectory_stride=5 -restore_talaris_behavior -in:file:fullatom -ignore_unrecognized_res -ignore_zero_occupancy false -ex1 -ex2
I got the following error:
ERROR: Option matching -restore_talaris_behavior not found in command line top-level context
I do not have any idea what this error means. Could you please help me with this?
FYI: I have these files in my current directory: 5M2O.pdb, nataa_mutations.resfile and ddG-backrub.xml.
Thanks
Hello,
I am wondering how this protocol behaves if I specify ALLAA for a residue position. Do we get ddG values for the 19 other amino acids, or only for the best mutant residue at that position? When I try this I do get results, but I am not sure whether they are for the best mutant residue at that position or something else. Thanks
Just a general question. I'm running single site saturating mutagenesis as per the protocol using run_example_2.py as a template using the suggested parameters:
nstruct = 35 # Normally 35
max_minimization_iter = 5000 # Normally 5000
abs_score_convergence_thresh = 1.0 # Normally 1.0
number_backrub_trials = 35000 # Normally 35000
backrub_trajectory_stride = 35000 # Can be whatever you want, if you would like to see results from earlier time points in the backrub trajectory. 7000 is a reasonable number, to give you three checkpoints for a 35000 step run, but you could also set it to 35000 for quickest run time (as the final minimization and packing steps will only need to be run one time).
The script runs fine, and when I analyze the output I noticed that the wild-type amino acid is included among the sampled mutations. Not surprising given that I didn't alter this line:
for mut_aa in 'ACDEFGHIKLMNPQRSTVWY':
However, occasionally I see that the wild-type amino acid scores fairly well (~ -0.5 to -1.0 kcal/mol) in both the fa_talaris2014 and fa_talaris2014-gam ddG metrics. Obviously, I would expect the ddG value of a wt-to-wt substitution to be 0, so I'm wondering if this large of a discrepancy is indicative of the amount of error inherent in the ddG calculations? I worry if other predicted stabilizing mutations are real given that the wild-type residue scores so well. Has this ever been seen before?
Hello there,
I wanted to use flex ddG to estimate effect of mutations. Given that a single mutation in my system takes ~4 hours with optimal parameters (as estimated from a successful run of flex ddG with rosetta prebuilt binaries on MacOS), I tried scaling up and running it on the cluster with linux distribution of rosetta. When I execute run.py script on linux I get errors when writing database files.
I tried using 1) prebuilt rosetta binaries for linux; 2) then I compiled the source code myself using gcc/6.1.0 and python/2.7.14 on linux to ensure it's compatible with environment. I had exactly the same error (see below) when run.py was using rosetta_scripts executable path from either prebuilt or compiled rosetta distribution. I would appreciate if you could help me figure out what could be a source of this problem.
Thank you,
Alina
Here's the error I get when I tried running tutorial example on linux:
protocols.features.ReportToDB: Reporting features for 427 of the 427 total residues in the pose 1JTG_AB_0001 for batch 'structreport'.
basic.database.schema_generator.Schema: [ ERROR ] ERROR writing schema after retry.
CREATE TABLE IF NOT EXISTS protocols(
protocol_id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
specified_options TEXT,
command_line TEXT,
svn_url TEXT,
svn_version TEXT,
script TEXT);
basic.database.schema_generator.Schema: [ ERROR ] database is locked
ERROR:: Exit from: src/basic/database/schema_generator/Schema.cc line: 260
BACKTRACE:
[0x58c6188]
[0x5574ac4]
[0x1e669ee]
[0x1e710eb]
[0x1e73b50]
[0x3544771]
[0x354593b]
[0x35460be]
[0x25d8983]
[0x3544771]
[0x354593b]
[0x35460be]
[0x35aa058]
[0x35abb61]
[0x364e6d8]
[0x410acb]
[0x5e0bee4]
[0x6151fd]
protocols.rosetta_scripts.ParsedProtocol: [ ERROR ] Exception while processing procotol:
File: src/basic/database/schema_generator/Schema.cc:260
[ ERROR ] UtilityExitException
ERROR:
protocols.rosetta_scripts.ParsedProtocol: [ ERROR ] Exception while processing procotol:
File: src/basic/database/schema_generator/Schema.cc:260
[ ERROR ] UtilityExitException
ERROR:
protocols.jd2.JobDistributor: [ ERROR ]
[ERROR] Exception caught by JobDistributor for job 1JTG_AB_0001
File: src/basic/database/schema_generator/Schema.cc:260
[ ERROR ] UtilityExitException
ERROR:
protocols.jd2.JobDistributor: [ ERROR ]
protocols.jd2.JobDistributor: [ WARNING ] 1JTG_AB_0001 reported failure and will NOT retry
protocols.jd2.JobDistributor: no more batches to process...
protocols.jd2.JobDistributor: 1 jobs considered, 1 jobs attempted in 29 seconds
Error: [ ERROR ] Exception caught by rosetta_scripts application:
File: src/protocols/jd2/JobDistributor.cc:329
1 jobs failed; check output for error messages
Error: [ ERROR ]
Dear Kyle,
Does the script of your tutorial allow for simultaneous mutations of residues on both sides of the interface? I think I understand how to modify mutfile and resfile for such input, but it's not clear to me how I should modify chains_to_move.txt...
By the way, for newbies who have never used Rosetta before (like me), it would have helped if you clarified that mutfile contains renumbered protein residues whereas resfile has pdb numbering. It took me some time to figure it out, given that the lines with mutation specifications are not in the same order in resfile and mutfile in your tutorial example.
Thanks,
Alina