bigginlab / abfe_workflow

A Snakemake-based workflow for absolute binding free energy (ABFE) calculations that scales to high-throughput campaigns, for example via Slurm.

License: GNU General Public License v3.0

Python 97.51% Shell 2.49%

abfe_workflow's Introduction

ABFE_workflow

A Snakemake-based workflow for ABFE calculations using GROMACS (GMX). The workflow can be scaled on Slurm queuing systems. Check out our publication: Ries, B.; Alibay, I.; Anand, N. M.; Magarkar, A.; Biggin, P. C. Automated Absolute Binding Free Energy Calculation Workflow for Drug Discovery. J. Chem. Inf. Model. 2024.

The Cyclophilin D test systems and experimental values provided here originate from:

Here is a visualization of the triggered process:

New Features:

We are currently improving the user experience. The input has been simplified: you only need to provide a .pdb file for the receptor and .sdf files for the ligands.

>cli-abfe -h

usage: cli-abfe [-h] -p PROTEIN_PDB_PATH -l LIGAND_SDF_DIR -o OUTPUT_DIR_PATH [-c COFACTOR_SDF_PATH] [-nc NUMBER_OF_CPUS_PER_JOB] [-nj NUMBER_OF_PARALLEL_JOBS] [-nr NUMBER_OF_REPLICATES] [-nosubmit]
                       [-nogpu] [-nohybrid]

optional arguments:
  -h, --help            show this help message and exit
  -p PROTEIN_PDB_PATH, --protein_pdb_path PROTEIN_PDB_PATH
                        Input protein pdb file path
  -l LIGAND_SDF_DIR, --ligand_sdf_dir LIGAND_SDF_DIR
                        Input ligand(s) sdf file path
  -o OUTPUT_DIR_PATH, --output_dir_path OUTPUT_DIR_PATH
                        Output approach folder
  -c COFACTOR_SDF_PATH, --cofactor_sdf_path COFACTOR_SDF_PATH
                        Input cofactor(s) sdf file path
  -nc NUMBER_OF_CPUS_PER_JOB, --number_of_cpus_per_job NUMBER_OF_CPUS_PER_JOB
                        Number of cpus per job
  -nj NUMBER_OF_PARALLEL_JOBS, --number_of_parallel_jobs NUMBER_OF_PARALLEL_JOBS
                        Number of jobs in parallel
  -nr NUMBER_OF_REPLICATES, --number_of_replicates NUMBER_OF_REPLICATES
                        Number of replicates
  -nosubmit             Will not automatically submit the ABFE calculations
  -nogpu                don't use gpus for the submissions?
  -nohybrid             don't do hybrid execution (complex jobs on gpu and ligand jobs on cpu (requires gpu flag))

Usage:

An example is provided in examples/example_execution.sh, which uses the ABFE_Calculator.py script. If you remove the submit flag, the run only sets up the folder structure (check out our examples folder).

Additional script information is provided via:

  conda activate abfe

  cli-abfe -h
  # or
  cli-abfe-gmx -h

Running an ABFE Campaign from Bash:

  conda activate abfe
  cli-abfe -p <path>/receptor.pdb \
           -l <path>/myligands \
           -o <path>/Out  \
           -nogpu -nohybrid -nc 8

Input

For the command-line option, we suggest structuring the input as follows:

<ligands>
- ligand1.sdf
- ligand2.sdf
- ligand3.sdf
- ...
receptor.pdb

For the python call:

ligand_sdfs:List[str] - paths to sdf files
protein_pdb_path: str - path to pdb file
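
A minimal sketch of how these two inputs could be assembled from the suggested folder layout. The helper name `collect_inputs` is hypothetical and not part of the package; it only gathers and checks the paths before they are handed to the workflow's Python entry point.

```python
from pathlib import Path
from typing import List, Tuple


def collect_inputs(ligand_dir: str, protein_pdb_path: str) -> Tuple[List[str], str]:
    """Return (ligand_sdfs, protein_pdb_path) after basic sanity checks.

    Hypothetical helper for illustration; not part of ABFE_workflow.
    """
    pdb = Path(protein_pdb_path)
    if pdb.suffix.lower() != ".pdb":
        raise ValueError(f"expected a .pdb receptor file, got: {pdb}")
    if not pdb.is_file():
        raise FileNotFoundError(f"receptor file not found: {pdb}")

    # One entry per ligand, sorted so that repeated runs see the same order.
    ligand_sdfs = sorted(str(p) for p in Path(ligand_dir).glob("*.sdf"))
    if not ligand_sdfs:
        raise FileNotFoundError(f"no .sdf files found in: {ligand_dir}")
    return ligand_sdfs, str(pdb)
```

Checking the inputs up front tends to give a clearer error than a failure deep inside the setup step.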

Alternatively, you can provide GROMACS input files and use the command line tool ABFE_GMX_CLI. Please make sure your ligand is called LIG in the gmx files. The input structure should look like this:

<root_dir>
- <ligand-1>
  - solvent
    - solvent.gro
    - solvent.top
  - complex
    - complex.gro
    - complex.top
- <ligand-2>
  - solvent
    - solvent.gro
    - solvent.top
  - complex
    - complex.gro
    - complex.top
- ...

Check out the examples folder for more clarity on the structures.
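
As a rough illustration, the layout above can be checked before starting a run. The function `find_missing_gmx_files` is a hypothetical helper, not part of the package, and it assumes exactly the four file names shown in the tree.

```python
from pathlib import Path
from typing import Dict, List

# Files expected inside every ligand directory, taken from the layout above.
EXPECTED_FILES = (
    Path("solvent") / "solvent.gro",
    Path("solvent") / "solvent.top",
    Path("complex") / "complex.gro",
    Path("complex") / "complex.top",
)


def find_missing_gmx_files(root_dir: str) -> Dict[str, List[str]]:
    """Map each ligand directory name to the expected files it lacks.

    Hypothetical helper for illustration; an empty dict means the layout is complete.
    """
    missing: Dict[str, List[str]] = {}
    ligand_dirs = sorted(p for p in Path(root_dir).iterdir() if p.is_dir())
    for ligand_dir in ligand_dirs:
        absent = [str(rel) for rel in EXPECTED_FILES if not (ligand_dir / rel).is_file()]
        if absent:
            missing[ligand_dir.name] = absent
    return missing
```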

Install:

The package can be installed with the following commands:

  cd ABFE_workflow
  conda env create --file ./environment.yml
  conda activate abfe
  pip install .

HPC Configs:

For the HPC configuration, consult the following file: https://github.com/bigginlab/ABFE_workflow/blob/5619e0c2d735569725e9e39e7c565eacbd681cfd/src/abfe/template/cluster_configs/default_slurm_template.json


abfe_workflow's Issues

Documentation on expected Slurm configuration

Dear developers,

Thanks for all the hard work of putting this together and releasing it as open source ;) I was wondering if you have any documentation on the expected Slurm configuration and/or how to adjust the program for different settings? E.g. I have no partition named gpu. Could you also comment on which parts of the work need a GPU and which don't? If I have a simple workstation with 1 GPU and 32 cores, could I still run the program? As you can imagine, I don't have a complex Slurm set-up.
Any advice is very welcome!
Regards
Daniel

Add Unit - tests

Snakemake will auto generate some pytest tests, we should look to add those.

Specific GPU - Kernel for simulations

flags one wants to set:

nb, pmefft, bonded
auto, cpu, gpu
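
A small sketch of how such settings could be turned into mdrun arguments. `gmx mdrun` accepts `-nb`, `-pmefft` and `-bonded`, each with the values `auto`, `cpu` or `gpu`; the helper `mdrun_offload_args` and the run name `sim` are hypothetical and not part of the workflow.

```python
from typing import List

# Values accepted by the gmx mdrun offload options used below.
ALLOWED_DEVICES = ("auto", "cpu", "gpu")


def mdrun_offload_args(nb: str = "auto", pmefft: str = "auto", bonded: str = "auto") -> List[str]:
    """Build the offload part of a gmx mdrun command line (hypothetical helper)."""
    args: List[str] = []
    for flag, device in (("-nb", nb), ("-pmefft", pmefft), ("-bonded", bonded)):
        if device not in ALLOWED_DEVICES:
            raise ValueError(f"{flag} must be one of {ALLOWED_DEVICES}, got {device!r}")
        args += [flag, device]
    return args


# Example: non-bonded interactions on the GPU, bonded interactions on the CPU.
# "sim" is a placeholder run name.
command = ["gmx", "mdrun", "-deffnm", "sim"] + mdrun_offload_args(nb="gpu", bonded="cpu")
```

Validating the values before the job is written out avoids discovering a typo only after the job has been queued.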

-submit argument unrecognised

When I try to execute example_execution.sh with bash,

I get the following error:

usage: cli-abfe [-h] -p PROTEIN_PDB_PATH -l LIGAND_SDF_DIR -o OUTPUT_DIR_PATH [-c COFACTOR_SDF_PATH]
[-pn PROJECT_NAME] [-nr NUMBER_OF_REPLICATES] [-njr NUMBER_OF_PARALLEL_RECEPTOR_JOBS]
[-njl NUMBER_OF_PARALLEL_LIGAND_JOBS] [-ncl NUMBER_OF_CPUS_PER_LIGAND_JOB] [-sff SMALL_MOL_FF]
[-nosubmit] [-nogpu] [-nohybrid]
cli-abfe: error: unrecognized arguments: -submit

This is clearly because cli-abfe has -nosubmit as an argument; changing the bash script to use -nosubmit solved the issue.

abfe_cli module inaccessible to the cli-abfe

After initializing the conda environment using the yml script, trying to access cli-abfe's help results in :

Traceback (most recent call last):
  File "/biggin/b211/reub0138/mambaforge/envs/abfe/bin/cli-abfe", line 17, in <module>
    from abfe_cli.ABFECalculator import main
ModuleNotFoundError: No module named 'abfe_cli'

Clearly, the cli-abfe script located at ../../mambaforge/envs/abfe/bin/ did not find the file 'ABFECalculator' under the 'abfe_cli' folder. This is because the abfe_cli folder is in the repository directory 'ABFE_workflow'. Therefore I decided to add ABFE_workflow to the system path in the cli-abfe script with:

# Add the ABFE github repo to the path
sys.path.append('/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main')

This workaround resolved the issue and the cli worked as expected. The alpha version might need a more refined solution.
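
Before editing `sys.path`, it can help to check whether the active environment can see the package at all. The sketch below uses only the standard library; the helper name `is_importable` is hypothetical. If the module is not found, running `pip install .` from the repository root, as in the Install section, may be a cleaner fix than a hard-coded path, although that is an assumption about how the package is laid out.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the top-level module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_importable("abfe_cli"):
    print("abfe_cli not found; try running 'pip install .' in the repository root")
```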

Using the ABFE_workflow without a scheduler

Hi @RiesBen and team - I am currently trying to get a minimal example of the process with one ligand to run on a local GPU workstation. Is there a way to run the full workflow without slurm as a scheduler?

For example:

cli-abfe -p data/CyclophilinD_min/receptor.pdb -l data/CyclophilinD_min/one_ligand/ -o cyclo_out_small -njr 6 -njl 1 -nr 1

Thank you!

Using the ABFE_workflow with PBS scheduler

I am currently working on a high-performance cluster at my university where we utilize the PBS scheduler. I've encountered an issue with my workflow as it was originally designed for use with the Slurm scheduler. Is there a method to adapt or extend the workflow to be compatible with a PBS script?

Updating CyclophilinD TestSystem

For the current workflow, we need SDF files of the ligands with the correct bond-order information.

Trying to access 'submit' variable in ABFECalculator.py after passing 'nosubmit' to it

When I tried using the example submission script for Cyclophilin from the ABFE_workflow-main/examples directory with:

bash example_execution.sh

I got an error stating that:

This is because the cli-abfe script takes in nosubmit as arg. Therefore I changed line 37 in ABFE_workflow-main/abfe_cli/ABFECalculator.py from

if(args.submit):

to:

if(args.nosubmit):
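
One way to keep such a condition readable, whichever way the flag is spelled, is to store the negative command-line flag under a positive attribute. The parser below is a stand-alone sketch, not the project's code; the attribute names `submit` and `use_gpu` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser where -no... flags switch positive attributes off."""
    parser = argparse.ArgumentParser(prog="cli-abfe-sketch")
    # args.submit defaults to True and becomes False when -nosubmit is given.
    parser.add_argument("-nosubmit", dest="submit", action="store_false",
                        help="Will not automatically submit the ABFE calculations")
    # args.use_gpu defaults to True and becomes False when -nogpu is given.
    parser.add_argument("-nogpu", dest="use_gpu", action="store_false",
                        help="don't use gpus for the submissions")
    return parser


args = build_parser().parse_args(["-nosubmit"])
if args.submit:
    print("submitting jobs")
else:
    print("only setting up the folder structure")
```

With this pattern the calling code only ever tests `args.submit`, so the flag name and the condition cannot drift apart.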

Trying to access undeclared variable: input_sdf_

Executing bash example_execution_gmx.sh ends up with:

Traceback (most recent call last):
  File "/biggin/b211/reub0138/mambaforge/envs/abfe/bin/cli-abfe-gmx", line 12, in <module>
    sys.exit(main())
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe_cli/ABFECalculatorGmx.py", line 50, in main
    res = calculate_abfe_gmx(input_dir=args.gmx_files_root_dir, out_root_folder_path=args.output_dir_path, approach_name=args.project_name,
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe/calculate_abfe_gmx.py", line 70, in calculate_abfe_gmx
    job_approach_file_path = build_approach_flow(approach_name=approach_name,
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe/orchestration/build_approach_flow.py", line 23, in build_approach_flow
    generate_conf.generate_approach_conf(out_path=approach_conf_path,
TypeError: generate_approach_conf() missing 1 required positional argument: 'small_mol_ff'

build_approach_flow.py fails to provide small_mol_ff as argument to generate_conf.generate_approach_conf()

Not sure how to deal with this.

Make Slurm commands exchangeable

Extend the workflow for membrane proteins

Hi, me again, short question: would it be possible to use the workflow for membrane proteins? If yes, how? (Just some first "idea-steps" to figure out where to look in the code.) I will be happy to help with it if needed.
Best,
Alejandro.

Enhancement: Should add output flag to the submission script

When I ran example_execution.sh with bash, I got the following error:

Traceback (most recent call last):
  File "/biggin/b211/reub0138/mambaforge/envs/abfe/bin/cli-abfe", line 16, in <module>
    sys.exit(main())
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe_cli/ABFECalculator.py", line 56, in main
    res = calculate_abfe(protein_pdb_path=args.protein_pdb_path, ligand_sdf_paths=sdf_paths, out_root_folder_path=args.output_dir_path,
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe/calculate_abfe.py", line 32, in calculate_abfe
    os.mkdir(dir_path)
FileNotFoundError: [Errno 2] No such file or directory: '***********/CyclophilinD_selfParametrized'

This is because the output directory was unspecified. Therefore, I had to add the following to the cli-abfe call in the submission script:

-o /biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/examples/out

This seemed to fix the issue.
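
For context, `os.mkdir` raises `FileNotFoundError` when the parent of the target directory does not exist, which matches the traceback above. A minimal sketch of a more forgiving alternative follows; `ensure_dir` is a hypothetical helper, not a function from the package.

```python
import os


def ensure_dir(dir_path: str) -> str:
    """Create dir_path and any missing parents; do nothing if it already exists."""
    os.makedirs(dir_path, exist_ok=True)
    return dir_path
```

Called with a nested path such as `examples/out/CyclophilinD_selfParametrized`, this creates the intermediate folders as well, so a missing output root no longer aborts the setup.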

Installation failed

Hi, I experienced an error related to Scipy using mamba env create --file environment.yml.
Any idea why?

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for scipy
ERROR: Could not build wheels for scipy, which is required to install pyproject.toml-based projects

failed

Nonetype error with cluster config

Running example_execution.sh with the nosubmit flag on bash gives:

Prepare

	starting preparing ABFE-ligand file structur
		Ligand:  ligand-4
Traceback (most recent call last):
  File "/biggin/b211/reub0138/mambaforge/envs/abfe2/bin/cli-abfe", line 9, in <module>
    sys.exit(main())
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe_cli/ABFECalculator.py", line 55, in main
    res = calculate_abfe(protein_pdb_path=args.protein_pdb_path, ligand_sdf_paths=sdf_paths, out_root_folder_path=args.output_dir_path,
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe/calculate_abfe.py", line 45, in calculate_abfe
    build_ligand_flows(input_ligand_paths=conf["input_ligands_sdf_path"],
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe/orchestration/build_ligand_flow.py", line 215, in build_ligand_flows
    build_replicas_simulation_flow(out_ligand_path=out_ligand_path,
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe/orchestration/build_ligand_flow.py", line 78, in build_replicas_simulation_flow
    cluster_config['partition'] = "cpu"
TypeError: 'NoneType' object does not support item assignment
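
The last frame shows an item assignment on a `cluster_config` that is `None`. A defensive pattern for this situation is sketched below; `with_partition` is a hypothetical helper, and treating a missing config as an empty dict is an assumption rather than the project's behaviour.

```python
from typing import Dict, Optional


def with_partition(cluster_config: Optional[Dict[str, str]], partition: str) -> Dict[str, str]:
    """Return a copy of cluster_config (an empty dict if None) with the partition set."""
    config = dict(cluster_config) if cluster_config is not None else {}
    config["partition"] = partition
    return config
```

Returning a copy also keeps the complex and ligand jobs from sharing, and overwriting, the same dictionary.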

Unit tests needed

We really need unit tests on this.

Setting up the workflow

Hi, I am trying to set up the workflow in my cluster. The first issue was how to create the conda environment. Now it is working with the following environment.yml:

name: abfe
channels:
  - conda-forge
dependencies:
  - python=3.7.12
  - pip
  - conda-build
  - gromacs=2022.4
  - parmed
  - bioconda::snakemake=7.8.5
  - pip:
    - alchemlyb==0.6.0
    - pymbar==3.0.5
    - matplotlib
    - mdanalysis
    - numpy
    - pandas
    - scipy

However, when the environment is activated, I am getting the warning:

WARNING: No ICDs were found. Either,
- Install a conda package providing a OpenCL implementation (pocl, oclgrind, intel-compute-runtime, beignet) or 
- Make your system-wide implementation visible by installing ocl-icd-system conda package.

The calc_ABFE.py script is:

#!/usr/bin/env python3

import os
from abfe.orchestration.build_and_run_ligands import calculate_all_ligands
   
if __name__ == "__main__":
    orig_dir = os.getcwd()
    
    # IO:
    out_root_path = "./data/"
    in_root_path = "./data/input/system1"

    input_ligand_paths = [in_root_path+"/"+dir for dir in os.listdir(in_root_path) if(os.path.isdir(in_root_path+"/"+dir))]
    print("input ligand dirs: ", input_ligand_paths)
    print("output root dir: ", out_root_path)

    # Options:
    n_cores=1
    num_jobs = 40
    num_replicas=1
    submit=True

    
    cluster_config ={
        "partition": "deflt",
        "time": "48:00:00",
        "num_sim_threads":8,
        "mem": "20GB",
    }
    
    # Do Fun!
    if(not os.path.isdir(out_root_path)): os.mkdir(out_root_path)
    calculate_all_ligands(input_ligand_paths=input_ligand_paths, out_root_path=out_root_path,  num_max_thread = 8,
                           num_replicas=num_replicas, submit=submit, num_jobs=num_jobs, cluster_config=cluster_config)

    os.chdir(orig_dir)

I changed in_root_path = "/data/input/system1", the partition and the keyword n_cores=n_cores. The last is not valid.

However, with this script I am getting:

input ligand dirs:  ['./data/input/system1/ligand1']
output root dir:  ./data/
./data//ligand1/1/job.sh ./data//ligand1/1/scheduler.sh
Traceback (most recent call last):
  File "calc_ABFE.py", line 34, in <module>
    num_replicas=num_replicas, submit=submit, num_jobs=num_jobs, cluster_config=cluster_config)
  File "/home/users/alejandro/GIT/ABFE_workflow/abfe/orchestration/build_and_run_ligands.py", line 55, in calculate_all_ligands
    num_replicas=num_replicas, cluster_config=cluster_config, submit=submit, num_jobs=num_jobs)
  File "/home/users/alejandro/GIT/ABFE_workflow/abfe/orchestration/build_and_run_ligands.py", line 45, in build_run_ligand
    out = scheduler.schedule_run()
  File "/home/users/alejandro/GIT/ABFE_workflow/abfe/orchestration/generate_scheduler.py", line 81, in schedule_run
    job_id = int(out.split()[-1])
ValueError: invalid literal for int() with base 10: 'directory'

Now, I actually do not know how to continue. Could you please help me?
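
The `ValueError` shows that the last word of the scheduler output was `directory`, which suggests the submission command printed an error message (for example one ending in "No such file or directory") instead of a job ID. A sketch of a stricter parser follows, assuming jobs are submitted with `sbatch`, which reports success as `Submitted batch job <id>`; the function name is hypothetical.

```python
import re
from typing import Optional

# sbatch reports a successful submission as "Submitted batch job <id>".
_JOB_ID_PATTERN = re.compile(r"Submitted batch job (\d+)")


def parse_sbatch_job_id(output: str) -> Optional[int]:
    """Return the job ID from sbatch output, or None if no ID is present."""
    match = _JOB_ID_PATTERN.search(output)
    return int(match.group(1)) if match else None
```

Returning `None` lets the caller print the raw scheduler output, which usually names the actual problem (here, most likely that the submission command is not available on the machine).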

Unexpected keyword argument 'nosubmit'

Trying to execute example_execution_gmx.sh gives the following error:

usage: conda [-h] [--no-plugins] [-V] COMMAND ...
conda: error: argument COMMAND: invalid choice: 'activate' (choose from 'clean', 'compare', 'config', 'create', 'info', 'init', 'install', 'list', 'notices', 'package', 'remove', 'uninstall', 'rename', 'run', 'search', 'update', 'upgrade', 'build', 'convert', 'debug', 'develop', 'doctor', 'index', 'inspect', 'metapackage', 'render', 'skeleton', 'env')
False
Traceback (most recent call last):
  File "/biggin/b211/reub0138/mambaforge/envs/abfe2/bin/cli-abfe-gmx", line 10, in <module>
    sys.exit(main())
  File "/biggin/b211/reub0138/Projects/abfe/IrfansPaper/ABFE_workflow-main/abfe_cli/ABFECalculatorGmx.py", line 50, in main
    res = calculate_abfe_gmx(input_dir=args.gmx_files_root_dir, out_root_folder_path=args.output_dir_path, approach_name=args.project_name,
TypeError: calculate_abfe_gmx() got an unexpected keyword argument 'nosubmit'

The variable name might have to be switched.

Allow the ability to add arbitrary control parameters to mdrun

Currently it probably will pick up GPUs, but not very well, nor will it play well with ntomp flags.