openfreeenergy / industrybenchmarks2024

Repository for the 2024 OpenFE industry benchmark efforts

Home Page: https://industrybenchmarks2024.readthedocs.io/en/latest/

License: MIT License

Python 100.00%

industrybenchmarks2024's People

mcompchem riesben jbluck vgapsys benefrg steinbt2 alexanderhwilliams sergioperezconesa billswope tlhr binqingwei chayast lili-cao

industrybenchmarks2024's Issues

Provide a conda-lock file

We want to provide a somewhat locked down environment build so we can control things that might affect our results.

This includes but is not limited to:

OpenMM version
OpenFF Tk version
AmberTools version
OpenMMForceFields version
OpenMMTools version

We probably don't want to lock the CUDA version, but we need to make sure it's possible for folks to manually specify one? Not sure on that one.

Test post-simulation clean up script

Follow up from #48

Once we have a script in place, we should test it on a large dataset with in-progress simulations. That way we know that we won't accidentally mess up someone's in-progress work.

Add validation script test for PTMs

industry-benchmarking-layout

Hi, I think it would be great if our folder structure follows this here:
https://docs.openfree.energy/en/stable/tutorials/rbfe_cli_tutorial.html

In my understanding, the final folder structure after full execution should look like this:

some_benchmark_folder

--network_setup
-----ligand_network.graphml
-----alchemical_network.json
-----transformations
-------*jsons

--results
----fun1
------shared_*
--------*nc
-----fun1.json
----fun2
------shared_*
--------*nc
-----fun2.json
---- ...

--final_results.tsv
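
A quick way to check a finished benchmark folder against the layout above is sketched below. This is only an illustration: the function name `missing_entries` is made up, and the entry list just mirrors the top-level items from the proposed structure (the per-run folders under results are not checked).

```python
from pathlib import Path

# Top-level entries taken from the proposed layout above.
# Adjust this list if the agreed layout changes.
EXPECTED_ENTRIES = [
    "network_setup/ligand_network.graphml",
    "network_setup/alchemical_network.json",
    "network_setup/transformations",
    "results",
    "final_results.tsv",
]


def missing_entries(benchmark_dir):
    """Return the expected entries that are absent from benchmark_dir."""
    root = Path(benchmark_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_entries("some_benchmark_folder")` returns an empty list when every expected entry is present.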

Document post-simulation clean up script

Follow up from #48

We need to document the post-simulation clean up stage.

Trajectory file sub-sampling script

We need a script that extracts necessary data from trajectory files:

Subsampling the trajectories & write them out to XTCs
- Aim to store 20 frames
Extracting the reduced potential timeseries from the .nc files
- Saved as .npz files for easy storing
Extracting replica exchange states timeseries
- Allowing for exchange analysis as necessary
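
As a starting point for the "aim to store 20 frames" part, here is a minimal sketch of how evenly spaced frame indices could be picked. The function name `subsample_indices` is hypothetical, and reading the .nc files or writing the XTC/.npz outputs is deliberately left out, since that depends on the trajectory library we settle on.

```python
def subsample_indices(n_frames, n_keep=20):
    """Pick up to n_keep evenly spaced frame indices from a trajectory.

    The first and last frames are always included when the trajectory
    has more frames than n_keep.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    if n_frames <= 0:
        return []
    if n_frames <= n_keep:
        # Short trajectory: keep every frame.
        return list(range(n_frames))
    if n_keep == 1:
        return [0]
    # Integer arithmetic avoids floating point rounding surprises.
    return [(i * (n_frames - 1)) // (n_keep - 1) for i in range(n_keep)]
```

The returned indices could then be passed to whichever trajectory reader is used to write out the reduced XTC file.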

Update order of installation instructions

Make clear that single file installer is a backup plan if you can't access the internet.

Put the conda-lock instructions first
add instructions on installing conda-lock into a clean environment
add language clarifying that single file installer is a backup plan

Add schrodinger public benchmark v2.0 inputs

Make a snapshot copy of the Schrodinger v2.0 inputs from https://github.com/schrodinger/public_binding_free_energy_benchmark/tree/v2.0

QA preparation instructions

We should QA the preparation instructions and make sure that they make sense end-to-end.

Ideally we synchronously run through preparing a system like TYK2.

Link install instructions for "Phase 0"
Fix formatting on compute requirements section
Add warning to wait on phase two until we QA inputs

Fix announcement on front page docs

https://industrybenchmarks2024.readthedocs.io/en/latest/#public-dataset-benchmarks has a few broken links

Slack channel #consortium-advisory-board?

The "Discussions" section here (https://industrybenchmarks2024.readthedocs.io/en/latest/get-in-touch.html) mentions a "#consortium-advisory-board" channel, which I couldn't find. Does the channel already exist?

Add input preparation & contribution instructions

We need to add a "how to prepare your inputs" instruction file that contains: https://docs.google.com/document/d/1UOIt2bv2MLvxuHZ-ZvZcruRvmVW2hSt_SVni6mCFT0I/edit?usp=sharing

This file should be continually updated based on participant feedback.

Add CONTRIBUTING.md

We need a file that tells folks who are unfamiliar with GitHub how to contribute inputs, open issues, etc.

Possibly take a brief primer from somewhere else?

Update front page instructions to reflect benchmark kickoff

Our front page instructions for the phase 1 kick-off are no longer up to date. We should update them at the same time as we communicate them by email to our industry partners.

Add instructions for getting an OpenFE environment built.

Specifically here we need to provide the following information:

How to install OpenFE (using the conda-lock file if we make it available)
How to test an OpenFE installation

I think a lot of this can be dealt with by having conda-lock install instructions on the core repo.

Fix build warnings when building docs

We have a few warnings that we should fix + update RTD to fail if there is a warning

Verify that install works in an environment where conda's main channel is blocked

Is there a way we can spoof a firewall block on conda's main channel? If so, we should:

Check that it works
Think about building some CI check that will verify this (maybe on the main OFE repo)

Change theme

The current theme isn't suitable for easy navigation. We need something that can be easily navigated via a sidebar TOC.

Add PR template for input file submission

We need:

A PR template that outlines the key steps for input file submission
An example directory with examples of the relevant files (i.e. a PDB, ligand SDF, cofactor SDF, and short markdown file with relevant details on how the system was prepared).

QA instructions for the quickrun call script

We need to QA the instructions for the simulation execution script, making sure we don't overwrite anything.

Update README

We need to update the readme explaining:

What this repo is
What the structure is
What folks should look at when getting started

Create a post-simulation cleanup script

Mike has been working on a post-simulation script which:

Finds all the input JSON files
Checks if the simulation is complete
Does something

We can modify the script so that it can safely be run multiple times and post-process simulations.

TODO:

Create necessary script to downsample files
Validate the post-simulation cleanup script
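
To make the first two steps concrete, here is a rough sketch of a read-only completeness check. It is not Mike's script. It assumes that each input JSON in a transformations directory has a result JSON of the same name in a results directory, and that a finished result contains a non-null "estimate" entry; both the layout and that completion test are assumptions to be checked against real outputs.

```python
import json
from pathlib import Path


def is_complete(result_json):
    """Return True if a result file exists, parses, and has an estimate.

    Assumption: a finished simulation writes a JSON dict with a
    non-null "estimate" key. Partially written files fail to parse.
    """
    path = Path(result_json)
    if not path.is_file():
        return False
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("estimate") is not None


def find_incomplete(transformations_dir, results_dir):
    """List the names of input JSONs with no complete result yet."""
    incomplete = []
    for input_json in sorted(Path(transformations_dir).glob("*.json")):
        if not is_complete(Path(results_dir) / input_json.name):
            incomplete.append(input_json.stem)
    return incomplete
```

Because nothing is written or deleted here, the check can be run repeatedly while simulations are still in progress.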

Create RBFE simulation run script

We need to create a script that does the following:

Takes a set of inputs (ligand SDFs, cofactors SDF, protein PDB)
Creates a LOMAP LigandNetwork (note: needs fixing to deal with charge changes)
Creates an RBFE AlchemicalNetwork
Selectively applies a different set of Protocol settings when dealing with net charge changes
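
For the last step, one possible way to decide which edges need the alternative settings is sketched below. It does not use the OpenFE API: the function name, the edge tuples, and the `formal_charges` mapping are all hypothetical stand-ins for whatever the run script ends up extracting from the ligand network.

```python
def split_edges_by_charge_change(edges, formal_charges):
    """Split edges into (no charge change, net charge change) lists.

    edges: iterable of (ligand_a, ligand_b) name pairs.
    formal_charges: dict mapping ligand name to total formal charge.
    """
    neutral, charged = [], []
    for lig_a, lig_b in edges:
        if formal_charges[lig_a] != formal_charges[lig_b]:
            # The two end states differ in net charge.
            charged.append((lig_a, lig_b))
        else:
            neutral.append((lig_a, lig_b))
    return neutral, charged
```

The script could then build transformations for the first list with the default Protocol settings and for the second list with the charge-change settings.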

Review instructions for ligands selection

Should we remove the request to copy over the edges.csv file and ligands.sdf file, and just ask that folks select whichever ligands they think are best at the preparation stage?

Collecting Scorer Data

https://docs.google.com/spreadsheets/d/1z7W-WxeW2OOnh2B5SaJb8sacYmN9cVE5JtV5PQdLJH4/edit?usp=sharing

Decide on ligand preparation approach for public benchmark set

We need to define how the ligands will be selected for the public benchmark set.

Currently the Schrodinger inputs contain a mixture of different conformers and protonation states. Should we use the ligands as defined by Schrodinger, or should individuals select whichever ligands they deem most appropriate?

Add phase 1->4 instructions

This doesn't have to be complete, but we need to give an overview of each phase, relevant timelines and key instructions that individuals should follow.

Update & review phase 2 instructions to point to the run script and that a data cleaning script will be provided soon

Once the run script is made available we should review and update the instructions so that folks can get started with simulations whilst we finalize the run script.

Add FAQ issue template

We need an issue template for FAQ queries

Update contribution instructions to work from fork instead of a direct branch

Based on partner feedback, we should tell folks to work from a fork rather than a direct branch (as they won't have access unless we add them to a specific OFE team).

Update `input_validation` to `input_validation.py` in the input validation

Based on feedback from industry partners

Point to installation instructions from home page

Add list of claimed systems

We need a list of systems that have been claimed for benchmarking (and hopefully by whom).

Change structures directory details / layout

Based on today's call:

maybe rename inputs to structure_inputs
organize prepared structures as directories under challenge_set/target_name/
try to enforce the same names in submitted files (i.e. protein.pdb, ligands.sdf, cofactors.sdf, edges.csv) [worst case we normalize it ourselves after the fact]
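
If we do end up normalizing names ourselves, a small check like the one below could flag files in a target directory that don't use the agreed names. This is a sketch only: `non_canonical_files` is a made-up name, and any other permitted file (for example a preparation notes file) would need to be added to the set.

```python
from pathlib import Path

# File names proposed in the list above.
CANONICAL_NAMES = {"protein.pdb", "ligands.sdf", "cofactors.sdf", "edges.csv"}


def non_canonical_files(target_dir):
    """Return sorted names of files in target_dir outside the agreed set."""
    return sorted(
        path.name
        for path in Path(target_dir).iterdir()
        if path.is_file() and path.name not in CANONICAL_NAMES
    )
```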

Notebook for visualizing the LigandNetwork graphml

It would be great to have a notebook we can point to for folks to look at the network they generated and the atom mappings within it.

Effectively we are looking for https://github.com/OpenFreeEnergy/ExampleNotebooks/blob/main/cookbook/ligandnetwork_vis.ipynb but with a graphml loading step.

Add reviewer instructions

We need a set of instructions for reviewers of submitted inputs

Verify status of single file installer

Related to #46

We should check that the single file installer provides an environment that is equivalent to what we would get from the conda-lock file.

Add short MD simulation utility script

We need a utility script that will run a very short simulation and check that a given PDB input, and optionally a cofactor SDF, will be correctly ingested and simulated by OpenFE.

Add directory layout for inputs to be deposited

QA installation instructions

Follow up from #48

Once we have instructions in place, we need to QA the instructions, building an environment from scratch using the conda-lock file and the single file installer.

Ideally this would be done by someone who isn't involved in the process, like @jameseastwood.

QA simulation instructions

We should QA the phase 2 simulation run script instructions and make sure that they work as intended.

Add table of contents

We should have a TOC that does something like this:

Industry Benchmark (Overview)
- Benchmark phases
Phase 1: structure prep
- Preparing system
- Contributing system
- Review instructions
Phase 2: running simulations
- Simulation overview
- Carrying out the simulations
- Hardware requirements
Phase 3: analyzing results
- Foo
Phase 4: paper writing

Get an overview of "duplicate" nodes in input datasets

The ligands have several cases where there are multiple conformers & protonation states for the same ligands.

We need:

An overview of which datasets are affected & how many "duplicates" there are, including the type of duplicates
- Note: an easy way of getting this is by looking at the original benchmark results for the ligands: https://github.com/schrodinger/public_binding_free_energy_benchmark/blob/main/21_4_results/ligand_predictions/bayer_macrocycles/ftase_extraligs_custcore_stereo_out.csv
An overview of how many more edges you get when creating a lomap network with those extra edges
- Maybe just a few systems so we can show a subsample?

List of properties and results, we want to extract from benchmarking

This is about defining the attributes and properties that we need to get out of the benchmarking so that the data can be used for many things (benchmarking analysis, building better scorers, etc.).

I wrote an initial list and would like to get your feedback and additions. Please see:
https://docs.google.com/spreadsheets/d/1z7W-WxeW2OOnh2B5SaJb8sacYmN9cVE5JtV5PQdLJH4/edit?usp=sharing

Work out how we handle triplicate simulations

The OpenFE default is to run all triplicates in one go. This may be ok for standard transformations, but for net charge transformations this could lead to very large walltime needs.

Should we split the simulation in 3 sets of 1 repeat instead?
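
To illustrate what the split could look like on the job submission side, here is a small sketch that expands each transformation into one job per repeat. The function name and the `results_<repeat>/<name>` output naming are placeholders, not an agreed convention.

```python
def single_repeat_jobs(transformation_names, n_repeats=3):
    """Expand each transformation into n_repeats single-repeat jobs.

    Each job gets its own output directory so repeats never overwrite
    one another. The directory naming here is hypothetical.
    """
    jobs = []
    for repeat in range(n_repeats):
        for name in transformation_names:
            jobs.append(
                {
                    "transformation": name,
                    "repeat": repeat,
                    "output_dir": f"results_{repeat}/{name}",
                }
            )
    return jobs
```

Each job then only needs the walltime of a single repeat, which matters most for the net charge transformations.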

Document data retention policy

We need to document:

What data individuals will be keeping around
How long that data needs to be kept around for
