
industrybenchmarks2024's People

Contributors

billswope, chayast, hannahbaumann, ialibay, mcompchem, mikemhenry, riesben, sergioperezconesa, steinbt2, tlhr, vgapsys

industrybenchmarks2024's Issues

Provide a conda-lock file

We want to provide a somewhat locked down environment build so we can control things that might affect our results.

This includes but is not limited to:

  • OpenMM version
  • OpenFF Tk version
  • AmberTools version
  • OpenMMForceFields version
  • OpenMMTools version

We probably don't want to lock the CUDA version, but we need to make sure it's possible for folks to manually specify one? Not sure on that one.
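One way to confirm that a built environment matches the intended pins is to compare installed package versions against an expected table. The sketch below uses only the standard library. The distribution names and version strings in `EXPECTED` are placeholders (assumptions), not the project's actual pins, and conda packages may register under different distribution names.

```python
from importlib import metadata


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Placeholder pins for illustration only; the real values would come
# from the conda-lock file once it exists.
EXPECTED = {
    "openmm": "0.0.0",
    "openff-toolkit": "0.0.0",
}

if __name__ == "__main__":
    for pkg, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{pkg}: expected {wanted}, found {found}")
```

The `lookup` argument exists only so the comparison can be exercised without the real packages installed.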

Test post-simulation clean up script

Follow up from #48

Once we have a script in place, we should test it on a large dataset with in-progress simulations. That way we know that we won't accidentally mess up someone's in-progress work.

industry-benchmarking-layout

Hi, I think it would be great if our folder structure followed the one shown here:
https://docs.openfree.energy/en/stable/tutorials/rbfe_cli_tutorial.html

In my understanding, the final folder structure after full execution should look like this:

some_benchmark_folder

--network_setup
----ligand_network.graphml
----alchemical_network.json
----transformations
------*.json

--results
----fun1
------shared_*
--------*.nc
----fun1.json
----fun2
------shared_*
--------*.nc
----fun2.json
---- ...

--final_results.tsv
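A layout like this could be checked automatically. The following is a minimal sketch using only `pathlib`. It assumes that each result JSON sits next to its run directory inside `results`, and that the `.nc` files live one level below a `shared_*` directory; the function name `check_layout` is hypothetical.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of problems found in a benchmark folder."""
    root = Path(root)
    problems = []

    setup = root / "network_setup"
    for name in ("ligand_network.graphml", "alchemical_network.json"):
        if not (setup / name).is_file():
            problems.append(f"missing network_setup/{name}")
    if not list((setup / "transformations").glob("*.json")):
        problems.append("no transformation JSON files found")

    results = root / "results"
    run_dirs = []
    if results.is_dir():
        run_dirs = sorted(p for p in results.iterdir() if p.is_dir())
    for run_dir in run_dirs:
        # Assumption: results/<name>.json accompanies results/<name>/
        if not (results / f"{run_dir.name}.json").is_file():
            problems.append(f"missing result JSON for {run_dir.name}")
        if not list(run_dir.glob("shared_*/*.nc")):
            problems.append(f"no .nc file under {run_dir.name}/shared_*")

    if not (root / "final_results.tsv").is_file():
        problems.append("missing final_results.tsv")
    return problems
```

An empty list means the folder matches the expected structure; anything else names what is missing.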

Trajectory file sub-sampling script

We need a script that extracts necessary data from trajectory files:

  • Subsampling the trajectories and writing them out as XTC files
    • Aim to store 20 frames
  • Extracting the reduced potential timeseries from the .nc files
    • Saved as .npz files for easy storing
  • Extracting replica exchange states timeseries
    • Allowing for exchange analysis as necessary
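For the subsampling step, one simple option is to keep evenly spaced frames that include the first and last frame. The sketch below only computes the frame indices; reading the trajectory and writing the XTC would be left to whichever trajectory library is used. The function name and the even-spacing choice are assumptions, not a decided approach.

```python
def subsample_indices(n_frames, n_keep=20):
    """Pick up to `n_keep` evenly spaced frame indices, including both ends."""
    if n_frames <= 0 or n_keep <= 0:
        return []
    if n_frames <= n_keep:
        # Fewer frames than requested: keep everything.
        return list(range(n_frames))
    if n_keep == 1:
        return [0]
    step = (n_frames - 1) / (n_keep - 1)
    return [round(i * step) for i in range(n_keep)]
```

For a trajectory shorter than the target, every frame is kept rather than padding or repeating frames.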

Update order of installation instructions

Make clear that single file installer is a backup plan if you can't access the internet.

  • Put the conda-lock instructions first
  • Add instructions on installing conda-lock into a clean environment
  • Add language clarifying that single file installer is a backup plan

QA preparation instructions

We should QA the preparation instructions and make sure that they make sense end-to-end.

Ideally we synchronously run through preparing a system like TYK2.

  • Link install instructions for "Phase 0"
  • Fix formatting on compute requirements section
  • Add warning to wait on phase two until we QA inputs

Add CONTRIBUTING.md

We need a file that tells folks who are unfamiliar with GitHub how to contribute inputs, open issues, etc.

Possibly take a brief primer from somewhere else?

Add instructions for getting an OpenFE environment built.

Specifically here we need to provide the following information:

  1. How to install OpenFE (using the conda-lock file if we make it available)
  2. How to test an OpenFE installation

I think a lot of this can be dealt with by having conda-lock install instructions on the core repo.

Change theme

The current theme isn't suitable for easy navigation. We need something that can be easily navigated via a sidebar TOC.

Add PR template for input file submission

We need:

  1. A PR template that outlines the key steps for input file submission
  2. An example directory with examples of the relevant files (i.e. a PDB, ligand SDF, cofactor SDF, and short markdown file with relevant details on how the system was prepared).

Update README

We need to update the readme explaining:

  1. What this repo is
  2. What the structure is
  3. What folks should look at when getting started

Create a post-simulation cleanup script

Mike has been working on a post-simulation script which:

  • Finds all the input JSON files
  • Checks if the simulation is complete
  • Does something

We can modify the script so that it can safely be run multiple times and post-process simulations.

TODO:

  • Create necessary script to downsample files
  • Validate the post-simulation cleanup script
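A re-runnable version of this workflow might look like the sketch below. It is not Mike's script. It assumes that a finished simulation leaves a result JSON with the same file name as its input transformation JSON, and it uses a hypothetical `.cleaned` marker file so that a second run skips work already done. A real completeness check would likely need to inspect the result file's contents, since a file can exist while a simulation is still writing to it.

```python
from pathlib import Path


def find_complete(transformations_dir, results_dir):
    """Split transformation JSONs into (complete, incomplete) name lists."""
    complete, incomplete = [], []
    for path in sorted(Path(transformations_dir).glob("*.json")):
        # Assumption: a finished simulation leaves results/<name>.json behind.
        result = Path(results_dir) / path.name
        (complete if result.is_file() else incomplete).append(path.stem)
    return complete, incomplete


def clean_up(transformations_dir, results_dir, process):
    """Call `process(name)` once per complete simulation; safe to re-run."""
    complete, _ = find_complete(transformations_dir, results_dir)
    handled = []
    for name in complete:
        marker = Path(results_dir) / f"{name}.cleaned"
        if marker.exists():
            continue  # already handled on a previous run
        process(name)
        marker.touch()  # only written after processing succeeds
        handled.append(name)
    return handled
```

Incomplete simulations are never passed to `process`, which is the property that protects in-progress work.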

Create RBFE simulation run script

We need to create a script that does the following:

  • Takes a set of inputs (ligand SDFs, cofactors SDF, protein PDB)
  • Creates a LOMAP LigandNetwork (note: needs fixing to deal with charge changes)
  • Creates an RBFE AlchemicalNetwork
  • Selectively applies a different set of Protocol settings when dealing with net charge changes
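The last step hinges on telling charge-changing edges apart from the rest. A minimal sketch of that split is shown here with plain Python data. The function name, the group labels, and the `formal_charges` mapping are all hypothetical; in the real script the charges would come from the ligand molecules themselves, and each group would then receive its own Protocol settings.

```python
def split_edges_by_charge(edges, formal_charges):
    """Group ligand pairs by whether the net charge changes across the edge.

    edges: iterable of (ligand_a, ligand_b) name pairs
    formal_charges: dict mapping ligand name -> net formal charge
    """
    groups = {"neutral": [], "charge_change": []}
    for lig_a, lig_b in edges:
        same = formal_charges[lig_a] == formal_charges[lig_b]
        groups["neutral" if same else "charge_change"].append((lig_a, lig_b))
    return groups
```

Keeping the split separate from the settings themselves means the charge-change settings can be tuned later without touching the network logic.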

Review instructions for ligands selection

Should we remove the request to copy over the edges.csv and ligands.sdf files, and instead ask that folks select whichever ligands they think are best at the preparation stage?

Decide on ligand preparation approach for public benchmark set

We need to define how the ligands will be selected for the public benchmark set.

Currently the Schrodinger inputs contain a mixture of different conformers and protonation states. Should we use the ligands as defined by Schrodinger, or should individuals select whichever ligands they deem most appropriate?

Add phase 1->4 instructions

This doesn't have to be complete, but we need to give an overview of each phase, relevant timelines and key instructions that individuals should follow.

Change structures directory details / layout

Based on today's call:

  • maybe rename inputs to structure_inputs
  • organize prepared structures as directories under challenge_set/target_name/
  • try to enforce the same names in submitted files (i.e. protein.pdb, ligands.sdf, cofactors.sdf, edges.csv) [worst case we normalize it ourselves after the fact]
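Enforcing the file names could be done with a small check at submission time. The sketch below reports which of the four names are missing from a target directory and which other files are present. Treating all four as expected is an assumption, and extra files such as a preparation notes file would simply be listed for review rather than rejected.

```python
from pathlib import Path

# Names taken from the list above; whether all four are mandatory is undecided.
EXPECTED_NAMES = {"protein.pdb", "ligands.sdf", "cofactors.sdf", "edges.csv"}


def check_names(target_dir):
    """Return (missing, unexpected) file names for one target directory."""
    present = {p.name for p in Path(target_dir).iterdir() if p.is_file()}
    missing = sorted(EXPECTED_NAMES - present)
    unexpected = sorted(present - EXPECTED_NAMES)
    return missing, unexpected
```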

Add short MD simulation utility script

We need a utility script that will run a very short simulation and check that a given PDB input, and optionally a cofactor SDF, will be correctly ingested and simulated by OpenFE.

QA installation instructions

Follow up from #48

Once we have instructions in place, we need to QA the instructions, building an environment from scratch using the conda-lock file and the single file installer.

Ideally this is done by someone who isn't involved in the process, like @jameseastwood.

QA simulation instructions

We should QA the phase 2 simulation run script instructions and make sure that they work as intended.

Add table of contents

We should have a TOC that looks something like this:

  • Industry Benchmark (Overview)
    • Benchmark phases
  • Phase 1: structure prep
    • Preparing system
    • Contributing system
    • Review instructions
  • Phase 2: running simulations
    • Simulation overview
    • Carrying out the simulations
    • Hardware requirements
  • Phase 3: analyzing results
    • Foo
  • Phase 4: paper writing

Get an overview of "duplicate" nodes in input datasets

The ligands have several cases where there are multiple conformers & protonation states for the same ligands.

We need:

Work out how we handle triplicate simulations

The OpenFE default is to run all triplicates in one go. This may be ok for standard transformations, but for net charge transformations this could lead to very large walltime needs.

Should we split the simulation into 3 sets of 1 repeat instead?
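If the repeats were split, each transformation would need three independent jobs with separate outputs. The sketch below only builds the command lines and does not run anything. It assumes the transformation JSON has already been set up to run a single repeat, that the `openfe quickrun` CLI is used with its `-o` (results file) and `-d` (working directory) options, and that a `results_<n>` directory per repeat is an acceptable naming scheme; that naming is hypothetical.

```python
from pathlib import Path


def repeat_commands(transformation_json, results_root, n_repeats=3):
    """Build one `openfe quickrun` command per repeat, each with its own output."""
    name = Path(transformation_json).stem
    commands = []
    for repeat in range(n_repeats):
        # Hypothetical naming: one results directory per repeat.
        out_dir = Path(results_root) / f"results_{repeat}"
        commands.append([
            "openfe", "quickrun", str(transformation_json),
            "-o", str(out_dir / f"{name}.json"),
            "-d", str(out_dir / name),
        ])
    return commands
```

Each command list could then be handed to a job scheduler, so a long net charge transformation occupies three shorter jobs instead of one long one.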

Fix table of contents in docs

The structure in the docs is broken: the tabs (overview, Phase 1-4 instructions) should be a level below 'Public datasets benchmarks'.
