openfreeenergy / industrybenchmarks2024
Repository for the 2024 OpenFE industry benchmark efforts
Home Page: https://industrybenchmarks2024.readthedocs.io/en/latest/
License: MIT License
We want to provide a somewhat locked down environment build so we can control things that might affect our results.
This includes but is not limited to:
We probably don't want to lock the CUDA version, but we need to make sure it's possible for folks to manually specify one (still undecided).
Follow up from #48
Once we have a script in place, we should test it on a large dataset with in-progress simulations, so that we know we won't accidentally disrupt anyone's in-progress work.
Hi, I think it would be great if our folder structure followed the one here:
https://docs.openfree.energy/en/stable/tutorials/rbfe_cli_tutorial.html
In my understanding, the final folder structure after full execution should look like this:
some_benchmark_folder/
├── network_setup/
│   ├── ligand_network.graphml
│   ├── alchemical_network.json
│   └── transformations/
│       └── *.json
├── results/
│   ├── fun1/
│   │   ├── shared_*/
│   │   │   └── *.nc
│   │   └── fun1.json
│   ├── fun2/
│   │   ├── shared_*/
│   │   │   └── *.nc
│   │   └── fun2.json
│   └── ...
└── final_results.tsv
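A minimal sketch of how the final layout could be verified programmatically. The paths and patterns follow the tree above; the helper itself (`check_layout`) is a hypothetical illustration, not part of the repo:

```python
# Sketch: verify that a benchmark folder matches the proposed layout.
# Expected paths and glob patterns are assumptions based on the tree above.
from pathlib import Path

EXPECTED = [
    "network_setup/ligand_network.graphml",
    "network_setup/alchemical_network.json",
    "final_results.tsv",
]

def check_layout(root: str) -> list[str]:
    """Return a list of expected paths that are missing under *root*."""
    base = Path(root)
    missing = [p for p in EXPECTED if not (base / p).exists()]
    # There should be at least one transformation JSON present.
    if not list(base.glob("network_setup/transformations/*.json")):
        missing.append("network_setup/transformations/*.json")
    return missing
```

Something like this could be run before submitting results, so that incomplete folders are caught early.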
Follow up from #48
We need to document the post-simulation clean-up stage.
We need a script that extracts necessary data from trajectory files:
Make clear that the single-file installer is a backup plan for those who can't access the internet.
Make a snapshot copy of the Schrodinger v2.0 inputs from https://github.com/schrodinger/public_binding_free_energy_benchmark/tree/v2.0
We should QA the preparation instructions and make sure that they make sense end-to-end.
Ideally we synchronously run through preparing a system like TYK2.
https://industrybenchmarks2024.readthedocs.io/en/latest/#public-dataset-benchmarks has a few broken links
The "Discussions" section here (https://industrybenchmarks2024.readthedocs.io/en/latest/get-in-touch.html) mentions a "#consortium-advisory-board" channel, which I couldn't find. Does the channel already exist?
We need to add a "how to prepare your inputs" instruction file that contains: https://docs.google.com/document/d/1UOIt2bv2MLvxuHZ-ZvZcruRvmVW2hSt_SVni6mCFT0I/edit?usp=sharing
This file should be continually updated based on participant feedback.
We need a file that tells folks who are unfamiliar with GitHub how to contribute inputs, open issues, etc.
Possibly take a brief primer from somewhere else?
Our front page instructions for the phase 1 kick-off are no longer up to date. We should update them at the same time as we communicate them by email to our industry partners.
Specifically here we need to provide the following information:
I think a lot of this can be dealt with by having conda-lock install instructions on the core repo.
We have a few build warnings that we should fix; we should also update RTD to fail the build if there is a warning.
Is there a way we can simulate a firewall block on conda's main channel? If so, we should:
The current theme isn't suitable for easy navigation. We need something that can be easily navigated via a sidebar TOC.
We need:
We need to QA the instructions for the simulation execution script, making sure we don't overwrite anything.
We need to update the readme explaining:
Mike has been working on a post-simulation script which:
We can modify the script so it can safely be run multiple times and post-process simulations as they finish.
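The "safe to run multiple times" behavior could follow a marker-file pattern, as in this minimal sketch (file names such as `.postprocessed` and the `fun1/fun1.json` layout are assumptions, not the actual script's convention):

```python
# Sketch of an idempotent post-processing pass: safe to run repeatedly
# and safe to run alongside in-progress simulations.
from pathlib import Path

def postprocess(results_dir: str) -> list[str]:
    """Process each finished transformation once; return those handled now."""
    done = []
    for tdir in sorted(Path(results_dir).iterdir()):
        if not tdir.is_dir():
            continue
        result = tdir / f"{tdir.name}.json"   # written only on completion
        marker = tdir / ".postprocessed"
        if not result.exists():   # still running: leave it untouched
            continue
        if marker.exists():       # already handled on a previous pass
            continue
        # ... extract the necessary data from trajectory files here ...
        marker.touch()
        done.append(tdir.name)
    return done
```

Keying off the final result JSON (rather than the shared_* directory) means in-progress runs are never touched, and the marker file makes repeat invocations no-ops.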
TODO:
We need to create a script that does the following:
Should we remove the request to copy over the edges.csv and ligands.sdf files, and instead ask folks to select whichever ligands they think are best at the preparation stage?
We need to define how the ligands will be selected for the public benchmark set.
Currently the Schrodinger inputs contain a mixture of different conformers & protonation states. Should we use the ligands as defined by Schrodinger, or should individuals select whichever ligands they deem most appropriate?
This doesn't have to be complete, but we need to give an overview of each phase, relevant timelines and key instructions that individuals should follow.
Once the run script is made available we should review and update the instructions so that folks can get started with simulations whilst we finalize the run script.
We need an issue template for FAQ queries
Based on partner feedback, we should tell folks to work from a fork rather than a direct branch (as they won't have access unless we add them to a specific OFE team).
Based on feedback from industry partners
We need a list of systems that have been claimed for benchmarking (and hopefully by whom).
Based on today's call:
It would be great to have a notebook we can point to for folks to look at the network they generated and the atom mappings within it.
Effectively we are looking for https://github.com/OpenFreeEnergy/ExampleNotebooks/blob/main/cookbook/ligandnetwork_vis.ipynb but with a graphml loading step.
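The actual notebook would presumably use `networkx.read_graphml` or the OpenFE objects directly, but since GraphML is plain XML, the loading step can be sketched with only the stdlib (the `summarize_network` helper and its key names are illustrative assumptions):

```python
# Minimal sketch of the graphml loading step using only the stdlib.
# GraphML is XML, so we can count ligands/mappings without extra deps.
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

def summarize_network(graphml_text: str) -> dict:
    """Count nodes (ligands) and edges (mappings) in a GraphML document."""
    root = ET.fromstring(graphml_text)
    nodes = root.findall(".//g:node", NS)
    edges = root.findall(".//g:edge", NS)
    return {"n_ligands": len(nodes), "n_mappings": len(edges)}
```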
We need a set of instructions for reviewers of submitted inputs
Related to #46
We should check that the single file installer provides an environment that is equivalent to what we would get from the conda-lock file.
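One way to do this check would be to export both environments with `conda list --export` and diff the package lists; this is a hedged sketch (the `diff_envs` helper is hypothetical, and real export files may need extra parsing for pip-installed lines):

```python
# Sketch: diff two environments exported with `conda list --export`.
# Export lines look like "name=version=build"; comment lines start with "#".
def parse_export(text: str) -> dict:
    pkgs = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        name, version = line.split("=")[:2]
        pkgs[name] = version
    return pkgs

def diff_envs(a: str, b: str) -> dict:
    pa, pb = parse_export(a), parse_export(b)
    return {
        "only_in_a": sorted(set(pa) - set(pb)),
        "only_in_b": sorted(set(pb) - set(pa)),
        "version_mismatch": sorted(
            n for n in set(pa) & set(pb) if pa[n] != pb[n]
        ),
    }
```

An empty diff in all three categories would give us reasonable confidence the two install routes are equivalent.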
We need a utility script that will run a very short simulation and check that a given PDB input, and optionally a cofactor SDF, will be correctly ingested and simulated by OpenFE.
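The short simulation itself would go through OpenFE/OpenMM and is omitted here, but a cheap pre-flight check on the PDB input could look like this sketch (the `quick_pdb_check` helper and its specific checks are illustrative assumptions, not the planned script):

```python
# Sketch of a pre-flight sanity check on a PDB file's ATOM/HETATM records,
# run before attempting the short test simulation.
def quick_pdb_check(pdb_text: str) -> list[str]:
    """Return a list of problems found in a PDB file's coordinate records."""
    problems = []
    atoms = [l for l in pdb_text.splitlines()
             if l.startswith(("ATOM", "HETATM"))]
    if not atoms:
        problems.append("no ATOM/HETATM records found")
    if any(len(l) < 54 for l in atoms):  # x/y/z live in columns 31-54
        problems.append("truncated coordinate fields")
    chains = {l[21] for l in atoms if len(l) > 21}  # chain ID is column 22
    if " " in chains:
        problems.append("blank chain identifiers")
    return problems
```

Catching malformed records up front would give clearer error messages than a failure deep inside the simulation stack.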
Follow up from #48
Once we have instructions in place, we need to QA the instructions, building an environment from scratch using the conda-lock file and the single file installer.
Ideally this would be done by someone who isn't involved in the process, like @jameseastwood.
We should QA the phase 2 simulation run script instructions and make sure that they work as intended.
We should have a TOC that does something like this:
There are several cases where multiple conformers & protonation states exist for the same ligand.
We need:
This is about trying to define the attributes and properties that we need to get out of the benchmarking in order to be able to use the data for many things (benchmarking analysis, building better scorers, etc.)
I wrote an initial list and would like to get your feedback and additions
Please:
https://docs.google.com/spreadsheets/d/1z7W-WxeW2OOnh2B5SaJb8sacYmN9cVE5JtV5PQdLJH4/edit?usp=sharing
The OpenFE default is to run all triplicates in one go. This may be ok for standard transformations, but for net charge transformations this could lead to very large walltime needs.
Should we split the simulation in 3 sets of 1 repeat instead?
We need to document:
Issue for #59
We should create a plan for what analyses we want to do on the benchmark results, i.e. draft a general overview of the analysis section of a paper.
The structure in the docs is broken: the Overview and Phase 1-4 instruction tabs should be a level below 'Public dataset benchmarks'.