denoptim-project / denoptim
DENOPTIM is a software package for de novo design and virtual screening of functional molecules of any kind.
License: GNU Affero General Public License v3.0
The internal fitness provider needs to be coupled with an external molecular modeling task. This way we could get decent geometries to be used with descriptors calculated in the internal fitness provider AND use the expression of the fitness from within the internal fitness provider.
Also, this should be coupled with the possibility of reading descriptors from the SDF file resulting from the external task. These descriptors should be defined in the GUI alongside the atom-specific descriptors.
Fragment is essentially supposed to work as an Adapter [https://en.wikipedia.org/wiki/Adapter_pattern] for an IAtomContainer. The responsibility of the IAtomContainer is to represent a molecule, and the Fragment's responsibility is to allow the IAtomContainer to interact properly with the graph model that DENOPTIM uses. This suggests that the IAtomContainer should be provided to the Fragment by dependency injection, i.e. you finish building the IAtomContainer before passing it to the Fragment's constructor. This is further supported by the fact that, after the IAtomContainer is initialized, there is, as of writing, no place in the code where it is modified any further.
The immutability of IAtomContainer should be reflected in the code by disallowing any public methods in Fragment to modify the IAtomContainer.
This is not the case however, as there are a number of methods in Fragment that modify the IAtomContainer. Most of these methods have a direct analog in the IAtomContainer's interface (e.g. removeBond(…)). There are, in my opinion, several advantages of making IAtomContainer immutable:
My suggestion for fixing this is to make the IAtomContainer-field final and remove any methods in Fragment that modify IAtomContainer.
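A minimal sketch of the suggested design, assuming a simplified stand-in for the molecule. MoleculeStandIn and FragmentSketch are hypothetical names used only for illustration; they are not CDK or DENOPTIM classes, and the real IAtomContainer interface is much larger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for CDK's IAtomContainer, reduced to a list of element symbols.
final class MoleculeStandIn {
    private final List<String> atoms;

    MoleculeStandIn(List<String> atoms) {
        // Defensive copy: later changes to the caller's list do not leak in.
        this.atoms = new ArrayList<>(atoms);
    }

    List<String> atoms() {
        // Read-only view: callers cannot add or remove atoms through it.
        return Collections.unmodifiableList(atoms);
    }
}

// Adapter that receives a finished molecule and exposes read access only.
final class FragmentSketch {
    private final MoleculeStandIn molecule; // final: assigned once, in the constructor

    FragmentSketch(MoleculeStandIn molecule) {
        this.molecule = molecule;
    }

    int atomCount() {
        return molecule.atoms().size();
    }

    String atomSymbol(int index) {
        return molecule.atoms().get(index);
    }
}
```

With this shape there is no public method on the adapter that can change the wrapped molecule, so the immutability is enforced by the compiler rather than by convention.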
Hi everyone,
Running the suggested runAllTests.sh from the Git Bash shell does not complete. Can you test it?
Does this have to do with specific bash requirements, or was runAllTests.sh created only for Mac/Linux?
Best
andrea
In the graph handler, there is a button called "Load Library of Vertexes".
When the vertex is a Template, it would be useful to have the possibility to open the embedded graph in a GUIGraphHandler.
Right now the probabilities of crossover, p(xover), mutation, p(mut), and generating a new molecule from scratch, p(new), are set by the user and remain unchanged throughout a run (can @marco-foscato confirm this?). This is a simple strategy that works well for many applications, but I think in DENOPTIM's case we could use a more sophisticated scheme where these probabilities are changed dynamically throughout a run. I am confident these changes will increase the performance of DENOPTIM. Here are my suggestions:
Suggestion number 1 is a common way of making a GA "smarter". If the results from the Simulated Annealing approach look promising, then we can choose to implement the more sophisticated one at a later stage. For the user this would mean that, instead of setting the mutation probability, he/she would set some alpha related to how fast or slow p(mut) should respond to changes.
Suggestion number 2 is both a common way of escaping local maxima and of actually making convergence faster. Here is a quote from "Handbook of Meta-Heuristics" that explains why convergence may be faster if we choose a good restart strategy:
[…] the algorithm [Greedy Randomized Adaptive Path-Relinking] finds a target solution in relatively
few iterations: about 25% of the runs take at most 101 iterations; about 50% take at
most 192 iterations; and about 75% take at most 345. However, some runs take much
longer: 10% take over 1000 iterations; 5% over 2000; and 2% over 9715 iterations.
The book goes further on to suggest that it is best to restart at regular intervals. The difficulty lies of course in choosing the length of these intervals. I suggest that we simply generate a new molecule from scratch after n successful modifications. We can find the best n by doing some experiments ourselves where we compare convergence times for different n for two or three different experiments. If the optimal n is very different between the runs then n should be set by the user. If it is not then we can hard-code this value into DENOPTIM.
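The restart rule proposed above (generate a new molecule from scratch after n successful modifications) can be sketched as a small counter. RestartSchedule is a hypothetical name, not an existing DENOPTIM class, and how the GA would consume its answer is left open.

```java
// Hypothetical helper: tells the caller when a from-scratch build is due.
final class RestartSchedule {
    private final int interval; // the "n" discussed in the text
    private int successes = 0;  // successful modifications since the last restart

    RestartSchedule(int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
    }

    /** Records one successful modification; returns true when a restart is due. */
    boolean recordSuccess() {
        successes++;
        if (successes >= interval) {
            successes = 0; // start counting again after the restart
            return true;
        }
        return false;
    }
}
```

Keeping n as a constructor argument makes it easy to run the comparison experiments with different values before deciding whether to hard-code it or expose it to the user.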
While working on a method in which I had to sort the list of attachment points belonging to the Fragments that constitute the inner graph of a Template, the code broke in several places (reported by @marco-foscato). @marco-foscato suggested that there are several places where it is assumed that the list of APs is in order of the AP ID, i.e. in the order of AP creation.
We should find a solution that prevents programmers from inadvertently changing the order of the APs.
One solution is to change the type of the data structure that stores the APs to one that enforces a particular ordering. An example of such a data structure can be a heap.
Another work-around is to let .getAttachmentPoints() return a copy of the AP list. This will incur a small performance loss, but will solve the problem. It is also generally considered good practice to return copies instead of references of objects' fields from getters.
The last possibility is to get rid of the assumption altogether and change the code where the assumption is made.
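The second work-around (returning a copy from the getter) could look roughly like this. VertexSketch is a hypothetical stand-in, and plain integers stand in for the AP objects; only the copy-on-read idea is meant to carry over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a vertex that owns an ordered list of APs.
final class VertexSketch {
    // Integers stand in for attachment point objects; order = creation order.
    private final List<Integer> apIds = new ArrayList<>();

    void addAP(int id) {
        apIds.add(id);
    }

    /** Returns a copy, so sorting or editing the result cannot reorder the internal list. */
    List<Integer> getAttachmentPoints() {
        return new ArrayList<>(apIds);
    }
}
```

A caller can now sort the returned list freely; the vertex keeps its creation order. The copy is shallow, so the AP objects themselves would still be shared with the caller.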
Could improve the README file following this very nice example https://github.com/mhucka/readmine
I am playing around with DENOPTIM and found an issue when using the SerConverter. It seems as if something goes wrong with the fragment space because the converted *.sdf files are messed up. Using DENOPTIM GUI, 1st loading the fragment space and 2nd loading a *.ser file works well, i.e. the structure of the molecule is reasonable.
I have this issue with my own fragments and also tested it with the PtCOLX2_FSE example/test provided in the project.
It would also be nice to have a feature in the GUI, that allows the conversion of all graphs at once.
The IdFragmentAndAP class made sense when denoptim used an index-based approach for identifying building blocks and attachment points.
It should be possible to remove it and replace the related index-based code with reference-based one.
For compatibility of GUI with Windows platforms:
add CONTRIBUTING.md.
Running t7 on MacOS Darwin.
dg_96 is the following:
96 1_1_0_-1,2_4_1_0,8_4_1_0,3_3_1_0,7_1_2_0,17_1_2_1,14_2_1_1,18_1_2_1,10_1_2_1,11_3_1_1,151_1_2_2,152_1_2_2,153_1_2_2, 1_0_2_0_1_co:0_coa:1,1_3_8_0_1_co:0_coa:1,1_1_3_0_1_ccb:0_cca:0,1_2_7_0_1_ch:0_hyd:1,3_2_17_0_1_ch:0_hyd:1,3_1_14_0_1_ccb:0_ATminus:0,3_3_18_0_1_ch:0_hyd:1,2_1_10_0_1_cob:1_hyd:1,8_1_11_0_1_cob:1_cca:0,11_1_151_0_1_ccb:0_hyd:1,11_2_152_0_1_ch:0_hyd:1,11_3_153_0_1_ch:0_hyd:1, => 96 16 [3, 0, 1]
which is properly converted to SDF file:
CDK 0625191250
13 12 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0001 0.0001 0.0001 O 0 0 0 0 0 0 0 0 0 0 0 0
0.0001 0.0001 0.0001 O 0 0 0 0 0 0 0 0 0 0 0 0
0.0001 0.0001 0.0001 C 0 0 0 0 0 0 0 0 0 0 0 0
0.1930 -2.4982 1.1563 H 0 0 0 0 0 0 0 0 0 0 0 0
0.1930 -2.4982 1.1563 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0001 0.0001 0.0001 ATM 0 0 0 0 0 0 0 0 0 0 0 0
0.1930 -2.4982 1.1563 H 0 0 0 0 0 0 0 0 0 0 0 0
0.1930 -2.4982 1.1563 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0001 0.0001 0.0001 C 0 0 0 0 0 0 0 0 0 0 0 0
0.1930 -2.4982 1.1563 H 0 0 0 0 0 0 0 0 0 0 0 0
0.1930 -2.4982 1.1563 H 0 0 0 0 0 0 0 0 0 0 0 0
0.1930 -2.4982 1.1563 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
1 5 1 0 0 0 0
4 6 1 0 0 0 0
4 7 1 0 0 0 0
4 8 1 0 0 0 0
2 9 1 0 0 0 0
3 10 1 0 0 0 0
10 11 1 0 0 0 0
10 12 1 0 0 0 0
10 13 1 0 0 0 0
M END
96
96 1_1_0_-1,2_4_1_0,8_4_1_0,3_3_1_0,7_1_2_0,17_1_2_1,14_2_1_1,18_1_2_1,10_1_2_1,11_3_1_1,151_1_2_2,152_1_2_2,153_1_2_2, 1_0_2_0_1_co:0_coa:1,1_3_8_0_1_co:0_coa:1,1_1_3_0_1_ccb:0_cca:0,1_2_7_0_1_ch:0_hyd:1,3_2_17_0_1_ch:0_hyd:1,3_1_14_0_1_ccb:0_ATminus:0,3_3_18_0_1_ch:0_hyd:1,2_1_10_0_1_cob:1_hyd:1,8_1_11_0_1_cob:1_cca:0,11_1_151_0_1_ccb:0_hyd:1,11_2_152_0_1_ch:0_hyd:1,11_3_153_0_1_ch:0_hyd:1,
NEW
but runt7.sh expects SDF file with ' 16 15 0 0 . 0 0 0 0 '
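To compare the counts a test expects with the counts actually written, the counts line of a V2000 molfile can be parsed directly. This is a sketch that assumes the fixed-width layout of the file on disk (three characters for the atom count, then three for the bond count); the listing above has lost its leading whitespace, so it cannot be parsed this way as shown. CountsLineSketch is a hypothetical helper, not part of DENOPTIM or CDK.

```java
// Hypothetical helper for reading atom and bond counts from a V2000 counts line.
final class CountsLineSketch {

    /** Returns {atomCount, bondCount} read from the first two 3-character fields. */
    static int[] parse(String countsLine) {
        if (countsLine.length() < 6) {
            throw new IllegalArgumentException("counts line too short");
        }
        int atoms = Integer.parseInt(countsLine.substring(0, 3).trim());
        int bonds = Integer.parseInt(countsLine.substring(3, 6).trim());
        return new int[] {atoms, bonds};
    }
}
```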
Several code snippets and variable names present in the documentations display escaping characters or are not formatted consistently.
Examples are:
cd $DENOPTIM\_HOME should appear as cd $DENOPTIM_HOME; see user_manual.md:44 and many other places where this occurs.
$DENOPTIM\_HOME\\target\\denoptim should be $DENOPTIM_HOME\target\denoptim; see user_manual.md:52.
STOP_GA, not __STOP_GA__; see user_manual.md:292. Similar problem with REMOVE_CANDIDATE and ADD_CANDIDATE.
C_i should be displayed with the i as subscript of C.
It would be practical to be able to start a new run from a previous one, without having to prepare a starting population file, but rather just by linking to the previous run in the parameters.
It would be nice to be able to load a second library of fragments and let GUIFragmentInspector take some of them and append them to the currently loaded library.
Speculative thread on the possibility of making the graph undirected. Is it at all possible?
Some notes and comments on this possibility:
While making unit test testExtractPattern_twoSeparatedRings() in the DENOPTIMGraphOperationsTest class I came across a bug. I was trying to convert the ring of a graph into a PathSubGraph of that graph, but the graph of the PathSubGraph output had an edge with a source AP that did not exist in the list of APs of the same edge's source vertex. This is a bug and should be fixed.
As a first step to fix this bug I think it will be very beneficial to rewrite the constructor so that it follows a depth-first search (DFS) scheme rather than whatever strategy it follows now. DFS is a tried and true algorithm which is familiar to most programmers and DFS is particularly well suited to this kind of task, namely finding paths between two vertices in a graph. DFS should therefore greatly increase the clarity of the method.
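For reference, a depth-first search for a path between two vertices can be sketched as follows. The adjacency map and integer vertex IDs are assumptions made for the example; this is not how PathSubGraph currently stores its data, and PathFinderSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical DFS path finder over a graph given as an adjacency map.
final class PathFinderSketch {

    /** Returns the vertices on a path from start to goal, or an empty list if none exists. */
    static List<Integer> findPath(Map<Integer, List<Integer>> adjacency, int start, int goal) {
        List<Integer> path = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        if (dfs(adjacency, start, goal, visited, path)) {
            return path;
        }
        return Collections.<Integer>emptyList();
    }

    private static boolean dfs(Map<Integer, List<Integer>> adjacency, int current, int goal,
                               Set<Integer> visited, List<Integer> path) {
        visited.add(current);
        path.add(current);
        if (current == goal) {
            return true;
        }
        List<Integer> neighbours = adjacency.getOrDefault(current, Collections.<Integer>emptyList());
        for (int next : neighbours) {
            if (!visited.contains(next) && dfs(adjacency, next, goal, visited, path)) {
                return true;
            }
        }
        path.remove(path.size() - 1); // backtrack: current is not on a path to goal
        return false;
    }
}
```

The path returned is the first one the search finds, not necessarily the shortest. If the shortest path matters for ring extraction, a breadth-first search would be the usual alternative.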
I will attach the hash of the last commit that reproduces this bug. To reproduce it, run the unit tests. A unit test called testExtractPattern_twoSeparatedRings() should fail. Debug accordingly.
Commit hash: 2252a45
In the GUI's main toolbar, we could add an "Open Recent..." menu item that gives shortcuts to the last N (10?) experiments.
Need a hidden file (~/.denoptim_recent) to store the list of links
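A minimal sketch of how such a list could be maintained, assuming one path per line in the hidden file. RecentFilesSketch and its method names are hypothetical; only the file name ~/.denoptim_recent comes from the note above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a most-recent-first list of experiment paths.
final class RecentFilesSketch {

    /** Location suggested in the text: a hidden file in the user's home directory. */
    static Path defaultStore() {
        return Paths.get(System.getProperty("user.home"), ".denoptim_recent");
    }

    /** Puts path first, drops any older duplicate, and keeps at most max entries. */
    static List<String> push(List<String> recent, String path, int max) {
        if (max < 1) {
            throw new IllegalArgumentException("max must be >= 1");
        }
        List<String> updated = new ArrayList<>();
        updated.add(path);
        for (String p : recent) {
            if (!p.equals(path) && updated.size() < max) {
                updated.add(p);
            }
        }
        return updated;
    }

    /** Reads the stored list; a missing file means no recent entries. */
    static List<String> load(Path store) throws IOException {
        return Files.exists(store) ? Files.readAllLines(store) : new ArrayList<String>();
    }

    /** Writes one path per line, replacing the previous content. */
    static void save(Path store, List<String> recent) throws IOException {
        Files.write(store, recent);
    }
}
```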
Say we have a template that is generated on the fly, and our desire is to append it to the library of known building blocks. Say the graph embedded in the template, let's call it Ga, is asymmetric. Such a graph might be isomorphic with a symmetric graph, call it Gb. We would probably want to store Gb rather than Ga, because from Ga we can build ONLY asymmetric graphs, while Gb allows building both symmetric and asymmetric graphs.
See FragmentSpace.addFusedRingsToFragmentLibrary()
java.lang.IndexOutOfBoundsException: No atom at index: 21 from removing last C in the following:
CDK 05082109083D
22 23 0 0 0 0 0 0 0 0999 V2000
-0.1837 1.8726 0.1896 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7448 2.7993 0.5502 N 0 0 0 0 0 0 0 0 0 0 0 0
-0.9543 2.4365 -0.7651 N 0 0 0 0 0 0 0 0 0 0 0 0
0.7779 3.8962 -0.4194 C 0 0 0 0 0 0 0 0 0 0 0 0
1.1804 5.3094 -0.0423 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.6532 3.8813 -0.9019 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4085 3.5828 -1.2500 H 0 0 0 0 0 0 0 0 0 0 0 0
1.0505 6.1527 -1.3164 C 0 0 0 0 0 0 0 0 0 0 0 0
2.2060 5.3278 0.3232 H 0 0 0 0 0 0 0 0 0 0 0 0
0.5248 5.6950 0.7370 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.3313 6.0164 -1.9787 C 0 0 0 0 0 0 0 0 0 0 0 0
1.8171 5.8445 -2.0257 H 0 0 0 0 0 0 0 0 0 0 0 0
1.2260 7.1987 -1.0691 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.7651 4.5840 -2.2255 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.0727 6.5000 -1.3443 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.3168 6.5454 -2.9305 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.7914 4.5527 -2.5885 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.1204 4.1110 -2.9649 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.2603 4.4233 -0.1785 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7623 2.6731 1.6128 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.2846 1.9413 -1.1719 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.4663 1.3874 -2.4590 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0
3 1 1 0 0 0 0
4 2 1 0 0 0 0
7 4 1 0 0 0 0
5 4 1 0 0 0 0
9 5 1 0 0 0 0
10 5 1 0 0 0 0
8 5 1 0 0 0 0
12 8 1 0 0 0 0
13 8 1 0 0 0 0
11 8 1 0 0 0 0
15 11 1 0 0 0 0
16 11 1 0 0 0 0
14 11 1 0 0 0 0
17 14 1 0 0 0 0
18 14 1 0 0 0 0
6 3 1 0 0 0 0
19 6 1 0 0 0 0
4 6 1 0 0 0 0
14 6 1 0 0 0 0
2 20 1 0 0 0 0
22 21 1 0 0 0 0
3 21 1 0 0 0 0
M END
1#MImidazolidinylidene:1:-0.1153%-0.0449%0.7394 22#sArOrtho1:1:-1.3519%1.2880%-3.4523
<ATTACHMENT_POINT>
1:1 22:1
In the graph inspector: change "fragId" to "building block ID" and remove "sprite"
Users want a link to the location of the files pertaining to the candidates that made it to the final population. Although the SDF with fitness is copied into the Final folder, it would be good to have a pathname/link in Final.txt and in any other GenXXX.txt file.
The name of the file that was read in (if any), combined with the tab identifier, should appear somewhere in the GUI. The frame title is not a good place because it is not shown when the window is full size. The best place is the "Active Tab" menu: add the filename (if the content comes from a file) and a tick mark to indicate which tab is currently on top.
Need to streamline RandomUtils so that one can ask for a random number generator without having to do RandomUtils.initialiseRNG() and RandomUtils.getRNG().
This should use a machine random seed, unless a seed is given by the user.
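One possible shape for this, sketched with a lazily created generator. RandomProviderSketch is a hypothetical name and setSeed is an assumed method; only initialiseRNG() and getRNG() are mentioned in the note above, and the current RandomUtils may differ in other ways.

```java
import java.util.Random;

// Hypothetical replacement for the two-step initialise/get pattern.
final class RandomProviderSketch {
    private static Random rng; // created on first use unless a seed was given

    private RandomProviderSketch() {
        // static access only
    }

    /** Optional: fixes the seed, e.g. from a user-supplied parameter. */
    static synchronized void setSeed(long seed) {
        rng = new Random(seed);
    }

    /** Returns the shared generator, seeding it from the machine if no seed was set. */
    static synchronized Random getRNG() {
        if (rng == null) {
            rng = new Random(); // no-argument constructor picks its own seed
        }
        return rng;
    }
}
```

Callers then only ever call getRNG(); reproducible runs call setSeed(...) once at start-up.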
Here is an example of what needs to be done to make the current run*.sh scripts work when running on Windows via Git-bash:
wrkDir=`pwd`
logFile="t2.log"
paramFile="t2.params"
wdToDenoptim="$wrkDir/"
if [[ "$(uname)" == CYGWIN* ]] || [[ "$(uname)" == MINGW* ]] || [[ "$(uname)" == MSYS* ]]
then
    wdToDenoptim="$(cd "$wrkDir" ; pwd -W | sed 's/\//\\\\/g')"
fi
mv data/* "$wrkDir"
rm -rf data
#Adjust path in scripts and parameter files
filesToModify=$(find . -type f | xargs grep -l "OTF")
for f in $filesToModify
do
    sed "$sedInPlace" "s|OTF_WDIR\/|$wdToDenoptim\\\\|g" "$f"
    sed "$sedInPlace" "s|OTF_WDIR|$wdToDenoptim|g" "$f"
    sed "$sedInPlace" "s|OTF_PROCS|$DENOPTIMslaveCores|g" "$f"
done
...
Note, however, that this should be done in the external fitness provider scripts as well.
We should probably make a proper distinction between DENOPTIMVertex and DENOPTIMFragment. This would, for instance, make it possible to create a DENOPTIMFragment object from an IAtomContainer without having a defined fragment space.
Fragment has an addAP(…)-method with a different signature than the addAP(…)-method it inherits from Vertex. These methods also have different behaviors related to issue #49. We should definitely unify both the syntax and behavior of these methods.
I suggest we keep the signature from Vertex, except for changing the type of the direction vector parameter from double[] to Point3d, as Fragment's signature uses. Point3d conveys better than double[] that the direction vector is a 3-dimensional vector, both in name and implementation.
The chemical representation (i.e., atoms, bonds, and all associated data) of a building block is stored in the library of building blocks, and upon reading in a graph we need to fetch the molecular representation of each building block. This has several consequences:
These are arguments in favor of including a light molecular representation in the JSON format of graphs and vertexes.
PS: Note that IAtoms can be part of multiple IAtomContainers: so an atom may belong to both the vertex AtomContainer and the whole molecule AtomContainer.
Need generalized identification of a tmp file system. Once it is identified, the tmp location should be passed to the general parameters to avoid having to specify the same info multiple times.
One day we'll make the publication on anaconda automatic. Have a look here to see how to achieve it.
We could add the possibility to open the graph representation in the left panel of the GUIInspectFSERun.
The best option seems to be to let the user choose whether to display only the molecular representation of the overall graph (current strategy), or to include that as part of the GUIGraphHandler. In the latter case, in addition to displaying the panel with the molecular representation of the overall graph, we also display the graph representation and the node content (upon clicking on a node).
In making a unit test I discovered a bug caused by the assumption that Ring Closing Vertices (RCVs) are not at the scaffold level. After some discussion with @marco-foscato I was made aware that the content of RCVs is not part of the final molecule, i.e. they do not contain real atoms or molecules, so it doesn't make sense to place them at the scaffold level. Furthermore, all Rings have a head and a tail vertex and these are assumed to be RCVs, but this is not explicitly checked, so programmers are not prevented from breaking this assumption. This is a slippery slope and is likely to produce more subtle, hard-to-detect bugs like the one I found. I'm sure there are other places in the code where vertices are assumed to be RCVs.
We should solve this issue by making the assumption that a vertex is an RCV explicit where appropriate to prevent programmers from breaking this constraint.
Right now an RCV is represented as a normal Fragment with a dummy atom inside and exactly one attachment point (AP). This AP has an APClass which signifies that it is a Ring Closing Attractor (RCA). The APClass can be one of three choices: ATPlus, ATMinus, and ATNeutral.
I suggest we make RCV its own class called "RingCloser" which inherits from DENOPTIMVertex. We should also make a separate Enum called "Attractor" which contain the three choices discussed above. Getting rid of the dummy atom may be difficult as it relates to issue #49.
After that, the next step will be to require RingClosers where the code assumes so. A good place to start can be to change the type of the head and tail vertex in the DENOPTIMRing class from DENOPTIMVertex to RingCloser.
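The proposal could be sketched like this. VertexStandIn and RingSketch are hypothetical stand-ins for DENOPTIMVertex and DENOPTIMRing, and the enum constant names are assumptions derived from the three APClass names above; the dummy atom and the single AP are left out.

```java
import java.util.Objects;

// The three attractor types named in the text.
enum Attractor { AT_PLUS, AT_MINUS, AT_NEUTRAL }

// Hypothetical stand-in for DENOPTIMVertex.
class VertexStandIn {
}

// A ring-closing vertex as its own type, carrying its attractor explicitly.
final class RingCloser extends VertexStandIn {
    private final Attractor attractor;

    RingCloser(Attractor attractor) {
        this.attractor = Objects.requireNonNull(attractor, "attractor");
    }

    Attractor getAttractor() {
        return attractor;
    }
}

// Hypothetical stand-in for DENOPTIMRing: head and tail must be RingClosers.
final class RingSketch {
    private final RingCloser head;
    private final RingCloser tail;

    RingSketch(RingCloser head, RingCloser tail) {
        this.head = Objects.requireNonNull(head, "head");
        this.tail = Objects.requireNonNull(tail, "tail");
    }

    RingCloser getHead() {
        return head;
    }

    RingCloser getTail() {
        return tail;
    }
}
```

Because the constructor only accepts RingCloser, passing an ordinary vertex as head or tail becomes a compile-time error instead of a silent broken assumption.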
CDK's IChemObject provides a property map which can hold arbitrary information about a CDK object. DENOPTIM uses this property map to store information in the form of strings about which attachment points (APs) of a Fragment belong to which of its IAtoms, found in Fragment's IAtomContainer. This information is needed when converting a graph to an actual molecule. The information is duplicated as it is also stored in an AP's source atom-field which is really the only place it should be stored. The reason for also storing this in the property map is related to the GUI (maybe @marco-foscato can elaborate?).
There are two main disadvantages of storing the source atom information in the map:
We should remove the use of the property map and provide a solution for the problem related to the GUI. Maybe @marco-foscato can provide some more concrete first steps?
The word "run" in the two bottom shortcut buttons should probably be capitalized for consistency.
In ancient times the word "Reaction" was used to indicate what is effectively an attachment point class. This is why the txt format of the compatibility matrix uses the "RCN" and "RBO" keywords, where the "R" stands for "Reaction". Now that the APClass concept is established, the use of RCN and RBO is no longer understandable. Replace RCN with CMP (as in "compatibility") and RBO with CBO (as in class-to-bond order).
The identification of source atom could be made the responsibility of Fragment and removed from AttachmentPoint.
Add simple fragment libraries. For instance, the P and NHC ligands.
The EAUtils.buildGraph method seems to build graphs irrespective of the molecular size limits, and the graph is then rejected by EAUtils.evaluateGraph. This is inefficient. Make EAUtils.buildGraph detect that the graph is becoming too large and stop adding more fragments.
To trigger this behavior, just use a substitution probability function that returns large values at high level ID.
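The idea of checking the size limit during growth rather than after it can be illustrated with a toy loop. Fragment sizes are plain integers here, and SizeLimitedBuilderSketch is a hypothetical name; the real EAUtils.buildGraph works on graphs and vertexes, and its actual size criteria are not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: stop growing as soon as the next fragment would exceed the limit.
final class SizeLimitedBuilderSketch {

    /** Returns the sizes of the fragments accepted before the limit would be exceeded. */
    static List<Integer> build(List<Integer> candidateFragmentSizes, int maxAtoms) {
        List<Integer> accepted = new ArrayList<>();
        int total = 0;
        for (int size : candidateFragmentSizes) {
            if (total + size > maxAtoms) {
                break; // growing further would only produce a graph that is rejected later
            }
            accepted.add(size);
            total += size;
        }
        return accepted;
    }
}
```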
It would be useful to have a button to save cutting rules that were manually added in the "chop structure" dialog.
It would be nice to be able to search a library for fragments with specific APClasses (or other features) in the GUI.
The search functionality should be visible from both GUIFragmentInspector and GUIFragmentSelector, but not in any FragmentViewPanel, for instance, not visible when displaying fragments from GUIGraphHandler.
So the search bar could be placed in the FragmentViewPanel, but displayed only upon request of the parent component hosting that panel.
Dear developers,
I have installed DENOPTIM under MacOS Big Sur, and the first test fails in the first molecule with error (from the log file)
denoptim.exception.DENOPTIMException: java.io.FileNotFoundException: /tmp/denoptim_test/t1/MOL000001_cs0.int_2 (No such file or directory)
at denoptim.integration.tinker.TinkerUtils.readTinkerIC(TinkerUtils.java:274).
I have the Tinker directory well set, and I am using java and javac versions 1.8. Python and bash are also in the $PATH. This is all troubleshooting I did following the installation instructions, but I can't get it to work... Some help would be appreciated.
Best,
Ferran
Need to move parameters for fitness provider to their own class, possibly within denoptim package
We need to add JUnit for unit tests
Need to allow selection of multiple files in GUI open, so that we can open multiple graphs/candidates.