Hi, We have started running the trypsin-benz tutorial and I've a question regardin

BD or MD first with run.py any (seekrcentral/seekr2)

Comments (25)

lvotapka commented on June 15, 2024

Instead of using the “any” argument for run.py, use the “any_md” argument.

from seekr2.

vaibhavadixit commented on June 15, 2024

Hi,
Strangely, I'm getting the following message with the any_md option.
I just copied the input files from the workstation to the HPC and wanted to run the MD part of the calculation, which has taken > 10 days on the workstation (A4000).
Does model.xml or any other input file save a record of how far the calculation has proceeded?
Please suggest whether I need to change model.xml or any other file to run all BD and MD from fresh, to compare the performance of the A100 vs A4000 cards. There is only a 10-15% difference between the two cards for Amber22 pmemd.cuda jobs.
Thanks

[xxxx@xxxx seekr2-tutorial]$ tail job.54.out
Nothing was run because all criteria are satisfied.

lvotapka commented on June 15, 2024

Yeah, since you copied over the files, it thinks the calculation is finished because of the checkpoint files in each of the anchors. To force a rerun, use the -f argument for run.py.

vaibhavadixit commented on June 15, 2024

Ok, just did that and it is running the nam_simulation again.
Can I expect it to run the MD part after this?
Is it possible to run the BD and MD parts simultaneously, since they use the CPU and GPU respectively and (I think) are independent of each other?
thanks for the quick response.

vaibhavadixit commented on June 15, 2024

Hi,
The calculation seems to have terminated prematurely.
Only BD-related output is printed, even with the run.py -f option.
The any_md option doesn't run any MD simulation either.
Do I need to run the prepare.py step also on the HPC?
Please suggest. thank you

[xxxx@xxxx seekr2-tutorial]$ cat job.55.out
BrownDye 2.0: Version of 13 Jun 2022
running BD: b_surface restart: False trajectories to run: 1000000 trajectories so far: 1000000 number of transitions 0
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: bd_top input.xml
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: nam_simulation receptor_ligand_simulation.xml

lvotapka commented on June 15, 2024

Would you please post the exact run command you are using? I'm not sure why this is happening.

vaibhavadixit commented on June 15, 2024

These are the exact commands I'm using in my slurm script.
Please suggest.

source /home/software/miniconda3/bin/activate
conda activate myseekr2
python /home/software/seekr2/seekr2/run.py -f any model.xml

(base) [xxx@node1 seekr2-tutorial]$ conda activate myseekr2
(myseekr2) [xxx@node1 seekr2-tutorial]$ python
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

vaibhavadixit commented on June 15, 2024

Hi, just a gentle reminder: could you respond to this and to the workstation difficulties I'm facing with seekr2?
Thank you very much for your support. Best regards. Vaibhav

lvotapka commented on June 15, 2024

I'm currently on vacation, but I believe the short solution to your problem is to use "any_md" as mentioned previously in this thread:

python /home/software/seekr2/seekr2/run.py any_md model.xml -f

Alternatively, you can run the anchors by integer:

python /home/software/seekr2/seekr2/run.py 0 model.xml -f

One can even use multiple GPUs if available on a node:

python /home/software/seekr2/seekr2/run.py 0 model.xml -f -c 0 &
python /home/software/seekr2/seekr2/run.py 1 model.xml -f -c 1 &
python /home/software/seekr2/seekr2/run.py 2 model.xml -f -c 2 &
python /home/software/seekr2/seekr2/run.py 3 model.xml -f -c 3 &
wait
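
The multi-GPU commands above can also be generated from a small Python helper. This is only a sketch: it reuses the run.py path and the anchor, -f, and -c arguments exactly as they appear in this thread, while the helper names build_commands and launch_all are made up for illustration. It assumes one anchor per GPU, as in the example above.

```python
import subprocess
import sys


def build_commands(run_py, model_xml, anchors, gpu_ids, force=True):
    """Return one run.py argument list per (anchor, GPU) pair.

    Hypothetical helper: pairs anchors with GPU indices one-to-one,
    mirroring the four-command example in this thread.
    """
    if len(anchors) != len(gpu_ids):
        raise ValueError("this sketch expects exactly one GPU per anchor")
    commands = []
    for anchor, gpu in zip(anchors, gpu_ids):
        cmd = [sys.executable, run_py, str(anchor), model_xml]
        if force:
            cmd.append("-f")  # force rerun, as discussed in this thread
        cmd += ["-c", str(gpu)]  # GPU index, as in the commands above
        commands.append(cmd)
    return commands


def launch_all(commands):
    """Start every command in the background, then wait for all of them."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    cmds = build_commands(
        "/home/software/seekr2/seekr2/run.py",  # install path used in this thread
        "model.xml",
        anchors=[0, 1, 2, 3],
        gpu_ids=[0, 1, 2, 3],
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # launch_all(cmds)  # uncomment to actually start the jobs
```

Printing the commands first makes it easy to confirm that -f and the anchor index are really on the command line before anything is launched.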

There should be no need to rerun prepare.py.

All programs display useful instructions for how to run them with the '-h' argument. I also suggest carefully reviewing all documentation, especially if I am not accessible.

vaibhavadixit commented on June 15, 2024

Hi,
I tried the first and second options you suggested (since I've only one GPU), but in vain.
The MD part of the calculation won't run, and it prints a message saying there is nothing to do:

Nothing was run because all criteria are satisfied.

Thus I tried to do the tutorial from scratch on the HPC as given here.

In this new trial, I'm getting the following error in the hidr step, which looks related to the hidr.py script.
Please have a look and let me know how I can fix it.
Thank you

Error is shown below
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Traceback (most recent call last):
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 366, in <module>
    hidr(model, destination, pdb_files, dry_run, equilibration_steps,
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 190, in hidr
    hidr_simulation.run_SMD_simulation(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 479, in run_SMD_simulation
    system, topology, positions, box_vectors = run_window(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 386, in run_window
    add_forces(sim_openmm, model, anchor, restraint_force_constant, cv_list,
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 266, in add_forces
    myforce = make_restraining_force(cv, variables_values_list)
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 222, in make_restraining_force
    cv.add_groups_and_variables(myforce, variables_values_list)
TypeError: add_groups_and_variables() missing 1 required positional argument: 'alias_id'

My batch file is shown below
#!/bin/bash
#SBATCH --job-name=seekr2job1 ## job name
#SBATCH -N 1 ## number of nodes required
#SBATCH --nodelist=node1
#SBATCH --ntasks-per-node=22 ## number of CPUs required
##SBATCH --time=95:50:20 ## time limit (optional)
#SBATCH --error=job.%J.err ## job error file
#SBATCH --output=job.%J.out ## job output file, if any
#SBATCH --partition=GPU_NODES ## partition name
#SBATCH --gres=gpu:1 ## number of GPU cards required
source /home/software/miniconda3/bin/activate
conda activate myseekr2
python /home/software/seekrtools/seekrtools/hidr/hidr.py any model.xml -M SMD -p tryp_ben.pdb

lvotapka commented on June 15, 2024

I'm now back from vacation.

I've run tests of SEEKR2 to see if there was a problem with the "any_md" or integer arguments of the run.py program and everything seems to work as expected.

Without force overwrite:

$ python ~/seekr2/seekr2/run.py any_md model.xml 
Nothing was run because all criteria are satisfied.

With force overwrite:

$ python ~/seekr2/seekr2/run.py any_md model.xml -f
anchor 0 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 1 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 2 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 3 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 4 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 5 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 6 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 7 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 8 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 9 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 10 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 11 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 12 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 13 has not run the minimum number of steps 0 of 10000000 in swarm index None

Same for integer arguments.

The only reason you should be getting the "Nothing was run because all criteria are satisfied." message is if you forgot the "-f" argument to force rerun. Also, are you sure that you're using the latest version of the SEEKR2 software?

As for the problem with hidr.py, that is a recent bug, thank you for finding it. I've just pushed the bugfix to the seekrtools repository.

vaibhavadixit commented on June 15, 2024

As you can see from the top command output on the node below, I did include the -f argument; nonetheless, I'm getting the message that there is nothing to run.
Is it a bug in my installation?
Please suggest. thank you.

1522846 vaibhav 20 0 6465084 266084 80684 R 100.0 0.2 0:08.62 python /home/software/seekr2/seekr2/run.py any_md model.xml -f

(base) [vaibhav@node1 seekr2-tutorial]$ more job.97.out
Nothing was run because all criteria are satisfied.
(base) [vaibhav@node1 seekr2-tutorial]$ more job.97.err
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
(base) [vaibhav@node1 seekr2-tutorial]$
(base) [vaibhav@node1 seekr2-tutorial]$ more seekr2job.batch
#!/bin/bash
#SBATCH --job-name=seekr2job1 ## job name
#SBATCH -N 1 ## number of nodes required
#SBATCH --nodelist=node1
#SBATCH --ntasks-per-node=22 ## number of CPUs required
##SBATCH --time=95:50:20 ## time limit (optional)
#SBATCH --error=job.%J.err ## job error file
#SBATCH --output=job.%J.out ## job output file, if any
#SBATCH --partition=GPU_NODES ## partition name
#SBATCH --gres=gpu:1 ## number of GPU cards required

source /home/software/miniconda3/bin/activate
conda activate myseekr2
#python /home/software/seekr2/seekr2/prepare.py input_tryp_ben_hidr.xml
#python /home/software/seekr2/seekr2/run.py any model.xml -f
python /home/software/seekr2/seekr2/run.py any_md model.xml -f

vaibhavadixit commented on June 15, 2024

OK, I'm running two sets of simulations on the HPC: 1) one where the files were copied from the workstation, and 2) one where I started the tutorial from scratch.

The 1st gives the same "Nothing was run" message.

For the 2nd (from scratch) simulation, I got the seekrtools error, which is now resolved, but I'm now getting a NaN coordinate error from OpenMM. Does it indicate that the input pqr or pdb file is not in the right format, or something else?
Please suggest. thank you

Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Traceback (most recent call last):
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 366, in <module>
    hidr(model, destination, pdb_files, dry_run, equilibration_steps,
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 190, in hidr
    hidr_simulation.run_SMD_simulation(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.65.g549ac37-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 480, in run_SMD_simulation
    system, topology, positions, box_vectors = run_window(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.65.g549ac37-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 396, in run_window
    sim_openmm.simulation.step(total_number_of_steps)
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/openmm/app/simulation.py", line 141, in step
    self._simulate(endStep=self.currentStep+steps)
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/openmm/app/simulation.py", line 206, in _simulate
    self.integrator.step(10) # Only take 10 steps at a time, to give Python more chances to respond to a control-c.
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/openmm/openmm.py", line 13872, in step
    return _openmm.LangevinIntegrator_step(self, steps)
openmm.OpenMMException: Particle coordinate is NaN. For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan

lvotapka commented on June 15, 2024

From inside the seekr2/ directory, type "git log" and paste the first 10 or so lines here.

vaibhavadixit commented on June 15, 2024

This is what I see with git log command.
I guess you want to check if the bugfix has been applied or not, right?

commit caa21a8 (HEAD -> master, origin/master, origin/dev, origin/HEAD)
Author: Lane Votapka [email protected]
Date: Thu Aug 18 23:22:49 2022 -0600

extraneous print statements removed

commit fbbc8e3
Author: Lane Votapka [email protected]
Date: Thu Aug 18 18:03:46 2022 -0600

Implemented Voronoi CV and anchors. Also corrected some bugs with RMSD CV and added missing check functions.

commit 277bc19
Author: Lane Votapka [email protected]
Date: Wed Aug 10 11:20:00 2022 -0600

working on developing Voronoi Tesselation CVs and anchors.

commit b395a16
Author: Lane Votapka [email protected]
Date: Fri Aug 5 09:27:38 2022 -0600

Fixed bug affecting short Elber trajectories

commit 7b895a2
Author: Lane Votapka [email protected]
Date: Thu Aug 4 15:39:48 2022 -0600

increasing run.py CONVERGENCE_INTERVAL to a larger number

commit 1395878
Author: Lane Votapka [email protected]
Date: Tue Aug 2 16:39:45 2022 -0600

Fixed anchor connections of bulk states in Grid combos.

commit f54e48e
Author: Lane Votapka [email protected]
Date: Tue Aug 2 14:22:02 2022 -0600

Updated tests for new Toy system state_point.

commit c521d26
Author: Lane Votapka [email protected]
Date: Tue Aug 2 11:37:47 2022 -0600

Fixed state_point definitions for toy systems with more than one particle

commit 414d3ac
Author: Lane Votapka [email protected]
Date: Fri Jul 29 16:55:54 2022 -0600

Added warning to model.xml not to modify by hand.

commit 60605f4
Author: Lane Votapka [email protected]
Date: Fri Jul 29 14:17:28 2022 -0600

lvotapka commented on June 15, 2024

Yes, I was trying to see if you have the latest version, which you seem to.

Alright, let's try this manually...

From the directory where model.xml and the anchor_* folders are located, type the following command:

rm anchor_*/prod/*

From there, you should be able to run without the "Nothing was run" message.
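
For anyone wary of running a wildcard rm by hand, here is a hedged Python sketch of the same cleanup. It matches the anchor_*/prod/* pattern from the command above and deletes only regular files, so the anchor folders themselves are left in place. The function name clear_prod and the dry-run default are illustrative choices, not part of SEEKR2.

```python
from pathlib import Path


def clear_prod(root=".", dry_run=True):
    """List (and optionally delete) the files matching anchor_*/prod/*.

    Hypothetical helper: `root` is the directory that holds model.xml
    and the anchor_* folders. With dry_run=True nothing is deleted.
    """
    root = Path(root)
    # Only regular files are targeted; directories are never removed.
    targets = sorted(p for p in root.glob("anchor_*/prod/*") if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


if __name__ == "__main__":
    for path in clear_prod(".", dry_run=True):
        print("would remove:", path)
    # clear_prod(".", dry_run=False)  # uncomment to actually delete
```

Running it in dry-run mode first shows exactly which files would go before anything is removed.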

Also, would you be willing to attach your model.xml file to this thread for me to look at?

vaibhavadixit commented on June 15, 2024

For the files copied from the workstation:
Oops, I accidentally removed all the anchor folders since the command you suggested didn't work.
I then also had to remove the b_surface folder for the prepare.py step to work, which finished quickly and correctly.
The run.py any model.xml step is running now. It looks like it is running the BD part of the calculation.

The old model.xml1.txt and new model.xml.txt files are attached for your reference, if that helps.
Just wondering: if the BD and MD simulations are independent, why can't we run both simultaneously?
thank you

model.xml.txt
model.xml1.txt

vaibhavadixit commented on June 15, 2024

This calculation stopped again after the nam_simulation step.
I've now submitted it with the run.py any model.xml -f option to check whether that runs the MD portion of the calculation.
I'll paste the update soon. thank you

lvotapka commented on June 15, 2024

Nothing seems wrong with your model.xml files. If you delete the files in each anchor's prod/ directory, there is no way that SEEKR could think that the MD portion is finished.

Do you want to just leave out the BD entirely? If so, then you can remove the entire <browndye_settings_input> block in the input XML and replace it with a line containing: "<browndye_settings_input/>", which will set that variable to None and no BD will be run.
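
The edit described above could also be scripted. The sketch below is an assumption-laden illustration, not a SEEKR2 tool: it uses only Python's standard xml.etree.ElementTree, the function name disable_bd is made up, and it writes to a new file so the original input XML is untouched. ElementTree serializes the emptied element as `<browndye_settings_input />` (with a space), which is equivalent XML, and it drops any XML comments in the file; whether that matters for a given input file is not something this thread confirms.

```python
import sys
import xml.etree.ElementTree as ET


def disable_bd(input_xml, output_xml):
    """Empty every <browndye_settings_input> element and save a copy.

    Hypothetical helper. Returns the number of elements that were emptied.
    """
    tree = ET.parse(input_xml)
    emptied = 0
    # Collect first, so the tree is not modified while it is being walked.
    for elem in list(tree.getroot().iter("browndye_settings_input")):
        tail = elem.tail  # keep the whitespace that follows the element
        elem.clear()      # drops children, text, and attributes
        elem.tail = tail
        emptied += 1
    tree.write(output_xml)
    return emptied


if __name__ == "__main__":
    if len(sys.argv) == 3:
        count = disable_bd(sys.argv[1], sys.argv[2])
        print(f"emptied {count} browndye_settings_input element(s)")
```

A return value of 0 means the tag was not found, in which case the output file is just a re-serialized copy of the input.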

Are you able to send me your system? If you send me your input XML and all input files in a way that I can easily run prepare.py, then I can try out your system to see if anything is wrong.

vaibhavadixit commented on June 15, 2024

Again, only the BD portion ran; the MD part of the calculation did not run.
I'm somewhat puzzled.
As suggested I've shared a link to all the input files here.
Please do check at your end and help me understand where I'm making a mistake.

tail job.111.out
BrownDye 2.0: Version of 13 Jun 2022
running BD: b_surface restart: False trajectories to run: 1000000 trajectories so far: 1000000 number of transitions 0
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: bd_top input.xml
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: nam_simulation receptor_ligand_simulation.xml

Thank you

lvotapka commented on June 15, 2024

Once I can get to it, I'll take a look at your files and see if I can reproduce the problem.

lvotapka commented on June 15, 2024

Aha, I see your problem now. You do not have any <pdb_coordinates_filename> tags filled. This is fine if you want to use HIDR with the "-p" argument to assign a starting PDB, but without any PDB files in any of the anchors, run.py doesn't do anything. So you need to use HIDR to assign the starting PDBs in the model, and then you can use run.py.
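
A quick way to check for this condition before submitting a job is to count the filled <pdb_coordinates_filename> tags. The sketch below is only an illustration under stated assumptions: the tag name is taken from this thread, nothing else about the layout of model.xml is assumed, and the function name count_pdb_tags is made up. It reads the file with Python's standard xml.etree.ElementTree and does not modify it.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def count_pdb_tags(model_xml):
    """Return (filled, total) counts of <pdb_coordinates_filename> tags.

    Hypothetical helper: a tag counts as filled when it has non-blank text.
    """
    root = ET.parse(model_xml).getroot()
    tags = list(root.iter("pdb_coordinates_filename"))
    filled = [t for t in tags if t.text and t.text.strip()]
    return len(filled), len(tags)


if __name__ == "__main__":
    model = Path("model.xml")
    if model.exists():
        filled, total = count_pdb_tags(model)
        print(f"{filled} of {total} pdb_coordinates_filename tags are filled")
        if filled == 0:
            print("No starting PDBs assigned; run HIDR before run.py.")
```

If the count of filled tags is zero, that matches the situation described above, where run.py has no starting structures to work with.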

vaibhavadixit commented on June 15, 2024

Hi,
I ran the HIDR calculation, and it then proceeded correctly to the MD part of the calculation.
Now the question is how much speedup I can expect on this A100 card compared to the A4000 (10 days) that I have on the workstation.
I guess not much, or am I wrong?
I've attached the output in case it helps you guess the time it is likely to take for this calculation on A100.
Thanks again for your valuable suggestions and insights.
best regards, Vaibhav
job.117.out.txt

lvotapka commented on June 15, 2024

All SEEKR calculations print a benchmark once finished, so if you run short jobs using, say, the "-t 10000" argument, you can see how fast the calculation is running on each card.
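
Once a short benchmark run has produced a throughput figure for each card, a rough wall-time estimate is simple arithmetic. The sketch below makes several assumptions that are not confirmed in this thread: that the benchmark can be expressed in ns/day, that the MD timestep is 2 fs, and that the function name estimated_days is acceptable as an illustrative helper. The step counts (14 anchors at 10,000,000 steps each) come from the run.py output shown earlier; the throughput values are placeholders to be replaced with measured numbers.

```python
def estimated_days(total_steps, timestep_fs, ns_per_day):
    """Estimate wall time in days for a given number of MD steps.

    Hypothetical helper: total simulated time in ns divided by throughput.
    """
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    total_ns = total_steps * timestep_fs / 1e6  # 1 ns = 1e6 fs
    return total_ns / ns_per_day


if __name__ == "__main__":
    anchors = 14                    # anchors 0-13 in the output above
    steps_per_anchor = 10_000_000   # minimum steps reported by run.py
    timestep_fs = 2.0               # ASSUMPTION: not stated in this thread
    # PLACEHOLDERS: replace with the benchmark value measured on each card.
    for card, ns_per_day in {"card A": 100.0, "card B": 200.0}.items():
        days = estimated_days(anchors * steps_per_anchor, timestep_fs, ns_per_day)
        print(f"{card}: about {days:.1f} days")
```

The estimate ignores BD, HIDR, and any queue time, so it is best read as a lower bound on the MD portion only.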

lvotapka commented on June 15, 2024

Sounds like this issue is resolved, I'll go ahead and close it.
