After fixing that, the next problem occurred when the RLEstimator called train-mabs.py with the parameters. It seems that requirements.txt was not installed in the created Docker container. Ray is missing, but that doesn't seem to be the only problem. Output:
Invoking script with the following command:
/usr/bin/python -m train-mabs --additional_configs clip_rewards=True,gamma=0.999,kl_coeff=0.2,lambda=0.9,lr=0.0005,num_sgd_iter=3,sample_batch_size=96,sgd_minibatch_size=256,train_batch_size=9216,vf_clip_param=175.0 --algorithm PPO --iterate_map_size False --map_size 11 --num_agents 4 --num_iters 10 --use_heuristics_action_masks False
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/ml/code/train-mabs.py", line 5, in <module>
    import ray
ModuleNotFoundError: No module named 'ray'
2020-12-22 16:35:23,079 sagemaker-containers ERROR ExecuteUserScriptError:
Command "/usr/bin/python -m train-mabs --additional_configs clip_rewards=True,gamma=0.999,kl_coeff=0.2,lambda=0.9,lr=0.0005,num_sgd_iter=3,sample_batch_size=96,sgd_minibatch_size=256,train_batch_size=9216,vf_clip_param=175.0 --algorithm PPO --iterate_map_size False --map_size 11 --num_agents 4 --num_iters 10 --use_heuristics_action_masks False"
2020-12-22 16:35:50 Uploading - Uploading generated training model
2020-12-22 16:35:50 Failed - Training job failed
ProfilerReport-1608654710: Stopping
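To see whether Ray is the only missing package, a quick check like the following could go at the top of train-mabs.py, before `import ray`. This is only a sketch: `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the module list is a placeholder that should be replaced with the top-level import names expected from requirements.txt (these can differ from the pip package names).

```python
import importlib.util
import sys

# Hypothetical placeholder list: replace with the top-level import names
# that requirements.txt is expected to provide inside the container.
REQUIRED_MODULES = ["ray", "numpy", "gym"]


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter the container actually runs, then list what is missing.
    print("Python:", sys.executable, sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Printing `sys.executable` also shows which interpreter the job uses (/usr/bin/python in the log above), which can help tell whether requirements.txt was installed into a different environment or not installed at all. In the real script, raising an error when the list is non-empty would make the job fail fast with the full list instead of stopping at the first import.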