deeprl's Issues
typo
  File "/mnt/sdb1/NCBR_grywalizacja/kod/RL/DeepRL/pretrain/common/replay_memory/replay_memory.py", line 517, in _sample_by_indices
    dtype=np.float3)
Test Score Too Low
When I run the pretraining for Ms PacMan, the test of the pretrained network only scores 60 or 90, even though my human demo scores are around 5000. The log shows the demo scores loading correctly, and the classifier reaches about 94% train/test accuracy on them. Shouldn't the test score reported by "classify_demo" be a lot higher than 60?
Here's the info it shows:
python3 pretrain/run_experiment.py --gym-env=MsPacmanNoFrameskip-v4 --cpu-only --classify-demo --use-mnih-2015 --train-max-steps=2 --batch-size=32 --grad-norm-clip=0.5 --demo-ids=1,2,3,4,5,6,7,8
Colocations handled automatically by placer.
2019-04-28 11:23:48,385 classify_demo INFO optimizer: RMSPropOptimizer
2019-04-28 11:23:48,385 classify_demo INFO learning_rate: 0.0007
2019-04-28 11:23:48,385 classify_demo INFO epsilon: 1e-05
2019-04-28 11:23:48,385 classify_demo INFO decay: 0.99
2019-04-28 11:23:48,386 classify_demo INFO train_max_steps: 10000
2019-04-28 11:23:48,386 classify_demo INFO batch_size: 32
2019-04-28 11:23:48,386 classify_demo INFO eval_freq: 5000
2019-04-28 11:23:48,386 classify_demo INFO use_onevsall: False
2019-04-28 11:23:48,386 classify_demo INFO sampling_type: None
2019-04-28 11:23:48,386 classify_demo INFO clip_norm: 0.5
2019-04-28 11:23:48,387 classify_demo INFO reward_constant: 2.0
2019-04-28 11:23:48,387 util INFO Loading data from memory
2019-04-28 11:23:48,387 util INFO memory_folder: collected_demo/MsPacmanNoFrameskip_v4
2019-04-28 11:23:48,387 util INFO demo_ids: 1,2,3,4,5,6,7,8
2019-04-28 11:23:48,388 replay_memory INFO Load memory from collected_demo/MsPacmanNoFrameskip_v4/data/desktop/20190427_194019...
2019-04-28 11:23:48,415 replay_memory INFO Load memory from collected_demo/MsPacmanNoFrameskip_v4/data/desktop/20190427_194110...
2019-04-28 11:23:48,438 replay_memory INFO Load memory from collected_demo/MsPacmanNoFrameskip_v4/data/desktop/20190427_194151...
2019-04-28 11:23:48,483 replay_memory INFO Load memory from collected_demo/MsPacmanNoFrameskip_v4/data/desktop/20190427_194324...
2019-04-28 11:23:48,531 replay_memory INFO Load memory from collected_demo/MsPacmanNoFrameskip_v4/data/desktop/20190427_194506...
2019-04-28 11:23:48,568 util INFO replay_buffers size: 5
2019-04-28 11:23:48,568 util INFO total_rewards: {1: 2650.0, 2: 940.0, 3: 3590.0, 4: 5780.0, 5: 4110.0}
2019-04-28 11:23:48,568 util INFO total_memory: 5068
2019-04-28 11:23:48,569 util INFO total_steps: 5068
2019-04-28 11:23:48,569 util INFO action_distribution: {0: 4339, 1: 161, 2: 149, 3: 257, 4: 158, 7: 1, 8: 3}
2019-04-28 11:23:48,569 util INFO Data loaded!
2019-04-28 11:29:42,943 classify_demo DEBUG i=10000 train_acc=0.9375 test_acc=0.9434 loss=0.2340 max_val=14.886861801147461
2019-04-28 11:31:01,645 classify_demo DEBUG test: trial=77 score=60.0 steps=316 total_steps=24231
2019-04-28 11:31:02,629 classify_demo DEBUG test: trial=78 score=60.0 steps=319 total_steps=24547
2019-04-28 11:31:03,136 classify_demo INFO test: final score=60.0 final steps=318 # trials=78
2019-04-28 11:31:03,137 classify_demo INFO Now saving data. Please wait
2019-04-28 11:31:03,661 classify_demo INFO Data saved!
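One possible reading of the log above, offered as a hypothesis rather than a confirmed diagnosis: the logged action_distribution is dominated by action 0, so a classifier that always predicts action 0 would already score high on accuracy without playing well. The sketch below computes that majority-class baseline from the logged counts. `majority_baseline` is a hypothetical helper name used for illustration and is not part of DeepRL.

```python
# Counts copied from the "action_distribution" line in the log above.
action_distribution = {0: 4339, 1: 161, 2: 149, 3: 257, 4: 158, 7: 1, 8: 3}


def majority_baseline(counts):
    """Return (most frequent action, its share of all samples)."""
    total = sum(counts.values())
    action, count = max(counts.items(), key=lambda item: item[1])
    return action, count / total


action, share = majority_baseline(action_distribution)
print(f"total samples: {sum(action_distribution.values())}")
print(f"always predicting action {action} gives accuracy {share:.4f}")
```

With these counts the baseline comes to roughly 86%, so a test_acc of 0.9434 may mostly reflect predicting action 0 (assumed here to be NOOP, as in the standard Atari action set) rather than reproducing the demonstrated play. That would be consistent with a low game score despite high classification accuracy.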
use-transfer path error
The supervised pre-training part works fine on my Ubuntu 16.04 server at
/home/ubuntu/newrl/DeepRL/
python3 pretrain/run_experiment.py --gym-env=MsPacmanNoFrameskip-v4 --cpu-only --classify-demo --use-mnih-2015 --train-max-steps=10000 --batch-size=32 --grad-norm-clip=0.5 --demo-ids=1,2,3,4,5,6,7,8
but when I then try to use the pre-trained network in A3C-TB like:
/home/ubuntu/newrl/DeepRL/
python3 a3c/run_experiment.py --gym-env=MsPacmanNoFrameskip-v4 --parallel-size=1 --max-time-step-fraction=1.0 --initial-learn-rate=0.0007 --rmsp-epsilon=1e-5 --grad-norm-clip=0.5 --use-mnih-2015 --unclipped-reward --transformed-bellman --use-transfer --append-experiment-num=1
it gives this error:
Traceback (most recent call last):
  File "a3c/run_experiment.py", line 159, in <module>
    main()
  File "a3c/run_experiment.py", line 155, in main
    run_a3c(args)
  File "/home/ubuntu/newrl/DeepRL/a3c/a3c.py", line 410, in run_a3c
    var_list=transfer_var_list,
  File "/home/ubuntu/newrl/DeepRL/a3c/game_ac_network.py", line 245, in load_transfer_model
    assert folder.is_dir()
AssertionError
I know "--use-transfer" is supposed to find the pretrained model automatically, but since that did not work, I also tried passing the path explicitly at the end of the command. It gave the same error:
python3 a3c/run_experiment.py --gym-env=MsPacmanNoFrameskip-v4 --parallel-size=1 --max-time-step-fraction=1.0 --initial-learn-rate=0.0007 --rmsp-epsilon=1e-5 --grad-norm-clip=0.5 --use-mnih-2015 --unclipped-reward --transformed-bellman --use-transfer --append-experiment-num=1 --pretrained-model-folder='/home/ubuntu/newrl/DeepRL/results/pretrain_models/MsPacmanNoFrameskip_v4_mnih2015_l2beta1E-04_clipnorm5E-01/transfer_model/'
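The assertion in the traceback fails when `folder.is_dir()` is false, meaning the path being checked is not an existing directory. One way to narrow this down before launching A3C is to find which part of the path is missing. The following is a minimal sketch using only `pathlib`; `first_missing_component` is a hypothetical helper, not a DeepRL function, and it assumes the folder is checked as a plain filesystem path.

```python
from pathlib import Path


def first_missing_component(path):
    """Return the shallowest ancestor of `path` (or `path` itself) that is
    not an existing directory, or None if the whole path is a directory."""
    path = Path(path)
    if path.is_dir():
        return None
    # Walk from the root towards the leaf and stop at the first gap.
    for candidate in reversed([path, *path.parents]):
        if not candidate.is_dir():
            return candidate
    # Fallback; only reachable if the filesystem changes during the walk.
    return path


if __name__ == "__main__":
    # Path copied from the --pretrained-model-folder argument above.
    folder = (
        "/home/ubuntu/newrl/DeepRL/results/pretrain_models/"
        "MsPacmanNoFrameskip_v4_mnih2015_l2beta1E-04_clipnorm5E-01/"
        "transfer_model/"
    )
    missing = first_missing_component(folder)
    if missing is None:
        print("folder exists:", folder)
    else:
        print("first missing directory:", missing)
```

If a directory is reported as missing, comparing it with what the pretraining run actually wrote under results/pretrain_models/ may show whether the folder name or the working directory differs from what the A3C script expects.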