applieddatasciencepartners / deepreinforcementlearning Goto Github PK
View Code? Open in Web Editor NEWA replica of the AlphaZero methodology for deep reinforcement learning in Python
License: GNU General Public License v3.0
A replica of the AlphaZero methodology for deep reinforcement learning in Python
License: GNU General Public License v3.0
For some reasons (#1) and because of the use of reload
(now importlib.reload
), the notebook don't work for Python 3.
Hello, thanks your excellent work.
I wonder to know what's your enviroment?
Cuda8.0? or something else.
And your hardware to run this code.
NVIDIA GTX 1080 Ti?
Running this example has the following prerequisites
Example install for Ubuntu 16.04
sudo apt install graphviz
Python 2.7
It also requires the following packages be installed and working. Here is an example Pipfile which can be used with pipenv to get running
Copy to a file name 'Pipfile' in dir DeepReinforcementLearning/
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
[dev-packages]
[packages]
jupyter = "*"
numpy = "*"
Keras = "*"
tensorflow = "*"
matplotlib = "*"
pydot = "*"
[requires]
python_version = "2.7"
Then install the required modules from that Pipfile using pipenv and run Jupyter
pip install --user pipenv
cd DeepReinforcementLearning/
pipenv install
pipenv shell
jupyter notebook
Other steps may be required on other systems, this was the minimum required to get running on my Ubuntu 16.04 machine.
When I open run.ipynb in jupiter, in block 2 (In: 2) I have a message:
ImportError Traceback (most recent call last)
in
39 #copy the config file to the run folder
40 copyfile('./config.py', run_folder + 'config.py')
---> 41 plot_model(current_NN.model, to_file=run_folder + 'models/model.png', show_shapes = True)
42
43 print('\n')
c:\users\вова\appdata\local\programs\python\python36\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
130 'LR' creates a horizontal plot.
131 """
--> 132 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
133 _, extension = os.path.splitext(to_file)
134 if not extension:
c:\users\вова\appdata\local\programs\python\python36\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
53 from ..models import Sequential
54
---> 55 _check_pydot()
56 dot = pydot.Dot()
57 dot.set('rankdir', rankdir)
c:\users\вова\appdata\local\programs\python\python36\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
18 if pydot is None:
19 raise ImportError(
---> 20 'Failed to import pydot
. '
21 'Please install pydot
. '
22 'For example with pip install pydot
.')
pydot
. Please install pydot
. For example with pip install pydot
.Reinstalling keras and pydot on the required versions does not solve the problem
Respected mam,
How did you converted eeg signal data into XML file ? From emotive application ?? How please reply
when i enter my action,there was a error,please help me
Tell me why these layers are needed and why are there 6 of them? What will be the number of hidden layers with a board of a different size?
I was comparing your Residual Architecture to the actual Residual Architecture as shown in this cheat sheet.
Your Implementation
Actual Implementation
I'm just going to put this here so that people know the difference in implementation.
Hi!
Thanks for you implementation, I have a question:
Aren't tournament plays always the same as you set tau to 0 (so fully deterministic play) and each episode reset MCTS tree in playMatches
function's for loop? It means that each time players start with zero knowledge (empty tree), do deterministic search and then act deterministically. No stochasticity, then games should look exactly the same. Do I miss something?
I also wonder how DeepMind did it in original paper (they player 400 episodes in tournament), do you have some insight? Guys from Stanford doesn't reset MCTS tree between tournament games, so players increment their knowledge and hence (possibly) play different more informed games each time. This makes sens.
Thanks for your attention!
Sorry to bother you! I am making an alphazero implementation similar to yours, which is also for the Connect4 board game. The training went smooth at first, however, after 70+ iterations, the loss can no longer decrease. I manually set the learning rate from 1e-3 to 1e-5, but the loss still gradually increases. Then I came across your blog about your implementation, and I find it very similar to mine. Have you ever met this case in your experiments? Hopefully you could offer me some advice :)
How about multiple player complex strategy like cards game which contains unfixed actions?
I am writing a board game whose player can draw cards, play cards to get some scores, I am stuck with action_size
state_size
these parameter, because I can't fill them like board game, that's row x column. Anyone could give me some hints?
Very minor issue with the code, main.py:103:
pickle.dump( memory, open( run_folder + "memory/memory" + iteration + ".p", "wb" ) )
should have probably been:
pickle.dump( memory, open( run_folder + "memory/memory" + str(iteration) + ".p", "wb" ) )
I can only see that it create a horizontal mirror version of the current board. But why do we need that?
Thank you very much for this great source of information!!
Has any experiments been done to see if results comparable with DeepMind's chess and shogi publications can be achived. i.e. is this implementation indeed feature complete and as written in the Nature papers?
Hello. I really appreciate your work and am trying to run the run.ipynb file on my machine. There is an error shown on the GitHub commit and I cannot seem to get it to work on my machine as well.
I can not play matches between versions:
OSError Traceback (most recent call last)
in ()
4
5 env = Game()
----> 6 playMatchesBetweenVersions(env, 1, 1, 1, 10, lg.logger_tourney, 0)
~/DeepReinForcementLearning/DeepReinforcementLearning/funcs.py in playMatchesBetweenVersions(env, run_version, player1version, player2version, EPISODES, logger, turns_until_tau0, goes_first)
19
20 if player1version > 0:
---> 21 player1_network = player1_NN.read(env.name, run_version, player1version)
22 player1_NN.model.set_weights(player1_network.get_weights())
23 player1 = Agent('player1', env.state_size, env.action_size, config.MCTS_SIMS, config.CPUCT, player1_NN)
~/DeepReinForcementLearning/DeepReinforcementLearning/model.py in read(self, game, run_number, version)
37
38 def read(self, game, run_number, version):
---> 39 return load_model( run_archive_folder + game + '/run' + str(run_number).zfill(4) + "/models/version" + "{0:0>4}".format(version) + '.h5', custom_objects={'softmax_cross_entropy_with_logits': softmax_cross_entropy_with_logits})
40
41 def printWeightAverages(self):
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/models.py in load_model(filepath, custom_objects, compile)
235 return custom_objects[obj]
236 return obj
--> 237 with h5py.File(filepath, mode='r') as f:
238 # instantiate model
239 model_config = f.attrs.get('model_config')
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h5py/_hl/files.py in init(self, name, mode, driver, libver, userblock_size, swmr, **kwds)
267 with phil:
268 fapl = make_fapl(driver, libver, **kwds)
--> 269 fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
270
271 if swmr_support:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
97 if swmr and swmr_support:
98 flags |= h5f.ACC_SWMR_READ
---> 99 fid = h5f.open(name, flags, fapl=fapl)
100 elif mode == 'r+':
101 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5f.pyx in h5py.h5f.open()
OSError: Unable to open file (unable to open file: name = './run_archive/connect4/run0001/models/version0001.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Hi,
I was playing around with the code and I was wondering if a GPU implementation of MCTS is used, shouldn't the code be almost 100 times faster like described in the paper below?
https://pdfs.semanticscholar.org/fe90/c1f9955ba1f06f5ef26bde100bcc5c7a3327.pdf
This would improve the performance of the algorithm quite significantly.
Hi. I cloned your repo and got it to work until i ran "2. Now run this block to start the learning process" in run.ipynb and I got this error:
NameError Traceback (most recent call last)
in ()
53
54 iteration += 1
---> 55 reload(lg)
56 reload(config)
57
NameError: name 'reload' is not defined
I'm not sure what to go from here as I'm not familiar with the codebase. I would really appreciate if you could give some guidance here. Thanks!
I was going through the code and thinking of the changes that should be made in the case of a more complex game (Chess for example...) where ideally the allowed actions would be a list of tuples (piece position, target position), and I'm having a slight suspicion that the agent code has to be altered in some way. Any input on this ?
Hi! Have you evaluated the winning rate of the trained model against humans? I would like to know about the experience that humans play with the trained model. Is the model difficult for humans to beat?
AttributeError Traceback (most recent call last)
C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
51 try:
---> 52 return getattr(obj, method)(*args, **kwds)
53
AttributeError: 'NoneType' object has no attribute 'round'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
in ()
4
5 env = Game()
----> 6 playMatchesBetweenVersions(env, 1, -1, 1, 10, lg.logger_tourney, 0)
~\Documents\GitHub\DeepReinforcementLearning-master\funcs.py in playMatchesBetweenVersions(env, run_version, player1version, player2version, EPISODES, logger, turns_until_tau0, goes_first)
33 player2 = Agent('player2', env.state_size, env.action_size, config.MCTS_SIMS, config.CPUCT, player2_NN)
34
---> 35 scores, memory, points, sp_scores = playMatches(player1, player2, EPISODES, logger, turns_until_tau0, None, goes_first)
36
37 return (scores, memory, points, sp_scores)
~\Documents\GitHub\DeepReinforcementLearning-master\funcs.py in playMatches(player1, player2, EPISODES, logger, turns_until_tau0, memory, goes_first)
96 for r in range(env.grid_shape[0]):
97 logger.info(['----' if x == 0 else '{0:.2f}'.format(np.round(x,2)) for x in pi[env.grid_shape[1]*r : (env.grid_shape[1]*r + env.grid_shape[1])]])
---> 98 logger.info('MCTS perceived value for %s: %f', state.pieces[str(state.playerTurn)] ,np.round(MCTS_value,2))
99 logger.info('NN perceived value for %s: %f', state.pieces[str(state.playerTurn)] ,np.round(NN_value,2))
100 logger.info('====================')
C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in round_(a, decimals, out)
2849
2850 """
-> 2851 return around(a, decimals=decimals, out=out)
2852
2853
C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in around(a, decimals, out)
2835
2836 """
-> 2837 return _wrapfunc(a, 'round', decimals=decimals, out=out)
2838
2839
C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
60 # a downstream library like 'pandas'.
61 except (AttributeError, TypeError):
---> 62 return _wrapit(obj, method, *args, **kwds)
63
64
C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in _wrapit(obj, method, *args, **kwds)
40 except AttributeError:
41 wrap = None
---> 42 result = getattr(asarray(obj), method)(*args, **kwds)
43 if wrap:
44 if not isinstance(result, mu.ndarray):
TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'
Hi. Running this in python 3.6.
Managed to get past lots of errors so far with the help of this thread but I've got no idea on this.
Can anybody help, please?
I keep getting this issue and have no progress at all, please help. I've already done a lot of searching on google, Tried reshape the 'x' in model.py predict().
The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
Hi everyone!
Trying to run the second block in run.ipynb (start learning process) but I get the following error. I have Python 3.5 installed. Both pydot and graphviz are installed in the proper environment. This is the error I get. Thanks in advance for your help.
FileNotFoundError Traceback (most recent call last)
~/anaconda2/envs/py35/lib/python3.5/site-packages/pydot.py in create(self, prog, format)
1877 shell=False,
-> 1878 stderr=subprocess.PIPE, stdout=subprocess.PIPE)
1879 except OSError as e:
~/anaconda2/envs/py35/lib/python3.5/subprocess.py in init(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds)
675 errread, errwrite,
--> 676 restore_signals, start_new_session)
677 except:
~/anaconda2/envs/py35/lib/python3.5/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1288 err_msg += ': ' + repr(orig_executable)
-> 1289 raise child_exception_type(errno_num, err_msg)
1290 raise child_exception_type(err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'dot'
During handling of the above exception, another exception occurred:
Exception Traceback (most recent call last)
~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in _check_pydot()
26 # to check the pydot/graphviz installation.
---> 27 pydot.Dot.create(pydot.Dot())
28 except Exception:
~/anaconda2/envs/py35/lib/python3.5/site-packages/pydot.py in create(self, prog, format)
1882 '"{prog}" not found in path.'.format(
-> 1883 prog=prog))
1884 else:
Exception: "dot" not found in path.
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
in ()
39 #copy the config file to the run folder
40 copyfile('./config.py', run_folder + 'config.py')
---> 41 plot_model(current_NN.model, to_file=run_folder + 'models/model.png', show_shapes = True)
42
43 print('\n')
~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
133 'LR' creates a horizontal plot.
134 """
--> 135 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
136 _, extension = os.path.splitext(to_file)
137 if not extension:
~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
54 from ..models import Sequential
55
---> 56 _check_pydot()
57 dot = pydot.Dot()
58 dot.set('rankdir', rankdir)
~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in _check_pydot()
29 # pydot raises a generic Exception here,
30 # so no specific class can be caught.
---> 31 raise ImportError('Failed to import pydot. You must install pydot'
32 ' and graphviz for pydotprint
to work.')
33
ImportError: Failed to import pydot. You must install pydot and graphviz for pydotprint
to work.
is it possible to adjust the game.py module to chess? and for that code to work?
i have a pretty big assignment that i thought i might use this code as a feature and i need to know if it's possible.
the code works for connect4, but sometimes i think it gets stuck on a never ending loop. notes on that?
another thing, is it possible to change the number of layers in the neural network or change the number of units (neurons)?
thank you
After installing tensorflow-gpu, errors appeared in the model.py module:
Line 138
def conv_layer(self, x, filters, kernel_size):
x = Conv2D(
filters = filters
, kernel_size = kernel_size
, data_format="channels_first"
, padding = 'same'
, use_bias=False
, activation='linear'
, kernel_regularizer = regularizers.l2(self.reg_const)
)(x)
x = BatchNormalization(axis=1)(x)
150 x = BatchNormalization(axis=1)(x)
^
ValueError
Shape must be rank 1 but is rank 0 for 'batch_normalization_1/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,75,1,1], [].
File "C:\DRL\model.py", line 150, in conv_layer
x = BatchNormalization(axis=1)(x)
File "C:\DRL\model.py", line 225, in _build_model
x = self.conv_layer(main_input, self.hidden_layers[0]['filters'], self.hidden_layers[0]['kernel_size'])
File "C:\DRL\model.py", line 114, in init
self.model = self._build_model()
File "C:\DRL\Untitled-1.py", line 67, in
current_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
I'm struggling to get the right versions of tensorflow to run this. It seems to want python 2.7.
Can you confirm that this works with tensorflow2x?
In backfilling we're updating only the edges which were chosen during the simulation. But since we can reach a state from two different states doing two different actions, shouldn't we update that other part of the tree from where as well we could've come to the same state? That's my understanding of
Action value Q is updated to track the mean of all evaluations V in the subtree below that action
when I run the run.ipynb in jupyter. I get the things in the following:
ValueError Traceback (most recent call last)
in
22
23 # create an untrained neural network objects from the config file
---> 24 current_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
25 best_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
26
~/DeepReinforcementLearning-master/model.py in init(self, reg_const, learning_rate, input_dim, output_dim, hidden_layers)
112 self.hidden_layers = hidden_layers
113 self.num_layers = len(hidden_layers)
--> 114 self.model = self._build_model()
115
116 def residual_layer(self, input_block, filters, kernel_size):
~/DeepReinforcementLearning-master/model.py in _build_model(self)
223 main_input = Input(shape = self.input_dim, name = 'main_input')
224
--> 225 x = self.conv_layer(main_input, self.hidden_layers[0]['filters'], self.hidden_layers[0]['kernel_size'])
226
227 if len(self.hidden_layers) > 1:
~/DeepReinforcementLearning-master/model.py in conv_layer(self, x, filters, kernel_size)
146 , activation='linear'
147 , kernel_regularizer = regularizers.l2(self.reg_const)
--> 148 )(x)
149
150 x = BatchNormalization(axis=1)(x)
~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/engine/base_layer.py in call(self, inputs, **kwargs)
429 'You can build it manually via: '
430 'layer.build(batch_input_shape)
')
--> 431 self.build(unpack_singleton(input_shapes))
432 self.built = True
433
~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/layers/convolutional.py in build(self, input_shape)
139 name='kernel',
140 regularizer=self.kernel_regularizer,
--> 141 constraint=self.kernel_constraint)
142 if self.use_bias:
143 self.bias = self.add_weight(shape=(self.filters,),
~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name + '
call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper
~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint)
250 dtype=dtype,
251 name=name,
--> 252 constraint=constraint)
253 if regularizer is not None:
254 with K.name_scope('weight_regularizer'):
~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in variable(value, dtype, name, constraint)
400 v._uses_learning_phase = False
401 return v
--> 402 v = tf.Variable(value, dtype=tf.as_dtype(dtype), name=name)
403 if isinstance(value, np.ndarray):
404 v._keras_shape = value.shape
~/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/variables.py in init(self, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint)
257 dtype=dtype,
258 expected_shape=expected_shape,
--> 259 constraint=constraint)
260
261 def repr(self):
~/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/variables.py in _init_from_args(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, expected_shape, constraint)
385 "construct, such as a loop or conditional. When creating a "
386 "variable inside a loop or conditional, use a lambda as the "
--> 387 "initializer." % name)
388 # pylint: enable=protected-access
389 shape = (self._initial_value.get_shape()
ValueError: Initializer for variable conv2d_2/kernel/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or conditional, use a lambda as the initializer.
why you code is not working ?
Can you please publish code that is work ?
I have this error:
('Failed to import pydot. You must pip install pydot
and install graphviz (https://graphviz.gitlab.io/download/), ', 'for pydotprint
to work.')
ITERATION NUMBER 1
BEST PLAYER VERSION 0
SELF PLAYING 30 EPISODES...
1
UnimplementedError Traceback (most recent call last)
in
63 ######## SELF PLAY ########
64 print('SELF PLAYING ' + str(config.EPISODES) + ' EPISODES...')
---> 65 _, memory, _, _ = playMatches(best_player, best_player, config.EPISODES, lg.logger_main, turns_until_tau0 = config.TURNS_UNTIL_TAU0, memory = memory)
66 print('\n')
67
~\Desktop\finantial risk\DeepReinforcementLearning-master\funcs.py in playMatches(player1, player2, EPISODES, logger, turns_until_tau0, memory, goes_first)
84 #### Run the MCTS algo and return an action
85 if turn < turns_until_tau0:
---> 86 action, pi, MCTS_value, NN_value = players[state.playerTurn]['agent'].act(state, 1)
87 else:
88 action, pi, MCTS_value, NN_value = players[state.playerTurn]['agent'].act(state, 0)
~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in act(self, state, tau)
84 lg.logger_mcts.info('****** SIMULATION %d ', sim + 1)
85 lg.logger_mcts.info('*********************')
---> 86 self.simulate()
87
88 #### get action values
~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in simulate(self)
66
67 ##### EVALUATE THE LEAF NODE
---> 68 value, breadcrumbs = self.evaluateLeaf(leaf, value, done, breadcrumbs)
69
70 ##### BACKFILL THE VALUE THROUGH THE TREE
~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in evaluateLeaf(self, leaf, value, done, breadcrumbs)
134 if done == 0:
135
--> 136 value, probs, allowedActions = self.get_preds(leaf.state)
137 lg.logger_mcts.info('PREDICTED VALUE FOR %d: %f', leaf.state.playerTurn, value)
138
~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in get_preds(self, state)
108 inputToModel = np.array([self.model.convertToModelInput(state)])
109
--> 110 preds = self.model.predict(inputToModel)
111 value_array = preds[0]
112 logits_array = preds[1]
~\Desktop\finantial risk\DeepReinforcementLearning-master\model.py in predict(self, x)
28
29 def predict(self, x):
---> 30 return self.model.predict(x)
31
32 def fit(self, states, targets, epochs, verbose, validation_split, batch_size):
~\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
1627 for step in data_handler.steps():
1628 callbacks.on_predict_batch_begin(step)
-> 1629 tmp_batch_outputs = self.predict_function(iterator)
1630 if data_handler.should_sync:
1631 context.async_wait()
~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in call(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
892 *args, **kwds)
893 # If we did not create any variables the trace we have is good enough.
--> 894 return self._concrete_stateful_fn._call_flat(
895 filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
896
~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1916 and executing_eagerly):
1917 # No tape is watching; skip to running the function.
-> 1918 return self._build_call_outputs(self._inference_function.call(
1919 ctx, args, cancellation_manager=cancellation_manager))
1920 forward_backward = self._select_forward_and_backward_functions(
~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
553 with _InterpolateFunctionError(self):
554 if cancellation_manager is None:
--> 555 outputs = execute.execute(
556 str(self.signature.name),
557 num_outputs=self._num_outputs,
~\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 try:
58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
UnimplementedError: The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
[[node model_5/conv2d_65/Conv2D (defined at C:\Users\Admin\Desktop\finantial risk\DeepReinforcementLearning-master\model.py:30) ]] [Op:__inference_predict_function_6283]
Function call stack:
predict_function
I have been getting this error when running the second block of code in Jupytr:
`AttributeError Traceback (most recent call last)
in ()
22
23 # create an untrained neural network objects from the config file
---> 24 current_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
25 best_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
26
/home/ubuntu/workspace/model.py in init(self, reg_const, learning_rate, input_dim, output_dim, hidden_layers)
112 self.hidden_layers = hidden_layers
113 self.num_layers = len(hidden_layers)
--> 114 self.model = self._build_model()
115
116 def residual_layer(self, input_block, filters, kernel_size):
/home/ubuntu/workspace/model.py in _build_model(self)
223 main_input = Input(shape = self.input_dim, name = 'main_input')
224
--> 225 x = self.conv_layer(main_input, self.hidden_layers[0]['filters'], self.hidden_layers[0]['kernel_size'])
226
227 if len(self.hidden_layers) > 1:
/home/ubuntu/workspace/model.py in conv_layer(self, x, filters, kernel_size)
149
150 x = BatchNormalization(axis=1)(x)
--> 151 x = LeakyReLU()(x)
152
153 return (x)
/home/ubuntu/miniconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in call(self, inputs, **kwargs)
615
616 # Actually call the layer, collecting output(s), mask(s), and shape(s).
--> 617 output = self.call(inputs, **kwargs)
618 output_mask = self.compute_mask(inputs, previous_mask)
619
/home/ubuntu/miniconda2/lib/python2.7/site-packages/keras/layers/advanced_activations.pyc in call(self, inputs)
44
45 def call(self, inputs):
---> 46 return K.relu(inputs, alpha=self.alpha)
47
48 def get_config(self):
/home/ubuntu/miniconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.pyc in relu(x, alpha, max_value)
2916 """
2917 if alpha != 0.:
-> 2918 x = tf.nn.leaky_relu(x, alpha)
2919 else:
2920 x = tf.nn.relu(x)
AttributeError: 'module' object has no attribute 'leaky_relu'`
Do you have to stop the training manually?
By stopping it manually and performig the inference with the following code:
from game import Game
from funcs import playMatchesBetweenVersions
import loggers as lg
env = Game()
playMatchesBetweenVersions(env,1,1,1,10, lg.logger_tourney, 0)
I get this error:
OSError: SavedModel file does not exist at: ./run_archive/connect4/run0001/models/version0001.h5/{saved_model.pbtxt|saved_model.pb}
Line 116 of funcs.py I'm not sure value can ever be 1 based on how you've defined it in game.py. Based on game.py function _getValue it can only ever be -1 or 0.
For a complex context, the training process is very time consumed.
It will be very helpful to add distributed feature.
It should be correct to add Dirichlet noise only once in act() (and the "adjP" continue to be used throughout the tree search), not in moveToLeaf(). The role of the noise should be "to search deeper into some branches on a whim". In the current implementation, the branches chosen are too uniformly distributed.
Hello everybody,
I'm trying to reuse this codebase for another game in which pieces can move freely on the board. At a certain point, during the first game, the method moveToLeaf()
in the class MCTS
starts looping for ever.
It seems like the condition while not currentNode.isLeaf():
is never satisfied.
Do you have any hint for finding why this issue occurs?
Thank in advance,
Fabrizio
P.S.: see my fork for full code -> https://github.com/fmicheloni/DeepReinforcementLearning
EDIT:
Here a few logs of what's happening:
2018-04-07 13:57:44,052 INFO PLAYER TURN...-1
2018-04-07 13:57:44,052 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action with highest Q + U...192
2018-04-07 13:57:44,053 INFO PLAYER TURN...-1
2018-04-07 13:57:44,053 INFO action: 190 (1)... N = 0, P = 0.209210, nu = 0.000000, adjP = 0.209210, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 102 (4)... N = 0, P = 0.196694, nu = 0.000000, adjP = 0.196694, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 118 (6)... N = 0, P = 0.182340, nu = 0.000000, adjP = 0.182340, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 122 (3)... N = 0, P = 0.205482, nu = 0.000000, adjP = 0.205482, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 123 (4)... N = 0, P = 0.206274, nu = 0.000000, adjP = 0.206274, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action with highest Q + U...190
2018-04-07 13:57:44,054 INFO PLAYER TURN...-1
2018-04-07 13:57:44,054 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action with highest Q + U...192
2018-04-07 13:57:44,055 INFO PLAYER TURN...-1
2018-04-07 13:57:44,055 INFO action: 190 (1)... N = 0, P = 0.209210, nu = 0.000000, adjP = 0.209210, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 102 (4)... N = 0, P = 0.196694, nu = 0.000000, adjP = 0.196694, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 118 (6)... N = 0, P = 0.182340, nu = 0.000000, adjP = 0.182340, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 122 (3)... N = 0, P = 0.205482, nu = 0.000000, adjP = 0.205482, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 123 (4)... N = 0, P = 0.206274, nu = 0.000000, adjP = 0.206274, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action with highest Q + U...190
2018-04-07 13:57:44,056 INFO PLAYER TURN...-1
2018-04-07 13:57:44,056 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action with highest Q + U...192
Those two actions keep looping.
In this file: MCTS.py, there is such code snippet:
if Q + U > maxQU:
maxQU = Q + U
simulationAction = action
simulationEdge = edge
lg.logger_mcts.info('action with highest Q + U...%d', simulationAction)
There will be error when "if" not hold:
UnboundLocalError: local variable 'simulationAction' referenced before assignment
I encountered such error, was it a problem in game.py, or in this MCTS.py?
Hi there, would like to ask if anyone else has had this problem. I've trained up an agent for a couple of days using the high-quality settings (higher self-play and simulation parameters etc.), but when I play test games against it I notice that at times when I miss blocking its win (it has 3-in-a-row already on a diagonal) and play somewhere else, it also ignores its own win and plays elsewhere. This may go on for more than a couple of moves and it never takes the win. I am loading the weights from my model and using act() to get the agent's moves, with tau set to 0 so it acts deterministic.
Would this be a problem with the code or is it explainable in terms of exploitation vs exploration (where the agent is confused when encountering such situations because it has never explored that avenue because it will always block 3-in-a-rows when given the opportunity)? Would there be any way to discourage this behavior apart from hard-coding a 'win-lose check' that prioritizes playing to connect 3-in-a-rows first?
hi friends
any ideas what to do about this warning?
WARNING:tensorflow:From /home/myuser/alpha2/DeepReinforcementLearning/loss.py:15: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
and how to install this?
pydot
failed to call GraphViz.Please install GraphViz (https://www.graphviz.org/) and ensure that its executables are in the $PATH
thank u :)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.