deepreinforcementlearning's People

Contributors

dtfoster

deepreinforcementlearning's Issues

Not working for Python 3

For some reasons (#1), and because of the use of reload (now importlib.reload), the notebook doesn't work under Python 3.
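
A minimal compatibility shim can bridge the two versions; this is a sketch and assumes the notebook only uses the bare reload builtin (as run.ipynb does with reload(lg) and reload(config)):

try:
    reload                          # Python 2: reload is a builtin
except NameError:
    from importlib import reload    # Python 3: it lives in importlib

import json
reload(json)                        # works under both versions once the shim has run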

What's your environment and hardware?

Hello, thanks for your excellent work.
I would like to know what your environment is.
CUDA 8.0, or something else?
And what hardware did you use to run this code?
An NVIDIA GTX 1080 Ti?

Missing requirements / pipfile

Running this example has the following prerequisites:

Graphviz

Example install for Ubuntu 16.04

sudo apt install graphviz

Python 2.7

It also requires the following packages to be installed and working. Here is an example Pipfile which can be used with pipenv to get up and running.

Copy this to a file named 'Pipfile' in the DeepReinforcementLearning/ directory:

[[source]]

url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[dev-packages]


[packages]

jupyter = "*"
numpy = "*"
Keras = "*"
tensorflow = "*"
matplotlib = "*"
pydot = "*"

[requires]
python_version = "2.7"

Then install the required modules from that Pipfile using pipenv and run Jupyter:

pip install --user pipenv
cd DeepReinforcementLearning/
pipenv install
pipenv shell 
jupyter notebook

Other steps may be required on other systems; this was the minimum required to get running on my Ubuntu 16.04 machine.

ImportError: Failed to import pydot

When I open run.ipynb in Jupyter, in block 2 (In: 2) I get this message:


ImportError Traceback (most recent call last)
in
39 #copy the config file to the run folder
40 copyfile('./config.py', run_folder + 'config.py')
---> 41 plot_model(current_NN.model, to_file=run_folder + 'models/model.png', show_shapes = True)
42
43 print('\n')

c:\users\вова\appdata\local\programs\python\python36\lib\site-packages\keras\utils\vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
130 'LR' creates a horizontal plot.
131 """
--> 132 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
133 _, extension = os.path.splitext(to_file)
134 if not extension:

c:\users\вова\appdata\local\programs\python\python36\lib\site-packages\keras\utils\vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
53 from ..models import Sequential
54
---> 55 _check_pydot()
56 dot = pydot.Dot()
57 dot.set('rankdir', rankdir)

c:\users\вова\appdata\local\programs\python\python36\lib\site-packages\keras\utils\vis_utils.py in _check_pydot()
18 if pydot is None:
19 raise ImportError(
---> 20 'Failed to import pydot. '
21 'Please install pydot. '
22 'For example with pip install pydot.')

ImportError: Failed to import pydot. Please install pydot. For example with pip install pydot.

Reinstalling Keras and pydot at the required versions does not solve the problem.
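
One workaround, as a hedged sketch: if the plot_model call in the notebook is only there for visualisation (current_NN and run_folder are the notebook's existing variables), it can be guarded so a missing pydot/graphviz installation does not stop the run:

# Skip the model diagram instead of failing when pydot/graphviz are unavailable.
try:
    from keras.utils import plot_model
    plot_model(current_NN.model, to_file=run_folder + 'models/model.png', show_shapes=True)
except (ImportError, OSError) as e:
    print('Skipping model plot:', e)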

HIDDEN_CNN_LAYERS from config.py

Could you explain why these layers are needed and why there are six of them? What would the number of hidden layers be for a board of a different size?
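
For reference, a sketch of the relevant config.py entry (the 75-filter, 4x4-kernel values are taken from the residual-layer issue below; treat the exact numbers as an assumption). Each dict defines one hidden convolutional/residual block, so the length of the list, six here, sets the depth of the network independently of the board size:

# Sketch of HIDDEN_CNN_LAYERS in config.py; one dict per hidden conv/residual block.
HIDDEN_CNN_LAYERS = [
    {'filters': 75, 'kernel_size': (4, 4)},
    {'filters': 75, 'kernel_size': (4, 4)},
    {'filters': 75, 'kernel_size': (4, 4)},
    {'filters': 75, 'kernel_size': (4, 4)},
    {'filters': 75, 'kernel_size': (4, 4)},
    {'filters': 75, 'kernel_size': (4, 4)},
]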

Incorrect Implementation of Residual Layer

I was comparing your Residual Architecture to the actual Residual Architecture as shown in this cheat sheet.

Your Implementation

  • input
  • Conv2D 75x4x4
  • BatchNormalization
  • Add input
  • LeakyReLU

Actual Implementation

  • input
  • Conv2D 256x3x3
  • BatchNormalization
  • ReLU
  • Conv2D 256x3x3
  • BatchNormalization
  • Add input
  • ReLU

I'm just going to put this here so that people know the difference in implementation.
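
For concreteness, here is a minimal Keras sketch of the two-convolution residual block described in the "actual implementation" list above, with some assumptions: the functional API and channels-first format used elsewhere in this repo, LeakyReLU in place of plain ReLU as this repo does, and the repo's 75-filter/4x4-kernel sizes rather than AlphaZero's 256x3x3:

from keras.layers import Conv2D, BatchNormalization, LeakyReLU, add
from keras import regularizers

def residual_block(input_block, filters=75, kernel_size=(4, 4), reg_const=0.0001):
    # first conv -> batch norm -> activation
    x = Conv2D(filters, kernel_size, data_format='channels_first', padding='same',
               use_bias=False, activation='linear',
               kernel_regularizer=regularizers.l2(reg_const))(input_block)
    x = BatchNormalization(axis=1)(x)
    x = LeakyReLU()(x)
    # second conv -> batch norm, then the skip connection, then the final activation
    x = Conv2D(filters, kernel_size, data_format='channels_first', padding='same',
               use_bias=False, activation='linear',
               kernel_regularizer=regularizers.l2(reg_const))(x)
    x = BatchNormalization(axis=1)(x)
    x = add([input_block, x])
    x = LeakyReLU()(x)
    return x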

Aren't tournament plays always identical?

Hi!

Thanks for your implementation, I have a question:
Aren't tournament plays always the same, since you set tau to 0 (so fully deterministic play) and each episode resets the MCTS tree in the playMatches function's for loop? That means each time the players start with zero knowledge (an empty tree), do a deterministic search, and then act deterministically. With no stochasticity, the games should look exactly the same. Am I missing something?
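
A toy sketch of the argument (assuming pi is the MCTS visit-count distribution returned by act(), as in this repo):

import numpy as np

def choose_action(pi, tau):
    if tau == 0:
        # fully deterministic: always the most-visited action
        return int(np.argmax(pi))
    # otherwise sample in proportion to pi ** (1 / tau)
    probs = np.power(pi, 1.0 / tau)
    probs = probs / probs.sum()
    return int(np.random.choice(len(pi), p=probs))

pi = np.array([0.1, 0.6, 0.3])
print(choose_action(pi, 0))   # always 1, so with an empty tree each game, play repeats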

I also wonder how DeepMind did it in the original paper (they played 400 games in the tournament); do you have any insight? The Stanford implementation doesn't reset the MCTS tree between tournament games, so the players accumulate knowledge and hence (possibly) play different, more informed games each time. This makes sense.

Thanks for your attention!

About learning rate and loss decrease

Sorry to bother you! I am making an AlphaZero implementation similar to yours, also for the Connect4 board game. The training went smoothly at first; however, after 70+ iterations the loss can no longer decrease. I manually lowered the learning rate from 1e-3 to 1e-5, but the loss still gradually increases. Then I came across your blog about your implementation, and I find it very similar to mine. Have you ever encountered this in your experiments? Hopefully you can offer me some advice :)

Is this only suited to two-player board games?

How about multi-player, complex-strategy games such as card games, which have a variable set of actions?
I am writing a board game in which players can draw cards and play cards to score points. I am stuck on the action_size and state_size parameters, because I can't fill them in the way a board game does, i.e. rows x columns. Could anyone give me some hints?
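
One common workaround, offered as a hedged suggestion rather than something in this repo: keep action_size fixed at an upper bound on the number of distinct actions and mask out the ones that are illegal in the current state, much as the Connect4 code already does with allowedActions:

import numpy as np

ACTION_SIZE = 200                      # assumption: upper bound on distinct card actions
logits = np.random.randn(ACTION_SIZE)  # raw policy-head output
allowed = np.zeros(ACTION_SIZE, dtype=bool)
allowed[[3, 17, 42]] = True            # indices of the actions legal in this state

masked = np.where(allowed, logits, -np.inf)     # illegal actions get probability 0
probs = np.exp(masked - masked[allowed].max())
probs = probs / probs.sum()
print(probs[allowed])                  # a distribution over just the legal actions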

Issue with concatenation of string and integer while pickling

Very minor issue with the code, main.py:103:

pickle.dump( memory, open( run_folder + "memory/memory" + iteration + ".p", "wb" ) )

should probably have been:

pickle.dump( memory, open( run_folder + "memory/memory" + str(iteration) + ".p", "wb" ) )

Comparison with AlphaZero results

Thank you very much for this great source of information!!

Have any experiments been done to see whether results comparable with DeepMind's chess and shogi publications can be achieved? I.e., is this implementation indeed feature complete and as described in the Nature papers?

run.ipynb has issues

Hello. I really appreciate your work and am trying to run the run.ipynb file on my machine. There is an error shown on the GitHub commit, and I cannot seem to get it to work on my machine either.

OSError: Unable to open file (unable to open file: name = './run_archive/connect4/run0001/models/version0001.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I cannot play matches between versions:


OSError Traceback (most recent call last)
in ()
4
5 env = Game()
----> 6 playMatchesBetweenVersions(env, 1, 1, 1, 10, lg.logger_tourney, 0)

~/DeepReinForcementLearning/DeepReinforcementLearning/funcs.py in playMatchesBetweenVersions(env, run_version, player1version, player2version, EPISODES, logger, turns_until_tau0, goes_first)
19
20 if player1version > 0:
---> 21 player1_network = player1_NN.read(env.name, run_version, player1version)
22 player1_NN.model.set_weights(player1_network.get_weights())
23 player1 = Agent('player1', env.state_size, env.action_size, config.MCTS_SIMS, config.CPUCT, player1_NN)

~/DeepReinForcementLearning/DeepReinforcementLearning/model.py in read(self, game, run_number, version)
37
38 def read(self, game, run_number, version):
---> 39 return load_model( run_archive_folder + game + '/run' + str(run_number).zfill(4) + "/models/version" + "{0:0>4}".format(version) + '.h5', custom_objects={'softmax_cross_entropy_with_logits': softmax_cross_entropy_with_logits})
40
41 def printWeightAverages(self):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/models.py in load_model(filepath, custom_objects, compile)
235 return custom_objects[obj]
236 return obj
--> 237 with h5py.File(filepath, mode='r') as f:
238 # instantiate model
239 model_config = f.attrs.get('model_config')

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h5py/_hl/files.py in init(self, name, mode, driver, libver, userblock_size, swmr, **kwds)
267 with phil:
268 fapl = make_fapl(driver, libver, **kwds)
--> 269 fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
270
271 if swmr_support:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
97 if swmr and swmr_support:
98 flags |= h5f.ACC_SWMR_READ
---> 99 fid = h5f.open(name, flags, fapl=fapl)
100 elif mode == 'r+':
101 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (unable to open file: name = './run_archive/connect4/run0001/models/version0001.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

name 'reload' is not defined

Hi. I cloned your repo and it worked until I ran "2. Now run this block to start the learning process" in run.ipynb, where I got this error:

NameError Traceback (most recent call last)
in ()
53
54 iteration += 1
---> 55 reload(lg)
56 reload(config)
57

NameError: name 'reload' is not defined

I'm not sure where to go from here as I'm not familiar with the codebase. I would really appreciate it if you could give some guidance. Thanks!

Interested in implementing other games

I was going through the code and thinking of the changes that would be needed for a more complex game (chess, for example...), where ideally the allowed actions would be a list of tuples (piece position, target position), and I have a slight suspicion that the agent code has to be altered in some way. Any input on this?
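
A hedged sketch of one way to keep the existing flat action_size interface while representing moves as (piece position, target position) tuples: encode each pair into a single integer inside game.py and decode it when the action is applied (the 8x8 board, and ignoring promotions, are assumptions for illustration):

BOARD_SQUARES = 64                         # assumption: an 8x8 chess board

def encode_action(from_sq, to_sq):
    # flat index in [0, 64*64) so the policy head stays a single fixed-size vector
    return from_sq * BOARD_SQUARES + to_sq

def decode_action(index):
    return divmod(index, BOARD_SQUARES)    # (from_sq, to_sq)

action_size = BOARD_SQUARES * BOARD_SQUARES
print(action_size, decode_action(encode_action(12, 28)))   # 4096 (12, 28)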

Performance against humans?

Hi! Have you evaluated the win rate of the trained model against humans? I would like to know what it is like for humans to play against the trained model. Is it difficult for humans to beat?

unsupported operand type(s) for *: 'NoneType' and 'float'


AttributeError Traceback (most recent call last)
C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
51 try:
---> 52 return getattr(obj, method)(*args, **kwds)
53

AttributeError: 'NoneType' object has no attribute 'round'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
in ()
4
5 env = Game()
----> 6 playMatchesBetweenVersions(env, 1, -1, 1, 10, lg.logger_tourney, 0)

~\Documents\GitHub\DeepReinforcementLearning-master\funcs.py in playMatchesBetweenVersions(env, run_version, player1version, player2version, EPISODES, logger, turns_until_tau0, goes_first)
33 player2 = Agent('player2', env.state_size, env.action_size, config.MCTS_SIMS, config.CPUCT, player2_NN)
34
---> 35 scores, memory, points, sp_scores = playMatches(player1, player2, EPISODES, logger, turns_until_tau0, None, goes_first)
36
37 return (scores, memory, points, sp_scores)

~\Documents\GitHub\DeepReinforcementLearning-master\funcs.py in playMatches(player1, player2, EPISODES, logger, turns_until_tau0, memory, goes_first)
96 for r in range(env.grid_shape[0]):
97 logger.info(['----' if x == 0 else '{0:.2f}'.format(np.round(x,2)) for x in pi[env.grid_shape[1]*r : (env.grid_shape[1]*r + env.grid_shape[1])]])
---> 98 logger.info('MCTS perceived value for %s: %f', state.pieces[str(state.playerTurn)] ,np.round(MCTS_value,2))
99 logger.info('NN perceived value for %s: %f', state.pieces[str(state.playerTurn)] ,np.round(NN_value,2))
100 logger.info('====================')

C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in round_(a, decimals, out)
2849
2850 """
-> 2851 return around(a, decimals=decimals, out=out)
2852
2853

C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in around(a, decimals, out)
2835
2836 """
-> 2837 return _wrapfunc(a, 'round', decimals=decimals, out=out)
2838
2839

C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
60 # a downstream library like 'pandas'.
61 except (AttributeError, TypeError):
---> 62 return _wrapit(obj, method, *args, **kwds)
63
64

C:\ProgramData\Anaconda2\envs\keras36\lib\site-packages\numpy\core\fromnumeric.py in _wrapit(obj, method, *args, **kwds)
40 except AttributeError:
41 wrap = None
---> 42 result = getattr(asarray(obj), method)(*args, **kwds)
43 if wrap:
44 if not isinstance(result, mu.ndarray):

TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

Hi. I am running this in Python 3.6.

I managed to get past lots of errors so far with the help of this thread, but I've got no idea about this one.

Can anybody help, please?
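
A hedged guess at the cause, based on the call on the "----> 6" line above: player1version is -1, which (if I read funcs.py correctly) creates a human player whose act() returns None for MCTS_value and NN_value, so np.round(None, 2) in the logging call fails. Guarding the value before rounding sidesteps it:

import numpy as np

MCTS_value = None                    # what a human player's act() appears to return
value_to_log = np.round(MCTS_value, 2) if MCTS_value is not None else float('nan')
print(value_to_log)                  # nan instead of the TypeError above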

How to switch NHWC to NCHW

I keep getting this issue and have made no progress at all; please help. I've already done a lot of searching on Google and tried reshaping the 'x' in model.py's predict().
The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
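
A hedged note rather than a repo-confirmed fix: Conv2D with data_format="channels_first" (NCHW) is not supported by CPU-only TensorFlow builds, so one option is to switch the layers in model.py to channels_last (with BatchNormalization(axis=-1)) and move the plane axis of the input to the end, for example:

import numpy as np

# The repo feeds the network (batch, planes, rows, cols); channels_last layers
# expect (batch, rows, cols, planes) instead.
x_nchw = np.zeros((1, 2, 6, 7), dtype=np.float32)   # assumed Connect4 input shape
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))
print(x_nhwc.shape)                                  # (1, 6, 7, 2)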

Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.

Hi everyone!
I am trying to run the second block in run.ipynb (start the learning process), but I get the following error. I have Python 3.5 installed, and both pydot and graphviz are installed in the proper environment. Thanks in advance for your help.


FileNotFoundError Traceback (most recent call last)
~/anaconda2/envs/py35/lib/python3.5/site-packages/pydot.py in create(self, prog, format)
1877 shell=False,
-> 1878 stderr=subprocess.PIPE, stdout=subprocess.PIPE)
1879 except OSError as e:

~/anaconda2/envs/py35/lib/python3.5/subprocess.py in init(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds)
675 errread, errwrite,
--> 676 restore_signals, start_new_session)
677 except:

~/anaconda2/envs/py35/lib/python3.5/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1288 err_msg += ': ' + repr(orig_executable)
-> 1289 raise child_exception_type(errno_num, err_msg)
1290 raise child_exception_type(err_msg)

FileNotFoundError: [Errno 2] No such file or directory: 'dot'

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in _check_pydot()
26 # to check the pydot/graphviz installation.
---> 27 pydot.Dot.create(pydot.Dot())
28 except Exception:

~/anaconda2/envs/py35/lib/python3.5/site-packages/pydot.py in create(self, prog, format)
1882 '"{prog}" not found in path.'.format(
-> 1883 prog=prog))
1884 else:

Exception: "dot" not found in path.

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
in ()
39 #copy the config file to the run folder
40 copyfile('./config.py', run_folder + 'config.py')
---> 41 plot_model(current_NN.model, to_file=run_folder + 'models/model.png', show_shapes = True)
42
43 print('\n')

~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in plot_model(model, to_file, show_shapes, show_layer_names, rankdir)
133 'LR' creates a horizontal plot.
134 """
--> 135 dot = model_to_dot(model, show_shapes, show_layer_names, rankdir)
136 _, extension = os.path.splitext(to_file)
137 if not extension:

~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir)
54 from ..models import Sequential
55
---> 56 _check_pydot()
57 dot = pydot.Dot()
58 dot.set('rankdir', rankdir)

~/anaconda2/envs/py35/lib/python3.5/site-packages/keras/utils/vis_utils.py in _check_pydot()
29 # pydot raises a generic Exception here,
30 # so no specific class can be caught.
---> 31 raise ImportError('Failed to import pydot. You must install pydot'
32 ' and graphviz for pydotprint to work.')
33

ImportError: Failed to import pydot. You must install pydot and graphviz for pydotprint to work.

help

Is it possible to adapt the game.py module to chess, and for that code to work?
I have a pretty big assignment for which I thought I might use this code as a feature, and I need to know if it's possible.
The code works for Connect4, but sometimes I think it gets stuck in a never-ending loop. Any notes on that?
Another thing: is it possible to change the number of layers in the neural network, or the number of units (neurons)?
Thank you.

Does not work with tensorflow-gpu

After installing tensorflow-gpu, errors appeared in the model.py module (line 138):

	def conv_layer(self, x, filters, kernel_size):
		x = Conv2D(
		filters = filters
		, kernel_size = kernel_size
		, data_format="channels_first"
		, padding = 'same'
		, use_bias=False
		, activation='linear'
		, kernel_regularizer = regularizers.l2(self.reg_const)
		)(x)
		x = BatchNormalization(axis=1)(x)

ValueError: Shape must be rank 1 but is rank 0 for 'batch_normalization_1/cond/Reshape_4' (op: 'Reshape') with input shapes: [1,75,1,1], [].
File "C:\DRL\model.py", line 150, in conv_layer
x = BatchNormalization(axis=1)(x)
File "C:\DRL\model.py", line 225, in _build_model
x = self.conv_layer(main_input, self.hidden_layers[0]['filters'], self.hidden_layers[0]['kernel_size'])
File "C:\DRL\model.py", line 114, in init
self.model = self._build_model()
File "C:\DRL\Untitled-1.py", line 67, in
current_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)

environment challenge

I'm struggling to get the right version of TensorFlow to run this. It seems to want Python 2.7.
Can you confirm that this works with TensorFlow 2.x?

Is backfilling in MCTS done right?

In backfilling we're updating only the edges that were chosen during the simulation. But since we can reach a state from two different states via two different actions, shouldn't we also update the other part of the tree from which we could have come to the same state? That's my understanding of:

Action value Q is updated to track the mean of all evaluations V in the subtree below that action
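
For reference, a minimal sketch of the edge-based backfill being discussed (names are assumptions loosely following MCTS.py): only the edges on the traversed path, the breadcrumbs, are updated, which is exactly the behaviour the question is about; handling transpositions would require sharing statistics between identical states reached by different paths.

def back_fill(leaf_value, leaf_player_turn, breadcrumbs):
    # leaf_value is from the perspective of the player to move at the leaf
    for edge in breadcrumbs:
        direction = 1 if edge['playerTurn'] == leaf_player_turn else -1
        edge['N'] += 1
        edge['W'] += leaf_value * direction
        edge['Q'] = edge['W'] / edge['N']

path = [{'playerTurn': 1, 'N': 0, 'W': 0.0, 'Q': 0.0},
        {'playerTurn': -1, 'N': 0, 'W': 0.0, 'Q': 0.0}]
back_fill(0.5, -1, path)
print(path)   # only the two traversed edges are updated; the rest of the tree is untouched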

ValueError: Initializer for variable conv2d_2/kernel/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or conditional, use a lambda as the initializer.

When I run run.ipynb in Jupyter, I get the following:
ValueError Traceback (most recent call last)
in
22
23 # create an untrained neural network objects from the config file
---> 24 current_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
25 best_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
26

~/DeepReinforcementLearning-master/model.py in init(self, reg_const, learning_rate, input_dim, output_dim, hidden_layers)
112 self.hidden_layers = hidden_layers
113 self.num_layers = len(hidden_layers)
--> 114 self.model = self._build_model()
115
116 def residual_layer(self, input_block, filters, kernel_size):

~/DeepReinforcementLearning-master/model.py in _build_model(self)
223 main_input = Input(shape = self.input_dim, name = 'main_input')
224
--> 225 x = self.conv_layer(main_input, self.hidden_layers[0]['filters'], self.hidden_layers[0]['kernel_size'])
226
227 if len(self.hidden_layers) > 1:

~/DeepReinforcementLearning-master/model.py in conv_layer(self, x, filters, kernel_size)
146 , activation='linear'
147 , kernel_regularizer = regularizers.l2(self.reg_const)
--> 148 )(x)
149
150 x = BatchNormalization(axis=1)(x)

~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/engine/base_layer.py in call(self, inputs, **kwargs)
429 'You can build it manually via: '
430 'layer.build(batch_input_shape)')
--> 431 self.build(unpack_singleton(input_shapes))
432 self.built = True
433

~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/layers/convolutional.py in build(self, input_shape)
139 name='kernel',
140 regularizer=self.kernel_regularizer,
--> 141 constraint=self.kernel_constraint)
142 if self.use_bias:
143 self.bias = self.add_weight(shape=(self.filters,),

~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name + ' call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/engine/base_layer.py in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint)
250 dtype=dtype,
251 name=name,
--> 252 constraint=constraint)
253 if regularizer is not None:
254 with K.name_scope('weight_regularizer'):

~/miniconda3/envs/py36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in variable(value, dtype, name, constraint)
400 v._uses_learning_phase = False
401 return v
--> 402 v = tf.Variable(value, dtype=tf.as_dtype(dtype), name=name)
403 if isinstance(value, np.ndarray):
404 v._keras_shape = value.shape

~/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/variables.py in init(self, initial_value, trainable, collections, validate_shape, caching_device, name, variable_def, dtype, expected_shape, import_scope, constraint)
257 dtype=dtype,
258 expected_shape=expected_shape,
--> 259 constraint=constraint)
260
261 def repr(self):

~/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/variables.py in _init_from_args(self, initial_value, trainable, collections, validate_shape, caching_device, name, dtype, expected_shape, constraint)
385 "construct, such as a loop or conditional. When creating a "
386 "variable inside a loop or conditional, use a lambda as the "
--> 387 "initializer." % name)
388 # pylint: enable=protected-access
389 shape = (self._initial_value.get_shape()

ValueError: Initializer for variable conv2d_2/kernel/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or conditional, use a lambda as the initializer.

working code

Can you please publish code that works?

I have this error:

('Failed to import pydot. You must pip install pydot and install graphviz (https://graphviz.gitlab.io/download/), ', 'for pydotprint to work.')

ITERATION NUMBER 1
BEST PLAYER VERSION 0
SELF PLAYING 30 EPISODES...
1


UnimplementedError Traceback (most recent call last)
in
63 ######## SELF PLAY ########
64 print('SELF PLAYING ' + str(config.EPISODES) + ' EPISODES...')
---> 65 _, memory, _, _ = playMatches(best_player, best_player, config.EPISODES, lg.logger_main, turns_until_tau0 = config.TURNS_UNTIL_TAU0, memory = memory)
66 print('\n')
67

~\Desktop\finantial risk\DeepReinforcementLearning-master\funcs.py in playMatches(player1, player2, EPISODES, logger, turns_until_tau0, memory, goes_first)
84 #### Run the MCTS algo and return an action
85 if turn < turns_until_tau0:
---> 86 action, pi, MCTS_value, NN_value = players[state.playerTurn]['agent'].act(state, 1)
87 else:
88 action, pi, MCTS_value, NN_value = players[state.playerTurn]['agent'].act(state, 0)

~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in act(self, state, tau)
84 lg.logger_mcts.info('****** SIMULATION %d ', sim + 1)
85 lg.logger_mcts.info('*********************')
---> 86 self.simulate()
87
88 #### get action values

~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in simulate(self)
66
67 ##### EVALUATE THE LEAF NODE
---> 68 value, breadcrumbs = self.evaluateLeaf(leaf, value, done, breadcrumbs)
69
70 ##### BACKFILL THE VALUE THROUGH THE TREE

~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in evaluateLeaf(self, leaf, value, done, breadcrumbs)
134 if done == 0:
135
--> 136 value, probs, allowedActions = self.get_preds(leaf.state)
137 lg.logger_mcts.info('PREDICTED VALUE FOR %d: %f', leaf.state.playerTurn, value)
138

~\Desktop\finantial risk\DeepReinforcementLearning-master\agent.py in get_preds(self, state)
108 inputToModel = np.array([self.model.convertToModelInput(state)])
109
--> 110 preds = self.model.predict(inputToModel)
111 value_array = preds[0]
112 logits_array = preds[1]

~\Desktop\finantial risk\DeepReinforcementLearning-master\model.py in predict(self, x)
28
29 def predict(self, x):
---> 30 return self.model.predict(x)
31
32 def fit(self, states, targets, epochs, verbose, validation_split, batch_size):

~\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
1627 for step in data_handler.steps():
1628 callbacks.on_predict_batch_begin(step)
-> 1629 tmp_batch_outputs = self.predict_function(iterator)
1630 if data_handler.should_sync:
1631 context.async_wait()

~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in call(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()

~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
892 *args, **kwds)
893 # If we did not create any variables the trace we have is good enough.
--> 894 return self._concrete_stateful_fn._call_flat(
895 filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
896

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1916 and executing_eagerly):
1917 # No tape is watching; skip to running the function.
-> 1918 return self._build_call_outputs(self._inference_function.call(
1919 ctx, args, cancellation_manager=cancellation_manager))
1920 forward_backward = self._select_forward_and_backward_functions(

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
553 with _InterpolateFunctionError(self):
554 if cancellation_manager is None:
--> 555 outputs = execute.execute(
556 str(self.signature.name),
557 num_outputs=self._num_outputs,

~\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 try:
58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:

UnimplementedError: The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
[[node model_5/conv2d_65/Conv2D (defined at C:\Users\Admin\Desktop\finantial risk\DeepReinforcementLearning-master\model.py:30) ]] [Op:__inference_predict_function_6283]

Function call stack:
predict_function

AttributeError: 'module' object has no attribute 'leaky_relu'

I have been getting this error when running the second block of code in Jupyter:

AttributeError Traceback (most recent call last)
in ()
22
23 # create an untrained neural network objects from the config file
---> 24 current_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
25 best_NN = Residual_CNN(config.REG_CONST, config.LEARNING_RATE, (2,) + env.grid_shape, env.action_size, config.HIDDEN_CNN_LAYERS)
26

/home/ubuntu/workspace/model.py in init(self, reg_const, learning_rate, input_dim, output_dim, hidden_layers)
112 self.hidden_layers = hidden_layers
113 self.num_layers = len(hidden_layers)
--> 114 self.model = self._build_model()
115
116 def residual_layer(self, input_block, filters, kernel_size):

/home/ubuntu/workspace/model.py in _build_model(self)
223 main_input = Input(shape = self.input_dim, name = 'main_input')
224
--> 225 x = self.conv_layer(main_input, self.hidden_layers[0]['filters'], self.hidden_layers[0]['kernel_size'])
226
227 if len(self.hidden_layers) > 1:

/home/ubuntu/workspace/model.py in conv_layer(self, x, filters, kernel_size)
149
150 x = BatchNormalization(axis=1)(x)
--> 151 x = LeakyReLU()(x)
152
153 return (x)

/home/ubuntu/miniconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in call(self, inputs, **kwargs)
615
616 # Actually call the layer, collecting output(s), mask(s), and shape(s).
--> 617 output = self.call(inputs, **kwargs)
618 output_mask = self.compute_mask(inputs, previous_mask)
619

/home/ubuntu/miniconda2/lib/python2.7/site-packages/keras/layers/advanced_activations.pyc in call(self, inputs)
44
45 def call(self, inputs):
---> 46 return K.relu(inputs, alpha=self.alpha)
47
48 def get_config(self):

/home/ubuntu/miniconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.pyc in relu(x, alpha, max_value)
2916 """
2917 if alpha != 0.:
-> 2918 x = tf.nn.leaky_relu(x, alpha)
2919 else:
2920 x = tf.nn.relu(x)

AttributeError: 'module' object has no attribute 'leaky_relu'
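
A hedged note: tf.nn.leaky_relu only appeared around TensorFlow 1.4, so this AttributeError usually means an older TensorFlow is paired with a newer Keras that expects it. Checking the installed version makes the mismatch visible:

import tensorflow as tf

print(tf.__version__)
print(hasattr(tf.nn, 'leaky_relu'))   # False on TensorFlow builds older than ~1.4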

Error with inference

Do you have to stop the training manually?

By stopping it manually and performing the inference with the following code:
from game import Game
from funcs import playMatchesBetweenVersions
import loggers as lg

env = Game()
playMatchesBetweenVersions(env,1,1,1,10, lg.logger_tourney, 0)

I get this error:
OSError: SavedModel file does not exist at: ./run_archive/connect4/run0001/models/version0001.h5/{saved_model.pbtxt|saved_model.pb}

Value can never be 1

On line 116 of funcs.py, I'm not sure value can ever be 1, based on how you've defined it in game.py. According to game.py's _getValue function it can only ever be -1 or 0.

Incorrect implementation of adding Dirichlet noise

It would be correct to add Dirichlet noise only once, in act() (with the resulting "adjP" then used throughout the tree search), not in moveToLeaf(). The role of the noise should be "to search deeper into some branches on a whim". In the current implementation, the branches chosen are too uniformly distributed.
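
A hedged sketch of the fix being suggested (the names, and the EPSILON/ALPHA constants standing in for the repo's config values, are assumptions): mix the Dirichlet noise into the root priors once per act() call and leave moveToLeaf() noise-free:

import numpy as np

EPSILON = 0.2   # assumption: mixing weight, playing the role of config.EPSILON
ALPHA = 0.8     # assumption: Dirichlet concentration, playing the role of config.ALPHA

def add_root_noise(root_priors):
    # root_priors: the prior probabilities P of the root's edges
    nu = np.random.dirichlet([ALPHA] * len(root_priors))
    return (1 - EPSILON) * np.asarray(root_priors) + EPSILON * nu

print(add_root_noise([0.25, 0.25, 0.25, 0.25]))   # the adjP reused for every simulation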

MCTS moveToLeaf() loops forever

Hello everybody,

I'm trying to reuse this codebase for another game in which pieces can move freely on the board. At a certain point during the first game, the method moveToLeaf() in the MCTS class starts looping forever.

It seems like the loop condition while not currentNode.isLeaf(): never becomes false.

Do you have any hints for finding out why this issue occurs?

Thanks in advance,

Fabrizio

P.S.: see my fork for full code -> https://github.com/fmicheloni/DeepReinforcementLearning


EDIT:

Here are a few logs of what's happening:

2018-04-07 13:57:44,052 INFO PLAYER TURN...-1
2018-04-07 13:57:44,052 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action with highest Q + U...192

2018-04-07 13:57:44,053 INFO PLAYER TURN...-1
2018-04-07 13:57:44,053 INFO action: 190 (1)... N = 0, P = 0.209210, nu = 0.000000, adjP = 0.209210, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 102 (4)... N = 0, P = 0.196694, nu = 0.000000, adjP = 0.196694, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 118 (6)... N = 0, P = 0.182340, nu = 0.000000, adjP = 0.182340, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 122 (3)... N = 0, P = 0.205482, nu = 0.000000, adjP = 0.205482, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 123 (4)... N = 0, P = 0.206274, nu = 0.000000, adjP = 0.206274, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action with highest Q + U...190

2018-04-07 13:57:44,054 INFO PLAYER TURN...-1
2018-04-07 13:57:44,054 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action with highest Q + U...192

2018-04-07 13:57:44,055 INFO PLAYER TURN...-1
2018-04-07 13:57:44,055 INFO action: 190 (1)... N = 0, P = 0.209210, nu = 0.000000, adjP = 0.209210, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 102 (4)... N = 0, P = 0.196694, nu = 0.000000, adjP = 0.196694, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 118 (6)... N = 0, P = 0.182340, nu = 0.000000, adjP = 0.182340, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 122 (3)... N = 0, P = 0.205482, nu = 0.000000, adjP = 0.205482, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 123 (4)... N = 0, P = 0.206274, nu = 0.000000, adjP = 0.206274, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action with highest Q + U...190

2018-04-07 13:57:44,056 INFO PLAYER TURN...-1
2018-04-07 13:57:44,056 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action with highest Q + U...192

Those two actions keep looping.
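
One hedged explanation and workaround, not taken from this repo: when pieces can move freely, the search can revisit an earlier position, so the node graph contains cycles and isLeaf() is never reached. Tracking visited state ids within a single simulation makes the descent terminate; a toy sketch:

def move_to_leaf(root, best_edge):
    # best_edge(node) -> (edge, child) picks the child with the highest Q + U,
    # mirroring what MCTS.moveToLeaf() does (names here are assumptions).
    current, breadcrumbs, visited = root, [], {root['id']}
    while current['edges']:                    # i.e. not a leaf
        edge, child = best_edge(current)
        if child['id'] in visited:             # cycle detected: stop descending
            break
        visited.add(child['id'])
        breadcrumbs.append(edge)
        current = child
    return current, breadcrumbs

# two states pointing at each other, reproducing the alternation in the logs above
a = {'id': 'A', 'edges': True}
b = {'id': 'B', 'edges': True}
leaf, path = move_to_leaf(a, lambda node: ('edge', b if node is a else a))
print(leaf['id'], len(path))                   # terminates instead of looping forever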

UnboundLocalError: local variable 'simulationAction' referenced before assignment

In MCTS.py there is this code snippet:

if Q + U > maxQU:
    maxQU = Q + U
    simulationAction = action
    simulationEdge = edge

lg.logger_mcts.info('action with highest Q + U...%d', simulationAction)

There will be an error when the "if" condition never holds:

UnboundLocalError: local variable 'simulationAction' referenced before assignment

I encountered this error; was it a problem in game.py, or in MCTS.py?
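
A hedged diagnostic sketch with toy values, not the repo's actual MCTS.py: simulationAction stays unbound either when the node has no edges at all (game.py returned no allowed actions for that state) or when every Q + U comparison fails, for example because the network produced a NaN. Initialising the running maximum and checking afterwards turns the silent failure into an explicit one:

edges = [(3, 0.12), (4, 0.27)]      # toy (action, Q+U) pairs standing in for currentNode.edges
maxQU = -float('inf')
simulationAction = None
for action, qu in edges:
    if qu > maxQU:
        maxQU, simulationAction = qu, action
if simulationAction is None:
    raise ValueError('no selectable edge: check allowedActions in game.py')
print(simulationAction)             # -> 4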

Weird behavior from trained Connect-4 agent

Hi there, I would like to ask if anyone else has had this problem. I've trained an agent for a couple of days using the high-quality settings (higher self-play and simulation parameters, etc.), but when I play test games against it I notice that at times, when I miss blocking its win (it already has 3-in-a-row on a diagonal) and play somewhere else, it also ignores its own win and plays elsewhere. This may go on for more than a couple of moves and it never takes the win. I am loading the weights from my model and using act() to get the agent's moves, with tau set to 0 so it acts deterministically.

Would this be a problem with the code, or is it explainable in terms of exploitation vs. exploration (the agent is confused when encountering such situations because it has never explored that avenue, since it always blocks 3-in-a-rows when given the opportunity)? Would there be any way to discourage this behavior apart from hard-coding a 'win-lose check' that prioritizes completing its own 3-in-a-rows first?

Couple of issues: GraphViz and TF

Hi friends,

Any ideas what to do about this warning?

WARNING:tensorflow:From /home/myuser/alpha2/DeepReinforcementLearning/loss.py:15: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.

And how do I install this?
pydot failed to call GraphViz. Please install GraphViz (https://www.graphviz.org/) and ensure that its executables are in the $PATH

thank u :)
