Comments (5)
I just reproduced this on a bare-metal Ubuntu 20.04 machine with TF 2.16.1 and Keras 3.3.3.
from open_spiel.
Hi,
This is a pretty complex setup. I'm not sure how we can help, as we don't have a setup like this to reproduce it.
Have you tried the simple program on that thread you linked, i.e. tensorflow/tensorflow#57877 (comment) ?
Did you see that CUDA support on Windows was being removed in TF? See tensorflow/tensorflow#59905. According to that thread, it should still work in WSL. It seems you need WSL2, but you are indeed using that, so it seems like it should work.
@tewalds: any ideas?
I don't think this is a complex setup at all! I installed a recent version of tensorflow[and-cuda], with a GPU that supports CUDA.
I made sure to use the correct dependency versions, even going so far as to track down the missing old version of tensorrt-lib, which should be on PyPI but isn't!
I then ran alpha_zero.py. Does alpha_zero.py work for you when you use it with a GPU?
I did run the failing code example in the linked issue, and it did fail in the same way. It seems to me that alpha_zero.py forks processes in a way that CUDA does not support!
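For anyone following along, here is a minimal sketch of the general pattern that is usually suggested when CUDA and forking collide: create the child with the "spawn" start method and import the GPU framework inside the child, so no CUDA state is inherited from the parent. This is not how alpha_zero.py is written, and it is untested against it. The names `worker` and `run_worker` are made up for illustration, and `json` is only a stand-in so the sketch runs without TensorFlow installed.

```python
import multiprocessing as mp


def worker(result_queue):
    # Import the framework inside the child so that any CUDA context is
    # created in this process instead of being inherited from the parent.
    # Assumption: `json` stands in for `import tensorflow as tf` here.
    import json as framework
    result_queue.put(framework.__name__)


def run_worker(start_method="spawn"):
    """Run `worker` in a child process created with `start_method`."""
    # A context object scopes the start method to this call site instead
    # of changing the global default for the whole program.
    ctx = mp.get_context(start_method)
    result_queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(result_queue,))
    proc.start()
    result = result_queue.get(timeout=30)
    proc.join()
    return ctx.get_start_method(), result
```

With "spawn", `run_worker()` has to be called from a script under an `if __name__ == "__main__":` guard, because the child re-imports the main module before it runs `worker`.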
> I don't think this is a complex setup at all!
Well, first: OpenSpiel is not officially supported on Windows. We don't have Windows machines easily at our disposal, so we don't test things on Windows hosts and have only run things ourselves within WSL a few times. I have no clue how CUDA drivers are supported through WSL.
Second, you're using the most recent nightly versions of TF, which we are not testing on our CI regularly (we're only testing 2.12.0, see here). On top of that, the new TF requires a specific/custom older version of tensorrt and tensorrt-lib. Maybe these don't come with CUDA support, or are not getting built properly? 🤷
Third, TF recently stopped supporting CUDA on Windows. That should not affect you since you're running within WSL, but I wonder if, in the process of disabling CUDA on native Windows, something else in the code chain is causing the CUDA issues in your setup. (I realize this is unlikely.)
Then there's a thread that might be related because it's a forking actor ... ?
That feels like a pretty complex setup to me. We'll do our best to help, but without being able to mimic your setup, it will be difficult.
> Does alpha_zero.py work for you when you use it with a GPU?
I believe @tewalds might be the only one who has run our Python AlphaZero using CUDA; IIRC it was almost certainly on a native Linux machine, and I believe it was about 3 years ago. 😅
I don't know of any instances of people running the Python TF AlphaZero using CUDA within WSL. I barely know one person who has used it with CUDA, and it was long ago. The more common use is the C++ LibTorch version on native Linux machines, because it's faster.
I'd like to know if it currently runs on a Linux machine with CUDA. @tewalds, is it easy for you to try on your desktop? Can you tell me if you run into the same issue?
I have been updating the Python AlphaZero to Keras 3, and I'm running into the same thing, so I don't think it's a Windows problem. There seems to be some problem with Keras 3 and forking. There are a few forum posts about it, but nothing definitive, e.g. https://stackoverflow.com/questions/33748750/cuda-error-initialization-error-when-using-parallel-in-python. I did try changing the start method to "spawn" in spawn.py, but that didn't fix it.
I might look into whether it's possible to lazy-load the core Keras libraries. It didn't seem easy, but I don't have many other ideas. It's not obvious what's going on because, for example, this sample code runs correctly:
import collections
import datetime
import functools
import itertools
import json
import os
import random
import sys
import tempfile
import time
import traceback

import numpy as np
import keras.callbacks as kcb

from open_spiel.python.algorithms import mcts
from open_spiel.python.algorithms.alpha_zero import evaluator as evaluator_lib
from open_spiel.python.algorithms.alpha_zero import model as model_lib
import pyspiel
from open_spiel.python.utils import data_logger
from open_spiel.python.utils import file_logger
from open_spiel.python.utils import spawn
from open_spiel.python.utils import stats

import tensorflow as tf


def child(queue):
    # Runs in the child process started by spawn.Process below.
    print("Child GPUs: " + str(tf.config.list_physical_devices('GPU')))
    for i in range(20):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)
        print("Child: " + str(c))


# Start the child, then run the same matmul loop in the parent.
child_proc = spawn.Process(child)
print("Parent GPUs: " + str(tf.config.list_physical_devices('GPU')))
for i in range(20):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    print("Parent: " + str(c))
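To make the lazy-loading idea concrete, here is a rough sketch of one way it could look: a small proxy that defers the real import until the first attribute access, so a parent process that never touches the module never loads it. `LazyModule` is a hypothetical helper, not something in OpenSpiel or Keras, and I have not checked whether Keras or TensorFlow behave well behind a proxy like this.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Import once, then reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # __getattr__ only runs when normal lookup fails. Refuse the two
        # internal names so a half-initialised proxy cannot recurse.
        if attr in ("_name", "_module"):
            raise AttributeError(attr)
        return getattr(self._load(), attr)


# Stand-in usage with a stdlib module; the intended use would be
# something like LazyModule("keras"), which is an assumption here.
lazy_json = LazyModule("json")
print(lazy_json.dumps({"loaded": True}))
```

The import only happens at the `lazy_json.dumps(...)` call, so in a forking setup each child would trigger its own import the first time it uses the module.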