Comments (5)

tacertain commented on September 21, 2024

I just reproduced this on a bare-metal Ubuntu 20.04 machine with TF 2.16.1 and Keras 3.3.3.

from open_spiel.

lanctot commented on September 21, 2024

Hi,

This is a pretty complex setup. I'm not sure how we can help, as we don't have a setup like this to reproduce it.

Have you tried the simple program on that thread you linked, i.e. tensorflow/tensorflow#57877 (comment) ?

Did you see that CUDA support on Windows was being removed in TF? See tensorflow/tensorflow#59905. According to that thread, it should still work in WSL. It seems like you need WSL2, but you are indeed using that. So, yeah, it seems like it should work.

@tewalds: any ideas?

jthemphill commented on September 21, 2024

I don't think this is a complex setup at all! I installed a recent version of tensorflow[and-cuda], with a GPU that supports CUDA.

I made sure to use the correct dependency versions, even going so far as to track down the missing old version of tensorrt-lib, which should be on PyPI but isn't!

I then ran alpha_zero.py. Does alpha_zero.py work for you when you use it with a GPU?

I did run the failing code example in the linked issue, and it did fail in the same way. It seems to me that alpha_zero.py forks processes in a way that CUDA does not support!
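
For context on the forking concern: a CUDA context that was initialized in a parent process generally cannot be reused in a child created with fork. A rough way to check whether a script is in that situation is to look at the multiprocessing start method and at which GPU libraries the parent has already imported. The sketch below uses only the standard library. `fork_risk_report` and its `at_risk` flag are hypothetical names for illustration, not part of OpenSpiel or TensorFlow, and an import alone does not prove that CUDA was initialized.

```python
import multiprocessing as mp
import sys


def fork_risk_report(module_names=("tensorflow", "keras")):
    """Report the effective start method and which named modules are imported.

    Hypothetical helper: a heuristic for debugging, not a definitive check.
    """
    # If no start method has been fixed yet, fall back to the platform
    # default, which is the first entry of get_all_start_methods().
    start_method = (
        mp.get_start_method(allow_none=True) or mp.get_all_start_methods()[0]
    )
    # A module that is already in sys.modules would be inherited by a forked child.
    loaded = {name: name in sys.modules for name in module_names}
    # Assumption: forking after importing a GPU library is the risky pattern.
    at_risk = start_method == "fork" and any(loaded.values())
    return {"start_method": start_method, "loaded": loaded, "at_risk": at_risk}
```

Calling `fork_risk_report()` right before the worker processes are created would show whether the parent has already pulled in TensorFlow or Keras while the start method is still fork.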

lanctot commented on September 21, 2024

> I don't think this is a complex setup at all!

Well, first: OpenSpiel is not officially supported on Windows. We don't have Windows machines easily at our disposal, so we don't test things on Windows hosts and have only run things ourselves within WSL a few times. I have no clue how CUDA drivers are supported through WSL.

Second, you're using the most recent nightly versions of TF, which we do not test regularly on our CI (we only test 2.12.0, see here). Due to this, the new TF requires a specific/custom older version of tensorrt and tensorrt-lib. Maybe these don't come with CUDA support, or are not getting built properly? 🤷

Third, TF recently stopped supporting CUDA on Windows. That should not affect you because you are running within WSL, but I wonder if, in the process of disabling CUDA on native Windows, something else in the code chain is causing the CUDA issues in your setup. (I realize this is unlikely.)

Then there's a thread that might be related because it's a forking actor ... ?

That feels like a pretty complex setup to me. We'll do our best to help, but without being able to mimic your setup, it will be difficult.

> Does alpha_zero.py work for you when you use it with a GPU?

I believe @tewalds might be the only one who has run our Python AlphaZero using CUDA; IIRC it was almost certainly on a native Linux machine, and I believe it was about 3 years ago. 😅

I don't know of any instances of people running the Python TF AlphaZero using CUDA within WSL. I barely know one person who has used it with CUDA, and it was long ago. The more common use is the C++ LibTorch version on native Linux machines, because it's faster.

I'd like to know if it currently runs on a Linux machine with CUDA. @tewalds, is it easy for you to try on your desktop? Can you tell me if you run into the same issue?

tacertain commented on September 21, 2024

I have been updating the Python AlphaZero to Keras 3, and I'm running into the same thing. I don't think that it's a Windows problem. There's some challenge with Keras 3 and forking. There are a few forum posts about it, but nothing definitive, e.g. https://stackoverflow.com/questions/33748750/cuda-error-initialization-error-when-using-parallel-in-python. I did try changing the start method to "spawn" in spawn.py, but that didn't fix it.
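
For reference, this is roughly what selecting the "spawn" start method looks like with the standard library alone. It is a minimal sketch, not the code in OpenSpiel's spawn.py; `child` and `run_in_spawned_child` are hypothetical names, and as noted above, switching to spawn did not resolve the issue here.

```python
import multiprocessing as mp


def child(queue, x):
    # Must be a module-level function so it can be pickled for the child.
    # Sends a small result back to the parent through the queue.
    queue.put(x * x)


def run_in_spawned_child(x, timeout=30):
    """Run child() in a fresh interpreter created with the spawn start method."""
    # A local context leaves the global default start method untouched.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=child, args=(queue, x))
    proc.start()
    # Read the result before joining so the child is not blocked on the queue.
    result = queue.get(timeout=timeout)
    proc.join()
    return result
```

Under spawn the child re-imports the main module, so `run_in_spawned_child` should be called from inside an `if __name__ == "__main__":` guard.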

I might look into whether it's possible to lazy load the core keras libraries. It didn't seem super easy, but I don't have a lot of ideas. It's not super obvious what's going on, because, for example, this sample code runs correctly:

import collections
import datetime
import functools
import itertools
import json
import os
import random
import sys
import tempfile
import time
import traceback

import numpy as np

import keras.callbacks as kcb

from open_spiel.python.algorithms import mcts
from open_spiel.python.algorithms.alpha_zero import evaluator as evaluator_lib
from open_spiel.python.algorithms.alpha_zero import model as model_lib
import pyspiel
from open_spiel.python.utils import data_logger
from open_spiel.python.utils import file_logger
from open_spiel.python.utils import spawn
from open_spiel.python.utils import stats

import tensorflow as tf


def child(queue):
    # Runs in the child process. The queue argument is supplied by
    # spawn.Process and is unused here.
    print("Child GPUs: " + str(tf.config.list_physical_devices('GPU')))

    # Repeat a small matmul to exercise TensorFlow in the child.
    for i in range(20):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)

        print("Child: " + str(c))


# Create the child process, then run the same workload in the parent.
child_proc = spawn.Process(child)
print("Parent GPUs: " + str(tf.config.list_physical_devices('GPU')))

for i in range(20):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)

    print("Parent: " + str(c))
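
The lazy-loading idea mentioned above could be sketched along these lines: a small proxy that imports the real module only when one of its attributes is first used, so a parent process that never touches it never loads it. This is an illustration under assumptions, not a tested fix for Keras 3. `LazyModule` is a hypothetical name, and `json` stands in for a heavy library such as keras.

```python
import importlib


class LazyModule:
    """Defer importing a module until one of its attributes is first used."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Import on first use and cache the module object afterwards.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself, so
        # _name and _module (set in __init__) never reach this method.
        return getattr(self._load(), attr)


# Stand-in for illustration: "json" plays the role of the heavy library.
heavy = LazyModule("json")
```

Nothing is imported when `heavy` is created; the first call such as `heavy.dumps(...)` triggers the import in whichever process makes it. The standard library's `importlib.util.LazyLoader` offers a similar mechanism.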
