Dear When using: tensorboard --logdir=worker_0:'./

A3C-Doom: worker_0 plot is lost about deeprl-agents HOT 10 CLOSED

IbrahimSobh commented on September 26, 2024

A3C-Doom: worker_0 plot is lost

from deeprl-agents.

Comments (10)

DMTSource commented on September 26, 2024 1

I have been having an issue that I think might be related to, or might explain, what you are seeing. When running the A3C Doom code on 8-16 CPUs, I noticed that sometimes one or two threads would not launch, or at least were failing silently. The thread name, when printed from worker.work or other places, showed up repeated or mixed up, so I would have two "1" threads or some other repeated index in the worker name.

When extending the code for personal use I ran into this repeatedly, as my environment was much lighter and workers started up faster. To fix the issue I added a simple sleep(0.5) (see code below). Now, when I print the thread name from worker.work, I no longer see repeated items, and the mixed-up print locations and other bugs caused by the issue are gone.

It appears workers were spooling up too quickly in my case and repeating or mixing up their context? I'm used to the Scoop and Multiprocessing modules, so I am unsure whether this is a common issue with global scope and Threading.

    for worker in workers:
        worker_work = lambda: worker.work(max_episode_length,gamma,sess,coord,saver)
        t = threading.Thread(target=(worker_work))
        t.start()
        worker_threads.append(t)
        sleep(0.5)
    coord.join(worker_threads)
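One possible explanation for the repeated worker names, offered as an assumption rather than something confirmed in this thread, is Python's late-binding closures: the lambda looks up `worker` when the thread actually runs, not when the lambda is created, so a slow-starting thread can pick up the next worker in the loop. The sketch below uses a hypothetical stand-in `Worker` class (not the notebook's) to show the effect and one way to avoid it by handing the bound method and its arguments directly to `threading.Thread`.

```python
import threading


class Worker:
    """Hypothetical stand-in for the notebook's Worker class."""

    def __init__(self, name):
        self.name = name

    def work(self, seen):
        # Record which worker actually ran.
        seen.append(self.name)


workers = [Worker("worker_%d" % i) for i in range(4)]

# Late binding: each lambda reads the loop variable when it is called,
# so after the loop finishes they all refer to the last worker.
late = []
for worker in workers:
    late.append(lambda: worker.name)
late_names = [f() for f in late]

# Passing the bound method and its arguments to Thread fixes the target
# at creation time, so no sleep is needed to keep the workers distinct.
seen = []
threads = []
for worker in workers:
    t = threading.Thread(target=worker.work, args=(seen,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
bound_names = sorted(seen)
```

If this is indeed the cause, the sleep(0.5) works only because it gives each thread time to call the lambda before the loop variable moves on; binding the worker explicitly would not depend on timing.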


awjuliani commented on September 26, 2024 1

Thanks for the suggestion, DMTSource! I have incorporated the sleep line into the notebook.


DMTSource commented on September 26, 2024 1

So the plots are showing and all workers are alive and well with their respective names... until what appears to be step ~10 (saving time).

It looks like there is trouble with the model saving, as the "master" worker makes it to that point and then shuts down. If the crash is truly silent, you might want to add some print statements to see how far the code gets once it reaches the block that saves the model.

My ignorant guess is that something like ffmpeg is causing the trouble, as it's a very external tool to this code, and saving checkpoint files should be trivial for TensorFlow regardless of the system. You could try commenting out the gif generation code if that is the case. I had trouble getting a working ffmpeg installation on my system (Ubuntu 14.04) the first time I ran the code, as some versions threw errors. But I was able to get it working once the issue was identified.
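As an alternative to scattering print statements, a thread's target can be wrapped so that any exception is recorded and printed with its traceback. This is only a minimal sketch: `run_logged`, `errors`, and `failing_save` are hypothetical names introduced for illustration and are not part of the A3C-Doom code.

```python
import threading
import traceback

errors = []  # hypothetical shared list collecting (thread name, exception)


def run_logged(name, fn, *args):
    """Run fn(*args); record and print any exception instead of losing it."""
    try:
        fn(*args)
    except Exception as exc:
        errors.append((name, exc))
        traceback.print_exc()


def failing_save():
    # Stand-in for the model/gif saving block; raises to simulate a failure.
    raise RuntimeError("gif encoder failed")


t = threading.Thread(target=run_logged, args=("worker_0", failing_save))
t.start()
t.join()
```

After the thread has been joined, `errors` holds the failure for inspection, which makes it easier to tell whether the saving block or something else stopped the worker.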


awjuliani commented on September 26, 2024

Hi Ibrahim,

Did you encounter an error in the worker_0 process? Otherwise it should certainly be plotting.


IbrahimSobh commented on September 26, 2024

I have added the sleep as follows:

    worker_threads = []
    for worker in workers:
        worker_work = lambda: worker.work(max_episode_length,gamma,sess,coord,saver)
        t = threading.Thread(target=(worker_work))
        t.start()
        worker_threads.append(t)
        sleep(0.5)  # here it is
    coord.join(worker_threads)

worker_0 is shown in orange.

However, worker_0 seems to stop very early.
Moreover, the model and frames are not saved (the code for them is based on worker_0).

Then I used worker_1 instead of worker_0 for saving the model and frames, but then worker_1 stopped.

I tried adding a sleep to the code that is responsible for saving the model and frames, but the problem remained.

Regards


IbrahimSobh commented on September 26, 2024

Possible cause:

After removing model and gif saving code, things worked fine!

Removed code:

                    if self.name == 'worker_1' and episode_count % 25 == 0:
                        time_per_step = 0.05
                        images = np.array(episode_frames)
                        make_gif(images,'./frames/image'+str(episode_count)+'.gif',
                            duration=len(images)*time_per_step,true_image=True,salience=False)
                    if episode_count % 250 == 0 and self.name == 'worker_1':
                        saver.save(sess,self.model_path+'/model-'+str(episode_count)+'.cptk')
                        print ("Saved Model")

Figure: (all threads are there) I think I have some error in saving!

Any clue?

Regards


IbrahimSobh commented on September 26, 2024

Thanks DMTSource

I can save the model but not the frames!

Is there any other way to save GIFs or video?


awjuliani commented on September 26, 2024

Hi Ibrahim,

Are you sure that you have both moviepy and ffmpeg installed? You will also need to ensure the version of imageio you have is 1.6.


IbrahimSobh commented on September 26, 2024

Hi Arthur

imageio
print imageio.__version__
2.1.2

ffmpeg -version
ffmpeg version N-80901-gfebc862 Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
configuration: --extra-libs=-ldl --prefix=/opt/ffmpeg --mandir=/usr/share/man --enable-avresample --disable-debug --enable-nonfree --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-decoder=amrnb --disable-decoder=amrwb --enable-libpulse --enable-libfreetype --enable-gnutls --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-libvorbis --enable-libmp3lame --enable-libopus --enable-libvpx --enable-libspeex --enable-libass --enable-avisynth --enable-libsoxr --enable-libxvid --enable-libvidstab
libavutil 55. 28.100 / 55. 28.100
libavcodec 57. 48.101 / 57. 48.101
libavformat 57. 41.100 / 57. 41.100
libavdevice 57. 0.102 / 57. 0.102
libavfilter 6. 47.100 / 6. 47.100
libavresample 3. 0. 0 / 3. 0. 0
libswscale 4. 1.100 / 4. 1.100
libswresample 2. 1.100 / 2. 1.100
libpostproc 54. 0.100 / 54. 0.100


awjuliani commented on September 26, 2024

I believe you will need imageio 1.6, and not 2.1 in order for the gif generation to work. Unfortunately they changed the encoder in 2.1 and broke the gif code I used. If you have a fix that works with 2.1, I would be happy to incorporate it.
