
Comments (15)

fjxmlzn commented on September 3, 2024

I am not sure why you see these errors.

Could you please post here:

  1. The complete error log
  2. How you install the Python environment and the packages
  3. The list of the installed Python packages and versions

So that I can reproduce these errors and debug them?
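One way to collect items 2 and 3 is a short script like the sketch below. It is only an illustration, not part of DoppelGANger: the function names `installed_packages` and `environment_report` are made up here, and the script falls back to `pkg_resources` on older interpreters where `importlib.metadata` is unavailable.

```python
import platform
import sys

try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:
    importlib_metadata = None


def installed_packages():
    """Return a list of (name, version) pairs, sorted by package name."""
    if importlib_metadata is not None:
        pairs = set()
        for dist in importlib_metadata.distributions():
            name = dist.metadata["Name"]
            if name:  # skip distributions with unreadable metadata
                pairs.add((name, dist.version))
    else:
        # Fallback for older interpreters (requires setuptools).
        import pkg_resources
        pairs = {(d.project_name, d.version) for d in pkg_resources.working_set}
    return sorted(pairs, key=lambda pair: pair[0].lower())


def environment_report():
    """Build a plain-text report: interpreter version, platform, packages."""
    lines = ["Python {} on {}".format(sys.version.split()[0], platform.platform())]
    lines.extend("{}=={}".format(name, version) for name, version in installed_packages())
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running it inside the activated environment and pasting the output into the thread covers the interpreter version and the package list in one go.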

Thanks!

from doppelganger.

dstan11 commented on September 3, 2024

I created a notebook with the same content as main.py in the DoppelGANger/DoppelGANger/example_training folder:

if __name__ == "__main__":
    from gan_task import GANTask
    from config import config
    from gpu_task_scheduler.gpu_task_scheduler import GPUTaskScheduler
    scheduler = GPUTaskScheduler(config=config, gpu_task_class=GANTask)
    scheduler.start()

  1. error log
    error.txt
  2. python version 3.5.2
    packages.txt

fjxmlzn commented on September 3, 2024

Thanks. Can you try executing it directly instead of from a Jupyter notebook?

dstan11 commented on September 3, 2024

Yes. I ran python main.py from the DoppelGANger/DoppelGANger/example_training folder in a terminal, and no error came up. However, the program is still running after 3 hours, and I have no idea how long it is supposed to take. By the way, GPU performance didn't change after I started the program.

Thanks.

fjxmlzn commented on September 3, 2024

You can look at worker.log in the subfolders of the results folder to follow the training progress.

If the code isn't using the GPU, then

  1. Make sure that you installed tensorflow-gpu instead of tensorflow.
  2. Check worker.log for any error messages about loading the CUDA libraries.
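The second check can be scripted. The sketch below walks a results folder, opens every worker.log it finds, and collects lines that mention CUDA-related names. The function name `find_cuda_messages` and the keyword list are assumptions chosen for illustration; the actual wording of the messages depends on the TensorFlow and CUDA versions in use.

```python
import os

# Assumed keywords; adjust them to whatever the logs actually contain.
KEYWORDS = ("cuda", "cudnn", "cudart", "cublas")


def find_cuda_messages(result_root, log_name="worker.log"):
    """Return (log_path, line_number, line) for each line mentioning a keyword."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(result_root):
        if log_name not in filenames:
            continue
        log_path = os.path.join(dirpath, log_name)
        with open(log_path, errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(keyword in lowered for keyword in KEYWORDS):
                    hits.append((log_path, line_number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # "../results/" is the result_root_folder value quoted in this thread.
    for path, number, text in find_cuda_messages("../results/"):
        print("{}:{}: {}".format(path, number, text))
```

An empty result does not prove that the GPU is in use; it only means no line in the logs matched the keywords.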

dstan11 commented on September 3, 2024

Sorry to disturb you again. I couldn't find the results folder. Can you tell me where it is?

Thanks!

fjxmlzn commented on September 3, 2024

It should be at the same level as the example_training folder. The location is configured in config.py: "result_root_folder": "../results/"

dstan11 commented on September 3, 2024

Thank you for the reply! I updated python version to 3.7 and tensorflow-gpu version to 1.1.4. Now the program works.

fjxmlzn commented on September 3, 2024

Great!!

dstan11 commented on September 3, 2024

Now I get a new error message:

Traceback (most recent call last):
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\Scripts\start_gpu_task-script.py", line 33, in <module>
    sys.exit(load_entry_point('GPUTaskScheduler', 'console_scripts', 'start_gpu_task')())
  File "f:\github clone folder\gputask\gputaskscheduler\gpu_task_scheduler\start_gpu_task.py", line 23, in main
    worker.main()
  File "F:\Github clone folder\DoppelGANger\DoppelGANger\example_training\gan_task.py", line 124, in main
    gan.train(restore=restore)
  File "..\gan\doppelganger.py", line 918, in train
    self.visualize(epoch_id, batch_id, global_id)
  File "..\gan\doppelganger.py", line 801, in visualize
    sub1(features, attributes, lengths, None, None, None, "free")
  File "..\gan\doppelganger.py", line 749, in sub1
    ground_truth_lengths=ground_truth_lengths)
  File "<__array_function__ internals>", line 6, in savez
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 645, in savez
    _savez(file, args, kwds, False)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 743, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\site-packages\numpy\lib\npyio.py", line 119, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "E:\Users\shand\anaconda3\envs\DoppelGANger2\lib\zipfile.py", line 1240, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '../results/aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,\\sample\\epoch_id-0,batch_id-199,global_id-199,type-free,samples.npz'
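The traceback ends in io.open raising FileNotFoundError while numpy.savez tries to create the .npz file. When a file is opened for writing, this error commonly means that one of the parent directories does not exist. The sketch below is one way to find out which part of such a path is missing; the helper name `first_missing_component` is made up for this illustration, and it does not rule out other causes of the error.

```python
import os


def first_missing_component(path):
    """Return the topmost ancestor of `path` (or `path` itself) that does
    not exist, or None if `path` exists."""
    current = os.path.abspath(path)
    missing = None
    while not os.path.exists(current):
        missing = current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return missing


if __name__ == "__main__":
    # Hypothetical path for illustration; substitute the path from the error.
    target = os.path.join("..", "results", "some-run", "sample", "samples.npz")
    print(first_missing_component(target))
```

If the function returns a directory rather than the file itself, that directory was never created, and the run that was supposed to create it is the place to look.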

fjxmlzn commented on September 3, 2024

Could you please try changing "result_root_folder": "../results/" in config.py to "result_root_folder": "..\\results\\"? You are on Windows, where the directory separator is \. Then delete the results folder and run again.

Let me know if it doesn't work.
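As a side note, the separator does not have to be written by hand. A minimal sketch, using only the standard library: `os.path.join` inserts the separator of the platform it runs on, and the `ntpath` and `posixpath` modules expose the Windows and POSIX behavior explicitly, so both results can be compared on any machine. The helper name `results_root` is hypothetical.

```python
import ntpath
import os
import posixpath


def results_root(path_module=os.path):
    """Build the relative results path with the separator of `path_module`.

    The trailing empty component makes join() append a final separator.
    """
    return path_module.join("..", "results", "")


if __name__ == "__main__":
    print(repr(results_root(ntpath)))     # Windows-style separators
    print(repr(results_root(posixpath)))  # POSIX-style separators
    print(repr(results_root()))           # whatever this platform uses
```

Whether config.py can use such a computed value instead of a string literal is an assumption; since config.py is an ordinary Python module, it should in principle be possible.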

dstan11 commented on September 3, 2024

It doesn't work. I get the same error message.

fjxmlzn commented on September 3, 2024

I think another potential problem is that Windows does not allow , in filenames. You can replace the , by adding test_config_string_separator="-" (or another character) to the scheduler_config section of config.py. (See https://github.com/fjxmlzn/GPUTaskScheduler for a detailed explanation.)
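As a rough sketch, the suggested change to config.py might look like the fragment below. Only the two keys quoted in this thread are shown. Placing result_root_folder inside scheduler_config is an assumption, and every other key and section of the real file is omitted.

```python
# Fragment of config.py (sketch only; all other keys and sections omitted).
config = {
    "scheduler_config": {
        # Value quoted earlier in this thread; its placement in this
        # section is an assumption.
        "result_root_folder": "../results/",
        # Separator used between config entries in the result folder name,
        # replacing the default "," as suggested above.
        "test_config_string_separator": "-",
    },
}
```

After the change, delete the existing results folder before rerunning so that folders named with the old separator are not reused.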

But I just want to double-check if there are other issues: could you please show me the directory structure of F:\Github clone folder\DoppelGANger\DoppelGANger\ after this error happens?

dstan11 commented on September 3, 2024

F:\Github clone folder\DoppelGANger\DoppelGANger\
folder
F:\Github clone folder\DoppelGANger\DoppelGANger\results
results
F:\Github clone folder\DoppelGANger\DoppelGANger\results\aux_disc-False,dataset-google,epoch-400,epoch_checkpoint_freq-1,extra_checkpoint_freq-5,run-0,sample_len-1,self_norm-False,
3

fjxmlzn commented on September 3, 2024

Thanks. Could you please email me the current code and worker.log so that I can check them? My address is zinanl AT andrew.cmu.edu
