cdeep3m-docker's People

Contributors

crispycrafter

Forkers

jurgenkriel n1kt0

cdeep3m-docker's Issues

prediction failed (octave error)

First of all, thanks for sharing the Docker build.

I've been trying to test cdeep3m with Docker.
Training (and retraining from a pre-trained model) with my own dataset worked fine.
However, I get an Octave error when predicting the boundary map with the trained model.
This is what the program printed:
[error msg]
$ docker-compose up
Creating network "cdeep3m-docker_default" with the default driver
Creating cdeep3m-docker_cdeep3m_1 ... done
Attaching to cdeep3m-docker_cdeep3m_1
cdeep3m_1 | octave: X11 DISPLAY environment variable not set
cdeep3m_1 | octave: disabling GUI features
cdeep3m_1 | Starting Image Augmentation
cdeep3m_1 | Check image size of:
cdeep3m_1 | /data/images/roi9
cdeep3m_1 | Reading file: /data/images/roi9/roi09_0001.png
cdeep3m_1 | z_blocks =
cdeep3m_1 |
cdeep3m_1 | 1 64
cdeep3m_1 |
cdeep3m_1 | panic: panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | Segmentation fault -- stopping myself...
cdeep3m_1 | attempting to save variables to 'octave-workspace'...
cdeep3m_1 | /home/cdeep3m/runprediction.sh: line 124: 13 Aborted (core dumped) DefDataPackages.m "$images" "$augimages"
cdeep3m_1 | ERROR, a non-zero exit code (134) was received from: DefDataPackages.m "/data/images/roi9" "/data/predictout/my_25k/roi9/augimages"
cdeep3m-docker_cdeep3m_1 exited with c

I googled it, and the error seems to be caused by Octave.
DefDataPackages.m appears to have done its job properly (my guess), but Octave threw the error after DefDataPackages finished.
Has anybody else experienced the same error, and how can I solve it?
Thanks.
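
A note on the exit code: POSIX shells report a process killed by signal N as 128 + N, so 134 corresponds to signal 6 (SIGABRT), which matches the "Aborted (core dumped)" line in the log. The sketch below decodes such codes. The function name describe_exit_code is made up for illustration, and it only covers the common shell conventions, not anything specific to cdeep3m.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell exit status using the common POSIX shell conventions."""
    if code == 0:
        return "success"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        # Shells report "terminated by signal N" as 128 + N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "command-specific error"


if __name__ == "__main__":
    # Exit codes that appear in the logs on this page.
    for code in (127, 134, 137):
        print(code, describe_exit_code(code))
```

Knowing that the process aborted itself (rather than being killed from outside) at least narrows the search to Octave or the libraries it loads.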

nvidia error

Installed all the cuda drivers and docker-nvidia, but still get this error.

/home/cdeep3m/trainworker.sh: line 99: nvidia-smi: command not found
cdeep3m_1 | ERROR unable to get count of GPU(s). Is nvidia-smi working?
cdeep3m_1 | ERROR, a non-zero exit code (4) was received from: trainworker.sh --numiterations 10000
Is it because I have cuda-10 and ubuntu 18.04 installed on my system?

Running sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi shows that docker-nvidia is working successfully.
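
One thing that may be worth checking: the test command passes --runtime=nvidia explicitly, whereas the failing container is started through docker-compose (note the cdeep3m_1 prefix). nvidia-smi is normally made available inside a container by the NVIDIA runtime, so "command not found" could mean the cdeep3m container was started without it. Below is a minimal sketch of a check to run inside the container, assuming `nvidia-smi -L` prints one "GPU N: ..." line per device. The helper names are hypothetical.

```python
import shutil
import subprocess


def count_gpus(listing: str) -> int:
    """Count GPUs in the text printed by `nvidia-smi -L` (one 'GPU N: ...' line each)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def gpu_count_or_reason() -> str:
    """Return a short human-readable GPU status for the current environment."""
    # A missing binary usually points at the container runtime, not at CUDA itself.
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not on PATH (was the container started with the NVIDIA runtime?)"
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return f"nvidia-smi failed: {result.stderr.strip()}"
    return f"{count_gpus(result.stdout)} GPU(s) visible"


if __name__ == "__main__":
    print(gpu_count_or_reason())
```

If the binary is missing only when the container is launched via docker-compose, the host CUDA version is probably not the cause.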

/train/predictout30k/1fm not a directory

Issue found when running the following command. Any advice?

docker run -it cdeep3m:v0.0.1 /train/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out  /home/cdeep3m/cdeep3m-1.6.2/mito_testsample/testset/ /train/predictout30k

octave: X11 DISPLAY environment variable not set
octave: disabling GUI features
Starting Image Augmentation
Check image size of:
/home/cdeep3m/cdeep3m-1.6.2/mito_testsample/testset/
Reading file: /home/cdeep3m/cdeep3m-1.6.2/mito_testsample/testset/images.081.png
z_blocks =

   1   5

Start up worker to generate packages to process
Start up worker to run prediction on packages
Start up worker to run post processing on packages

To see progress run the following command in another window:

tail -f /train/predictout30k/logs/*.log
octave: X11 DISPLAY environment variable not set
octave: disabling GUI features
/train/predictout30k/1fm not a directory
Please use: EnsemblePredictions ./inputdir1 ./inputdir2 ./inputdir3 ./outputdir
ERROR file found. Something went wrong
ERROR, a non-zero exit code (127) received from PreprocessPackage.m 001 01 1fm 1
8

CreateTrainJob command not found

Runtraining.sh needs to locate a folder with augmented data (augdata) and build the trained model in another folder (trainout). When I include the path to these folders as follows:
sudo docker run -it cdeep:v0.0.1 --numiterations 10000 --gpu 0 ~/cdeep3m-docker/augdata ~/cdeep3m-docker/trainout
I get the following error:
./runtraining.sh: line 127: CreateTrainJob.m: command not found
Error, a non-zero exit code (127) was received from: CreateTrainJob.m "/home/jurgen/cdeep3m-docker/augdata" "/home/jurgen/cdeep3m-docker/trainout" "/home/jurgen/cdeep3m-docker/augdata"

Am I just specifying the path to the training data incorrectly?
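
Two observations, offered tentatively. Exit code 127 is the shell's "command not found" status and it refers to CreateTrainJob.m itself, so the immediate failure looks like a PATH problem inside the container rather than a problem with the data paths. Separately, the shell expands ~ on the host, and those host folders are not visible inside the container unless they are bind-mounted with -v. The sketch below builds such a command; the container-side paths /data/augdata and /data/trainout and the function name are assumptions for illustration, not something the image is known to require.

```python
from pathlib import Path


def docker_train_command(image, host_augdata, host_trainout, iterations=10000, gpu=0):
    """Build a `docker run` argv that bind-mounts two host folders into the container."""
    # Hypothetical container-side mount points; adjust to whatever the image expects.
    mounts = {
        Path(host_augdata).expanduser(): "/data/augdata",
        Path(host_trainout).expanduser(): "/data/trainout",
    }
    argv = ["docker", "run", "-it", "--rm"]
    for host, container in mounts.items():
        argv += ["-v", f"{host}:{container}"]
    # Same argument order as the command in the issue, but with container paths.
    argv += [image, "--numiterations", str(iterations), "--gpu", str(gpu)]
    argv += list(mounts.values())
    return argv


if __name__ == "__main__":
    cmd = docker_train_command(
        "cdeep:v0.0.1", "~/cdeep3m-docker/augdata", "~/cdeep3m-docker/trainout"
    )
    print(" ".join(cmd))
```

The script arguments then name paths that exist inside the container, while the data itself stays on the host.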

Unsupported configuration option

Hi, I am new to cdeep3m-docker and have run into an error quite early. Attempting to run docker-compose build leads to this error:
ERROR: The Compose file './docker-compose.yml' is invalid because:
Unsupported config option for services.cdeep3m: 'runtime'

Any help would be appreciated.
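
A possible cause, stated as an assumption rather than a diagnosis: in the legacy versioned Compose file formats, the `runtime` service key was introduced in format 2.3, so an older docker-compose, or a file declaring a different `version:`, may reject it. The sketch below reads the declared version with a regular expression and applies that rule. Both function names are hypothetical, and newer Compose releases may behave differently.

```python
import re


def declared_version(compose_text: str):
    """Return the top-level `version:` value as (major, minor), or None if absent."""
    match = re.search(
        r"""^version:\s*['"]?(\d+)(?:\.(\d+))?['"]?\s*$""",
        compose_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def legacy_format_allows_runtime(version) -> bool:
    # Assumption: among the legacy versioned formats, only 2.3 and 2.4 accept `runtime`.
    return version in {(2, 3), (2, 4)}


if __name__ == "__main__":
    with open("docker-compose.yml") as handle:
        version = declared_version(handle.read())
    print(version, legacy_format_allows_runtime(version))
```

Checking `docker-compose version` alongside this helps tell an outdated client apart from a mismatched file format.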

Machine freezes because of running out of memory.

Hi,
I am testing the software but am stuck at the training phase: the machine hangs completely after allocating all the memory (128 GB of RAM, 32 cores, and 1 M60).
After restricting the memory in docker-compose to 90 GB it seems to work, but when it crashes it throws a warning like:

Warning: unable to close filehandle properly: Cannot allocate memory during global destruction.

And after a while this:

cdeep3m_1  | ERROR: caffe had a non zero exit code: 134
cdeep3m_1  | /home/cdeep3m/caffetrain.sh: line 166:   100 Aborted                 (core dumped) GLOG_log_dir=$log_dir caffe.bin train --solver=$model_dir/solver.prototxt --gpu $gpu $snapshot_opts > "${model_dir}/log/out.log" 2>&1
cdeep3m_1  | ERROR: caffe had a non zero exit code: 137
cdeep3m_1  | /home/cdeep3m/caffetrain.sh: line 166:   127 Killed                  GLOG_log_dir=$log_dir caffe.bin train --solver=$model_dir/solver.prototxt --gpu $gpu $snapshot_opts > "${model_dir}/log/out.log" 2>&1

GPU looks like:

nvidia-smi
Mon Apr  8 14:01:47 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.107      Driver Version: 410.107      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:06:00.0 Off |                  Off |
| 32%   36C    P0    36W / 120W |    262MiB /  8129MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 00000000:07:00.0 Off |                  Off |
| 32%   27C    P8    14W / 120W |     11MiB /  8129MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     30767      C   caffe.bin                                    109MiB |
|    0     30793      C   caffe.bin                                    109MiB |
+-----------------------------------------------------------------------------+
Apr  8 14:01:06 opskvm01 kernel: Memory cgroup stats for /docker/6e765d2d36b931a1188c2c1f93552068f2d68d46e0060e11986265dd5fa83e0d: cache:93406836KB rss:1472KB rss_huge:0KB mapped_file:88703160KB swap:393296KB inactive_anon:4703640KB active_anon:88704632KB inactive_file:0KB active_file:0KB unevictable:0KB
Apr  8 14:01:06 opskvm01 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Apr  8 14:01:06 opskvm01 kernel: [29973] 23446 29973     4545      322      14       99             0 runtraining.sh
Apr  8 14:01:06 opskvm01 kernel: [30186] 23446 30186     4516      324      14       71             0 trainworker.sh
Apr  8 14:01:06 opskvm01 kernel: [30203] 23446 30203    11475      703      27     3271             0 perl
Apr  8 14:01:06 opskvm01 kernel: [30256] 23446 30256     4546      325      14      101             0 caffetrain.sh
Apr  8 14:01:06 opskvm01 kernel: [30281] 23446 30281 40152923 10540653   23224    47801             0 caffe.bin
Apr  8 14:01:06 opskvm01 kernel: [30292] 23446 30292     4546      325      14      101             0 caffetrain.sh
Apr  8 14:01:06 opskvm01 kernel: [30314] 23446 30314 40153011 11681879   23151    46889             0 caffe.bin
Apr  8 14:01:06 opskvm01 kernel: [30697] 23446 30697     4570      498      14        0             0 bash
Apr  8 14:01:06 opskvm01 kernel: Memory cgroup out of memory: Kill process 30319 (caffe.bin) score 478 or sacrifice child
Apr  8 14:01:06 opskvm01 kernel: Killed process 30314 (caffe.bin) total-vm:160612044kB, anon-rss:0kB, file-rss:93580kB, shmem-rss:46633936kB
Apr  8 14:01:16 opskvm01 kernel: ___slab_alloc: 42 callbacks suppressed
Apr  8 14:01:16 opskvm01 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
Apr  8 14:01:16 opskvm01 kernel:  cache: taskstats(4:6e765d2d36b931a1188c2c1f93552068f2d68d46e0060e11986265dd5fa83e0d), object size: 328, buffer size: 328, default order: 2, min order: 0

Is anyone else having issues similar to this?

Cheers.
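
For reading the kernel log: the total_vm and rss columns in the OOM-killer process table are page counts, not kilobytes. Assuming 4 KiB pages, total_vm 40153011 for PID 30314 works out to exactly the 160612044kB reported in the "Killed process" line, which supports that assumption. A small sketch of the arithmetic for the two caffe.bin processes (the helper name pages_to_gib is made up here):

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, typical on x86_64


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a page count from the OOM-killer process table to GiB."""
    return pages * page_size / 2**30


# rss column (in pages) for the two caffe.bin processes in the kernel log
caffe_rss_pages = {30281: 10540653, 30314: 11681879}

if __name__ == "__main__":
    for pid, pages in caffe_rss_pages.items():
        print(f"caffe.bin {pid}: {pages_to_gib(pages):.1f} GiB resident")
    total = sum(pages_to_gib(pages) for pages in caffe_rss_pages.values())
    print(f"total: {total:.1f} GiB")
```

Under that assumption the two processes together hold roughly 85 GiB resident, which is close to the 90 GB container limit and consistent with the cgroup OOM kill.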

runprediction error

I didn't previously have this problem, but now when I change the entry point in the Dockerfile and update the commands to run the 'runprediction.sh' script, I get the following error:

ERROR, a non-zero exit code (127) received from PreprocessPackage.m 001 01 1fm 1
cdeep3m_1

Has this happened to you before?
