crispycrafter / cdeep3m-docker
A docker container for cdeep3m
Home Page: https://github.com/CRBS/cdeep3m
License: Other
First of all, thanks for sharing the docker build.
I've been testing cdeep3m with docker.
Training (and retraining with a pre-trained model) on my own dataset worked fine, but I hit an octave error when predicting the boundary map with the trained model.
This is what the program said:
[error msg]
$ docker-compose up
Creating network "cdeep3m-docker_default" with the default driver
Creating cdeep3m-docker_cdeep3m_1 ... done
Attaching to cdeep3m-docker_cdeep3m_1
cdeep3m_1 | octave: X11 DISPLAY environment variable not set
cdeep3m_1 | octave: disabling GUI features
cdeep3m_1 | Starting Image Augmentation
cdeep3m_1 | Check image size of:
cdeep3m_1 | /data/images/roi9
cdeep3m_1 | Reading file: /data/images/roi9/roi09_0001.png
cdeep3m_1 | z_blocks =
cdeep3m_1 |
cdeep3m_1 | 1 64
cdeep3m_1 |
cdeep3m_1 | panic: panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | panic: attempted clean up apparently failed -- aborting...
cdeep3m_1 | Segmentation fault -- stopping myself...
cdeep3m_1 | attempting to save variables to 'octave-workspace'...
cdeep3m_1 | /home/cdeep3m/runprediction.sh: line 124: 13 Aborted (core dumped) DefDataPackages.m "$images" "$augimages"
cdeep3m_1 | ERROR, a non-zero exit code (134) was received from: DefDataPackages.m "/data/images/roi9" "/data/predictout/my_25k/roi9/augimages"
cdeep3m-docker_cdeep3m_1 exited with c
I googled it, and the error seems to come from octave.
As far as I can tell, DefDataPackages.m did its job properly, but octave crashed after DefDataPackages finished.
Has anybody run into the same error, and how can I solve it?
Thanks.
I installed all the CUDA drivers and nvidia-docker, but I still get this error:
/home/cdeep3m/trainworker.sh: line 99: nvidia-smi: command not found cdeep3m_1 | ERROR unable to get count of GPU(s). Is nvidia-smi working? cdeep3m_1 | ERROR, a non-zero exit code (4) was received from: trainworker.sh --numiterations 10000
Is it because I have CUDA 10 and Ubuntu 18.04 installed on my system?
Running sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
shows that nvidia-docker is working successfully.
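Since the plain nvidia/cuda test works, a next diagnostic step (a sketch; the image tag cdeep3m:v0.0.1 is assumed from other reports here) is to run nvidia-smi inside the cdeep3m image itself with the nvidia runtime selected. nvidia-docker only injects the driver tools such as nvidia-smi when the nvidia runtime is active, so "nvidia-smi: command not found" usually means the job was launched without it:

```shell
# Hypothetical diagnostic: run nvidia-smi inside the cdeep3m image the same
# way the working nvidia/cuda test was run. If this fails while the
# nvidia/cuda test succeeds, the training job is likely being launched
# without --runtime=nvidia (e.g. via docker-compose with no runtime set).
if command -v docker >/dev/null 2>&1; then
  sudo docker run --runtime=nvidia --rm --entrypoint nvidia-smi cdeep3m:v0.0.1 \
    || echo "nvidia runtime not reachable from this image"
else
  echo "docker is not installed on this machine"
fi
```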
Issue found when running the following command. Any advice?
docker run -it cdeep3m:v0.0.1 /train/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out /home/cdeep3m/cdeep3m-1.6.2/mito_testsample/testset/ /train/predictout30k
octave: X11 DISPLAY environment variable not set
octave: disabling GUI features
Starting Image Augmentation
Check image size of:
/home/cdeep3m/cdeep3m-1.6.2/mito_testsample/testset/
Reading file: /home/cdeep3m/cdeep3m-1.6.2/mito_testsample/testset/images.081.png
z_blocks =
1 5
Start up worker to generate packages to process
Start up worker to run prediction on packages
Start up worker to run post processing on packages
To see progress run the following command in another window:
tail -f /train/predictout30k/logs/*.log
octave: X11 DISPLAY environment variable not set
octave: disabling GUI features
/train/predictout30k/1fm not a directory
Please use: EnsemblePredictions ./inputdir1 ./inputdir2 ./inputdir3 ./outputdir
ERROR file found. Something went wrong
ERROR, a non-zero exit code (127) received from PreprocessPackage.m 001 01 1fm 1
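Exit code 127 from a shell means "command not found", so a useful check (a sketch; the entrypoint override and image tag are assumptions) is whether the scripts the pipeline calls are actually on PATH inside the container:

```shell
# Exit status 127 = "command not found". Check whether PreprocessPackage.m
# and octave are visible on PATH inside the image (tag assumed):
if command -v docker >/dev/null 2>&1; then
  docker run --rm --entrypoint bash cdeep3m:v0.0.1 \
    -lc 'command -v PreprocessPackage.m; command -v octave' \
    || echo "lookup failed: scripts not on PATH inside the container"
else
  echo "docker is not installed on this machine"
fi
```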
runtraining.sh needs to locate a folder with augmented data (augdata) and builds the trained model in another folder (trainout). When I include the paths to these folders as follows:
sudo docker run -it cdeep:v0.0.1 --numiterations 10000 --gpu 0 ~/cdeep3m-docker/augdata ~/cdeep3m-docker/trainout
I get the following error:
./runtraining.sh: line 127: CreateTrainJob.m: command not found Error, a non-zero exit code (127) was received from: CreateTrainJob.m "/home/jurgen/cdeep3m-docker/augdata" "/home/jurgen/cdeep3m-docker/trainout" "/home/jurgen/cdeep3m-docker/augdata"
Am I just specifying the path to the training data incorrectly?
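One thing worth ruling out (a sketch, not a confirmed fix): paths like ~/cdeep3m-docker/augdata exist on the host, not inside the container, so they need to be bind-mounted, and the container-side paths passed to the script. The mount points /augdata and /trainout below are arbitrary names:

```shell
# Bind-mount the host folders into the container and pass the
# container-side paths (mount points /augdata and /trainout are made up):
if command -v docker >/dev/null 2>&1; then
  sudo docker run -it \
    -v "$HOME/cdeep3m-docker/augdata:/augdata" \
    -v "$HOME/cdeep3m-docker/trainout:/trainout" \
    cdeep3m:v0.0.1 --numiterations 10000 --gpu 0 /augdata /trainout \
    || echo "container run failed"
else
  echo "docker is not installed on this machine"
fi
```

Note that the error itself is exit code 127 ("command not found") for CreateTrainJob.m, so it may also be that the image's PATH does not include the cdeep3m scripts; the mount sketch only addresses the host-path question asked.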
Hi, I am new to Cdeep3m-docker and have run into an error quite early. Attempting to run docker-compose build leads to the error "ERROR: The Compose file './docker-compose.yml' is invalid because:
Unsupported config option for services.cdeep3m: 'runtime'"
Any help would be appreciated.
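For context: the runtime key is only understood by Compose file format 2.3/2.4 together with docker-compose 1.19 or later; under a version "3" file, classic docker-compose rejects it with exactly this message. A minimal sketch, assuming the service is named cdeep3m as in the logs above:

```yaml
# Sketch of a compose file in which classic docker-compose accepts "runtime".
version: "2.3"        # 2.3/2.4 are the formats that support the runtime key
services:
  cdeep3m:
    runtime: nvidia   # requires nvidia-docker2 installed on the host
```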
It seems we have run out of memory on the GPU.
How do we set the training batch size?
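CDeep3M's training runs through Caffe, and Caffe reads the batch size from the batch_size field of the data layers in train_val.prototxt. A sketch of the edit follows; the trainout/1fm path and the stand-in prototxt are created here purely for illustration. On a real run, edit the prototxt files that runtraining.sh generated under your training output folder:

```shell
# Create a stand-in prototxt just to demonstrate the edit (illustrative path):
mkdir -p trainout/1fm
printf 'layer {\n  data_param {\n    batch_size: 4\n  }\n}\n' \
  > trainout/1fm/train_val.prototxt

# Shrink the batch to reduce GPU memory use (batch_size: 1 is the safest):
sed -i 's/batch_size: *[0-9][0-9]*/batch_size: 1/' trainout/1fm/train_val.prototxt

grep batch_size trainout/1fm/train_val.prototxt
```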
Hi,
I am testing the software but got stuck at the training phase: the machine completely hangs after allocating all the memory (128 GB of RAM, 32 cores, and one M60).
After restricting the memory in docker-compose to 90 GB it seems to work, but when it crashes it throws a warning like:
Warning: unable to close filehandle properly: Cannot allocate memory during global destruction.
And after a while this:
cdeep3m_1 | ERROR: caffe had a non zero exit code: 134
cdeep3m_1 | /home/cdeep3m/caffetrain.sh: line 166: 100 Aborted (core dumped) GLOG_log_dir=$log_dir caffe.bin train --solver=$model_dir/solver.prototxt --gpu $gpu $snapshot_opts > "${model_dir}/log/out.log" 2>&1
cdeep3m_1 | ERROR: caffe had a non zero exit code: 137
cdeep3m_1 | /home/cdeep3m/caffetrain.sh: line 166: 127 Killed GLOG_log_dir=$log_dir caffe.bin train --solver=$model_dir/solver.prototxt --gpu $gpu $snapshot_opts > "${model_dir}/log/out.log" 2>&1
GPU looks like:
nvidia-smi
Mon Apr 8 14:01:47 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.107 Driver Version: 410.107 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:06:00.0 Off | Off |
| 32% 36C P0 36W / 120W | 262MiB / 8129MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:07:00.0 Off | Off |
| 32% 27C P8 14W / 120W | 11MiB / 8129MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 30767 C caffe.bin 109MiB |
| 0 30793 C caffe.bin 109MiB |
+-----------------------------------------------------------------------------+
Apr 8 14:01:06 opskvm01 kernel: Memory cgroup stats for /docker/6e765d2d36b931a1188c2c1f93552068f2d68d46e0060e11986265dd5fa83e0d: cache:93406836KB rss:1472KB rss_huge:0KB mapped_file:88703160KB swap:393296KB inactive_anon:4703640KB active_anon:88704632KB inactive_file:0KB active_file:0KB unevictable:0KB
Apr 8 14:01:06 opskvm01 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Apr 8 14:01:06 opskvm01 kernel: [29973] 23446 29973 4545 322 14 99 0 runtraining.sh
Apr 8 14:01:06 opskvm01 kernel: [30186] 23446 30186 4516 324 14 71 0 trainworker.sh
Apr 8 14:01:06 opskvm01 kernel: [30203] 23446 30203 11475 703 27 3271 0 perl
Apr 8 14:01:06 opskvm01 kernel: [30256] 23446 30256 4546 325 14 101 0 caffetrain.sh
Apr 8 14:01:06 opskvm01 kernel: [30281] 23446 30281 40152923 10540653 23224 47801 0 caffe.bin
Apr 8 14:01:06 opskvm01 kernel: [30292] 23446 30292 4546 325 14 101 0 caffetrain.sh
Apr 8 14:01:06 opskvm01 kernel: [30314] 23446 30314 40153011 11681879 23151 46889 0 caffe.bin
Apr 8 14:01:06 opskvm01 kernel: [30697] 23446 30697 4570 498 14 0 0 bash
Apr 8 14:01:06 opskvm01 kernel: Memory cgroup out of memory: Kill process 30319 (caffe.bin) score 478 or sacrifice child
Apr 8 14:01:06 opskvm01 kernel: Killed process 30314 (caffe.bin) total-vm:160612044kB, anon-rss:0kB, file-rss:93580kB, shmem-rss:46633936kB
Apr 8 14:01:16 opskvm01 kernel: ___slab_alloc: 42 callbacks suppressed
Apr 8 14:01:16 opskvm01 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
Apr 8 14:01:16 opskvm01 kernel: cache: taskstats(4:6e765d2d36b931a1188c2c1f93552068f2d68d46e0060e11986265dd5fa83e0d), object size: 328, buffer size: 328, default order: 2, min order: 0
Is anyone else having issues similar to this?
Cheers.
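For what it's worth, the OOM killer report above attributes most of the killed caffe.bin process to shared memory (shmem-rss around 46 GB), so besides a total memory cap it may help to bound /dev/shm as well. A sketch only: 90g is the value already tried above, and 32g is an arbitrary guess to tune:

```yaml
# Sketch: cap total memory (as already tried) and shared memory as well.
version: "2.3"
services:
  cdeep3m:
    runtime: nvidia
    mem_limit: 90g     # value already tried above
    shm_size: 32g      # arbitrary; tune to your data size
```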
I didn't have this problem before, but now, after changing the entry point in the Dockerfile and updating the commands to run the runprediction.sh script, I get the following error:
ERROR, a non-zero exit code (127) received from PreprocessPackage.m 001 01 1fm 1
cdeep3m_1
Has this happened to you before?