A Deep Learning Amazon Web Services (AWS) AMI that is open, free, and just works. Up and running in less than 5 minutes. TensorFlow, Keras, PyTorch, Theano, MXNet, CNTK, Caffe, and all dependencies.
Hi! First of all, thank you so much for your great efforts to help others use AWS more easily!
I'm a newbie in this field, and I'd like to know whether this AMI supports only Python 2.7 and not 3.x.
If it doesn't support Python 3.x, could you tell me what I should install manually, or point me to a reference site?
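One quick way to answer this yourself is to log in to the instance and check directly. A small sketch (the grep pattern covers the usual pip package names; adjust as needed):

```shell
# Check whether Python 3 is present on the instance and which
# deep-learning packages are installed for it
python3 --version
python3 -m pip list 2>/dev/null | grep -iE 'tensorflow|keras|torch' \
  || echo "no Python 3 deep-learning packages found"
```

If the second command prints nothing but the fallback message, the frameworks were only installed for Python 2.7.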
One issue with the AMIs is the disk size: 100 GB. We usually run spot instances with an external EBS volume (to preserve our data, checkpoints, etc.), so the main disk can be small. Would it be possible to build smaller AMIs, e.g. 8 GB or 10 GB?
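For context, the spot-instance workflow described above can be sketched like this, assuming the external EBS volume is already formatted, attached as /dev/xvdf, and mounted at /data (both names are assumptions):

```shell
# Mount an already-formatted external EBS volume so data and checkpoints
# survive spot terminations (device and mount point are assumptions)
DEVICE=/dev/xvdf
MOUNT_POINT=/data
if [ -b "$DEVICE" ]; then
  sudo mkdir -p "$MOUNT_POINT"
  sudo mount "$DEVICE" "$MOUNT_POINT"
else
  echo "$DEVICE is not attached; attach an EBS volume to the instance first"
fi
```

With this layout, only the OS and frameworks live on the root disk, which is why a small root volume would be enough.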
Thanks for the AMI!
Could you please delete these files before creating the AMI? It's common practice; this way you revoke login rights from previous users.
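For reference, the usual pre-snapshot cleanup looks something like this (a sketch; the exact paths depend on which users exist on the build machine):

```shell
# Remove credentials and shell history that would otherwise ship with the AMI
rm -f "$HOME/.ssh/authorized_keys"   # revokes the image builder's SSH access
rm -f "$HOME/.bash_history"          # drops any commands typed during setup
sudo rm -f /root/.ssh/authorized_keys 2>/dev/null || true
```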
I had to upgrade Keras with sudo pip3 install --upgrade keras for np_utils.to_categorical to work properly with num_classes, so the AMI might need an update. Please look into it.
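For anyone hitting this: the num_classes behaviour being tested here is just one-hot encoding. A minimal NumPy stand-in (my own sketch, not the Keras implementation) shows what the upgraded function does:

```python
import numpy as np

def to_categorical(y, num_classes):
    """One-hot encode integer labels, mimicking keras np_utils.to_categorical."""
    y = np.asarray(y, dtype=int).ravel()
    out = np.zeros((y.size, num_classes))
    out[np.arange(y.size), y] = 1.0
    return out

print(to_categorical([0, 2, 1], num_classes=3))
```

Older Keras versions didn't accept the num_classes keyword, hence the need to upgrade.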
I would like to try your AMI, but my region is EU Frankfurt (eu-central-1). It would be great if you copied the ami for use in my region. Is it possible? I tried to copy it myself, but I did not have the necessary permissions. Thanks a lot!
I use your AMI (Ireland zone) with TensorFlow. When I try the image-retraining tutorial (https://www.tensorflow.org/tutorials/image_retraining), and specifically bazel build tensorflow/examples/image_retraining:retrain,
I got an error:
ubuntu@ip-XXX-XX-XX-XX:~/tensorflow$ bazel build tensorflow/examples/image_retraining:retrain
ERROR: /home/ubuntu/tensorflow/tensorflow/core/BUILD:1017:1: no such package '@zlib_archive//': Error downloading [http://zlib.net/zlib-1.2.8.tar.gz] to /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/zlib_archive/zlib-1.2.8.tar.gz: GET returned 404 Not Found and referenced by '//tensorflow/core:lib_internal'.
ERROR: Analysis of target '//tensorflow/examples/image_retraining:retrain' failed; build aborted.
INFO: Elapsed time: 6.759s
Just to let you know: I haven't found a solution yet.
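A likely culprit: zlib.net keeps only the current release at the top-level URL and moves older tarballs into its /fossils/ archive, so the URL hard-coded in TensorFlow's Bazel workspace now 404s. One possible workaround (the file path assumes a TF 1.x source checkout; the mirror URL is zlib's own archive directory) is to repoint the download:

```shell
# Point the @zlib_archive download at zlib's archived copy of 1.2.8
# (run from the TensorFlow source root)
[ -f tensorflow/workspace.bzl ] \
  && sed -i 's#http://zlib.net/zlib-1.2.8.tar.gz#https://zlib.net/fossils/zlib-1.2.8.tar.gz#' \
       tensorflow/workspace.bzl \
  || echo "tensorflow/workspace.bzl not found; run from the TF source root"
```

After editing, re-run the bazel build so it re-fetches the archive.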
Thanks for creating this. However, I tried your TFAMI.v2 (N. Virginia, ami-a96634be) on a p2.xlarge AWS spot request and found that no GPU could be located on the machine (output below). Any ideas?
import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: ip-172-31-50-145
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: ip-172-31-50-145
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.48.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:296] kernel version 367.48.0 does not match DSO version 367.57.0 — cannot find working devices in this configuration
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
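The last two error lines pinpoint it: the kernel module is 367.48 while the user-space libcuda reports 367.57, and CUDA refuses to run with mismatched halves of the driver. A hedged diagnostic sketch (the nvidia-367 package name is an assumption for Ubuntu 16.04):

```shell
# Show the kernel-module side of the driver; compare it against the
# "libcuda reported version" line in the TensorFlow log
if [ -r /proc/driver/nvidia/version ]; then
  cat /proc/driver/nvidia/version
else
  echo "no NVIDIA kernel module loaded"
fi
# If the two versions differ, reinstalling the driver and rebooting
# usually brings them back in sync:
#   sudo apt-get install --reinstall nvidia-367 && sudo reboot
```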
When I started TensorFlow with TFAMI.v2 on a g2.8xlarge, I saw the following message:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:813] Ignoring gpu device (device: 0, name: GRID K520, pci bus id: 0000:00:03.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5
I checked with the nvidia-smi and top commands: the code runs on the CPU, so it really is ignoring the GPUs. Can I do something to make it work on g2.8xlarge? Set an environment variable, or something else?
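As far as I know there's no runtime environment variable for this: the prebuilt TensorFlow on the AMI was compiled with a 3.5 minimum, and the GRID K520 in g2 instances is compute capability 3.0. The usual fix is building TensorFlow from source with 3.0 included; a sketch, assuming a TensorFlow source checkout with CUDA already installed:

```shell
# Tell TensorFlow's configure script to target compute capability 3.0
export TF_CUDA_COMPUTE_CAPABILITIES=3.0
# Then, from the TensorFlow source root:
#   ./configure   # enable CUDA support when prompted
#   bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
```

The configure script reads TF_CUDA_COMPUTE_CAPABILITIES when deciding which GPU architectures to compile kernels for.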
Hey! I'm having issues connecting to my p2.x instance on AWS using your AMI.
I'm sshing with ssh -i <file>.pem ec2-user@<public IP> and it gives me Permission denied (publickey). I don't seem to have any issues connecting to the default Amazon Linux image with the same method, so I was wondering if you might be able to share some pointers.
Thank you for your time!
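One thing worth checking: ec2-user is the default login for Amazon Linux AMIs, but this AMI appears to be Ubuntu-based (note the ubuntu@ prompts in the build logs above), so the login user would be ubuntu. A sketch with hypothetical key-file and IP values:

```shell
# Build the ssh command for an Ubuntu-based AMI; the key file and IP
# below are hypothetical placeholders, substitute your own
KEY=mykey.pem
HOST=203.0.113.10
CMD="ssh -i $KEY ubuntu@$HOST"
echo "$CMD"
```

Using the wrong username is the most common cause of "Permission denied (publickey)" when the same key works against other AMIs.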
Hi, is the AMI still available? It doesn't show up in search in any region. I've tried every region with its corresponding AMI ID, as well as the name 'TFAMI.v3'.