A Deep Learning Amazon Web Services (AWS) AMI that is open, free, and just works. Up and running in less than 5 minutes. TensorFlow, Keras, PyTorch, Theano, MXNet, CNTK, Caffe, and all dependencies.
Hi! First of all, thank you so much for your great efforts to help others use AWS more easily!
I'm a newbie in this field, and I'd like to know whether this AMI supports only Python 2.7 and not 3.x.
If it doesn't support Python 3.x, could you tell me what I should install manually, or point me to a reference site?
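One quick way to answer this yourself is to log in to the instance and check directly. A small sketch (the grep pattern covers the usual pip package names; adjust as needed):

```shell
# Check whether Python 3 is present on the instance and which
# deep-learning packages are installed for it
python3 --version
python3 -m pip list 2>/dev/null | grep -iE 'tensorflow|keras|torch' \
  || echo "no Python 3 deep-learning packages found"
```

If the second command prints nothing but the fallback message, the frameworks were only installed for Python 2.7.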
One issue with the AMIs is the disk size: 100 GB. We usually run spot instances with an external EBS volume (to preserve our data, checkpoints, etc.), so the main disk can be small. Would it be possible to build smaller AMIs, e.g. 8 GB or 10 GB?
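For context, the spot-instance workflow described above can be sketched like this, assuming the external EBS volume is already formatted, attached as /dev/xvdf, and mounted at /data (both names are assumptions):

```shell
# Mount an already-formatted external EBS volume so data and checkpoints
# survive spot terminations (device and mount point are assumptions)
DEVICE=/dev/xvdf
MOUNT_POINT=/data
if [ -b "$DEVICE" ]; then
  sudo mkdir -p "$MOUNT_POINT"
  sudo mount "$DEVICE" "$MOUNT_POINT"
else
  echo "$DEVICE is not attached; attach an EBS volume to the instance first"
fi
```

With this layout, only the OS and frameworks live on the root disk, which is why a small root volume would be enough.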
Thanks for the AMI!
Could you please delete these files before creating the AMI? It's common practice; this way you revoke login rights from previous users.
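For reference, the usual pre-snapshot cleanup looks something like this (a sketch; the exact paths depend on which users exist on the build machine):

```shell
# Remove credentials and shell history that would otherwise ship with the AMI
rm -f "$HOME/.ssh/authorized_keys"   # revokes the image builder's SSH access
rm -f "$HOME/.bash_history"          # drops any commands typed during setup
sudo rm -f /root/.ssh/authorized_keys 2>/dev/null || true
```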
I had to upgrade Keras with sudo pip3 install --upgrade keras for np_utils.to_categorical to work properly with num_classes, so the AMI might need an update. Please look into it.
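For anyone hitting this: the num_classes behaviour being tested here is just one-hot encoding. A minimal NumPy stand-in (my own sketch, not the Keras implementation) shows what the upgraded function does:

```python
import numpy as np

def to_categorical(y, num_classes):
    """One-hot encode integer labels, mimicking keras np_utils.to_categorical."""
    y = np.asarray(y, dtype=int).ravel()
    out = np.zeros((y.size, num_classes))
    out[np.arange(y.size), y] = 1.0
    return out

print(to_categorical([0, 2, 1], num_classes=3))
```

Older Keras versions didn't accept the num_classes keyword, hence the need to upgrade.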
I would like to try your AMI, but my region is EU Frankfurt (eu-central-1). It would be great if you copied the ami for use in my region. Is it possible? I tried to copy it myself, but I did not have the necessary permissions. Thanks a lot!
I use your AMI (Ireland zone) with TensorFlow. When I try the image-retraining tutorial (https://www.tensorflow.org/tutorials/image_retraining), and specifically bazel build tensorflow/examples/image_retraining:retrain,
I got an error:
ubuntu@ip-XXX-XX-XX-XX:~/tensorflow$ bazel build tensorflow/examples/image_retraining:retrain
ERROR: /home/ubuntu/tensorflow/tensorflow/core/BUILD:1017:1: no such package '@zlib_archive//': Error downloading [http://zlib.net/zlib-1.2.8.tar.gz] to /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/zlib_archive/zlib-1.2.8.tar.gz: GET returned 404 Not Found and referenced by '//tensorflow/core:lib_internal'.
ERROR: Analysis of target '//tensorflow/examples/image_retraining:retrain' failed; build aborted.
INFO: Elapsed time: 6.759s
Just to let you know: I haven't found a solution yet.
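A likely culprit: zlib.net keeps only the current release at the top-level URL and moves older tarballs into its /fossils/ archive, so the URL hard-coded in TensorFlow's Bazel workspace now 404s. One possible workaround (the file path assumes a TF 1.x source checkout; the mirror URL is zlib's own archive directory) is to repoint the download:

```shell
# Point the @zlib_archive download at zlib's archived copy of 1.2.8
# (run from the TensorFlow source root)
[ -f tensorflow/workspace.bzl ] \
  && sed -i 's#http://zlib.net/zlib-1.2.8.tar.gz#https://zlib.net/fossils/zlib-1.2.8.tar.gz#' \
       tensorflow/workspace.bzl \
  || echo "tensorflow/workspace.bzl not found; run from the TF source root"
```

After editing, re-run the bazel build so it re-fetches the archive.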
Thanks for creating this. However, I tried your TFAMI.v2 (N. Virginia, ami-a96634be) on a p2.xlarge AWS spot request and found that no GPU could be located on the machine (output below). Any ideas?
import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: ip-172-31-50-145
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: ip-172-31-50-145
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.48.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:296] kernel version 367.48.0 does not match DSO version 367.57.0 — cannot find working devices in this configuration
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
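The last two error lines pinpoint it: the kernel module is 367.48 while the user-space libcuda reports 367.57, and CUDA refuses to run with mismatched halves of the driver. A hedged diagnostic sketch (the nvidia-367 package name is an assumption for Ubuntu 16.04):

```shell
# Show the kernel-module side of the driver; compare it against the
# "libcuda reported version" line in the TensorFlow log
if [ -r /proc/driver/nvidia/version ]; then
  cat /proc/driver/nvidia/version
else
  echo "no NVIDIA kernel module loaded"
fi
# If the two versions differ, reinstalling the driver and rebooting
# usually brings them back in sync:
#   sudo apt-get install --reinstall nvidia-367 && sudo reboot
```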
When I started TensorFlow with TFAMI.v2 on a g2.8xlarge, I saw the following message:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:813] Ignoring gpu device (device: 0, name: GRID K520, pci bus id: 0000:00:03.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5
I checked with the nvidia-smi and top commands: the code runs on the CPU, so it really is ignoring the GPUs. Can I do something to make it work on g2.8xlarge? Set an environment variable, or something else?
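As far as I know there's no runtime environment variable for this: the prebuilt TensorFlow on the AMI was compiled with a 3.5 minimum, and the GRID K520 in g2 instances is compute capability 3.0. The usual fix is building TensorFlow from source with 3.0 included; a sketch, assuming a TensorFlow source checkout with CUDA already installed:

```shell
# Tell TensorFlow's configure script to target compute capability 3.0
export TF_CUDA_COMPUTE_CAPABILITIES=3.0
# Then, from the TensorFlow source root:
#   ./configure   # enable CUDA support when prompted
#   bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
```

The configure script reads TF_CUDA_COMPUTE_CAPABILITIES when deciding which GPU architectures to compile kernels for.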
Hey! I'm having issues connecting to my p2.x instance on AWS using your AMI.
I'm sshing with ssh -i <file>.pem ec2-user@<public IP> and it gives me Permission denied (publickey). I don't seem to have any issues connecting to the default Amazon Linux image with the same method, so I was wondering if you might be able to share some pointers.
Thank you for your time!
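One thing worth checking: ec2-user is the default login for Amazon Linux AMIs, but this AMI appears to be Ubuntu-based (note the ubuntu@ prompts in the build logs above), so the login user would be ubuntu. A sketch with hypothetical key-file and IP values:

```shell
# Build the ssh command for an Ubuntu-based AMI; the key file and IP
# below are hypothetical placeholders, substitute your own
KEY=mykey.pem
HOST=203.0.113.10
CMD="ssh -i $KEY ubuntu@$HOST"
echo "$CMD"
```

Using the wrong username is the most common cause of "Permission denied (publickey)" when the same key works against other AMIs.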
Hi, is the AMI still available? It doesn't show up in search in any region. I've tried every region with its corresponding AMI ID, as well as the name 'TFAMI.v3'.