

dlami's Issues

8G instead of 100G?

I really appreciate these AMIs. Wonderful!

One issue with them is the disk size, 100G. We usually run spot instances with an external EBS volume (to keep our data, checkpoints, etc.), so the main disk can be small. Would it be possible to provide smaller AMIs, such as 8G or 10G?

I really appreciate it.

Bazel broke when trying bazel build (...) image-retraining

Hello Ritchie,

I use your AMI (Ireland region) with TensorFlow. When I try the image-retraining tutorial (https://www.tensorflow.org/tutorials/image_retraining), specifically bazel build tensorflow/examples/image_retraining:retrain, I get an error:

ubuntu@ip-XXX-XX-XX-XX:~/tensorflow$ bazel build tensorflow/examples/image_retraining:retrain
ERROR: /home/ubuntu/tensorflow/tensorflow/core/BUILD:1017:1: no such package '@zlib_archive//': Error downloading [http://zlib.net/zlib-1.2.8.tar.gz] to /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/zlib_archive/zlib-1.2.8.tar.gz: GET returned 404 Not Found and referenced by '//tensorflow/core:lib_internal'.
ERROR: Analysis of target '//tensorflow/examples/image_retraining:retrain' failed; build aborted.
INFO: Elapsed time: 6.759s

Just to let you know. I haven't found a solution yet.

Regards, mph

TFAMI.v4 Upcoming Release

Name Change:

  • TFAMI will be renamed to DLAMI (dee-luh-mi) to reflect that it includes both PyTorch and TensorFlow (deep learning frameworks).

Essential features:

  • TensorFlow r1.0
  • Latest PyTorch
  • Latest Keras
  • Latest TensorLayer
  • CUDA 8.0
  • CuDNN 5.1
  • Python 2.7
  • Ubuntu 16.04

Community requested changes

  • Smaller EBS volume (40 GB): #7
  • Bazel build fix: #9

AMI not showing up in search

Hi, is the AMI still available? It did not show up in search in any region. I have tried every region with its corresponding AMI ID, as well as searching for 'TFAMI.v3'.

Tensorflow 0.11rc

Can you upgrade TensorFlow to version 0.11rc? There are a lot of improvements in 0.10 and 0.11.

v1.3 Upgrades

New packages to be added for v1.3

  1. TensorLayer
  2. OpenCV
  3. TensorFlow v0.11

Fixes

  1. Compile with support for compute capability 3.0 instead of 3.5 (p2 instances)

Pondering:

  1. Dockerized implementation

Setup script

Could you please add the shell script that you used to build this AMI?

Security - clean authorized_keys file

Thanks for the AMI!
Could you please delete these files before creating the AMI? It's common practice, and it revokes login rights from previous users.

/home/ec2-user/.ssh/authorized_keys
/root/.ssh/authorized_keys
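
The cleanup requested above could be scripted as part of the image build. This is only a minimal sketch, not the maintainer's build process: the two paths come from this issue, while the function name `remove_authorized_keys` and the `root` parameter (added so the logic can be tried against a scratch directory) are hypothetical.

```python
from pathlib import Path

# Paths quoted in the issue; other login users would need their own entries.
AUTHORIZED_KEYS = [
    Path("/home/ec2-user/.ssh/authorized_keys"),
    Path("/root/.ssh/authorized_keys"),
]


def remove_authorized_keys(paths=AUTHORIZED_KEYS, root=Path("/")):
    """Delete every listed authorized_keys file that exists under root.

    Returns the list of files that were actually removed.
    """
    removed = []
    for path in paths:
        # Re-anchor the absolute path under `root` (hypothetical test hook).
        target = root / path.relative_to("/")
        if target.is_file():
            target.unlink()
            removed.append(target)
    return removed
```

Calling `remove_authorized_keys()` with the default `root` would need to run as root, as the last step before the snapshot is taken, since it also removes the key of whoever is building the image.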

Keras on Python3

I had to upgrade Keras with sudo pip3 install keras --upgrade for np_utils.to_categorical to work properly with num_classes, so the AMI might need an update. Please look into it.

Thank you & <3 for this AMI.
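
For readers unfamiliar with the call mentioned in this issue, here is a rough pure-Python sketch of what one-hot encoding with an explicit class count does. It is not Keras code: `one_hot` is a hypothetical stand-in, and it returns nested lists of ints where Keras returns a NumPy array.

```python
def one_hot(labels, num_classes=None):
    """One-hot encode integer class labels (illustrative stand-in only)."""
    labels = list(labels)
    if num_classes is None:
        # Without an explicit count, infer it from the largest label seen.
        num_classes = max(labels) + 1
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError("label %d is outside 0..%d" % (label, num_classes - 1))
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


print(one_hot([0, 2], num_classes=4))  # [[1, 0, 0, 0], [0, 0, 1, 0]]
```

Passing the class count explicitly matters when a batch does not happen to contain the highest label, because the inferred width would otherwise be too small.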

Python3.x support

Hi! First, thank you for your great efforts to help others use AWS more easily!

I'm a newbie in this field and would like to know whether this AMI supports only Python 2.7 and not 3.x.
If it does not support Python 3.x, could you tell me what I should install manually, or point me to a reference site?

Thanks in advance!

Does it not support the InceptionV3 model?

I'm using Keras and TensorFlow with the InceptionV3 model. TFAMI.v3 does not work on g2 with the GPU. Even after I upgraded Keras, it still did not work.

Could you update the TensorFlow and Keras libraries, or update the AMI to support the InceptionV3 model?

No GPU devices available on machine (p2.xlarge)

Thanks for creating this. However, I tried your TFAMI.v2 (N. Virginia ami-a96634be) using a p2.xlarge AWS spot request and found that no GPU could be located on the machine (output below). Any ideas?

import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: ip-172-31-50-145
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: ip-172-31-50-145
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.48.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:296] kernel version 367.48.0 does not match DSO version 367.57.0 -- cannot find working devices in this configuration
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
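
The decisive line in the log above is the mismatch between the loaded kernel module (367.48) and the user-space CUDA library (367.57). A small sketch of the same comparison, assuming the driver version text has the format quoted in the log; the function names are hypothetical and this is not TensorFlow's own diagnostic code:

```python
import re


def kernel_module_version(version_file_text):
    """Extract the kernel module version from NVRM driver version text."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)*)", version_file_text)
    if match is None:
        raise ValueError("no 'Kernel Module <version>' entry found")
    return match.group(1)


def normalize(version, width=3):
    """Turn '367.48' into (367, 48, 0) so it compares equal to '367.48.0'."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def versions_match(version_file_text, dso_version):
    """True when the kernel module and the user-space library agree."""
    kernel = normalize(kernel_module_version(version_file_text))
    return kernel == normalize(dso_version)


# Values copied from the log in this issue.
NVRM_LINE = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 "
             "Sat Sep 3 18:21:08 PDT 2016")
print(versions_match(NVRM_LINE, "367.57.0"))  # False
```

A mismatch like this commonly appears when the user-space driver packages have been upgraded but the older kernel module is still the one loaded. Whether that is what happened on this AMI is not confirmed by the issue.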

Availability for other regions

Hi,

I would like to try your AMI, but my region is EU Frankfurt (eu-central-1). It would be great if you could copy the AMI for use in my region. Is that possible? I tried to copy it myself, but I did not have the necessary permissions. Thanks a lot!

Best Regards from Germany

permission denied (publickey)

Hey! I'm having issues connecting to my p2.x instance on AWS using your AMI.
I'm connecting with
ssh -i <file>.pem ec2-user@<public IP>
and it gives me Permission denied (publickey). I don't have any issues connecting the same way to the default Amazon Linux image, so I was wondering if you could share some pointers.
Thank you for your time!

How to make tensorflow use GPUs with TFAMI.v2

When I started tensorflow with TFAMI.v2 on g2.8xlarge I saw the following message:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:813] Ignoring gpu device (device: 0, name: GRID K520, pci bus id: 0000:00:03.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5

I checked with the nvidia-smi and top commands: the code runs on the CPU, so it really is ignoring the GPUs. Can I do something to make it work on g2.8xlarge, such as setting an environment variable?
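
The log message boils down to a version comparison: the GRID K520 reports compute capability 3.0, while this TensorFlow build requires at least 3.5. A minimal sketch of that check, with hypothetical function names, and with the 3.5 threshold taken from the message above rather than from TensorFlow's source:

```python
MIN_CAPABILITY = (3, 5)  # minimum quoted in the log message in this issue


def parse_capability(text):
    """Parse a 'major.minor' compute capability string into a tuple of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_usable(capability, minimum=MIN_CAPABILITY):
    """True when the device meets the build's minimum compute capability."""
    return parse_capability(capability) >= minimum


print(is_usable("3.0"))  # False: the GRID K520 on g2 instances
print(is_usable("3.5"))  # True
```

Comparing tuples of ints rather than raw strings or floats avoids ordering mistakes with multi-digit minor versions. Since the threshold is fixed when TensorFlow is compiled, the v1.3 upgrade list in this tracker addresses it by compiling for 3.0 rather than through a runtime setting.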
