Comments (37)
Are you using GPU? Usually, this error raises when GPU memory is full.
from tensorflow-examples.
@aymericdamien Thanks! I found the reason: I run the code in an IPython notebook, but I forgot to close another notebook, and its script was wasting too much memory.
from tensorflow-examples.
@pumplerod - I found a solution / kludge that somehow seems to work, although I can't explain why / how.
Before starting your Jupyter notebook / tensorflow program, set this:
export CUDA_VISIBLE_DEVICES=1
This seems to work in that the scripts work OK. Not sure if this is a requirement.
Give it a try and see.
from tensorflow-examples.
Yup, full GPU memory is the reason. IPython kernels stuck in background processes do that.
Thanks,
Subodh
thesubodh.com
from tensorflow-examples.
Wow. Thanks. That seems to have worked. Not sure how it's related, but before trying your solution I got rid of the error by specifying
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
However, when I tried to run training, it crashed the Jupyter notebook.
from tensorflow-examples.
It occurs because the GPU memory is full. The best way is to reduce the batch size.
For example, if batch_size = 32,
make it 16/8/4/2, anything until your error is resolved.
It works every single time for me.
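The halving advice above can be sketched as a small retry loop. This is only an illustration under stated assumptions: `run_step` is a hypothetical callable that runs one training step for a given batch size, and `MemoryError` stands in for whatever out-of-memory exception your framework raises (with TensorFlow you would probably pass `tf.errors.ResourceExhaustedError` instead).

```python
def find_fitting_batch_size(run_step, start=32, oom_errors=(MemoryError,)):
    """Halve the batch size until run_step(batch_size) stops running out of memory.

    run_step and oom_errors are placeholders for illustration; they are not
    part of any TensorFlow API.
    """
    batch_size = start
    while batch_size >= 1:
        try:
            run_step(batch_size)
            return batch_size
        except oom_errors:
            # Out of memory: try again with half the batch size.
            batch_size //= 2
    raise RuntimeError("even batch_size=1 does not fit in memory")


def fake_step(batch_size):
    """Simulated training step that only fits batches of 8 or fewer."""
    if batch_size > 8:
        raise MemoryError("simulated out-of-memory")


chosen = find_fitting_batch_size(fake_step, start=32)
print(chosen)
```

In a real run the process may not recover cleanly after an out-of-memory error, so restarting the script with the smaller value can be safer than retrying in place.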
from tensorflow-examples.
Just stumbled upon this thread. I think you have hidden your GPU from the CUDA drivers with this line:
export CUDA_VISIBLE_DEVICES=1
What this is telling CUDA is that it should only use "Device 1" in your system. So, unless you have 2 GPU devices, you have hidden the primary "Device 0". I am sure if you set this as follows TF will see your GPU again, but your other problems may return:
export CUDA_VISIBLE_DEVICES=0
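The same setting can be applied from inside a Python script or notebook, as long as it happens before TensorFlow is imported. A minimal sketch follows; the helper name `select_gpus` is made up for illustration, and it relies on the CUDA convention that an empty value hides every GPU.

```python
import os


def select_gpus(indices):
    """Expose only the given CUDA device indices to this process.

    An empty list sets the variable to an empty string, which hides all GPUs
    so that the code falls back to the CPU.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the first GPU ("Device 0").
select_gpus([0])

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Calling `select_gpus([])` in the first notebook cell is one way to keep a notebook off the GPU entirely.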
from tensorflow-examples.
Reduce the size of the batches sent in the run or eval; it should do the trick.
from tensorflow-examples.
It occurs because the GPU memory is full. The best way is to reduce the batch size.
For example, if batch_size = 32,
make it 16/8/4/2, anything until your error is resolved.
It works every single time for me.
For me, removing val_split helped as well. 🤷
from tensorflow-examples.
I had a similar problem when loading a previously trained model from disk (so changing the batch_size wasn't an option). This is what fixed it:
import tensorflow as tf
with tf.device('/CPU:0'):
    loaded = tf.saved_model.load(model_path)
from tensorflow-examples.
The MacBook's Nvidia GPU isn't dedicated to TensorFlow; it shares its resources with the screen.
I regularly have out-of-memory issues. I'm using a mid-2012 rMBP with a GeForce 650.
Before running TensorFlow, I close all processes using the GPU (look at the video card column in the resource monitor) to force OSX to use the integrated video card. Doing this releases some memory, and I can execute TensorFlow scripts. Not all memory is cleared when I check it with cuda-smi. You can quickly see which graphics card is being used with the gfx.io app. I found it good to disable WebGL in Safari (although it's needed for TensorBoard). Restarting Safari and PyCharm before running TensorFlow scripts helps to clear GPU memory. Stopping non-essential apps in the background also helps.
https://github.com/phvu/cuda-smi
Is an OSX issue a possibility?
A MacBook isn't the best "all in one" dev platform for TensorFlow; it can be made to work... albeit frustratingly.
It would be good to force OSX to use the integrated video chip for the screen and dedicate the Nvidia GPU to TensorFlow. I'm totally unsure, but some early discussions about the hardware indicated that Apple has locked down certain parts of the GPU access... so if it can't be used exclusively now... it's likely to be difficult/impossible to do in the future.
from tensorflow-examples.
I can run on the MacBook Pro's Nvidia GPU, but only for minimal applications:
import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8) #0.333
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
When I increase the number of Conv2D filters, e.g. from 32 to 64, I start getting DEAD KERNEL, so I lower the number of images I process per batch, e.g. from 256 to 24.
You have to keep trying until you get the right balance between the depth of your neural network, the batch size and the amount of GPU memory.
In the end, it is much faster than the CPU, but too fragile. After much frustration, I am going back to the CPU and a more powerful Linux GPU instance.
from tensorflow-examples.
Running into the same issue with the smallest possible model, Cart-Pole, on a GTX 1080 (8 GB). Is it a TensorFlow bug that can be fixed somehow, or are we simply trying to fit models that are too big (with an overenthusiastic batch size probably the main reason for that)?
sebtac
from tensorflow-examples.
In my case, the issue was with the dataset. Removing the problematic images (large images from Google or weird images that you get from web scraping) solved my problem.
from tensorflow-examples.
@burness @subodhp I'm getting the same error ("Ran out of memory")
[MacbookPro 2013 with 16 GB RAM, GPU (2GB RAM), TensorFlow 0.11, CUDA 8.0, CUDNN 5.x]
I tried shutting down the Jupyter Notebook and restarting it... but it crashed with the same error.
Is this solved?
How does one resolve GPU memory full errors?
Thanks!
`I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 256 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 31488 totalling 30.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 46609152 totalling 44.45MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 44.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 57622528
InUse: 46643200
MaxInUse: 46643200
NumAllocs: 11
MaxAllocSize: 46609152
W tensorflow/core/common_runtime/bfc_allocator.cc:270] ********************************************************************xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 390.6KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Reshape_1/_2__cf__2 = Constdtype=DT_FLOAT, value=Tensor<type: float shape: [10000,10] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]]
`
from tensorflow-examples.
I rebooted my Macbook and started afresh.
system: [MacbookPro 2013, with 16 GB RAM, GPU with 2GB RAM; Tensor Flow 0.11, CUDA 8.0, CUDNN 5.x]
Here's the error I get (see attached error-tf.txt at the bottom for all detail).
- How is the free memory only 20.49 MiB (on a recently rebooted system) if there's 2.0 GiB available to the GPU?
- Is there a way to track GPU memory usage?
- Is there a way to disable GPU usage for an iPython notebook?
Thanks!
Some relevant parts I see:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 20.49MiB
...I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 21487616
InUse: 33792
MaxInUse: 65280
NumAllocs: 9
MaxAllocSize: 31488
W tensorflow/core/common_runtime/bfc_allocator.cc:270] *___________________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 29.91MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Const = Constdtype=DT_FLOAT, value=Tensor<type: float shape: [10000,784] values: -0.5 -0.49607843 -0.5...>, _device="/job:localhost/replica:0/task:0/gpu:0"]]
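On the question of tracking GPU memory: one low-tech option is to pull the "Total memory" and "Free memory" lines out of the TensorFlow startup log shown above. The parser below is only a sketch; the function name and the sample text are made up for illustration, and it assumes the log keeps the exact wording seen in this thread.

```python
import re

# Conversion factors to MiB for the units that appear in the log.
_UNITS_IN_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_PATTERN = re.compile(r"(Total|Free) memory:\s*([\d.]+)\s*(KiB|MiB|GiB)")


def parse_gpu_memory(log_text):
    """Return {'total': MiB, 'free': MiB} for the memory lines found in log_text.

    If the log lists several devices, later values overwrite earlier ones.
    """
    found = {}
    for kind, number, unit in _PATTERN.findall(log_text):
        found[kind.lower()] = float(number) * _UNITS_IN_MIB[unit]
    return found


sample = """\
name: GeForce GT 750M
Total memory: 2.00GiB
Free memory: 20.49MiB
"""
memory = parse_gpu_memory(sample)
print(memory)
```

With the values from this log, only about 1% of the card is free before TensorFlow allocates anything, so the 29.91MiB request cannot succeed.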
from tensorflow-examples.
laventura, did you ever find a solution to the GPU out-of-memory error? I have the same problem with the same setup, though I got the error when trying to allocate 10.8 MiB.
from tensorflow-examples.
Oh, yours is very helpful for me. I got an out-of-memory error message for an allocation of only 29 MiB.
I added your code with fraction 0.8, since 80% of the memory was free (of 2 GiB, 1.6 GiB was free).
My code started working. After that, I deleted ALL GPU options and it still works. Very curious...
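The 0.8 above is simply free memory divided by total memory. A small sketch of that arithmetic follows; the function name and the safety margin are assumptions added for illustration, not something TensorFlow provides.

```python
def suggested_memory_fraction(free_mib, total_mib, margin=0.05):
    """Estimate a value for per_process_gpu_memory_fraction.

    Computes free / total and subtracts a safety margin (an arbitrary choice
    here), clamping the result to the range [0, 1].
    """
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    fraction = free_mib / total_mib - margin
    return min(1.0, max(0.0, fraction))


# 1.6 GiB free out of 2 GiB, expressed in MiB.
fraction = suggested_memory_fraction(free_mib=1638.4, total_mib=2048.0)
print(round(fraction, 2))
```

Passing `margin=0.0` reproduces the 0.8 used in this comment.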
from tensorflow-examples.
Update on this:
Earlier, the GPU was recognized by an older TensorFlow. Now that I have upgraded TF to 0.11rc2 and later to 0.12, my TF does not recognize any GPU at all.
Also, the deviceQuery does not report any GPU either. I'm going totally bonkers in this CUDA hell.
See details here:
tensorflow/tensorflow#2882
Also on NVIDIA Devtalk; if anyone has any bright insights, that would be very helpful to me!
https://devtalk.nvidia.com/default/topic/990015/cuda-setup-and-installation/help-cuda-7-5-or-8-devicequery-failing-not-working-on-macbookpro-2013-os-x-10-11-gt750m/
from tensorflow-examples.
@Mazecreator & Others,
Indeed; when I set CUDA_VISIBLE_DEVICES=0, the deviceQuery returns successfully. However, now TensorFlow complains again with "Dst Tensor Not initialized"!!
This is so frustrating!!
It appears that CUDA is leaking memory... I see that the free memory listed (when a Python script starts) keeps getting lower and lower... though I don't know for sure if that's the problem.
The fixes suggested above (setting TF's GPUOptions) are all workarounds: they require manual code changes / intervention in existing scripts that were supposed to work OK.
See here: deviceQuery
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_HOME
/usr/local/cuda
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ echo $CUDA_VISIBLE_DEVICES
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 750M"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147024896 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 926 MHz (0.93 GHz)
Memory Clock rate: 2508 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 750M
Result = PASS
py35 ▶ ~ ▶ Developer ❯ … ❯ x86_64 ❯ darwin ❯ release ▶ $ ▶
py35 ▶ ~ ▶ Developer ❯ CUDA ❯ cuda-smi ▶ master ▶ ❓ ▶ $ ▶ ./cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GT 750M (CC 3.0): 369.92 of 2047.6 MB (i.e. 18.1%) Free
Running a Python script with TensorFlow:
py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶ python imagenet_inference.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 305.92MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 1, Chunks in use: 0 97.01MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 144.00MiB was 128.00MiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a60500 of size 139520
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82600 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700a82800 of size 1228800
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700bae800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700baec00 of size 3538944
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0ec00 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x700f0f200 of size 2654208
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197200 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701197800 of size 1769472
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x701347800 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x701347c00 of size 101725184
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 512 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1024 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1536 totalling 3.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 139520 totalling 136.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1228800 totalling 1.17MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1769472 totalling 1.69MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2654208 totalling 2.53MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3538944 totalling 3.38MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 8.91MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 111063040
InUse: 9337856
MaxInUse: 9337856
NumAllocs: 11
MaxAllocSize: 3538944
W tensorflow/core/common_runtime/bfc_allocator.cc:274] *********___________________________________________________________________________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 144.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:965] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
return fn(*args)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
status, run_metadata)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "imagenet_inference.py", line 19, in <module>
sess.run(init)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op 'Variable_10/initial_value', defined at:
File "imagenet_inference.py", line 16, in <module>
probs = AlexNet(x, feature_extract=False)
File "/Users/aa/Developer/courses/self_driving_carnd/traffic-signs/CarND-Alexnet-Feature-Extraction/alexnet.py", line 139, in AlexNet
fc6W = tf.Variable(net_data["fc6"][0])
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 224, in __init__
expected_shape=expected_shape)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 333, in _init_from_args
initial_value, name="initial_value", dtype=dtype)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/aa/Developer/miniconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: Variable_10/initial_value = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [9216,4096] values: [-0.0043384791 -0.0071635786 -0.0067223078]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
py35 ▶ ~ ▶ Developer ❯ … ❯ self_driving_car ❯ traffic-signs ❯ CarND-Alexnet-Fe ▶ master ▶ 4✎ ▶ $ ▶
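The numbers in the log above line up with simple arithmetic: the failing node is a DT_FLOAT constant of shape [9216, 4096], and the allocator reports a limit of 111063040 bytes. A quick check, assuming 4 bytes per float32 element (the helper name is made up for illustration):

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element for float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


MIB = 2 ** 20

fc6_mib = tensor_bytes([9216, 4096]) / MIB   # the fc6 weight matrix
limit_mib = 111063040 / MIB                  # "Limit" from the allocator stats

print(fc6_mib, round(limit_mib, 2))
print(fc6_mib > limit_mib)
```

The single fc6 weight tensor is exactly the 144.00MiB that the allocator failed to provide, and it is larger than the whole limit TensorFlow was given, so no amount of shuffling inside that limit could make it fit.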
from tensorflow-examples.
I get the same error and I have 12GB of GPU memory:
mona@pascal:~/computer_vision/VPilot$ python train.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1938: UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
warnings.warn('\n'.join(msg))
Epoch 1/1000
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 412.50MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4547d60
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:83:00.0
Total memory: 11.92GiB
Free memory: 534.50MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:855] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 512.0KiB was 512.0KiB, Chunk State:
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740800 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740900 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740e00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b740f00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741000 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741100 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741200 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741300 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b741400 of size 4096
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742400 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742600 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b742e00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743600 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743a00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743b00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743c00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743d00 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b743e00 of size 222806528
I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 21 Chunks of size 256 totalling 5.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1024 totalling 1.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 2048 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 4096 totalling 4.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 222806528 totalling 212.48MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 212.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 222822400
InUse: 222822400
MaxInUse: 222822400
NumAllocs: 27
MaxAllocSize: 222806528
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ***********************************************************************************************xxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 512.0KiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:958] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "train.py", line 55, in <module>
callbacks=[ckp_callback]
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1553, in fit_generator
class_weight=class_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1316, in train_on_batch
outputs = self.train_function(ins)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1919, in __call__
session = get_session()
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 121, in get_session
_initialize_variables()
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 275, in _initialize_variables
sess.run(tf.initialize_variables(uninitialized_variables))
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 717, in run
run_metadata_ptr)
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 915, in _run
feed_dict_string, options, run_metadata)
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 965, in _do_run
target_list, options, run_metadata)
File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 985, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Dst tensor is not initialized.
[[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'Const_37', defined at:
File "train.py", line 55, in <module>
callbacks=[ckp_callback]
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 935, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1450, in fit_generator
self._make_train_function()
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 761, in _make_train_function
self.total_loss)
File "/usr/local/lib/python2.7/dist-packages/keras/optimizers.py", line 234, in get_updates
accumulators = [K.zeros(shape) for shape in shapes]
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 482, in zeros
return variable(tf.constant_initializer(0., dtype=tf_dtype)(shape),
File "/home/mona/tensorflow/_python_build/tensorflow/python/ops/init_ops.py", line 145, in _initializer
return constant_op.constant(value, dtype=dtype, shape=shape)
File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/constant_op.py", line 167, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 2388, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/mona/tensorflow/_python_build/tensorflow/python/framework/ops.py", line 1300, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: Const_37 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [512,256] values: 0 0 0...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
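A note on reading the allocator dump above: the Stats block is the quickest part to check. Here is a small sketch that parses those lines and converts them to MiB. `parse_bfc_stats` is a hypothetical helper written for this illustration, not a TensorFlow function, and the numbers are copied from the log.

```python
def parse_bfc_stats(lines):
    """Turn 'Key: 123' lines from the allocator Stats block into a dict of ints."""
    stats = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


# Values copied from the Stats block in the log above.
stats = parse_bfc_stats([
    "Limit: 222822400",
    "InUse: 222822400",
    "MaxInUse: 222822400",
    "NumAllocs: 27",
    "MaxAllocSize: 222806528",
])

limit_mib = stats["Limit"] / 2**20             # bytes -> MiB
free_bytes = stats["Limit"] - stats["InUse"]   # what the allocator has left
print(limit_mib, free_bytes)
```

With these numbers the limit works out to 212.5 MiB and nothing is left free, so even the 512 KiB request fails. A limit that small on a card with gigabytes of memory suggests (but does not prove) that something else was already holding most of the GPU memory when this process started.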
from tensorflow-examples.
@monajalal --
It appears that the GPU is running out of memory for some reason.
WHY that is happening, I can't say; it is the most confounding thing since the executed programs have ended.
Probably a memory leak?? If so, it could be at the GPU driver level??
See here too: tensorflow/tensorflow#7025 (comment)
I've tried searching for how to release/clear GPU memory, but haven't found anything good / credible / useful.
Do let me know if you or anyone comes across a solution.
Until then, this TensorFlow + GPU combo is a total fail for me (on my Macbook). 😡
from tensorflow-examples.
@normanheckscher - Thanks for the tips. Good to know about the Macbook GPU.
I downloaded gfx.io; it's helpful in understanding when the GPU is being used.
I've used cuda-smi; it's useful in showing the free GPU mem, but doesn't really show the processes using it. I was hoping an nvidia-smi kind of thing would exist for Macs.
When you said "I close all processes using the GPU (look at resource monitor video card column) to force OSX to use the integrated video card" which 'resource monitor video card' column do you refer to? In ActivityMonitor? If so, I didn't find it. :-(
Yeah, I try closing most of the programs that use GPUs (mostly Chrome etc. that I use) before running TF scripts. Sometimes, the TF scripts run out of mem almost immediately after a fresh reboot, which is kind of confounding.
I'm just coming to a slow realization that the TensorFlow + GPU combo isn't very effective/efficient on MacBooks. 😕
I'm rather sadly investigating the Theano combo (instead of TF) on top of Keras, which is my main high-level framework of choice. Sadly, because I don't know enough Theano and don't have enough bandwidth to learn it effectively. :-/
from tensorflow-examples.
Sorry @laventura, I meant the Activity Monitor for OSX. If you go to the CPU or Memory tabs, where you can look for the running processes, you can select "View>Columns>Graphics Card" and a new column with "Requires High Perf GPU" will appear. Sort by this column and you can see which processes are using the Nvidia card.
MacBookPro can be used for learning and development. I want to use TensorFlow and I find OSX is a very good environment to work in, so I deal with these little irritations while I get myself up to speed with TensorFlow. When my models need more memory I'll make the call as to building a headless Linux box or going with a service such as AWS. If I was starting from scratch I'd consider a dedicated GPU notebook that could run Linux, however, I'm not flush with cash and I don't see the need to purchase a new hardware environment when the one I have works.
Best of luck to you.
from tensorflow-examples.
This is not just a MacBook issue. I am seeing this on my laptop with a GTX 1060 (6GB) running Ubuntu.
This seems to help:
keras-team/keras#3675
Use:
max_q_size=1,
pickle_safe=False
in fit_generator()
After adding these two options I am up and running again.
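For anyone on a newer Keras: the later traceback in this thread shows fit_generator taking max_queue_size and use_multiprocessing, which as far as I know are the renamed forms of max_q_size and pickle_safe. A minimal sketch of picking the matching keyword names follows. `generator_queue_kwargs` is a hypothetical helper, not part of Keras, and the caller has to know which naming their version accepts.

```python
def generator_queue_kwargs(new_style_names):
    """Return fit_generator keyword arguments that keep the batch queue small.

    new_style_names is supplied by the caller (an assumption of this sketch):
    True for Keras versions that accept max_queue_size/use_multiprocessing,
    False for older ones that accept max_q_size/pickle_safe.
    """
    if new_style_names:
        return {"max_queue_size": 1, "use_multiprocessing": False}
    return {"max_q_size": 1, "pickle_safe": False}


# Usage (model and train_generator stand in for your own objects):
# model.fit_generator(train_generator, **generator_queue_kwargs(False))
```

A queue size of 1 means at most one prefetched batch is waiting at a time, which trades some throughput for lower memory pressure.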
from tensorflow-examples.
Stumbled onto this thread; perhaps my two cents can help. Launching Python with a preceding THEANO_FLAGS='device=gpu0' or THEANO_FLAGS='device=gpu1' etc. (the latter if you have more than one GPU) helps. For example, this terminal command will run the Python code on gpu4 (you can use gpustat to track usage of the different GPUs on your machine in real time):
THEANO_FLAGS='device=gpu4' python /run/this/script.py
If your convolutional filters are large, having smaller training batches can be one way to overcome the memory issue. That is, if the network initialization fits in memory first.
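The "try a smaller batch" advice can be sketched as a retry loop. This is only an illustration under assumptions: `fit_with_smaller_batches`, `train_fn` and `fake_train` are hypothetical names, and `MemoryError` stands in for whatever your framework raises (with TensorFlow you would probably pass `tf.errors.ResourceExhaustedError` and/or `tf.errors.InternalError`). A real GPU process may not recover cleanly after an out-of-memory error, so restarting with the smaller value can still be necessary.

```python
def fit_with_smaller_batches(train_fn, batch_size, min_batch_size=1,
                             oom_errors=(MemoryError,)):
    """Call train_fn(batch_size), halving batch_size after each memory error.

    Returns (batch_size_that_worked, result_of_train_fn).
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except oom_errors:
            if batch_size <= min_batch_size:
                raise  # nothing smaller left to try
            batch_size = max(min_batch_size, batch_size // 2)


def fake_train(batch_size):
    """Stand-in for a real training call; pretends anything above 8 does not fit."""
    if batch_size > 8:
        raise MemoryError("pretend the GPU is full")
    return "trained"


used, result = fit_with_smaller_batches(fake_train, batch_size=32)
print(used, result)  # prints: 8 trained
```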
from tensorflow-examples.
You don't have ENOUGH GPU MEMORY.
from tensorflow-examples.
When the system is idle and not processing, shouldn't Python somehow avoid holding the whole GPU memory? That would be a useful feature.
from tensorflow-examples.
In my case I have a laptop, and the command export CUDA_VISIBLE_DEVICES=1 made the training really slow, so I assume it used the integrated graphics card. So I had to use the value 0.
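The same setting can be made from Python instead of the shell. This is a minimal sketch, assuming the variable is set before anything initializes CUDA (in practice, before TensorFlow is imported). `select_gpu` is a hypothetical helper name. CUDA device numbering starts at 0, so on a machine with a single NVIDIA card the value 1 hides that card, and TensorFlow then most likely falls back to the CPU.

```python
import os


def select_gpu(index):
    """Expose only the CUDA device with this index to libraries imported later."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpu(0)  # the first (often the only) CUDA device

# Import the framework only after the variable is set, for example:
# import tensorflow as tf
```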
from tensorflow-examples.
I'm having the same issue. It's a Windows machine. I have now reduced my RNN size and embedding size. Let's see.
from tensorflow-examples.
Not working.
from tensorflow-examples.
Wow. Thanks. That seems to have worked. Not sure how it's related, but before trying your solution I got rid of the error by specifying
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, gpu_options=gpu_options))
However when I tried to run training it crashed the jupyter notebook.
There it is! Thank you for the answer. It worked in my case!
There is also another similar solution:
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
# allow_growth=True is the important part here
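tf.ConfigProto and tf.Session are TensorFlow 1.x API. On TensorFlow 2.x the analogous options, as far as I know, are the TF_FORCE_GPU_ALLOW_GROWTH environment variable and tf.config.experimental.set_memory_growth. A hedged sketch of both follows. `enable_memory_growth` is a hypothetical helper name, and it needs TensorFlow 2.x installed to actually run.

```python
import os

# Must be set before TensorFlow initializes the GPU, so do it first.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def enable_memory_growth():
    """Ask TensorFlow 2.x to allocate GPU memory on demand.

    Returns the number of GPUs that were configured.
    """
    import tensorflow as tf  # imported here so the variable above is already set

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Raises RuntimeError if the GPU has already been initialized.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


# Usage: call once at the very top of the script or notebook.
# enable_memory_growth()
```

Either option on its own should be enough. Both are shown only so you can pick whichever fits your setup.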
from tensorflow-examples.
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
Train for 11523 steps, validate for 4153 steps
Epoch 1/5
1/11523 [..............................] - ETA: 33:47:16
InternalError Traceback (most recent call last)
in
6 epochs=EPOCHS,
7 validation_data=validation_generator,
----> 8 validation_steps=validation_generator.samples//validation_generator.batch_size)
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\util\deprecation.py in new_func(*args, **kwargs)
322 'in a future version' if date is None else ('after %s' % date),
323 instructions)
--> 324 return func(*args, **kwargs)
325 return tf_decorator.make_decorator(
326 func, new_func, 'deprecated',
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1304 use_multiprocessing=use_multiprocessing,
1305 shuffle=shuffle,
-> 1306 initial_epoch=initial_epoch)
1307
1308 @deprecation.deprecated(
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
817 max_queue_size=max_queue_size,
818 workers=workers,
--> 819 use_multiprocessing=use_multiprocessing)
820
821 def evaluate(self,
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
340 mode=ModeKeys.TRAIN,
341 training_context=training_context,
--> 342 total_epochs=epochs)
343 cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN)
344
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs)
126 step=step, mode=mode, size=current_batch_size) as batch_logs:
127 try:
--> 128 batch_outs = execution_function(iterator)
129 except (StopIteration, errors.OutOfRangeError):
130 # TODO(kaftan): File bug about tf function and errors.OutOfRangeError?
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py in execution_function(input_fn)
96 # `numpy` translates Tensors to values in Eager mode.
97 return nest.map_structure(_non_none_constant_value,
---> 98 distributed_function(input_fn))
99
100 return execution_function
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\def_function.py in __call__(self, *args, **kwds)
566 xla_context.Exit()
567 else:
--> 568 result = self._call(*args, **kwds)
569
570 if tracing_count == self._get_tracing_count():
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\def_function.py in _call(self, *args, **kwds)
597 # In this case we have created variables on the first call, so we run the
598 # defunned version which is guaranteed to never create variables.
--> 599 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
600 elif self._stateful_fn is not None:
601 # Release the lock early so that multiple threads can perform the call
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in __call__(self, *args, **kwargs)
2361 with self._lock:
2362 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2363 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
2364
2365 @property
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in _filtered_call(self, args, kwargs)
1609 if isinstance(t, (ops.Tensor,
1610 resource_variable_ops.BaseResourceVariable))),
-> 1611 self.captured_inputs)
1612
1613 def _call_flat(self, args, captured_inputs, cancellation_manager=None):
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1690 # No tape is watching; skip to running the function.
1691 return self._build_call_outputs(self._inference_function.call(
-> 1692 ctx, args, cancellation_manager=cancellation_manager))
1693 forward_backward = self._select_forward_and_backward_functions(
1694 args,
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager)
543 inputs=args,
544 attrs=("executor_type", executor_type, "config_proto", config),
--> 545 ctx=ctx)
546 else:
547 outputs = execute.execute_with_cancellation(
~\anaconda3\envs\ev_2\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
65 else:
66 message = e.message
---> 67 six.raise_from(core._status_to_exception(e.code, message), None)
68 except TypeError as e:
69 keras_symbolic_tensors = [
~\anaconda3\envs\ev_2\lib\site-packages\six.py in raise_from(value, from_value)
InternalError: Dst tensor is not initialized.
[[{{node IteratorGetNext/_2}}]] [Op:__inference_distributed_function_24557]
Function call stack:
distributed_function
Please, how do I resolve this on Windows?
from tensorflow-examples.
In case someone comes from Google like me, I had a similar issue and in my case restarting Jupyter server and my IDE (intellij) fixed it... guessing a memory leak.
from tensorflow-examples.
There is another trick that worked for me. I delete any dangling IPython output that might be lying around:
%reset -f out
My guess is that the GPU can't release memory if there are any python variables still somehow linked to it.
from tensorflow-examples.