kaist-dmlab / selfie

License: MIT License

Python 100.00%

SELFIE: Refurbishing Unclean Samples for Robust Deep Learning

Publication
Song, H., Kim, M., and Lee, J., "SELFIE: Refurbishing Unclean Samples for Robust Deep Learning," In Proc. 2019 Int'l Conf. on Machine Learning (ICML), Long Beach, California, June 2019. [link]

Official TensorFlow implementation of SELFIE. In this implementation, we tested the performance of SELFIE using two popular convolutional neural networks, DenseNet [1] and VGGNet [2], on three simulated noisy datasets and one real-world noisy dataset. SELFIE was compared with Active Bias [3] and Co-teaching [4], two state-of-the-art robust training methods.

1. Overview

Deep neural networks have such high expressive power that, as a side effect, they can memorize the training data entirely, even when the labels are extremely noisy. To overcome overfitting to noisy labels, we propose a novel robust training method, called SELFIE, that trains the network on precisely calibrated (refurbished) samples together with clean samples. As shown in the figure below, it selectively corrects the losses of the training samples classified as refurbishable and combines them with the losses of clean samples before backpropagation. With this design, SELFIE effectively prevents noise accumulation from false corrections while fully exploiting the training data.

2. Implementation

SELFIE requires only a simple modification in the gradient descent step. As described below, the conventional update equation is replaced with the proposed one. If you are interested in further details, read our paper.
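To make the modified step concrete, here is a minimal NumPy sketch of how the loss of one mini-batch could be assembled. It is not the repository's TensorFlow code. It assumes, following the description in the paper, that clean samples are the (1 - noise rate) fraction of the batch with the smallest loss, that a sample is refurbishable when the normalized entropy of its recently predicted labels is at most epsilon, and that the refurbished label is the most frequent label in that history. The function names `predictive_uncertainty` and `selfie_batch_loss` are hypothetical.

```python
import numpy as np


def predictive_uncertainty(label_history, num_classes):
    """Return (normalized entropy, most frequent label) for each sample.

    label_history: int array [batch, q] holding the labels the network
    predicted for each sample over the last q epochs (assumed layout).
    """
    one_hot = label_history[:, :, None] == np.arange(num_classes)
    freq = one_hot.mean(axis=1)                      # [batch, num_classes]
    safe = np.where(freq > 0, freq, 1.0)             # treat 0 * log(0) as 0
    entropy = -(freq * np.log(safe)).sum(axis=1)
    return entropy / np.log(num_classes), freq.argmax(axis=1)


def selfie_batch_loss(softmax, given_labels, label_history, noise_rate, epsilon=0.05):
    """Combine refurbished-label losses and clean-sample losses (sketch)."""
    batch, num_classes = softmax.shape
    idx = np.arange(batch)
    loss_given = -np.log(softmax[idx, given_labels])

    # Clean set C: the (1 - noise_rate) fraction of samples with the smallest loss.
    num_clean = int(np.ceil((1.0 - noise_rate) * batch))
    clean = np.zeros(batch, dtype=bool)
    clean[np.argsort(loss_given)[:num_clean]] = True

    # Refurbishable set R: samples whose label history is consistent enough.
    uncertainty, refurbished_labels = predictive_uncertainty(label_history, num_classes)
    refurb = uncertainty <= epsilon
    loss_refurb = -np.log(softmax[idx, refurbished_labels])

    # Refurbished loss for R, given-label loss for clean samples outside R,
    # averaged over the union of both sets. All other samples are ignored.
    total = loss_refurb[refurb].sum() + loss_given[clean & ~refurb].sum()
    return total / max(int((refurb | clean).sum()), 1), refurb, clean
```

Samples that are neither clean nor refurbishable contribute nothing to the update, which is the point of the selective correction: uncertain corrections are never propagated.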

2. Compared Algorithms

We compared SELFIE with the default training method and two state-of-the-art robust training methods. We also provide links to official or unofficial implementations of each method (all three algorithms are included in our implementation).

Default: Training method without any processing for noisy labels.
Active Bias [3]: unofficial (TensorFlow)
Co-teaching [4]: official (PyTorch) and unofficial (TensorFlow)

3. Benchmark Datasets

We evaluated the performance of SELFIE on four benchmark datasets. ANIMAL-10N is our proprietary real-world noisy dataset of human-labeled online images of 10 confusing animals. Note that the noisy labels in ANIMAL-10N were introduced naturally by human mistakes, and its noise rate was estimated at 8%.

Name (clean or noisy)	# Training Images	# Testing Images	# Classes	Resolution	Link
CIFAR-10 (clean)	50,000	10,000	10	32x32	link
CIFAR-100 (clean)	50,000	10,000	100	32x32	link
Tiny-ImageNet (clean)	100,000	10,000	200	64x64	link
ANIMAL-10N (noisy)	50,000	5,000	10	64x64	link

For ease of experimentation, we provide download links for all datasets converted to the binary version.

The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., as well as test_batch.bin. 
Each of these files is formatted as follows:
<id><label><depth x height x width>
...
<id><label><depth x height x width>

The reading procedure is similar to that of a popular CIFAR-10 tutorial.

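# You can read our binary files as below (TensorFlow 1.x queue-based input API):
import tensorflow as tf

# width, height, depth: image dimensions (e.g., 32, 32, 3 for CIFAR-10).
# filename_queue: a queue of .bin file paths (e.g., from tf.train.string_input_producer).
ID_BYTES = 4
LABEL_BYTES = 4
RECORD_BYTES = ID_BYTES + LABEL_BYTES + width * height * depth
reader = tf.FixedLengthRecordReader(record_bytes=RECORD_BYTES)
file_name, value = reader.read(filename_queue)
byte_record = tf.decode_raw(value, tf.uint8)
# Each record is laid out as <id><label><image bytes>.
image_id = tf.strided_slice(byte_record, [0], [ID_BYTES])
image_label = tf.strided_slice(byte_record, [ID_BYTES], [ID_BYTES + LABEL_BYTES])
array_image = tf.strided_slice(byte_record, [ID_BYTES + LABEL_BYTES], [RECORD_BYTES])
# The image is stored depth-major; convert it to [height, width, depth].
depth_major_image = tf.reshape(array_image, [depth, height, width])
image = tf.transpose(depth_major_image, [1, 2, 0])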
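
For a quick inspection outside TensorFlow, the same record layout can also be parsed with NumPy. The sketch below is not part of the repository. It assumes that the 4-byte id and label fields are little-endian 32-bit integers, which the format description above does not state, so verify this against a known record first. The helper name `read_binary_file` is hypothetical.

```python
import numpy as np

ID_BYTES = 4
LABEL_BYTES = 4


def read_binary_file(path, height, width, depth=3):
    """Parse <id><label><depth x height x width> records into NumPy arrays."""
    record_bytes = ID_BYTES + LABEL_BYTES + depth * height * width
    raw = np.fromfile(path, dtype=np.uint8)
    if raw.size % record_bytes != 0:
        raise ValueError("file size is not a multiple of the record size")
    records = raw.reshape(-1, record_bytes)

    # Assumption: id and label are stored as little-endian int32 values.
    ids = records[:, :ID_BYTES].copy().view("<i4").ravel()
    labels = records[:, ID_BYTES:ID_BYTES + LABEL_BYTES].copy().view("<i4").ravel()

    # Images are stored depth-major; return them as [n, height, width, depth].
    images = records[:, ID_BYTES + LABEL_BYTES:].reshape(-1, depth, height, width)
    return ids, labels, images.transpose(0, 2, 3, 1)
```

For example, `read_binary_file("SELFIE/dataset/CIFAR-10/data_batch_1.bin", 32, 32)` would return the ids, labels, and images of one batch file, provided the byte-order assumption holds.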

4. Noise Injection

Because all datasets except ANIMAL-10N are clean, we artificially corrupted CIFAR-10, CIFAR-100, and Tiny-ImageNet using two typical methods that flip the true label i into a corrupted label j: i) pair noise and ii) symmetry noise. The figures below show an example of the noise transition matrix for each type.
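
The two transition matrices can be sketched in a few lines of NumPy. This is an illustration, not the repository's injection code. It assumes that pair noise flips label i to the next class (i + 1, wrapping around) with probability equal to the noise rate, and that symmetry noise spreads the noise rate evenly over the other classes. The function names are hypothetical.

```python
import numpy as np


def pair_noise_matrix(num_classes, noise_rate):
    """T[i, j] = P(corrupted label j | true label i) for pair noise."""
    transition = (1.0 - noise_rate) * np.eye(num_classes)
    for i in range(num_classes):
        # Assumption: the paired class is simply the next class index.
        transition[i, (i + 1) % num_classes] += noise_rate
    return transition


def symmetry_noise_matrix(num_classes, noise_rate):
    """T[i, j] = P(corrupted label j | true label i) for symmetry noise."""
    off_diagonal = noise_rate / (num_classes - 1)
    transition = np.full((num_classes, num_classes), off_diagonal)
    np.fill_diagonal(transition, 1.0 - noise_rate)
    return transition


def corrupt_labels(labels, transition, seed=0):
    """Sample one corrupted label per true label from the matching row of T."""
    rng = np.random.default_rng(seed)
    num_classes = len(transition)
    return np.array([rng.choice(num_classes, p=transition[y]) for y in labels])
```

Each row of either matrix sums to 1, so a row can be used directly as the sampling distribution for the corrupted label of one true class.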

For the real-world noisy ANIMAL-10N dataset, the noise rate of the training data was estimated at 8% by cross-validation with grid search (see Appendix B).

5. Environment and Configuration

Python 3.6.4
Tensorflow-gpu 1.8.0 (pip install tensorflow-gpu==1.8.0)
Tensorpack (pip install tensorpack)

In our paper, for the evaluation, we used a momentum of 0.9, a batch size of 128, a dropout of 0.2, and batch normalization. For the training schedule, we trained the network for 100 epochs with an initial learning rate of 0.1, which was divided by 5 at 50% and 75% of the total number of epochs. As for the algorithm hyperparameters, we fixed restart to 2 and used the best uncertainty threshold epsilon = 0.05 and history length q = 15, which were obtained by grid search (see Section 4.5 in our paper).
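
As a worked example of this schedule, the step-wise learning rate can be written as a small function. This is a sketch that assumes zero-based epoch indices and decay applied at the start of the boundary epoch. The repository configures its schedule through lr_boundaries and lr_values in main.py, whose exact values are not shown here.

```python
def learning_rate(epoch, total_epochs=100, initial_lr=0.1, decay_factor=5.0):
    """Piecewise-constant rate: divided by decay_factor at 50% and 75% of training."""
    lr = initial_lr
    for fraction in (0.5, 0.75):
        # Assumption: epochs are zero-based and the boundary epoch already decays.
        if epoch >= int(total_epochs * fraction):
            lr /= decay_factor
    return lr
```

With the defaults, the rate is 0.1 for epochs 0 to 49, 0.02 for epochs 50 to 74, and 0.004 from epoch 75 onward.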

6. Performance

We trained DenseNet (L=25, k=12) and VGG-19 on the four benchmark datasets. A detailed analysis of the evaluation is given in our paper.

6.1 Synthetic Noise (CIFAR-10/100, Tiny-ImageNet)

DenseNet (L=25, k=12) on varying noise rates.

6.2 Real-World Noise (ANIMAL-10N)

The noise rate of ANIMAL-10N is estimated at 8%.

7. How to Run

Dataset download:

Download our datasets (binary format) and place them into SELFIE/dataset/xxxxx
(e.g., SELFIE/dataset/CIFAR-10).

Algorithm parameters

 -gpu_id: ID of the GPU to use (only a single GPU is supported).
 -data: dataset in {CIFAR-10, CIFAR-100, Tiny-ImageNet, ANIMAL-10N}.
 -model_name: model in {VGG-19, DenseNet-10-12, DenseNet-25-12, DenseNet-40-12}.
 -method_name: method in {Default, ActiveBias, Coteaching, SELFIE}.
 -noise_type: synthetic noise type in {pair, symmetry, none}; none means that no synthetic noise is injected.
 -noise_rate: the noise rate to inject (for CIFAR-10/100, Tiny-ImageNet) or the true noise rate of the dataset (for ANIMAL-10N).
 -log_dir: log directory in which the training/test error is saved.

Algorithm configuration

Data augmentation and distortion are not applied, and the training parameters are set to:

Training epochs: 100
Batch size: 128
Learning rate: 0.1 (divided by 5 at approximately 50% and approximately 75% of the total number of epochs)

Running command

python main.py gpu_id data model_name method_name noise_type noise_rate log_dir

# e.g. 1., train DenseNet (L=25, k=12) on CIFAR-100 with pair noise of 40%.
# python main.py 0 CIFAR-100 DenseNet-25-12 SELFIE pair 0.4 log/CIFAR-100/SELFIE

# e.g. 2., train DenseNet (L=25, k=12) on ANIMAL-10N with real-world noise of 8%
# python main.py 0 ANIMAL-10N DenseNet-25-12 SELFIE none 0.08 log/ANIMAL-10N/SELFIE
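
Because main.py takes its arguments by position, a small wrapper that checks each value against the sets listed above can catch typos before a long run starts. The sketch below is not part of the repository and only assembles the argument list. The helper name `build_command` and the range check on noise_rate are assumptions.

```python
DATASETS = {"CIFAR-10", "CIFAR-100", "Tiny-ImageNet", "ANIMAL-10N"}
MODELS = {"VGG-19", "DenseNet-10-12", "DenseNet-25-12", "DenseNet-40-12"}
METHODS = {"Default", "ActiveBias", "Coteaching", "SELFIE"}
NOISE_TYPES = {"pair", "symmetry", "none"}


def build_command(gpu_id, data, model_name, method_name, noise_type, noise_rate, log_dir):
    """Validate the documented options and return the argument list for main.py."""
    checks = [
        ("data", data, DATASETS),
        ("model_name", model_name, MODELS),
        ("method_name", method_name, METHODS),
        ("noise_type", noise_type, NOISE_TYPES),
    ]
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError("%s must be one of %s, got %r" % (name, sorted(allowed), value))
    # Assumption: a noise rate is a fraction in [0, 1).
    if not 0.0 <= noise_rate < 1.0:
        raise ValueError("noise_rate must be in [0, 1), got %r" % (noise_rate,))
    return ["python", "main.py", str(gpu_id), data, model_name, method_name,
            noise_type, str(noise_rate), log_dir]
```

The returned list can be joined with spaces for display or passed to `subprocess.run` from the directory that contains main.py.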

Detail of log file

log.csv: in general, it stores the training loss/error and the test loss/error.
 - format: epoch, training loss, training error, test loss, test error
However, Coteaching uses two networks, so the format is slightly different.
 - format: epoch, training loss (network1), training error (network1), training loss (network2), training error (network2), test loss (network1), test error (network1), test loss (network2), test error (network2)
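
A log in either format can be loaded with the standard csv module. The sketch below assumes that log.csv is comma-separated, contains only numeric fields, and has no header line. None of this is confirmed above, so lines that do not fit are skipped. The column names and the helper `read_log` are hypothetical.

```python
import csv

DEFAULT_COLUMNS = ["epoch", "train_loss", "train_error", "test_loss", "test_error"]
COTEACHING_COLUMNS = [
    "epoch",
    "train_loss_1", "train_error_1", "train_loss_2", "train_error_2",
    "test_loss_1", "test_error_1", "test_loss_2", "test_error_2",
]


def read_log(path, coteaching=False):
    """Return one dict per epoch, keyed by the column names above."""
    columns = COTEACHING_COLUMNS if coteaching else DEFAULT_COLUMNS
    rows = []
    with open(path, newline="") as handle:
        for fields in csv.reader(handle):
            fields = [field.strip() for field in fields if field.strip()]
            if len(fields) != len(columns):
                continue  # blank or malformed line
            try:
                values = [float(field) for field in fields]
            except ValueError:
                continue  # non-numeric line, e.g. a header
            row = dict(zip(columns, values))
            row["epoch"] = int(row["epoch"])
            rows.append(row)
    return rows
```

For instance, `min(row["test_error"] for row in read_log("log/CIFAR-100/SELFIE/log.csv"))` would give the best test error of a run, assuming the file sits directly in the log directory.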

8. Reference

[1] Huang et al., 2017, Densely connected convolutional networks. In CVPR.
[2] Simonyan et al., 2014, Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
[3] Chang et al., 2017, Active Bias: Training more accurate neural networks by emphasizing high variance samples. In NIPS.
[4] Han et al., 2018, Co-teaching: Robust training of deep neural networks with extremely noisy labels. In NIPS.

9. Contact

Hwanjun Song ([email protected]); Minseok Kim ([email protected]); Jae-gil Lee ([email protected])

selfie's Issues

OutOfRangeError:

It appears that:

Traceback (most recent call last):
File "main.py", line 110, in <module>
main()
File "main.py", line 106, in main
selfie(gpu_id, input_reader, model_name, total_epochs, batch_size, lr_boundaries, lr_values, optimizer, noise_rate, noise_type, warm_up, threshold, queue_size, restart=restart, log_dir=log_dir)
File "/media/shujun/sj/project/SELFIE/SELFIE/algorithm/selfie.py", line 154, in selfie
train_batch_patcher.bulk_load_in_memory(sess, train_ids, train_images, train_labels)
File "/media/shujun/sj/project/SELFIE/SELFIE/reader/batch_patcher.py", line 40, in bulk_load_in_memory
mini_ids, mini_images, mini_labels = sess.run([ids, images, labels])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_0_shuffle_batch/fifo_queue' is closed and has insufficient elements (requested 128, current size 0)
[[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_UINT8, DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/fifo_queue, shuffle_batch/n/_147)]]

tabular data/new datasets

Hi,
thanks for sharing your implementation. I have two questions about it:

Does it also work on tabular data?
Is the code tailored to the datasets used in the paper or can one apply it to any data?

Thanks!

ValueError: Cannot feed value of shape (128, 64, 64, 3) for Tensor 'DenseNet/train_images:0', which has shape '(None, 32, 32, 3)'

While running the code in Colab with the ANIMAL-10N dataset, I got the following error:

Cannot feed value of shape (128, 64, 64, 3) for Tensor 'DenseNet/train_images:0', which has shape '(None, 32, 32, 3)'

['/content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/main.py', '0', 'ANIMAL-10N', 'DenseNet-25-12', 'SELFIE', 'none', '0.08', 'log/ANIMAL-10N/SELFIE']

This code trains Densnet(L={10,25,40}, k=12) using SELFIE in tensorflow-gpu environment.

Description -----------------------------------------------------------
Please download datasets from our github before running command.
For SELFIE, the hyperparameter was set to be uncertainty threshold = 0.05 and history length=15.
For Training, we follow the same configuration in our paper
For Training, training_epoch = 100, batch = 128, initial_learning rate = 0.1 (decayed 50% and 75% of total number of epochs), use momentum of 0.9, warm_up=25, restart=2, ...
You can easily change the value in main.py
Dataset exists in /content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/dataset/ANIMAL-10N
2021-07-18 22:17:19.850235: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-07-18 22:17:19.854243: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-07-18 22:17:19.854433: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562c3bb7cbc0 executing computations on platform Host. Devices:
2021-07-18 22:17:19.854463: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2021-07-18 22:17:19.856103: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-07-18 22:17:20.035986: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:20.036711: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562c3bb7cf40 executing computations on platform CUDA. Devices:
2021-07-18 22:17:20.036744: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2021-07-18 22:17:20.036910: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:20.037563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2021-07-18 22:17:20.037945: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-07-18 22:17:20.039393: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-07-18 22:17:20.040562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-07-18 22:17:20.040877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-07-18 22:17:20.042131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-07-18 22:17:20.042997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-07-18 22:17:20.045939: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-07-18 22:17:20.046061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:20.046661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:20.047171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-07-18 22:17:20.047237: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-07-18 22:17:20.048267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-18 22:17:20.048294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2021-07-18 22:17:20.048308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2021-07-18 22:17:20.048418: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:20.049041: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:20.049672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14161 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Now read following files.
['/content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/dataset/ANIMAL-10N/data_batch_1.bin']
Filling queue with 20000 data before starting to train. This will take a few minutes.
Now read following files.
['/content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/dataset/ANIMAL-10N/test_batch.bin']
Filling queue with 2000 data before starting to train. This will take a few minutes.
[0718 22:17:20 @registry.py:90] 'DenseNet/conv0': [?, 32, 32, 3] --> [?, 32, 32, 16]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/dense_layer.0/conv1': [?, 32, 32, 16] --> [?, 32, 32, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/dense_layer.1/conv1': [?, 32, 32, 28] --> [?, 32, 32, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/dense_layer.2/conv1': [?, 32, 32, 40] --> [?, 32, 32, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/dense_layer.3/conv1': [?, 32, 32, 52] --> [?, 32, 32, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/dense_layer.4/conv1': [?, 32, 32, 64] --> [?, 32, 32, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/dense_layer.5/conv1': [?, 32, 32, 76] --> [?, 32, 32, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/dense_layer.6/conv1': [?, 32, 32, 88] --> [?, 32, 32, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/transition1/conv1': [?, 32, 32, 100] --> [?, 32, 32, 100]
[0718 22:17:20 @registry.py:90] 'DenseNet/block1/transition1/pool': [?, 32, 32, 100] --> [?, 16, 16, 100]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/dense_layer.0/conv1': [?, 16, 16, 100] --> [?, 16, 16, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/dense_layer.1/conv1': [?, 16, 16, 112] --> [?, 16, 16, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/dense_layer.2/conv1': [?, 16, 16, 124] --> [?, 16, 16, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/dense_layer.3/conv1': [?, 16, 16, 136] --> [?, 16, 16, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/dense_layer.4/conv1': [?, 16, 16, 148] --> [?, 16, 16, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/dense_layer.5/conv1': [?, 16, 16, 160] --> [?, 16, 16, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/dense_layer.6/conv1': [?, 16, 16, 172] --> [?, 16, 16, 12]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/transition2/conv1': [?, 16, 16, 184] --> [?, 16, 16, 184]
[0718 22:17:20 @registry.py:90] 'DenseNet/block2/transition2/pool': [?, 16, 16, 184] --> [?, 8, 8, 184]
[0718 22:17:20 @registry.py:90] 'DenseNet/block3/dense_layer.0/conv1': [?, 8, 8, 184] --> [?, 8, 8, 12]
[0718 22:17:21 @registry.py:90] 'DenseNet/block3/dense_layer.1/conv1': [?, 8, 8, 196] --> [?, 8, 8, 12]
[0718 22:17:21 @registry.py:90] 'DenseNet/block3/dense_layer.2/conv1': [?, 8, 8, 208] --> [?, 8, 8, 12]
[0718 22:17:21 @registry.py:90] 'DenseNet/block3/dense_layer.3/conv1': [?, 8, 8, 220] --> [?, 8, 8, 12]
[0718 22:17:21 @registry.py:90] 'DenseNet/block3/dense_layer.4/conv1': [?, 8, 8, 232] --> [?, 8, 8, 12]
[0718 22:17:21 @registry.py:90] 'DenseNet/block3/dense_layer.5/conv1': [?, 8, 8, 244] --> [?, 8, 8, 12]
[0718 22:17:21 @registry.py:90] 'DenseNet/block3/dense_layer.6/conv1': [?, 8, 8, 256] --> [?, 8, 8, 12]
[0718 22:17:21 @registry.py:90] 'DenseNet/gap': [?, 8, 8, 268] --> [?, 268]
[0718 22:17:21 @registry.py:90] 'DenseNet/linear': [?, 268] --> [?, 10]
2021-07-18 22:17:21.457280: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:21.457843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2021-07-18 22:17:21.457931: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-07-18 22:17:21.457958: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-07-18 22:17:21.457980: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-07-18 22:17:21.458010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-07-18 22:17:21.458031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-07-18 22:17:21.458051: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-07-18 22:17:21.458071: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-07-18 22:17:21.458152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:21.458734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-18 22:17:21.459313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2021-07-18 22:17:23.501927: W tensorflow/core/common_runtime/colocation_graph.cc:960] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ReaderReadV2: CPU
FixedLengthRecordReaderV2: CPU
QueueSizeV2: GPU CPU XLA_CPU XLA_GPU
QueueCloseV2: GPU CPU XLA_CPU XLA_GPU
FIFOQueueV2: CPU XLA_CPU XLA_GPU
QueueEnqueueManyV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
input_producer (FIFOQueueV2) /device:GPU:0
input_producer/input_producer_EnqueueMany (QueueEnqueueManyV2) /device:GPU:0
input_producer/input_producer_Close (QueueCloseV2) /device:GPU:0
input_producer/input_producer_Close_1 (QueueCloseV2) /device:GPU:0
input_producer/input_producer_Size (QueueSizeV2) /device:GPU:0
FixedLengthRecordReaderV2 (FixedLengthRecordReaderV2) /device:GPU:0
ReaderReadV2 (ReaderReadV2) /device:GPU:0

2021-07-18 22:17:23.502153: W tensorflow/core/common_runtime/colocation_graph.cc:960] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
QueueDequeueManyV2: CPU
QueueCloseV2: GPU CPU XLA_CPU XLA_GPU
FIFOQueueV2: CPU XLA_CPU XLA_GPU
QueueSizeV2: GPU CPU XLA_CPU XLA_GPU
QueueEnqueueV2: GPU CPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
shuffle_batch/fifo_queue (FIFOQueueV2) /device:GPU:0
shuffle_batch/fifo_queue_enqueue (QueueEnqueueV2) /device:GPU:0
shuffle_batch/fifo_queue_Close (QueueCloseV2) /device:GPU:0
shuffle_batch/fifo_queue_Close_1 (QueueCloseV2) /device:GPU:0
shuffle_batch/fifo_queue_Size (QueueSizeV2) /device:GPU:0
shuffle_batch (QueueDequeueManyV2) /device:GPU:0

2021-07-18 22:17:23.502312: W tensorflow/core/common_runtime/colocation_graph.cc:960] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ReaderReadV2: CPU
FixedLengthRecordReaderV2: CPU
QueueSizeV2: GPU CPU XLA_CPU XLA_GPU
QueueCloseV2: GPU CPU XLA_CPU XLA_GPU
FIFOQueueV2: CPU XLA_CPU XLA_GPU
QueueEnqueueManyV2: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
input_producer_1 (FIFOQueueV2) /device:GPU:0
input_producer_1/input_producer_1_EnqueueMany (QueueEnqueueManyV2) /device:GPU:0
input_producer_1/input_producer_1_Close (QueueCloseV2) /device:GPU:0
input_producer_1/input_producer_1_Close_1 (QueueCloseV2) /device:GPU:0
input_producer_1/input_producer_1_Size (QueueSizeV2) /device:GPU:0
FixedLengthRecordReaderV2_1 (FixedLengthRecordReaderV2) /device:GPU:0
ReaderReadV2_1 (ReaderReadV2) /device:GPU:0

2021-07-18 22:17:23.502470: W tensorflow/core/common_runtime/colocation_graph.cc:960] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
QueueDequeueManyV2: CPU
QueueCloseV2: GPU CPU XLA_CPU XLA_GPU
FIFOQueueV2: CPU XLA_CPU XLA_GPU
QueueSizeV2: GPU CPU XLA_CPU XLA_GPU
QueueEnqueueV2: GPU CPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
shuffle_batch_1/fifo_queue (FIFOQueueV2) /device:GPU:0
shuffle_batch_1/fifo_queue_enqueue (QueueEnqueueV2) /device:GPU:0
shuffle_batch_1/fifo_queue_Close (QueueCloseV2) /device:GPU:0
shuffle_batch_1/fifo_queue_Close_1 (QueueCloseV2) /device:GPU:0
shuffle_batch_1/fifo_queue_Size (QueueSizeV2) /device:GPU:0
shuffle_batch_1 (QueueDequeueManyV2) /device:GPU:0

2021-07-18 22:17:23.502634: W tensorflow/core/common_runtime/colocation_graph.cc:960] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:XLA_CPU:0
/job:localhost/replica:0/task:0/device:XLA_GPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU, XLA_CPU, XLA_GPU] possible_devices_=[]
AssignAddVariableOp: CPU XLA_CPU XLA_GPU
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU
AssignVariableOp: CPU XLA_CPU XLA_GPU
VarIsInitializedOp: GPU CPU XLA_CPU XLA_GPU
Const: GPU CPU XLA_CPU XLA_GPU
VarHandleOp: CPU XLA_CPU XLA_GPU

Colocation members, user-requested devices, and framework assigned devices, if any:
Variable/Initializer/initial_value (Const)
Variable (VarHandleOp) /device:GPU:0
Variable/IsInitialized/VarIsInitializedOp (VarIsInitializedOp) /device:GPU:0
Variable/Assign (AssignVariableOp) /device:GPU:0
Variable/Read/ReadVariableOp (ReadVariableOp) /device:GPU:0
ReadVariableOp (ReadVariableOp) /device:GPU:0
PiecewiseConstant/ReadVariableOp (ReadVariableOp) /device:GPU:0
Momentum/Const (Const) /device:GPU:0
Momentum (AssignAddVariableOp) /device:GPU:0

# of samples: 50000

# of samples: 5000

Noise Injection: none
5466 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,

0 ,4608 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,

0 ,0 ,5091 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,

0 ,0 ,0 ,4841 ,0 ,0 ,0 ,0 ,0 ,0 ,

0 ,0 ,0 ,0 ,4981 ,0 ,0 ,0 ,0 ,0 ,

0 ,0 ,0 ,0 ,0 ,4913 ,0 ,0 ,0 ,0 ,

0 ,0 ,0 ,0 ,0 ,0 ,5322 ,0 ,0 ,0 ,

0 ,0 ,0 ,0 ,0 ,0 ,0 ,4999 ,0 ,0 ,

0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,4970 ,0 ,

0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,4809 ,

run: 1

2021-07-18 22:18:05.634972: W tensorflow/core/kernels/queue_base.cc:277] _3_input_producer_1: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635268: W tensorflow/core/kernels/queue_base.cc:277] _2_input_producer: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635857: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635907: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635936: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635952: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635968: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635982: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.635998: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636012: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636039: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636055: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636070: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636086: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636104: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636143: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636169: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2021-07-18 22:18:05.636184: W tensorflow/core/kernels/queue_base.cc:277] _4_shuffle_batch_1/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "/content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/main.py", line 108, in <module>
main()
File "/content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/main.py", line 104, in main
selfie(gpu_id, input_reader, model_name, total_epochs, batch_size, lr_boundaries, lr_values, optimizer, noise_rate, noise_type, warm_up, threshold, queue_size, restart=restart, log_dir=log_dir)
File "/content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/algorithm/selfie.py", line 189, in selfie
training(sess, warm_up, batch_size, train_batch_patcher, test_batch_patcher, trainer, 0, method="warm-up", correcter=correcter, training_log=training_log)
File "/content/drive/Shareddrives/Eranti-Vijay-Su21-2/code/Updated_SELFIE/SELFIE/SELFIE/algorithm/selfie.py", line 48, in training
train_loss, train_acc, _, softmax_matrix = sess.run([trainer.train_loss_op, trainer.train_accuracy_op, trainer.train_op, trainer.train_prob_op], feed_dict={trainer.model.train_image_placeholder: images, trainer.model.train_label_placeholder: new_labels})
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
(np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (128, 64, 64, 3) for Tensor 'DenseNet/train_images:0', which has shape '(None, 32, 32, 3)'

Reversed sort comment: please change the comment

In "method/self_correcter.py", lines 57-58, the comment reads "# sort loss by ascending order", but it does not match the sort order that the code actually uses.
Please update the comment.
