lattice-ai / deeplabv3-plus

TensorFlow 2.3.0 implementation of DeepLabV3-Plus

Home Page: https://keras.io/examples/vision/deeplabv3_plus/

Python 0.60% Shell 0.03% Jupyter Notebook 99.37%

deeplabv3 deeplabv3plus tensorflow2 cityscapes wandb tensorflow architecture segmentation semantic-segmentation human-part-segmentation

deeplabv3-plus's Introduction

DeepLabV3-Plus (Ongoing)

TensorFlow 2.2.0 implementation of the DeepLabV3-Plus architecture, as proposed in the paper Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.

Project Link: https://github.com/deepwrex/DeepLabV3-Plus/projects/.

Experiments: https://app.wandb.ai/19soumik-rakshit96/deeplabv3-plus.

Model Architectures can be found here.

Setup Datasets

CamVid
```
cd dataset
bash camvid.sh
```
Multi-Person Human Parsing

Register on https://www.kaggle.com/.

Generate Kaggle API Token
```
bash download_human_parsing_dataset.sh <kaggle-username> <kaggle-key>
```

Code to test Model

import tensorflow as tf

from deeplabv3plus.model.deeplabv3_plus import DeeplabV3Plus

# The class name below matches the import above.
model = DeeplabV3Plus(backbone='resnet50', num_classes=20)
input_shape = (1, 512, 512, 3)
input_tensor = tf.random.normal(input_shape)
result = model(input_tensor)  # build model by one forward pass
model.summary()

Training

Use the trainer.py script as documented with the help description below:

usage: trainer.py [-h] [--wandb_api_key WANDB_API_KEY] config_key

Runs DeeplabV3+ trainer with the given config setting.

Registered config_key values:
  camvid_resnet50
  human_parsing_resnet50

positional arguments:
  config_key            Key to use while looking up configuration from the CONFIG_MAP dictionary.

optional arguments:
  -h, --help            show this help message and exit
  --wandb_api_key WANDB_API_KEY
                        Wandb API Key for logging run on Wandb.
                        If provided, checkpoint_dir is set to wandb://
                        (Model checkpoints are saved to wandb.)
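
As an illustration only, a command-line interface matching the help text above could be declared with `argparse` roughly as follows. This is a sketch, not the repository's `trainer.py`: the `CONFIG_KEYS` list and the `build_parser` helper are hypothetical names, and rejecting unknown keys through `choices` is an assumption.

```python
import argparse

# Hypothetical stand-in for the keys of CONFIG_MAP in the config package.
CONFIG_KEYS = ["camvid_resnet50", "human_parsing_resnet50"]


def build_parser():
    """Build a parser whose interface mirrors the help text above."""
    parser = argparse.ArgumentParser(
        prog="trainer.py",
        description="Runs DeeplabV3+ trainer with the given config setting.",
    )
    parser.add_argument(
        "config_key",
        metavar="config_key",
        choices=CONFIG_KEYS,  # assumption: unregistered keys are rejected up front
        help="Key to use while looking up configuration from the CONFIG_MAP dictionary.",
    )
    parser.add_argument(
        "--wandb_api_key",
        default=None,  # None means: no wandb logging
        help="Wandb API Key for logging run on Wandb.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["camvid_resnet50"])
print(args.config_key, args.wandb_api_key)
```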

If you want to use your own custom training configuration, you can define it in the following way:

Define your configuration in a python dictionary as follows:

config/camvid_resnet50.py

#!/usr/bin/env python

"""Module for training deeplabv3plus on camvid dataset."""

from glob import glob

import tensorflow as tf


# Sample Configuration
CONFIG = {
    # We mandate specifying project_name and experiment_name in every config
    # file. They are used for wandb runs if wandb api key is specified.
    'project_name': 'deeplabv3-plus',
    'experiment_name': 'camvid-segmentation-resnet-50-backbone',

    'train_dataset_config': {
        'images': sorted(glob('./dataset/camvid/train/*')),
        'labels': sorted(glob('./dataset/camvid/trainannot/*')),
        'height': 512, 'width': 512, 'batch_size': 8
    },

    'val_dataset_config': {
        'images': sorted(glob('./dataset/camvid/val/*')),
        'labels': sorted(glob('./dataset/camvid/valannot/*')),
        'height': 512, 'width': 512, 'batch_size': 8
    },

    'strategy': tf.distribute.OneDeviceStrategy(device="/gpu:0"),
    'num_classes': 20, 'backbone': 'resnet50', 'learning_rate': 0.0001,

    'checkpoint_dir': "./checkpoints/",
    'checkpoint_file_prefix': "deeplabv3plus_with_resnet50_",

    'epochs': 100
}

Save this file inside the config directory (as hinted in the file path above).
Register your config in the __init__.py module like below:

config/__init__.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""__init__ module for configs. Register your config file here by adding its
entry in the CONFIG_MAP as shown.
"""

import config.camvid_resnet50
import config.human_parsing_resnet50


CONFIG_MAP = {
    'camvid_resnet50': config.camvid_resnet50.CONFIG,  # the config file we defined above
    'human_parsing_resnet50': config.human_parsing_resnet50.CONFIG  # another config
}
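
A registry like `CONFIG_MAP` can be checked before training starts, so that a typo in the key or a missing entry fails early. The sketch below is a hedged illustration: `REQUIRED_KEYS`, `missing_config_keys` and `lookup_config` are hypothetical names, and the list of required keys is an assumption drawn from the sample config above (only `project_name` and `experiment_name` are stated as mandatory there). `strategy` is left out so the example runs without TensorFlow.

```python
# Assumed set of keys, taken from the sample configuration shown earlier.
REQUIRED_KEYS = (
    "project_name", "experiment_name",
    "train_dataset_config", "val_dataset_config",
    "num_classes", "backbone", "learning_rate",
    "checkpoint_dir", "checkpoint_file_prefix", "epochs",
)


def missing_config_keys(config):
    """Return the required keys that are absent from a config dictionary."""
    return [key for key in REQUIRED_KEYS if key not in config]


def lookup_config(config_map, config_key):
    """Fetch a registered config, failing early with a readable message."""
    if config_key not in config_map:
        raise KeyError(
            f"Unknown config_key {config_key!r}; registered: {sorted(config_map)}"
        )
    config = config_map[config_key]
    missing = missing_config_keys(config)
    if missing:
        raise ValueError(f"Config {config_key!r} is missing keys: {missing}")
    return config
```

With the registry above, this would be called as `lookup_config(CONFIG_MAP, 'camvid_resnet50')`.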

Now you can run the trainer script like so (using the camvid_resnet50 config key we registered above):

./trainer.py camvid_resnet50 --wandb_api_key <YOUR_WANDB_API_KEY>

or, if you don't need wandb logging:

./trainer.py camvid_resnet50

Inference

Sample Inference Code:

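from glob import glob

# read_image, infer and plot_samples_matplotlib are helpers from this repository;
# their import lines are not shown in this snippet.

model_file = './dataset/deeplabv3-plus-human-parsing-resnet-50-backbone.h5'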
train_images = glob('./dataset/instance-level_human_parsing/Training/Images/*')
val_images = glob('./dataset/instance-level_human_parsing/Validation/Images/*')
test_images = glob('./dataset/instance-level_human_parsing/Testing/Images/*')


def plot_predictions(images_list, size):
    for image_file in images_list:
        image_tensor = read_image(image_file, size)
        prediction = infer(
            image_tensor=image_tensor,
            model_file=model_file
        )
        plot_samples_matplotlib(
            [image_tensor, prediction], figsize=(10, 6)
        )

plot_predictions(train_images[:4], (512, 512))

Results

Multi-Person Human Parsing

Training Set Results

Validation Set Results

Test Set Results

Citation

@misc{1802.02611,
    Author = {Liang-Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam},
    Title = {Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation},
    Year = {2018},
    Eprint = {arXiv:1802.02611},
}

deeplabv3-plus's Issues

We currently do not support distribution strategy with a `Sequential` model

How should this problem be solved?

ValueError: We currently do not support distribution strategy with a Sequential model that is created without input_shape/input_dim set in its first layer or a subclassed model.

MeanIoU metric gives ValueError

Hi, I'm testing DeepLabV3+ on a two-class segmentation task and wanted to change the metric. Accuracy is of little use in my scenario, since I'm trying to detect buildings on an orthophoto map and everything else is background.
But when I try to change the metric to IoU (or MeanIoU), I get this ValueError:

I looked for an answer online without success.
I tried to implement my own IoU like this:

but this sometimes gives values outside the range (0, 1), which is wrong.
I also believe y_pred and y_true somehow don't have the same dimensions, which seems odd to me:

I might be missing something obvious.
Any solutions?
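
One possible reading of the dimension mismatch, offered as an assumption rather than a diagnosis: the training logs elsewhere on this page show labels of shape `(batch, 512, 512, 1)`, while a segmentation model typically outputs one score per class, i.e. `(batch, 512, 512, num_classes)`. An IoU computed on those tensors directly compares class ids with scores. The NumPy sketch below (the `mean_iou` helper is hypothetical) first reduces the scores to class ids with `argmax`, then computes per-class intersection over union, which keeps every value within [0, 1] because the intersection can never exceed the union.

```python
import numpy as np


def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU from integer labels (..., 1) and per-class scores (..., C)."""
    true_ids = np.asarray(y_true).reshape(-1).astype(np.int64)
    # Reduce per-class scores to one predicted class id per pixel.
    pred_ids = np.argmax(np.asarray(y_pred), axis=-1).reshape(-1)
    ious = []
    for class_id in range(num_classes):
        true_mask = true_ids == class_id
        pred_mask = pred_ids == class_id
        union = np.logical_or(true_mask, pred_mask).sum()
        if union == 0:
            continue  # class absent from both labels and predictions
        intersection = np.logical_and(true_mask, pred_mask).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")
```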

How to re-train or continue training ?!?!

Hi,

Great stuff, really. I have read through your code a lot, but I am still not sure how to reload weights and continue training if I have to terminate a run midway.

I edited train.py, adding the following right before fit:
...
if "reload_weight_checkpoint" in self.config.keys() and self.config["reload_weight_checkpoint"] != "":
    self.model.load_weights(self.config["reload_weight_checkpoint"])
...

--> ERROR:

ValueError Traceback (most recent call last)
in
----> 1 history = trainer.train()

D:\AI\Human_Pose\DeepLabV3-Plus\deeplabv3plus\train.py in train(self)
188
189 if "reload_weight_checkpoint" in self.config.keys() and self.config["reload_weight_checkpoint"]!="":
--> 190 self.model.load_weights(self.config["reload_weight_checkpoint"])
191
192 history = self.model.fit(

~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\keras\engine\training.py in load_weights(self, filepath, by_name, skip_mismatch, options)
2220 'load_weights requires h5py when loading weights from HDF5.')
2221 if not self._is_graph_network and not self.built:
-> 2222 raise ValueError(
2223 'Unable to load weights saved in HDF5 format into a subclassed '
2224 'Model which has not created its variables yet. Call the Model '

ValueError: Unable to load weights saved in HDF5 format into a subclassed Model which has not created its variables yet. Call the Model first, then load the weights.

Thanks much.
Steve
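
The error message itself names the remedy: call the model once so that it creates its variables, then load the weights. A minimal sketch of that order of operations follows. The `build_then_load` helper is hypothetical, the default input shape is copied from the "Code to test Model" snippet in the README, and passing a NumPy array as the dummy input relies on Keras converting it to a tensor.

```python
import numpy as np


def build_then_load(model, weights_path, input_shape=(1, 512, 512, 3)):
    """Run one forward pass so a subclassed model is built, then load weights."""
    dummy = np.zeros(input_shape, dtype="float32")
    model(dummy, training=False)  # creates the model's variables
    model.load_weights(weights_path)  # now the HDF5 weights have somewhere to go
    return model
```

In the edit described above, this would replace the bare `load_weights` call, for example `build_then_load(self.model, self.config["reload_weight_checkpoint"])`, assuming the model expects inputs of the default shape.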

Model deeplabv3-plus-human-parsing-resnet-50-backbone.h5

Hello, could you share the file deeplabv3-plus-human-parsing-resnet-50-backbone.h5 for those who do not have access to a GPU for long enough to train the model?

5 classes

Hello, I just tried using the DeepLabV3+ code with only 5 classes instead of 20. Training goes well, but the evaluation loss becomes NaN, and the same happens during testing.
Does this model work only with 20 classes?
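
One thing worth ruling out, stated here as an assumption since the report does not show the data or the loss function: if the label masks still contain class ids of 5 or more while the model is built with `num_classes=5`, a sparse cross-entropy loss can produce NaN. The hypothetical helper below lists label ids that fall outside the valid range for a given mask.

```python
import numpy as np


def out_of_range_labels(mask, num_classes):
    """Return the sorted label ids in `mask` that fall outside [0, num_classes)."""
    values = np.unique(np.asarray(mask))  # np.unique returns sorted values
    return [int(v) for v in values if v < 0 or v >= num_classes]
```

An empty list for every mask in the training and validation sets would rule this cause out.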

Cannot infer because the model was saved without its network config

We simply cannot run inference because the model was saved without its network config. TensorFlow 2 seems buggy here, as ModelCheckpoint is not designed to save the model config. Meanwhile, we do not know your class structure well enough to rebuild the model and then load the weights (this relates to my earlier question about continuing training).

Any hint will help.

Thanks,
Steve

mobilenet V2 train fail

I use the MobileNetV2 backbone, but training fails.

[-] Importing tensorflow...
2021-01-14 13:49:10.317068: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[+] Done! Tensorflow version: 2.5.0-dev20201230
[-] Importing Deeplabv3plus Trainer class...
[-] Importing config files...
2021-01-14 13:49:11.537581: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-14 13:49:11.591072: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-01-14 13:49:11.591101: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (alit-PowerEdge-T640): /proc/driver/nvidia/version does not exist
2021-01-14 13:49:11.591383: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0,/job:localhost/replica:0/task:0/device:GPU:1
WARNING:tensorflow:Some requested devices in `tf.distribute.Strategy` are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0,/job:localhost/replica:0/task:0/device:GPU:1
Train Images are good to go
[+] Data points in train dataset: 6400
Train Dataset: <PrefetchDataset shapes: ((16, 512, 512, 3), (16, 512, 512, 1)), types: (tf.float32, tf.float32)>
Train Images are good to go
Data points in train dataset: 1600
Val Dataset: <PrefetchDataset shapes: ((16, 512, 512, 3), (16, 512, 512, 1)), types: (tf.float32, tf.float32)>
2021-01-14 13:49:12.045387: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2021-01-14 13:49:12.045414: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2021-01-14 13:49:12.100790: I tensorflow/core/profiler/lib/profiler_session.cc:158] Profiler session tear down.
2021-01-14 13:49:12.268507: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:656] In AUTO-mode, and switching to DATA-based sharding, instead of FILE-based sharding as we cannot find appropriate reader dataset op(s) to shard. Error: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_STRING
      type: DT_STRING
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
      }
      shape {
      }
    }
  }
}

2021-01-14 13:49:12.362496: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:127] None of the MLIR optimization passes are enabled (registered 2)
2021-01-14 13:49:12.367114: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2300000000 Hz
Epoch 1/100
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.
Traceback (most recent call last):
  File "trainer.py", line 47, in <module>
    HISTORY = TRAINER.train()
  File "/data/deeplab/DeepLabV3-Plus/deeplabv3plus/train.py", line 191, in train
    epochs=self.config['epochs'], callbacks=callbacks
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/wandb/integration/keras/keras.py", line 119, in new_v2
    return old_v2(*args, **kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1135, in fit
    tmp_logs = self.train_function(iterator)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 797, in __call__
    result = self._call(*args, **kwds)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 841, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 695, in _initialize
    *args, **kwds))
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2998, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3390, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3235, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 998, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 603, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 985, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:840 train_function  *
        return step_function(self, iterator)
    /data/deeplab/DeepLabV3-Plus/deeplabv3plus/model/deeplabv3_plus.py:104 call  *
        tensor = tf.keras.layers.Concatenate(axis=-1)([input_a, input_b])
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1015 __call__  **
        self._maybe_build(inputs)
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2709 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/utils/tf_utils.py:273 wrapper
        output_shape = fn(instance, input_shape)
    /data/Anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/keras/layers/merge.py:519 build
        raise ValueError(err_msg)

    ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(8, 128, 128, 256), (8, 64, 64, 48)]
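
The two shapes in the error differ by a factor of two in height and width (128 versus 64), which is what one would see if the low-level backbone feature is taken at a different stride than the decoder's upsampling expects. The arithmetic below is a sketch under stated assumptions: the stride and upsampling values are chosen to reproduce the reported shapes for a 512-pixel input and are not read from the repository, and `decoder_input_sizes` is a hypothetical helper.

```python
def decoder_input_sizes(input_size, encoder_stride, low_level_stride, upsample_factor):
    """Spatial sizes of the two tensors that the decoder concatenates."""
    upsampled = (input_size // encoder_stride) * upsample_factor
    low_level = input_size // low_level_stride
    return upsampled, low_level


# Assumed values that reproduce the mismatch in the traceback (128 vs. 64).
print(decoder_input_sizes(512, encoder_stride=16, low_level_stride=8, upsample_factor=4))
# With a low-level feature at stride 4, both branches would agree.
print(decoder_input_sizes(512, encoder_stride=16, low_level_stride=4, upsample_factor=4))
```

If this reading is right, choosing MobileNetV2 feature layers whose strides match those used for ResNet50 would make the concatenation shapes agree.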

Augmentation

I saw augmentation code but cannot find where it's used in code or config. Did you guys use augmentation?

OSError: SavedModel file does not exist at: /content/DeepLabV3-Plus/checkpoints/deeplabv3plus_with_resnet50_2.data-00001-of-00002/{saved_model.pbtxt|saved_model.pb}

I'm getting this when I try to predict. Please help.
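
The path in the error ends in `.data-00001-of-00002`, which is one shard of a TensorFlow weights checkpoint rather than a SavedModel directory. That suggests, as an assumption about the failing call, that a weights-only checkpoint was passed to a loader expecting a full saved model. TensorFlow-format checkpoints are addressed by their prefix, without the `.index` or `.data-...` suffix, and are restored into an already constructed model with `load_weights`. The hypothetical helper below recovers that prefix from a shard path.

```python
import re

# Matches the ".index" file or a ".data-XXXXX-of-YYYYY" shard of a TF checkpoint.
_SHARD_SUFFIX = re.compile(r"\.(index|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Strip a TensorFlow checkpoint shard or index suffix to get the prefix."""
    return _SHARD_SUFFIX.sub("", path)


print(checkpoint_prefix(
    "/content/DeepLabV3-Plus/checkpoints/deeplabv3plus_with_resnet50_2.data-00001-of-00002"
))
```

The resulting prefix is what would be passed to `model.load_weights(...)` after the model has been built.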

AttributeError: 'str' object has no attribute 'decode'

Hi lattice-ai,

Thank you very much for your repository. It really helped me a lot, and I understand all the code in it.
I would like to report an issue I am facing.
I tested this on Google Cloud Platform and got the error "AttributeError: 'str' object has no attribute 'decode'".
Please see the error traceback from my console below. Have you tested your code recently, and are you getting this error as well?
Kindly let me know if it runs properly on your side.

[-] Importing tensorflow...
2020-12-30 03:57:29.701067: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
[+] Done! Tensorflow version: 2.3.1
[-] Importing Deeplabv3plus Trainer class...
[-] Importing config files...
2020-12-30 03:57:31.763784: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-30 03:57:32.370302: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:32.371020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2020-12-30 03:57:32.371137: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-12-30 03:57:32.397791: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-12-30 03:57:32.413543: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-12-30 03:57:32.423013: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-12-30 03:57:32.446563: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-12-30 03:57:32.454836: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-12-30 03:57:32.457506: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-12-30 03:57:32.457677: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:32.458491: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:32.459122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-30 03:57:32.481713: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2199995000 Hz
2020-12-30 03:57:32.482094: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556b90d7c500 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-30 03:57:32.482132: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-12-30 03:57:32.866907: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:32.867678: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x556b93009760 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-12-30 03:57:32.867726: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-12-30 03:57:32.868931: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:32.869612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2020-12-30 03:57:32.869676: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-12-30 03:57:32.869735: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-12-30 03:57:32.869759: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-12-30 03:57:32.869783: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-12-30 03:57:32.869804: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-12-30 03:57:32.869825: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11
2020-12-30 03:57:32.869852: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-12-30 03:57:32.869965: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:32.870715: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:32.871290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-30 03:57:32.872848: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
2020-12-30 03:57:36.611956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-30 03:57:36.612018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-12-30 03:57:36.612028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-12-30 03:57:36.617049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:36.618033: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-30 03:57:36.618775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13996 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Train Images are good to go
[+] Data points in train dataset: 4130
Train Dataset: <PrefetchDataset shapes: ((8, 512, 512, 3), (8, 512, 512, 1)), types: (tf.float32, tf.float32)>
Train Images are good to go
Data points in train dataset: 459
Val Dataset: <PrefetchDataset shapes: ((8, 512, 512, 3), (8, 512, 512, 1)), types: (tf.float32, tf.float32)>
2020-12-30 03:57:36.932085: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
2020-12-30 03:57:36.936114: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 1 GPUs
2020-12-30 03:57:36.953771: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcupti.so.11.0
2020-12-30 03:57:37.107054: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
Epoch 1/100
Traceback (most recent call last):
  File "trainer.py", line 40, in <module>
    HISTORY = TRAINER.train()
  File "/home/jupyter/deeplabv3plus/train.py", line 186, in train
    epochs=self.config['epochs'], callbacks=callbacks
  File "/opt/conda/lib/python3.7/site-packages/wandb/integration/keras/keras.py", line 119, in new_v2
    return old_v2(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 823, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 697, in _initialize
    *args, **kwds))
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
AttributeError: in user code:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:806 train_function  *
    return step_function(self, iterator)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:796 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/one_device_strategy.py:184 run
    return super(OneDeviceStrategy, self).run(fn, args, kwargs, options)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/one_device_strategy.py:367 _call_for_each_replica
    return fn(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:789 run_step  **
    outputs = model.train_step(data)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:747 train_step
    y_pred = self(x, training=True)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:982 __call__
    self._maybe_build(inputs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2643 _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
/home/jupyter/deeplabv3plus/model/deeplabv3_plus.py:67 build
    input_shape)
/home/jupyter/deeplabv3plus/model/deeplabv3_plus.py:59 _get_backbone_feature
    input_tensor=input_layer, weights='imagenet', include_top=False)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/applications/resnet.py:475 ResNet50
    input_tensor, input_shape, pooling, classes, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/applications/resnet.py:222 ResNet
    model.load_weights(weights_path)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:2211 load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py:660 load_weights_from_hdf5_group
    original_keras_version = f.attrs['keras_version'].decode('utf8')

AttributeError: 'str' object has no attribute 'decode'
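
The last frame of the traceback calls `.decode('utf8')` on an HDF5 attribute. Starting with h5py 3.0, string attributes are read back as `str` instead of `bytes`, so this call fails in TensorFlow versions written against h5py 2.x, such as the 2.3.1 shown in the log. A commonly reported workaround is to pin h5py below 3.0; whether that is the right fix for a given environment is an assumption to verify. The hypothetical helpers below only check which side of that boundary an installation is on.

```python
from importlib import metadata


def h5py_returns_str_attrs(h5py_version):
    """True for h5py 3.x and later, where string attributes are read as str."""
    major = int(h5py_version.split(".")[0])
    return major >= 3


def installed_h5py_version():
    """Return the installed h5py version string, or None if h5py is absent."""
    try:
        return metadata.version("h5py")
    except metadata.PackageNotFoundError:
        return None


version = installed_h5py_version()
if version is not None and h5py_returns_str_attrs(version):
    print(f"h5py {version} returns str attributes; older TensorFlow may fail on .decode()")
```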

Only using 3/4 resnet50 stages by choice?

According to https://github.com/lattice-ai/DeepLabV3-Plus/blob/master/deeplabv3plus/model/backbones.py#L7 you use as your backbone features for resnet50

```
    'feature_1': 'conv4_block6_2_relu',
    'feature_2': 'conv2_block3_2_relu'
```

This would mean that you are using only three out of the four resnet stages, see naming scheme of https://github.com/tensorflow/tensorflow/blob/v2.4.1/tensorflow/python/keras/applications/resnet.py#L468

Is this intended? If so, why?
